dev.org 33.5 KB
Newer Older
1
#+TITLE: Vidjil -- Developer Documentation
2
#+AUTHOR: The Vidjil team (Mathieu, Mikaël, Aurélien, Florian, Marc, Ryan and Tatiana)
3
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="org-mode.css" />
4 5 6 7 8 9

# This manual can be browsed online:
#     http://www.vidjil.org/doc/dev.html               (last stable release)
#     http://git.vidjil.org/blob/master/doc/dev.org    (development version)

# Vidjil -- High-throughput Analysis of V(D)J Immune Repertoire -- [[http://www.vidjil.org]]
10
# Copyright (C) 2011-2017 by Bonsai bioinformatics
11 12 13
# at CRIStAL (UMR CNRS 9189, Université Lille) and Inria Lille
# contact@vidjil.org

Mathieu Giraud's avatar
Mathieu Giraud committed
14 15
Here are aggregated notes forming the developer documentation on the Vidjil components, algorithm,
web application client and server.
Cyprien Borée's avatar
Cyprien Borée committed
16
This documentation is a work-in-progress, it is far from being as polished as the user documentation.
Mathieu Giraud's avatar
Mathieu Giraud committed
17 18
Help can also be found in the source code and in the commit messages.

19

20 21 22
* Algorithm
** Code organisation
   The algorithm follows roughly those steps:
Cyprien Borée's avatar
Cyprien Borée committed
23
   1. The germlines are read. Germlines are in the fasta format and are read
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
      by the Fasta class (=core/fasta.h=). Germlines are built using the
      Germline (or MultiGermline) class (=core/germline.h=)
   2. The input sequence file (.fasta, .fastq, .gz) is read by an OnlineFasta
      (=core/fasta.h=). The difference with the Fasta class being that all the
      data is not stored in memory but the file is read online, storing only
      the current entry.
   3. Windows must be extracted from the read, which is done by the
      WindowExtractor class (=core/windowExtractor.h=). This class has an
      =extract= method which returns a WindowsStorage object
      (=core/windows.h=) in which windows are stored.
   4. To save space consumption, all the reads linked to a given window are
      not stored. Only the longer ones are kept. The BinReadStorage class is
      used for that purpose (=core/read_storage.h=).
   5. In the WindowStorage, we now have the information on the clusters and on
      the abundance of each cluster. However we lack a sequence representative
      of the cluster. For that purpose the class provides a
      =getRepresentativeComputer= method that provides a
      KmerRepresentativeComputer (=core/representative.h=). This class can
      compute a representative sequence using the (long) reads that were
      stored for a given window.
   6. The representative can then be segmented to determine what V, D and J
      genes are at play. This is done by the FineSegmenter (=core/segment.h=).
46 47 48 49 50 51 52 53 54
** The xxx germline
   - All germlines are inserted in one index using =build_with_one_index()= and
     the segmentation method is set to =SEG_METHOD_MAX12= to tell that the
     segmentation must somehow differ.
   - So that the FineSegmenter correctly segments the sequence, the =rep_5= and
     =rep_3= members (class =Fasta=) of the xxx germline are modified by the
     FineSegmenter. The =override_rep5_rep3_from_labels()= method from the
     Germline is the one that overwrites those members with the Fasta
     corresponding to the affectation found by the KmerSegmenter.
55 56 57 58
* Browser

** Installation

59 60
Run a =make= into =browser/= to get the necessary files.
This will in particular get the germline files as well as the icon files.
61

62 63
Opening the =browser/index.html= file is then enough to get a functionning browser,
able to open =.vidjil= files with the =import/export= menu.
64 65 66

To work with actual data, the easiest way is to copy =js/conf.js.sample= to =js/conf.js=.
This will unlock the =patients= menu and allow your local browser
67
to access the public server at http://app.vidjil.org/.
68 69


70
** Client API and permanent URLs
71

72
The client can be opened on a data file specified from a =data= attribute,
73 74 75
and optionally on an analysis file specified from a =analysis= attribute,
as in the following URLs on our test server:

76 77 78
- http://app.vidjil.org/browser/?data=test.vidjil
- http://app.vidjil.org/browser/?data=test.vidjil&analysis=test.analysis
- http://app.vidjil.org/browser/?data=http://app.vidjil.org/browser/test.vidjil
79 80 81 82 83

Both GET and POST requests are accepted.
Note that the =browser/index.html= file and the =.vidjil/.analysis= files should be hosted on the same server.
Otherwise, the server hosting the =.vidjil/.analysis= files must accept cross-domain queries.

84
The client can also load data from a server (see below, requires logging), as in http://app.vidjil.org/?set=3241&config=39
85 86
| =set=xx=    | sample set id |
| =config=xx= | config id     |
87 88

Older formats (patients, run...) are also supported for compatibility but deprecated.
89
Moreover, the state of the client can be encoded in the URL, as in http://app.vidjil.org/?set=3241&config=39&plot=v,size,bar&clone=11,31
90

91 92 93 94
| =plot=x,y,m=     | plot (x axis, y axis) |
| =clone=xx,xx,xx= | selected clone ids    |

For =plot= the axis names are found in =browser/js/axes.js=. =m= is optional, and defines the type of plot (either =grid= or =bar=).
95 96 97 98

We intend to encode more parameters in the URL.


99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114


** Architecture

The Vidjil browser is a set of /views/ linked to a same /model/.
The model keeps the views in sync on some global properties,
most notably dealing with the selection of clones, with the clone filtering,
as well with the locus selection.

- The model (=js/model.js=) is the main object of the Vidjil browser.
  It loads and saves =.vidjil= json data (either directly from data, or from a local file, or from some url).
  It provides function to access and edit information on the clones and on the global parameters
  It keeps all the views in sync.

- Each of the views (=Graph=, =ScatterPlot=, =List=, =Segment=) is rendered inside one or several =<div>= elements,
  and kept sync with the model. All the views are optional, and several views of the same type can be added.
115
  See =js/main.js= for the invocation
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146

- The link with the patient database/server is done with the =Database= object (=js/database.js=)

- Other objects: =Report=, =Shortcut=
  Extends functionalities but requires elements from the full =index.html=.


** Integrating the browser

*** HTML and CSS
  - The =index.html= contains the =<div>= for all views and the menus
  - The CSS (=css/light.css=) is generated by =less= from =css/vidjil.less=

  - The =small_example.html= is a minimal example embedding basic HTML, CSS, as well as some data.
    As the menus are not embedded in this file, functionalities should be provided by direct calls to the models and the views.

*** Javascript
  - The wonderful library =require.js= is used, so there is only one file to include
    <script data-main="js/app.js" src="js/lib/require.js"></script>

  - =js/main.js= creates the different views and binds them to the model.
    Another option is to directly define a function named =main()=, as in =small_example.html=.

*** JSON .vidjil data

Clone lists can be passed to the model through several ways:
  - directly by the user (import/export)
  - from a patient database (needs a database)
  - trough the API (see below)
  - or by directly providing data through Javascript (as in =small_example.html=)

147
The first three solutions need some further elements from the full =index.html=.
148 149


150 151 152 153
** Notifications
*** Priority
#<<browser:priority>>
    The priority determines how the notification are shown and what action the
154
    user should do. The priorities can be between 0 and 3.
155 156 157 158 159 160 161
    - 0 :: The notification is not shown
    - 1 :: The notification is shown (usually on green background) and
         automatically disappears
    - 2 :: The notification is shown (usually on yellow background) and
         automatically disappears
    - 3 :: The notification is shown (usually on red background) and doesn't
         disappear until the user clicks on it.
162

163
    In the =console.log=, the field =priority= takes one of those priorities.
164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182
** Plots
*** How to add something to be plotted
    You want to add a dimension in the scatterplot or as a color? Read the
    following.
**** Scatterplot
     In [[file:../browser/js/scatterPlot.js][scatterPlot.js]], the =available_axis= object defines the dimensions that
     can be displayed. It suffices to add an entry so that it will be proposed
     in the X and Y axis. This kind of way of doing should be generalized to
     the other components.

     The presets are defined in the =preset= object.
**** Color
     Adding a color needs slightly more work than adding a dimension in the
     scatterplot.

     The function =updateColor= in file [[file:../browser/js/clone.js][clone.js]] must be modified to add our color method.
     The variable =this.color= must contain a color (either in HTML or RGB, or…).

     Then a legend must be displayed to understand what the color represents.
183
     For this sake, modify the =build_info_color= method in [[file:../browser/js/info.js][info.js]] file. By
184 185 186 187 188
     default four spans are defined (that can be used) to display the legend:
     =span0=, ..., =span3=.

     Finally modify the [[file:../browser/index.html][index.html]] file to add the new color method in the
     select box (which is under the =color_menu= ID).
189 190 191 192 193 194 195
** Classes
*** Clone
**** Info box
     In the info box all the fields starting with a _ are put. Also all the
     fields under the =seg= field are displayed as soon as they have a =start= and
     =stop=. Some of them can be explicitly not displayed by filling the
     =exclude_seg_info= array in =getHtmlInfo=.
196

197 198
* Server
** Notifications
199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214

The news system is a means of propagating messages to the users of a vidjil server installation.
Messages are propagated in near-realtime for users interacting directly with the server and at a slightly slower rate for users simply using the browser but for which the server is configured.

*** Message Retrieval
The browser by default periodically queries the server to retrieve any new messages and are displayed on a per user basis. This means that any message having already been viewed by the user is not displayed in the browser.
Older messages can be viewed from the index of news items.

*** Caching
News items are kept in cache in order to relieve the database from a potentially large amount of queries.
The cache is stored for each user and is updated only when a change occurs (message read, message created or message edited).

*** Formatting
   Messages can be formatted by using the Markdown syntax. Syntax details are
   available here: http://commonmark.org/help/

215 216 217
*** Priority
    The priority determines how the notification is shown (see [[browser:priority][here for more
    details]]). From the server we have two ways of modifiying the priority.
218
    Either by defining the =success= field to ='true'= or to ='false'=, or
219 220 221
    by explicitly specifying the priority in the field =priority=.

    For more details see 35054e4
222 223 224 225 226 227 228 229 230 231 232
** Getting data and analysis
   How the data files (.vidjil) and analysis files are retrieved from the server?
*** Retrieving the data file
    This is done in the =default.py= controller under the =get_data= function.
    However the .vidjil file is not provided as its exact copy on the
    server. Several informations coming from the DB are fed to the file
    (original filename, time stamps, information on each point, …)
*** Retrieving the analysis file
    This is done in the =default.py= controller under the =get_analysis= function.
    Actually the real work is done in the =analysis_file.py= model, in the
    =get_analysis_data= function.
233 234 235
** Permissions
   Permissions are handled by Web2py's authentication mechanism which is
   specialised to Vidjil's characteristics through the =VidjilAuth= class.
236 237 238 239
*** Login
**** Redirect after login
     The URL at which we access after login is defined in the controllers
     =sample_set/all= and in =default/home=.
240 241
** Database
*** Export
242
    mysqldump -u <user> -p <database> -c --no-create-info > <file>
243
*** Import
244
    In order to import the data from another server, you need to ensure
245 246 247 248 249 250 251 252 253 254 255
    there will be no key collision, or the import will fail.
    If the database contains data, the easiest is to drop the database and
    create a new empty database.
    This will require you to delete the .table file in web2py/applications/vidjil/databases
    In order to create the tables you should then load a page from the
    webapp, but DO NOT init the database, because this will raise the problem
    of colliding primary keys again.

    Then run:
    mysql -u <user> -p <database> < file

256 257 258 259 260 261 262 263 264 265
*** VidjilAuth
   One VidjilAuth is launched for a given user when a controller is called.
   During that call, we cache as much as possible the calls to the DB.  For
   doing so the =get_permission= method is defined (overriding the native
   =has_permission=). It calls the native =has_permission= only when that call
   hasn't already been done (this is particularly useful for DB intensive
   queries, such as the compare patients).

   Also some user characteristics are preloaded (groups and whether the person
   is an admin), which also prevents may DB calls.
266
* Tests
267 268 269 270 271 272 273
** Algorithm
*** Unit
    Unit tests are managed using an internal lightweight poorly-designed
    library that outputs a TAP file. They are organised in the directory
    [[../algo/tests][algo/tests]].

    All the tests are defined in the [[../algo/tests/tests.cpp][tests.cpp]] file. But, for the sake of
274 275
    clarity, this file includes other =cpp= files that incorporate all the
    tests. A call to =make= compiles and launches the =tests.cpp= file, which
276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292
    outputs a TAP file (in case of total success) and creates a =tests.cpp.tap=
    file (in every case).
**** Tap test library
     The library is defined in the [[../algo/tests/testing.h][testing.h]] file.

     Tests must be declared in the [[../algo/tests/tests.h][tests.h]] file:
     1. Define a new macro (in the enum) corresponding to the test name
     2. In =declare_tests()= use =RECORD_TAP_TEST= to associate the macro with a
        description (that will be displayed in the TAP output file).

     Then testing can be done using the =TAP_TEST= macro. The macro takes three
     arguments. The first one is a boolean that is supposed to be true, the
     second is the test name (using the macro defined in =tests.h=) and the
     third one (which can be an empty string) is something which is displayed
     when the test fails.


293
** Browser
294 295 296
*** Code Quality
    Quality of code is checked using [[http://jshint.com/][JSHint]], by
    running =make quality= from the =browser= directory.
Mathieu Giraud's avatar
Mathieu Giraud committed
297 298 299

    Install with =npm install -g jshint=

300 301
*** Unit
    The unit tests in the browser are managed by QUnit and launched using
302
    [[http://www.nightmarejs.org/][nightmare]], by launching =make unit= from the =browser/test= directory.
303
    The tests are organised in the directory
304 305 306
    [[../browser/test/QUnit/testFiles][browser/test/QUnit/testFiles]]. The file [[../browser/test/QUnit/testFiles/data_test.js][data_test.js]] contains a toy
    dataset that is used in the tests.

307
    Unit tests can be launched using a real browser (instead of nightmare). It
308 309 310 311 312 313
    suffices to open the file [[../browser/test/QUnit/test_Qunit.html][test_Qunit.html]]. In this HTML webpage it is
    possible to see the coverage. It is important that all possible functions
    are covered by unit tests. Having the coverage displayed under Firefox
    needs to display the webpage using a web server for security
    reasons. Under Chromium/Chrome this should work fine by just opening the
    webpage.
314 315 316 317 318 319

**** Installation
     Nightmare is distributed withing =node= and can therefore be installed with it.

     #+BEGIN_SRC sh
     apt-get install nodejs-legacy npm
320 321
     npm install nightmare -g # make -C browser/test unit will automatically
     link to global nightmare installation
322 323 324 325
     #+END_SRC

     Note that using =nightmare= for our unit testing
     requires the installation of =xvfb=.
326 327 328 329 330 331 332 333 334 335 336

**** Debugging
     If there is a problem with the nightmare or electron (nightmare
     dependency), you may encounter a lack of output or error messages.
     To address this issue, run:

     #+BEGIN_SRC sh
     cd browser/test/QUnit
     DEBUG=nightmare*,electron:* node nightmare.js
     #+END_SRC

337 338
*** Functional

339 340
**** Architecture
    The browser functional testing is done in the directory
341
    =browser/tests/functional=, with Watir.
342 343 344 345 346 347 348 349
    The functional tests are built using two base files:
    - vidjil_browser.rb :: abstracts the vidjil browser (avoid using IDs or
         class names that could change in the test). The tests must rely as
         much as possible on vidjil_browser. If access to some
         data/input/menus are missing they must be addded there.
    - browser_test.rb :: prepares the environment for the tests. Each test
         file will extend this class (as can be seen in test_multilocus.rb)

350 351 352
    The file =segmenter_test.rb= extends the class in =browser_test.rb= to adapt
    it to the purpose of testing the analyze autonomous app.

353
    The tests are in the files whose name matches the pattern =test*.rb=. The
354
    tests are launched by the script in =../launch_functional_tests= which launches
355 356
    all the files matching the previous pattern. It also backs up the test
    reports as =ci_reporter= removes them before each file is run.
357 358 359

**** Installation

Mathieu Giraud's avatar
Mathieu Giraud committed
360 361 362
The following instructions are for Ubuntu.
For OS X, see https://github.com/watir/watirbook/blob/master/manuscript/installation/mac.md.

363 364 365 366 367 368 369 370 371 372 373
***** Install rvm

  #+BEGIN_SRC sh
 \curl -sSL https://get.rvm.io | bash
  #+END_SRC

  Afterwards you may need to launch:
  #+BEGIN_SRC sh
  source /etc/profile.d/rvm.sh
  #+END_SRC

Mikaël Salson's avatar
Mikaël Salson committed
374
***** Install ruby 2.6.1
375 376

#+BEGIN_SRC sh
Mikaël Salson's avatar
Mikaël Salson committed
377
rvm install 2.6.1
378 379 380
#+END_SRC


Mikaël Salson's avatar
Mikaël Salson committed
381
***** Switch to ruby 2.6.1
382 383

#+BEGIN_SRC sh
Mikaël Salson's avatar
Mikaël Salson committed
384
rvm use 2.6.1
385 386 387 388 389 390
#+END_SRC


***** Install necessary gems

#+BEGIN_SRC sh
Mikaël Salson's avatar
Mikaël Salson committed
391
gem install minitest minitest-ci watir test-unit
392 393
#+END_SRC

Mikaël Salson's avatar
Mikaël Salson committed
394
The Firefox version used can be set with an environment variable (see
395
below). All Firefox releases are [[https://download-installer.cdn.mozilla.net/pub/firefox/releases/ ][available here]].
396 397 398 399 400 401 402

**** Launch browser tests

#+BEGIN_SRC sh
make functional
#+END_SRC

403 404 405 406 407 408 409 410
    By default the tests are launched on the Firefox installed on the system.
    This can be modified by providing the =FUNCTIONAL_CLIENT_BROWSER_PATH=
    environment variable (which can contain several pathes, separated with
    spaces) to the =launch_functional_tests= script.  Or, if one wants to launch
    individual test scripts, to set the =WATIR_BROWSER_PATH= environment
    variable.


411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429
**** Headless mode

   On servers without a X server the browser tests can be launched in headless
   mode.
   For this sake one needs to install a few more dependencies:

   #+BEGIN_SRC sh
   gem install headless
   #+END_SRC

   The virtual framebuffer X server (=xvfb=) must also be installed. Depending
   on the operating system the command will be different:
   #+BEGIN_SRC sh
   # On Debian/Ubuntu
   apt-get install xvfb
   # On Fedora/CentOS
   yum install xvfb
   #+END_SRC

430 431 432 433 434 435 436 437
   If you have set a configuration file (browser/js/conf.js), you should remove it during the headless testing. The easiest way to do it is to launch these commands before and after the test
   #+BEGIN_SRC sh
   # before tests
   mv browser/js/conf.js     browser/js/conf.js.bak 
   # after tests
   mv browser/js/conf.js.bak browser/js/conf.js    
   #+END_SRC
   
438 439 440 441 442
   Then the browser tests can be launched in headless mode with:
   #+BEGIN_SRC sh
   make headless
   #+END_SRC

443 444 445 446
   It is possible to view the framebuffer content of =Xvfb= using =vnc=. To do so,
   launch:
   1. =x11vnc -display :99 -localhost=
   2. =vncviewer :0=
447 448
**** Interactive mode
     For debugging purposes, it may be useful to launch Watir in interactive
449 450
     mode. In that case, you should launch =irb= in the =browser/tests/functional=
     directory.
451 452 453 454 455 456 457 458 459 460 461 462 463

     Then load the file =browser_test.rb= and create a =BrowserTest=:
     #+BEGIN_SRC ruby
       load 'browser_test.rb'
       bt = BrowserTest.new "toto"

       # Load the Vidjil browser with the given .vidjil file
       bt.set_browser("/doc/analysis-example.vidjil")
     #+END_SRC

     Finally you can directly interact with the =VidjilBrowser= using the =$b=
     variable.

464 465 466 467 468 469 470 471 472 473 474
     Another way of debugging interactively is by using (and installing) the
     =ripl= gem. Then you should add, in the =.rb= file to debug:
     #+BEGIN_SRC ruby
     require 'ripl'
     #+END_SRC
     Then if you want to stop launch an =irb= arrived at a given point in the
     code, the following command must be inserted in the code:
     #+BEGIN_SRC ruby
     Ripl.start :binding => binding
     #+END_SRC

475

476 477
     If you have to launch =irb= on a remote server without X (only using =Xvfb=)
     you may be interested to use the [[https://en.wikipedia.org/wiki/Xvfb#Remote_control_over_SSH][redirection over SSH]].
478 479
* Packaging

480 481 482 483 484 485 486 487 488 489 490 491 492 493 494
** Script driven building
   In order to make packaging Vidjil simple and facilitate releases scripts
   have been made and all meta data files required for the Debian packages
   can be found in the packaging directory in each package's subdirectory.

   In the packaging directory can be found the scripts for building each of
   the vidjil packages: germline, algo (named vidjil) and server.
   Note: build-generic.sh is a helper script that is used by the other
   build-* scripts to build a package.

   Executing one of the scripts will copy the necessary files to the
   corresponding packaging subdirectory (germline, vidjil and server)
   And build the package in the /tmp folder along with all the files needed
   to add the package to a repository

495 496 497 498
   It is worth noting that while all packages can be built directly from the
   project sources, the algorithm is actually built from the releases found
   at http://www.vidjil.org/releases.

499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517
** Packaging Vidjil into a Debian Binary Package
  In this section we will explain how to package a pre-compiled version of
  Vidjil that will allow easy installation although it will not meet all the
  requirements for a full Debian package and therefore cannot be added to the
  default Debian repositories.

  In this document we will not go over the fine details of debian packaging
  and the use of each file. For more information you can refer to this page
  from which this document was inspired:
  http://www.tldp.org/HOWTO/html_single/Debian-Binary-Package-Building-HOWTO/

  Being a binary package it will simply contain the vidjil binary which will
  be copied to the chosen location on installation.

*** Let's Get Started
   You will first and foremost need to compile vidjil. Refer to #TODO for
   more information.

   Create a base directory for the package and the folders to which the binary
518
   will be installed. Let's call our folder debian and copy the binary to /usr/bin/
519

520 521 522
   #+BEGIN_SRC sh
     mkdir -p debian/usr/bin
   #+END_SRC
523 524 525

   And copy the vidjil binary

526 527 528
   #+BEGIN_SRC sh
     cp vidjil debian/usr/bin
   #+END_SRC
529 530 531

   Now create the necessary control file. It should look something like this:

532 533 534 535 536 537 538 539 540 541 542 543
   #+BEGIN_EXAMPLE
     Package: vidjil
     Version: <version> (ie. 2016.03-1)
     Section: misc
     Priority: optional
     Architecture: all
     Depends: bash (>= 2.05a-11)
     Maintainer: Vidjil Team <team@vidjil.org>
     Description: Count lymphocyte clones
     vidjil parses a fasta or fastq file and produces an output with a list
     of clones and meta-data concerning these clones
   #+END_EXAMPLE
544 545 546

   And place it in the correct folder.

547 548 549 550
   #+BEGIN_SRC sh
     mkdir -p debian/DEBIAN
     cp control debian/DEBIAN/
   #+END_SRC
551 552 553

   Now build the package and rename it.

554 555 556 557
   #+BEGIN_SRC sh
     dpkg-deb --build debian
     mv debian.deb vidjil_<version>_all.deb
   #+END_SRC
558 559 560

   It can be installed but running

561 562 563
   #+BEGIN_SRC sh
     sudo dpkg -i vidjil_<version>_all.deb
   #+END_SRC
564 565 566 567 568 569 570 571 572

   # TODO Add Changelog, copyright, etc.


** Packaging Vidjil into a Debian Source Package

  Note: This document is currently incomplete. This process will not produce a
  working debian package. The package build will fail when attempting to
  emulate `make install`
573 574 575 576 577 578 579 580 581 582

*** Requirements
   - The release version of Vidjil you wish to package
   - Knowledge of Debian packaging
   In this documentation we will not go over all the specifics of creating a
   debian package. You can find the required information here:
   https://wiki.debian.org/HowToPackageForDebian
   and https://wiki.debian.org/Packaging/Intro?action=show&redirect=IntroDebianPackaging

*** Creating the orig archive
583
    In order to build a debian package, it is required to have a folder named
584
    debian with several files required for the package which contain meta
585 586 587 588 589
    data and permit users to have information on packages and updates for
    packages.

    In order to generate this folder run the following from the source base
    directory.
590 591 592
    #+BEGIN_SRC sh
      dh_make -n
    #+END_SRC
593

594 595
    You can remove all files from the debian folder that match the patterns *.ex, *.EX and README*

596 597 598 599 600 601
    Update debian/changelog, debian/control and debian/copyright to contain the correct
    information to reflect the most recent changes and metadata of Vidjil.

    Vidjil has no install rule so we need to use a debian packaging feature.
    Create a file named debian/install with the following line:

602 603 604
    #+BEGIN_EXAMPLE
      vidjil usr/bin/
    #+END_EXAMPLE
605 606 607

    Vidjil currently depends on some unpackaged files that need to be
    downloaded before compiling.
608 609 610 611 612 613

    #+BEGIN_SRC sh
      mkdir browser
      make germline
      make data
    #+END_SRC
614 615 616 617 618

    Debian packaging also requires archives of the original source. This is
    to manage people packaging software they haven't developed with changes
    they have made. To make things simpler, we simply package the current
    source as the reference archive and build the package with the script
619 620
    that can be obtained here: https://people.debian.org/~wijnen/mkdeb (Thanks
    to Bas Wijnen <wijnen@debian.org> for this script)
621 622 623 624

    From the source directory, run that script to create the package.

    You're done! You can now install the debian package with:
625 626 627 628

    #+BEGIN_SRC sh
      sudo dpkg -i path/to/package
    #+END_SRC
629 630

* Docker
631 632 633 634 635 636 637 638 639 640 641
 The vidjil Docker environment is managed by Docker Compose since it is
 composed of several different services this allows us to easily start and
 stop individual services.
 The services are as follows:
   - mysql        The database
   - uwsgi        The Web2py backend server
   - fuse         The XmlRPCServer that handles custom fuses (for comparing
     samples)
   - nginx        The web server
   - workers      The Web2py Scheduler workers in charge of executing vidjil
     users' samples
642
   - backup       Starts a cron job to schedule regular backups
643 644
   - reporter     A monitoring utility that can be configured to send
     monitoring information to a remote server
645

646 647
 For more information about Docker Compose and how to install it check out
 https://docs.docker.com/compose/
648

649 650 651 652 653
** Starting the environment
   Ensure your docker-compose.yml contains the correct reference to the
   vidjil image you want to use. Usually this will be vidjil/vidjil:latest,
   but more tags are available at https://hub.docker.com/r/vidjil/vidjil/tags/.

654 655 656 657
   You may also want to uncomment the volume in the fuse volume block "-
   ./vidjil/conf:/etc/vidjil" this will provide easier access to all of the
   configuration files, allowing for tweaks.

658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675
   Running the following command will automatically download any missing
   images and start the environment:

   #+BEGIN_SRC sh
     docker-compose up
   #+END_SRC

   If you are using the backup and reporter images, then you need to first
   build these from the image you are using by running the following:

   #+BEGIN_SRC sh
     docker-compose up --build
   #+END_SRC

   This will also start the environment for you.

** Building images for DockerHub
   Make sure your Dockerfile is up to date with any changes you may want to
676 677 678 679 680 681 682 683
   make to the containers. The Dockerfile accepts some build arguments:
   - build-env: TEST or PRODUCTION. If unspecified, PRODUCTION is assumed.
     The main difference is that TEST will build the image with an HTTP
     configuration whereas PRODUCTION uses HTTPS.
   - git_repo : The repository to build the image from. By default, our main
     repository is assumed.
   - git_branch : The git branch to clone from the repository. By default:
     dev.
684

685
   #+BEGIN_SRC sh
686 687
     docker build --build-arg build_env=PRODUCTION --build-arg git_branch=<my_feature_branch> docker/vidjil-client -t vidjil/client:<version>
     docker build --build-arg build_env=PRODUCTION --build-arg git_branch=<my_feature_branch> docker/vidjil-server -t vidjil/server:<version>
688
   #+END_SRC
689

690
   Tag the image you have just built:
691

692
   #+BEGIN_SRC sh
693 694
     docker tag vidjil:test vidjil/client:latest
     docker tag vidjil:test vidjil/server:latest
695
   #+END_SRC
696

697
   Push the image to DockerHub:
698

699
   #+BEGIN_SRC sh
700 701
     docker push vidjil/client:<tag>
     docker push vidjil/server:<tag>
702
   #+END_SRC
703

704 705 706
   You may be required to log in, in which case you can consult
   https://docs.docker.com/engine/reference/commandline/login/ for more
   information.
707

708 709 710 711 712 713 714
   If you encounter an issue where docker is unable to access
   archive.ubuntu.org then you may need to add your dns to /etc/docker/daemon.json
   #+BEGIN_SRC json
      {
          "dns":["dns1", "dns2"]
      }

715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852
* Migrating Data
** Database
   The easiest way to perform a database migration is to first extract the
   data with the following command:

   $ mysqldump -u <user> -p <db> -c --no-create-info > <file>

   An important element to note here is the --no-create-info we add this
   parameter because web2py needs to be allowed to create tables itself
   because it keeps track of database migrations and errors will occur if
   tables exist which it considers it needs to create.

   In order to import the data into an installation you first need to ensure
   the tables have been created by Web2py this can be achieved by simply
   accessing a non-static page.

   /!\ If the database has been initialised from the interface you will
   likely encounter primary key collisions or duplicated data, so it is best
   to skip the initialisation altogether.

   Once the tables have been created, the data can be imported as follows:

   $ mysql -u <user> -p <db> < <file>

   Please note that with this method you should have at least one admin user
   that is accessible in the imported data. Since the initialisation is being
   skipped, you will not have the usual admin account present.
   It is also possible to create a user directly from the database although
   this is not the recommended course of action.

** Files
   Files can simply be copied over to the new installation, their filenames
   are stored in the database and should therefore be accessible as long as
   they are in the correct directories.

** Filtering data
   When extracting data for a given user, the whole database should not be
   copied over.
   There are two courses of action:
     - create a copy of the existing database and remove the users that are
       irrelevant. The cascading delete should remove any unwanted data
       barring a few exceptions (notably fused_file, groups and sample_set_membership)

     - export the relevant data directly from the database. This method
       requires multiple queries which will not be detailed here.

  Once the database has been correctly extracted, a list of files can be
  obtained from sequence_file, fused_file, results_file and analysis_file
  with the following query:

  #+BEGIN_SRC sql
    SELECT <filename field>
    FROM <table name>
    INTO OUTFILE 'filepath'
    FIELDS TERMINATED BY ','
    ENCLOSED BY ''
    LINES TERMINATED BY '\n'
  #+END_SRC

  Note: We are managing filenames here which should not contain any
  character such as quotes or commas so we can afford to refrain from
  enclosing the data with quotes.

  This query will output a csv file containing a filename on each line.
  Copying the files is now just a matter of running the following script:

** Exporting sample sets
   The migrator script allows the export and import of data, whether it be a
   single patient/run/set or a list of them, or even all the sample sets
   associated to a group.

   #+BEGIN_EXAMPLE
    usage: migrator.py [-h] [-f FILENAME] [--debug] {export,import} ...

    Export and import data

    positional arguments:
    {export,import}  Select operation mode
      export         Export data from the DB into a JSON file
      import         Import data from JSON into the DB

    optional arguments:
      -h, --help       show this help message and exit
      -f FILENAME      Select the file to be read or written to
      --debug          Output debug information
   #+END_EXAMPLE

   Export:
   #+BEGIN_EXAMPLE
    usage: migrator.py export [-h] {sample_set,group} ...

    positional arguments:
      {sample_set,group}  Select data selection method
        sample_set        Export data by sample-set ids
        group             Extract data by groupid

    optional arguments:
      -h, --help          show this help message and exit
   #+END_EXAMPLE

   #+BEGIN_EXAMPLE
    usage: migrator.py export sample_set [-h] {patient,run,generic} ID [ID
    ...]

    positional arguments:
      {patient,run,generic}
                              Type of sample
        ID                    Ids of sample sets to be extracted

      optional arguments:
        -h, --help            show this help message and exit
   #+END_EXAMPLE

   #+BEGIN_EXAMPLE
    usage: migrator.py export group [-h] groupid

    positional arguments:
      groupid     The long ID of the group

    optional arguments:
      -h, --help  show this help message and exit
   #+END_EXAMPLE

   Import:
   #+BEGIN_EXAMPLE
    usage: migrator.py import [-h] [--dry-run] [--config CONFIG] groupid

    positional arguments:
      groupid     The long ID of the group

    optional arguments:
      -h, --help  show this help message and exit
      --dry-run   With a dry run, the data will not be saved to the database
      --config CONFIG  Select the config mapping file
   #+END_EXAMPLE
  #+BEGIN_SRC sh
    sh copy_files <file source> <file destination> <input file>
  #+END_SRC