Commit 65cae517 authored by Mathieu Giraud

doc/vidjil-algo.md, doc/user.md: update links, update options...

parent 8c778582
@@ -6,7 +6,7 @@ They are also useful markers of pathologies, and in leukemia, are used to quanti
 High-throughput sequencing (NGS/HTS) now enables the deep sequencing of a lymphoid population with dedicated [Rep-Seq](http://omictools.com/rep-seq-c424-p1.html) methods and software.
 This is the help of the [Vidjil web application](http://app.vidjil.org/browser/).
-Further help can always be asked to <support@vidjil.org>. We can also arrange phone or Skype meeting.
+Further help can always be asked to <support@vidjil.org>. We can also arrange phone or video meeting.
 The Vidjil team (Mathieu, Mikaël, Aurélien, Florian, Marc, Ryan and Tatiana)
@@ -52,7 +52,7 @@ Otherwise, such `.vidjil` files can be obtained:
 - You can change the number of displayed clones by moving the slider “number of clones” (menu “filter”).
 The maximal number of clones that can be displayed depends on the processing step before.
-See below "[Can I see all the clones ?](#can-i-see-all-the-clones)".
+See below "[Can I see all the clones ?](#can-i-see-all-the-clones-and-all-the-reads)".
 - Clones can be selected by clicking on them either in the list, on the sample graph,
 or the grid (simple selection or rectangle selection).
......
@@ -11,8 +11,8 @@
 This is the help of vidjil-algo, for command-line usage.
 This manual can be browsed online:
-- <http://www.vidjil.org/doc/algo> (last stable release)
-- <http://git.vidjil.org/blob/master/doc/algo.md> (development version)
+- <http://www.vidjil.org/doc/vidjil-algo> (last stable release)
+- <http://gitlab.vidjil.org/blob/dev/doc/vidjil-algo.md> (development version)
 Other documentation (users and administrators of the web application, developpers) can be found from <http://www.vidjil.org/doc/>.
@@ -36,19 +36,17 @@ clones, or leave this to the user after a manual review in the web application.
 The method is described in the following references:
-Marc Duez et al.,
+- Marc Duez et al.,
 “Vidjil: A web platform for analysis of high-throughput repertoire sequencing”,
 PLOS ONE 2016, 11(11):e0166126
 <http://dx.doi.org/10.1371/journal.pone.0166126>
-Mathieu Giraud, Mikaël Salson, et al.,
+- Mathieu Giraud, Mikaël Salson, et al.,
 "Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing",
 BMC Genomics 2014, 15:409
 <http://dx.doi.org/10.1186/1471-2164-15-409>
 Vidjil-algo is open-source, released under GNU GPLv3 license.
-This is the help of vidjil-algo, for command-line usage.
-Other documentation (users and administrators of the web application, developpers) can be found from <http://www.vidjil.org/doc/>.
 # Requirements and installation
@@ -74,9 +72,10 @@ The development team internally uses [Gitlab CI](http://gitlab.vidjil.org/pipeli
 ## Build requirements (optional)
 This paragraph details the requirements to build Vidjil-algo from source.
-You can also download a static binary (see next paragraph, 'Installation').
+You can also download a static binary, see [installation](#installation).
 To compile Vidjil-algo, make sure:
 - to be on a POSIX system ;
 - to have a C++11 compiler (as `g++` 4.8 or above, or `clang` 3.3 or above).
 - to have the `zlib` installed (`zlib1g-dev` package under Debian/Ubuntu,
@@ -166,8 +165,8 @@ Xcode should be installed first.
 ### Compiling
-Running 'make' from the extracted archive should be enough to install vidjil-algo with germline and demo files.
-It runs the three following 'make' commands.
+Running `make` from the extracted archive should be enough to install vidjil-algo with germline and demo files.
+It runs the three following `make` commands.
 ``` bash
@@ -364,7 +363,7 @@ Limits to further analyze some clones (second pass)
 -X INT maximal number of reads to process ('all': no limit, default), sampled reads
 ```
-The `-r/-%` options are strong thresholds: if a clone does not have
+The `-r/--ratio` options are strong thresholds: if a clone does not have
 the requested number of reads, the clone is discarded (except when
 using `-l`, see below).
 The default `-r 5` option is meant to only output clones that
@@ -372,6 +371,8 @@ have a significant read support. **You should use** `-r 1` **if you
 want to detect all clones starting from the first read** (especially for
 MRD detection).
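As a rough illustration of how such thresholds act, the sketch below filters a small table of clones by a minimal read count and a minimal ratio. It is not vidjil-algo's code: the helper `passes_thresholds`, the clone names and read counts, the ratio expressed as a plain fraction, and the choice to require both thresholds together are all assumptions made for the example. The `-l` exception is not modeled.

```python
def passes_thresholds(reads, total_reads, min_reads=5, min_ratio=0.0):
    """Return True if a clone has enough reads (hypothetical helper).

    min_reads plays the role of `-r` (5 is the default quoted in the text);
    min_ratio stands in for `--ratio`, as a fraction of all reads (assumption).
    """
    if total_reads <= 0:
        return False
    return reads >= min_reads and reads / total_reads >= min_ratio


# Made-up read counts per clone, for illustration only.
clones = {"clone-A": 1200, "clone-B": 7, "clone-C": 1}
total = sum(clones.values())

kept_default = [name for name, n in clones.items() if passes_thresholds(n, total)]
kept_r1 = [name for name, n in clones.items() if passes_thresholds(n, total, min_reads=1)]
print(kept_default)
print(kept_r1)
```

With `min_reads=5`, the single-read `clone-C` is dropped. With `min_reads=1` it is kept, which is the situation the `-r 1` advice targets for MRD detection.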
+The `--max-clones` option limits the number of output clones, even without consensus sequences.
 The `-y` option limits the number of clones for which a consensus
 sequence is computed. Usually you do not need to have more
 consensus (see below), but you can safely put `-y all` if you want
@@ -385,11 +386,10 @@ to display the clones on the grid (otherwise they are displayed on the
 If you want to analyze more clones, you should use `-z 200` or
 `-z 500`. It is not recommended to use larger values: outputting more
 than 500 clones is often not useful since they can not be visualized easily
-in the web application, and takes large computation time (full dynamic programming
-with all germline sequences), possibly reduced when using `-Z` (see below).
+in the web application, and takes more computation time.
 Note that even if a clone is not in the top 100 (or 200, or 500) but
-still passes the `-r`, `-%` options, it is still reported in both the `.vidjil`
+still passes the `-r`, `--ratio` options, it is still reported in both the `.vidjil`
 and `.vdj.fa` files. If the clone is at some MRD point in the top 100 (or 200, or 500),
 it will be fully analyzed/segmented by this other point (and then
 collected by the `fuse.py` script, using consensus sequences computed at this
@@ -401,17 +401,18 @@ The `-A` option disables all these thresholds. This option should be
 used only for test and debug purposes, on very small datasets, and
 produce large file and takes huge computation times.
-The experimental `-Z` option speeds up the full analysis by a pre-processing step,
+The `-Z` option speeds up the full analysis by a pre-processing step,
 again based on k-mers, to select a subset of the V germline genes to be compared to the read.
 The option gives the typical size of this subset (it can be larger when several V germlines
 genes are very similar, or smaller when there are not enough V germline genes).
-Setting `-Z 5` is generally safe. With the default option, `-Z all`, this
-pre-processing step is not activated.
+The default `-Z 3` is generally safe.
+Setting `-Z all` removes this pre-processing step, running a full dynamic programming
+with all germline sequences that is much slower.
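The idea of a k-mer based preselection can be sketched as follows. This is a toy illustration under stated assumptions, not the method implemented in vidjil-algo: the functions `kmers` and `preselect_genes`, the scoring by number of shared k-mers, the value `k=4`, and the sequences are all hypothetical. The sketch never returns more genes than requested, so the "larger subset for very similar genes" behaviour is not modeled.

```python
def kmers(seq, k):
    """Set of all k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def preselect_genes(read, germline, size, k=4):
    """Pick up to `size` germline genes sharing the most k-mers with the read.

    `germline` maps gene names to sequences. Ties are broken by gene name,
    so the result is deterministic. Toy scoring, not vidjil-algo's.
    """
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(seq, k)) for name, seq in germline.items()}
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:size]


# Made-up sequences, for illustration only.
germline = {
    "V1": "ACGTACGTTGCA",
    "V2": "ACGTACGTTGCC",
    "V3": "TTTTGGGGCCCC",
    "V4": "GGGGAAAATTTT",
}
read = "ACGTACGTTGCAAGG"
selected = preselect_genes(read, germline, size=2)
print(selected)
```

Only the selected genes would then go through the slower full comparison. Asking for more genes than the germline contains simply returns all of them, which mirrors the "smaller when there are not enough V germline genes" case.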
 ## Sequences of interest
 Vidjil-algo allows to indicate that specific sequences should be followed and output,
-even if those sequences are 'rare' (below the `-r/-%` thresholds).
+even if those sequences are 'rare' (below the `-r/--ratio` thresholds).
 Such sequences can be provided either with `-W <sequence>`, or with `-l <file>`.
 The file given by `-l` should have one sequence by line, as in the following example:
@@ -489,7 +490,7 @@ The main output of Vidjil-algo (with the default `-c clones` command) are two fo
 The web application takes this `.vidjil` file ([possibly merged with `fuse.py`](#following-clones-in-several-samples)) for the *visualization and analysis* of clones and their
 tracking along different samples (for example time points in a MRD
 setup or in a immunological study).
-Please see the [br](browser.org).org for more information on the web application.
+Please see the [user manual](user.md) for more information on the web application.
 - The `.vdj.fa` file is *a FASTA file for further processing by other bioinformatics tools*.
 The sequences are at least the windows (and their count in the headers) or
@@ -587,7 +588,7 @@ Some datasets may give reads with many low `UNSEG too few` reads:
 Vidjil-algo detects a “window” including the CDR3. By default this window is 50bp long,
 so the read needs be that long centered on the junction.
-See [browser.org](http://git.vidjil.org/blob/master/doc/browser.org) for information on the biological or sequencing causes that can lead to few segmented reads.
+See the [user manual](user.md) for information on the biological or sequencing causes that can lead to few segmented reads.
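The window constraint stated above (a 50 bp window centered on the junction) can be sketched as a simple bounds check. This is a simplified illustration, not vidjil-algo's code: the function `window_around`, the 0-based `junction` position, and the strict centering are assumptions, and the real tool may handle borderline reads differently.

```python
def window_around(read, junction, width=50):
    """Return the `width`-long window centered on `junction`, or None.

    `junction` is a 0-based position in the read (hypothetical convention).
    None means the read is too short on one side of the junction.
    """
    start = junction - width // 2
    end = start + width
    if start < 0 or end > len(read):
        return None
    return read[start:end]


read = "A" * 40 + "C" * 40  # made-up 80 bp read
centered = window_around(read, junction=40)
too_close_to_start = window_around(read, junction=10)
print(len(centered))
print(too_close_to_start)
```

A junction near either end of the read leaves fewer than 25 bp on one side, so no window can be extracted from it in this sketch.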
 ## Filtering reads
......