Commit d9b2caca authored by Mathieu Giraud

doc/algo.org, algo/vidjil.cpp: update documentation on V(D)J designation

Since the 2015.07 release, many changes improved the V(D)J designation (core/segment.cpp,
in particular c5c781ec, 77af56b8, 185c3bd9, 60a15629), and this is now better tested
(.should-vdj.fa tests, with new curated sequences and an improved test mechanism).
parent 60a15629
......@@ -149,7 +149,7 @@ void usage(char *progname, bool advanced)
<< " -c <command>"
<< "\t" << COMMAND_CLONES << " \t locus detection, window extraction, clone gathering (default command, most efficient, all outputs)" << endl
<< " \t\t" << COMMAND_WINDOWS << " \t locus detection, window extraction" << endl
<< " \t\t" << COMMAND_SEGMENT << " \t detailed V(D)J segmentation (not recommended)" << endl
<< " \t\t" << COMMAND_SEGMENT << " \t detailed V(D)J designation (not recommended)" << endl
<< " \t\t" << COMMAND_GERMLINES << " \t statistics on k-mers in different germlines" << endl
<< endl ;
......@@ -207,7 +207,7 @@ void usage(char *progname, bool advanced)
<< "Limits to further analyze some clones" << endl
<< " -y <nb> maximal number of clones computed with a representative ('" << NO_LIMIT << "': no limit) (default: " << DEFAULT_MAX_REPRESENTATIVES << ")" << endl
<< " -z <nb> maximal number of clones to be segmented ('" << NO_LIMIT << "': no limit, do not use) (default: " << DEFAULT_MAX_CLONES << ")" << endl
<< " -z <nb> maximal number of clones to be analyzed with a full V(D)J designation ('" << NO_LIMIT << "': no limit, do not use) (default: " << DEFAULT_MAX_CLONES << ")" << endl
<< " -A reports and segments all clones (-r 0 -% 0 -y " << NO_LIMIT << " -z " << NO_LIMIT << "), to be used only on very small datasets (for example -AX 20)" << endl
<< " -x <nb> maximal number of reads to process ('" << NO_LIMIT << "': no limit, default), only first reads" << endl
<< " -X <nb> maximal number of reads to process ('" << NO_LIMIT << "': no limit, default), sampled reads" << endl
......@@ -795,8 +795,7 @@ int main (int argc, char **argv)
{
cout << "* Vidjil purpose is to extract very quickly windows overlapping the CDR3" << endl
<< "* and to gather reads into clones (-c clones), and not to provide an accurate V(D)J segmentation." << endl
<< "* The following segmentations are slow to compute and are provided only for convenience." << endl
<< "* They should be checked with other softwares such as IgBlast, iHHMune-align or IMGT/V-QUEST." << endl
<< "* The full V(D)J designations are slow to compute and are provided only for convenience." << endl
<< "* More information is provided in the 'doc/algo.org' file." << endl
<< endl ;
}
......
......@@ -255,7 +255,7 @@ Limits to report a clone (or a window)
Limits to further analyze some clones
-y <nb> maximal number of clones computed with a representative ('all': no limit) (default: 100)
-z <nb> maximal number of clones to be segmented ('all': no limit, do not use) (default: 20)
-z <nb> maximal number of clones to be analyzed with a full V(D)J designation ('all': no limit, do not use) (default: 20)
-A reports and segments all clones (-r 1 -% 0 -y all -z all), to be used only on very small datasets
#+END_EXAMPLE
......@@ -357,7 +357,7 @@ The main output of Vidjil (with the default =-c clones= command) are two followi
- The =.vidjil= file is /the file for the Vidjil browser/.
The file is in a =.json= format (detailed in [[file:format-analysis.org][format-analysis.org]])
describing the windows and their count, the representatives (=-y=),
the detailed segmentation (=-z=, see warning below), and possibly
the detailed V(D)J designation (=-z=, see warning below), and possibly
the results of the further clustering.
The browser takes this =.vidjil= file (possibly merged with
......@@ -370,7 +370,7 @@ The main output of Vidjil (with the default =-c clones= command) are two followi
The sequences are at least the windows (and their count in the headers) or
the representatives (=-y=) when they have been computed.
The headers include the count of each window, and further includes the
detailed segmentation (=-z=, see warning below), given in a '.vdj' format, see below.
detailed V(D)J designation (=-z=, see warning below), given in a '.vdj' format, see below.
The further clustering is not output in this file.
The =.vdj.fa= output enables to use Vidjil as a /filtering tool/,
......@@ -432,8 +432,8 @@ in the following situations:
- in a first pass, when requested with =-U= option, in a =.segmented.vdj.fa= file.
The goal of this ultra-fast segmentation, based on a seed
heuristics, is only to locate the w-window overlapping the
CDR3. This should not be taken as a real V(D)J segmentation, as
heuristics, is only to identify the locus and to locate the w-window overlapping the
CDR3. This should not be taken as a real V(D)J designation, as
the center of the window may be shifted up to 15 bases from the
actual center.
......@@ -441,13 +441,15 @@ in the following situations:
- at the end of the clones detection (default command =-c clones=)
- or directly when explicitly requiring segmentation (=-c segment=)
This segmentation obtained by full comparison (dynamic
programming) with all germline sequences. Such segmentation are
not at the core of the Vidjil clone gathering method (which
relies only on the 'window', see above). They are slow to compute
and are provided only for convenience.
They should be checked with other softwares such
as IgBlast, iHHMune-align or IMGT/V-QUEST.
These V(D)J designations are obtained by full comparison (dynamic programming)
with all germline sequences.
Note that these designations are relatively slow to compute. However, they
are not at the core of the Vidjil clone gathering method (which
relies only on the 'window', see above).
To check the quality of these designations, the automated test suite include
sequences with manually curated V(D)J designations (see [[http://git.vidjil.org/blob/master/doc/should-vdj.org][should-vdj.org]]).
Segmentations of V(D)J recombinations are displayed using a dedicated
.vdj format. This format is compatible with FASTA format. A line starting
......