Commit 3a257df2 authored by Mathieu Giraud

algo: -e, update help and tests

parent b9e60bce
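This commit renames two vidjil options. The e-value threshold previously passed with `-p` is now `-e`, and the edges file previously passed with `-e` is now `-E`. When updating existing command lines, the two renames cannot be applied as sequential text substitutions: replacing `-p` by `-e` and then `-e` by `-E` would turn every old `-p` into `-E`. Below is a minimal sketch of a one-pass, token-level rewrite. `OPTION_RENAMES` and `migrate_args` are hypothetical names. The sketch assumes each option is its own token, so forms such as `-p1e-6` are not handled. Defaults also differ: the old `-p` defaulted to 100, while the new `-e` defaults to 'all'.

```python
# Hypothetical mapping from the option names before this commit to the new ones.
OPTION_RENAMES = {
    '-p': '-e',  # maximal e-value threshold
    '-e': '-E',  # edges file for manual clustering
}


def migrate_args(args):
    """Rewrite an old vidjil argument list to the new option names.

    Each token is looked up exactly once, so an old '-p' becomes '-e'
    and is never renamed a second time to '-E'.
    """
    return [OPTION_RENAMES.get(arg, arg) for arg in args]


print(migrate_args(['-p', '1e-6', '-e', 'out/edges', '-c', 'segment']))
# ['-e', '1e-6', '-E', 'out/edges', '-c', 'segment']
```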
@@ -20,8 +20,8 @@ from subprocess import Popen, PIPE, STDOUT
import os
import argparse
-VIDJIL_FINE = '{directory}/vidjil -p 1e-6 -c segment -i -g {directory}/germline %s > %s'
-VIDJIL_KMER = '{directory}/vidjil -p 1e-6 -b out -c windows -uU -i -g {directory}/germline %s > /dev/null ; cat out/out.segmented.vdj.fa out/out.unsegmented.vdj.fa > %s'
+VIDJIL_FINE = '{directory}/vidjil -e 1e-6 -c segment -i -g {directory}/germline %s > %s'
+VIDJIL_KMER = '{directory}/vidjil -e 1e-6 -b out -c windows -uU -i -g {directory}/germline %s > /dev/null ; cat out/out.segmented.vdj.fa out/out.unsegmented.vdj.fa > %s'
parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument('--program', '-p', default=VIDJIL_FINE, help='program to launch on each file (%(default)s)')
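The command templates in this hunk combine a `{directory}` placeholder with two `%s` slots: the input file first, then the output file. Below is a minimal sketch of how such a template could be expanded. It assumes the script fills `{directory}` with `str.format` and the file names with `%`-formatting. `build_command` is a hypothetical helper, not a function taken from the script. The `shlex.quote` call is an added safeguard for file names that contain spaces.

```python
import shlex

# New templates from this commit (vidjil now takes the e-value threshold via -e).
VIDJIL_FINE = '{directory}/vidjil -e 1e-6 -c segment -i -g {directory}/germline %s > %s'
VIDJIL_KMER = ('{directory}/vidjil -e 1e-6 -b out -c windows -uU -i -g {directory}/germline %s > /dev/null ; '
               'cat out/out.segmented.vdj.fa out/out.unsegmented.vdj.fa > %s')


def build_command(template, directory, input_file, output_file):
    """Hypothetical helper: expand {directory}, then the input/output %s slots."""
    # str.format only touches {...} fields, so both '%s' markers survive this step.
    with_dir = template.format(directory=directory)
    # %-formatting then fills the two '%s' slots in order: input first, output second.
    return with_dir % (shlex.quote(input_file), shlex.quote(output_file))


print(build_command(VIDJIL_FINE, '..', 'reads.fa', 'reads.out'))
# ../vidjil -e 1e-6 -c segment -i -g ../germline reads.fa > reads.out
```

Doing the two substitutions in this order matters. `str.format` leaves `%s` alone, and the expanded directory is never re-parsed by the `%` step.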
@@ -159,7 +159,7 @@ Window prediction
-m <int> minimal admissible delta between last V and first J k-mer (default: -10) (default with -D: 0)
-M <int> maximal admissible delta between last V and first J k-mer (default: 20) (default with -D: 80)
-w <int> w-mer size used for the length of the extracted window (default: 40) (default with -D: 60)
- -p <float> e-value like for determining if a window must be segmented or no (default: 100)
+ -e <float> maximal e-value for determining if a segmentation can be trusted (default: 'all', no limit)
#+END_EXAMPLE
The =-s=, =-k=, =-m= and =-M= options are the options of the seed-based heuristic. A detailed
@@ -179,12 +179,11 @@ Setting =-w= to 30 for VJ and 50 for VDJ recombinations may "segment" (analyze)
few more reads, but may in some rare cases falsely cluster reads from
different clones. Setting =-w= to lower values is not recommended.
-The =-p= option is used to determine what is the maximal e-value accepted for
-segmenting a sequence. If a segmentation has a higher e-value, it will not be
-segmented. The default value is 100 but it is *not* the recommended
-value. The purpose is to keep Vidjil with the same behaviour. The
-recommendation would be to use a value of 1e-6 (or lower with huge datasets).
+The =-e= option sets the maximal e-value accepted for segmenting a sequence.
+If a segmentation has a higher e-value, it will not be segmented.
+The default value is 'all', but the recommended value is something
+like 1e-6 for datasets with a billion reads.
Further help on this point will come in the next releases.
** Threshold on clone output
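The hunk above changes the documented meaning of `-e`. It now sets a maximal e-value, with 'all' (no limit) as the default, and a segmentation whose e-value is higher than the threshold is not kept. Below is a minimal sketch of that rule, under two assumptions: 'all' is modelled as an infinite threshold, and an e-value exactly equal to the threshold is accepted (the text only says that higher e-values are rejected). `parse_evalue_threshold` and `is_trusted` are hypothetical names, not vidjil's own code.

```python
import math


def parse_evalue_threshold(value):
    """Parse an -e value: 'all' means no limit, anything else must be a float."""
    if value == 'all':
        # Modelling 'no limit' as +infinity lets every finite e-value pass.
        return math.inf
    threshold = float(value)
    if threshold < 0:
        raise ValueError('an e-value threshold cannot be negative')
    return threshold


def is_trusted(evalue, threshold):
    """Keep a segmentation unless its e-value is higher than the threshold."""
    return evalue <= threshold


threshold = parse_evalue_threshold('1e-6')
# Toy e-values for two hypothetical reads.
evalues = {'read1': 3e-9, 'read2': 0.02}
print([name for name, ev in evalues.items() if is_trusted(ev, threshold)])
# ['read1']
```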
@@ -264,7 +263,7 @@ browser.
Setting the =-n= option triggers an additional automatic
clustering using the DBSCAN algorithm (Ester et al., 1996).
-The =-e= option allows specifying a file for manually clustering two windows
+The =-E= option allows specifying a file for manually clustering two windows
considered similar. Such a file may be automatically produced by vidjil
(out/edges), depending on the option provided. Only the first two columns
(separated by one space) are important to vidjil; they only consist of the
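The renamed `-E` option reads a file of window pairs to be clustered together, and only the first two space-separated columns of each line matter. Below is a minimal sketch of reading such pairs and merging them into clusters. It assumes the manual clustering is transitive: if A is paired with B and B with C, all three end up in one cluster. That assumption is why it uses union-find. The union-find choice, the skipping of malformed lines, and the names `read_edges` and `cluster_windows` are illustrative assumptions, not vidjil's actual implementation. Any further columns are ignored.

```python
def read_edges(lines):
    """Yield (window_a, window_b) from the first two space-separated columns."""
    for line in lines:
        fields = line.rstrip('\n').split(' ')
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue  # skip blank or malformed lines (assumed behaviour)
        yield fields[0], fields[1]


def cluster_windows(pairs):
    """Merge windows connected by edges with union-find; return sorted clusters."""
    parent = {}

    def find(w):
        parent.setdefault(w, w)
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving keeps trees shallow
            w = parent[w]
        return w

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for w in parent:
        clusters.setdefault(find(w), []).append(w)
    return sorted(sorted(c) for c in clusters.values())


# Toy windows (real windows are longer nucleotide sequences).
edges = ['AAA CCC 12\n', 'CCC GGG 3\n', 'TTT ACG\n']
print(cluster_windows(read_edges(edges)))
# [['AAA', 'CCC', 'GGG'], ['ACG', 'TTT']]
```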