WIP: Feature a/3295 option names
Input
  --header-sep CHAR=' '
      separator for headers in the reads file
  -x, --first-reads INT
      maximal number of reads to process ('all': no limit, default), only first reads
  -X, --sampled-reads INT
      maximal number of reads to process ('all': no limit, default), sampled reads

Germline presets (at least one -g or -V/(-D)/-J option must be given)
  -g, --germline GERMLINES ...
      -g <.g FILE>(:FILTER)
          multiple loci/germlines, with tuned parameters.
          Common values are '-g germline/homo-sapiens.g' or '-g germline/mus-musculus.g'
          The list of loci/recombinations can be restricted, such as in '-g germline/homo-sapiens.g:IGH,IGK,IGL'
      -g PATH
          multiple loci/germlines, shortcut for '-g PATH/homo-sapiens.g',
          processes human TRA, TRB, TRG, TRD, IGH, IGK and IGL loci, possibly with incomplete/unusual recombinations
  -V FILE ...
      custom V germline multi-fasta file(s)
  -D FILE ...
      custom D germline multi-fasta file(s), analyze into V(D)J components
  -J FILE ...
      custom J germline multi-fasta file(s)
  -2
      try to detect unexpected recombinations

Recombination detection ("window" prediction, first pass)
    (use either the -s or the -k option, but not both)
    (using the -k option is equivalent to setting, with -s, a contiguous seed with only '#' characters)
    (all these options, except -w, are overridden when using -g)
  -q
      use Aho-Corasick-like automaton (experimental)
  -k, --kmer-size INT
      k-mer size used for the V/J affectation (default: 10, 12, 13, depends on germline)
  -w, --window-size INT
      w-mer size used for the length of the extracted window ('all': use the whole read, no window clustering)
  -e, --e-value FLOAT=1
      maximal e-value for determining if a V-J segmentation can be trusted
  -t, --trim INT
      trim V and J genes (resp. 5' and 3' regions) to keep at most <INT> nt (0: no trim)
  -s, --seed SEED=10s
      seed, possibly spaced, used for the V/J affectation (default: depends on germline),
      given either explicitly or by an alias
          10s:#####-#####
          12s:######-######
          13s:#######-######
          9c:#########

Recombination detection, experimental options (do not use)
  -I
      ignore k-mers common to different germline systems (experimental, do not use)
  -1
      use a unique index for all germline systems (experimental, do not use)
  -4
      try to detect unexpected recombinations with translocations (experimental, do not use)
  --not-analyzed-as-clones
      consider not analyzed reads as clones, taking for junction the complete sequence,
      to be used on very small datasets (for example --keep -AX 20)

Labeled sequences (windows related to these sequences will be kept even if -r/--ratio thresholds are not reached)
  --label SEQUENCE ...
      label the given sequence(s)
  --label-from-file FILE
      label a set of sequences given in <file>
  --label-filter
      filter -- keep only the windows related to the labeled sequences

Limits to report and to XXXfurther analyzeXXX clones (second pass)
  -r, --min-reads INT=5
      minimal number of reads supporting a clone
  --min-ratio FLOAT=0
      minimal percentage of reads supporting a clone
  --max-clones INT
      maximal number of output clones ('all': no maximum, default)
  -y, --max-clones-with-consensus INT=100
      maximal number of clones computed with a consensus sequence ('all': no limit)
  -z, --max-clones-with-analysis INT=100
      maximal number of clones to be analyzed with a full V(D)J designation ('all': no limit, do not use)
  --all
      reports and analyzes all clones
      (--min-reads 1 --min-ratio 0 --max-clones all --max-clones-with-consensus all --max-clones-with-analysis all),
      to be used only on very small datasets (for example --all -X 20)

Clone analysis (second pass)
  --analysis-cost COST
      use custom Cost for clone analysis: format "match, subst, indels, del_end, homo" (default "4, -6, -10, -1, -10")
  -E, --analysis-e-value-D FLOAT=0.05
      maximal e-value for determining if a D segment can be trusted
  --analysis-filter INT=3
      typical number of V genes, filtered by k-mer comparison, to compare to the read ('all': all genes)
  -d, --several-D
      try to detect several D (experimental)
  -3, --cdr3
      CDR3/JUNCTION detection (requires gapped V/J germlines)
  --alternative-genes INT=0
      number of alternative V(D)J genes to show beyond the most similar one

Additional clustering (third pass, experimental)
  --cluster-epsilon INT=0
      minimum required neighbors for automatic clustering. No automatic clustering if =0.
  --cluster-N INT=10
      minimum required neighbors for automatic clustering
  --cluster-save-matrix
      generate and save comparative matrix for clustering
  --cluster-load-matrix
      load comparative matrix for clustering
  --cluster-forced-edges FILE
      manual clustering -- a file used to force some specific edges
  --cluster-cost COST
      use custom Cost for automatic clustering: format "match, subst, indels, del_end, homo" (default "1, -4, -4, 0, 0")

Detailed output per read (generally not recommended, large files, but may be used for filtering, as in -uu -X 1000)
  -U, --out-analyzed
      output analyzed reads (in .segmented.vdj.fa file)
  -u, --out-not-analyzed
      -u
          output not analyzed reads, gathered by unsegmentation cause,
          except for very short and 'too few V/J' reads (in *.fa files)
      -uu
          output not analyzed reads, gathered by unsegmentation cause, all reads (in *.fa files) (use only for debug)
      -uuu
          output not analyzed reads, all reads, including a .unsegmented.vdj.fa file (use only for debug)
  -K, --out-affects
      output detailed k-mer affectation on all reads (in .affects file) (use only for debug, for example -KX 100)

Output
  -o, --out-dir PATH=./out/
      output directory
  -b, --out-base STRING
      output basename (by default basename of the input file)
  --out-reads-by-clone
      output all reads by clones (clone.fa-*), to be used only on small datasets
  -v, --verbose
      verbose mode

Help
  -h, --help
      help
  -H, --help-advanced
      help, including advanced and experimental options

The full help is available in the doc/algo.org file.
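
To illustrate the relation between -k and -s stated in the help above (a -k value is equivalent to a contiguous seed made only of '#' characters), here is a minimal Python sketch of how a possibly spaced seed could select positions in a read. It is not the tool's implementation. The alias table is copied from the help text; the function names (`resolve_seed`, `contiguous_seed`, `spaced_kmers`) are hypothetical, and the reading of '#' as a kept position and '-' as an ignored one is an assumption based on the usual spaced-seed convention.

```python
# Aliases as listed in the -s/--seed help text.
SEED_ALIASES = {
    "10s": "#####-#####",
    "12s": "######-######",
    "13s": "#######-######",
    "9c": "#########",
}


def resolve_seed(seed):
    """Return the explicit pattern for an alias, or the seed itself if it is already explicit."""
    return SEED_ALIASES.get(seed, seed)


def contiguous_seed(k):
    """Seed equivalent to '-k k': k kept positions and no gap."""
    return "#" * k


def spaced_kmers(sequence, seed):
    """Yield the k-mers read through the seed at every position of the sequence.

    Assumption: '#' marks a position that is kept, any other character
    (here '-') marks a position that is skipped.
    """
    pattern = resolve_seed(seed)
    span = len(pattern)
    kept = [i for i, symbol in enumerate(pattern) if symbol == "#"]
    for start in range(len(sequence) - span + 1):
        window = sequence[start:start + span]
        yield "".join(window[i] for i in kept)


if __name__ == "__main__":
    read = "ACGTACGTACGT"
    # The '10s' seed spans 11 nt but keeps 10 of them (the middle one is skipped).
    print(list(spaced_kmers(read, "10s")))
    # A contiguous seed of size 10 keeps every position of a 10 nt window.
    print(list(spaced_kmers(read, contiguous_seed(10))))
```

Under these assumptions, the '9c' alias and `contiguous_seed(9)` denote the same pattern, which mirrors the "-k is equivalent to -s with only '#'" note.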