vidjil
Commit 4d3f4d95, authored Mar 08, 2019 by Mathieu Giraud
doc/vidjil-algo.md: update again
parent 27686080
Showing 1 changed file with 12 additions and 12 deletions
doc/vidjil-algo.md
...
...
@@ -379,7 +379,7 @@ Limits to further analyze some clones (second pass)
 The `-r/--ratio` options are strong thresholds: if a clone does not have
 the requested number of reads, the clone is discarded (except when
-using `-l`, see below).
+using `--label`, see below).
 The default `-r 5` option is meant to only output clones that
 have a significant read support. **You should use** `-r 1` **if you
 want to detect all clones starting from the first read** (especially for
...
...
@@ -415,20 +415,20 @@ The `-A` option disables all these thresholds. This option should be
 used only for test and debug purposes, on very small datasets, and
 produce large file and takes huge computation times.
-The `-Z` option speeds up the full analysis by a pre-processing step,
+The `--analysis-filter` option speeds up the full analysis by a pre-processing step,
 again based on k-mers, to select a subset of the V germline genes to be compared to the read.
 The option gives the typical size of this subset (it can be larger when several V germlines
 genes are very similar, or smaller when there are not enough V germline genes).
-The default `-Z 3` is generally safe.
-Setting `-Z all` removes this pre-processing step, running a full dynamic programming
+The default `--analysis-filter 3` is generally safe.
+Setting `--analysis-filter all` removes this pre-processing step, running a full dynamic programming
 with all germline sequences that is much slower.
 ## Sequences of interest
 Vidjil-algo allows to indicate that specific sequences should be followed and output,
 even if those sequences are 'rare' (below the `-r/--ratio` thresholds).
-Such sequences can be provided either with `-W <sequence>`, or with `-l <file>`.
-The file given by `-l` should have one sequence by line, as in the following example:
+Such sequences can be provided either with `--label <sequence>`, or with `--label-file <file>`.
+The file given by `--label-file` should have one sequence by line, as in the following example:
 ``` diff
 GAGAGATGGACGGGATACGTAAAACGACATATGGTTCGGGGTTTGGTGCT my-clone-1
...
...
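Note on the `--analysis-filter` paragraph in the hunk above: the general idea of a k-mer pre-filter (rank the germline genes by how many k-mers they share with the read, then keep only the best few for the expensive comparison) can be sketched as follows. This is an illustration of the technique only, not Vidjil-algo's implementation: the function names, the value of `k`, and the tie-breaking rule are all assumptions, and the sketch always returns at most `subset_size` genes, whereas the documentation says the real subset can be larger when several V genes are very similar.

```python
from typing import Dict, List, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k (empty if too short)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def select_candidates(read: str,
                      germlines: Dict[str, str],
                      subset_size: int = 3,
                      k: int = 4) -> List[str]:
    """Hypothetical pre-filter: keep the genes sharing the most k-mers with the read."""
    read_kmers = kmers(read, k)
    scored = [(len(read_kmers & kmers(seq, k)), name)
              for name, seq in germlines.items()]
    # Highest shared k-mer count first; ties broken by gene name (an assumption).
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:subset_size]]


# Toy data, invented for the example (not real germline sequences).
toy_germlines = {"V1": "ACGTACGTAA", "V2": "TTTTTTTTTT", "V3": "ACGTACCCCC"}
candidates = select_candidates("ACGTACGTAA", toy_germlines, subset_size=2)
```

Only the genes in `candidates` would then go through the slower full comparison, which is the trade-off the option describes.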
@@ -440,7 +440,7 @@ The first column of the file is the sequence to be followed
 while the remaining columns consist of the sequence's label.
 In Vidjil-algo output, the labels are output alongside their sequences.
-A sequence given `-W <sequence>` or with `-l <file>` can be exactly the size
+A sequence given `--label <sequence>` or with `-label-file <file>` can be exactly the size
 of the window (`-w`, that is 50 by default). In this case, it is guaranteed that
 such a window will be output if it is detected in the reads.
 More generally, when the provided sequence differs in length with the windows
...
...
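Note on the label-file layout described in the hunk above (first column is the sequence, remaining columns form the label): a file in that layout can be read with a few lines of Python. This is a minimal sketch, not part of Vidjil-algo. The function name `parse_label_lines` is hypothetical, and the sketch assumes that columns are separated by whitespace and that blank lines can be skipped, neither of which the documentation states.

```python
from typing import Iterable, List, Tuple


def parse_label_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (sequence, label) pairs, one per non-empty line."""
    entries = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines carry no information
        # First column: the sequence. Remaining columns: the label.
        entries.append((fields[0], " ".join(fields[1:])))
    return entries


# The example line shown in the documentation.
example = ["GAGAGATGGACGGGATACGTAAAACGACATATGGTTCGGGGTTTGGTGCT my-clone-1"]
entries = parse_label_lines(example)
```

With real data the lines would come from an open file object, for instance `parse_label_lines(open(path))`, where `path` is whatever file is passed to `--label-file`.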
@@ -449,9 +449,9 @@ we will keep any window that is contained in the sequence of interest.
 This filtering will work as expected when the provided sequence overlaps
 (at least partially) the CDR3 or its close neighborhood.
-With the `-F` option, *only* the windows related to the given sequences are kept.
+With the `--label-filter` option, *only* the windows related to the given sequences are kept.
 This allows to quickly filter a set of reads, looking for a known sequence or window,
-with the `-FaW <sequence>` options:
+with the `--out-reads --label-filter --label <sequence>` options:
 All the reads with the windows related to the sequence will be extracted to `out/seq/clone.fa-1`.
 ## Clone analysis: VDJ assignation and CDR3 detection
...
...
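Note on the filtering rule in the hunk above ("we will keep any window that is contained in the sequence of interest"): taken literally, the rule is a substring test. The sketch below shows that reading only. The function name is hypothetical, and the sketch ignores things the real tool may handle, such as reverse complements or sequences shorter than the window, so it should not be taken as Vidjil-algo's actual matching logic.

```python
from typing import Iterable, List


def keep_labelled_windows(windows: Iterable[str],
                          sequences_of_interest: Iterable[str]) -> List[str]:
    """Keep the windows that occur inside at least one sequence of interest."""
    sequences = list(sequences_of_interest)
    # `window in seq` is a substring test; it is also true when both are equal.
    return [w for w in windows if any(w in seq for seq in sequences)]


# Toy data, invented for the example (real windows are 50 nt by default).
toy_windows = ["ACGTAC", "GGGGGG", "CGTACG"]
toy_sequences = ["TTACGTACGTT"]
kept = keep_labelled_windows(toy_windows, toy_sequences)
```

Here `"ACGTAC"` and `"CGTACG"` both occur inside the sequence of interest and are kept, while `"GGGGGG"` is dropped.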
@@ -465,7 +465,7 @@ The CDR3/JUNCTION detection won't work with custom non-gapped V/J repertoires.
 CDR3 are reported as productive when they come from an in-frame recombination
 and when the sequence does not contain any in-frame stop codons.
-The advanced `-f` option sets the parameters used in the comparisons between
+The advanced `--analysis-cost` option sets the parameters used in the comparisons between
 the clone sequence and the V(D)J germline genes. The default values should work.
 The e-value set by `-e` is also applied to the V/J designation.
...
...
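Note on the productivity rule in the hunk above: the two conditions (in-frame recombination, no in-frame stop codon) can be illustrated on a nucleotide string. This is a simplified sketch under stated assumptions, not the tool's code: it assumes the input is a junction whose first nucleotide is at codon position 0, and it treats "in-frame" as "length is a multiple of 3", whereas the real analysis derives the frame from the gapped V/J germline genes. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def looks_productive(junction: str) -> bool:
    """Simplified check: length is a multiple of 3 and no stop codon in frame 0."""
    junction = junction.upper()
    if len(junction) % 3 != 0:
        return False  # out of frame under the assumption above
    codons = (junction[i:i + 3] for i in range(0, len(junction), 3))
    return not any(codon in STOP_CODONS for codon in codons)
```

For example, `looks_productive("TGTGCCAGC")` is true (codons TGT, GCC, AGC), `looks_productive("TGTTAAAGC")` is false (the second codon is TAA), and `looks_productive("ATGACC")` is true because the `TGA` it contains straddles two codons and is therefore not in frame.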
@@ -478,12 +478,12 @@ nor on the standard output. They instead add a `clusters` sections in the `.vidj
 that will be visualized in the web application.
 The `--cluster-epsilon` option triggers an automatic clustering using DBSCAN algorithm (Ester and al., 1996).
-Using `--cluster-epsilon 5` usually cluster reads within a distance of 1 mismatch (default score
+Using `--cluster-epsilon 5` usually clusters reads within a distance of 1 mismatch (default score
 being +1 for a match and -4 for a mismatch). However, more distant reads can also
 be clustered when there are more than 10 reads within the distance threshold.
 This behaviour can be controlled with the `-cluster-N` option.
-The `-=` option allows to specify a file for manually clustering two windows
+The `--cluster-forced-edges` option allows to specify a file for manually clustering two windows
 considered as similar. Such a file may be automatically produced by vidjil
 (`out/edges`), depending on the option provided. Only the two first columns
 (separed by one space) are important to vidjil, they only consist of the
...
...
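Note on the `--cluster-epsilon 5` sentence in the hunk above: one way to read the parenthetical is that turning a match (+1) into a mismatch (-4) lowers an alignment score by 5, so a threshold of 5 leaves room for exactly one mismatch. The arithmetic below only spells out that reading. It assumes the distance is the score lost relative to a perfect match, which the documentation does not state explicitly, and the function names are hypothetical.

```python
MATCH_SCORE = 1      # default score for a match, as quoted in the documentation
MISMATCH_SCORE = -4  # default score for a mismatch, as quoted in the documentation

# Score lost each time a match becomes a mismatch: 1 - (-4) = 5.
COST_PER_MISMATCH = MATCH_SCORE - MISMATCH_SCORE


def score_drop(mismatches: int) -> int:
    """Score lost relative to a perfect match (assumed distance measure)."""
    return mismatches * COST_PER_MISMATCH


def mismatches_within(epsilon: int) -> int:
    """Largest number of mismatches whose score drop fits within epsilon."""
    return epsilon // COST_PER_MISMATCH
```

Under this reading, `score_drop(1)` is 5 and `mismatches_within(5)` is 1, while a threshold of 10 would be needed before two mismatches fit.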