Commit 74dccbd7, authored Mar 14, 2017 by Mathieu Giraud
vidjil.cpp, doc/algo.org: rewording, use 'cluster'
parent 7c6a54d2
Showing 2 changed files with 13 additions and 13 deletions
algo/vidjil.cpp +2 -2
doc/algo.org +11 -11
algo/vidjil.cpp @ 74dccbd7
@@ -147,7 +147,7 @@ void usage(char *progname, bool advanced)
   cerr << "Command selection" << endl
        << " -c <command>"
-       << "\t" << COMMAND_CLONES << "\tlocus detection, window extraction, clone gathering (default command, most efficient, all outputs)" << endl
+       << "\t" << COMMAND_CLONES << "\tlocus detection, window extraction, clone clustering (default command, most efficient, all outputs)" << endl
        << "\t\t" << COMMAND_WINDOWS << "\tlocus detection, window extraction" << endl
        << "\t\t" << COMMAND_SEGMENT << "\tdetailed V(D)J designation (not recommended)" << endl
        << "\t\t" << COMMAND_GERMLINES << "\tstatistics on k-mers in different germlines" << endl
@@ -800,7 +800,7 @@ int main (int argc, char **argv)
   if (max_clones == NO_LIMIT_VALUE || max_clones > WARN_MAX_CLONES || command == CMD_SEGMENT)
     {
       cout << "* Vidjil's purpose is to efficiently extract windows overlapping the CDR3" << endl
-           << "* to gather reads into clones ('-c clones')." << endl
+           << "* to cluster reads into clones ('-c clones')." << endl
            << "* Computing accurate V(D)J designations for many sequences ('-c segment' or large '-z' values)" << endl
            << "* is slow and should be done only on small datasets or for testing purposes." << endl
            << "* More information is provided in the 'doc/algo.org' file." << endl
doc/algo.org @ 74dccbd7
@@ -284,8 +284,8 @@ explanation can be found in (Giraud, Salson and al., 2014).
 The =-s= or =-k= option selects the seed used for the k-mer V/J affectation.
 The =-w= option fixes the size of the "window" that is the main
-identifier to gather clones. The default value (=-w 50=) was selected
-to ensure a high-quality clone gathering: reads are clustered when
+identifier to cluster clones. The default value (=-w 50=) was selected
+to ensure a high-quality clone clustering: reads are clustered when
 they /exactly/ share, at the nucleotide level, a 50 bp-window centered
 on the CDR3. No sequencing errors are corrected inside this window.
 The center of the "window", predicted by the high-throughput heuristic, may
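Note on the hunk above: the rule it documents (reads are clustered when they exactly share a window centered on the CDR3, with no error correction inside the window) can be sketched as follows. This is only an illustration under stated assumptions, not Vidjil's implementation: `extract_window` and `cluster_by_window` are hypothetical names, and the window center is assumed to be given for each read, whereas Vidjil predicts it with its high-throughput heuristic.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a Vidjil function): returns the w-long window
// centered at `center`, or an empty string when the read cannot contain it.
std::string extract_window(const std::string& read, std::size_t center, std::size_t w) {
    std::size_t half = w / 2;
    if (center < half || center - half + w > read.size()) return "";
    return read.substr(center - half, w);
}

// Groups reads by exact window identity; no sequencing error is corrected.
// Each input pair is (read sequence, assumed window center).
std::map<std::string, std::vector<std::string>> cluster_by_window(
    const std::vector<std::pair<std::string, std::size_t>>& reads, std::size_t w) {
    std::map<std::string, std::vector<std::string>> clones;
    for (const auto& r : reads) {
        std::string window = extract_window(r.first, r.second, w);
        // Reads too short to hold the window are left out of every clone.
        if (!window.empty()) clones[window].push_back(r.first);
    }
    return clones;
}
```

With exact matching, a single mismatch inside the window puts a read into a separate clone, which is why a larger `w` makes the clustering more conservative and may also leave more short reads out.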
@@ -295,7 +295,7 @@ in >99% of cases). The extracted window should be large enough to
 fully contain the CDR3 as well as some part of the end of the V and
 the start of the J, or at least some specific N region, to uniquely identify a clone.
-Setting =-w= to higher values (such as =-w 60= or =-w 100=) makes the clone gathering
+Setting =-w= to higher values (such as =-w 60= or =-w 100=) makes the clone clustering
 even more conservative, enabling to split clones with low specificity (such as IGH with very
 large D, short or no N regions and almost no somatic hypermutations). However, such settings
 may "segment" (analyze) less reads, depending on the read length of your data, and may also
@@ -533,7 +533,7 @@ Several [[https://en.wikipedia.org/wiki/Diversity_index][diversity indices]] are
 - E (=index_E_equitability=): Shannon's equitability
 - Ds (=index_Ds_diversity=): Simpson's diversity
-E ans Ds values are between 0 (no diversity, one clone gathers all analyzed reads)
+E ans Ds values are between 0 (no diversity, one clone clusters all analyzed reads)
 and 1 (full diversity, each analyzed read belongs to a different clone).
 These values are now computed on the windows, before any further clustering.
 PCR and sequencing errors can thus lead to slighlty over-estimate the diversity.
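Note on the hunk above: the two indices it mentions can be illustrated with textbook formulas. This is a sketch under assumptions, not the code in vidjil.cpp: the `Diversity` struct and `diversity` function are hypothetical names, E is assumed to be the Shannon entropy divided by the logarithm of the total read count, and Ds is assumed to be the sampling-without-replacement form of Simpson's index. Both choices give 0 when one clone holds all reads and 1 when every read is its own clone, matching the range described in the documentation; Vidjil's exact normalization may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Diversity {
    double E;   // Shannon's equitability (assumed normalization: ln of read count)
    double Ds;  // Simpson's diversity (assumed form: 1 - sum n(n-1) / (N(N-1)))
};

// Computes both indices from the number of reads in each clone (window).
Diversity diversity(const std::vector<std::size_t>& clone_sizes) {
    double total = 0.0;
    for (std::size_t n : clone_sizes) total += static_cast<double>(n);
    Diversity d{0.0, 0.0};
    if (total < 2.0) return d;  // indices are undefined for fewer than two reads
    double entropy = 0.0, simpson = 0.0;
    for (std::size_t n : clone_sizes) {
        if (n == 0) continue;  // empty clones contribute nothing
        double x = static_cast<double>(n);
        double p = x / total;
        entropy -= p * std::log(p);
        simpson += x * (x - 1.0) / (total * (total - 1.0));
    }
    d.E = entropy / std::log(total);
    d.Ds = 1.0 - simpson;
    return d;
}
```

Because the indices are computed on windows before any further clustering, a read carrying a PCR or sequencing error inside its window counts as an extra clone of size one, which pushes both values up slightly.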
@@ -620,7 +620,7 @@ in the following situations:
 Note that these designations are relatively slow to compute, especially
 for the IGH locus. However, they
-are not at the core of the Vidjil clone gathering method (which
+are not at the core of the Vidjil clone clustering method (which
 relies only on the 'window', see above).
 To check the quality of these designations, the automated test suite include
 sequences with manually curated V(D)J designations (see [[http://git.vidjil.org/blob/master/doc/should-vdj.org][should-vdj.org]]).
@@ -685,7 +685,7 @@ require either the =-G germline/IGH= option, or the multi-germline =-g germline=
 #+BEGIN_SRC sh
 ./vidjil -G germline/IGH -3 data/Stanford_S22.fasta
-# Gather the reads into clones, based on windows overlapping IGH CDR3s.
+# Cluster the reads into clones, based on windows overlapping IGH CDR3s.
 # Assign the VDJ genes and try to detect the CDR3 of each clone.
 # Summary of clones is available both on stdout, in out/Stanford_S22.vdj.fa and in out/Stanford_S22.vidjil.
 #+END_SRC
@@ -693,7 +693,7 @@ require either the =-G germline/IGH= option, or the multi-germline =-g germline=
 #+BEGIN_SRC sh
 ./vidjil -g germline -i -2 -3 -d data/reads.fasta
 # Detects for each read the best locus, including an analysis of incomplete/unusual and unexpected recombinations
-# Gather the reads into clones, again based on windows overlapping the detected CDR3s.
+# Cluster the reads into clones, again based on windows overlapping the detected CDR3s.
 # Assign the VDJ genes (including multiple D) and try to detect the CDR3 of each clone.
 # Summary of clones is available both on stdout, in out/reads.vdj.fa and in out/reads.vidjil.
 #+END_SRC
@@ -704,7 +704,7 @@ require either the =-G germline/IGH= option, or the multi-germline =-g germline=
 #+BEGIN_SRC sh
 ./vidjil -g germline -i -2 -U data/reads.fasta
 # Detects for each read the best locus, including an analysis of incomplete/unusual and unexpected recombinations
-# Gather the reads into clones, again based on windows overlapping the detected CDR3s.
+# Cluster the reads into clones, again based on windows overlapping the detected CDR3s.
 # Assign the VDJ genes and try to detect the CDR3 of each clone.
 # The out/reads.segmented.vdj.fa include all reads where a V(D)J recombination was found
 #+END_SRC
@@ -720,12 +720,12 @@ This file will be relatively small (a few kB or MB) and can be taken again as an
 #+BEGIN_SRC sh
 ./vidjil -c clones -G germline/IGH -r 1 ./data/clones_simul.fa
 # Extracts the windows with at least 1 read each (-r 1, the default being -r 5)
-# then gather them into clones
+# then cluster them into clones
 #+END_SRC
 #+BEGIN_SRC sh
 ./vidjil -c clones -G germline/IGH -r 1 -n 5 ./data/clones_simul.fa
-# Window extraction + clone gathering,
+# Window extraction + clone clustering,
 # with automatic clustering, distance five (-n 5)
 # The result of the automatic clustering is in the .vidjil file
 # and can been seen/edited in the web application.
@@ -733,7 +733,7 @@ This file will be relatively small (a few kB or MB) and can be taken again as an
 #+BEGIN_SRC sh
 ./vidjil -c segment -g germline -i -2 -3 -d data/segment_S22.fa
-# Detailed V(D)J designation, including multiple D, and CDR3 detection on all reads, without clone gathering
+# Detailed V(D)J designation, including multiple D, and CDR3 detection on all reads, without clone clustering
 # (this is slow and should only be used for testing, or on a small file)
 #+END_SRC