#+TITLE: Vidjil -- Browser Manual
#+AUTHOR: The Vidjil team (Mathieu, Mikaël, Marc and Tatiana)
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="../css/org-mode.css" />

Vidjil is an open-source platform for the analysis of high-throughput sequencing data from lymphocytes.
[[http://en.wikipedia.org/wiki/V(D)J_recombination][V(D)J recombinations]] in lymphocytes are essential for immunological diversity.
They are also useful markers of pathologies, and in leukemia, are used to quantify the minimal residual disease during patient follow-up.
High-throughput sequencing (NGS/HTS) now enables the deep sequencing of a lymphoid population with dedicated [[http://omictools.com/rep-seq-c424-p1.html][Rep-Seq]] methods and software.

This is the manual of the [[http://rbx.vidjil.org/browser/][Vidjil browser]].
For further help, write to [[mailto:contact@vidjil.org][contact@vidjil.org]]. We can also arrange a phone or Skype meeting.

The Vidjil team (Mathieu, Mikaël, Marc and Tatiana)

* Requirements
** Browser

The Vidjil browser runs in any modern browser. It has been successfully tested on the following platforms:
 - Firefox version >= 32
 - Chrome version >= 38
 - IE version >= 10.0 (Vidjil will not run on IE 9.0 or below)
 - Opera version >= XX
 - Safari version >= 6.0

** The .vidjil files

The Vidjil browser displays =.vidjil= files that summarize the V(D)J
rearrangements and the sequences found in a run.

The easiest way to get these files is to [[http://rbx.vidjil.org/browser][request an account]] on the Vidjil server.
You will then be able to upload, manage and process your runs
(=.fasta=, =.fastq=, =.gz= or =.clntab= files) directly in the browser
(see 'The patient database and the server' below), and the server behind the patient
database computes these =.vidjil= files.
Otherwise, such =.vidjil= files can be obtained:
 - from the command-line version of Vidjil (starting from
   =.fasta=, =.fastq= or =.gz= files, see [[http://git.vidjil.org/blob/master/doc/algo.org][algo.org]]).
   To gather several =.vidjil= files, you have to use the [[http://git.vidjil.org/blob/master/tools/fuse.py][fuse.py]] script,
 - or from any other V(D)J analysis pipeline able to output files
   respecting the =.vidjil= [[format-analysis.org][file format]] (contact us if you are interested).



* First aid

- Open data:
    - either with “patients”/“open patient” if you are connected to a patient database, such as on http://rbx.vidjil.org/
      (in this case, there are always some "Demo" datasets for demonstration purposes),
    - or with “file”/“import/export”, manually selecting a =.vidjil= file.

- You can change the number of displayed clones by moving the slider “number of clones” (menu “filter”).
  The maximal number of clones that can be displayed depends on the preceding processing step.
  See below ("Can I see all the clones ?").

- Clones can be selected by clicking on them either in the list, on the time graph,
  or the grid (simple selection or rectangle selection).

- There are often very similar clones, coming from either somatic hypermutations or from sequencing errors.
  You can select such clones (for example those sharing the same V and the same J), then:
   - inspect the sequences in the lower panel (possibly using the “align” function),
   - remove some of these sequences from the selection (clicking on their name in the lower panel),
   - merge them (button “merge”) into a single clone.
     Once several clones are merged, you can still visualize them by clicking on “+” in the list of clones.

- Your analysis (clone tagging, renaming, merging) can be saved:
    - either with “patients”/“save analysis” if you are connected to a patient database
    - or with “file”/“export .analysis”

* The elements of the Vidjil browser

** The info panel (upper left panel)
   - analysis :: name of the configuration file used for displaying the data
   - locus :: germline used for analyzing the data. In case of multi-locus
               data, you can select which locus should be displayed.
   - sample :: name of the current point (you can change the selected point by clicking on
              another point in the graph). The name can be edited (“edit”).
   - date :: when the run was performed (edit either with “...”, or with the database, on the patient tab)
   - segmented :: number of reads where Vidjil found a CDR3, for that sample.
                  See [[Number of segmented reads]] below.
   - total :: total number of reads for that point

** The list of clones (left panel)

- You can assign other tags with colors to clones using the “★” button.
  The “filter” menu allows you to further filter clones by tags.
- Under the “★” button, it is possible to normalize clone concentrations
  according to this clone. You must specify the expected concentration in the
  “expected size” field (e.g. 0.01 for 1%). See [[Control with standard/spike]] below.

- The “i” button displays additional information on each clone.

- The list can be sorted on V genes, J genes or clone abundance.
  The “+” and “-” buttons respectively un-merge or re-merge all clones that have
  already been merged.

- Clones can be searched (“search” box) by either their name, their custom name, 
  or their DNA sequence.

** The time graph

The time graph is hidden when there is only one timepoint.

- The current point is highlighted with a vertical gray bar. You can select another point by clicking on it.

- The gray areas at the bottom of the graph show, for each point, the resolution (1 read / 5 reads).

- You can reorder the points by dragging them, and hide some points by dragging them onto the “+” mark at the right of the points.
  To recover hidden points, drag them from the “+” mark back to the graph.

- If your dataset contains sampling dates (for example in an MRD setup), you can switch between point keys and dates in “settings > point key”.


** The plot view

- The "plot" menu allows you to change the plot type (grid plot, bar plot) as well as the X and Y axes of these plots.
  Some presets are available.

- In the bar plot mode, the Y axis corresponds to the order of clones inside each bar.

- The “focus” button (bottom right) allows you to further analyze a selection of clones.
  To exit the focus mode, click on the “X” near the search box.

To further analyze a set of clones sharing the same V and J, it is often useful
to focus on these clones, then to display them according to either their “clone length”
or their “N length” (that is N1-D-N2 in the case of VDJ rearrangements).

** The aligner (bottom panel)

The aligner displays nucleotide sequences from selected clones.
   - See "What is the sequence displayed for each clone ?" below.
   - Sequences can be aligned together (“align” button), identifying substitutions, insertions and deletions.
   - You can remove sequences from the aligner (and the selection) by clicking on their name.
   - You can further analyze the selected sequences with IMGT/V-QUEST and IgBlast. This opens another window/tab.
   - You can unselect all sequences by clicking on the background of the grid.

** The patient database and the server

If a server with a patient database is configured with your
installation of Vidjil (as on http://rbx.vidjil.org/browser), the
'patient' menu gives you access to the server.

With authentication, you can add patients, then add either
=.fasta=, =.fastq=, =.gz= or =.clntab= files, then process your
runs and save the results of your analysis.

*** Patients
      
Once you are authenticated, this page shows the patient list. Here you
can see your own patients and the patients you have been granted access to.

Patients can be added ('add patient'), edited ('e') or deleted ('X').
By default, you are the only one who can see and update a newly added patient.
If you have admin access, you can grant access to other users ('p').

*** Samples

Clicking on a patient gives access to the "samples" page. Each sample is
a =.fasta=, =.fastq=, =.gz= or =.clntab= file that will be processed by one or several
pipelines.
You can see which samples have been processed with the selected
config, and download the sequence files if they are available ("dl").

Depending on your access rights, you can
add a new sample to the list ("add file"),
schedule a processing for a sequence file (select a config and "run"),
or delete a sample ("X").

The processing can take a few seconds to a few hours, depending on the
software launched, its options and the size of the sample.
Once the processing is finished, click on the button "see result" and
the browser will load the data of the processed files. The first click
on this button can take a few seconds.



* Can I see all the clones ?


The interest of NGS/Rep-Seq studies is to provide a deep view of any
V(D)J repertoire. The underlying analysis software (such as Vidjil)
tries to analyze as many reads as possible (see 'Number of segmented reads' below).
One often wants to "see all clones", but a complete list is difficult
to grasp as a whole. In a typical dataset with about 10^6 reads, even in
the presence of a dominant clone, there can be 10^4 or 10^5 different
clones detected.

** The "top" slider in the "filter" menu

The "top 50" clones are the clones that are among the 50 most abundant
in *at least one* sample. As soon as one clone is in this "top 50"
list, it is displayed for every sample, even if its concentration is
very low in other samples.
Most of the time, a "top 50" is enough. The hidden clones are thus the
ones that never reach the first 50 clones. With a default installation,
the slider can be set to display clones up to the "top 100" on the grid
(and, on the graph, up to the "top 20").
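
As an illustration, the "top" rule described above can be sketched in a few lines of Python. This is not the code of the browser: the =top_clones= function, the clone names and the read counts are hypothetical. The only rule taken from this manual is that a clone is kept as soon as it ranks within the first N clones in at least one sample.

```python
from typing import Dict, List, Set


def top_clones(reads: Dict[str, List[int]], top: int = 50) -> Set[str]:
    """Return the clones ranked within the first `top` in at least one sample.

    `reads` maps a (hypothetical) clone name to its read counts, one per sample.
    """
    n_samples = max((len(counts) for counts in reads.values()), default=0)
    kept: Set[str] = set()
    for s in range(n_samples):
        # Clones actually seen in this sample, by decreasing abundance.
        present = [c for c, counts in reads.items()
                   if s < len(counts) and counts[s] > 0]
        present.sort(key=lambda c: reads[c][s], reverse=True)
        kept.update(present[:top])
    return kept


# Hypothetical read counts for two samples.
counts = {
    "clone-A": [900, 5],
    "clone-B": [50, 700],
    "clone-C": [30, 20],
    "clone-D": [10, 10],
}
print(sorted(top_clones(counts, top=1)))  # ['clone-A', 'clone-B']
print(sorted(top_clones(counts, top=2)))  # ['clone-A', 'clone-B', 'clone-C']
```

With these counts and =top=2=, clone-C is kept because it ranks second in the second sample, although it only ranks third in the first one.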

However, in some cases, one may want to track some clones that are
never in the "top 100", for example:
  - a standard/spike with low concentration
  - a clone in an MRD follow-up of a patient without the diagnostic point

(Upcoming feature). If a clone is "tagged" in the =.vidjil= or
in the =.analysis= file, it will always be shown even if it does not
meet the "top" filter.

** The "other" clone

This virtual clone in the clone list groups all clones that are hidden
(because of the "top" filter or because some tags are hidden). The sum of
ratios in the list of clones is always 100%: thus the "other" clone
changes when one uses the "filter" menu.

Note that the ratios include the "other" clone: if a main clone
is reported to have 10.54%, this 10.54% ratio relates to the number of
analyzed reads, including the hidden clones.
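
The arithmetic behind the "other" clone can be sketched as follows. This is only a minimal illustration under stated assumptions: the =ratios_with_other= function, the clone names and the read counts are hypothetical, and the sketch assumes that the "other" clone simply receives all analyzed reads that do not belong to a displayed clone.

```python
from typing import Dict


def ratios_with_other(displayed: Dict[str, int], analyzed_reads: int) -> Dict[str, float]:
    """Ratios of the displayed clones plus a virtual "other" clone.

    Every ratio is relative to the number of analyzed reads, so the
    ratios always sum to 1 (that is, 100%).
    """
    hidden = analyzed_reads - sum(displayed.values())
    if analyzed_reads <= 0 or hidden < 0:
        raise ValueError("inconsistent read counts")
    ratios = {name: n / analyzed_reads for name, n in displayed.items()}
    ratios["other"] = hidden / analyzed_reads
    return ratios


# Hypothetical sample: 10000 analyzed reads, two displayed clones.
for name, ratio in ratios_with_other({"main clone": 1054, "second clone": 446}, 10000).items():
    print(f"{name}: {ratio:.2%}")
```

Hiding the second clone with a filter would move its 446 reads to "other", while the 10.54% of the main clone would stay unchanged.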




* What is the sequence displayed for each clone ?

The sequences displayed for each clone are not individual reads.  
The clones may gather thousands of reads, and all these reads can have
some differences. Depending on the sequencing technology, the reads
inside a clone can have different lengths or can be shifted,
especially in the case of overlapping paired-end sequencing. There can also be
some sequencing errors.
The =.vidjil= file has to give one consensus sequence per clone, and
Rep-Seq algorithms have to handle these differences with great care in
order not to gather reads from different clones.

For the Vidjil algorithm, it is required that the window centered on
the CDR3 is /exactly/ shared by all the reads. The other positions in
the consensus sequence are guaranteed to be present in /at least half/
of the reads. The consensus sequence can thus be shorter than some reads.
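
To make the two guarantees above more concrete, here is a deliberately simplified Python sketch. It is not the Vidjil algorithm: the =consensus_sketch= function and the toy reads are hypothetical, "present in at least half of the reads" is interpreted here as "covered by at least half of the reads", and the base kept at each position is simply the most frequent one.

```python
from collections import Counter
from typing import List


def consensus_sketch(reads: List[str], window: str) -> str:
    """Extend `window` on both sides while at least half of the reads cover the position."""
    # Every read must contain the window exactly (ValueError otherwise).
    offsets = [read.index(window) for read in reads]
    half = len(reads) / 2

    def extend(flanks: List[str]) -> str:
        # `flanks` are read parts ordered from the window outwards.
        out = []
        pos = 0
        while True:
            bases = [f[pos] for f in flanks if len(f) > pos]
            if not bases or len(bases) < half:
                break
            out.append(Counter(bases).most_common(1)[0][0])
            pos += 1
        return "".join(out)

    left = extend([r[:o][::-1] for r, o in zip(reads, offsets)])[::-1]
    right = extend([r[o + len(window):] for r, o in zip(reads, offsets)])
    return left + window + right


# Toy reads sharing the (unrealistically short) window "CCGG".
reads = ["AACCGGTT", "AACCGGTTA", "ACCGGT", "GAACCGGTTAC"]
print(consensus_sketch(reads, "CCGG"))  # AACCGGTTA
```

In this toy example, the consensus is shorter than the longest read: the leading =G= and the trailing =C= of the last read are covered by only one read out of four.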


* How can I assess the quality of the data and the analysis ?

To make sure that the PCR, the sequencing and the Vidjil analysis went well, several elements can be checked.

** Number of segmented reads
A first control is to check the number of “segmented reads” in the info panel (top left box).
For each point, this shows the number of reads where Vidjil found a CDR3.
     
Ratios above 90% usually mean very good results. Smaller ratios, especially under 60%, often mean that something went wrong.
The “info” button further details the causes of non-segmentation (UNSEG).
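
The ratio discussed here is the number of segmented reads divided by the total number of reads. As a minimal sketch, it can be computed and compared with the two thresholds given above. The function names and the returned labels are hypothetical and not part of Vidjil, and the thresholds are rules of thumb, not strict limits.

```python
def segmented_ratio(segmented: int, total: int) -> float:
    """Ratio of reads where a CDR3 was found, among all reads of a sample."""
    if total <= 0:
        raise ValueError("total must be positive")
    return segmented / total


def assess(segmented: int, total: int) -> str:
    """Rough reading of the ratio, using the 90% and 60% rules of thumb."""
    ratio = segmented_ratio(segmented, total)
    if ratio > 0.90:
        return "very good"
    if ratio < 0.60:
        return "something may have gone wrong"
    return "intermediate"


# Hypothetical read counts.
print(assess(930_000, 1_000_000))  # very good
print(assess(750_000, 1_000_000))  # intermediate
print(assess(400_000, 1_000_000))  # something may have gone wrong
```
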
There can be several causes leading to bad ratios: 

*** Analysis or biological causes

   - The data actually contains another germline/locus than the one that was searched for
      (solution: relaunch Vidjil, or ask us to relaunch Vidjil, with the correct germline sequences).

   - There are incomplete/exceptional recombinations
     (Vidjil can process some of them, config =multi+inc= or command-line option =-i=).

   - There are too many somatic hypermutations
     (usually Vidjil can process sequences with up to a 10% mutation rate; above that threshold, some sequences may be lost).

   - There are chimeric sequences or translocations
     (Vidjil does not process these sequences).

*** PCR or sequencing causes

   - The read length is too short and the reads do not span the junction zone (UNSEG too few V/J).
      (Vidjil detects a “window” including the CDR3. By default this window is 40–60bp long, so the read needs to be at least that long, centered on the junction).

   - In particular, for paired-end sequencing, one of the ends can lead to reads not fully containing the CDR3 region
      (solution: ignore this end, or extend the read length, or merge the ends with very conservative parameters).

   - There were too many PCR or sequencing errors
      (this can be assessed by inspecting the related clones, checking if there is a large dispersion around the main clone).

** Control with standard/spike

   - If your sample included a standard/spike control, you should first
     identify the main standard sequence (if that is not already done) and
     specify its expected concentration (by clicking on the “★” button).
     Then the data is normalized according to that sequence.
   - You can (de)activate normalization in the settings menu.
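
One plausible reading of this normalization is a simple rescaling, sketched below. This is an assumption, not a description of the actual implementation: the =normalize= function and the ratios are hypothetical, and the sketch assumes that all ratios of a sample are multiplied by the same factor, chosen so that the standard reaches its expected concentration.

```python
from typing import Dict


def normalize(ratios: Dict[str, float], standard: str, expected: float) -> Dict[str, float]:
    """Rescale all ratios of one sample so that `standard` gets the `expected` ratio."""
    observed = ratios[standard]
    if observed <= 0:
        raise ValueError("the standard was not observed in this sample")
    factor = expected / observed  # assumed linear normalization factor
    return {name: ratio * factor for name, ratio in ratios.items()}


# Hypothetical sample: the spike is observed at 2% but expected at 1%
# ("expected size" 0.01), so every ratio is multiplied by 0.5.
sample = {"spike": 0.02, "clone-A": 0.30, "clone-B": 0.05}
for name, ratio in normalize(sample, "spike", expected=0.01).items():
    print(f"{name}: {ratio:.2%}")
```
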

** Steadiness verification
   - When assessing different PCR primers, PCR enzymes, PCR cycles, one may want to see how regular the concentrations are among the points.
   - When following a patient one may want to identify any clone that is emerging.
   - To do so, you may want to change the color system: in the “color” menu,
     select “by abundance at selected timepoint”.  The color ranges from red
     (high concentration) to purple (low concentration) and makes it easy to
     spot on the graph any large change in concentration.



* Keyboard shortcuts

** Browser 

  | =←= and =→=             | navigate between samples                            |
  | =Shift-←= and =Shift-→= | decrease or increase the number of displayed clones |
  | numeric keypad, =0-9=   | switch between available plot presets               |


  | =a=: TRA        |                                    |
  | =b=: TRB        |                                    |
  | =g=: TRG        |                                    |
  | =d=: TRD, TRD+  | change the selected germline/locus |
  | =h=: IGH, IGH+  |                                    |
  | =l=: IGL        |                                    |
  | =k=: IGK, IGK+  |                                    |
  Note: You can select just one locus by holding the Shift key while pressing
  the letter corresponding to the locus of interest


** Browser connected to a patient database

 | =Ctrl-s=  | save the analysis         |
 | =Shift-p= | open the 'patient' window |




* Reference

If you use Vidjil for your research, please cite the following reference:

Mathieu Giraud, Mikaël Salson, et al.,
“Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing”,
BMC Genomics 2014, 15:409
http://dx.doi.org/10.1186/1471-2164-15-409
