#+TITLE: Vidjil -- Browser Manual
#+AUTHOR: The Vidjil team (Mathieu, Mikaël, Marc and Tatiana)
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="../css/org-mode.css" />

Vidjil is an open-source platform for the analysis of high-throughput sequencing data from lymphocytes.
[[http://en.wikipedia.org/wiki/V(D)J_recombination][V(D)J recombinations]] in lymphocytes are essential for immunological diversity.
They are also useful markers of pathologies, and in leukemia, are used to quantify the minimal residual disease during patient follow-up.
High-throughput sequencing (NGS/HTS) now enables the deep sequencing of a lymphoid population with dedicated [[http://omictools.com/rep-seq-c424-p1.html][Rep-Seq]] methods and software.

This is the help for the [[http://rbx.vidjil.org/browser/][Vidjil browser]].
Further help can always be requested from [[mailto:contact@vidjil.org][contact@vidjil.org]]. We can also arrange a phone or Skype meeting.

The Vidjil team (Mathieu, Mikaël, Marc and Tatiana)

* Requirements

** Browser

The Vidjil browser runs in any modern browser. It has been successfully tested on the following platforms:
 - Firefox version >= 32
 - Chrome version >= 38
 - IE version >= 10.0 (Vidjil will not run on IE 9.0 or below)
 - Opera version >= XX
 - Safari version >= 6.0

** The .vidjil files

The Vidjil browser displays =.vidjil= files that summarize the V(D)J
rearrangements and the sequences found in a run.

The easiest way to get these files is to [[http://rbx.vidjil.org/browser][request an account]] on the Vidjil server.
You will then be able to upload,
manage and process your runs (=.fasta=, =.fastq=, =.gz= or =.clntab= files) directly in the browser
(see below 'patient database'), and the server behind the patient
database computes these =.vidjil= files.
Otherwise, such =.vidjil= files can be obtained:
 - from the command-line version of Vidjil (starting from
   =.fasta=, =.fastq= or =.gz= files, see [[http://git.vidjil.org/blob/master/doc/algo.org][algo.org]]).
   To gather several =.vidjil= files, you have to use the [[http://git.vidjil.org/blob/master/tools/fuse.py][fuse.py]] script
 - or from any other V(D)J analysis pipeline able to output files
   respecting the =.vidjil= [[format-analysis.org][file format]] (contact us if you are interested)



* First aid

- Open data:
    - either with “patients”/“open patient” if you are connected to a patient database, such as on http://rbx.vidjil.org/
      (in this case, there are always some "Demo" datasets for demonstration purposes),
    - or with “file”/“import/export”, manually selecting a =.vidjil= file.

- You can change the number of displayed clones by moving the slider “number of clones” (menu “filter”).
  The maximal number of clones that can be displayed depends on the processing step before.
  See below ("Can I see all the clones ?").

- Clones can be selected by clicking on them either in the list, on the time graph,
  or the grid (simple selection or rectangle selection).

- There are often very similar clones, coming from either somatic hypermutations or from sequencing errors.
  You can select such clones (for example those sharing the same V and the same J), then:
   - inspect the sequences in the lower panel (possibly using the “align” function),
   - remove some of these sequences from the selection (clicking on their name in the lower panel),
   - merge them (button “merge”) into a single clone.
     Once several clones are merged, you can still visualize them by clicking on “+” in the list of clones.

- Your analysis (clone tagging, renaming, merging) can be saved:
    - either with “patients”/“save analysis” if you are connected to a patient database
    - or with “file”/“export .analysis”

* The elements of the Vidjil browser

** The info panel (upper left panel)
   - analysis :: name of the configuration file used for displaying the data
   - locus :: germline used for analyzing the data. In case of multi-locus
               data, you can select what locus should be displayed (see [[http://git.vidjil.org/blob/master/doc/locus.org][locus.org]])
   - sample :: name of the current point (you can change the selected point by clicking on
              another point in the graph). The name can be edited (“edit”).
   - date :: when the run was performed (edit either with “...”, or with the database, on the patient tab)
   - segmented :: number of reads where Vidjil found a CDR3, for that sample
                  See [[Number of segmented reads]] below.
   - total :: total number of reads for that point

** The list of clones (left panel)

- You can assign other tags with colors to clones using the “★” button.
  The “filter” menu allows you to further filter clones by tags.
- Under the “★” button it is possible to normalize clone concentrations
  according to this clone. You must specify the expected concentration in the
  “expected size” field (e.g. 0.01 for 1%). See [[Control with standard/spike]] below.

- The “i” button displays additional information on each clone.

- The list can be sorted on V genes, J genes or clone abundance.
  The “+” and “-” buttons respectively un-merge or re-merge all clones that have
  already been merged.

- Clones can be searched (“search” box) by either their name, their custom name, 
  or their DNA sequence.

** The time graph

The time graph is hidden when there is only one timepoint.

- The current point is highlighted with a vertical gray bar; you can change it by clicking on another point.

- The gray areas at the bottom of the graph show, for each point, the resolution (1 read / 5 reads).

- You can reorder the points by dragging them, and hide some points by dragging them on the “+” mark at the right of the points.
  If you want to recover some hidden points, you need to drag them from the “+” mark to the graph.

- If your dataset contains sampling dates (for example in an MRD setup), you can switch between point keys and dates in “settings > point key”.


** The plot view

- The "plot" menu allows you to change the plot type (grid plot, bar plot) as well as the X and Y axes of these plots.
  Some presets are available.

- In the bar plot mode, the Y axis corresponds to the order of clones inside each bar.

- The “focus” button (bottom right) allows you to further analyze a selection of clones.
  To exit the focus mode, click on the “X” near the search box.

To further analyze a set of clones sharing the same V and J, it is often useful
to focus on the clones, then to display them according to either their “clone length”
or their “N length” (that is N1-D-N2 in the case of VDJ rearrangements).

** The aligner (bottom panel)

The aligner displays nucleotide sequences from selected clones.
   - See "What is the sequence displayed for each clone ?" below.
   - Sequences can be aligned together (“align” button), identifying substitutions, insertions and deletions.
   - You can remove sequences from the aligner (and the selection) by clicking on their name.
   - You can further analyze the selected sequences with IMGT/V-QUEST and IgBlast. This opens another window/tab.
   - You can unselect all sequences by clicking on the background of the grid.

** The patient database and the server

If a server with a patient database is configured with your
installation of Vidjil (as on http://rbx.vidjil.org/browser), the
'patient' menu gives you access to the server.

With authentication, you can add patients, then add either
=.fasta=, =.fastq=, =.gz= or =.clntab= files, then process your
runs and save the results of your analysis.

*** Patients
      
Once you are authenticated, this page shows the patient list. Here you
can see your own patients and the patients to which you have been granted access.

New patients can be added ('add patient'), and existing patients can be edited ('e') or deleted ('X').
By default, you are the only one who can see and update a new patient.
If you have admin access, you can grant access to other users ('p').

*** Samples

Clicking on a patient gives access to the "samples" page. Each sample is
a =.fasta=, =.fastq=, =.gz= or =.clntab= file that will be processed by one or several
pipelines.
You can see which samples have been processed with the selected
config, and download the sequence files if they are available ("dl").

Depending on your granted accesses, you can
add a new sample to the list ("add file"),
schedule a processing for a sequence file (select a config and "run"),
or delete a sample ("X").

The processing can take a few seconds to a few hours, depending on the
software launched, its options and the size of the sample.
Once the processing is finished, click on the button "see result" and
the browser will load the data of the processed files. The first click
on this button can take a few seconds.



* Can I see all the clones ?


The interest of NGS/Rep-Seq studies is to provide a deep view of any
V(D)J repertoire. The underlying analysis software (such as Vidjil)
tries to analyze as many reads as possible (see below 'Number of segmented reads').
One often wants to "see all clones", but a complete list is difficult
to read as a whole. In a typical dataset with about 10^6 reads, even in
the presence of a dominant clone, there can be 10^4 or 10^5 different
clones detected.

** The "top" slider in the "filter" menu

The "top 50" clones are the clones that are among the first 50
in *at least one* sample. As soon as one clone is in this "top 50"
list, it is displayed for every sample, even if its concentration is
very low in other samples.
Most of the time, a "top 50" is enough. The hidden clones are thus the
ones that never reach the first 50 clones. With a default installation,
the slider can be set to display clones until the "top 100" on the grid
(and, on the graph, until "top 20").
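
The "top" rule above can be illustrated with a minimal Python sketch. This is not the browser's actual code: the function name =displayed_clones= and the input layout (one dictionary of read counts per sample) are assumptions made for this example only.

```python
def displayed_clones(sizes_by_sample, top=50):
    """Return the ids of clones ranked in the first `top` of at least one sample.

    sizes_by_sample: list of dicts {clone_id: read_count}, one dict per sample.
    This in-memory layout is hypothetical; it is not the .vidjil file format.
    """
    shown = set()
    for sizes in sizes_by_sample:
        # Rank the clones of this sample by decreasing read count.
        ranked = sorted(sizes, key=sizes.get, reverse=True)
        # A clone in the top of any single sample is shown for every sample.
        shown.update(ranked[:top])
    return shown


samples = [
    {"A": 900, "B": 50, "C": 5},   # first sample: A dominates
    {"A": 10, "B": 40, "C": 700},  # second sample: C dominates
]
print(sorted(displayed_clones(samples, top=1)))
```

With =top=1=, clones A and C are both kept because each one ranks first in one sample, whereas B is hidden in both samples.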

However, in some cases, one may want to track some clones that are
never in the "top 100", for example:
  - a standard/spike with low concentration
  - a clone in an MRD follow-up of a patient without the diagnostic point

(Upcoming feature). If a clone is "tagged" in the =.vidjil= or
in the =.analysis= file, it will always be shown even if it does not
meet the "top" filter.

** The "other" clone

This virtual clone in the clone list groups all clones that are hidden
(because of the "top" or because of hiding some tags). The sum of
ratios in the list of clones is always 100%: thus the "other" clone
changes when one uses the "filter" menu.

Note that the ratios include the "other" clone: if a main clone
is reported to have 10.54%, this 10.54% ratio relates to the number of
analyzed reads, including the hidden clones.
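
As a small worked example of this bookkeeping, the following Python sketch computes the ratios of the displayed clones together with the "other" clone. The function =ratios_with_other= and its inputs are hypothetical, and the total is simply taken as the sum of the clone sizes, which is an assumption of this sketch.

```python
def ratios_with_other(sizes, shown):
    """Compute per-clone ratios, grouping hidden clones into "other".

    sizes: dict {clone_id: read_count} for one sample (hypothetical layout).
    shown: set of clone ids that are currently displayed.
    """
    total = sum(sizes.values())
    # Ratios always relate to all analyzed reads, hidden clones included.
    ratios = {c: n / total for c, n in sizes.items() if c in shown}
    hidden = sum(n for c, n in sizes.items() if c not in shown)
    ratios["other"] = hidden / total
    return ratios


sizes = {"A": 1054, "B": 3000, "C": 5946}
result = ratios_with_other(sizes, shown={"A", "B"})
print(result)
```

Here clone A keeps its 10.54% ratio whether or not clone C is displayed. Only the "other" entry changes when the filter changes, so the ratios always sum to 100%.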




* What is the sequence displayed for each clone ?
<<representative>>
The sequences displayed for each clone are not individual reads.
The clones may gather thousands of reads, and all these reads can have
some differences. Depending on the sequencing technology, the reads
inside a clone can have different lengths or can be shifted,
especially in the case of overlapping paired-end sequencing. There can also be
some sequencing errors.
The =.vidjil= file has to give one consensus sequence per clone, and
Rep-Seq algorithms have to handle these differences with great care in
order not to gather reads from different clones.

For the Vidjil algorithm, it is required that the window centered on
the CDR3 is /exactly/ shared by all the reads. The other positions in
the consensus sequence are guaranteed to be present in /at least half/
of the reads. The consensus sequence can thus be shorter than some reads.


* How can I assess the quality of the data and the analysis ?

To make sure that the PCR, the sequencing and the Vidjil analysis went well, several elements can be checked.

** Number of segmented reads
A first control is to check the number of “segmented reads” in the info panel (top left box).
For each point, this shows the number of reads where Vidjil found a CDR3.

Ratios of segmented reads to total reads above 90% usually mean very good results. Smaller ratios, especially under 60%, often mean that something went wrong.
The “info” button further details the causes of non-segmentation (UNSEG).
There can be several causes leading to bad ratios:

*** Analysis or biological causes

   - The data actually contains germlines/loci other than those that were searched for
      (solution: relaunch Vidjil, or ask that we relaunch Vidjil, with the correct germline sequences).
      See [[http://git.vidjil.org/blob/master/doc/locus.org][locus.org]] for information on the analyzable loci.

   - There are incomplete/exceptional recombinations
     (Vidjil can process some of them, config =multi+inc= or command-line option =-i=).

   - There are too many somatic hypermutations
     (usually Vidjil can process sequences up to a 10% mutation rate; above that threshold, some sequences may be lost).

   - There are chimeric sequences or translocations
     (Vidjil does not process these sequences).

*** PCR or sequencing causes

   - The read length is too short and the reads do not span the junction zone (UNSEG too few V/J).
      (Vidjil detects a “window” including the CDR3. By default this window is 40–60bp long, so the read needs to be at least that long, centered on the junction.)

   - In particular, for paired-end sequencing, one of the ends can lead to reads not fully containing the CDR3 region
      (solution: ignore this end, or extend the read length, or merge the ends with very conservative parameters).

   - There were too many PCR or sequencing errors
      (this can be assessed by inspecting the related clones, checking if there is a large dispersion around the main clone).
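
The check on segmented reads described in this section can be sketched in a few lines of Python. The function =segmentation_status= and its labels are hypothetical names chosen for this example. Only the 90% and 60% thresholds come from the text above, and the values in between are simply reported as "intermediate".

```python
def segmentation_status(segmented, total):
    """Classify a sample from its "segmented" and "total" read counts.

    Both counts are the values shown in the info panel. The labels are
    illustrative only and are not produced by Vidjil itself.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of reads")
    ratio = segmented / total
    if ratio > 0.90:
        label = "very good"
    elif ratio < 0.60:
        label = "something may have gone wrong"
    else:
        label = "intermediate"
    return ratio, label


for segmented, total in [(950_000, 1_000_000), (500_000, 1_000_000)]:
    ratio, label = segmentation_status(segmented, total)
    print(f"{ratio:.1%} segmented: {label}")
```

A low ratio does not tell which of the causes listed above applies; the “info” button remains the place to look for the detailed UNSEG causes.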

** Control with standard/spike

   - If your sample included a standard/spike control, you should first
     identify the main standard sequence (if that is not already done) and
     specify its expected concentration (by clicking on the “★” button).
     Then the data is normalized according to that sequence.
   - You can (de)activate normalization in the settings menu.
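
One plausible reading of this normalization is a simple rescaling so that the standard reaches its expected concentration. The Python sketch below illustrates that reading only: the function =normalize= and the proportional rule are assumptions of this example, not a description of the browser's internal computation.

```python
def normalize(ratios, standard_id, expected):
    """Rescale all clone ratios so that the standard has its expected ratio.

    ratios: dict {clone_id: observed ratio} for one sample (hypothetical layout).
    standard_id: id of the standard/spike clone.
    expected: expected concentration of the standard (e.g. 0.01 for 1%).
    Assumption: every clone is scaled by the same factor as the standard.
    """
    observed = ratios[standard_id]
    if observed <= 0:
        raise ValueError("the standard was not observed in this sample")
    factor = expected / observed
    return {clone: ratio * factor for clone, ratio in ratios.items()}


observed = {"spike": 0.02, "clone_X": 0.10}
normalized = normalize(observed, "spike", expected=0.01)
print(normalized)
```

In this example the spike was observed at 2% but expected at 1%, so every ratio is halved and =clone_X= goes from 10% to 5%.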

** Steadiness verification
   - When assessing different PCR primers, PCR enzymes, PCR cycles, one may want to see how regular the concentrations are among the points.
   - When following a patient one may want to identify any clone that is emerging.
   - To do so, you may want to change the color system: in the “color” menu,
     select “by abundance at selected timepoint”.  The color ranges from red
     (high concentration) to purple (low concentration) and makes it easy to
     spot on the graph any large change in concentration.


** Clone coverage
   The clone coverage is computed over the consensus sequence which is
   displayed for each clone (see [[representative][What is the sequence displayed for each clone?]]).
   Its length should be representative of the read lengths among that clone. A
   clone can be constituted of thousands of reads of various lengths. We
   expect the consensus sequence to be close to the median read length of the
   clone. The clone coverage is such a measure: having a clone coverage
   between .85 and 1 is quite frequent. On the contrary, if it is .5 it means that the consensus sequence
   length is half of the median read length in the clone.

   A clone coverage is bad ($<.5$) when reads share the same window
   (this is how Vidjil defines a clone) but have frequent discrepancies
   outside of the window. Such cases have been observed with chimeric reads
   which share the same V(D)J recombinations in their first half and have
   totally different and unknown sequences in their second half.
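
Following the description above, the clone coverage can be read as the length of the consensus sequence divided by the median read length of the clone. The Python sketch below implements that reading; the exact formula used by Vidjil is not given here, so the function =clone_coverage= should be taken as an assumption.

```python
from statistics import median


def clone_coverage(consensus_length, read_lengths):
    """Ratio between the consensus length and the median read length.

    consensus_length: length of the consensus sequence, in bp.
    read_lengths: lengths of the reads gathered in the clone, in bp.
    Assumption: coverage is a plain ratio against the median read length.
    """
    if not read_lengths:
        raise ValueError("a clone must contain at least one read")
    return consensus_length / median(read_lengths)


good = clone_coverage(228, [230, 240, 250])  # consensus close to the median
bad = clone_coverage(120, [230, 240, 250])   # consensus half the median
print(f"{good:.2f} {bad:.2f}")
```

The second clone has a coverage of .5, the kind of value that the text above associates with chimeric reads.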
* Keyboard shortcuts

** Browser 

  | =←= and =→=             | navigate between samples                            |
  | =Shift-←= and =Shift-→= | decrease or increase the number of displayed clones |
  | numeric keypad, =0-9=   | switch between available plot presets               |


  | =a=: TRA        |                                    |
  | =b=: TRB        |                                    |
  | =g=: TRG        |                                    |
  | =d=: TRD, TRD+  | change the selected germline/locus |
  | =h=: IGH, IGH+  |                                    |
  | =l=: IGL        |                                    |
  | =k=: IGK, IGK+  |                                    |
  Note: You can select just one locus by holding the Shift key while pressing
  the letter corresponding to the locus of interest


** Browser connected to a patient database

 | =Ctrl-s=  | save the analysis         |
 | =Shift-p= | open the 'patient' window |




* Reference

If you use Vidjil for your research, please cite the following reference:

Mathieu Giraud, Mikaël Salson, et al.,
“Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing”,
BMC Genomics 2014, 15:409
http://dx.doi.org/10.1186/1471-2164-15-409
