#+TITLE: Vidjil -- Web Application Manual
#+AUTHOR: The Vidjil team (Mathieu, Mikaël, Florian, Marc, Ryan and Tatiana)
#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="../css/org-mode.css" />

Vidjil is an open-source platform for the analysis of high-throughput sequencing data from lymphocytes.
[[http://en.wikipedia.org/wiki/V(D)J_recombination][V(D)J recombinations]] in lymphocytes are essential for immunological diversity.
They are also useful markers of pathologies, and in leukemia, are used to quantify the minimal residual disease during patient follow-up.
High-throughput sequencing (NGS/HTS) now enables the deep sequencing of a lymphoid population with dedicated [[http://omictools.com/rep-seq-c424-p1.html][Rep-Seq]] methods and software.

This is the help of the [[http://app.vidjil.org/browser/][Vidjil web application]].
Further help can always be requested from [[mailto:contact@vidjil.org][contact@vidjil.org]]. We can also arrange a phone or Skype meeting.

The Vidjil team (Mathieu, Mikaël, Florian, Marc, Ryan and Tatiana)

* Requirements

** Web application

The Vidjil web application runs in any modern browser. It has been successfully tested on the following platforms:
 - Firefox version >= 32
 - Chrome version >= 38
 - IE version >= 10.0 (Vidjil will not run on IE 9.0 or below)
 - Opera version >= XX
 - Safari version >= 6.0

** The .vidjil files

The Vidjil web application displays =.vidjil= files that summarize the V(D)J
recombinations and the sequences found in a run.

The easiest way to get these files is to [[http://rbx.vidjil.org/browser][request an account]] on the public Vidjil test server.
You will then be able to upload, manage and process your runs
(=.fasta=, =.fastq=, =.gz= or =.clntab= files) directly on the web application
(see "The patient database and the server" below). The server behind the patient
database computes these =.vidjil= files.
Otherwise, such =.vidjil= files can be obtained:
 - from the command-line version of Vidjil (starting from
   =.fasta=, =.fastq= or =.gz= files, see [[http://git.vidjil.org/blob/master/doc/algo.org][algo.org]]).
   To gather several =.vidjil= files, use the [[http://git.vidjil.org/blob/master/tools/fuse.py][fuse.py]] script,
 - or from any other V(D)J analysis pipeline able to output files
   respecting the =.vidjil= [[./format-analysis.org][file format]] (contact us if you are interested).

* First aid

- Open data:
    - either with “patients”/“open patient” if you are connected to a patient database, such as on http://app.vidjil.org/
      (in this case, there are always some "Demo" datasets for demonstration purposes),
    - or with “file”/“import/export”, manually selecting a =.vidjil= file.

- You can change the number of displayed clones by moving the slider “number of clones” (menu “filter”).
  The maximal number of clones that can be displayed depends on the preceding processing step.
  See "Can I see all the clones ?" below.

- Clones can be selected by clicking on them either in the list, on the time graph,
  or on the grid (simple selection or rectangle selection).

- There are often very similar clones, coming from either somatic hypermutations or sequencing errors.
  You can select such clones (for example those sharing the same V and the same J), then:
   - inspect the sequences in the lower panel (possibly using the “align” function),
   - remove some of these sequences from the selection (clicking on their name in the lower panel),
   - merge them (button “merge”) into a single clone.
     Once several clones are merged, you can still visualize them by clicking on “+” in the list of clones.

- Your analysis (clone tagging, renaming, merging) can be saved:
    - either with “patients”/“save analysis” if you are connected to a patient database,
    - or with “file”/“export .analysis”.

You are advised to go through the tutorial available from [[http://www.vidjil.org/doc]]
to learn the essential features of Vidjil.

* The elements of the Vidjil web application

** The info panel (upper left panel)
   - patient information :: the user can record here some information to keep about the patient.
   - locus :: germline used for analyzing the data. In case of multi-locus
              data, you can select which locus should be displayed (see [[http://git.vidjil.org/blob/master/doc/locus.org][locus.org]])
   - analysis :: name (without extension) of the loaded file used for displaying the data
   - sample :: name of the current sample point. You can also change the current point by clicking directly on its name in the graph panel (when available).
   #The name can be edited (“edit”).
   - date :: date of the run of the current sample point (it can be edited in the database, on the patient tab).
             You can change the displayed point by clicking on the =←= and =→= buttons. A cycling view is available with the fix button.
   - segmented :: number of reads where Vidjil found a CDR3, for that sample point.
                  See [[Number of segmented reads]] below.
   - total :: total number of reads for that sample point

** The list of clones (left panel)

- You can assign colored tags to clones using the “★” button.
  The “filter” menu allows you to further filter clones by tags.
- Under the “★” button, it is possible to normalize clone concentrations
  according to this clone. You must specify the expected concentration in the
  “expected size” field (e.g. 0.01 for 1%). See [[Control with standard/spike]] below.

- The “i” button displays additional information on each clone.

- The list can be sorted by V genes, J genes or clone abundance.
  The “+” and “-” buttons respectively un-merge or re-merge all clones that have
  already been merged.

- Clones can be searched (“search” box) by their name, their custom name,
  or their DNA sequence.
- The concentration of some clones may not be displayed. Instead, you may see
  either a =+= symbol or a =-= symbol. The former means that the clone has
  been detected (positive) but in few reads (typically fewer than five). The
  latter means that the clone has not been detected (negative) in the
  sample but has been detected at another time point that is not currently
  displayed.

** The time graph

The time graph is hidden when there is only one timepoint. It shows the X most frequent clones of the sample (this number can be altered with the “filter” menu).

- The current point is highlighted with a vertical gray bar. You can change it by clicking on another point or by using =←= and =→=.

- The gray areas at the bottom of the graph show, for each point, the resolution (1 read / 5 reads).

- You can reorder the points by dragging them, and hide some points by dragging them onto the “+” mark at the right of the points.
  To recover hidden points, drag them from the “+” mark back to the graph.

- If your dataset contains sampling dates (for example in an MRD setup), you can switch between point keys and dates in “settings > point key”.


** The plot view

The grid view shows the clones of a selected germline. All the germlines in use are listed on the right of the grid. You can change the germline by clicking on it or by using the associated shortcuts (see "Keyboard shortcuts" below).

- The "plot" menu allows you to change the plot type (grid plot, bar plot) as well as the X and Y axes of these plots.
  Some presets are available.

- In the bar plot mode, the Y axis corresponds to the order of clones inside each bar.

- The “focus” button (bottom right) allows you to further analyze a selection of clones.
  To exit the focus mode, click on the “X” near the search box.

To further analyze a set of clones sharing the same V and J, it is often useful
to focus on these clones, then to display them according to either their “clone length”
or their “N length” (that is N1-D-N2 in the case of VDJ recombinations).

** The aligner (bottom panel)

The aligner displays nucleotide sequences from the selected clones.
   - See "What is the sequence displayed for each clone ?" below.
   - Sequences can be aligned together (“align” button), identifying substitutions, insertions and deletions.
   - You can remove sequences from the aligner (and from the selection) by clicking on their name.
   - You can further analyze the selected sequences with IMGT/V-QUEST and IgBlast. This opens another window/tab.
   - You can unselect all sequences by clicking on the background of the grid.


* The patient database and the server

If a server with a patient/experiment database is configured with your
installation of Vidjil (as on http://app.vidjil.org/), the
'patient' menu gives you access to the server.

With authentication, you can add patients, then add
=.fasta=, =.fastq=, =.gz= or =.clntab= files, then process your
runs and save the results of your analysis.

** The different elements

*** Patients

Once you are authenticated, this page shows the patient list. Here you
can see your patients and the patients to which you have been granted access.

New patients can be added ('add patient'), edited ('e') or deleted ('X').
By default, you are the only one who can see and update a new patient.
If you have admin access, you can grant access to other users ('p').

*** Runs

Runs can be manipulated the same way as patients. New runs can be added ('add run'),
edited ('e') or deleted ('X').
Runs and patients are both used to build sets of samples that share the same patient or have been sequenced in the same run.
A sample can be included in both a patient sample set and a run sample set.

*** Adding a sample

Clicking on a patient or on a run gives access to the "samples" page. Each sample is
a =.fasta=, =.fastq=, =.gz= or =.clntab= file that will be processed by one or several
pipelines with one or several /configurations/ that set software options.

Depending on your granted access, you can add a new sample to the list (=add file=),
download sequence files when they are available (=dl=) or delete sequence files (=X=).
Note that sequence files may be deleted (in particular to save server disk space),
whereas the results are kept (unless the user chooses to delete them).

You can see which samples have been processed with the selected
config, and access the results (=See results=, bottom right).

**** Adding a sample
To add a sample, you must add at least one sample file. Each sample file must
be linked to a patient or to a run. One of those fields will be automatically
completed depending on whether you accessed the sample page from a patient or
from a run. Both fields provide autocompletion to help you enter the correct
patient or correct run.  It is advised to fill in both fields (when it makes
sense). However, please note that the corresponding patients and runs must have
been created beforehand.

**** Pre-processing
The sample files may be preprocessed; the preprocess is chosen when adding
samples. At the moment, the only preprocess available is paired-end read
merging.

***** Read merging
People using Illumina sequencers may sequence paired-end fragments. It is
*highly* recommended to merge those reads in order to have a read that consists
of the whole DNA fragment instead of split fragments.

There are two configurations to merge reads. Indeed, when merging is not
possible for some reads, we must keep one of the fragments (either R1 or
R2). We cannot keep both because it would bias the quantification (as there
would be two unmerged reads instead of one). Depending on the sequencing
strategy, it can be better to keep either R1 or R2 in such a case, so the
choice is left to the user. You should choose to keep the fragment that most
probably contains both a part of the V and a part of the J genes.


*** Processing samples, configs

Depending on your granted access, you can schedule the processing of a sequence file (select a config and =run=).
The processing can take from a few seconds to a few hours, depending on the
software launched, the options set in the config, the size of the sample and the server load.

The base configurations are « TRG », « IGH », « multi » (=-g germline=), « multi+inc » (=-g germline -i=), « multi+inc+xxx » (=-g germline -i -2=, default advised configuration).
See https://github.com/vidjil/vidjil/blob/master/doc/locus.org for information on these configurations.

The « reload » button (bottom left) updates the status of the task, which should go through =QUEUED= → =ASSIGNED= → =RUNNING= → =COMPLETED=.
It is possible to launch several processes at the same time (some will wait in the =QUEUED= / =ASSIGNED= states), and also to launch processes while you
are uploading data. Finally, you can safely close the window with the patient database (and even the browser) while some processes are queued or running.
The only thing you should not do is to close the browser completely while sequences are uploading.


*** Groups

Each patient and run is assigned to at least one group. This determines which groups have access to a patient or run.
Users are assigned to different groups and therefore gain access to any patients and runs that these groups have access to.

Groups may also be clustered together. Usually a cluster represents an organisation, such as a hospital.
The organisation has a group with which subgroups are associated. This allows users with different sets of permissions
to automatically gain access to files uploaded to the organisation's group.

Users may be part of several groups. By default, each user is assigned a personal group to which they can upload files;
they are then the only one to have access to these files.
Different groups imply different sets of permissions. A user may not have the same permissions on a file accessed
from an organisation's group as (s)he does on files from her/his personal group, or even from a group associated with
another organisation.

The different permissions that can be attributed are:
  - Read: permission to view the patients/runs to which a group or organisation has access
  - Create: permission to create patients/runs
  - Upload: permission to upload samples to the patients/runs of a group
  - Run: permission to run Vidjil on samples uploaded to the patients/runs of a group
  - View Details: permission to view patient/run data in an unencrypted manner for the patients/runs of a group
  - Save: permission to save an analysis for the patients/runs of a group

* Can I see all the clones ?


The interest of NGS/Rep-Seq studies is to provide a deep view of any
V(D)J repertoire. The underlying analysis software (such as Vidjil)
tries to analyze as many reads as possible (see 'Number of segmented reads' below).
One often wants to "see all clones", but a complete list is hard
to read. In a typical dataset with about 10^6 reads, even in
the presence of a dominant clone, there can be 10^4 or 10^5 different
clones detected.

** The "top" slider in the "filter" menu

The "top 50" clones are the clones that are among the 50 most abundant
in *at least one* sample. As soon as a clone is in this "top 50"
list, it is displayed for every sample, even if its concentration is
very low in other samples.
Most of the time, a "top 50" is enough. The hidden clones are thus the
ones that never reach the first 50 clones in any sample. With a default installation,
the slider can be set to display clones up to the "top 100" on the grid
(and, on the graph, up to the "top 20").

However, in some cases, one may want to track some clones that are
never in the "top 100", for example:
  - a standard/spike with low concentration
  - a clone in the MRD follow-up of a patient without the diagnostic point

(Upcoming feature). If a clone is "tagged" in the =.vidjil= or
in the =.analysis= file, it will always be shown even if it does not
meet the "top" filter.
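
The "top" rule described in this section can be illustrated with a minimal Python sketch. This is not the code of Vidjil: the function name =displayed_clones= and the data layout (one list of read counts per clone, with one count per sample) are assumptions made for this example only.

#+BEGIN_SRC python
def displayed_clones(counts, top):
    """Return the clones ranked within the first `top` in at least one sample.

    `counts` maps a clone name to its read counts, one per sample
    (hypothetical layout, chosen for this illustration only).
    """
    n_samples = len(next(iter(counts.values())))
    shown = set()
    for s in range(n_samples):
        # Rank the clones of sample s from most to least abundant
        ranked = sorted(counts, key=lambda c: counts[c][s], reverse=True)
        shown.update(ranked[:top])
    return shown


counts = {
    "clone-A": [900, 5],
    "clone-B": [50, 700],
    "clone-C": [30, 20],
    "clone-D": [20, 10],
}
# With top=1, clone-A leads the first sample and clone-B the second one,
# so both are displayed for every sample.
print(sorted(displayed_clones(counts, top=1)))
#+END_SRC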

** The "smaller clones"

There is one virtual clone per locus in the clone list, which groups all the clones that are hidden
(because of the "top" filter or because some tags are hidden). The sum of
ratios in the list of clones is always 100%: thus the size of the "smaller clones"
changes when one uses the "filter" menu.

Note that the ratios include the "smaller clones": if a clone
is reported to have 10.54%, this 10.54% ratio relates to the number of
analyzed reads, including the hidden clones.
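
As a worked example of the statement above, the following Python sketch computes ratios for a single locus. It is only an illustration: the function =ratios_with_smaller_clones= and the read counts are hypothetical, and the sketch simply assumes that every ratio is a number of reads divided by the number of analyzed reads.

#+BEGIN_SRC python
def ratios_with_smaller_clones(reads, displayed):
    """Compute ratios over all analyzed reads, grouping hidden clones together."""
    total = sum(reads.values())
    ratios = {c: reads[c] / total for c in displayed}
    # All the clones that are not displayed end up in one virtual clone
    hidden = sum(n for c, n in reads.items() if c not in displayed)
    ratios["smaller clones"] = hidden / total
    return ratios


# Hypothetical read counts for one locus (10000 analyzed reads in total)
reads = {"clone-A": 1054, "clone-B": 446, "clone-C": 500, "others": 8000}
ratios = ratios_with_smaller_clones(reads, displayed=["clone-A", "clone-B"])
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2%}")
#+END_SRC

Displaying more or fewer clones changes the "smaller clones" entry, but the ratios still sum to 100%.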




* What is the sequence displayed for each clone ?
<<representative>>
The sequences displayed for each clone are not individual reads.
A clone may gather thousands of reads, and these reads can
differ from each other. Depending on the sequencing technology, the reads
inside a clone can have different lengths or can be shifted,
especially in the case of overlapping paired-end sequencing. There can also be
some sequencing errors.
The =.vidjil= file has to give one consensus sequence per clone, and
Rep-Seq algorithms have to handle these differences with great care in
order not to gather reads from different clones.

For the Vidjil algorithm, it is required that the window centered on
the CDR3 is /exactly/ shared by all the reads. The other positions in
the consensus sequence are guaranteed to be present in /at least half/
of the reads. The consensus sequence can thus be shorter than some reads.


* How can I assess the quality of the data and the analysis ?

To make sure that the PCR, the sequencing and the Vidjil analysis went well, several elements can be checked.

** Number of segmented reads
A first control is to check the number of “segmented reads” in the info panel (top left box).
For each point, this shows the number of reads where Vidjil found a CDR3.

Ratios of segmented reads to total reads above 90% usually mean very good results. Smaller ratios, especially under 60%, often mean that something went wrong.
The “info” button further details the causes of non-segmentation (=UNSEG=, see details on [[http://git.vidjil.org/blob/master/doc/algo.org][algo.org]]).
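
The check on the ratio of segmented reads can be sketched in Python as follows. The 90% and 60% thresholds are the ones given above; the function =segmentation_quality= and its three labels are hypothetical names introduced for this illustration, and they are not part of Vidjil.

#+BEGIN_SRC python
def segmentation_quality(segmented, total):
    """Classify a sample from its ratio of segmented reads.

    The 90% and 60% thresholds come from the text above; the three
    labels are hypothetical names chosen for this illustration.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    ratio = segmented / total
    if ratio > 0.90:
        label = "very good"
    elif ratio >= 0.60:
        label = "intermediate"
    else:
        label = "suspicious"
    return ratio, label


# Values as read in the info panel ("segmented" and "total")
ratio, label = segmentation_quality(segmented=850_000, total=1_000_000)
print(f"{ratio:.1%} segmented: {label}")
#+END_SRC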
Several causes can lead to bad ratios:

*** Analysis or biological causes

   - The data actually contains other germlines/loci than what was searched for
      (solution: relaunch Vidjil, or ask us to relaunch Vidjil, with the correct germline sequences).
      See [[http://git.vidjil.org/blob/master/doc/locus.org][locus.org]] for information on the analyzable loci.

   - There are incomplete/exceptional recombinations
     (Vidjil can process some of them, config =multi+inc= or command-line option =-i=).

   - There are too many somatic hypermutations
     (usually Vidjil can process mutations up to a 10% mutation rate; above that threshold, some sequences may be lost).

   - There are chimeric sequences or translocations
     (Vidjil does not process these sequences).

*** PCR or sequencing causes

   - The read length is too short, and the reads do not span the junction zone (UNSEG too few V/J or UNSEG only V/J).
      (Vidjil detects a “window” including the CDR3. By default this window is 40–60bp long, so the read needs to be at least that long, centered on the junction).

   - In particular, for paired-end sequencing, one of the ends can lead to reads not fully containing the CDR3 region
      (solution: ignore this end, or extend the read length, or merge the ends with very conservative parameters).

   - There were too many PCR or sequencing errors
      (this can be assessed by inspecting the related clones, checking if there is a large dispersion around the main clone).

** Control with standard/spike

   - If your sample included a standard/spike control, you should first
     identify the main standard sequence (if that is not already done) and
     specify its expected concentration (by clicking on the “★” button).
     Then the data is normalized according to that sequence.
   - You can (de)activate normalization in the settings menu.
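
One plausible way to picture this normalization is a simple rescaling, sketched below in Python. This is an assumption made for illustration, not a description of the actual computation in Vidjil: the sketch multiplies every ratio by the expected size of the standard divided by its observed ratio. The function =normalize= and the values are hypothetical.

#+BEGIN_SRC python
def normalize(ratios, standard, expected_size):
    """Rescale all ratios so that `standard` gets `expected_size`.

    Assumed rule (not taken from the Vidjil sources): every ratio is
    multiplied by expected_size / observed ratio of the standard.
    """
    observed = ratios[standard]
    if observed <= 0:
        raise ValueError("the standard clone was not detected")
    factor = expected_size / observed
    return {clone: r * factor for clone, r in ratios.items()}


ratios = {"spike": 0.02, "clone-A": 0.30, "clone-B": 0.05}
# The spike is expected at 1% (0.01) but observed at 2%: ratios are halved
normalized = normalize(ratios, standard="spike", expected_size=0.01)
print(normalized)
#+END_SRC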

** Steadiness verification
   - When assessing different PCR primers, PCR enzymes or PCR cycles, one may want to see how regular the concentrations are among the points.
   - When following a patient, one may want to identify any emerging clone.
   - To do so, you may want to change the color system: in the “color” menu,
     select “by abundance at selected timepoint”. The color ranges from red
     (high concentration) to purple (low concentration) and makes it easy to
     spot on the graph any large change in concentration.

** Clone coverage
   The clone coverage is computed over the consensus sequence which is
   displayed for each clone (see [[representative][What is the sequence displayed for each clone?]]).
   Its length should be representative of the read lengths among that clone. A
   clone can consist of thousands of reads of various lengths. We
   expect the consensus sequence to be close to the median read length of the
   clone. The clone coverage is such a measure: a clone coverage
   between .85 and 1 is quite frequent. On the contrary, a coverage of .5 means that the consensus sequence
   is half as long as the median read length in the clone.

   A clone has a bad coverage (< 0.5) when its reads do share the same window
   (this is how Vidjil defines a clone) but have frequent discrepancies
   outside of the window. Such cases have been observed with chimeric reads
   which share the same V(D)J recombinations in their first half and have
   totally different and unknown sequences in their second half.

   In the web application, the clones with a low clone coverage (< 0.5) are displayed in
   the list with an orange I on the right. You can also visualize the clones
   according to their clone coverage by selecting for example “clone
   coverage/GC content” in the preset menu of the “plot” box.
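
The measure described above can be sketched in Python, assuming that the coverage is the plain ratio of the consensus length to the median read length of the clone, as the text suggests (the actual computation in Vidjil may differ). The function =clone_coverage= and the read lengths are hypothetical.

#+BEGIN_SRC python
from statistics import median


def clone_coverage(consensus_length, read_lengths):
    """Ratio of the consensus length to the median read length of a clone."""
    return consensus_length / median(read_lengths)


# Hypothetical read lengths (in bp) for the reads gathered in one clone
read_lengths = [148, 150, 150, 151, 152]
for consensus_length in (140, 70):
    coverage = clone_coverage(consensus_length, read_lengths)
    # 0.5 is the threshold under which the web application flags a clone
    flag = "low coverage" if coverage < 0.5 else "ok"
    print(f"consensus {consensus_length} bp: coverage {coverage:.2f} ({flag})")
#+END_SRC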
* Keyboard shortcuts

  | =←= and =→=             | navigate between samples                            |
  | =Shift-←= and =Shift-→= | decrease or increase the number of displayed clones |
  | numeric keypad, =0-9=   | switch between available plot presets               |


  | =a=: TRA        |                                    |
  | =b=: TRB        |                                    |
  | =g=: TRG        |                                    |
  | =d=: TRD, TRD+  | change the selected germline/locus |
  | =h=: IGH, IGH+  |                                    |
  | =l=: IGL        |                                    |
  | =k=: IGK, IGK+  |                                    |
  | =x=: xxx        |                                    |
  Note: You can select just one locus by holding the Shift key while pressing
  the letter corresponding to the locus of interest.

 | =Ctrl-s=  | save the analysis (when connected to a patient database)         |
 | =Shift-p= | open the 'patient' window (when connected to a patient database) |




* References

If you use Vidjil for your research, please cite the following references:

Marc Duez et al.,
“Vidjil: High-throughput analysis of immune repertoire”,
submitted

Mathieu Giraud, Mikaël Salson, et al.,
“Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing”,
BMC Genomics 2014, 15:409
http://dx.doi.org/10.1186/1471-2164-15-409
