Commit 93409bae authored by Mathieu Giraud

tutorial: split into several files

parent 734def88
\section{Viewing and filtering clones}
\subsection{Looking at a clone}
Each RepSeq algorithm has its own definition of what a clone is (or, more precisely,
a clonotype), of how to output its sequence, and of how to assign a V(D)J designation.
In this file, the most abundant clone
is \texttt{IGHV3-9 7/CCCGGA/17 J6*02}.
\question{Select this clone, either by clicking on the list or on the grid.
How many reads does this clone represent? (See again the bottom part, on the right.)}
There are several options to display the V(D)J designation.
\question{In the \com{settings} menu, select \com{length} to show N zones by their length. Revert to the
default \com{sequence (when short)} setting to show the full N on short sequences.}
\question{Also try the option \com{alleles in clone names}: by selecting \com{always}, the clone's
V gene is displayed as \com{IGHV3-9*01}. Revert to the default \com{when not *01} to have more condensed V(D)J designations.}
\subsection{Showing more clones}
By default Vidjil displays the 50 most abundant clones at each time point.
With five time points, we may therefore have from 50 to 250 clones displayed,
depending on whether the top 50 are always the same, always different or, more
realistically, somewhere in between.
This number can be increased to a maximum of 100 clones by going to the \com{filter} menu and by putting the
slider to its right end.
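The range quoted above can be checked with a short computation (a sketch in our own notation, assuming the limit applies independently to each time point). Writing $T$ for the number of time points and $k$ for the number of clones kept at each of them, the number $n$ of displayed clones is the size of the union of $T$ sets of $k$ clones each, so
\[
k \;\le\; n \;\le\; kT ,
\qquad \mbox{that is} \qquad
50 \;\le\; n \;\le\; 50 \times 5 = 250 .
\]
Under the same assumption, moving the slider to $k = 100$ would give between $100$ and $500$ displayed clones for five time points.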
\question{Notice how the IGH smaller clones percentage changes. What was its
initial value? What is it now?}
The \textit{smaller clones} correspond to clones that are not displayed
because they are never among the most abundant ones.
\subsection{Tagging and filtering clones}
Consider the most abundant clones in the list: \texttt{IGHV3-9 7/CCCGGA/17 J6*02} and \texttt{TRGV10 13//5 JP1}.
We often want to tag such clones in order to find them again later on.
\question{Click on the star and choose colored tags for these two clones, such as \texttt{clone 1} or \texttt{clone 2}.
Notice how the color applies throughout all the views.}
Later you may want to filter clones depending on the tags you have chosen.
\question{In the upper left part, click on the little gray square (to the
right of the colored squares). What happens? What if you click again?}
This is a way of filtering some clones. This may be useful when we want to
focus on some specific clones. Another way of doing so is to filter them by
their gene names or by their DNA sequences.
\question{In the search box,
enter \texttt{GGAGTCGGGG} and validate with \texttt{Enter}. How many sequences are
left?}
Note that the search is performed both on the forward and the reverse strand.
\question{Check that by searching for the reverse complement of the
sequence: \texttt{CCCCGACTCC}. Do you find the same results as previously?}
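To see why both searches should match the same clones, the reverse complement can be worked out by hand (a short illustration using the standard base pairing A--T and C--G): complement each base of the query, then read the result backwards.
\begin{center}
\begin{tabular}{ll}
query & \texttt{GGAGTCGGGG} \\
complement & \texttt{CCTCAGCCCC} \\
complement, reversed & \texttt{CCCCGACTCC} \\
\end{tabular}
\end{center}
A read sequenced from the opposite strand contains the last line instead of the first one, which is why the search considers both strands.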
\question{How can you cancel this filter and view again all the clones?}
\bigskip
Another way to mark a specific clone is to rename it.
\question{Double click on the name of a clone (in the list of clones) and
choose another name (\textit{e.g.} interesting clone) and validate using
\texttt{Enter}.}
\bigskip
After renaming, you can see that the clone is still selected.
\question{Click on several clones by holding the \texttt{Ctrl} key to select
more. Each time you add a new clone to the selection, its sequence
is added in the bottom part.}
\question{How many clones are selected? How many reads do those clones
represent?}
\question{\new Notice the star at the right of the screen, near the number
of reads. You can also tag clones using this icon. This way, you can tag
all the selected clones at once.}
\question{When you want to focus on the selected clones, you can click on the
focus link on the right, next to the number of selected clones.
This feature is useful when you want to analyse some clones more thoroughly
without being annoyed by other clones.}
\question{To remove this focus, click on the cross next to the search box,
above the list.}
\question{To unselect them all, you can click in an empty area on the top or
bottom plot.}
Sometimes, one wants to hide noisy or unrelated clones.
\question{Select a clone or several clones and click on the \com{hide} button, near the \com{focus} button. Show again these
clones by clicking on the cross next to the search box.}
% Another way to hide clones is to change their tag to ``standard (noise)`` and to uncheck this tag by clicking on the corresponding tile in the list of tiles of the information panel, switching them from a visible state to a filtered one.
%%% See above, already covered
\section{Analysing clone populations}
\subsection{Clustering clones through inspection of their sequences}
The first thing to do is to see whether some clones should be clustered (because
of sequencing or PCR errors, for instance). This step could be automated
but, in any case, the automatic clustering would need to be checked by an expert
eye.
By default in the bottom plot (the \textit{grid}), the clones
are displayed according to their V and J genes (or more generally to their
5' and 3' genes).
\question{Identify in the grid the clones with an
\textit{IGHV3-13}~\textit{IGHJ6} recombination and select them
all. You can do so either by holding \texttt{Ctrl} or by drawing a rectangle around the clones while
holding down the left mouse button.}
The sequences of the clones now appear in the bottom part of the browser (the
\textit{sequence panel}). If many clones are selected you can view more sequences
by moving the mouse above the sequence panel.
\new In such a case, you may be bothered by the sequence panel going up and
down each time your mouse enters or exits it. You can keep it
at its current size by clicking on the pin at the upper right corner of the
sequence panel.
The sequences in the sequence panel can then be compared visually, but you can also align
them to see their similarities more easily.
\question{Click on the \com{align} button on the left-hand side. The differences are
emphasized in bold.}
It is now up to the user's expertise to determine whether the sequences are sufficiently
similar, depending on her or his specific question. If some sequences don't appear to be similar enough, you can remove
them from the sequence panel by clicking on the cross in front of the sequence in
the sequence panel.
\question{Remove all the sequences that are not similar enough to the first
one.}
Now all the sequences in the sequence panel should be highly similar. All their
differences could be due to sequencing or PCR errors.
These artifacts (mutations, homopolymers, insertions, deletions)
depend on the sequencer and the PCR technique.
\question{Cluster all those clones into a single clone by clicking on the \com{cluster}
button, next to the \com{align} button.}
All the clustered sequences now appear within a single clone. That can be seen
in the list: the clone which hosts the subclones appears with a $+$ on its
left. You can click on the $+$ to see the subclones that have been clustered in
the main one.
\question{Click on the $+$ and observe the changes in the grid.}
As you may have noticed the subclones appear again in the grid. You can
compare their sequences again if you'd like (for example to double check that
you were right to cluster them). You can also remove some subclones from the
cluster by clicking on the cross at their left in the list.
\question{For the sake of the exercise, remove the last clone of the cluster.}
\question{%
%For the next step, choose the preset \com{V distribution} (keyboard shortcut \com{5}).
% Presets have not been introduced yet at this point.
Open the \com{cluster} menu, and choose \com{cluster by V/5}. What happened? There are now two clones with TRGV2. Why?}
%% Confirm this by changing the x axis into ``V allele``.
%%% -> Problem: axes have not been introduced yet at this point.
\question{In the \com{cluster} menu, select \com{revert to previous clusters} to undo these clusterings.}
\subsection{Other metrics and analysis on the clones}
We used the V and J genes as a proxy for sequence similarity; however, there are
other ways to assess sequence similarity that may be more relevant.
Moreover, you may want to plot other metrics on the lymphocyte population.
%
For instance we can choose to plot the V genes versus the length of the N
insertions.
\question{Go to the \com{plot} menu (in the upper left corner of the grid),
and in the preset box choose \com{V/N length}.}
Then you can continue aligning and clustering clones if necessary.
\question{You can also try the preset \com{clone consensus length/GC content}
which tends to separate quite nicely the distinct clones.}
Note that you can choose any axis to be plotted: just go to the \com{plot} menu and
select any value you would like for the $x$ axis and for the $y$ axis.
For bar charts, the box size always reflects the clone size,
and the $y$ axis determines the order of the boxes sharing the same $x$ value.
%% \item Look at the available stats, use no. 7 (read length)
\question{In the \com{plot} menu, switch between the ``bubble plot'' and the ``bar plot''.
In the bar plot mode, move the mouse over the bars: what happens?}
Another possibility is to ask Vidjil to compute the similarity between
clones.
\question{Now select the preset \com{plot by similarity} or even \com{plot
similarity by locus} to plot similarity for the current locus (beware: this
may take some time).}
Now the most similar clones should be close together. Note, however, that a
two-dimensional layout cannot in general preserve all pairwise similarities. So
it is possible that two dissimilar clones are close together or, conversely,
that two similar clones are far apart.
\question{Press the keys \texttt{0} to \texttt{9} on the numeric keypad. What happens?}
There is still a feature to help you analyse your data that we have not
explored yet.
You can change the colors to make them represent some variables of interest
with the \com{color by} menu.
\question{First choose the preset \com{plot by similarity and by locus} and
then color by \com{N length} (in the box at the top of the screen).}
\marginpar{We apologize to color-blind users: the colors are not yet color-blind friendly.}Clones that are close on the grid with similar colors are likely to
be similar.
\question{Choose now the preset \com{CDR3 length distribution} and
then color by \com{productivity}.
See that the color tiles in the info part (upper right) change to show the color key.}
\question{\new Instead of coloring by productivity, you could also color by
\com{clone}. When coloring by \com{clone}, each clone has a random color. Thus in
a bar plot, it is a convenient color mode to see the peaks that are due to a
single clone or to several clones.
However, clones may be very similar. Another option is to color by
\com{CDR3}. In that case, all clones with the same CDR3 will have the same
color (note that, due to a lack of available colors, two different CDR3s
could share the same color just by chance).}
Using those different features you should be able to analyse how similar your
sequences are, and potentially you could cluster them if you'd like or tag them.
\question{\new
Select the most abundant clone. It now appears in the sequence panel.
Now we would like to compare the sequence with the germline genes.
We can add the germline genes to the sequence panel by going
to the \com{import/export} menu and by clicking on \com{add germline genes}.
Now we can click on the \com{align} button to see the alignment between the
genes and the sequence. Mutations can be identified and silent mutations are
displayed with a double border in blue.
}
\bigskip
\textit{This part is specific to samples analyzed with Vidjil-algo.}
Some clones may be less trustworthy than others\dots{} Let's see how to spot them.
\question{In the clone list, look for clones with an orange warning on the
right side. Click on the warning. What are the warnings due to?}
There may be two reasons:
\begin{itemize}
\item average coverage: in that case the clonal sequence displayed is short
compared to the reads in the clone. This may be the case when sequences that are too different
have been put in the same clone. The value is generally $\geq 80\,\%$.
\item $e$-value: a statistical value computed to ensure that
recombinations have not been spotted by chance. This value is generally much
lower than 1 ($<10^{-5}$).
\end{itemize}
You can view those values for any clone by clicking the \textit{i} icon on the
right side, in the list of clones.
\subsection{Analysing recombinations from several loci}
If you want to focus on a specific locus, you can click on the locus name in
the upper left part. One click will make the locus disappear, another one will
make it appear again.
If you hold the \texttt{Shift} key (the one which is usually above the left
\texttt{Ctrl} key) while clicking, all the loci but the one you
clicked on will be hidden.
\question{Click on \com{IGH}, while holding the \texttt{Shift} key. Now what is the
number of analyzed reads? Why did it change?}
\question{Now click on \com{TRG}, to filter it in again.}
\question{Press the \texttt{g} key. What happens? Now, press the
\texttt{h} key. Press \texttt{g} again (you can do that anytime you
like :)). Let's stick to the TRG locus.}
You can also change the current locus by clicking on the locus name in the
right part of the grid.
\subsection{Clone quantification (using spike-ins)}
Sometimes you may include spike-ins in your sample to allow a more reliable
quantification.
Let us assume that the main clone with IGHV3-9 / IGHJ5 is a spike-in whose
expected concentration is 1\% (.01).
\question{First let's color this clone with the \com{standard} tag.}
\question{\new Now we will set its concentration to .01 as expected. Click again on
the star. In the \com{normalize to} field enter \com{.01} and click \com{ok}.
Now, in the graph, this clone should correspond to a straight line at 1\%.}
\question{\new Notice how the concentrations of the other clones have changed
accordingly.
You can go to the \com{settings} menu to disable this normalization and to
go back to the raw concentrations.}
Then you can set expected concentrations for other clones and you are free to
switch between those normalizations.
It is also possible to set up normalization against external data;
contact us if you are interested.
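
To illustrate what the spike-in normalization described in this section amounts to, here is a sketch in our own notation, assuming a simple proportional rescaling within each sample (the exact computation is not detailed in this tutorial). If the spike-in clone has a raw concentration $s_{\mathrm{ref}}$ in a sample and an expected concentration $e$, a clone of raw concentration $s$ in that sample would be displayed at
\[
s' \;=\; s \times \frac{e}{s_{\mathrm{ref}}} .
\]
For instance, with hypothetical values $e = 0.01$ and $s_{\mathrm{ref}} = 0.02$, a clone at $s = 0.10$ would be shown at $s' = 0.10 \times 0.01 / 0.02 = 0.05$, and the spike-in itself at exactly $0.01$, which is the straight line at 1\% seen in the graph.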
\section{Working with external software and exporting data}
\subsection{Checking VDJ designations with other software}
For some studies, VDJ designations are very important.
In the list and in the sequence panel, those designations are written in their
short form.
\question{Put the mouse cursor over a clone. In the status bar (between the
grid and the sequence panel), the complete designation appears.}
We can double check this designation with other popular software.
\question{Select a few clones.}
\marginpar{This requires an internet connection.}
\question{Click on the down triangle to the right of \com{IMGT/V-QUEST}. The
clone sequences are sent to IMGT/V-QUEST.}
\question{Then tick the checkbox 5'V/D/3'J. In the sequence panel the boundaries of
the V(D)J genes as computed by IMGT/V-QUEST are underlined.}
Note that data returned by IMGT/V-QUEST is available by clicking on the \textit{i} icon of analyzed clones,
enabling you to compare the annotations made by the original software and by IMGT/V-QUEST.
\question{You can also directly send the sequences to IMGT/V-QUEST or IgBlast
by clicking the corresponding buttons. This opens a new page with the
corresponding websites.}
\bigskip
The software may make a mistake in the VDJ designation.
In such a case, you are very welcome to report the problem to us
and we will try to improve the designation algorithm.
\question{Go to the \com{Help} menu and click on \com{get
support}. It opens your mail client with a pre-composed email
describing the data you are working on as well as the clones you selected.}
Even if you do not use the \com{get support} button, it is good practice
to send the complete address shown in your web browser, such
as \url{http://app.vidjil.org/?set=3241&config=39&plot=v,size,bar},
when you want to discuss your data or your analyses with colleagues or with us.
\bigskip
Suppose that you would like to change the VDJ designation shown on the web application.
\question{Click on the \textit{i} icon in the list of clones for the clone whose
designation you want to change. In the segmentation part, click the edit
button. Choose what you would like to modify.}
Beware: the modifications you make (name changes, clusters, clone
tagging, sample reordering\dots) will \textbf{not} be automatically saved. You have to save
your changes yourself, either by clicking on \com{save patient} in the top left menu (where the
``patient'' name is written) or by using the \texttt{Ctrl+S} keyboard
shortcut.
For this demonstration data, you cannot save your changes as you do not have
the rights to modify this patient.
% TODO: create a should-vdj automatically!
\subsection{Exporting data}
\question{In the export menu, generate printable reports by clicking on both entries starting with \com{export
report}. What differs between the two?}
\question{Select some clones and then, in the export menu, choose \com{export
fasta}. What happens?}
\question{Open the \com{import/export} menu, and click on \com{export csv}.
The resulting file describes all visible clones (V(D)J designation, size for each sample).
It can be opened by any spreadsheet software such as LibreOffice Calc or Excel for further analysis.}
\question{Open the \com{import/export} menu again, and click on the
\com{export bottom graph} button.
This exports the current view of the plot.}
\question{\new Select some clones and align them. The alignment can be
exported with the \com{export aligned fasta} button in the
\com{import/export} menu.}
\section{Assessing the quality of the run and of the analysis}
The Vidjil web application allows you to run several ``RepSeq'' (immune repertoire analysis) algorithms.
Each RepSeq algorithm has its own definition of what a clone is (or, more precisely,
a clonotype), of how to output its sequence, and of how to assign a V(D)J designation.
The number of analyzed reads will depend on the algorithm used.
This sample has been processed using the Vidjil algorithm.
\marginpar{The percentage of analyzed reads can range from .01\,\% (for
RNA-Seq or capture data) to 98-99\,\% (for very high-quality runs mostly on
Illumina).}
\question{How many reads have been analyzed in the current sample with the embedded algorithm?}
Now we will try to assess the reason why some reads were not analyzed in our
sample.
This might reflect a problem during the sequencing protocol\dots or that could
be normal.
To do so, you will need to display the information box by clicking on the
\textit{i} in the upper left part.
\question{What are the average read lengths on IGH? and on TRG?}
The lines starting with \texttt{UNSEG} display the reasons why some reads have
not been analyzed.
You can see what those reasons mean in the online documentation of the
algorithm: \href{http://www.vidjil.org/doc/vidjil-algo\#unsegmentation-causes}{vidjil.org/doc/vidjil-algo\#unsegmentation-causes}.
\question{What are the main reasons why reads have not been
analyzed? Also have a look at the average read lengths for these causes. Do
you notice anything about them?}
\section{Dealing with samples and patients}
We will see how to make the best use of the patient and sample database and
how to use it efficiently.
For this, you need an account with the rights to create new patients,
runs, sets, to upload data and, preferably, to run analyses.
Therefore the demo account is not suitable.
\question{ Retrieve the toy dataset at
\href{http://vidjil.org/seqs/tutorial_dataset.zip}{vidjil.org/seqs/tutorial\_dataset.zip}
and extract the files from the archive.}
You should now have three files. We will imagine that those three files are
the results from a single sequencing run. More precisely, each one corresponds to
a single patient. Thus we now want to upload those files and assign all of
them to the same \com{run} and each of them to a single \com{patient}.
\question{
Go to the main page of the Vidjil platform (by default
\href{https://app.vidjil.org}{app.vidjil.org}).
You should be on the \com{patients} page.
Go to the bottom of the page and click on \com{+ new patients} to create the
three patients.
}
Note that you should usually check whether the patient has
already been created by searching for her/his name in the search box at the
upper left corner.
\question{
You are now on the creation page for patients, runs, and sets.
You can create as many patients, runs and sets as you want.
\marginpar{Patients, runs and sets are just different ways to
group samples.
The names just add some semantics so that you
know that your patients will be on the patient page, your runs on the run
page and your other sets (thus any set of samples you want to make) on the
set page.}
Here we already have a line to create one patient.
We want to create two additional patients and one run.
Thus click twice on \com{add patient} and once on \com{add run}.
}
Now you should have three lines with Patient 1, Patient 2, Patient 3 and one
line with Run 1.
If you created too many lines, you can remove some by clicking on the cross on
the right-hand side.
\question{
For instance click on the cross corresponding to Patient 3.
The line has now been removed.
Click again on \com{add patient} so that the line appears again (it is now
called Patient 4).
}
\question{ Now you can fill in the mandatory fields (outlined in red) and,
optionally, the other fields.}
The last field, called
\com{patient/run information (\#tags can be used)}, is optional but very important.
Here you can enter any information relevant to this set of samples.
More specifically you can enter tags (starting with a \#) that will allow
you to search very easily and quickly all the patients/runs/sets sharing
this tag.
By default when you enter a \# in this field, some tags appear and the
suggestions are updated while you enter other characters.
Note that a tag cannot contain any space.
Also note that you can create other tags just by entering whatever you would
like in the field, preceded by a \#. Any tag you enter is saved (and
can be suggested later on).
\question{For patient 1 in this last field, enter \texttt{\#diagnosis of
patient with \#B-ALL}. For patient 2, enter \texttt{\#blood sample \#CLL}.
For patient 4, enter \texttt{bone \#marrow \#B-ALL}}
Now the three patients and the run have been created but we have not uploaded
the sequence files yet.
\question{Now go to the \com{runs} page. You should see the run you have just
created. Click on it. Then click on \com{+ add samples}.}
Similarly to the patient/run creation page, we can add as many samples as we
want on this page.
\question{As we need to upload three samples, click twice on the \com{add
other sample} button so that you have three lines to add a sample.}
\question{For sample 1, choose the file corresponding to patient 1 (and
respectively for patient 2 and 3). You can also add extra information, with
tags, as previously.}
Note the \com{common sets} field. This field means that all the samples will
be added to this run (the one you created). If you would like to add \textbf{all}
the samples to another patient/run/set, you should specify it here.
In our case we want to add each sample to a different patient. Thus we don't
need to modify this field.
\question{Instead we need to modify the last field on each line. Click on
it. A list should appear with the last patients/runs/sets you created.
Either click on the correct patient or type the first letters of her/his
name. Then validate with \com{Enter} or by clicking on the correct entry.}
\question{When you have associated each sample with its corresponding patient,
you can upload the samples by clicking on the \com{Submit samples} button.}
Now you are back on the page of the run where you should see the three samples
that are being uploaded.
\question{When the upload is finished, you can launch the analysis by selecting the
configuration in the drop-down list on the right (\com{multi+inc+xxx}) and then
clicking on the gearwheel.}
You can have a coffee, a tea, or something else while the process is
running.
\question{To regularly check the status of your job, you can click
on the \com{reload} button at the bottom left of the page.
Your process usually goes through the following stages: \com{QUEUED},
(possibly \com{STOPPED}), \com{ASSIGNED}, \com{RUNNING}, \com{COMPLETED} (or
\com{FAILED} when there is an issue; in such a case, please contact us).}
You could then view the results as explained before.
Instead, we will remain on the server.
\question{Now go back to the \com{patients} page. You can filter the page
using the tags you entered previously.
Enter \texttt{\#B-ALL} in the search box (notice the autocompletion that helps you)
and validate with \com{Enter}.}
\section{Tracking clones on several samples}
\label{sec:tracking}
%Load now some data with several samples.
The \textit{time graph} shows the evolution of the top clones of each sample across all the samples.
Bear in mind that, to ensure readability, at most 50 curves are displayed in this graph.
\marginpar{When loading data with only one sample, the time graph is replaced by a second bar/grid plot.}
\question{Move the mouse over the bubbles in the grid or over the lines in the time graph.
Click on some clone. What happens?}
\question{Click on the label of the time graph to select another sample.
What happens to the number of analyzed reads? To the size of the top clones?}
When switching the time point, the views update dynamically, which makes it
easy to track the changes over time. Also note that the number of analyzed
reads differs from the previous point. We can again analyse the reason why some
reads were unsegmented.
\bigskip
We will now look at how the V gene distribution evolves over time.
\question{In the grid, select the preset \com{V distribution}. Then click
on the \com{play} icon in the upper left part (below the \textit{i} icon).}
By doing so you can look at how the V distribution changes over time.
Of course, you can also change the data displayed in the grid to look at
the evolution of other information.
\bigskip
Recall that by default at most 50 clones are displayed
on the time graph. However, the rest of the application usually displays
the 50 \textit{most abundant clones} in each sample (which can amount to hundreds of
clones when there are several samples).
\question{Select a sample, order the list by size, and move the mouse over the list
of top 50 clones. What happens in the graph when hovering over clones that are not in the top 50?}
\bigskip
If you have many samples, you may wish to reorder the samples.
\question{Drag the label of one sample to reorder the samples.}
\question{Drag one label to the box with the pin icon to hide this sample.}
\bigskip
You may also want to compare two samples, either to check a replicate, to check for possible contaminations, or to
compare different research or medical situations.
\question{In the \com{color by} menu, choose \com{by abundance}. Select a different
sample. What happens? Are there some clones with a significantly different concentration in the two samples?
Revert the color by choosing \com{by tag}.}
Another option is to directly plot a log-log curve comparing two samples.
\question{In the \com{plot} menu, choose the preset \com{compare two samples}. Click
successively on two labels in the time graph to select the samples to be compared.
Are there again some clones with a significantly different concentration in the two samples?}