#+TITLE: Vidjil Algorithm -- Command-line Manual
#+AUTHOR: The Vidjil team (Mathieu, Mikaël and Marc)

# Vidjil -- High-throughput Analysis of V(D)J Immune Repertoire -- [[http://www.vidjil.org]]
# Copyright (C) 2011, 2012, 2013, 2014, 2015 by Bonsai bioinformatics 
# at CRIStAL (UMR CNRS 9189, Université Lille) and Inria Lille
# contact@vidjil.org

V(D)J recombinations in lymphocytes are essential for immunological
diversity. They are also useful markers of pathologies, and in
leukemia, are used to quantify the minimal residual disease during
patient follow-up.

Vidjil processes high-throughput sequencing data to extract V(D)J
junctions and gather them into clones. Vidjil starts
from a set of reads and detects "windows" overlapping the actual CDR3.
This is based on a fast and reliable seed-based heuristic and makes it
possible to output all sequenced clones. The analysis is extremely fast
because, in the first phase, no alignment is performed with database
germline sequences. At the end, only the representative sequences
of each clone have to be analyzed. Vidjil can also cluster similar
clones, or leave this to the user after a manual review in the browser.

The method is described in the following paper:

Mathieu Giraud, Mikaël Salson, et al.,
"Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing",
BMC Genomics 2014, 15:409
http://dx.doi.org/10.1186/1471-2164-15-409

Vidjil is open-source, released under the GNU GPLv3 license.

* Supported platforms

Vidjil has been successfully tested on the following platforms:
 - CentOS 6.3 amd64
 - CentOS 6.3 i386
 - Debian Squeeze 
 - Fedora 17
 - FreeBSD 9.1 amd64
 - NetBSD 6.0.1 amd64
 - Ubuntu 12.04 amd64
 - Ubuntu 12.04 i386

Vidjil is developed with continuous integration using systematic unit and functional testing.
The results of these automated tests can be checked on [[https://travis-ci.org/vidjil/vidjil][travis-ci.org]].

* Requirements
  
  To install and use Vidjil on a computer, make sure that you have:
  - a POSIX system;
  - a C++ compiler;
  - =zlib= installed (=zlib1g-dev= package under Debian/Ubuntu,
    =zlib-devel= package under Fedora/CentOS).

* Installation

#+BEGIN_SRC sh
make data
   # get some IGH rearrangements from a single individual, as described in:
   # Boyd, S. D., et al. Individual variation in the germline Ig gene
   # repertoire inferred from variable region gene rearrangements. J
   # Immunol, 184(12), 6986–92.

make germline
   # get IMGT germline databases (IMGT/GENE-DB) -- you have to agree to IMGT license:
   # academic research only, provided that it is referred to IMGT®,
   # and cited as "IMGT®, the international ImMunoGeneTics information system®
   # http://www.imgt.org (founder and director: Marie-Paule Lefranc, Montpellier, France).
   # Lefranc, M.-P., IMGT®, the international ImMunoGeneTics database,
   # Nucl. Acids Res., 29, 207-209 (2001). PMID: 11125093

make                     # compile Vidjil
make test                # run self-tests

./vidjil -h              # display help/usage
#+END_SRC


* Input and output files

The main input file of Vidjil is a /set of reads/, given as a =.fasta=
or =.fastq= file, possibly compressed with gzip (=.gz=).
This set of reads can reach several gigabytes. It is
never loaded entirely into memory: reads are processed one by
one by the Vidjil algorithm.
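
The one-read-at-a-time processing can be illustrated with a short Python
sketch. This is not Vidjil's own code: the names =iter_fasta= and
=open_reads= are hypothetical, and only FASTA input (plain or gzipped) is
handled here. The point is only that a generator keeps a single read in
memory at any time.

#+BEGIN_SRC python
import gzip
import io

def open_reads(path):
    """Open a plain or gzip-compressed read file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

def iter_fasta(handle):
    """Yield (header, sequence) pairs one at a time from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Usage on an in-memory stream (a real file would go through open_reads)
for name, sequence in iter_fasta(io.StringIO(">r1\nACGT\nAC\n>r2\nTT\n")):
    print(name, len(sequence))
#+END_SRC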

The main outputs of Vidjil (with the default =-c clones= command) are the following two files:

 - The =.vidjil= file is /the file for the Vidjil browser/. 
   The file is in a =.json= format (detailed in [[file:format-analysis.org][format-analysis.org]])
   describing the windows and their count, the representatives (=-y=),
   the detailed segmentation (=-z=, see warning below), and possibly
   the results of the further clustering.

   The browser takes this =.vidjil= file (possibly merged with
   =fuse.py=) for the /visualization and analysis/ of clones and their
   tracking along different samples (for example time points in an MRD
   setup or in an immunological study).
   Please see [[file:browser.org][browser.org]] for more information on the browser.

 - The =.vdj.fa= file is /a FASTA file for further processing by other bioinformatics tools/.
   The sequences are at least the windows or
   the representatives (=-y=) when they have been computed.
   The headers include the count of each window, and also include the
   detailed segmentation (=-z=, see warning below), given in a =.vdj= format (see below).
   The further clustering is not output in this file.

   The =.vdj.fa= output makes it possible to use Vidjil as a /filtering tool/,
   shrinking a large read set into a manageable number of (pre-)clones
   that will be deeply analyzed and possibly further clustered by
   other software.


The default options are very conservative (large window, no further
automatic clusterization, see below), leaving it to the user or other
software to make detailed analyses and decisions on the final
clustering.

By default, the two output files are named =out/basename.vidjil= and =out/basename.vdj.fa=, where:
 - =out= is the directory where all the outputs are stored, including auxiliary output files (can be changed with the =-o= option)
 - =basename= is the basename of the input =.fasta/.fastq= file (can be overridden with the =-b= option)
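
The naming rule can be sketched in Python as follows. The function
=default_outputs= is hypothetical, and the handling of a trailing =.gz=
(dropped before the =.fasta/.fastq= extension) is an assumption, not
something stated by Vidjil.

#+BEGIN_SRC python
import posixpath

def default_outputs(reads_path, out_dir="out", basename=None):
    """Return the (.vidjil, .vdj.fa) output paths for an input read file.

    out_dir mirrors the -o option and basename mirrors the -b option.
    """
    if basename is None:
        name = posixpath.basename(reads_path)
        if name.endswith(".gz"):      # assumption: the .gz suffix is dropped first
            name = name[:-3]
        basename = posixpath.splitext(name)[0]
    return (posixpath.join(out_dir, basename + ".vidjil"),
            posixpath.join(out_dir, basename + ".vdj.fa"))

print(default_outputs("data/Stanford_S22.fasta"))
#+END_SRC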


* Vidjil parameters

Launching Vidjil with the =-h= option displays the list of parameters that can be
used. We detail here the options of the main =-c clones= command.

** Germline selection

#+BEGIN_EXAMPLE
Germline databases (one -V/(-D)/-J, or -G, or -g option must be given for all commands except -c germlines)
  -V <file>     V germline multi-fasta file
  -D <file>     D germline multi-fasta file (and resets -m, -M and -w options), will segment into V(D)J components
  -J <file>     J germline multi-fasta file
  -G <prefix>   prefix for V (D) and J repertoires (shortcut for -V <prefix>V.fa -D <prefix>D.fa -J <prefix>J.fa) (basename gives germline code)
  -g <path>     multiple germlines (in the path <path>, takes TRA, TRB, TRG, TRD, IGH, IGK and IGL and sets window prediction parameters)
  -i            multiple germlines, also incomplete rearrangements (must be used with -g)
#+END_EXAMPLE

 - Options such as =-G germline/IGH= or =-G germline/TRG= select one germline system.
 - The =-V/(-D)/-J= options make it possible to select individual V, (D) and J repertoires (fasta files).
   In particular, this allows selecting incomplete rearrangements using custom V or J repertoires with added sequences.
 - The =-g germline/= option launches the analysis on the seven germlines, selecting the best locus for each read.
   Using =-g germline/ -i= also tests some incomplete and unusual recombinations.
   See [[http://git.vidjil.org/blob/master/doc/locus.org][locus.org]] for information on the analyzable loci.
   Currently, the seed and window parameters are hard-coded for each germline. In a future release, the mechanism will be more flexible
   and will parse the =germline/germlines.data= file.
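
As an illustration of the =-G= shortcut described in the help text, the
following Python sketch expands a prefix into the three repertoire files
and derives the germline code from the basename of the prefix. The
function name =expand_germline_prefix= is hypothetical; this is not
Vidjil's own code.

#+BEGIN_SRC python
import posixpath

def expand_germline_prefix(prefix):
    """Expand '-G <prefix>' into '-V <prefix>V.fa -D <prefix>D.fa -J <prefix>J.fa'."""
    files = {segment: prefix + segment + ".fa" for segment in "VDJ"}
    code = posixpath.basename(prefix)   # the basename gives the germline code
    return code, files

code, files = expand_germline_prefix("germline/IGH")
print(code, files["V"], files["D"], files["J"])
#+END_SRC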

** Main algorithm parameters

#+BEGIN_EXAMPLE
Window prediction
  (use either -s or -k option, but not both)
  -s <string>   spaced seed used for the V/J affectation
                (default: #####-#####, ######-######, #######-#######, depends on germline)
  -k <int>      k-mer size used for the V/J affectation (default: 10, 12, 13, depends on germline)
                (using -k option is equivalent to set with -s a contiguous seed with only '#' characters)
  -m <int>      minimal admissible delta between last V and first J k-mer (default: -10) (default with -D: 0)
  -M <int>      maximal admissible delta between last V and first J k-mer (default: 20) (default with -D: 80)
  -w <int>      w-mer size used for the length of the extracted window (default: 50)
  -e <float>    maximal e-value for determining if a segmentation can be trusted (default: 'all', no limit)
  -t <int>      trim V and J genes (resp. 5' and 3' regions) to keep at most <int> nt (default: 0) (0: no trim)
#+END_EXAMPLE

The =-s=, =-k=, =-m= and =-M= options are the options of the seed-based heuristic. A detailed
explanation can be found in the paper. These options are for advanced usage; the default values should work.
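
To make the seed notation concrete, here is a minimal Python sketch of
how a spaced seed such as =#####-#####= can be read: =#= positions are
kept and =-= positions are ignored, so =-k 10= corresponds to a seed of
ten =#= characters. The function names are hypothetical, the inclusive
bounds in =delta_is_admissible= are an assumption, and the actual Vidjil
index is described in the paper, not here.

#+BEGIN_SRC python
def seeded_kmers(sequence, seed):
    """Yield, for each offset, the letters found at the '#' positions of the seed."""
    kept = [i for i, c in enumerate(seed) if c == "#"]
    for start in range(len(sequence) - len(seed) + 1):
        yield "".join(sequence[start + i] for i in kept)

def contiguous_seed(k):
    """The -k <int> option is equivalent to a seed made of k '#' characters."""
    return "#" * k

def delta_is_admissible(delta, m=-10, M=20):
    """Check the -m/-M bounds on the delta between last V and first J k-mer."""
    return m <= delta <= M

print(list(seeded_kmers("ACGTA", "##-##")))
print(list(seeded_kmers("ACGT", contiguous_seed(2))))
#+END_SRC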

The =-w= option sets the size of the "window" that is the main
identifier to gather clones. The default value (=-w 50=) was selected
to ensure a high-quality clone gathering. The
high-throughput heuristic predicts the center of the "window" that may
be shifted by a few bases from the actual "center" of the CDR3 (for TRG,
less than 15 bases compared to the IMGT/V-QUEST or IgBlast prediction
in >99% of cases). The extracted window should be large enough to
fully contain the CDR3 as well as some part of the end of the V and
the start of the J, or at least some specific N region, to uniquely identify a clone.

Setting =-w= to lower values may "segment" (analyze) a few more reads, depending
on the read length of your data, but may in some rare cases falsely cluster reads from
different clones. The =-w 40= option is usually safe, and =-w 30= can also be tested.
Values below 30 are not recommended.
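
The link between window size and read length can be seen in a small
Python sketch. It assumes that the window is simply centered on the
predicted position and that a read is left unsegmented when the window
does not fit; both are simplifying assumptions, and =extract_window= is
a hypothetical name, not a Vidjil function.

#+BEGIN_SRC python
def extract_window(read, center, w=50):
    """Return the w-long window centered on `center`, or None if it does not fit."""
    start = center - w // 2
    end = start + w
    if start < 0 or end > len(read):
        return None
    return read[start:end]

read = "A" * 30 + "C" * 40 + "G" * 10      # an 80-base toy read
print(extract_window(read, 60, w=50))       # window runs past the end of the read
print(len(extract_window(read, 60, w=40)))  # a shorter window still fits
#+END_SRC

With a predicted center close to the end of the read, the 50-base window
cannot be extracted while the 40-base one can, which is why smaller
values of =-w= may analyze a few more reads.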

The =-e= option sets the maximal e-value accepted for segmenting a sequence.
It is an upper bound on the number of expected windows found by chance by the seed-based heuristic.
The e-value computation takes into account both the number of reads in the
input and the number of loci searched for.
The default value is 1.0, but values such as 1000, 1e-3 or even less can be used
to have a more or less permissive segmentation.
The threshold can be disabled with =-e all=.
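
The way such a threshold is applied can be sketched in Python. The
function names are hypothetical, the e-value itself is taken as given
(its computation is not shown), and treating the bound as inclusive is
an assumption.

#+BEGIN_SRC python
def parse_evalue_threshold(text):
    """Parse a -e argument: 'all' disables the threshold, otherwise a positive float."""
    if text == "all":
        return None
    value = float(text)
    if value <= 0:
        raise ValueError("the e-value threshold must be positive")
    return value

def is_trusted(evalue, threshold):
    """A segmentation is kept when no threshold is set or when its e-value is low enough."""
    return threshold is None or evalue <= threshold

for option in ("1e-3", "1.0", "1000", "all"):
    print(option, is_trusted(0.5, parse_evalue_threshold(option)))
#+END_SRC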

The =-t= option sets the maximal number of nucleotides that will be indexed in
V genes (the 3' end) or in J genes (the 5' end). This reduces the load of the
indexes, giving more precise window estimation and e-value computation.
This option is currently not set by default; it will be in a future release.
Using =-t 100= is generally safe.
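
In other words, trimming keeps the part of each gene that lies next to
the junction. A minimal Python sketch, with hypothetical function names
and germline sequences given as plain strings:

#+BEGIN_SRC python
def trim_v(v_gene, t):
    """Keep at most t nucleotides at the 3' end of a V gene (t == 0: no trim)."""
    return v_gene if t == 0 else v_gene[-t:]

def trim_j(j_gene, t):
    """Keep at most t nucleotides at the 5' end of a J gene (t == 0: no trim)."""
    return j_gene if t == 0 else j_gene[:t]

print(trim_v("AAACCCGGG", 3), trim_j("AAACCCGGG", 3))
#+END_SRC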

** Threshold on clone output

The following options control how many clones are output and analyzed.

#+BEGIN_EXAMPLE
Limits to report a clone (or a window)
  -r <nb>       minimal number of reads supporting a clone (default: 10)
  -% <ratio>    minimal percentage of reads supporting a clone (default: 0)

Limits to further analyze some clones
  -y <nb>       maximal number of clones computed with a representative ('all': no limit) (default: 100)
  -z <nb>       maximal number of clones to be segmented ('all': no limit, do not use) (default: 20)
  -A            reports and segments all clones (-r 1 -% 0 -y all -z all), to be used only on very small datasets
#+END_EXAMPLE

The =-r/-%= options are strong thresholds: if a clone does not have
the requested number of reads, the clone is discarded (except when
using =-l=, see below).
The default =-r 10= option is meant to only output clones that
have a significant read support. *You should use* =-r 1= *if you
want to detect all clones starting from the first read* (especially for
MRD detection).
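
The effect of these thresholds can be sketched in Python on a toy table
of window counts. The function =reported_windows= is hypothetical, and
computing the percentage over the sum of all counts passed in is an
assumption about the denominator.

#+BEGIN_SRC python
def reported_windows(counts, min_reads=10, min_percent=0.0, labeled=()):
    """Return the windows kept by -r/-% (labeled windows are always kept).

    counts maps each window to its number of reads; the result is sorted
    by decreasing count.
    """
    total = sum(counts.values())
    kept = []
    for window, n in counts.items():
        enough_reads = n >= min_reads
        enough_percent = total > 0 and 100.0 * n / total >= min_percent
        if (enough_reads and enough_percent) or window in labeled:
            kept.append(window)
    return sorted(kept, key=lambda w: (-counts[w], w))

counts = {"AAA": 50, "CCC": 9, "GGG": 1}
print(reported_windows(counts))                # default -r 10
print(reported_windows(counts, min_reads=1))   # -r 1 keeps every window
#+END_SRC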

The =-y= option limits the number of clones for which a representative
sequence is computed. Usually you do not need to have more
representatives (see below), but you can safely put =-y all= if you want
to compute all representative sequences.

The =-z= option limits the number of clones that are fully analyzed,
/with their V(D)J segmentation/, in particular to enable the browser
to display the clones on the grid (otherwise they are displayed on the
'?/?' axis). It should be smaller than =-y=.
If you want to analyze more clones, you should use =-z 50= or
=-z 100=.  It is not recommended to use larger values: outputting more
than 100 clones is often not useful since they can't be visualized easily
in the browser, and it takes a long computation time (full dynamic programming,
see below).

Note that even if a clone is not in the top 20 (or 50, or 100) but
still passes the =-r=, =-%= options, it is still reported in both the =.vidjil=
and =.vdj.fa= files. If the clone is at some MRD point in the top 20 (or 50, or 100),
it will be fully analyzed/segmented at this other point (and then
collected by the =fuse.py= script, using representatives computed at this
other point, and then, on the browser, correctly displayed on the grid).
*Thus it is advised to leave the default* =-y 100 -z 20= *options
for the majority of uses.*
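
The interplay of =-y= and =-z= can be summarized by a short Python
sketch: every reported clone stays in the output, but only the most
abundant ones get a representative and a full segmentation. The function
=analysis_plan= is hypothetical and only mirrors the limits described
above.

#+BEGIN_SRC python
def analysis_plan(windows_by_count, y=100, z=20):
    """Select which clones get a representative (-y) and a segmentation (-z).

    windows_by_count is sorted by decreasing read count; y and z are
    either an int or the string 'all'.
    """
    def limit(n):
        return len(windows_by_count) if n == "all" else min(n, len(windows_by_count))
    return {
        "reported": list(windows_by_count),
        "representative": windows_by_count[:limit(y)],
        "segmented": windows_by_count[:limit(z)],
    }

windows = ["w%d" % i for i in range(150)]
plan = analysis_plan(windows)
print(len(plan["reported"]), len(plan["representative"]), len(plan["segmented"]))
#+END_SRC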

The =-A= option disables all these thresholds. This option should be
used only for test and debug purposes, on very small datasets, as it
produces large files and takes a huge computation time.


** Labeled windows

Vidjil allows you to indicate specific windows that must be followed
(even if those windows are 'rare', below the =-r/-%= thresholds).

Such windows can be provided either with =-W <window>=, or with =-l <file>=.
The file given by =-l= should have one window per line, as in the following example:

#+BEGIN_EXAMPLE
TGTGCGAGAGATGGACGGGATACGTAAAACGACATATGGTTCGGGGTTTGGTGCTTTTGA my-clone-1
TGTGCGAGAGATGGACGGAATACGTTAAACGACATATGGTTCGGGGTATGGTGCTTTTGA my-clone-2 foo
#+END_EXAMPLE

Windows and labels must be separated by one space.
The first column of the file is the window to be followed
while the remaining columns consist of the window's label.
In Vidjil output, the labels are output alongside their windows.
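
A file in this format can be read with a few lines of Python. The
function =parse_labels= is a hypothetical helper, shown only to make the
column rule explicit: everything after the first space is the label.

#+BEGIN_SRC python
def parse_labels(lines):
    """Map each window (first column) to its label (the remaining columns)."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        window, _, label = line.partition(" ")
        labels[window] = label
    return labels

example = [
    "TGTGCGAGAGATGGACGGGATACGTAAAACGACATATGGTTCGGGGTTTGGTGCTTTTGA my-clone-1",
    "TGTGCGAGAGATGGACGGAATACGTTAAACGACATATGGTTCGGGGTATGGTGCTTTTGA my-clone-2 foo",
]
for window, label in parse_labels(example).items():
    print(label, len(window))
#+END_SRC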

With the =-F= option, /only/ the labeled windows are kept. This makes it
possible to quickly filter a set of reads, looking for a known window,
with the =-FaW <window>= options:
all the reads with this window will be extracted to =out/seq/clone.fa-1=.

** Further clustering (experimental)

These options have no consequences on the =.vdj.fa= file, but add
additional information in the =.vidjil= file to be visualized in the
browser.

Setting the =-n= option triggers an additional automatic
clustering using the DBSCAN algorithm (Ester et al., 1996).

The =-E= option allows you to specify a file for manually clustering two windows
considered as similar. Such a file may be automatically produced by Vidjil
(=out/edges=), depending on the options provided. Only the first two columns
(separated by one space) are important to Vidjil: they consist of the
two windows that must be clustered.
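
As an illustration, the following Python sketch reads such pairs and
groups the windows with a small union-find structure. Treating the pairs
transitively (A with B and B with C gives one cluster) is an assumption
about how the edges are combined, and =clusters_from_edges= is a
hypothetical name.

#+BEGIN_SRC python
def clusters_from_edges(lines):
    """Group windows linked by the first two columns of each line."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for line in lines:
        fields = line.strip().split(" ")
        if len(fields) < 2:
            continue                        # not an edge line
        root_a, root_b = find(fields[0]), find(fields[1])
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for window in list(parent):
        groups.setdefault(find(window), set()).add(window)
    return sorted(sorted(group) for group in groups.values())

print(clusters_from_edges(["AAA CCC 1", "CCC GGG 2", "TTT TTA"]))
#+END_SRC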




* Examples of use

All the following examples are on IGH VDJ recombinations: they thus
require either the =-G germline/IGH= option, or the multi-germline =-g germline= option.

#+BEGIN_SRC sh
./vidjil -G germline/IGH data/Stanford_S22.fasta
   # Detects windows overlapping IGH CDR3s and gathers the reads into clones
   # Summary of clones is available both in out/Stanford_S22.vdj.fa
   # and in out/Stanford_S22.vidjil.
#+END_SRC

#+BEGIN_EXAMPLE
>8--window--1 
CACCTATTACTGTACCCGGGAGGAACAATATAGCAGCTGGTACTTTGACTTCTGGGGCCA
>5--window--2 
CTATGATAGTAGTGGTTATTACGGGGTAGGGCAGTACTACTACTACTACATGGACGTCTG
(...)
#+END_EXAMPLE

   Windows of size 60 (modifiable by =-w=) have been extracted.
   The first window has 8 occurrences, the second window has 5 occurrences.
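
Headers of the form shown in this example can be read back with a short
Python sketch. The regular expression only covers the
=>count--window--index= pattern of the excerpt above; other header
layouts are not handled, and =parse_window_fasta= is a hypothetical
helper.

#+BEGIN_SRC python
import re

HEADER = re.compile(r"^>(\d+)--window--(\d+)$")

def parse_window_fasta(lines):
    """Return (index, count, sequence) tuples from '>count--window--index' records."""
    records, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            match = HEADER.match(line)
            current = [int(match.group(2)), int(match.group(1)), ""] if match else None
            if current is not None:
                records.append(current)
        elif line and current is not None:
            current[2] += line
    return [tuple(record) for record in records]

example = [
    ">8--window--1 ",
    "CACCTATTACTGTACCCGGGAGGAACAATATAGCAGCTGGTACTTTGACTTCTGGGGCCA",
    ">5--window--2 ",
    "CTATGATAGTAGTGGTTATTACGGGGTAGGGCAGTACTACTACTACTACATGGACGTCTG",
]
for index, count, sequence in parse_window_fasta(example):
    print(index, count, len(sequence))
#+END_SRC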


#+BEGIN_SRC sh
./vidjil -g germline -i data/reads.fasta
   # Detects for each read the best locus
   # Detects windows overlapping CDR3s and gathers the reads into clones
#+END_SRC


#+BEGIN_SRC sh
./vidjil -c clones -G germline/IGH -r 1 ./data/clones_simul.fa
   # Extracts the windows (-r 1, with at least 1 read each),
   # then gathers them into clones
   # A more natural option could be -r 5.
   # For debug purpose, if one wants all the clones, use the option -A.
   # Results are
   #  - on the standard output
   #  - in out/clones_simul.vdj.fa (fasta file to be processed by other tools)
   #  - in out/clones_simul.vidjil (for the browser)
   # Additional files are in out/clones_simul.windows.fa and out/seq/clone.fa-*
   # If one adds the '-U' option, an additional out/clones_simul.segmented.vdj.fa file is produced,
   # listing segmented reads using the .vdj format (see below)
#+END_SRC

#+BEGIN_SRC sh
./vidjil -c clones -G germline/IGH -r 1 -n 5 ./data/clones_simul.fa
   # Window extraction + clone gathering,
   # with automatic clustering, distance five (-n 5)
#+END_SRC

#+BEGIN_SRC sh
./vidjil -c segment -G germline/IGH data/segment_S22.fa
   # Segment the reads onto VDJ germline 
   # (this is slow and should only be used for testing)
#+END_SRC

#+BEGIN_SRC sh
./vidjil -c germlines file.fastq
   # Output statistics on the number of occurrences of k-mers of the different germlines
#+END_SRC

* Segmentation and .vdj format

Vidjil output includes segmentation of V(D)J recombinations. This happens
in the following situations:

- in a first pass, when requested with the =-U= option, in a =.segmented.vdj.fa= file.

      The goal of this ultra-fast segmentation, based on a seed
      heuristic, is only to locate the w-window overlapping the
      CDR3. This should not be taken as a real V(D)J segmentation, as
      the center of the window may be shifted up to 15 bases from the
      actual center.

- in a second pass, on the standard output and in both =.vidjil= and =.vdj.fa= files
        - at the end of the clones detection (default command =-c clones=)
        - or directly when explicitly requiring segmentation (=-c segment=)

      This segmentation is obtained by full comparison (dynamic
      programming) with all germline sequences. Such segmentations are
      not at the core of the Vidjil clone gathering method (which
      relies only on the 'window', see above). They are slow to compute
      and are provided only for convenience.
      They should be checked with other software such
      as IgBlast, iHMMune-align or IMGT/V-QUEST.

Segmentations of V(D)J recombinations are displayed using a dedicated
.vdj format. This format is compatible with the FASTA format. A line starting
with a > is of the following form:

#+BEGIN_EXAMPLE
>name + VDJ  startV endV   startD endD   startJ  endJ   Vgene   delV/N1/delD5'   Dgene   delD3'/N2/delJ   Jgene   comments

        name          sequence name (includes the number of occurrences in the read set and possibly other information)
        +             strand on which the sequence is mapped
        VDJ           type of segmentation (can be "VJ", "VDJ",
                      or shorter tags such as "V" for incomplete sequences).
                      The following lines are for "VDJ" recombinations:

        startV endV   start and end position of the V gene in the sequence (start at 0)
        startD endD                      ... of the D gene ...
        startJ endJ                      ... of the J gene ...

        Vgene         name of the V gene 

        delV          number of deletions at the end (3') of the V
        N1            nucleotide sequence inserted between the V and the D
        delD5'        number of deletions at the start (5') of the D

        Dgene         name of the D gene being rearranged

        delD3'        number of deletions at the end (3') of the D
        N2            nucleotide sequence inserted between the D and the J
        delJ          number of deletions at the start (5') of the J

        Jgene         name of the J gene being rearranged
        
        comments      optional comments. In Vidjil, the following comments are now used:
                      - "seed" when this comes from the first pass (.segmented.vdj.fa). See the warning above.
                      - "!ov x" when there is an overlap of x bases between last V seed and first J seed
                      - the name of the locus (TRA, TRB, TRG, TRD, IGH, IGL, IGK, possibly followed
                        by a + for incomplete/unusual recombinations)

#+END_EXAMPLE

Following such a line, the nucleotide sequence may be given, giving in
this case a valid FASTA file.

For VJ recombinations the output is similar, the fields that are not
applicable being removed:

#+BEGIN_EXAMPLE
>name + VJ  startV endV   startJ endJ   Vgene   delV/N1/delJ   Jgene  comments
#+END_EXAMPLE
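
The two header layouts above can be parsed with a short Python sketch.
It assumes that fields are separated by whitespace, that the sequence
name contains no whitespace, and that the deletion counts are integers;
these are assumptions, not guarantees of the format. The function
=parse_vdj_header= is hypothetical and the header in the usage example
is made up for illustration, it is not actual Vidjil output.

#+BEGIN_SRC python
def parse_vdj_header(line):
    """Parse a '>name + VDJ ...' or '>name + VJ ...' header line into a dict."""
    fields = line.strip().lstrip(">").split()
    record = {"name": fields[0], "strand": fields[1], "type": fields[2]}
    if record["type"] not in ("VDJ", "VJ"):
        return record                 # shorter tags are left unparsed in this sketch
    segments = record["type"]         # "VDJ" or "VJ"
    n = len(segments)
    positions = [int(x) for x in fields[3:3 + 2 * n]]
    rest = fields[3 + 2 * n:]         # gene, junction, gene, ..., then comments
    genes = rest[0:2 * n - 1:2]
    junctions = rest[1:2 * n - 1:2]
    for i, segment in enumerate(segments):
        record[segment] = {"gene": genes[i],
                           "start": positions[2 * i],
                           "end": positions[2 * i + 1]}
    record["junctions"] = []
    for junction in junctions:
        # e.g. delV/N1/delD5': deletions on the left gene, insertion, deletions on the right gene
        deleted_left, inserted, deleted_right = junction.split("/")
        record["junctions"].append((int(deleted_left), inserted, int(deleted_right)))
    record["comments"] = rest[2 * n - 1:]
    return record

# Made-up header, for illustration only
header = ">clone-001--42 + VDJ 0 83 88 99 104 150 IGHV3-23*01 2/ACGT/3 IGHD3-10*01 1/GG/4 IGHJ4*02 IGH"
parsed = parse_vdj_header(header)
print(parsed["V"]["gene"], parsed["D"]["gene"], parsed["J"]["gene"], parsed["junctions"])
#+END_SRC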