Commit f177f1db authored by Mathieu Giraud

doc/format-analysis.org: corrections, details

parent 569a0140
@@ -37,9 +37,10 @@ Software computing clones may choose some relevant identifiers:
This is an almost minimal =.vidjil= file, describing clones in one sample.
The =seg= element is optional: clones without =seg= elements will be shown on the grid with '?/?'.
-All other elemnts are required. The =reads.germlines= list can have only one element the case of data on a unique locus.
-There is here one clone with a segmentation =TRGV5*01 5/CC/0 TRGJ1*02=.
-Note that other elements could be added by some program (such as =tag= or =clusters=).
+All other elements are required. The =reads.germlines= list can have only one element in the case of data on a unique locus.
+Here there is one clone on the =TRG= locus, with the designation =TRGV5*01 5/CC/0 TRGJ1*02=.
+Note that other elements could be added by some programs (such as =tag=, to identify some clones,
+or =clusters=, to further cluster some clones, see below).
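A consumer of =.vidjil= files has to handle the optional =seg= element. The sketch below uses a hypothetical =gridLabel= helper, which is not taken from the Vidjil browser. It returns a V/J label for a clone and falls back to '?/?' when =seg= is missing. It assumes that the =seg= keys ="5"= and ="3"= hold the V (5') and J (3') genes, as in the =seg= example of the detailed specification. It also treats a =seg= that lacks either gene as unsegmented.

```javascript
// Hypothetical helper (not the Vidjil browser code): "V/J" label of a clone.
function gridLabel(clone) {
  const seg = clone.seg;
  // Assumption: a clone without seg, or whose seg lacks the V ("5") or
  // J ("3") gene, is treated as unsegmented and shown as '?/?'
  if (!seg || !seg["5"] || !seg["3"]) {
    return "?/?";
  }
  return seg["5"].name + "/" + seg["3"].name;
}

const clones = [
  { id: "clone1", seg: { "5": { name: "TRGV5*01" }, "3": { name: "TRGJ1*02" } } },
  { id: "clone2" } // no seg element
];

for (const clone of clones) {
  console.log(clone.id, gridLabel(clone));
}
// clone1 TRGV5*01/TRGJ1*02
// clone2 ?/?
```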
#+BEGIN_SRC js :tangle analysis-example1.vidjil
{
@@ -80,7 +81,7 @@ Note that other elements could be added by some program (such as =tag= or =clust
}
#+END_SRC
-** =.vidjil= file -- several samples
+** =.vidjil= file -- several related samples
This is a =.vidjil= file obtained by merging, with =fuse.py=, two =.vidjil= files corresponding to two samples.
Clones that have the same =id= are gathered (see 'What is a clone?', above).
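A minimal sketch can illustrate the gathering by =id=. The =gatherClones= function below is hypothetical and is not =fuse.py=. It assumes that each input file describes a single sample. It keeps the other fields from the first occurrence of each clone and records 0 reads for a clone that is absent from a sample. The read counts are made up.

```javascript
// Hypothetical sketch, NOT the fuse.py implementation: clones sharing an "id"
// across several single-sample files are gathered into one clone whose
// "reads" list has one entry per sample.
function gatherClones(files) {
  const n = files.length;
  const byId = new Map();
  files.forEach((file, sampleIndex) => {
    for (const clone of file.clones) {
      if (!byId.has(clone.id)) {
        // Keep the other fields of the first occurrence; assume 0 reads
        // in the samples where the clone does not appear
        byId.set(clone.id, { ...clone, reads: new Array(n).fill(0) });
      }
      // Assumption: each input file describes a single sample
      byId.get(clone.id).reads[sampleIndex] = clone.reads[0];
    }
  });
  return Array.from(byId.values());
}

// Made-up read counts for two samples
const diag = { clones: [{ id: "clone1", reads: [243] }, { id: "clone2", reads: [14] }] };
const fu1 = { clones: [{ id: "clone1", reads: [5] }, { id: "clone3", reads: [8] }] };
console.log(JSON.stringify(gatherClones([diag, fu1])));
// [{"id":"clone1","reads":[243,5]},{"id":"clone2","reads":[14,0]},{"id":"clone3","reads":[0,8]}]
```

A real merge also has to combine the =reads= and =samples= elements of the input files, which this sketch leaves out.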
@@ -147,8 +148,8 @@ do a correct gathering.
** =.analysis= file
-This file reflects what an user could have done with the browser (or with some other tool).
-She has manually set sample names (=names=), tagged (=tag=, =tags=) and clustered (=clusters=)
+This file reflects the annotations a user could have made within the Vidjil browser or some other tool.
+She has manually set sample names (=names=), tagged (=tag=, =tags=), named (=name=) and clustered (=clusters=)
some clones, and added external data (=data=).
#+BEGIN_SRC js :tangle analysis-example2.analysis
@@ -206,40 +207,43 @@ considered. In that case we should first consider the second point (whose =name=
is /fu1/), and the point to be considered second should be the first one in
the file (whose =name= is /diag/).
-As exemplified in the =clusters= field, this proceeds to the clustering of
-clones defined in the =.vidjil= file (here /clone2/ and /clone3/ are defined in the
-vidjil file in previous section). If clones do not exist, the clusters are
+The =clusters= field indicates clones (by their =id=) that have been further clustered.
+Usually, these clones were defined in a related =.vidjil= file (as /clone2/ and /clone3/,
+see the =.vidjil= file in the previous section). If these clones do not exist, the clusters are
just ignored. The first item of the cluster is considered as the
representative clone of the cluster.
-* The different elements
+* Detailed specification
** Generic information for traceability [required]
#+BEGIN_SRC js
-"producer": "", // arbitrary string, user/software/options producing this file [required]
-"timestamp": "", // last modification date [required]
-"vidjil_json_version": "2014.10", // version of the format [required]
+"producer": "my-repseq-software -z -k (v. 123)", // arbitrary string, user/software/version/options producing this file [required]
+"timestamp": "2014-10-01 12:00:11", // last modification date [required]
+"vidjil_json_version": "2016a", // version of the format [required]
#+END_SRC
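A program that writes these traceability fields could build them as sketched below. The =formatTimestamp= and =traceability= names are hypothetical. The =YYYY-MM-DD hh:mm:ss= layout follows the example value above. Using UTC is an assumption, since the format does not state a time zone.

```javascript
// Hypothetical helpers: "YYYY-MM-DD hh:mm:ss" timestamp, as in the example above.
// Assumption: times are written in UTC (the format does not say).
function formatTimestamp(date) {
  const pad = (x) => String(x).padStart(2, "0");
  return (
    date.getUTCFullYear() + "-" + pad(date.getUTCMonth() + 1) + "-" + pad(date.getUTCDate()) +
    " " + pad(date.getUTCHours()) + ":" + pad(date.getUTCMinutes()) + ":" + pad(date.getUTCSeconds())
  );
}

// The three required traceability fields of a .vidjil or .analysis file
function traceability(producer, date) {
  return {
    producer: producer,
    timestamp: formatTimestamp(date),
    vidjil_json_version: "2016a"
  };
}

const when = new Date(Date.UTC(2014, 9, 1, 12, 0, 11)); // months are 0-based: 9 is October
console.log(traceability("my-repseq-software -z -k (v. 123)", when).timestamp);
// 2014-10-01 12:00:11
```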
-** 'reads' element [.vidjil only, required]
+** Statistics: the =reads= element [.vidjil only, required]
The number of analyzed reads (=segmented=) may be higher than the sum of the read counts of all clones
when one chooses to report only the 'top' clones (=-t= option of =fuse.py=).
#+BEGIN_SRC js
{
-"total" : // total number of reads per sample (with samples.number elements)
-"segmented" : // number of segmented reads per sample (with samples.number elements)
-"germline" : { // number of segmented reads per sample/germline (with samples.number elements)
-"TRG" :
-"IGH" :
+"total" : [], // total number of reads per sample (with samples.number elements)
+"segmented" : [], // number of analyzed/segmented reads per sample (with samples.number elements)
+"germline" : { // number of analyzed/segmented reads per sample/germline (with samples.number elements)
+"TRG" : [],
+"IGH" : []
}
}
-#+END_SRC js
+#+END_SRC
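The note on =segmented= versus clone reads can be checked per sample. The hypothetical =unreportedReads= helper below subtracts the reads of all reported clones from =reads.segmented=. The values are made up. A positive result counts analyzed reads that fall in no reported clone, for example after keeping only the top clones.

```javascript
// Hypothetical helper: per sample, analyzed (segmented) reads that are not
// counted in any clone listed in the file.
function unreportedReads(vidjil) {
  return vidjil.reads.segmented.map((segmented, sample) => {
    let inClones = 0;
    for (const clone of vidjil.clones) {
      inClones += clone.reads[sample];
    }
    return segmented - inClones;
  });
}

// Made-up values for two samples
const example = {
  reads: { total: [1000, 800], segmented: [900, 700], germline: { TRG: [900, 700] } },
  clones: [
    { id: "clone1", reads: [500, 100] },
    { id: "clone2", reads: [200, 50] }
  ]
};
console.log(unreportedReads(example).join(" ")); // 200 550
```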
-** 'Samples' element [required]
+** =samples= element [required]
#+BEGIN_SRC js
{
@@ -253,22 +257,21 @@ representative clone of the cluster.
"order": [], // custom sample order (lexicographic order by default) [optional]
// traceability on each sample (with samples.number elements)
"producer": [],
"timestamp": [],
-"log": [],
+"log": []
}
#+END_SRC
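The sample ordering discussed for the =.analysis= file can be sketched as follows. The =orderedNames= helper is hypothetical. Two points are assumptions that the format does not state. First, =order= holds 0-based sample indices. Second, the default lexicographic order compares the sample =names=.

```javascript
// Hypothetical helper: sample names in display order.
// Assumptions: "order" lists 0-based sample indices; without "order",
// samples are sorted by a plain code-unit (lexicographic) comparison of names.
function orderedNames(samples) {
  const names = samples.names;
  let order = samples.order;
  if (!order) {
    order = names
      .map((_, i) => i)
      .sort((a, b) => (names[a] < names[b] ? -1 : names[a] > names[b] ? 1 : 0));
  }
  return order.map((i) => names[i]);
}

// With order [1, 0], the second point (fu1) is considered first, then diag
console.log(orderedNames({ names: ["diag", "fu1"], order: [1, 0] }).join(", ")); // fu1, diag
console.log(orderedNames({ names: ["fu1", "diag"] }).join(", ")); // diag, fu1
```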
-** 'Clones' list
-Each element in the 'clones' list describes properties of a clone.
+** =clones= list, with read count, tags, V(D)J designation and other sequence features
-In a .vidjil file, this is the main part, describing all clones.
-In the .analysis file, this section is intended to describe some specific clones.
+Each element in the =clones= list describes properties of a clone.
+In a =.vidjil= file, this is the main part, describing all clones.
+In the =.analysis= file, this section is intended to describe some specific clones.
#+BEGIN_SRC js
@@ -293,46 +296,36 @@ In the .analysis file, this section is intended to describe some specific clones
// this will create a normalization option in the
// settings browser menu
-"seg": // segmentation information [optional]
+"seg": // detailed V(D)J designation/segmentation and other sequence features [optional]
// in the browser, clones that are not segmented will be shown on the grid with '?/?'
// positions are related to the 'sequence'
// names of V/D/J genes should match the ones in files referenced in germline/germline.data
-// Positions must start at 1.
+// Positions on the sequence start at 1.
{
-"5": {"name": "IGHV5*01",
-"start": 0,
-"stop": 0},
-"4": {"name": "IGHD1*01",
-"start": 0,
-"stop": 0},
-"3": {"name": "IGHJ3*02",
-"start": 0,
-"stop": 0},
+"5": {"name": "IGHV5*01", "start": 1, "stop": 120}, // V (or 5') segment
+"4": {"name": "IGHD1*01", "start": 124, "stop": 135}, // D (or middle) segment
+"3": {"name": "IGHJ3*02", "start": 136, "stop": 171}, // J (or 3') segment
-// any feature to be highligthed in the sequenc
+// any feature to be highlighted in the sequence
// the optional "seq" element gives a sequence that corresponds to this feature
-// CDR3 should be stored that way (in a field called "cdr3"), this is similar
-// for the other region of interest.
-// The junction is also stored in that way (in a "junction" field),
+// JUNCTION/CDR3 should be stored that way (in fields called "junction" or "cdr3"),
// its productivity must be stored in a boolean field called "productive".
-// Positions must also start at 1.
-"somefeature": { "start": 1, "stop": 100, "seq": "" }
+// Positions also start at 1.
+"somefeature": { "start": 56, "stop": 61, "seq": "ACTGTA" }
},
"reads": [], // number of reads in this clone [.vidjil only, required]
// (with samples.number elements)
-"top": 0, // required so that the browser displays the clone
+"top": 0, // (not documented now) [required] threshold to display/hide the clone
"stats": [] // (not documented now) [.vidjil only] (with samples.number elements)
}
#+END_SRC
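Positions in =seg= refer to =sequence= and start at 1, so a feature can be read back from the clone sequence. The sketch below uses a hypothetical =featureSequence= helper and a made-up sequence. Treating =stop= as inclusive is an assumption. It matches the examples above: =56= to =61= covers the six nucleotides =ACTGTA=, and the D stop (=135=) sits directly before the J start (=136=).

```javascript
// Hypothetical helper: subsequence of a clone covered by a "seg" feature.
// Assumption: "start" and "stop" are 1-based and both inclusive.
function featureSequence(clone, featureName) {
  const feature = clone.seg[featureName];
  if (feature.start < 1 || feature.stop > clone.sequence.length || feature.start > feature.stop) {
    throw new RangeError("invalid positions for " + featureName);
  }
  // 1-based inclusive [start, stop] becomes the 0-based half-open slice [start - 1, stop)
  return clone.sequence.slice(feature.start - 1, feature.stop);
}

// Made-up clone: 55 'N' before the feature, 10 'N' after it
const clone = {
  sequence: "N".repeat(55) + "ACTGTA" + "N".repeat(10),
  seg: { somefeature: { start: 56, stop: 61, seq: "ACTGTA" } }
};
const found = featureSequence(clone, "somefeature");
console.log(found, found === clone.seg.somefeature.seq); // ACTGTA true
```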
-** 'Germlines' list [optional][work in progress, to be documented]
+** =germlines= list [optional][work in progress, to be documented]
This list extends the =germline.data= default file with a custom germline.
@@ -347,24 +340,24 @@ extend the =germline.data= default file with a custom germline
}
#+END_SRC
-** 'Clusters' list [optional]
+** Further clustering of clones: the =clusters= list [optional]
Each element in the =clusters= list describes a list of clones that are 'merged'.
In the browser, it will still be possible to see them or to unmerge them.
The first clone of each cluster is used as the representative of the cluster.
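Below is a sketch of how a reader of these files might apply the =clusters= list. The =applyClusters= helper is hypothetical, and several of its choices are assumptions rather than documented behavior. Ids that match no clone are skipped, so a cluster with no known clone is ignored. The first remaining clone is the representative. The read counts of the clustered clones are summed per sample.

```javascript
// Hypothetical helper: applies a "clusters" list (lists of clone ids) to clones.
// Assumptions: unknown ids are skipped, a cluster with no known clone is ignored,
// the first remaining clone is the representative, reads are summed per sample.
function applyClusters(clones, clusters) {
  const byId = new Map(clones.map((c) => [c.id, c]));
  const result = [];
  for (const cluster of clusters) {
    const members = cluster.filter((id) => byId.has(id)).map((id) => byId.get(id));
    if (members.length === 0) continue; // clones do not exist: cluster ignored
    const reads = members[0].reads.map((_, sample) =>
      members.reduce((sum, c) => sum + c.reads[sample], 0)
    );
    result.push({ representative: members[0].id, members: members.map((c) => c.id), reads });
  }
  return result;
}

// Made-up clones for two samples
const clones = [
  { id: "clone1", reads: [243, 5] },
  { id: "clone2", reads: [14, 0] },
  { id: "clone3", reads: [0, 8] }
];
console.log(JSON.stringify(applyClusters(clones, [["clone2", "clone3"], ["clone9"]])));
// [{"representative":"clone2","members":["clone2","clone3"],"reads":[14,8]}]
```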
-** 'Data' list [optional][work in progress, to be documented]
+** =data= list [optional][work in progress, to be documented]
-Each element in the 'data' list is a list of values (of size samples.number)
+Each element in the =data= list is a list of values (of size samples.number)
showing additional data for each sample, such as qPCR levels or spike information.
In the browser, it will be possible to display these data and to normalize
against them (not implemented now).
-** 'Tags' list [optional]
+** Tagging some clones: =tags= list [optional]
-The 'tags' list describe the custom tag names as well as tags that should be hidden by default.
+The =tags= list describes the custom tag names as well as the tags that should be hidden by default.
The default tag names are defined in [[../browser/js/vidjil-style.js]].
#+BEGIN_SRC js
......