Commit 52ba91b7 authored by Mathieu Giraud

doc/algo.org, vidjil.cpp: more documentation on experimental clustering options

parent 91493ce7
......@@ -933,6 +933,8 @@ int main (int argc, char **argv)
////////////////////////////////////////
if (command == CMD_CLONES || command == CMD_WINDOWS) {
+string f_json = out_dir + f_basename + JSON_SUFFIX ;
//////////////////////////////////
//$$ Kmer Segmentation
......@@ -1094,7 +1096,7 @@ int main (int argc, char **argv)
clones_windows = comp.cluster(forced_edges, w, cout, epsilon, minPts) ;
comp.stat_cluster(clones_windows, cout );
comp.del();
-cout << " ==> " << clones_windows.size() << " clusters" << endl ;
+cout << " ==> " << clones_windows.size() << " clusters (" << f_json << ")" << endl ;
}
else
{
......@@ -1378,7 +1380,6 @@ int main (int argc, char **argv)
} // end if (command == CMD_CLONES)
//$$ .json output: json_data_segment
-string f_json = out_dir + f_basename + JSON_SUFFIX ;
cout << " ==> " << f_json << "\t(data file for the browser)" << endl ;
ofstream out_json(f_json.c_str()) ;
......
......@@ -288,16 +288,19 @@ All the reads with this windows will be extracted to =out/seq/clone.fa-1=.
** Further clustering (experimental)
-These options have no consequences on the =.vdj.fa= file, but adds
-additional information in the =.vidjil= file to be visualized in the
-browser.
+The following options are experimental and have no consequences on the =.vdj.fa= file,
+nor on the standard output. They instead add a =clusters= section in the =.vidjil= file
+that will be visualized in the browser.
-Setting the =-n= option triggers an additional automatic
-clustering using DBSCAN algorithm (Ester and al., 1996).
+The =-n= option triggers an automatic clustering using the DBSCAN algorithm (Ester et al., 1996).
+Using =-n 5= usually clusters reads within a distance of 1 mismatch (default score
+being +1 for a match and -4 for a mismatch). However, more distant reads can also
+be clustered when there are more than 10 reads within the distance threshold.
+This behaviour can be controlled with the =-N= option.
The =-E= option allows to specify a file for manually clustering two windows
considered as similar. Such a file may be automatically produced by vidjil
-(out/edges), depending on the option provided. Only the two first columns
+(=out/edges=), depending on the option provided. Only the two first columns
(separated by one space) are important to vidjil, they only consist of the
two windows that must be clustered.
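
The edges file format described above (two windows per line, in the first two columns) can be illustrated with a minimal sketch. This is not the code from vidjil.cpp: the helper name =read_forced_edges= is hypothetical, and the sketch assumes that extra columns and lines with fewer than two columns can simply be ignored.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not taken from vidjil.cpp): reads an edges file in
// which each line starts with the two windows that must be clustered.
std::vector<std::pair<std::string, std::string>>
read_forced_edges(std::istream &in)
{
  std::vector<std::pair<std::string, std::string>> edges;
  std::string line;

  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string first, second;

    // Keep only the first two columns; further columns are ignored.
    // Assumption: operator>> splits on any run of whitespace, which is
    // more permissive than the single space mentioned in the documentation.
    // Lines with fewer than two columns are skipped.
    if (fields >> first >> second)
      edges.emplace_back(first, second);
  }
  return edges;
}
```

Each returned pair names two windows to be merged into the same cluster. Opening the file with a =std::ifstream= and passing it to the helper would be enough to load a file such as =out/edges=.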
......@@ -354,6 +357,8 @@ CTATGATAGTAGTGGTTATTACGGGGTAGGGCAGTACTACTACTACTACATGGACGTCTG
./vidjil -c clones -G germline/IGH -r 1 -n 5 ./data/clones_simul.fa
# Window extraction + clone gathering,
# with automatic clustering, distance five (-n 5)
+# The result of the automatic clustering is in the .vidjil file
+# and can be seen/edited in the browser.
#+END_SRC
#+BEGIN_SRC sh
......