- 05 Feb, 2016 8 commits
-
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
Mathieu Giraud authored
Discussion with @mikael-s and @flothoni. Once the overlap has been handled, we now run a second dynamic programming pass, only on the best reference sequence, to check the actual e-value of the D segment.
-
Mathieu Giraud authored
-
Vidjil Team authored
We do not want to detect the same D gene twice. Note that we do not currently forbid different alleles of the same gene. Discussion between @flothoni, @mikael-s, and @magiraud.
-
Vidjil Team authored
When a D has already been detected, we do not want to detect anything inside this D. Before this commit, spurious D detections could happen in the EXTEND_D_ZONE. Discussion between @flothoni, @mikael-s, and @magiraud.
-
Mathieu Giraud authored
-
Mathieu Giraud authored
When a D segment has been detected, we now try to detect an additional D between V and D or between D and J, possibly detecting VDDJ (or even some VDDDJ) recombinations. Note that this detection is not optimal: a chaining algorithm would be preferable here. Moreover, the statistics should be refined, as the only filter is currently applied before check_and_remove_overlap.
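The search order described above can be sketched as follows. This is only an illustration under stated assumptions: `Hit`, `Detector`, `detect_in` and `detect_d_segments` are hypothetical names, and the real detection (dynamic programming, e-value filter, check_and_remove_overlap) is abstracted behind a callback.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical result of one D detection inside a zone of the read.
struct Hit {
    int start;   // first read position covered by the D
    int end;     // last read position covered by the D (inclusive)
    bool found;  // false when nothing passed the filter
};

// Hypothetical detector: best D inside [begin, end], or found == false.
using Detector = std::function<Hit(int, int)>;

// Runs the detector on a zone, skipping empty zones.
Hit detect_in(const Detector &detect, int begin, int end) {
    if (begin > end) return Hit{0, 0, false};
    Hit h = detect(begin, end);
    // A reported hit is expected to lie inside the zone it was searched in.
    assert(!h.found || (h.start >= begin && h.end <= end));
    return h;
}

// First looks for one D between V and J, then for one additional D on each
// side of it. Returns the hits ordered from 5' to 3' (at most three).
std::vector<Hit> detect_d_segments(int v_end, int j_start, const Detector &detect) {
    std::vector<Hit> hits;
    Hit d = detect_in(detect, v_end + 1, j_start - 1);
    if (!d.found) return hits;

    Hit left = detect_in(detect, v_end + 1, d.start - 1);   // between V and D
    if (left.found) hits.push_back(left);
    hits.push_back(d);
    Hit right = detect_in(detect, d.end + 1, j_start - 1);  // between D and J
    if (right.found) hits.push_back(right);
    return hits;
}
```

Because each side is searched independently and greedily, the result depends on which D is found first, which is one reason a chaining algorithm would be preferable.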
-
- 04 Feb, 2016 1 commit
-
-
Mathieu Giraud authored
-
- 02 Feb, 2016 8 commits
-
-
Mathieu Giraud authored
-
Mathieu Giraud authored
We need to store sequence_or_rc in the Segmenter.
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
Mathieu Giraud authored
Previously, many places had parameter lists such as "int *del_DD_left, int *DD_start, int *best_DD, int *DD_end, int *del_DD_right". This was not clean and was error-prone. Now all these parameters are stored in an 'AlignBox' object. This will lead to further simplifications of the code, easier maintenance, and will allow some extensions.
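A minimal sketch of the idea, with field names derived from the parameter list quoted above. The members, `length()` and `shift()` are assumptions for illustration; the actual AlignBox class may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: one object groups the values that were previously
// passed around as five separate int pointers for each segment.
struct AlignBox {
    std::string key;    // segment label, such as "D" (assumed field)
    int del_left = 0;   // nucleotides deleted at the left end of the gene
    int start = 0;      // first read position covered by the segment
    int end = 0;        // last read position covered by the segment (inclusive)
    int del_right = 0;  // nucleotides deleted at the right end of the gene
    int ref_nb = -1;    // index of the best reference sequence, -1 if none

    // Number of read positions covered by the box.
    int length() const {
        assert(end >= start);
        return end - start + 1;
    }
};

// A helper now takes one box instead of five pointers.
// Here: translate the box when it was computed on a sub-sequence of the read.
void shift(AlignBox &box, int offset) {
    box.start += offset;
    box.end += offset;
}
```

With such a structure, adding a field later does not require changing every function signature that passes segment coordinates around.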
-
Mathieu Giraud authored
We would like to call this in places other than between the V and the J.
-
- 01 Feb, 2016 6 commits
-
-
Mathieu Giraud authored
We factor out some computations (seq_left, seq_right, seg_N). This is the last commit of the day sponsored by the CERNA.
-
Mathieu Giraud authored
-
Mathieu Giraud authored
This enables in particular the analysis of +Vk/-Vk recombinations. This commit is again sponsored by the CERNA.
-
Mathieu Giraud authored
-
Mathieu Giraud authored
Until now, the FineSegmenter tested both strands, resulting in code duplication and unnecessary computations. This improvement is sponsored by the CERNA.
-
Mathieu Giraud authored
-
- 28 Jan, 2016 1 commit
-
-
Mathieu Giraud authored
-
- 26 Jan, 2016 3 commits
-
-
Mathieu Giraud authored
The condition on ratioMin is now strict (the default ratioMin is now 1.9 instead of 2.0), and obscure conditions are removed. A few borderline cases may now pass here (max_found), but they should be discarded anyway by the subsequent e-value tests (and yield the same unsegmentation cause as before). The whole test is now cleaner and more symmetrical.
-
Mathieu Giraud authored
Two conditions were always met.
-
Mathieu Giraud authored
This changes nothing in results.max_found or in the segmentation, but it ensures that first_pos_max and last_pos_max are always between 0 and the length of affectations. This is more symmetrical.
-
- 21 Jan, 2016 1 commit
-
-
Mathieu Giraud authored
As detected by @mikael-s in the parent commit, the region between the V and J affectations was not taken into account in the p-values, yielding erroneous segmentations when this region was very large. This region is now counted *both* for the left and for the right p-value, which fixes the bug of the parent commit. This may sometimes be over-conservative: are we counting things twice? In regular situations the answer is no, as the p-values are eventually computed by getProbabilityAtLeastOrAbove in kmerstore.h, which takes into account the length of the seed. A more exact option would have been to use something like (first_pos_max + last_pos_max / 2) + getS()/2, but it would raise symmetry problems. The selected option should anyway improve the estimation in most cases.
-
- 22 Dec, 2015 2 commits
-
-
Mathieu Giraud authored
This has not been used for at least one year.
-
Mathieu Giraud authored
There was some duplicate code, neither tested nor documented, to generate 'code_short'. This code is now removed. The 'code_short' value was only used in the json output; it will now be computed directly by the web application.
-
- 18 Dec, 2015 4 commits
-
-
Mathieu Giraud authored
For the MAX_12 pseudo-germline, the FineSegmenter now calls override_rep5_rep3_from_labels, and then continues in the regular way. Note that IKmerStore::getLabel() returns *one* Fasta file, even when several files were used for the same KmerAffect, such as in TRD+ or IGK+. In these cases, the FineSegmenter will probably fail when the wrong Fasta file is returned.
-
Mathieu Giraud authored
This function sets the rep5/3 according to two KmerAffects. It will be quite useful for some pseudo-germlines. It should not be used for regular germlines that have and use their own rep5/3.
-
Mathieu Giraud authored
Labels were introduced in df19d79c, for -c germlines, and were later used for -2. Even if they were previously strings, they always designated some file.
-
Mathieu Giraud authored
These names are the names of the underlying file(s). Moreover, we define some constants.
-
- 17 Dec, 2015 1 commit
-
-
Mathieu Giraud authored
-
- 12 Dec, 2015 5 commits
-
-
Mathieu Giraud authored
core/affectanalyser.cpp: better estimation of .max_found in .getMaximum(), more reads as UNSEG_AMBIGUOUS
Since the new flexible heuristic, introduced more than one year ago (de008a24, version 2014-07), the idea behind the check of the segmentation point was: "Do we have enough affectations in good positions ('before' at the left and 'after' at the right)? We tolerate some of them in bad positions, but there must be 'ratioMin' times more in good positions." However, the actual implementation of this idea was rather partial. As a result, the 'ambiguous' sequence added in the previous commit was falsely segmented.
The new code is more symmetrical, .max_found being set to true when there are both:
- more V at the left than V at the right, and than J at the left
- more J at the right than J at the left, and than V at the right
"More" is defined by ratioMin, which currently equals 2.0. A few reads that were previously segmented will now appear as UNSEG_AMBIGUOUS.
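The symmetrical condition can be sketched as below. This is an illustration only: `SideCounts` and the free function `max_found` are hypothetical names, and both the non-strict comparison and the requirement of at least one V at the left and one J at the right are assumptions, not the actual code of .getMaximum().

```cpp
#include <cassert>

// Hypothetical counts of k-mer affectations on each side of a candidate
// segmentation point (not the actual fields used in affectanalyser.cpp).
struct SideCounts {
    int v_left;   // V affectations at the left of the point (good position)
    int v_right;  // V affectations at the right of the point (bad position)
    int j_left;   // J affectations at the left of the point (bad position)
    int j_right;  // J affectations at the right of the point (good position)
};

// True when V dominates at the left and J dominates at the right,
// each by a factor of at least ratio_min.
bool max_found(const SideCounts &c, double ratio_min = 2.0) {
    bool v_ok = c.v_left > 0
             && c.v_left >= ratio_min * c.v_right
             && c.v_left >= ratio_min * c.j_left;
    bool j_ok = c.j_right > 0
             && c.j_right >= ratio_min * c.j_left
             && c.j_right >= ratio_min * c.v_right;
    return v_ok && j_ok;
}
```

In this sketch, a read for which `max_found` is false at the best candidate point would be the kind of read reported as UNSEG_AMBIGUOUS.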
-
Mathieu Giraud authored
-
Mathieu Giraud authored
core/segment.h, core/windowExtractor.cpp, vidjil.cpp: option '-uu', do not create files for segmented sequences
We start the file creation at STATS_FIRST_UNSEG.
-
Mathieu Giraud authored
vidjil.cpp, core/windowExtractor.{h,cpp}: new option '-uu', split reads according to their unsegmentation cause
-
Mathieu Giraud authored
-