- 02 Feb, 2016 6 commits
-
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
Mathieu Giraud authored
Previously, many places had parameter lists such as "int *del_DD_left, int *DD_start, int *best_DD, int *DD_end, int *del_DD_right". This was not very clean and was error-prone. Now all these parameters are stored in an ‘AlignBox’ object. This will lead to further simplifications of the code and better maintainability, and will allow some extensions.
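A minimal sketch of the idea, assuming a plain struct: the member names below only mirror the former loose parameters and are hypothetical, not the actual members of Vidjil's AlignBox. The point is that one object replaces five `int *` outputs, and derived quantities can become methods or small helpers.

```cpp
// Hypothetical sketch: members mirror the former parameters
// (del_DD_left, DD_start, best_DD, DD_end, del_DD_right).
struct AlignBox {
  int del_left = 0;   // nucleotides deleted on the left side of the gene
  int start = 0;      // first position of the gene in the read
  int best = -1;      // index of the best germline sequence (-1: none)
  int end = 0;        // last position of the gene in the read
  int del_right = 0;  // nucleotides deleted on the right side of the gene

  // Number of read positions covered by the gene.
  int length() const { return end - start + 1; }
};

// Hypothetical helper: number of read nucleotides strictly between two
// aligned genes (e.g. the N region between a V box and a J box).
int gap_between(const AlignBox &left, const AlignBox &right) {
  return right.start - left.end - 1;
}
```

With such a box, a function that previously took five pointers takes a single `AlignBox *`, and callers can no longer mix up the order of the positions.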
-
Mathieu Giraud authored
We would like to call this in places other than between the V and the J.
-
- 01 Feb, 2016 6 commits
-
-
Mathieu Giraud authored
We factor out some computations (seq_left, seq_right, seg_N). This is the last commit of the day sponsored by the CERNA.
-
Mathieu Giraud authored
-
Mathieu Giraud authored
This enables in particular the analysis of +Vk/-Vk recombinations. This commit is again sponsored by the CERNA.
-
Mathieu Giraud authored
-
Mathieu Giraud authored
Until now, the FineSegmenter tested both strands, resulting in code duplication and unnecessary computations. This improvement is sponsored by the CERNA.
-
Mathieu Giraud authored
-
- 28 Jan, 2016 1 commit
-
-
Mathieu Giraud authored
-
- 26 Jan, 2016 3 commits
-
-
Mathieu Giraud authored
The condition on ratioMin is now strict (the default ratioMin is now 1.9 instead of 2.0), and obscure conditions are removed. A few borderline cases could now pass here (max_found), but they should be discarded anyway by the subsequent e-value tests (and yield the same unsegmentation cause as before). The whole test is now cleaner and more symmetrical.
-
Mathieu Giraud authored
Two conditions were always met.
-
Mathieu Giraud authored
This changes nothing in results.max_found or in the segmentation, but it ensures that first_pos_max and last_pos_max are always between 0 and the length of affectations. This is more symmetrical.
-
- 21 Jan, 2016 1 commit
-
-
Mathieu Giraud authored
As detected by @mikael-s in the parent commit, the region between the V and J affectations was not taken into account in the p-values, yielding erroneous segmentations when this region was very large. Now this region is counted *both* for the computation of the left and the right p-values, which fixes the bug exhibited by the parent commit. This could sometimes be over-conservative: are we counting things twice? In regular situations, the answer is no, as the p-values are eventually computed by getProbabilityAtLeastOrAbove in kmerstore.h, which takes into account the length of the seed. A more exact option could have been to use something like (first_pos_max + last_pos_max) / 2 + getS()/2, but it would raise symmetry problems. The selected option should nevertheless improve the estimation in most cases.
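To make "counted *both* for the left and right p-values" concrete, here is a minimal sketch, assuming that first_pos_max and last_pos_max delimit the region between the V and J affectations (inclusive) and that each p-value is computed over a number of positions. The struct and function names are hypothetical; the real computation lives in the affect analyser and in getProbabilityAtLeastOrAbove.

```cpp
// Hypothetical sketch: number of positions used for the left (V) and
// right (J) p-values. The region [first_pos_max, last_pos_max] is
// included in both sides.
struct SideLengths {
  int left;   // positions considered for the left p-value
  int right;  // positions considered for the right p-value
};

SideLengths side_lengths(int first_pos_max, int last_pos_max, int nb_affectations) {
  // Left side: from the start of the read up to the end of the region.
  int left = last_pos_max + 1;
  // Right side: from the start of the region up to the end of the read.
  int right = nb_affectations - first_pos_max;
  return {left, right};
}
```

For example, with 100 affectations and a region from 10 to 30, the two sides cover 31 and 90 positions: the 21 positions of the region are counted twice, which is the deliberately conservative choice described above.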
-
- 22 Dec, 2015 2 commits
-
-
Mathieu Giraud authored
This had not been used for at least one year.
-
Mathieu Giraud authored
There was some duplicated code, neither tested nor documented, that generated 'code_short'. This code is now removed. The 'code_short' value was only used in the JSON output; it will now be computed directly by the web application.
-
- 18 Dec, 2015 4 commits
-
-
Mathieu Giraud authored
For the MAX_12 pseudo-germline, the FineSegmenter now calls override_rep5_rep3_from_labels, and then continues in the regular way. Note that IKmerStore::getLabel() returns *one* Fasta file, even when several files were used for the same KmerAffect, such as in TRD+ or IGK+. In these cases, the FineSegmenter will probably fail when the wrong Fasta file is returned.
-
Mathieu Giraud authored
This function sets the rep5/3 according to two KmerAffects. It will be quite useful for some pseudo-germlines. It should not be used for regular germlines, which already have and use their own rep5/3.
-
Mathieu Giraud authored
Labels were introduced in df19d79c for -c germlines, and were later used for -2. Even though they were previously strings, they always designated a file.
-
Mathieu Giraud authored
These names are those of the underlying file(s). We also define some constants.
-
- 17 Dec, 2015 1 commit
-
-
Mathieu Giraud authored
-
- 12 Dec, 2015 7 commits
-
-
Mathieu Giraud authored
core/affectanalyser.cpp: better estimation of .max_found in .getMaximum(), more reads as UNSEG_AMBIGUOUS
Since the new flexible heuristic was introduced more than one year ago (de008a24, version 2014-07), the idea behind the check of the segmentation point has been: "Do we have enough affectations in good positions ('before' at the left and 'after' at the right)? We tolerate some of them in bad positions, but there must be 'ratioMin' times more in good positions." However, this idea was only partially implemented. As a result, the 'ambiguous' sequence added in the previous commit was wrongly segmented.
The new code is more symmetrical: .max_found is set to true when there are both:
- more V at the left than V at the right, and more V at the left than J at the left;
- more J at the right than J at the left, and more J at the right than V at the right.
"More" is defined by ratioMin, currently 2.0. A few reads that were previously segmented will now appear as UNSEG_AMBIGUOUS.
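The two symmetrical conditions can be sketched as follows. This is a minimal illustration, not the code of .getMaximum(): the counts struct and function names are hypothetical, and "a is more than b" is taken as the strict test a > ratioMin * b, as introduced by the later commit that made the condition strict.

```cpp
// Hypothetical sketch: V and J affectation counts on each side of a
// candidate segmentation point.
struct SideCounts {
  int v_left, v_right;  // V affectations before / after the point
  int j_left, j_right;  // J affectations before / after the point
};

// "a is more than b": strict comparison scaled by ratio_min.
bool more(int a, int b, double ratio_min) {
  return a > ratio_min * b;
}

bool max_found(const SideCounts &c, double ratio_min) {
  // V must dominate at the left, both over misplaced V and over J.
  bool v_ok = more(c.v_left, c.v_right, ratio_min) && more(c.v_left, c.j_left, ratio_min);
  // J must dominate at the right, both over misplaced J and over V.
  bool j_ok = more(c.j_right, c.j_left, ratio_min) && more(c.j_right, c.v_right, ratio_min);
  return v_ok && j_ok;
}
```

With counts {4, 2, 0, 4}, a ratio of 2.0 rejects the point (4 is not strictly more than 2 × 2) while 1.9 accepts it, which is the kind of borderline case the later strict-condition commit mentions.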
-
Mathieu Giraud authored
-
Mathieu Giraud authored
core/segment.h, core/windowExtractor.cpp, vidjil.cpp: option '-uu', do not create files for segmented sequences
We start the file creation at STATS_FIRST_UNSEG.
-
Mathieu Giraud authored
vidjil.cpp, core/windowExtractor.{h,cpp}: new option '-uu', split reads according to their unsegmentation cause
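The '-uu' behaviour (one output per unsegmentation cause, nothing for segmented reads, starting at STATS_FIRST_UNSEG) can be sketched with in-memory buckets standing in for files. The enum values and their order are assumptions made for illustration; only the idea that causes at or after STATS_FIRST_UNSEG are unsegmentation causes comes from the commit messages.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical causes: values before STATS_FIRST_UNSEG mean "segmented".
enum Cause { SEG_PLUS, SEG_MINUS, UNSEG_TOO_SHORT, UNSEG_ONLY_V, UNSEG_ONLY_J, UNSEG_AMBIGUOUS };
const int STATS_FIRST_UNSEG = UNSEG_TOO_SHORT;

// Groups reads by unsegmentation cause; each bucket stands for one output file.
std::map<int, std::vector<std::string>> split_by_cause(
    const std::vector<std::pair<std::string, int>> &reads) {
  std::map<int, std::vector<std::string>> buckets;
  for (const auto &r : reads) {
    if (r.second < STATS_FIRST_UNSEG) continue;  // segmented read: no file
    buckets[r.second].push_back(r.first);
  }
  return buckets;
}
```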
-
Mathieu Giraud authored
-
Mathieu Giraud authored
There are two places where the segmentation can fail with UNSEG_ONLY_V/J. The first one, when there is no segmentation point, previously returned UNSEG_ONLY_V/J even when there was only one (possibly noisy) V/J k-mer. This is now fixed: UNSEG_ONLY_V/J is triggered only when there are at least DETECT_THRESHOLD k-mers (now 5). Ideally, we should use an e-value check here, but the segmentation point returned by kaa->getMaximum() is not really meaningful in these cases and may lead to incorrect statistical computations.
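A minimal sketch of the corrected first case, assuming the decision only depends on the numbers of V and J k-mers when no segmentation point was found. DETECT_THRESHOLD = 5 is taken from the commit message; the enum, the function name, and the rule that the other side must stay below the threshold are assumptions.

```cpp
// Hypothetical causes and decision function; only DETECT_THRESHOLD = 5
// comes from the commit message.
enum Unseg { UNSEG_ONLY_V, UNSEG_ONLY_J, UNSEG_OTHER };
const int DETECT_THRESHOLD = 5;

Unseg cause_without_segmentation_point(int nb_v_kmers, int nb_j_kmers) {
  // A single (possibly noisy) k-mer is no longer enough for ONLY_V/J.
  if (nb_v_kmers >= DETECT_THRESHOLD && nb_j_kmers < DETECT_THRESHOLD) return UNSEG_ONLY_V;
  if (nb_j_kmers >= DETECT_THRESHOLD && nb_v_kmers < DETECT_THRESHOLD) return UNSEG_ONLY_J;
  return UNSEG_OTHER;  // too few k-mers, or both V and J present
}
```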
-
Mathieu Giraud authored
The .fastq parser in OnlineFasta() is rigorous: when a file is truncated, there can be an invalid sequence at its end. Previously, Vidjil halted in this case, which could be quite frustrating when a large number of sequences had already been processed. Now we just warn the user, stop the analysis at this point, and properly output the clones. Note that this does not affect the initial scan done on at most SAMPLE_APPROX_NB_SEQUENCES sequences (currently 200): Vidjil will still halt on any error in these first sequences.
-
- 09 Nov, 2015 3 commits
-
-
Mathieu Giraud authored
Starting from the full read, we cannot limit the DP computations to a k-band around the diagonal without first knowing exactly where the junction is. Nevertheless, we can avoid computing about one half of the DP matrix, as the end of the V / the start of the J (minus some deletions) must be matched. BOTTOM_TRIANGLE_SHIFT is now set to 20, which should be large enough to handle V/J deletions of up to ~30 bp (see comment in segment.h). (The current tests even passed with BOTTOM_TRIANGLE_SHIFT set to 10.) The FineSegmenter (as launched by 'make shouldvdj_with_rc_merged') is now about 35% faster.
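As a generic illustration of skipping roughly one triangle of a DP matrix, here is a global alignment (match +1, mismatch -1, indel -1) that only computes cells with j <= i + shift. Which triangle Vidjil actually skips, its scoring, and exactly how BOTTOM_TRIANGLE_SHIFT enters are not stated here, so this geometry is an assumption; the restriction is only exact when the optimal path stays inside the computed region.

```cpp
#include <algorithm>
#include <limits>
#include <string>
#include <vector>

struct DPResult { int score; long computed_cells; };

// Global alignment of a against b, computing only cells with j <= i + shift.
DPResult banded_triangle_align(const std::string &a, const std::string &b, int shift) {
  const int NEG = std::numeric_limits<int>::min() / 2;  // marks "not computed"
  const int n = static_cast<int>(a.size()), m = static_cast<int>(b.size());
  std::vector<std::vector<int>> S(n + 1, std::vector<int>(m + 1, NEG));
  long computed = 0;
  for (int i = 0; i <= n; ++i) {
    for (int j = 0; j <= m && j <= i + shift; ++j) {
      ++computed;
      if (i == 0 && j == 0) { S[0][0] = 0; continue; }
      int best = NEG;
      if (i > 0 && j > 0) best = std::max(best, S[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 1 : -1));
      if (i > 0) best = std::max(best, S[i - 1][j] - 1);  // gap in b
      if (j > 0) best = std::max(best, S[i][j - 1] - 1);  // gap in a
      S[i][j] = best;
    }
  }
  return {S[n][m], computed};
}
```

On two identical 8-nt sequences, a shift of 2 computes 60 of the 81 cells and still finds the optimal score of 8; the larger the matrix, the closer the saving gets to one half.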
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
- 17 Oct, 2015 6 commits
-
-
Mathieu Giraud authored
Unit testing was a good idea: it discovered a bug that is now corrected. When using 'only_nth_sequence', we must call hasNextData().
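Why hasNextData() matters with 'only_nth_sequence' can be shown with a minimal reader that returns one sequence and then skips the next n - 1. The class and its interface are hypothetical, not the actual OnlineFasta API: each skip checks hasNextData(), so the reader never advances past the end of the data when the count is not a multiple of n.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader keeping only every n-th sequence.
class SequenceReader {
 public:
  SequenceReader(std::vector<std::string> data, int only_nth_sequence)
      : data_(std::move(data)), nth_(only_nth_sequence) {}

  bool hasNextData() const { return pos_ < data_.size(); }

  // Returns the current sequence, then skips the following nth - 1 ones,
  // checking hasNextData() before each skip.
  std::string next() {
    std::string current = data_[pos_++];
    for (int k = 1; k < nth_ && hasNextData(); ++k) ++pos_;
    return current;
  }

 private:
  std::vector<std::string> data_;
  std::size_t pos_ = 0;
  int nth_;
};
```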
-
Mathieu Giraud authored
core/windowExtractor.{h,cpp}, vidjil.cpp: no longer handle -x/-X options inside WindowExtractor
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-
Mathieu Giraud authored
-