Commit 783d0b19 authored by Mathieu Giraud

core/affectanalyser.cpp: better estimation of .max_found in .getMaximum(), more reads as UNSEG_AMBIGUOUS

Since the new flexible heuristic was introduced more than one year ago (de008a24, version 2014-07),
the idea behind checking the segmentation point has been:
  "Do we have enough affectations in good positions ('before' at the left and 'after' at the right)?
   We tolerate some of them in bad positions, but there must be 'ratioMin' times more in good positions."
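To make "good" and "bad" positions concrete, here is a minimal C++ sketch. It assumes a read is represented as a string of per-k-mer labels ('V' for a 'before' affectation, 'J' for an 'after' one, anything else ignored) and, as a simplification, a single split position instead of the leftmost/rightmost max positions. `SideCounts` and `countSides` are hypothetical names, not the code in core/affectanalyser.cpp; the fields only mirror the `results.nb_*` counters seen in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical counters, named after the nb_* fields used in getMaximum().
struct SideCounts {
  int before_left = 0;   // 'before' (V) in a good position
  int before_right = 0;  // 'before' (V) in a bad position
  int after_left = 0;    // 'after' (J) in a bad position
  int after_right = 0;   // 'after' (J) in a good position
};

// Counts labels strictly left of `split` and at or right of `split`.
// Labels: 'V' = before, 'J' = after, any other character is ignored.
SideCounts countSides(const std::string &labels, std::size_t split) {
  assert(split <= labels.size());
  SideCounts c;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    const bool left = i < split;
    if (labels[i] == 'V') {
      if (left) ++c.before_left; else ++c.before_right;
    } else if (labels[i] == 'J') {
      if (left) ++c.after_left; else ++c.after_right;
    }
  }
  return c;
}
```

For instance, `countSides("VVV_VJ_JJJ", 5)` puts all four V in good positions (left) and all four J in good positions (right), while `countSides("VJVJ", 2)` yields one affectation of each kind on each side, which is the kind of read the tolerance in the quoted rule is meant to weigh.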

However, this idea was only partially implemented: the ratio test in the condition only compared 'after' with 'before' affectations at the right.
As a result, the 'ambiguous' sequence added in the previous commit was wrongly segmented.
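To see why a one-sided test is too weak, the ratio part of the old condition can be isolated as below. This is a minimal C++ sketch: `SideCounts` and `oldRatioCheck` are hypothetical names whose fields mirror the `results.nb_*` counters in the diff, and the other conjuncts of the `if` (such as `results.max_value > 0`) are deliberately left out. The counts used in the example are invented for illustration, not taken from the 'ambiguous' test sequence.

```cpp
#include <cassert>

// Hypothetical counts around a candidate segmentation point.
struct SideCounts {
  int before_left, before_right, after_left, after_right;
};

// Ratio test as it stood before this commit (other conjuncts omitted):
// only the right-hand side of the read is examined.
bool oldRatioCheck(const SideCounts &c, double ratioMin) {
  assert(ratioMin > 0.0);
  return c.after_right >= c.before_right * ratioMin;
}
```

With ratioMin = 2.0, the invented counts {before_left 3, before_right 1, after_left 5, after_right 2} pass, since 2 >= 1 * 2.0, even though there are more J at the left (5) than at the right (2): nothing in this test looks at the left side.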

The new code is more symmetrical: .max_found is set to true only when there are both:
 - more V at the left than V at the right, and more V at the left than J at the left
 - more J at the right than J at the left, and more J at the right than V at the right
(V standing for the 'before' affectations and J for the 'after' ones).
"More" means at least ratioMin times more; ratioMin currently equals 2.0.

A few reads that were previously segmented will now appear as UNSEG_AMBIGUOUS.
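The symmetric condition can be sketched the same way. Again this is a hypothetical, stand-alone C++ sketch (`SideCounts` and `newRatioCheck` are illustrative names) that keeps only the four ratio comparisons added by this commit and omits the remaining conjuncts of the actual `if` statement.

```cpp
#include <cassert>

// Hypothetical counts around a candidate segmentation point.
struct SideCounts {
  int before_left, before_right, after_left, after_right;
};

// Symmetric ratio test introduced by this commit (other conjuncts of the
// real condition, such as results.max_value > 0, are omitted).
bool newRatioCheck(const SideCounts &c, double ratioMin) {
  assert(ratioMin > 0.0);
  // J ('after'): more at the right than J at the left and than V at the right
  const bool after_ok = c.after_right >= c.before_right * ratioMin
                     && c.after_right >= c.after_left * ratioMin;
  // V ('before'): more at the left than J at the left and than V at the right
  const bool before_ok = c.before_left >= c.after_left * ratioMin
                      && c.before_left >= c.before_right * ratioMin;
  return after_ok && before_ok;
}
```

For example, with ratioMin = 2.0, counts of {3, 1, 5, 2} (before_left, before_right, after_left, after_right) are rejected because 2 < 5 * 2.0, while {2, 1, 1, 2} is accepted: each comparison uses >=, so exactly ratioMin times more is enough. With all four counts at zero every comparison holds trivially, so the remaining conjuncts of the real condition still matter.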
parent a29ad45d
@@ -178,7 +178,10 @@ affect_infos KmerAffectAnalyser::getMaximum(const KmerAffect &before,
 2) there should be at least one 'before' and one 'after' (? CHECK ?)
 */
-if (results.nb_after_right >= results.nb_before_right*ratioMin
+if ((results.nb_after_right >= results.nb_before_right*ratioMin)
+&& (results.nb_after_right >= results.nb_after_left*ratioMin)
+&& (results.nb_before_left >= results.nb_after_left*ratioMin)
+&& (results.nb_before_left >= results.nb_before_right*ratioMin)
 && (results.nb_after_right > 0 || results.nb_before_right == 0)
 && currentValue < results.max_value
 && results.max_value > 0) {
@@ -163,10 +163,12 @@ class KmerAffectAnalyser: public AffectAnalyser {
 * maximise the number of affectations before, minus the number of
 * affectations after the returned positions.
 *
-* The maximum reached must be above max(0, total number of
-* <before>) and such that the number of <before> after the
-* rightmost max position is <ratioMin> times less than the number
-* of <after> after that position. If no so much maximum is found,
+* The maximum reached must be above max(0, total number of <before>)
+* and such that the numbers of <before>/<after> in "good" positions
+* (at the left of the leftmost max position for <before>,
+* and at the right of the rightmost max position for <after>)
+* are at least <ratioMin> times than the numbers of <before>/<after>
+* in "bad" positions. If no so much maximum is found,
 * the boolean <max_found> is set to false in the structure.
 *
 * @complexity time: linear in count(), space: constant
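The header comment describes a single left-to-right pass. Below is a minimal C++ sketch of one reading of it, assuming the same kind of hypothetical label string ('V' = <before>, 'J' = <after>) and a running score of (number of <before>) minus (number of <after>) seen so far; it records the leftmost and rightmost indices where that score is maximal, in linear time and constant extra space. `MaxPositions` and `scanMaximum` are hypothetical names, and the sketch does not implement the "above max(0, ...)" threshold, the ratio test, or <max_found>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result of the scan; positions are indices into `labels`.
struct MaxPositions {
  int max_value;              // best running score reached
  std::size_t first_pos_max;  // leftmost index reaching max_value
  std::size_t last_pos_max;   // rightmost index reaching max_value
};

// One pass over the labels ('V' = before, 'J' = after, others ignored),
// keeping score = #before - #after in labels[0..i].
MaxPositions scanMaximum(const std::string &labels) {
  assert(!labels.empty());
  int score = 0;
  MaxPositions r{0, 0, 0};
  bool seen = false;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    if (labels[i] == 'V') ++score;
    else if (labels[i] == 'J') --score;
    if (!seen || score > r.max_value) {
      r = MaxPositions{score, i, i};  // new strict maximum
      seen = true;
    } else if (score == r.max_value) {
      r.last_pos_max = i;             // extend the plateau to the right
    }
  }
  return r;
}
```

For "VV__JJ" the running score plateaus at 2 on indices 1 to 3, so the leftmost and rightmost max positions differ; following the header comment, <before> affectations would then be counted relative to index 1 and <after> affectations relative to index 3.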