Commit 330c03e8 authored by Mathieu Giraud's avatar Mathieu Giraud

core/segment.cpp: UNSEG_ONLY_V/J only when left/right e-values are good enough

There were previously two places where the segmentation could fail with UNSEG_ONLY_V/J.
The first one, when there is no 'good' segmentation point, was improved by 5298c86f, requiring at least 5 k-mers.
It turned out that it was not sufficient, as reported by @flothoni in #2107.

Contrarily to what what thought in 5298c86f, the segmentation point returned by kaa->getMaximum()
is actually meaningful to compute some e-values, even when kaa->getMaximum().max_found is false.

We thus now remove completely some ancient code where the thresholds were based on the number of k-mers,
and rely solely on checkLeftRightEvaluesThreshold() to discriminate between these unsegmentation causes.

Streamlines the code, and will fix #2107.
parent 35b70da4
......@@ -612,36 +612,25 @@ void KmerSegmenter::computeSegmentation(int strand, KmerAffect before, KmerAffec
affect_infos max;
max = kaa->getMaximum(before, after);
// We did not find a good segmentation point
if (!max.max_found) {
// We labeled it detected if there were both enough affect_5 and enough affect_3
bool detected_before = (max.nb_before_left + max.nb_before_right >= DETECT_THRESHOLD);
bool detected_after = (max.nb_after_left + max.nb_after_right >= DETECT_THRESHOLD);
if (detected_before && detected_after)
because = UNSEG_AMBIGUOUS ;
else if ((strand == 1 && detected_before) || (strand == -1 && detected_after))
because = UNSEG_TOO_FEW_J ;
else if ((strand == 1 && detected_after) || (strand == -1 && detected_before))
because = UNSEG_TOO_FEW_V ;
else
because = UNSEG_TOO_FEW_ZERO ;
return ;
}
// E-values
pair <double, double> pvalues = kaa->getLeftRightProbabilityAtLeastOrAbove();
evalue_left = pvalues.first * multiplier ;
evalue_right = pvalues.second * multiplier ;
evalue = evalue_left + evalue_right ;
// This can lead to UNSEG_TOO_FEW_ZERO or UNSEG_TOO_FEW_V/J
checkLeftRightEvaluesThreshold(threshold, strand);
if (because != NOT_PROCESSED)
return ;
if (!max.max_found) {
// The 'ratioMin' checks at the end of kaa->getMaximum() failed
because = UNSEG_AMBIGUOUS;
return ;
}
// There was a good segmentation point
box_V->end = max.first_pos_max;
......
......@@ -23,13 +23,7 @@
#define DETECT_THRESHOLD_STRAND 5 /* If the number of total affectations
is above this threshold, then a sequence with no clearly attributed
stranf will be marked as STRAND_NOT_CONSISTEN */
#define DETECT_THRESHOLD 5 /* This threshold only affects the classification of unsegmented sequences.
If the numbers of both V and J affectations are above this threshold,
then the unsegmented sequence will be marked as AMBIGUOUS.
Otherwise, if the number of either V or J affectations if above this threshold,
then the unsegmented sequence will be marked as either ONLY V or ONLY J. */
strand will be marked as STRAND_NOT_CONSISTENT */
#define JSON_REMEMBER_BEST 4 /* The number of V/D/J predictions to keep */
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment