Commit c216ed6f authored by Mikael Salson

segment.cpp: Estimate number of aligned nucleotides

Knowing the number of aligned nucleotides makes it possible to compute the
minimal score below which the optimisation may have biased the result.

The number of aligned nucleotides could be computed more accurately with #2138.
Indeed, if large portions are deleted in the read (for instance), they are not
counted in the current estimate. Counting them would matter, however, because
the optimisation may yield a wrong gene in such a case.

See #3066
parent abdcc630
Pipeline #22139 passed with stages in 22 minutes and 12 seconds
@@ -903,7 +903,16 @@ void align_against_collection(string &read, BioReader &rep, int forbidden_rep_id
 }
-int score_with_limit_number_of_indels = (rep.sequence(box->ref_nb).size() - BOTTOM_TRIANGLE_SHIFT) * segment_cost.match + BOTTOM_TRIANGLE_SHIFT * segment_cost.insertion;
+int length = best_best_i; // end position of the alignment in the read
+int del_end = rep.sequence(box->ref_nb).size() - best_best_j;
+if (reverse_ref || reverse_both) {
+  length = read.length() - length - 1;
+  del_end = best_best_j;
+}
+length = min(length, (int) rep.sequence(box->ref_nb).size());
+length += del_end;
+// length is an estimation of the number of aligned nucleotides. It would be better with #2138
+int score_with_limit_number_of_indels = (length - BOTTOM_TRIANGLE_SHIFT) * segment_cost.match + BOTTOM_TRIANGLE_SHIFT * segment_cost.insertion;
 if (onlyBottomTriangle && best_score < score_with_limit_number_of_indels) {
   // Too many indels/mismatches, let's do a full DP
   align_against_collection(read, rep, forbidden_rep_id, reverse_ref, reverse_both,
......
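The arithmetic of this hunk can be tried in isolation with the minimal sketch below. It is not the project's code: `Cost`, `estimate_aligned_length`, `score_threshold` and all numeric values are hypothetical stand-ins chosen for illustration, and the two reverse flags are merged into a single `reversed` parameter.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the cost structure used in segment.cpp.
struct Cost {
  int match;      // score gained per matching nucleotide
  int insertion;  // score (usually negative) per insertion
};

// Estimate the number of aligned nucleotides, following the steps of the hunk.
// best_i: end position of the alignment in the read
// best_j: end position of the alignment in the reference
// reversed: true when the alignment was computed on reversed sequences
int estimate_aligned_length(int read_length, int ref_length,
                            int best_i, int best_j, bool reversed) {
  assert(ref_length >= 0 && best_j >= 0 && best_j <= ref_length);
  int length = best_i;
  int del_end = ref_length - best_j;
  if (reversed) {
    length = read_length - length - 1;
    del_end = best_j;
  }
  // The estimate cannot exceed the reference length.
  length = std::min(length, ref_length);
  length += del_end;
  return length;
}

// Score below which the restricted alignment is considered unreliable.
// shift plays the role of BOTTOM_TRIANGLE_SHIFT (its real value is not shown in the diff).
int score_threshold(int length, const Cost &cost, int shift) {
  return (length - shift) * cost.match + shift * cost.insertion;
}
```

With the assumed values `match = 5`, `insertion = -10` and `shift = 20`, an estimate of 95 aligned nucleotides gives a threshold of 175, whereas the previous formula, applied to a full reference length of 100, would give 200.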