Commit a42f8c39 authored by Mikaël Salson's avatar Mikaël Salson

segment.{h,cpp}: Make sure number of matches is > 0

When computing whether the alignment has too many differences
we should not allow having a negative number of mismatches.
So we have to make sure that length >= BOTTOM_TRIANGLE_SHIFT.
But this is not enough: having only 1 match in 20 nucleotides
doesn't make sense. Therefore we may allow a given fraction
of the sequences to be matches just by chance.
We could have used probabilities but this should be a rare
case. I'm not sure that's worth it.
parent ee0bb421
......@@ -915,7 +915,9 @@ void align_against_collection(string &read, BioReader &rep, int forbidden_rep_id
length = min(length, (int) rep.sequence(box->ref_nb).size());
length += del_end;
// length is an estimation of the number of aligned nucleotides. It would be better with #2138
int score_with_limit_number_of_indels = (length - BOTTOM_TRIANGLE_SHIFT) * segment_cost.match + BOTTOM_TRIANGLE_SHIFT * segment_cost.insertion;
int min_number_of_matches = min(int(length * FRACTION_ALIGNED_AT_WORST), length - BOTTOM_TRIANGLE_SHIFT); // Minimal number of matches we can have with a triangle
int max_number_of_insertions = length - min_number_of_matches;
int score_with_limit_number_of_indels = min_number_of_matches * segment_cost.match + max_number_of_insertions * segment_cost.insertion;
if (onlyBottomTriangle && best_score < score_with_limit_number_of_indels) {
// Too many indels/mismatches, let's do a full DP
align_against_collection(read, rep, forbidden_rep_id, reverse_ref, reverse_both,
......@@ -54,7 +54,7 @@
- Margins above the half of the window length may produce shortened or shifted windows
#define FRACTION_ALIGNED_AT_WORST .5 /* Fraction of the sequence that should be aligned before deactivating the heuristics */
using namespace std;
using json = nlohmann::json;
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment