Commit 26b013eb authored by Mikaël Salson

automaton.hpp: Try to make unknown affect position consistent

In an Aho-Corasick automaton, an affect is known only once the matching
sequence has been traversed, that is, at its last position. For backward
compatibility with k-mer indexes, we move each affect so that it is stored
at the start of its sequence.
For an unknown affect, however, the length is uncertain, so it is hard to
tell where the start is. If the computed position would overwrite an existing
affect, we try another position, derived from the length of the last
non-ambiguous affect.

This solves some cases in #4225
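
A minimal sketch of the placement rule described above, not the Vidjil code itself. `Affect`, `placeAtStart` and the input layout (one entry per end position, instead of walking the automaton) are hypothetical stand-ins for `Info` and `getResults`. The sketch also assumes that every reported length fits before its end position.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the Info type used in automaton.hpp.
struct Affect {
    std::string label;    // empty label means "no affect"
    std::size_t length;   // length of the matched seed
    bool ambiguous;       // true for an unknown/ambiguous affect
    bool isNull() const { return label.empty(); }
};

// ends[i] is the affect recognised when the match ENDS at position i
// (null when nothing ends there). The result is indexed by START position.
std::vector<Affect> placeAtStart(const std::vector<Affect>& ends) {
    std::vector<Affect> result(ends.size());   // value-initialised: all null
    std::size_t previous_length = 0;           // last non-ambiguous length seen

    for (std::size_t i = 0; i < ends.size(); ++i) {
        const Affect& info = ends[i];
        if (info.isNull())
            continue;

        // Assumption of this sketch: a match never starts before position 0.
        assert(info.length >= 1 && info.length <= i + 1);
        const std::size_t start = i + 1 - info.length;

        if (info.ambiguous && !result[start].isNull() && previous_length > 0) {
            // The guessed start is already taken: fall back to the start
            // implied by the last non-ambiguous length instead of overwriting.
            result[i + 1 - previous_length] = info;
        } else {
            result[start] = info;
            if (!info.ambiguous)
                previous_length = info.length;
        }
    }
    return result;
}
```

For instance, with a non-ambiguous affect of length 3 ending at position 2 and an ambiguous affect of length 4 ending at position 3, both compute start 0. The first keeps position 0 and the ambiguous one is placed at 3 - 3 + 1 = 1 rather than overwriting it.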
parent c411a097
@@ -277,11 +277,23 @@ vector<Info> PointerACAutomaton<Info>::getResults(const seqtype &seq, bool no_re
   size_t seq_len = seq.length();
   vector<Info> result(seq.length());
+  unsigned char previous_length = 0;
   for (size_t i = 0; i < seq_len; i++) {
     current_state = (pointer_state<Info> *)next(current_state, seq[i]);
     Info info = current_state->informations.front();
     if (! info.isNull()) {
-      result[i - info.getLength()+1] = info;
+      if (info.isAmbiguous() && ! result[i - info.getLength() + 1].isNull()
+          && previous_length > 0)
+        // We try to maintain a consistency as the length for an ambiguous
+        // affect is a bit tricky to guess. So if we see that we gonna
+        // overwrite a result, we try to prevent that
+        result[i - previous_length + 1] = info;
+      else {
+        result[i - info.getLength()+1] = info;
+        if (! info.isAmbiguous())
+          previous_length = info.getLength();
+      }
     }
   }
@@ -3,3 +3,9 @@
$ Find only +k and ? affects before the stretch of _ for all loci
16: seed .*(\+k| \?){28}( _)+$
!LAUNCH: $VIDJIL_DIR/$EXEC -g $VIDJIL_DIR/germline -r 1 -4 -K ../data/chimera-fake-half.fa
!OUTPUT_FILE: out/chimera-fake-half.affects
$ Find only +B and ? affects on the TRB and unexpected lines
2: seed .* _(\+B| \?){48} _