- 13 Jan, 2017 40 commits
-
-
Mikaël Salson authored
It will be useful to determine the type of index to build at runtime.
-
Mikaël Salson authored
-
Mikaël Salson authored
-
Mikaël Salson authored
The affectation length is longer with Aho-Corasick than with k-mer based indexes. So we use the length of the read instead.
-
Mikaël Salson authored
The length of the affectations must be the seed span and not the seed weight as the dashes are replaced by actual letters. Unit test added.
-
Mikaël Salson authored
-
Mikaël Salson authored
This makes it possible to compare affectations regardless of their length.
-
Mikaël Salson authored
Some indexes store global information about all the k-mers (ArrayKmerStore and MapKmerStore). Other indexes have a more detailed view of the k-mers that are stored (Aho-Corasick automaton). These indexes are more accurate about the index load per affectation.
-
Mikaël Salson authored
Go back to the previous computation. Don't use kmer.getLength() to compute the index load since it is not properly set with the UNKNOWN or NOT_UNKNOWN affectations.
-
Mikaël Salson authored
Depending on the length of the affectation, we may not get the same results.
-
Mikaël Salson authored
At first, we considered that smallestAnalysableLength() would be enough to deal with differences between a k-mer index and an Aho-Corasick automaton. This is actually not the case. For the Aho-Corasick automaton, we need the length of the seed together with the affectation. In the end this lets us get the same affectation strings as with a k-mer index. This would not have been possible with the smallestAnalysableLength() function, which is now useless.
-
Mikaël Salson authored
This will be useful for the Aho-Corasick automaton to put the affectation at the right place.
-
Mikaël Salson authored
To determine whether two affectations are equal, we don't need to check their length for the unknown and ambiguous affectations. The lengths could even differ: the length of an ambiguous affectation would be the length of the affectations causing the ambiguity.
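The comparison rule described here can be sketched as follows. This is only an illustration, not the project's code: the `Affect` struct and the labels `'?'` (unknown) and `'*'` (ambiguous) are hypothetical stand-ins. Inequality is derived from equality so the two cannot drift apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an affectation: a label plus the length it covers.
// The labels '?' (unknown) and '*' (ambiguous) are assumptions of this sketch.
struct Affect {
    char label;
    std::size_t length;

    bool isUnknown() const { return label == '?'; }
    bool isAmbiguous() const { return label == '*'; }
};

// Equal when the labels match and, except for unknown or ambiguous
// affectations, when the lengths match too.
inline bool operator==(const Affect& a, const Affect& b) {
    if (a.label != b.label) return false;
    if (a.isUnknown() || a.isAmbiguous()) return true;
    return a.length == b.length;
}

// Inequality is defined only as the negation of equality.
inline bool operator!=(const Affect& a, const Affect& b) { return !(a == b); }
```

With this sketch, `Affect{'V', 10}` and `Affect{'V', 12}` differ, while two unknown affectations compare equal whatever their lengths.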
-
Mikaël Salson authored
-
Mikaël Salson authored
Needed with the Aho-Corasick automaton (because of using a map)
-
Mikaël Salson authored
-
Mikaël Salson authored
-
Mikaël Salson authored
The germline may have an index but also sub-germlines (e.g. for the XXX germline). All of them must be finished.
-
Mikaël Salson authored
When the germline itself didn't insert any k-mer but relies on other germlines (e.g. for the XXX germline), we compute the index load from the index loads of the other germlines. Note that the sum may not accurately reflect the index load (as some k-mers may be shared between the germlines).
-
Mikaël Salson authored
This prevents finishing several times (which may cause problems).
-
Mikaël Salson authored
-
Mikaël Salson authored
Equality should take the length into account too. Inequality should not be implemented independently of equality.
-
Mikaël Salson authored
At least for now this should not happen, since we index a whole germline with the same seed.
-
Mikaël Salson authored
-
Mikaël Salson authored
Those functions are not really used with Kmer, but they are there for compatibility with KmerAffect.
-
Mikaël Salson authored
The information on germlines was retrieved through the index, which does not make sense with an Aho-Corasick automaton. In such a case the index is the same but the germlines have different seeds, and therefore the index loads should differ. Since getIndexLoad depends on the KmerAffect, it also differs between V and J.
-
Mikaël Salson authored
We calculate the index load depending on the type of affectation stored. Thus we now need just one version of getIndexLoad(), as the other one doesn't make sense any more. This change should make it possible to compute a reliable e-value using the Aho-Corasick automaton.
-
Mikaël Salson authored
Just counting the number of inserted k-mers is not sufficient. We need to count the number of k-mers inserted for each value of k. Additionally, the value must be counted only for final states (as they correspond to actual germline k-mers).
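A minimal sketch of counting per value of k, only on final states. The `State` and `Trie` names are hypothetical and the structure is a plain trie without failure links, not the automaton used in the project. The sketch also assumes that a word inserted twice is counted once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical trie state; final_length is non-zero only on final states.
struct State {
    std::map<char, std::unique_ptr<State>> next;
    std::size_t final_length = 0;
};

class Trie {
public:
    // Insert a word and count it under its length if its final state is new.
    void insert(const std::string& word) {
        if (word.empty()) return;
        State* s = &root_;
        for (char c : word) {
            std::unique_ptr<State>& child = s->next[c];
            if (!child) child.reset(new State());
            s = child.get();
        }
        if (s->final_length == 0) {
            s->final_length = word.size();
            ++count_by_k_[word.size()];
        }
    }

    // Number of distinct inserted words of length k.
    std::size_t count(std::size_t k) const {
        std::map<std::size_t, std::size_t>::const_iterator it = count_by_k_.find(k);
        return it == count_by_k_.end() ? 0 : it->second;
    }

private:
    State root_;
    std::map<std::size_t, std::size_t> count_by_k_;
};
```

Intermediate states, such as the one reached by `ACG` while inserting `ACGT`, are not final and therefore do not contribute to the count for k = 3.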
-
Mikaël Salson authored
This doubles the space used by affectations, but it makes it possible to compute the index load on an index which has several types of seeds. This may also solve the problem with revcomp with the Aho-Corasick index (where affectations were not symmetric). There is now an additional parameter for the affectations.
-
Mikaël Salson authored
Delete the states iteratively rather than recursively. Quite logically a recursive destruction explodes the stack with real data.
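A sketch of iterative deletion with an explicit stack, under stated assumptions: the `Node` type, the four-letter alphabet and the helper names are hypothetical, and the states are assumed to form a tree (each state reachable through a single child pointer). An automaton whose transitions loop back to existing states would also need to track visited states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical automaton state over the alphabet {A, C, G, T}.
struct Node {
    Node* child[4] = {nullptr, nullptr, nullptr, nullptr};
    static std::size_t live;   // number of nodes currently allocated
    Node() { ++live; }
    ~Node() { --live; }
};
std::size_t Node::live = 0;

// Free every node reachable from root, using a heap-allocated stack
// instead of the call stack.
void destroyIteratively(Node* root) {
    std::vector<Node*> pending;
    if (root) pending.push_back(root);
    while (!pending.empty()) {
        Node* n = pending.back();
        pending.pop_back();
        for (Node* c : n->child)
            if (c) pending.push_back(c);
        delete n;
    }
}

// Build a chain of at least one node: the worst case for recursive destruction.
Node* buildChain(std::size_t depth) {
    Node* root = new Node();
    Node* cur = root;
    for (std::size_t i = 1; i < depth; ++i) {
        cur->child[0] = new Node();
        cur = cur->child[0];
    }
    return root;
}
```

The pending list grows on the heap, so its size is bounded by available memory rather than by the thread's stack limit.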
-
Mikaël Salson authored
refs was initialized only sometimes in Germline. Now it is initialized when any index is built (together with the ID)
-
Mikaël Salson authored
As indexes can be finish()-ed, we must add that possibility for germlines too.
-
Mikaël Salson authored
-
Mikaël Salson authored
In an Aho-Corasick automaton (as built here) a transition should always be defined, unless we forgot to build the failure functions. This assertion is there for absent-minded people; it may save them several hours of debugging…
-
Mikaël Salson authored
The ID was initialised in createIndex, but if it was not called, the ID was not initialised. By transferring the initialisation to the base constructor we ensure that the ID will be initialised. The drawback is that classes that should be virtual now have a constructor.
-
Mikaël Salson authored
The seed must be given to the germline.
-
Mikaël Salson authored
-
Mikaël Salson authored
-
Mikaël Salson authored
-
Mikaël Salson authored
Since it can be redefined (and is redefined), we need to make the method virtual so that the redefinition is called even from code defined in IKmerStore. We also need to state in PointerACAutomaton that the other versions of insert defined in IKmerStore must still be used.
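The two C++ mechanisms involved can be illustrated with a small sketch. The `Store` and `Automaton` classes are hypothetical stand-ins, not the actual IKmerStore and PointerACAutomaton interfaces: a virtual method lets base-class code reach the redefinition, and a using-declaration keeps the base overloads visible in the derived class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical base store with two insert overloads.
class Store {
public:
    virtual ~Store() {}

    // Virtual, so the overload below dispatches to a derived redefinition.
    virtual void insert(const std::string& word, const std::string& label) {
        (void)word;
        (void)label;
        ++base_inserts;
    }

    // Convenience overload implemented once, in the base class.
    void insert(const std::vector<std::string>& words, const std::string& label) {
        for (const std::string& w : words) insert(w, label);
    }

    std::size_t base_inserts = 0;
};

class Automaton : public Store {
public:
    // Without this line, the redefinition below hides the vector overload.
    using Store::insert;

    void insert(const std::string& word, const std::string& label) override {
        (void)label;
        total_letters += word.size();
    }

    std::size_t total_letters = 0;
};
```

Calling the vector overload on an `Automaton` compiles thanks to the using-declaration, and each word goes through the derived `insert` thanks to the virtual dispatch.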
-