- 13 Jan, 2017 17 commits
-
-
Mikaël Salson authored
-
Mikaël Salson authored
-
Mikaël Salson authored
The length of the affectations must be the seed span and not the seed weight as the dashes are replaced by actual letters. Unit test added.
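
The difference between the two quantities can be sketched as follows. This is only an illustration: it assumes seeds are written with `#` for a position that is read and `-` for a position that is skipped, and the function names `seedWeight` and `seedSpan` are hypothetical, not taken from the code base.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Weight: number of positions of the seed that are actually read ('#').
std::size_t seedWeight(const std::string &seed) {
  std::size_t weight = 0;
  for (char c : seed) {
    assert(c == '#' || c == '-');  // assumed seed alphabet
    if (c == '#') ++weight;
  }
  return weight;
}

// Span: total number of positions covered by the seed, dashes included.
// Once dashes are replaced by actual letters, a stored word has this length.
std::size_t seedSpan(const std::string &seed) {
  return seed.size();
}
```

Under these assumptions a seed such as `####-####` has weight 8 but span 9, and 9 is the length an affectation should report.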
-
Mikaël Salson authored
Some indexes store global information regarding all the k-mers (ArrayKmerStore and MapKmerStore). Other indexes have a more detailed view of the k-mers that are stored (Aho-Corasick automaton). The latter can give a more accurate index load per affectation.
-
Mikaël Salson authored
At first, we considered that smallestAnalysableLength() would be enough to deal with differences between a k-mer index and an Aho-Corasick automaton. This is actually not the case. For the Aho-Corasick automaton, we need the length of the seed together with the affectation. In the end, this lets us obtain the same affectation strings as with a k-mer index. This would not have been possible with the smallestAnalysableLength() function, which is now useless.
-
Mikaël Salson authored
-
Mikaël Salson authored
When the germline by itself didn't insert any k-mer but relies on other germlines (e.g. for the XXX germline), we compute the index load by relying on the index load of the other germlines. Note that the sum may not accurately reflect the index load (as some k-mers may be shared between the germlines).
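
Why the sum can be inaccurate is easy to see on a toy model. The sketch below is an assumption-laden illustration: it represents each germline by the set of its k-mers, and the names `KmerSet`, `summedKmerCount`, `distinctKmerCount` and `overcount` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using KmerSet = std::set<std::string>;

// What a plain sum over the germlines would count.
std::size_t summedKmerCount(const std::vector<KmerSet> &germlines) {
  std::size_t total = 0;
  for (const KmerSet &g : germlines) total += g.size();
  return total;
}

// What is really present once shared k-mers are counted a single time.
std::size_t distinctKmerCount(const std::vector<KmerSet> &germlines) {
  KmerSet all;
  for (const KmerSet &g : germlines) all.insert(g.begin(), g.end());
  return all.size();
}

// Number of k-mers counted more than once by the plain sum.
std::size_t overcount(const std::vector<KmerSet> &germlines) {
  const std::size_t summed = summedKmerCount(germlines);
  const std::size_t distinct = distinctKmerCount(germlines);
  assert(summed >= distinct);
  return summed - distinct;
}
```

With two germlines that share one k-mer, the sum exceeds the number of distinct k-mers by one, so a load derived from the sum is an overestimate.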
-
Mikaël Salson authored
This prevents the index from being finished several times (which may cause problems).
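
One common way to get this behaviour is a guard flag. The sketch below is only a guess at the general technique, not the actual change: the class `FinishableIndex` and its members are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

class FinishableIndex {
 public:
  // Sequences may only be added before the index is finished.
  void insert(const std::string &sequence) {
    assert(!finished_);
    sequences_.push_back(sequence);
  }

  // Performs the finishing work at most once; later calls are no-ops.
  void finish() {
    if (finished_) return;
    ++finish_runs_;  // stands in for the real finishing work
    finished_ = true;
  }

  bool isFinished() const { return finished_; }
  int finishRuns() const { return finish_runs_; }

 private:
  std::vector<std::string> sequences_;
  bool finished_ = false;
  int finish_runs_ = 0;
};
```

Calling `finish()` twice on such an object runs the finishing work only once.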
-
Mikaël Salson authored
We calculate the index load depending on the type of Affectation stored. Thus we now need just one type of getIndexLoad(), as the other one doesn't make sense any more. This change should make it possible to compute a reliable e-value using the Aho-Corasick automaton.
-
Mikaël Salson authored
Just counting the number of inserted k-mers is not sufficient. We need to count the number of k-mers inserted for each value of k. Additionally, the value must be counted only for final states (as they correspond to actual germline k-mers).
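
A minimal sketch of this counting, on a plain trie rather than the real automaton, could look as follows. The names `TrieState`, `KmerTrie` and `countForK` are hypothetical, and the tiny trie here relies on recursive destruction, which is only acceptable for small examples.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct TrieState {
  std::map<char, std::unique_ptr<TrieState>> next;
  bool is_final = false;
};

class KmerTrie {
 public:
  void insert(const std::string &kmer) {
    assert(!kmer.empty());
    TrieState *state = &root_;
    for (char c : kmer) {
      std::unique_ptr<TrieState> &child = state->next[c];
      if (!child) child.reset(new TrieState());
      state = child.get();
    }
    // Count only when a state becomes final for the first time:
    // intermediate states and repeated insertions are not counted.
    if (!state->is_final) {
      state->is_final = true;
      ++count_by_k_[kmer.size()];
    }
  }

  // Number of distinct inserted k-mers of length k.
  std::size_t countForK(std::size_t k) const {
    std::map<std::size_t, std::size_t>::const_iterator it = count_by_k_.find(k);
    return it == count_by_k_.end() ? 0 : it->second;
  }

 private:
  TrieState root_;
  std::map<std::size_t, std::size_t> count_by_k_;
};
```

Inserting `ACGT` twice, then `ACG` and `TTTT`, gives a count of 2 for k = 4 and 1 for k = 3: the duplicate is ignored, and the state reached by `ACG` is counted only once it has been inserted as a word of its own.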
-
Mikaël Salson authored
This doubles the space used by affectations, but it makes it possible to compute the index load on an index which has several types of seeds. This may also solve the problem with revcomp with the Aho-Corasick index (where affectations were not symmetric). There is now an additional parameter for the affectations.
-
Mikaël Salson authored
Delete the states iteratively rather than recursively. Quite logically, a recursive destruction blows the stack with real data.
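
The general technique can be sketched with an explicit stack. This is an illustration under simplifying assumptions: the states form a tree (each state owned by exactly one parent), and `Node`, `deleteIteratively` and `makeChain` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A node that does NOT delete its children in a destructor,
// so that `delete` on one node never recurses.
struct Node {
  std::vector<Node *> children;
};

// Frees the whole tree rooted at `root`; returns the number of nodes freed.
// The pending nodes live on the heap, so depth is not limited by the call stack.
std::size_t deleteIteratively(Node *root) {
  std::size_t freed = 0;
  std::vector<Node *> pending;
  if (root != nullptr) pending.push_back(root);
  while (!pending.empty()) {
    Node *node = pending.back();
    pending.pop_back();
    for (Node *child : node->children) {
      if (child != nullptr) pending.push_back(child);
    }
    delete node;
    ++freed;
  }
  return freed;
}

// Builds a degenerate tree: a single chain of `depth` nodes.
Node *makeChain(std::size_t depth) {
  assert(depth > 0);
  Node *root = new Node();
  Node *current = root;
  for (std::size_t i = 1; i < depth; ++i) {
    Node *next = new Node();
    current->children.push_back(next);
    current = next;
  }
  return root;
}
```

A chain of a million nodes is the kind of input on which a recursive destructor would need a million nested calls, whereas `deleteIteratively(makeChain(1000000))` only grows a heap-allocated vector.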
-
Mikaël Salson authored
In an Aho-Corasick automaton (as built here) a transition should always be defined, unless we forgot to build the failure functions. This assertion is there for absent-minded people; it may save them several hours of debugging…
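
Such a check might look like the sketch below. It assumes a four-letter alphabet and raw-pointer transitions; `AutomatonState`, `kAlphabetSize` and `step` are hypothetical names, not the ones used in the code base.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kAlphabetSize = 4;  // assumed alphabet: A, C, G, T

struct AutomatonState {
  // One outgoing transition per letter; null until the automaton is completed.
  AutomatonState *transitions[kAlphabetSize] = {};
};

// Follows the transition for `letter`. Once the failure functions have been
// built, every transition is defined, so a null pointer here means the
// finishing step was forgotten.
AutomatonState *step(AutomatonState *state, std::size_t letter) {
  assert(state != nullptr);
  assert(letter < kAlphabetSize);
  AutomatonState *next = state->transitions[letter];
  assert(next != nullptr && "undefined transition: were the failure functions built?");
  return next;
}
```

The assertion costs nothing in a build compiled with `NDEBUG`, and in a debug build it turns a later null-pointer crash into an immediate, explicit message.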
-
Mikaël Salson authored
The ID was initialised in createIndex, but if it was not called, the ID was not initialised. By transferring the initialisation to the base constructor we ensure that the ID will be initialised. The drawback is that classes that should be virtual now have a constructor.
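
The idea can be sketched as follows, assuming the ID is drawn from a simple counter (an assumption for this illustration). The class names `StoreBase`, `ArrayLikeStore` and `MapLikeStore` are hypothetical.

```cpp
#include <cassert>
#include <string>

class StoreBase {
 public:
  // The ID is set here, so every derived object gets one,
  // whether or not any later initialisation method is called.
  StoreBase() : id_(nextId()) { assert(id_ >= 0); }
  virtual ~StoreBase() {}

  int id() const { return id_; }
  virtual std::string name() const = 0;  // the base class remains abstract

 private:
  static int nextId() {
    static int counter = 0;  // assumed ID scheme: consecutive integers
    return counter++;
  }
  int id_;
};

class ArrayLikeStore : public StoreBase {
 public:
  std::string name() const override { return "array"; }
};

class MapLikeStore : public StoreBase {
 public:
  std::string name() const override { return "map"; }
};
```

Two stores created one after the other receive consecutive IDs without any explicit call, at the price of the base class carrying a constructor and a data member.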
-
Mikaël Salson authored
Method that returns how many nucleotides we can get at a time. Useful for knowing the expected length of the result of getResults().
-
Mikaël Salson authored
Must be called after having inserted all the sequences.
-
Mikaël Salson authored
Implements an Aho-Corasick automaton that extends the IKmerStore.
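
For readers unfamiliar with the structure, here is a compact, generic Aho-Corasick sketch over the alphabet A, C, G, T. It is not the implementation from this commit and does not extend IKmerStore; `TinyAhoCorasick`, `countMatchEnds` and the other names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

const std::size_t kNoState = static_cast<std::size_t>(-1);

class TinyAhoCorasick {
 public:
  TinyAhoCorasick() : states_(1), finished_(false) {}

  // Adds a word to the underlying trie; must happen before finish().
  void insert(const std::string &word) {
    assert(!finished_ && !word.empty());
    std::size_t s = 0;
    for (char c : word) {
      const std::size_t a = code(c);
      if (states_[s].next[a] == kNoState) {
        states_[s].next[a] = states_.size();
        states_.push_back(State());
      }
      s = states_[s].next[a];
    }
    states_[s].is_final = true;
  }

  // Builds failure links breadth-first and completes every transition.
  void finish() {
    if (finished_) return;
    std::queue<std::size_t> todo;
    for (std::size_t a = 0; a < 4; ++a) {
      const std::size_t t = states_[0].next[a];
      if (t == kNoState) states_[0].next[a] = 0;
      else { states_[t].fail = 0; todo.push(t); }
    }
    while (!todo.empty()) {
      const std::size_t s = todo.front();
      todo.pop();
      for (std::size_t a = 0; a < 4; ++a) {
        const std::size_t t = states_[s].next[a];
        const std::size_t via_fail = states_[states_[s].fail].next[a];
        if (t == kNoState) {
          states_[s].next[a] = via_fail;  // borrow the failure state's move
        } else {
          states_[t].fail = via_fail;
          // A state is also final if one of its suffixes is an inserted word.
          states_[t].is_final = states_[t].is_final || states_[via_fail].is_final;
          todo.push(t);
        }
      }
    }
    finished_ = true;
  }

  // Number of text positions at which at least one inserted word ends.
  std::size_t countMatchEnds(const std::string &text) const {
    assert(finished_);
    std::size_t s = 0, hits = 0;
    for (char c : text) {
      s = states_[s].next[code(c)];
      assert(s != kNoState);  // always defined once finish() has run
      if (states_[s].is_final) ++hits;
    }
    return hits;
  }

 private:
  struct State {
    std::array<std::size_t, 4> next;
    std::size_t fail;
    bool is_final;
    State() : fail(0), is_final(false) { next.fill(kNoState); }
  };

  static std::size_t code(char c) {
    switch (c) {
      case 'A': return 0;
      case 'C': return 1;
      case 'G': return 2;
      case 'T': return 3;
    }
    assert(false && "unexpected letter");
    return 0;
  }

  std::vector<State> states_;
  bool finished_;
};
```

In this sketch, `finish()` plays the role of the step that must run after all insertions: querying before it would hit the assertion in `countMatchEnds`.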
-