- 24 Mar, 2020 1 commit
-
-
Mikaël Salson authored
-
- 26 Apr, 2019 1 commit
-
-
Mikaël Salson authored
Computing them for each read is time-consuming, but many of them are shared: there are few distinct index loads and the reads are short.
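The idea of reusing values that recur across reads can be sketched as a small memoization cache. This is only an illustration: the message does not say which values are cached, so the sketch assumes they are probabilities keyed by index load and number of k-mers, and `expensiveProbability()` and `ProbabilityCache` are hypothetical names, not part of the actual code.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for a costly per-read computation: the probability
// of at least one random hit among nb_kmers k-mers for a given index load.
inline double expensiveProbability(double index_load, int nb_kmers) {
  return 1.0 - std::pow(1.0 - index_load, nb_kmers);
}

// Assumed cache: few distinct index loads and short reads mean that few
// distinct (index_load, nb_kmers) keys are ever requested.
class ProbabilityCache {
public:
  double get(double index_load, int nb_kmers) {
    const auto key = std::make_pair(index_load, nb_kmers);
    const auto it = cache_.find(key);
    if (it != cache_.end())
      return it->second;            // already computed for an earlier read
    ++misses_;
    const double p = expensiveProbability(index_load, nb_kmers);
    cache_.emplace(key, p);
    return p;
  }

  // Number of times the expensive computation actually ran.
  std::size_t misses() const { return misses_; }

private:
  std::map<std::pair<double, int>, double> cache_;
  std::size_t misses_ = 0;
};
```

With such a cache, the expensive function runs once per distinct key rather than once per read.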
-
- 28 Feb, 2019 1 commit
-
-
Mikaël Salson authored
See #2596 for why this is needed. This should be reverted with #1169.
-
- 23 Jul, 2018 1 commit
-
-
Mikaël Salson authored
-
- 13 Jun, 2018 1 commit
-
-
Cyprien Borée authored
(squashed from earlier commits)
-
- 07 Jul, 2017 1 commit
-
-
Mikaël Salson authored
We now have an abstract class to deal with biological sequence files, which will make it easier to manage different file types. This commit only reorganizes the code so that a BAM reader can be added easily. Functionally, the code should be equivalent to its previous version. Some unused functions have been removed. The operator>> has been removed, as it was only used in unit tests. This operator is not convenient: having the filename may be useful to reopen the file, or to know its extension and guess the file type. See #2016
-
- 14 Mar, 2017 1 commit
-
-
Mathieu Giraud authored
With the previous changes, one may have e-value computations on short sequences, leading to a possibly negative number of k-mers in getProbabilityAtLeastOrAbove(). Fixes #2107.
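The negative count arises when a sequence is shorter than the seed span. A minimal sketch of the guard, assuming the number of k-mers is computed as length minus span plus one (`kmerCount()` is a hypothetical helper, not the function patched in this commit):

```cpp
#include <algorithm>

// Number of k-mer start positions in a sequence of `length` nucleotides for
// a seed spanning `span` positions. Clamped to 0 when the sequence is
// shorter than the seed, where the raw formula would go negative.
inline int kmerCount(int length, int span) {
  return std::max(0, length - span + 1);
}
```

For example, a sequence of length 3 with a seed of span 4 yields 0 k-mers instead of 0 minus... a negative value from the raw formula (3 - 4 + 1 = 0 is the boundary; length 2 would give -1).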
-
- 13 Jan, 2017 20 commits
-
-
Mikaël Salson authored
This is necessary (for instance) with xxx germlines.
-
Mikaël Salson authored
K-mer indexes and Aho-Corasick automata don't have the same features. Several seeds can be handled with Aho-Corasick, but not with a simple lookup table. Therefore we can consider different seeds only in some cases.
-
Mikaël Salson authored
This has been moved to another file for better code separation.
-
Mikaël Salson authored
It will be useful to determine the type of index to build at runtime.
-
Mikaël Salson authored
The length of the affectations must be the seed span and not the seed weight as the dashes are replaced by actual letters. Unit test added.
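The difference between span and weight can be made concrete with a short sketch. It assumes a spaced-seed notation in which '#' marks a position that must match and '-' marks a don't-care position; the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Span: total number of positions covered by the seed, dashes included.
inline std::size_t seedSpan(const std::string& seed) {
  return seed.size();
}

// Weight: number of positions that must match (assumed to be written '#').
inline std::size_t seedWeight(const std::string& seed) {
  return static_cast<std::size_t>(std::count(seed.begin(), seed.end(), '#'));
}
```

For the seed `###-###`, the weight is 6 but the span is 7. Once the dash is replaced by an actual letter, the stored word has 7 letters, which is why the span is the relevant length.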
-
Mikaël Salson authored
Some indexes store global information about all the k-mers (ArrayKmerStore and MapKmerStore). Other indexes have a more detailed view of the k-mers that are stored (Aho-Corasick automaton). The latter are more accurate about the index load per affectation.
-
Mikaël Salson authored
Go back to the previous computation. Don't use kmer.getLength() to compute the index load since it is not properly set with the UNKNOWN or NOT_UNKNOWN affectations.
-
Mikaël Salson authored
At first, we considered that smallestAnalysableLength() would be enough to deal with the differences between a k-mer index and an Aho-Corasick automaton. This is actually not the case. For the Aho-Corasick automaton, we need the length of the seed together with the affectation. In the end this makes it possible to get the same affectation strings as with a k-mer index. This would not have been possible with the smallestAnalysableLength() function, which is now useless.
-
Mikaël Salson authored
Needed with the Aho-Corasick automaton (because of using a map)
-
Mikaël Salson authored
This prevents an index from being finished several times (which may cause problems).
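One common way to make a finalization step safe to call repeatedly is a boolean guard. The sketch below illustrates that pattern only; `GuardedIndex` and its members are hypothetical and do not reproduce the actual index classes.

```cpp
#include <string>
#include <vector>

// Hypothetical index whose finish() does its work at most once.
class GuardedIndex {
public:
  void insert(const std::string& sequence) { pending_.push_back(sequence); }

  void finish() {
    if (finished_)
      return;          // already finished: do nothing
    finished_ = true;
    ++build_count_;    // stands in for the real, costly build step
  }

  bool isFinished() const { return finished_; }
  int buildCount() const { return build_count_; }

private:
  std::vector<std::string> pending_;
  bool finished_ = false;
  int build_count_ = 0;
};
```

Calling finish() twice on such an object runs the build step only once.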
-
Mikaël Salson authored
Those functions are not really used with Kmer, but they are there for compatibility with KmerAffect.
-
Mikaël Salson authored
We calculate the index load depending on the type of affectation stored. Thus we now need just one getIndexLoad(), as the other one doesn't make sense any more. This change should make it possible to compute a reliable e-value using the Aho-Corasick automaton.
-
Mikaël Salson authored
This doubles the space used by affectations, but it makes it possible to compute the index load on an index that has several types of seeds. This may also solve the problem with revcomp on the Aho-Corasick index (where affectations were not symmetric). There is now an additional parameter for the affectations.
-
Mikaël Salson authored
refs was initialized only sometimes in Germline. Now it is initialized when any index is built (together with the ID)
-
Mikaël Salson authored
As indexes can be finish()-ed, we must add that possibility for germlines too
-
Mikaël Salson authored
The ID was initialised in createIndex, but if it was not called, the ID was not initialised. By transferring the initialisation to the base constructor we ensure that the ID will be initialised. The drawback is that classes that should be virtual now have a constructor.
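The trade-off described here can be sketched as follows: a base class that assigns the ID in its constructor, so that every derived index gets one regardless of which other methods are called. The class names and the counter-based ID scheme are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical abstract base: the ID is set in the constructor, so it no
// longer depends on a later call such as createIndex().
class IndexBase {
public:
  IndexBase() : id_(nextId()) {}
  virtual ~IndexBase() = default;

  virtual void insert(const std::string& sequence) = 0;
  int id() const { return id_; }

private:
  // Assumed ID scheme: a simple increasing counter.
  static int nextId() {
    static int counter = 0;
    return counter++;
  }
  int id_;
};

// Minimal concrete index used to show that the ID is always initialised.
class SimpleIndex : public IndexBase {
public:
  void insert(const std::string& sequence) override {
    sequences_.push_back(sequence);
  }

private:
  std::vector<std::string> sequences_;
};
```

The base class stays abstract because of its pure virtual method, but it now carries a constructor and state, which is the drawback the message mentions.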
-
Mikaël Salson authored
-
Mikaël Salson authored
Since it can be redefined (and is redefined), we need to make the method virtual so that the redefinition is called even from code defined in IKmerStore. We also need to state in PointerACAutomaton that the other versions of insert defined in IKmerStore must remain usable.
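Both points follow from standard C++ rules: a non-virtual method called from base-class code is not dispatched to the derived class, and redefining one overload in a derived class hides the other overloads unless a using-declaration brings them back. A minimal sketch with hypothetical class names (not the real IKmerStore or PointerACAutomaton):

```cpp
#include <string>
#include <vector>

struct SketchStore {
  virtual ~SketchStore() = default;

  // Virtual, so that base-class code reaches the derived redefinition.
  virtual void insert(const std::string& sequence, const std::string& label) {
    log.push_back("store:" + label + ":" + sequence);
  }

  // Convenience overload defined only in the base class; it calls the
  // virtual overload once per sequence.
  void insert(const std::vector<std::string>& sequences,
              const std::string& label) {
    for (const auto& s : sequences)
      insert(s, label);
  }

  std::vector<std::string> log;
};

struct SketchAutomaton : SketchStore {
  // Without this line, the vector overload would be hidden by the
  // redefinition below and calls to it would not compile.
  using SketchStore::insert;

  void insert(const std::string& sequence, const std::string& label) override {
    log.push_back("automaton:" + label + ":" + sequence);
  }
};
```

Calling the vector overload on a `SketchAutomaton` goes through the base-class loop, which in turn invokes the derived single-sequence insert.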
-
Mikaël Salson authored
Method returning how many nucleotides we get at a time. Useful to know the expected length of what getResults() returns.
-
Mikaël Salson authored
Must be called after having inserted all the sequences.
-
- 28 Sep, 2016 3 commits
-
-
Mikaël Salson authored
For an unknown reason, Clang complained about the line seed = IKmerStore<T>::seed, reporting “error: no viable overloaded '='”. It seemed to interpret one of the variables as const (which it is not), as it indicated: “/usr/bin/../lib/gcc/x86_64-linux-gnu/6.1.1/../../../../include/c++/6.1.1/bits/basic_string.h:565:7: note: candidate function not viable: 'this' argument has type 'const string' (aka 'const basic_string<char>'), but method is not marked const”. The solution was to use a local variable. Clang was happy, but I don't see what really makes the difference.
-
Mikaël Salson authored
Allow a seed to be provided when inserting into a KmerStore. This is particularly suitable for an index that has different seeds depending on the sequences.
-
Mikaël Salson authored
We can provide a seed to the getResults() method to get results that depend on that seed. Multiple seeds can therefore be used for indexing.
-
- 27 Jun, 2016 1 commit
-
-
Mikaël Salson authored
-
- 20 Jun, 2016 1 commit
-
-
Mikaël Salson authored
-
- 18 Dec, 2015 1 commit
-
-
Mathieu Giraud authored
Labels were introduced in df19d79c, for -c germlines, and were later used for -2. Even if they were previously strings, they always designated some file.
-
- 15 Jun, 2015 4 commits
-
-
Mathieu Giraud authored
-
Mathieu Giraud authored
The e-value computation could be more precise by taking into account the actual kmer.
-
Mathieu Giraud authored
-
Mathieu Giraud authored
core/kmerstore.h, core/germline.cpp: delete the index only once when it is shared between several germlines. This is more generic than what was done by ac3ea649.
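The underlying problem is shared ownership: when several germlines point to the same index, exactly one deletion must happen. The sketch below shows one standard way to obtain that guarantee with std::shared_ptr. It is a swapped-in technique for illustration, not what this commit implements, and all class names are hypothetical.

```cpp
#include <memory>
#include <utility>

// Hypothetical index that counts how many instances are currently alive.
struct SharedIndex {
  static int& live() {
    static int count = 0;
    return count;
  }
  SharedIndex() { ++live(); }
  ~SharedIndex() { --live(); }
  SharedIndex(const SharedIndex&) = delete;
  SharedIndex& operator=(const SharedIndex&) = delete;
};

// Hypothetical germline holding a shared reference to its index. The index
// is destroyed exactly once, when the last owner goes away.
struct SketchGermline {
  explicit SketchGermline(std::shared_ptr<SharedIndex> idx)
      : index(std::move(idx)) {}
  std::shared_ptr<SharedIndex> index;
};
```

With raw pointers, the same effect requires either an explicit reference count or a flag telling each germline whether it owns the index.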
-
- 12 Jun, 2015 1 commit
-
-
Mathieu Giraud authored
-
- 04 Jun, 2015 1 commit
-
-
Mathieu Giraud authored
1 << (2 * k) may not be reliable for large k :-)
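The literal 1 has type int, so with a 32-bit int the shift is undefined as soon as 2 * k reaches 32, that is for k of 16 or more. A minimal sketch of a safer computation, assuming a 64-bit result is sufficient (`kmerSpaceSize()` is a hypothetical helper, not the fix actually applied in this commit):

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct k-mers over a 4-letter alphabet, i.e. 4^k, computed
// with a 64-bit shift. Valid for k < 32; larger k would overflow 64 bits.
inline std::uint64_t kmerSpaceSize(unsigned k) {
  assert(k < 32);
  return std::uint64_t{1} << (2 * k);
}
```

For k = 16 this returns 4294967296, a value that does not fit in a 32-bit integer.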
-
- 21 May, 2015 1 commit
-
-
Mathieu Giraud authored
-