Mathieu Giraud authored
There are some 'N' and other extended nucleotides in the germline sequences. As we store both the k-mers and their reverse complements in the indexes, and as we handle extended nucleotides almost randomly (see tools:nuc_to_int()), we may get slightly different results when analyzing a read and its reverse complement. Ignoring such k-mers thus makes the analysis more deterministic: we get the same results on a (pure ACGT) read and on its reverse complement. Another option (harder to implement) would be to add several k-mers to the index for each ambiguous k-mer, but this would decrease the effective weight of the seed. Note that we should also improve the analysis of reads that include extended nucleotides.
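The idea can be illustrated with a minimal Python sketch. This is not the actual index code: the names `build_index`, `count_hits`, `revcomp` and `is_pure_acgt` are hypothetical, the index is a plain set, and seeds are assumed to be contiguous k-mers. Skipping every k-mer that contains a non-ACGT character keeps the index closed under reverse complement, so a pure ACGT read and its reverse complement match the same number of k-mers.

```python
# Illustrative sketch only: hypothetical names, contiguous k-mers, set-based index.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a pure ACGT sequence."""
    return "".join(COMPLEMENT[c] for c in reversed(seq))


def is_pure_acgt(kmer):
    """True if the k-mer contains no 'N' or other extended nucleotide."""
    return all(c in COMPLEMENT for c in kmer)


def build_index(germlines, k):
    """Index every pure ACGT k-mer together with its reverse complement."""
    index = set()
    for seq in germlines:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if not is_pure_acgt(kmer):
                continue  # ignore k-mers with extended nucleotides
            index.add(kmer)
            index.add(revcomp(kmer))
    return index


def count_hits(read, index, k):
    """Number of k-mers of the read that are found in the index."""
    return sum(read[i:i + k] in index for i in range(len(read) - k + 1))


index = build_index(["ACGTNACGGTCA"], 4)
read = "ACGGTCAT"
print(count_hits(read, index, 4), count_hits(revcomp(read), index, 4))
```

In this sketch the four k-mers overlapping the 'N' are simply left out of the index, at the cost of losing some germline positions.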
b0e3045d