    core/kmerstore.h: ignore all k-mers with extended nucleotides when updating index · b0e3045d
    Mathieu Giraud authored
    There are some 'N' and other extended nucleotides in the germline sequences.
    As we store in the indexes both the k-mers and their reverse complement, and as
    we handle extended nucleotides almost randomly (see tools:nuc_to_int()),
    we may have slight differences when analyzing some reads and their reverse complement.
    Ignoring such k-mers thus makes the analysis more deterministic: we get the same
    results on a (pure ACGT) read and its reverse complement.
    Another option (harder to implement) could be to add several k-mers in the index,
    but this would decrease the effective weight of the seed.
    Note that we should also improve the analysis of reads that include extended nucleotides.
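
The filtering described above can be sketched as follows. This is a minimal illustration, not the actual code in core/kmerstore.h: the names `is_pure_acgt`, `reverse_complement` and `index_sequence` are hypothetical, the index is modelled as a plain `std::set` of strings, and sequences are assumed to be upper-case. Any k-mer containing a character other than A, C, G or T is skipped, so the index never depends on how an extended nucleotide would have been encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// True if the k-mer contains only A, C, G or T
// (no 'N' or other extended nucleotide). Assumes upper-case input.
bool is_pure_acgt(const std::string &kmer) {
  return kmer.find_first_not_of("ACGT") == std::string::npos;
}

// Reverse complement of a pure-ACGT sequence.
std::string reverse_complement(const std::string &seq) {
  std::string rc(seq.rbegin(), seq.rend());
  for (char &c : rc) {
    switch (c) {
      case 'A': c = 'T'; break;
      case 'C': c = 'G'; break;
      case 'G': c = 'C'; break;
      case 'T': c = 'A'; break;
    }
  }
  return rc;
}

// Hypothetical index update: store every pure-ACGT k-mer of `seq`
// together with its reverse complement, and ignore all other k-mers.
void index_sequence(std::set<std::string> &index,
                    const std::string &seq, std::size_t k) {
  assert(k > 0);
  for (std::size_t i = 0; i + k <= seq.size(); ++i) {
    const std::string kmer = seq.substr(i, k);
    if (!is_pure_acgt(kmer)) continue;  // skip k-mers with extended nucleotides
    index.insert(kmer);
    index.insert(reverse_complement(kmer));
  }
}
```

With this sketch, indexing "ACGNTA" with k = 2 stores AC, CG, TA and the reverse complement GT, while GN and NT are ignored.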
kmerstore.h 9.55 KB