    core/kmerstore.h: ignore all k-mers with extended nucleotides when updating index · b0e3045d
    Mathieu Giraud authored
    There are some 'N' and other extended nucleotides in the germline sequences.
    As we store in the indexes both the k-mers and their reverse complement, and as
    we handle extended nucleotides almost randomly (see tools:nuc_to_int()),
    we may have slight differences when analyzing some reads and their reverse complement.
    Ignoring such k-mers thus makes the analysis more deterministic: we get the same
    results on a (pure ACGT) read and its reverse complement.
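
The filtering described above can be sketched as follows. This is a minimal illustration, not the actual code of core/kmerstore.h: the index is modelled as a plain `std::set` of strings, and the names `is_pure_acgt`, `reverse_complement` and `index_sequence` are hypothetical. Every k-mer of a germline sequence is stored together with its reverse complement, unless it contains a character other than A, C, G or T, in which case it is skipped entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// True only if the k-mer is made of the four unambiguous nucleotides.
bool is_pure_acgt(const std::string &kmer) {
    return kmer.find_first_not_of("ACGT") == std::string::npos;
}

// Reverse complement of a pure-ACGT sequence.
std::string reverse_complement(const std::string &seq) {
    std::string rc(seq.rbegin(), seq.rend());
    for (char &c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: assert(false && "expects a pure ACGT sequence");
        }
    }
    return rc;
}

// Hypothetical stand-in for the index update: stores each k-mer and its
// reverse complement, ignoring any k-mer with an extended nucleotide.
void index_sequence(std::set<std::string> &index,
                    const std::string &seq, std::size_t k) {
    if (k == 0) return;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        const std::string kmer = seq.substr(i, k);
        if (!is_pure_acgt(kmer)) continue;  // 'N' and other extended codes
        index.insert(kmer);
        index.insert(reverse_complement(kmer));
    }
}
```

With this rule, a single 'N' in a germline sequence removes the k k-mers that overlap it (fewer near the ends of the sequence), and the index content no longer depends on how an extended nucleotide would have been encoded.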
    
    Another option (harder to implement) could be to add several k-mers to the index,
    but this would decrease the effective weight of the seed.
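
As a rough sketch of that alternative (not something this commit implements), a k-mer with extended nucleotides could be expanded into every concrete ACGT k-mer it may stand for, using the standard IUPAC nucleotide codes. The function names `iupac_expansion` and `expand_kmer` are hypothetical. Under this reading, a position holding 'N' matches any nucleotide and so no longer discriminates, which is one way to see the loss of effective seed weight.

```cpp
#include <string>
#include <vector>

// Nucleotides that each IUPAC code may stand for.
std::string iupac_expansion(char c) {
    switch (c) {
        case 'A': return "A";
        case 'C': return "C";
        case 'G': return "G";
        case 'T': return "T";
        case 'R': return "AG";
        case 'Y': return "CT";
        case 'S': return "CG";
        case 'W': return "AT";
        case 'K': return "GT";
        case 'M': return "AC";
        case 'B': return "CGT";
        case 'D': return "AGT";
        case 'H': return "ACT";
        case 'V': return "ACG";
        case 'N': return "ACGT";
        default:  return "";  // unknown symbol: the k-mer has no expansion
    }
}

// All concrete ACGT k-mers compatible with a k-mer that may contain
// extended nucleotides. The result grows multiplicatively with each
// ambiguous position.
std::vector<std::string> expand_kmer(const std::string &kmer) {
    std::vector<std::string> out{""};
    for (char c : kmer) {
        const std::string choices = iupac_expansion(c);
        std::vector<std::string> next;
        next.reserve(out.size() * choices.size());
        for (const std::string &prefix : out)
            for (char n : choices)
                next.push_back(prefix + n);
        out.swap(next);
    }
    return out;
}
```

For example, `expand_kmer("ANG")` yields the four k-mers AAG, ACG, AGG and ATG, all of which would have to be inserted (with their reverse complements) for a single germline k-mer.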
    
    Note that we should also improve the analysis of reads that include extended nucleotides.