1. 13 Jan, 2017 16 commits
  2. 05 Jan, 2017 1 commit
  3. 01 Dec, 2016 1 commit
  4. 29 Sep, 2016 2 commits
  5. 28 Sep, 2016 20 commits
    • representative.cpp: satisfy clang · a916080d
      Mikaël Salson authored
      Clang doesn't allow the declaration of a variable-length array
      with a non-POD type. Thus we do it the classical way.
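
      A minimal sketch of the issue, assuming the "classical way" means a
      standard container such as std::vector (the commit does not say which
      replacement was used). The function name make_labels is hypothetical.

      ```cpp
      #include <cstddef>
      #include <string>
      #include <vector>

      // Hypothetical helper: builds n labels, n being known only at run time.
      std::vector<std::string> make_labels(std::size_t n) {
          // std::string labels[n];  // variable-length array of a non-POD type:
          //                         // a GCC extension that Clang rejects.
          std::vector<std::string> labels(n);  // standard C++ alternative
          for (std::size_t i = 0; i < n; ++i) {
              labels[i] = "seq" + std::to_string(i);
          }
          return labels;
      }
      ```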
    • kmerstore.h: satisfy clang · 01e7f4b4
      Mikaël Salson authored
      For an unknown reason, Clang complained about the line:
      seed = IKmerStore<T>::seed
      because “error: no viable overloaded '='”
      It seemed to interpret one of the variables as const (which it is not),
      as it indicated:
      /usr/bin/../lib/gcc/x86_64-linux-gnu/6.1.1/../../../../include/c++/6.1.1/bits/basic_string.h:565:7: note: candidate function not viable: 'this' argument has type 'const string' (aka 'const basic_string<char>'),
            but method is not marked const
      The solution was to use a local variable. Clang was happy, but I don't
      see what really makes the difference.
    • Mikaël Salson's avatar
    • windows: read must be sampled depending on the score · 3dacebba
      Mikaël Salson authored
      The sample was built solely from the sequence length, which made sense
      when the reads were stored depending on their length. But the scoring
      function has changed and now focuses on quality, so it no longer makes
      sense to retrieve the longest reads. We just want a sample of the
      best reads (i.e. those of better quality).
      With this approach, if we change our scoring function again, no
      modification will be needed.
      Note that the second parameter of getBestReads() is not used yet.
      It may be useful to keep sequences that are too bad from being sampled.
      SequenceSampler is not used anymore. It will be removed in a future release.
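
      Sampling by score rather than by length can be sketched as follows.
      This is not the getBestReads() implementation: the ScoredRead type,
      the best_reads name and the min_score argument are assumptions made
      for illustration (min_score is only our guess at what a second
      parameter filtering out bad sequences could look like).

      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <string>
      #include <vector>

      // Hypothetical read record; the real storage classes are not shown here.
      struct ScoredRead {
          std::string sequence;
          double score;
      };

      // Returns at most max_reads reads, highest score first.
      // Reads scoring below min_score (an assumed parameter) are skipped.
      std::vector<ScoredRead> best_reads(std::vector<ScoredRead> reads,
                                         std::size_t max_reads,
                                         double min_score = 0.0) {
          reads.erase(std::remove_if(reads.begin(), reads.end(),
                                     [min_score](const ScoredRead& r) {
                                         return r.score < min_score;
                                     }),
                      reads.end());
          // stable_sort keeps the input order among reads with equal scores.
          std::stable_sort(reads.begin(), reads.end(),
                           [](const ScoredRead& a, const ScoredRead& b) {
                               return a.score > b.score;
                           });
          if (reads.size() > max_reads) reads.resize(max_reads);
          return reads;
      }
      ```

      Since the ordering only looks at the score, changing the scoring
      function does not require touching the sampling code.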
    • read_score: First quality is space instead of ! · 5ca7d62c
      Mikaël Salson authored
      This is done to ensure that quality is always > 0 (and the score too).
    • read_score: Avoid buffer overflow · 18a16385
      Mikaël Salson authored
      With well-chosen qualities we could trigger a buffer overflow
      (asserts are checked only during development).
      This is now prevented.
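
      One way to prevent such an overflow without relying on assert() is to
      clamp the index computed from each quality character. The sketch below
      assumes a table with one slot per quality starting at the space
      character; FIRST_QUALITY, NB_QUALITIES, quality_index and
      count_qualities are illustrative names and values, not the ones from
      read_score.

      ```cpp
      #include <cstddef>
      #include <string>

      // Assumed layout: one table slot per quality character, starting at ' '.
      // Both constants are illustrative, not the project's values.
      const char FIRST_QUALITY = ' ';
      const std::size_t NB_QUALITIES = 64;

      // Maps a quality character to a table index, clamping out-of-range
      // characters so a malformed quality string cannot index past the table.
      std::size_t quality_index(char q) {
          if (q < FIRST_QUALITY) return 0;
          std::size_t idx = static_cast<std::size_t>(q - FIRST_QUALITY);
          return idx < NB_QUALITIES ? idx : NB_QUALITIES - 1;
      }

      // Counts how many times each quality occurs in a quality string.
      void count_qualities(const std::string& qualities,
                           std::size_t (&counts)[NB_QUALITIES]) {
          for (std::size_t i = 0; i < NB_QUALITIES; ++i) counts[i] = 0;
          for (char q : qualities) ++counts[quality_index(q)];
      }
      ```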
    • read_score.{h,cpp}: Don't allocate memory for qualities each time · 08a9fcc4
      Mikaël Salson authored
      The array always has the same length.
      A static version is enough and avoids allocating/freeing at each call.
    • read_score.cpp: Make sure quality is ok. · 06cfbd27
      Mikaël Salson authored
    • core/representative: Use one index per seed. · 345f0f6a
      Mikaël Salson authored
      When the seeds are mixed we may have false positive hits
      which will make us extend a representative when we should not.
    • core/representative: The required sequence is now mandatory. · 6def2c4c
      Mikaël Salson authored
      With the previous implementation we could find a representative without
      the required sequence serving as an anchor. Now it is more difficult
      and therefore it becomes mandatory. This is not a problem as in
      practice we always have a required sequence (the window).
    • core/representative: The contiguous seed can be given as a parameter · 8f6f8bd1
      Mikaël Salson authored
      We may not want to hard-code the contiguous seed (particularly for the tests).
      We may also want to provide the spaced seeds, but it is not possible at the
      moment. We will see if it appears to be useful.
    • core/representative: The cover length is the most important. · dcc7acea
      Mikaël Salson authored
      What matters is how many positions are covered by seeds.
      Therefore the main criterion to determine the best representative
      is the highest cover length. In case of a tie, the length of the
      representative will be taken into account (is it really reasonable to
      take the longest one in that case, as we will take the noisiest?)
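
      The ordering described above can be sketched as a simple comparison.
      The Candidate type and the is_better name are hypothetical; only the
      rule (cover length first, sequence length on a tie) comes from this
      commit.

      ```cpp
      #include <cstddef>
      #include <string>

      // Hypothetical candidate summary; field names are illustrative.
      struct Candidate {
          std::string sequence;
          std::size_t cover_length;  // number of positions covered by seeds
      };

      // True when a should be preferred over b: larger cover length first,
      // then, on a tie, the longer sequence.
      bool is_better(const Candidate& a, const Candidate& b) {
          if (a.cover_length != b.cover_length) {
              return a.cover_length > b.cover_length;
          }
          return a.sequence.size() > b.sequence.size();
      }
      ```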
    • core/representative: Don't break the loop too early. · e6d67207
      Mikaël Salson authored
      This probably had a good reason to be there. But now this would make us
      leave the loop too early without considering the other sequences.
    • core/representative: Do not rely anymore on k · 34129b02
      Mikaël Salson authored
      k was the length of the contiguous seed. As we now have several seeds
      this does not make sense anymore.
      We now rely on the seed lengths.
      Also, when extending the representative to the right we make sure that
      the last position of the seed is on the position to be extended. This is
      done to make sure that extending to the left or to the right are
      equivalent (which was not the case in the previous implementation with
      several seeds).
    • core/read_storage, core/window: Average read length · 5bf47aae
      Mikaël Salson authored
      The average read length was obtained through getAverageScore() which is
      not very robust as nothing guarantees the score to be the read length.
      There is now a method in ReadStorage to get the average read length
      which can be called from the outside.
    • core/representative: New optional parameter: try_hard · aaa970f2
      Mikaël Salson authored
      When set, we will try hard to find a representative.
      By default it is not set. The KmerRepresentativeComputer will
      automatically try hard if the representative found by default is not
      long enough (< THRESHOLD_BAD_COVERAGE).
    • tools: Remove Ns at the end of a sequence · 2099c49f
      Mikaël Salson authored
      We remove the longest prefix ending with an N (resp. the longest suffix
      starting with an N) whose N content is greater than or equal to
      This is particularly useful for the representative: because of spaced
      seeds we may have N in the representative, especially on the ends of the
      representative (when only one seed matched at the extremity of the
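
      A minimal sketch of such a trimming, under assumptions: the threshold
      is not given in this message, so it is a parameter (min_n_ratio) here,
      "N content" is taken to mean the fraction of N in the prefix or
      suffix, and the trim_n_ends name is hypothetical.

      ```cpp
      #include <cstddef>
      #include <string>

      // Removes the longest prefix ending with 'N', and the longest suffix
      // starting with 'N', whose fraction of 'N' is >= min_n_ratio.
      std::string trim_n_ends(const std::string& seq, double min_n_ratio) {
          std::size_t start = 0;
          std::size_t n_count = 0;
          for (std::size_t i = 0; i < seq.size(); ++i) {
              if (seq[i] != 'N') continue;
              ++n_count;
              // Candidate prefix is seq[0..i], of length i + 1.
              if (n_count >= min_n_ratio * (i + 1)) start = i + 1;
          }
          std::size_t end = seq.size();
          n_count = 0;
          // Scan from the right, never crossing into the removed prefix.
          for (std::size_t i = seq.size(); i > start; --i) {
              if (seq[i - 1] != 'N') continue;
              ++n_count;
              // Candidate suffix is seq[i-1..], of length size - i + 1.
              if (n_count >= min_n_ratio * (seq.size() - i + 1)) end = i - 1;
          }
          return seq.substr(start, end - start);
      }
      ```

      An isolated N in the middle of the sequence is kept, since no prefix
      or suffix ending on it reaches the required N fraction.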
    • representative.cpp: Use multiple seeds to compute representative · 8d340098
      Mikaël Salson authored
      Since spaced seeds may be used, we need to compute the positions that
      are covered by actual letters (#). Some positions may be missing in
      which case we will put an N in the representative.
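
      The idea can be sketched as follows, assuming seeds are written as
      patterns where '#' marks a position that is actually read and any
      other character a position that is ignored. SeedHit and
      covered_sequence are hypothetical names, not the code of
      representative.cpp.

      ```cpp
      #include <cstddef>
      #include <string>
      #include <vector>

      // A seed hit: the seed pattern and the position in the read where the
      // seed starts. '#' marks a position that matters.
      struct SeedHit {
          std::string seed;
          std::size_t position;
      };

      // Rebuilds the sequence from the read, keeping letters only at positions
      // covered by a '#' of at least one hit and writing 'N' elsewhere.
      std::string covered_sequence(const std::string& read,
                                   const std::vector<SeedHit>& hits) {
          std::string result(read.size(), 'N');
          for (const SeedHit& hit : hits) {
              for (std::size_t j = 0; j < hit.seed.size(); ++j) {
                  std::size_t pos = hit.position + j;
                  if (pos >= read.size()) break;
                  if (hit.seed[j] == '#') result[pos] = read[pos];
              }
          }
          return result;
      }
      ```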
    • kmerstore.h: Inserting with a seed. · 83ff2886
      Mikaël Salson authored
      Allow a seed to be provided when inserting into a
      KmerStore. This will be particularly suitable for an index that uses
      different seeds depending on the sequences.
    • IKmerStore: getResults() depending on seed · daffb14f
      Mikaël Salson authored
      We can provide a seed to the getResults() method to have results
      depending on a seed. Therefore multiple seeds can be used to index.
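
      A toy illustration of indexing and querying per seed. This is not the
      IKmerStore interface: MultiSeedIndex and its methods are hypothetical,
      and seeds are assumed to be patterns where '#' marks a position that
      is read.

      ```cpp
      #include <cstddef>
      #include <map>
      #include <string>

      // Toy index keyed first by seed, then by the k-mer read through that seed.
      class MultiSeedIndex {
      public:
          // Indexes every window of `sequence` as seen through `seed`.
          void insert(const std::string& sequence, const std::string& seed) {
              if (seed.empty() || sequence.size() < seed.size()) return;
              for (std::size_t i = 0; i + seed.size() <= sequence.size(); ++i) {
                  ++counts_[seed][spaced_kmer(sequence, i, seed)];
              }
          }

          // Number of indexed windows matching the window of `sequence` that
          // starts at `pos`, looking only at what was indexed with `seed`.
          std::size_t count(const std::string& sequence, std::size_t pos,
                            const std::string& seed) const {
              if (seed.empty() || pos + seed.size() > sequence.size()) return 0;
              auto by_seed = counts_.find(seed);
              if (by_seed == counts_.end()) return 0;
              auto hit = by_seed->second.find(spaced_kmer(sequence, pos, seed));
              return hit == by_seed->second.end() ? 0 : hit->second;
          }

      private:
          // Keeps only the letters under a '#' of the seed.
          static std::string spaced_kmer(const std::string& s, std::size_t pos,
                                         const std::string& seed) {
              std::string kmer;
              for (std::size_t j = 0; j < seed.size(); ++j) {
                  if (seed[j] == '#') kmer += s[pos + j];
              }
              return kmer;
          }

          std::map<std::string, std::map<std::string, std::size_t>> counts_;
      };
      ```

      Keeping one table per seed means a query made with one seed never
      matches k-mers that were indexed with another one.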