    This is THE big commit for GATB-core of November 2015. Changelog: · 223f9743
    Rayan Chikhi authored
    - Graph simplifications have moved from Minia to GATB-core. Call them using graph.simplify().
    - New data structure for faster neighborhood queries. Enable it using graph.precomputeAdjacency().
    - As a consequence of these optimizations, if you only need degrees, it is now much faster to call indegree(node),
    outdegree(node) and degree(node, size_t &in, size_t &out) than neighbors(node, direction).size().
    - The LargeInt constructor (LargeInt is the k-mer type) has been removed for speed reasons. Be warned that this may
    break existing code that implicitly relies on k-mers being 0-initialized; such problems can hopefully be detected using valgrind.
    - Graph becomes GraphTemplate<Node,Edge,GraphDataVariant>; compatibility is preserved via typedefs.
    - This makes it possible to define GraphFast<span>, a graph object that only holds Nodes and Edges for a single k-mer size,
    as opposed to the boost::variant of multiple k-mer sizes used before. It is faster.
    - The Graph API has been changed:
      neighbors<Node> becomes neighbors,
      neighbors<Edge> becomes neighborsEdge,
      iterator<Node> becomes iterator,
      iterator<BranchingNode> becomes iteratorBranching.
    - The change above was necessary because it is difficult to specialize nested templates (member templates of a class template)
    in C++. Not all templated graph functions have been un-templated, though, since some aren't used in conjunction with GraphFast.
    There is still work to do.
    - Because Graph is now a template, the following classes have also been templatized: BranchingAlgorithm, all Frontlines,
    and all Terminators. Typedefs have been created to preserve compatibility.
    - For speed, tools are now advised to follow Minia.cpp's functor technique and use GraphFast<span> instead of Graph. However,
    Graph should still work and offer the same performance as before.
    - GraphData is moved from Graph.cpp to Graph.hpp
    - MPHF index of a node is now cached in the Node object
    - because of that (the graph may write the cached index into the node), 'const Node&' should now be just 'Node&' everywhere.
    - added a function graph.disableNodeState() to disable recording of node state (normal, deleted, marked).
    The graph then avoids making MPHF queries when checking whether a node exists (a check also involved in neighbors() queries).
    This makes the Bloom flavor of graphs faster, but once precomputeAdjacency() is called, it no longer matters.
    - added scripts/parse_gcc_output.py for visual inspection of gcc compilation/link errors involving graph templates
    - slightly modified src/CMakeLists so that tools may set their own KSIZE_DEFAULT_LIST (e.g. Minia)
    - speedup to LargeInt::hash1()
    - a few unit tests have been added, and one benchmark (bench_graph.cpp) has been significantly improved
    - added minimizer code that was missing from the last commit (some bugfixes, plus a specialization for LargeInt<1>)
    - LargeInts are no longer instances of ArrayData; instead, ArrayData is a member. This is faster.
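The precomputeAdjacency() and degree-query entries can be illustrated with a toy model. This is a minimal sketch, not GATB-core's actual data structure: it assumes a hypothetical layout where each node owns one byte, four bits for outgoing and four for incoming nucleotide extensions, so a degree is a popcount rather than a vector of neighbor objects. ToyAdjacency and its methods are illustrative names only.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout (assumption, not GATB-core's format): one byte per node,
// bits 0-3 = outgoing extensions by A,C,G,T; bits 4-7 = incoming extensions by A,C,G,T.
struct ToyAdjacency {
    std::vector<std::uint8_t> mask;  // indexed by a node id

    explicit ToyAdjacency(std::size_t nbNodes) : mask(nbNodes, 0) {}

    void addOut(std::size_t node, int nt) { assert(nt >= 0 && nt < 4); mask[node] |= std::uint8_t(1u << nt); }
    void addIn(std::size_t node, int nt)  { assert(nt >= 0 && nt < 4); mask[node] |= std::uint8_t(1u << (4 + nt)); }

    // Degree queries: a popcount on four bits, no neighbor objects are created.
    std::size_t outdegree(std::size_t node) const { return std::bitset<4>(mask[node] & 0x0F).count(); }
    std::size_t indegree(std::size_t node) const  { return std::bitset<4>(mask[node] >> 4).count(); }
    void degree(std::size_t node, std::size_t& in, std::size_t& out) const {
        in = indegree(node);
        out = outdegree(node);
    }

    // A neighbors()-style query materializes a container; calling .size() on it
    // gives the same number as outdegree(), at the cost of building the vector.
    std::vector<int> outNeighborNucleotides(std::size_t node) const {
        std::vector<int> result;
        for (int nt = 0; nt < 4; ++nt)
            if (mask[node] & (1u << nt)) result.push_back(nt);
        return result;
    }
};
```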
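To illustrate what removing the LargeInt constructor implies, the sketch below uses a hypothetical ToyLargeInt aggregate (not GATB-core's LargeInt) with no user-provided constructor. Default-initialization leaves its words indeterminate, while value-initialization with {} zeroes them; the latter is the explicit form to use wherever code relied on the old 0-initialization.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a k-mer integer with no user-provided constructor.
template <int precision>
struct ToyLargeInt {
    std::uint64_t value[precision];  // left uninitialized by default-initialization
};

// With no constructor the type stays trivially default-constructible, so
// allocating many k-mers does not trigger a zero-filling pass.
static_assert(std::is_trivially_default_constructible<ToyLargeInt<2>>::value,
              "no constructor runs on default-initialization");

template <int precision>
ToyLargeInt<precision> zeroKmer() {
    ToyLargeInt<precision> k{};  // value-initialization: every word is set to 0
    return k;
}

// ToyLargeInt<2> bad;  // default-initialization: words are indeterminate and
//                      // reading them is undefined behavior; valgrind's memcheck
//                      // can flag decisions that depend on such values.
```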
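The GraphTemplate entry can be sketched as one class template parameterized by node, edge and data types, a compatibility typedef for the variant-based graph, and a per-span alias analogous to GraphFast<span>. All Toy* names are hypothetical; std::variant (C++17) is swapped in for the boost::variant mentioned in the changelog, and the std::visit call is meant to show the kind of run-time dispatch that a fixed-span graph avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>

// Hypothetical k-mer storage: span = number of 64-bit words.
template <std::size_t span> struct ToyKmer { std::uint64_t words[span]; };

// Variant flavor: one node type able to hold any of several k-mer sizes.
using ToyKmerVariant = std::variant<ToyKmer<1>, ToyKmer<2>, ToyKmer<4>>;

template <typename Kmer> struct ToyNode { Kmer kmer; };
template <typename Kmer> struct ToyEdge { ToyNode<Kmer> from, to; };

// One class template parameterized by node, edge and graph-data types.
template <typename Node, typename Edge, typename GraphData>
class ToyGraphTemplate {
public:
    std::size_t kmerWords(const Node& n) const { return wordsOf(n.kmer); }
private:
    // Fixed span: resolved at compile time.
    template <std::size_t span>
    static std::size_t wordsOf(const ToyKmer<span>&) { return span; }
    // Variant: resolved at run time on every access.
    static std::size_t wordsOf(const ToyKmerVariant& v) {
        return std::visit([](const auto& k) { return wordsOf(k); }, v);
    }
};

struct ToyVariantData {};
template <std::size_t span> struct ToyFixedData {};

// Compatibility typedef: code that names ToyGraph keeps compiling.
typedef ToyGraphTemplate<ToyNode<ToyKmerVariant>, ToyEdge<ToyKmerVariant>, ToyVariantData> ToyGraph;

// Fixed-span graph: nodes and edges hold exactly one k-mer size.
template <std::size_t span>
using ToyGraphFast = ToyGraphTemplate<ToyNode<ToyKmer<span>>, ToyEdge<ToyKmer<span>>, ToyFixedData<span>>;
```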
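The renaming of neighbors<Edge> to neighborsEdge is consistent with a C++ rule: a member function template of a class template cannot be explicitly specialized unless the enclosing class template is explicitly specialized too, and overloads cannot differ only by return type. The hypothetical ToyGraphT below therefore uses two plain member functions with distinct names; its neighbor logic is a placeholder (a linear chain of ids), not a de Bruijn graph.

```cpp
#include <cassert>
#include <vector>

struct ToyNodeA { int id; };
struct ToyEdgeA { int from, to; };

template <typename Node, typename Edge>
struct ToyGraphT {
    // Old style (sketch): template <typename Item> std::vector<Item> neighbors(Node n);
    // Giving neighbors<Edge> its own body would require explicitly specializing the
    // member template inside the still-generic ToyGraphT<Node, Edge>, which C++ forbids.

    // New style: two non-template member functions. They need distinct names
    // because they would otherwise differ only in their return type.
    std::vector<Node> neighbors(Node n) const { return {Node{n.id + 1}}; }
    std::vector<Edge> neighborsEdge(Node n) const { return {Edge{n.id, n.id + 1}}; }
};
```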
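Minia.cpp's functor technique is not spelled out in the changelog, so the following is only a sketch of the general pattern under stated assumptions: a functor templated on span, and a dispatcher that maps a run-time k-mer size to one of a fixed set of compiled spans. Here span counts 64-bit words and the compiled set is 1 to 4 words; both choices, and the names ToyFunctor and dispatchOnKmerSize, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical per-span work. In a real tool this is where a fixed-span graph
// (in the spirit of GraphFast<span>) would be built and used.
template <std::size_t span>
struct ToyFunctor {
    std::string operator()(std::size_t kmerSize) const {
        return "k=" + std::to_string(kmerSize) + " uses span " + std::to_string(span);
    }
};

// Hypothetical dispatcher: pick the smallest compiled span whose 64-bit words
// hold 2*k bits (2 bits per nucleotide). The list 1..4 is an assumed compile-time choice.
template <template <std::size_t> class Functor>
auto dispatchOnKmerSize(std::size_t k) {
    std::size_t words = (2 * k + 63) / 64;
    if (words <= 1) return Functor<1>()(k);
    if (words <= 2) return Functor<2>()(k);
    if (words <= 3) return Functor<3>()(k);
    if (words <= 4) return Functor<4>()(k);
    throw std::invalid_argument("k-mer size not compiled in");
}
```

Each Functor<span> is a separate instantiation, so code inside operator() can work with fully static types and pays for the size dispatch once per run rather than on every k-mer operation.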
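The entries on caching the MPHF index in the Node and dropping 'const Node&' can be seen in a small model: the first lookup writes the index into the node, so the function must receive a mutable reference. ToyCachedNode, ToyIndex and nodeIndex are hypothetical, and a std::unordered_map stands in for the minimal perfect hash function purely to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical node that caches its index; UINT64_MAX means "not computed yet".
struct ToyCachedNode {
    std::uint64_t kmer;
    std::uint64_t mphfIndex = UINT64_MAX;
};

// Stand-in for an MPHF (a hash map, not a real MPHF); counts how often it is queried.
struct ToyIndex {
    std::unordered_map<std::uint64_t, std::uint64_t> table;
    mutable int lookups = 0;
    std::uint64_t query(std::uint64_t kmer) const { ++lookups; return table.at(kmer); }
};

// Takes Node& rather than const Node&: it may write the cached index into the node.
std::uint64_t nodeIndex(const ToyIndex& idx, ToyCachedNode& node) {
    if (node.mphfIndex == UINT64_MAX) node.mphfIndex = idx.query(node.kmer);
    return node.mphfIndex;
}
```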