Simpler merge
This branch simplifies the merge and merge-dl programs. In particular, it uses a much simpler algorithm to efficiently transpose a large sparse (sub)matrix. This uses much less memory on large cases: for RSA-829, peak memory usage drops from 1400 GB to 950 GB. It is generally not much slower, and is usually faster with hyperthreading.
Garbage collection is also simplified: it now runs after each pass, replacing the previous complicated and fragile tuning.