How to speed-up purge_bucket?
Imported issue: Initially reported by @gaudry in https://gforge.inria.fr/tracker/?group_id=2065&aid=18394
In las, a significant amount of time is spent in purge_bucket(). This is not a surprise.
The critical loop, that can be found at the end of the file bucket.c, contains two things:
1- get the position of an update, and check whether it is a survivor;
2- decode the prime_hint
Of course, we really decode only if we have a survivor, however, due to a wrapping
effect, we have to do a few operations for each prime_hint (basically a conditional
addition of a constant to a variable).
In the current code, it is very likely that these operations on hints are ""for free"", since
the other part of the critical loop, there is a random memory access in the S array which
hardly fits in the L1 cache (bucket-region is 2^16, hardcoded in a rather strong way in the
code).
Idea for improvement: make the array S small enough so that it fits in L1, and remove
the hint computation outside the critical loop. This might be possible, by using a hint that
would be an index in the piece of factor base that is of interest. If there is no wrapping
effect, then the decoding of the hint could be done while trial dividing, so that the memory
access to the factor base is masked by the heavy arithmetic cost in this part of the code.
Having pieces of fb of length bounded by 2^16 might creates some other problems somewhere
else and it is unclear if this could save something in the end...