las uses both OpenMP and pthreads, which causes several issues.
On 33815b67, the clang build is in some cases more than twice as slow as the gcc build, while in other situations both are equally bad. For stupid reasons.
```shell
eval $(make show)
if ! [ -f /tmp/c120.roots.gz ] ; then
    $build_tree/sieve/makefb -poly parameters/polynomials/c120.poly \
        -lim 5500000 -maxbits 12 -out /tmp/c120.roots.gz -t 4
fi
```
With a gcc-9.3.0 build and a clang-9.0.1 build respectively, I get (on my home machine):
```shell
localhost $ ./build/localhost/sieve/las -poly parameters/polynomials/c120.poly -I 12 -q0 4000000 -q1 4001000 -lim0 3000000 -lim1 5500000 -lpb0 27 -lpb1 27 -mfb0 54 -mfb1 54 -ncurves0 14 -ncurves1 19 -fb1 /tmp/c120.roots.gz -t auto -production | tail -n 1
# Total 3533 reports [0.00393s/r, 49.8r/sq] in 3.63 elapsed s [382.1% CPU]
localhost $ ./build/localhost.clang/sieve/las -poly parameters/polynomials/c120.poly -I 12 -q0 4000000 -q1 4001000 -lim0 3000000 -lim1 5500000 -lpb0 27 -lpb1 27 -mfb0 54 -mfb1 54 -ncurves0 14 -ncurves1 19 -fb1 /tmp/c120.roots.gz -t auto -production | tail -n 1
# Total 3533 reports [0.00783s/r, 49.8r/sq] in 8.7 elapsed s [317.9% CPU]
```
Another test is on grvingt. Here, things go really badly: in both cases there is a long wait at the beginning of the computation, between the lines `# Reading side-1 factor base took 0.1s (0.1s real)` and `# polynomial has no roots for xxx of the yyy primes that were tried`.
The culprit is the mix of pthreads (or other kinds of tailor-made threads) and OpenMP threading in las.
las is programmed to use the machine fully (with `-t auto` at least), and at any rate this is the usage we have in mind. There are (to my knowledge) at least two places that las reaches as "utility" code, and that use OpenMP (while las proper does not):
- the `mpz_poly` layer in `utils/mpz_poly.cpp`
- the product tree code in `sieve/ecm/batch.cpp`
Unfortunately, the OpenMP runtime eagerly spawns as many threads as it sees fit, and those threads seem to keep taxing the CPU continuously, leading to very inefficient code. YMMV, which is why, in the test on my home machine above, the gcc build appears not to be affected. In some cases, however, we pay a very high price.
There are several possible runtime workarounds.
- run with `OMP_NUM_THREADS=1`; it is probably fine to do so, at least as far as the `utils/mpz_poly.cpp` code is concerned. While it is useful to have it OpenMP'ed in certain cases, that is not the case with las. The situation with the batch code is a bit different, and I'm not sure what we should do there.
- run with `OMP_DYNAMIC=true`; it may or may not be a good idea, but I really don't like it. Results are not deterministic, and what the OpenMP runtime decides to do is bound to be based on heuristics that we cannot control.
However, I think that we should rather fix this in the code.