We use [this commit of
cado-nfs](https://gitlab.inria.fr/cado-nfs/cado-nfs/commit/8a72ccdde) as
baseline. In some cases we provide exact commit numbers for specific
commands as well.
We also repeat this important paragraph from the RSA-240 documentation.

Most (if not all) information boxes in this document rely on two shell
variables, `CADO_BUILD` and `DATA`, being set and `export`-ed to shell
subprocesses (as with `export CADO_BUILD=/blah/... ; export
DATA=/foo/...`). The `CADO_BUILD` variable is assumed to be the path to
a successful cado-nfs build directory. The `DATA` variable, which is
used by some scripts, should point to a directory with plenty of storage,
possibly on some shared filesystem. Storage is also needed for the
temporary files that hold the collected relations. Overall, a full
reproduction of the computation would need around 10TB of storage. All
scripts provided here expect to be run from the directory where they are
placed, since they also try to access companion data files.
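
For instance (the paths below are placeholders to adapt to the local
setup, not the ones used in the original computation):

```
export CADO_BUILD=$HOME/cado-nfs/build   # path to a successful cado-nfs build directory (placeholder)
export DATA=/scratch/dlp240              # large storage area, ~10TB for a full reproduction (placeholder)
```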
## Searching for a polynomial pair
We searched for a polynomial pair "à la Joux-Lercier", using the
Additional sample-sieving was performed on the few best candidates. With
a test on 128,000 special-q, 3 polynomials could not be separated.
We ended up using the following [`dlp240.poly`](dlp240.poly):
```
cat > dlp240.poly <<EOF
```
In production there is no need to activate the on-the-fly duplicate
removal, which is supposed to be cheap but maybe not negligible. It is
important to emulate the fact that the cached factor base (the `/tmp/fbc`
file) is precomputed and hot (i.e., cached in memory by the OS and/or the
hard-drive), because this is the situation in production; for this, it
suffices to start a first run and interrupt it as soon as the cache is
written. Of course, we use the batch smoothness detection on side 1 and
we have to precompute the product of all primes to be extracted. This
means that on the other hand, the file `dlp240.fb1.gz` is _not_ needed in
production.
```
$CADO_BUILD/sieve/ecm/precompbatch -poly dlp240.poly -lim1 0 -lim0 536870912 -batch0 dlp240.batch0 -batch1 dlp240.batch1 -batchlpb0 29 -batchlpb1 28
```

We obtain about 2430 core-years for the total sieving time.
## Estimating linear algebra time (coarsely)
After the fact, we know the matrix size for DLP-240 (about 36M, density
250 per row). The good thing is that it is not too far from the matrix
size predicted above (37M). It is important to know that the basic
characteristics of the matrix that can be expected from filtering are
sufficient to give a rough idea of the computational cost of linear
algebra. Thanks to the filtering simulations, these timings may be
obtained ahead of time.
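
To give an idea of why these characteristics suffice (the following
reasoning and arithmetic are ours, not part of the original write-up):
the per-iteration cost of block Wiedemann is dominated by the sparse
matrix-times-vector products, whose work grows with the number of
non-zero coefficients of the matrix, here roughly:

```
echo '37000000*250' | bc    # about 9.25e9 non-zero coefficients per product
```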
To determine ahead of time the linear algebra time for a sparse
matrix with (say) 37M rows/columns and 250 non-zero entries per row, it
is possible to _stage_ a real set-up, just for the purpose of
measurement. cado-nfs has a useful _staging_ mode precisely for that
purpose. In the RSA-240 context, we advise against its use because of
bugs, but these bugs seem to be less of a hurdle in the DLP-240 case. The
only weirdness is that the random distribution of the generated matrices
seems to be inconsistent with the requested density (as of
[8a72ccdde](https://gitlab.inria.fr/cado-nfs/cado-nfs/commit/8a72ccdde)
at least). To obtain sample timings, one can run the command:
```
DATA=$DATA CADO_BUILD=$CADO_BUILD MPI=$MPI nrows=37000000 density=$((250*4/5)) nthreads=32 ./dlp240-linalg-0a-estimate_linalg_time_coarse_method_a.sh
```
(where the multiplication by 4/5 is here to "fix" the distribution
issue). This reports an anticipated time of about 2.65 seconds per
iteration (running on 4 nodes of the `grvingt` cluster).
(where the multiplication by 4/5 is here to "fix" the issue with the
random distribution). This reports an anticipated time of about 2.65
seconds per iteration (running on 4 nodes of the `grvingt` cluster).
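
For clarity (our note, not part of the original text): the `density`
argument above is evaluated by the shell before the script runs, i.e.

```
echo $((250*4/5))    # prints 200, the density actually passed to the staging run
```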
To obtain timings in a different way, the following procedure can also be
used, maybe as a complement to the above, to generate a complete fake matrix of the required size with the
[`generate_random_matrix.sh`](generate_random_matrix.sh) script (which
takes well over an hour), and measure the time for 128 iterations (which
```
DATA=$DATA CADO_BUILD=$CADO_BUILD MPI=$MPI nrows=37000000 density=250 nthreads=32 ./dlp240-linalg-0a-estimate_linalg_time_coarse_method_b.sh
```
This second method reports about 3.1 seconds per iteration. Allowing for some
inaccuracy, these experiments are sufficient to build confidence that the
time per iteration in the krylov (a.k.a. "sequence") step of block
Wiedemann is close to 3 seconds per iteration.
The time per iteration in the mksol (a.k.a. "evaluation") step
is in the same ballpark. The time for krylov+mksol can then be estimated
as the product of this timing with `(1+n/m+1/n)*N`, with `N` the
number of rows, and `m` and `n` the block Wiedemann parameters (we chose
`m=48` and `n=16`). Applied to our use case, this gives an anticipated
cost of `(1+n/m+1/n)*N*3*4*32/3600/24/365=628` core-years for
Krylov+Mksol (4 and 32 representing the fact that we used 4-node jobs
with 32 physical cores per node).
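
As a quick sanity check of this arithmetic (our own verification, with
the parameters quoted above: `m=48`, `n=16`, `N=37M`, about 3 seconds per
iteration on 4 nodes of 32 cores each), the following one-liner recovers
the quoted figure:

```
echo '(1+16/48+1/16)*37000000*3*4*32/3600/24/365' | bc -l    # about 628 core-years
```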
The "lingen" (linear generator) step of
block Wiedemann was perceived as he main potential stumbling block for
the computation. We had to ensure that it would be doable with the
resources we had. To this end, a "tuning" of the lingen program can be
done with the `--tune` flag, so as to get an advance look at the cpu and
memory requirements for that step. These tests were sufficient to
convince us that we had several possible parameter settings to choose
from, and that this computation was doable.
## Validating the claimed sieving results
## Reproducing the filtering results
All relation files collected during sieving were collated into a
manageable number of large files (150 files of 3.2GB each). These had to
undergo filtering in order to produce a linear system. The process is as
follows.
The filtering follows the same general workflow as in the [rsa-240
case](../rsa240/filtering.md), with some notable changes:
- important companion files must be generated beforehand with
```
$CADO_BUILD/numbertheory/badideals -poly dlp240.poly -ell 62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651 -badidealinfo $DATA/dlp240.badidealinfo -badideals $DATA/dlp240.badideals
```
- the flags `-dl -badidealinfo $DATA/dlp240.badidealinfo` must be added to the `dup2` command line.
- the `merge` and `replay` programs must be replaced by `merge-dl` and
`replay-dl`
- the `replay-dl` command line lists an extra output file
`dlp240.ideals` that is extremely important for the rest of the
computation.
Several filtering experiments were done during the sieving phase.
The final one can be reproduced as follows, with revision `492b804fc`:
```
$CADO_BUILD/filter/purge -out $DATA/purged7.gz -nrels 2380725637 -outdel $DATA/relsdel7.gz -keep 3 -col-min-index 0 -col-max-index 2960421140 -t 56 -required_excess 0.0 files
```
where `files` is the list of files with unique relations (output of `dup2`).
This took about 7.5 hours on the machine wurst, with 575GB of peak memory.
The merge step can be reproduced as follows:
```
$CADO_BUILD/filter/merge-dl -mat $DATA/purged7.gz -out $DATA/history250_7 -target_density 250 -skip 0 -t 28
```
and took about 20 minutes on the machine wurst, with a peak memory of 118GB.
Finally, the replay step can be reproduced as follows:
```
$CADO_BUILD/filter/replay-dl -purged $DATA/purged7.gz -his $DATA/history250_7.gz -out $DATA/dlp240.matrix7.250.bin -index $DATA/dlp240.index7.gz -ideals $DATA/dlp240.ideals7.gz
```
## Estimating linear algebra time more precisely, and choosing parameters