Commit 8e6ec640 authored by ZIMMERMANN Paul

another pass of proof-reading

parent d5fff7f9
@@ -13,10 +13,10 @@ There are several subsections.
* [Validating the claimed sieving results](#validating-the-claimed-sieving-results)
* [Reproducing the filtering results](#reproducing-the-filtering-results)
* [Duplicate removal](#duplicate-removal)
* [The "purge" step, a.k.a. singleton and "clique" removal.](#the-purge-step-aka-singleton-and-clique-removal)
* [The "purge" step, a.k.a. singleton and "clique" removal](#the-purge-step-aka-singleton-and-clique-removal)
* [The "merge" step](#the-merge-step)
* [The "replay" step](#the-replay-step)
-* [Computing the right-hand side.](#computing-the-right-hand-side)
+* [Computing the right-hand side](#computing-the-right-hand-side)
* [Estimating the linear algebra time more precisely, and choosing parameters](#estimating-the-linear-algebra-time-more-precisely-and-choosing-parameters)
* [Reproducing the linear algebra results](#reproducing-the-linear-algebra-results)
* [Back-substituting the linear algebra result in collected relations](#back-substituting-the-linear-algebra-result-in-collected-relations)
@@ -25,10 +25,7 @@ There are several subsections.
## Software prerequisites, and reference hardware configuration
This is similar to
-[RSA-240](../rsa240/README.md#software-prerequisites-and-reference-hardware-configuration). For full reproducibility of the
-computation, 10TB of data is perhaps a bit small; 20TB would be a more
-comfortable setup.
+[RSA-240](../rsa240/README.md#software-prerequisites-and-reference-hardware-configuration).
As in the RSA-240 case, this documentation relies on [commit 8a72ccdde of
cado-nfs](https://gitlab.inria.fr/cado-nfs/cado-nfs/commit/8a72ccdde) as
baseline. In some cases we provide exact commit numbers for specific
@@ -44,7 +41,7 @@ used by some scripts, should point to a directory with plenty of storage,
possibly on some shared filesystem. Storage is also needed to store the
temporary files with collected relations. Overall, a full reproduction of
the computation would need in the vicinity of 20TB of storage. All
-scripts provided in this script expect to be run from the directory where
+scripts provided in this document expect to be run from the directory where
they are placed, since they are also trying to access companion data
files.
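For instance, a minimal Python sketch along the following lines (assuming the `DATA` and `CADO_BUILD` environment variables used throughout this document, and the ~20TB storage figure quoted above) can be used to sanity-check the setup before launching anything long-running:

```python
import os
import shutil

# Environment variables used by the scripts in this document (assumption:
# they are exported in the shell that runs the scripts).
for var in ("DATA", "CADO_BUILD"):
    path = os.environ.get(var)
    if path is None or not os.path.isdir(path):
        raise SystemExit(f"{var} is not set or does not point to a directory")

# A full reproduction needs on the order of 20TB of storage under $DATA.
free_tb = shutil.disk_usage(os.environ["DATA"]).free / 1e12
print(f"free space under $DATA: {free_tb:.1f} TB")
if free_tb < 20:
    print("warning: less than ~20TB free; a full reproduction may not fit")
```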
@@ -95,7 +92,7 @@ In particular, the best ranked polynomial pair according to MurphyE finds
10% fewer relations than the actual best-performing ones.
Additional sample sieving was performed on the few best candidates. With
-a test on 128,000 special-q, 3 polynomials could not be separated.
+a test on 128,000 special-q's, 3 polynomials could not be separated.
We ended up using the following [`dlp240.poly`](dlp240.poly):
@@ -132,7 +129,7 @@ $CADO_BUILD/sieve/makefb -poly dlp240.poly -side 1 -lim 268435456 -maxbits 16 -t
These files have size 209219374 and 103814592 bytes, respectively. They
take less than a minute to compute.
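As a quick reproducibility check, the byte sizes can be verified with a short sketch like the one below; the file names `dlp240.roots0.gz` and `dlp240.roots1.gz` are assumptions (the `-out` arguments of the two `makefb` runs are not shown here), so substitute the actual output names:

```python
import os

DATA = os.environ["DATA"]
# Assumed output names of the two makefb runs (one per side); adjust if needed.
roots_files = ["dlp240.roots0.gz", "dlp240.roots1.gz"]

sizes = {f: os.path.getsize(os.path.join(DATA, f)) for f in roots_files}
print(sizes)
# The two files should have sizes 209219374 and 103814592 bytes (in some order).
assert sorted(sizes.values()) == sorted([209219374, 103814592])
```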
-We can now sieve for randomly sampled special-q, removing duplicate
+We can now sieve for randomly sampled special-q's, removing duplicate
relations on the fly. In the output of the command line below, only the
number of unique relations per special-q matters. The timing does not
matter.
@@ -150,9 +147,9 @@ order to vary the random choices.)
In order to deduce an estimate of the total number of (de-duplicated)
relations, we need to multiply the average number of relations per
special-q obtained during the sample sieving by the number of
-special-q in the global q-range. This number of composite special-q can
+special-q's in the global q-range. This number of composite special-q's can
be computed exactly or estimated using the logarithmic integral function.
-For the target interval, there are 3.67e9 special-q.
+For the target interval, there are 3.67e9 special-q's.
```python
# [sage]
@@ -162,9 +159,9 @@ tot_rels = ave_rel_per_sq * number_of_sq
print (tot_rels)
```
This estimate (2.2G relations) can be made more precise by increasing the
-number of special-q that are sampled for sieving. It is also possible to
+number of special-q's that are sampled for sieving. It is also possible to
have different nodes sample different sub-ranges of the global range to
-get the result more quickly. We consider sampling 1024 special-qs to be
+get the result more quickly. We consider sampling 1024 special-q's to be
enough to get a reliable estimate.
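To make the arithmetic concrete, here is a self-contained version of the extrapolation above. The per-special-q average of 0.6 is only an illustrative value, chosen so that, together with the 3.67e9 special-q count, it reproduces the 2.2G total quoted above:

```python
# Plain Python sketch of the extrapolation described above.
number_of_sq = 3.67e9   # composite special-q's in the global q-range

# Illustrative value: with ~0.6 unique relations per sampled special-q
# (as measured over a sample of, say, 1024 special-q's), the extrapolation
# matches the ~2.2G total quoted above.
ave_rel_per_sq = 0.6

tot_rels = ave_rel_per_sq * number_of_sq
print(f"estimated unique relations: {tot_rels:.2e}")   # ~2.2e9
```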
## Estimating the cost of sieving
@@ -205,10 +202,10 @@ sys 5m56.262s
```
Then the `22m19.032s` value must be appropriately scaled in order to
convert it to physical core-seconds. For instance, in our
-case, since there are 32 cores and we sieved 2048 special-qs, this gives
+case, since there are 32 cores and we sieved 2048 special-q's, this gives
`(22*60+19.0)*32/2048=20.9` core-seconds per special-q.
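In plain Python, this scaling is just the following (a sketch, using the numbers of the timing run above):

```python
# Wall-clock time of the test run on one 32-core node: 22m19.032s
wall_time = 22 * 60 + 19.0   # seconds
cores = 32                   # physical cores of the reference node
sampled_sq = 2048            # special-q's sieved in the test

core_sec_per_sq = wall_time * cores / sampled_sq
print(f"{core_sec_per_sq:.1f} core-seconds per special-q")  # ~20.9
# Multiplying by the number of special-q's in the sub-range (next step in the
# text) then gives the total sieving cost of that sub-range in core-seconds.
```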
-Finally, we need to multiply by the number of special-q in this
+Finally, we need to multiply by the number of special-q's in this
subrange. We get (in Sagemath):
```python
@@ -233,7 +230,7 @@ corresponding binaries were built.
To determine the linear algebra time ahead of time for a sparse binary
matrix with (say) 37M rows/columns and 250 non-zero entries per row, it
is possible to _stage_ a real set-up, just for the purpose of
-measurement. cado-nfs has a useful _staging_ mode precisely for this
+measurement. Cado-nfs has a useful _staging_ mode precisely for this
purpose. In the RSA-240 context, we advise against its use because of
bugs, but these bugs seem to be less of a hurdle in the DLP-240 case. The
only weirdness is that the random distribution of the generated matrices
@@ -263,10 +260,10 @@ DATA=$DATA CADO_BUILD=$CADO_BUILD MPI=$MPI nrows=37000000 density=250 nthreads=3
This second method reports about 3.1 seconds per iteration. Allowing for
some inaccuracy, these experiments are sufficient to build confidence
-that the time per iteration in the krylov (a.k.a. "sequence") step of
+that the time per iteration in the Krylov (a.k.a. "sequence") step of
block Wiedemann is close to 3 seconds per iteration, perhaps slightly less.
-The time per iteration in the mksol (a.k.a. "evaluation") step is in the
-same ballpark. The time for krylov+mksol can then be estimated as the
+The time per iteration in the Mksol (a.k.a. "evaluation") step is in the
+same ballpark. The time for Krylov+Mksol can then be estimated as the
product of this timing with `(1+n/m+1/n)*N`, with `N` the number of rows,
and `m` and `n` the block Wiedemann parameters (we chose `m=48` and
`n=16`). Applied to our use case, this gives an anticipated cost of
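Plugging in the numbers above (a rough sketch only: it assumes the ~3 seconds per iteration measured in the staging experiments, and the absolute times correspond to whatever node configuration that measurement was made on):

```python
# Sketch of the block Wiedemann cost estimate described above.
N = 37_000_000      # rows/columns of the sparse matrix
m, n = 48, 16       # block Wiedemann blocking parameters
t_iter = 3.0        # seconds per iteration ("perhaps slightly less")

total_iters = (1 + n / m + 1 / n) * N          # Krylov + Mksol iterations
print(f"total iterations: {total_iters:.3e}")  # ~5.2e7

# Krylov alone: each of the n independent sequences runs about
# (1/n + 1/m) * N iterations, which bounds the calendar time.
iters_per_sequence = (1 / n + 1 / m) * N
days_per_sequence = iters_per_sequence * t_iter / 86400
print(f"~{days_per_sequence:.0f} days per sequence at {t_iter} s/iteration")
```

With a per-iteration time a little below 3 seconds, this is consistent with the roughly 90 days per Krylov sequence reported for the actual runs further down.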
@@ -291,7 +288,7 @@ option and to adjust the `-q0` and `-q1` parameters in order to create
many small work units that in the end cover exactly the global q-range.
Since we do not expect anyone to spend as much computing resources
-to perform again exactly the same computation again, we provide the count
+to perform exactly the same computation again, we provide the count
of how many (non-unique) relations were produced for each 1G special-q
sub-range in the [`dlp240-rel_count`](dlp240-rel_count) file.
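As an illustration of such a decomposition (a sketch only: the bounds below are placeholders, not the actual global q-range of the computation, and the "1G" sub-ranges of `dlp240-rel_count` are interpreted as ranges of width 10^9 in q):

```python
# Cut a global special-q range into 1G-wide work units; each printed pair
# of -q0/-q1 values defines one work unit.
q_min, q_max = 100_000_000_000, 200_000_000_000   # placeholder bounds
step = 1_000_000_000                              # 1G of special-q per work unit

q0 = q_min
while q0 < q_max:
    q1 = min(q0 + step, q_max)
    print(f"-q0 {q0} -q1 {q1}")
    q0 = q1
```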
@@ -326,12 +323,12 @@ The filtering follows roughly the same general workflow as in the
$CADO_BUILD/numbertheory/badideals -poly dlp240.poly -ell 62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651 -badidealinfo $DATA/dlp240.badidealinfo -badideals $DATA/dlp240.badideals
$CADO_BUILD/sieve/freerel -poly dlp240.poly -renumber $DATA/dlp240.renumber.gz -lpb0 35 -lpb1 35 -out $DATA/dlp240.freerel.gz -badideals $DATA/dlp240.badideals -lcideals -t 32
```
-- the command line flags `-dl -badidealinfo $DATA/dlp240.badidealinfo` must be added to the `dup2` program.
+- the command line flags `-dl -badidealinfo $DATA/dlp240.badidealinfo` must be added to the `dup2` program
- the `merge` and `replay` programs must be replaced by `merge-dl` and
`replay-dl`, respectively
- the `replay-dl` command line lists an extra output file
`dlp240.ideals` that is extremely important for the rest of the
-computation.
+computation
- as the linear system to be solved is inhomogeneous, another program
must be called in order to compute the right-hand side of the system.
@@ -367,7 +364,7 @@ done
$DATA/dlp240.badidealinfo` arguments to the `dup2` program must be
replaced by `-poly dlp240.poly`.)
### The "purge" step, a.k.a. singleton and "clique" removal.
### The "purge" step, a.k.a. singleton and "clique" removal
Here is the command line of the last filtering run that we used (revision `492b804fc`), with `EXP=7`:
```shell
@@ -391,7 +388,7 @@ Finally the replay step can be reproduced as follows:
$CADO_BUILD/filter/replay-dl -purged $DATA/purged$EXP.gz -his $DATA/history250_$EXP.gz -out $DATA/dlp240.matrix$EXP.250.bin -index $DATA/dlp240.index$EXP.gz -ideals $DATA/dlp240.ideals$EXP.gz
```
-### Computing the right-hand side.
+### Computing the right-hand side
This is done with a program called `sm`. There are several variants of this program, and several ways to invoke it. Here is the command line that we used. Note that we use the file `$DATA/dlp240.index$EXP.gz` that was just created by the above step.
```shell
$CADO_BUILD/filter/sm -poly dlp240.poly -purged $DATA/purged$EXP.gz -index $DATA/dlp240.index$EXP.gz -out $DATA/dlp240.matrix$EXP.250.sm -ell 62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651 -t 56
@@ -413,7 +410,7 @@ related concerns.
We did several filtering experiments based on the DLP-240 data set, as
relations kept coming in. For each of these experiments, we give the
-number of raw relations, the number of relations after the initial
+number of unique relations, the number of relations after the initial
"purge" step, as well as the number of rows of the final matrix after
"merge", for target densities d=150, d=200, and d=250.
@@ -484,9 +481,9 @@ numbered `0-1`, `1-2`, until `15-16`). These sequences can
be run concurrently on different sets of nodes, with no synchronization
needed. Each of these 16 sequences needs about 90 days to complete (in
practice, we used a different platform than the one we report timings
-for, but the timings and calendar time was in the same ballpark). Jobs
+for, but the timings and calendar time were in the same ballpark). Jobs
can be interrupted, and may be restarted exactly
-where they left off. E.g., if the latest of the `V1-2.*` files in
+where they were left off. E.g., if the latest of the `V1-2.*` files in
`$DATA` is `V1-2.86016`, then the job for sequence 1 can be restarted
with:
```shell
@@ -529,7 +526,7 @@ All steps `8-mksol.sh` above can be run in parallel (they use the `V*`
files produced in steps `3-krylov` above as a means to jump-start the
computation in the middle). Each uses 8 nodes and takes about 13 hours to
complete (1.43 seconds per iteration). Note that in order to bench the
-mksol timings ahead of time, it is possible to create fake files with
+Mksol timings ahead of time, it is possible to create fake files with
random data, named as follows
```
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.0-1
@@ -583,7 +580,7 @@ $MPI/bin/mpiexec [[your favorite mpiexec args]] $CADO_BUILD/filter/sm_append -el
```
We did this in 8 hours on 16 grvingt nodes. Note that the files
`$DATA/purged7_withsm.txt` and `$DATA/relsdel7_withsm.txt` are quite
-big: 158G and 2.3TB, respectively.
+big: 158GB and 2.3TB, respectively.
Once this precomputation is done, the two big files can be used as
drop-in replacements to the corresponding files in the