Commit 5053df90 authored by ZIMMERMANN Paul

another pass of proof-reading

parent 8e6ec640
@@ -60,7 +60,7 @@ debian 9 or debian 10). Typical software used were the GNU C compilers
versions 6 to 9, or Open MPI versions 4.0.1 to 4.0.3.
Most (if not all) information boxes in this document rely on two shell
variables, `CADO_BUILD` and `DATA`, be set and `export`-ed to shell
variables, `CADO_BUILD` and `DATA`, being set and exported to shell
subprocesses (as with `export CADO_BUILD=/blah/... ; export
DATA=/foo/...`). The `CADO_BUILD` variable is assumed to be the path to
a successful cado-nfs build directory. The `DATA` variable, which is
@@ -68,7 +68,7 @@ used by some scripts, should point to a directory with plenty of storage,
possibly on some shared filesystem. Storage is also needed to store the
temporary files with collected relations. Overall, a full reproduction of
the computation would need in the vicinity of 10TB of storage. All
scripts provided in this script expect to be run from the directory where
scripts provided in this document expect to be run from the directory where
they are placed, since they are also trying to access companion data
files.
@@ -127,7 +127,7 @@ to compute on the `grvingt` computers.
The hint file is [`rsa240.hint`](rsa240.hint), and has a weird format.
The following basically says "Three algebraic large primes for special-q
The following basically says "Three algebraic large primes for special-q's
less than 2^31, and two otherwise."
```shell
@@ -139,7 +139,7 @@ cat > rsa240.hint <<EOF
EOF
```
We can now sieve for random-sampled special-q, and remove duplicate
We can now sieve for random-sampled special-q's, and remove duplicate
relations on the fly. In the output of the command line below, only the
number of unique relations per special-q matters. The timing does not
matter.
@@ -157,7 +157,7 @@ order to vary the random picks.)
In order to derive an estimate of the total number of (de-duplicated)
relations, it is necessary to multiply the average number of relations per
special-q as obtained during the sample sieving by the number of
special-q in the global q-range. The latter can be precisely estimated
special-q's in the global q-range. The latter can be precisely estimated
using the logarithmic integral function as an approximation of the number
of degree-1 prime ideals below a bound. In
[Sagemath](https://www.sagemath.org/) code, this gives:
@@ -171,9 +171,9 @@ print (tot_rels)
# 5.88556387364565e9
```
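The elided snippet above presumably carried out exactly this computation. As an illustration only, a minimal Sagemath sketch of such an estimate could look as follows; the q-range bounds and the average number of relations per special-q below are placeholder values, not the measured ones.
```python
# Hypothetical sketch (Sagemath), for illustration only: estimate the total
# number of unique relations from sample sieving.  The q-range bounds and the
# average relation count per special-q are assumed placeholder values.
q0, q1 = 8e8, 7.4e9            # assumed global special-q range
ave_rel_per_sq = 19.0          # assumed average from the sample sieving above
# number of degree-1 prime ideals (special-q's) in [q0,q1], approximated
# by the logarithmic integral
number_of_sq = log_integral(q1) - log_integral(q0)
tot_rels = ave_rel_per_sq * number_of_sq
print (tot_rels)
```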
This estimate (5.9G relations) can be made more precise by increasing the
number of special-q that are sampled for sieving. It is also possible to
number of special-q's that are sampled for sieving. It is also possible to
have different nodes sample different sub-ranges of the global range to
get the result faster. Sampling 1024 special-qs should be
get the result faster. Sampling 1024 special-q's should be
enough to get a reliable estimate.
## Estimating the cost of sieving
@@ -231,10 +231,10 @@ sys 43m15.877s
```
Then the `75m54.351s=4554.3s` value must be appropriately scaled in order
to convert it into physical core-seconds. For instance, in our case,
since there are 32 physical cores and we sieved 1024 special-qs, this
since there are 32 physical cores and we sieved 1024 special-q's, this
gives `4554.3*32/1024=142.32` core.seconds per special-q.
Finally, we need to to multiply by the number of special-q in this
Finally, we need to multiply by the number of special-q's in this
subrange. We get (in Sagemath):
```python
@@ -250,7 +250,7 @@ With this experiment, we estimate about 279 core.years for this sub-range.
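For reference, here is a hedged Sagemath sketch of this scaling; the sub-range bounds are assumptions made for illustration, and only the 142.32 core.seconds per special-q figure and the rough 279 core.years outcome come from the text above.
```python
# Hypothetical sketch (Sagemath): turn the sample timing into a cost estimate.
cost_per_sq = 4554.3 * 32 / 1024      # core.seconds per special-q (measured above)
q0, q1 = 8e8, 2.1e9                   # assumed bounds of this sub-range
number_of_sq = log_integral(q1) - log_integral(q0)
core_years = cost_per_sq * number_of_sq / (3600 * 24 * 365)
print (core_years)                    # should land in the vicinity of 279
```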
#### Cost of 1-sided sieving + batch in the q-range [2.1e9,7.4e9]
For special-qs larger then 2.1e9, since we are using batch smoothness
For special-q's larger than 2.1e9, since we are using batch smoothness
detection on side 0, we have to precompute the `rsa240.batch0` file which
contains the product of all primes to be extracted. (Note that the
`-batch1` option is mandatory, even if for our parameters, no file is
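As a side note, the toy Sagemath snippet below illustrates the general idea behind batch smoothness detection (precompute a product of all primes up to the bound, then test many norms against it); it is only a conceptual sketch with a tiny toy bound, not the cado-nfs implementation nor the actual contents of `rsa240.batch0`.
```python
# Conceptual toy sketch of batch smoothness detection; not cado-nfs code.
B = 10**4                             # toy smoothness bound
P = prod(prime_range(2, B + 1))       # product of all primes up to B
def is_B_smooth(n):
    z = P % n
    # n is B-smooth iff z^(2^k) is 0 mod n once 2^k >= log2(n)
    e = 1
    while e < n.nbits():
        z = z * z % n
        e *= 2
    return z == 0
print (is_B_smooth(2**20 * 3**5), is_B_smooth(next_prime(B)**2))   # True False
```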
@@ -285,13 +285,13 @@ is as follows:
./rsa240-sieve-batch.sh -q0 2100000000 -q1 2100100000
```
The script prints the start and end date on stdout. The number
of special-qs that have been processed can be found in the output of
of special-q's that have been processed can be found in the output of
`las`, which is written to `$DATA/log/las.${q0}-${q1}.out`. One can again
deduce the cost in core-seconds to process one special-q from this
information, and then the overall cost of sieving the q-range [2.1e9,7.4e9].
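The same back-of-the-envelope scaling as before applies. A hedged Sagemath sketch follows; the per-special-q cost is a placeholder to be replaced by the value deduced from the `las` logs, only the q-range bounds come from the text.
```python
# Hypothetical sketch (Sagemath): overall cost of the q-range [2.1e9,7.4e9].
cost_per_sq = 100.0                   # placeholder core.seconds per special-q
number_of_sq = log_integral(7.4e9) - log_integral(2.1e9)
print (cost_per_sq * number_of_sq / (3600 * 24 * 365))   # core.years
```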
The design of this script imposes a rather long range of
special-q to handle for each run of `rsa240-sieve-batch.sh`. Indeed,
special-q's to handle for each run of `rsa240-sieve-batch.sh`. Indeed,
during the final minutes, the `finishbatch` jobs need to take care of the
last survivor files while `las` is no longer running, so the node is
not fully occupied. If the `rsa240-sieve-batch.sh` job takes a few hours,
@@ -363,7 +363,7 @@ takes only a few minutes). Within the script
several implementation-level parameters are set, and should probably be
adjusted to the users' needs. Along with the `DATA` and `CADO_BUILD`
variables, the script below also requires the `MPI` shell variable to
be set and `export`-ed, so that `$MPI/bin/mpiexec` can actually run MPI
be set and exported, so that `$MPI/bin/mpiexec` can actually run MPI
programs. In all likelihood, this script needs to be tweaked depending on
the specifics of how MPI programs should be run on the target platform.
```shell
@@ -375,7 +375,7 @@ inaccuracy, this experiment is sufficient to build confidence that the
time per iteration in the Krylov (a.k.a. "sequence") step of block
Wiedemann is about 1.2 to 1.5 seconds per iteration (handling 64-bit wide
vectors). The time per iteration in the Mksol (a.k.a. "evaluation") step
is in the same ballpark. The time for krylov+mksol can then be estimated
is in the same ballpark. The time for Krylov+Mksol can then be estimated
as the product of this timing with `(1+n/m+64/n)*(N/64)`, with `N` the
number of rows, and `m` and `n` the block Wiedemann parameters (we chose
`m=512` and `n=256`). Applied to our use case, this gives an anticipated
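The anticipated figure itself is cut off by the hunk boundary above. As a rough illustration of how such an estimate is formed, here is a hedged Sagemath sketch; the matrix size `N` and the per-iteration time are placeholders, and only `m=512`, `n=256` and the 1.2 to 1.5 seconds per iteration range come from the text.
```python
# Hypothetical sketch (Sagemath): anticipated Krylov+Mksol iteration count
# and aggregate time.  N and secs_per_iter are assumed placeholder values.
m, n = 512, 256
N = 282e6                      # assumed number of rows of the matrix
secs_per_iter = 1.4            # assumed, within the 1.2-1.5 s range above
iterations = (1 + n/m + 64/n) * (N / 64)
# total time summed over all sequence jobs (they run concurrently, so the
# wall-clock time is smaller)
print (iterations, iterations * secs_per_iter / 86400, "days")
```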
@@ -396,7 +396,7 @@ option and to adjust the `-q0` and `-q1` to create many small work units
that in the end cover exactly the global q-range.
Since we do not expect anyone to spend as many computing resources
to perform again exactly the same computation again, we provide the count
to perform exactly the same computation again, we provide the count
of how many (non-unique) relations were produced for each 100M special-q
sub-range in the [`rsa240-rel_count`](rsa240-rel_count) file.
@@ -433,7 +433,7 @@ information on these steps.
The filtering output is controlled by a wealth of tunable parameters.
However, on a very coarse-grained level we focus on two of them:
* _when_ we decide to stop relation collection.
* _when_ we decide to stop relation collection,
* _how dense_ we want the final matrix to be.
Sieving more is expected to have a beneficial impact on the matrix size,
@@ -444,7 +444,7 @@ related concerns.
We did several filtering experiments based on the RSA-240 data set, as
relations kept coming in. For each of these experiments, we give the
number of raw relations, the number of relations after the initial
number of unique relations, the number of relations after the initial
"pruning" step of filtering (called "purge" in cado-nfs), as well as the
number of rows of the final matrix after "merge", for target densities
d=100, d=150, and d=200.
@@ -506,7 +506,7 @@ where the last 4 lines (steps `3-krylov`) correspond to the 4 "sequences"
These sequences can be run concurrently on different sets of nodes, with
no synchronization needed. Each of these 4 sequences needs about 25 days
to complete. Jobs can be interrupted, and can simply be restarted
exactly from the point where they left off. E.g., if the latest of the `V64-128.*`
exactly from the point where they stopped. E.g., if the latest of the `V64-128.*`
files in `$DATA` is `V64-128.86016`, then the job for sequence 1 can be
restarted with:
```shell
@@ -548,8 +548,8 @@ export MPI
All steps `8-mksol.sh` above can be run in parallel (they use the `V*`
files produced in steps `3-krylov` above as a means to jump-start the
computation in the middle). Each uses 8 nodes and takes about 13 hours to
complete (1.43 seconds per iteration). Note that in order to bench the
mksol timings ahead of time, it is possible to create fake files named as
complete (1.43 seconds per iteration). Note that in order to benchmark the
Mksol timings ahead of time, it is possible to create fake files named as
follows
```
-rw-r--r-- 1 ethome users 564674048 Nov 20 21:47 F.sols0-64.0-64
@@ -7,17 +7,17 @@ so-called "renumber table", as follows.
```
$CADO_BUILD/sieve/freerel -poly rsa240.poly -renumber $DATA/rsa240.renumber.gz -lpb0 36 -lpb1 37 -out $DATA/rsa240.freerel -t 32
```
where `-t 32` specifies the number of thread. This was done with revision
where `-t 32` specifies the number of threads. This was done with revision
`30a5f3eae` of cado-nfs, and takes several hours. (Note that newer
versions of cado-nfs changed the format of this file.)
## Duplicate removal
Duplicate removal was done with revision `50ad0f1fd` of cado-nfs.
cado-nfs proceeds through two passes. We used the default cado-nfs
Cado-nfs proceeds through two passes. We used the default cado-nfs
setting which, on the first pass, splits the input into `2^2=4`
independent slices, with no overlap. cado-nfs supports doing this step
in an incremental way, so that we assume below the the shell variable
independent slices, with no overlap. Cado-nfs supports doing this step
in an incremental way, so we assume below that the shell variable
`EXP` expands to an integer indicating the filtering experiment number.
In the command below, `$new_files` is expected to expand to a file
containing a list of file names of new relations (relative to `$DATA`) to
@@ -40,7 +40,7 @@ for i in {0..3} ; do
$CADO_BUILD/filter/dup2 -nrels $nrels -renumber $DATA/rsa240.renumber.gz $DATA/dedup/$i/dedup*gz > $DATA/dup2.$EXP.$i.stdout 2> $DATA/dup2.$EXP.$i.stderr
done
```
(Note: in newer versions of cado-nfs, after june 2020, the `dup2`
(Note: in newer versions of cado-nfs, after June 2020, the `dup2`
program also requires the argument `-poly rsa240.poly`.)
## The "purge" step, a.k.a. singleton and "clique" removal.
@@ -15,8 +15,7 @@ for admin from 0 to 2000000000000 by 2500000:
```
We found 39890071 size-optimized polynomials and kept the 104 most promising
ones (i.e., the ones with the smallest `exp_E` value). The best `exp_E` was
57.78, the worst `exp_E` was 59.49.
ones (i.e., the ones with the smallest `exp_E` value). Among them, the best `exp_E` was 57.78, the worst `exp_E` was 59.49.
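To give an idea of the scale of this search, a small hedged computation using only the admin range and increment quoted above:
```python
# Back-of-the-envelope count of polynomial selection work units and the
# fraction of size-optimized polynomials that were kept.
work_units = 2000000000000 // 2500000
print (work_units)              # 800000 ranges of admin values
print (104 / 39890071.)         # fraction of polynomials kept
```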
We used the following command line for root optimization (still with cado-nfs
revision `52ac92746`), where `candidates` is the file containing all candidates