Commit 00295141 authored by Emmanuel Thomé

language hints in block quotes

parent a570c135
......@@ -30,7 +30,7 @@ files.
## Searching for a polynomial pair
We searched for a polynomial pair ``à la Joux-Lercier'', using the
We searched for a polynomial pair _à la_ Joux-Lercier, using the
program `dlpolyselect` of `cado-nfs`, with parameters `-bound 150`,
`-degree 4` and `-modm 1000003`. We tried also to search for a skewed
pair of polynomials with the `-skewed` option, but this did not seem to
......@@ -38,7 +38,7 @@ give a better polynomial for a fixed amount of time compared to plain
flat polynomials.
A typical command line for an individual work unit was:
```
```shell
$CADO_BUILD/polyselect/dlpolyselect -N 124620366781718784065835044608106590434820374651678805754818788883289666801188210855036039570272508747509864768438458621054865537970253930571891217684318286362846948405301614416430468066875699415246993185704183030512549594371372159029285303 -df 4 -dg 3 -area 2.0890720927744e+20 -Bf 34359738368.0 -Bg 34359738368.0 -bound 150 -modm 1000003 -modr 42 -t 8
```
where `-modr 42` gives the index of the task, and all tasks between 0 and
......@@ -79,7 +79,7 @@ a test on 128,000 special-q, 3 polynomials could not be separated.
We ended up using the following [`dlp240.poly`](dlp240.poly):
```
```shell
cat > dlp240.poly <<EOF
n: 124620366781718784065835044608106590434820374651678805754818788883289666801188210855036039570272508747509864768438458621054865537970253930571891217684318286362846948405301614416430468066875699415246993185704183030512549594371372159029285303
poly0: -236610408827000256250190838220824122997878994595785432202599,-18763697560013016564403953928327121035580409459944854652737,24908820300715766136475115982439735516581888603817255539890,286512172700675411986966846394359924874576536408786368056
......@@ -104,7 +104,7 @@ Here is what it gives with final parameters used in the computation. Here,
`-t 16` specifies the number of threads (more is practically useless, since
gzip soon becomes the limiting factor).
```
```shell
$CADO_BUILD/sieve/makefb -poly dlp240.poly -side 0 -lim 536870912 -maxbits 16 -t 16 -out $DATA/dlp240.fb0.gz
$CADO_BUILD/sieve/makefb -poly dlp240.poly -side 1 -lim 268435456 -maxbits 16 -t 16 -out $DATA/dlp240.fb1.gz
```
......@@ -117,7 +117,7 @@ relations on-the-fly. In the output of the command line below, only the
number of unique relations per special-q matters. The timing does not
matter.
```
```shell
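# Flag summary (our reading; cado-nfs's own parameter documentation is authoritative):
# -random-sample 1024 sieves 1024 special-q picked at random in [q0,q1];
# -dup together with -dup-qmin enables on-the-fly duplicate detection, so only
# unique relations are counted; -allow-compsq with -qfac-min/-qfac-max allows
# composite special-q; -fbc points to a factor-base cache file.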
$CADO_BUILD/sieve/las -poly dlp240.poly -fb0 $DATA/dlp240.fb0.gz -fb1 $DATA/dlp240.fb1.gz -lim0 536870912 -lim1 268435456 -lpb0 35 -lpb1 35 -q0 150e9 -q1 300e9 -dup -dup-qmin 150000000000,0 -sqside 0 -A 31 -mfb0 70 -mfb1 70 -lambda0 2.2 -lambda1 2.2 -random-sample 1024 -allow-compsq -qfac-min 8192 -qfac-max 100000000 -allow-largesq -bkmult 1.10 -t auto -v -fbc /tmp/dlp240.fbc
```
......@@ -132,8 +132,8 @@ special-q in the global q-range. This number of composite special-q can
be computed exactly or estimated using the logarithmic integral function.
For the target interval, there are 3.67e9 special-q.
```
[sage]
```python
# sage
ave_rel_per_sq = 0.61 ## pick value output by las
number_of_sq = 3.67e9
tot_rels = ave_rel_per_sq * number_of_sq
......@@ -159,7 +159,7 @@ to be extracted. This means that on the other hand, the file
`$DATA/dlp240.fb1.gz` is _not_ needed in production (we only used it for
the estimation of the number of unique relations).
```
```shell
$CADO_BUILD/sieve/ecm/precompbatch -poly dlp240.poly -lim1 0 -lim0 536870912 -batch0 /dev/null -batch1 $DATA/dlp240.batch1 -batchlpb0 29 -batchlpb1 28
```
......@@ -183,8 +183,8 @@ case, since there are 32 cores and we sieved 2048 special-qs, this gives
Finally, it remains to multiply by the number of special-q in this
subrange. We get (in Sagemath):
```
[sage]
```python
# sage
cost_in_core_sec=3.67e9*20.9
cost_in_core_hours=cost_in_core_sec/3600
cost_in_core_years=cost_in_core_hours/24/365
......@@ -212,7 +212,7 @@ only weirdness is that the random distribution of the generated matrices
seems to be inconsistent with the requested density (as of
[8a72ccdde](https://gitlab.inria.fr/cado-nfs/cado-nfs/commit/8a72ccdde)
at least). To obtain sample timings, one can run the command:
```
```shell
DATA=$DATA CADO_BUILD=$CADO_BUILD MPI=$MPI nrows=37000000 density=$((250*4/5)) nthreads=32 ./dlp240-linalg-0a-estimate_linalg_time_coarse_method_a.sh
```
(where the multiplication by 4/5 is here to "fix" the issue with the
......@@ -229,7 +229,7 @@ takes only a few minutes). Within the script
several implementation-level parameters are set, and should probably be
adjusted to the users' needs.
```
```shell
DATA=$DATA CADO_BUILD=$CADO_BUILD MPI=$MPI nrows=37000000 density=250 nthreads=32 ./dlp240-linalg-0a-estimate_linalg_time_coarse_method_b.sh
```
......@@ -316,7 +316,7 @@ integer shell variable `$EXP` was increased by one (starting from 1).
In the command below, `$new_files` is expected to expand to a file
containing a list of file names of new relations (relative to `$DATA`) to
add to the stored set of relations.
```
```shell
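# The construction of $new_files is site-specific. A hypothetical example,
# assuming freshly received relation files are dropped under $DATA/new_rels/
# (both the directory layout and the file name below are illustrative only):
# (cd $DATA && find new_rels -name '*.gz') > $DATA/new_files.$EXP.txt
# new_files=$DATA/new_files.$EXP.txt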
mkdir -p $DATA/dedup/{0..3}
$CADO_BUILD/filter/dup1 -prefix dedup -out $DATA/dedup/ -basepath $DATA -filelist $new_files -n 2 > $DATA/dup1.$EXP.stdout 2> $DATA/dup1.$EXP.stderr
grep '^# slice.*received' $DATA/dup1.$EXP.stderr > $DATA/dup1.$EXP.per_slice.txt
......@@ -329,7 +329,7 @@ The second pass of duplicate removal works independently on each of the
non-overlapping slices. The number of slices can thus be used as a sort
of time-memory tradeoff (here, `-n 2` tells the program to do `2^2=4`
slices).
```
```shell
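# One dup2 pass per dup1 slice; the per-slice relation count $nrels is
# recovered from the counters that dup1 printed (collected above in the
# dup1.*.per_slice.txt files).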
for i in {0..3} ; do
nrels=`awk '/slice '$i' received/ { x+=$5 } END { print x; }' $DATA/dup1.*.per_slice.txt`
$CADO_BUILD/filter/dup2 -nrels $nrels -renumber $DATA/dlp240.renumber.gz -dl -badidealinfo $DATA/dlp240.badidealinfo $DATA/dedup/$i/dedup*gz > $DATA/dup2.$EXP.$i.stdout 2> $DATA/dup2.$EXP.$i.stderr
......@@ -342,7 +342,7 @@ replaced by `-poly dlp240.poly`.)
### The "purge" step, a.k.a. singleton and "clique" removal.
Here is the command line of the last filtering run that we used (revision `492b804fc`), with `EXP=7`:
```
```shell
nrels=$(awk '/remaining/ { x+=$4; } END { print x }' $DATA/dup2.$EXP.[0-3].stderr)
colmax=2960421140
$CADO_BUILD/filter/purge -out $DATA/purged$EXP.gz -nrels $nrels -outdel $DATA/relsdel$EXP.gz -keep 3 -col-min-index 0 -col-max-index $colmax -t 56 -required_excess 0.0 $DATA/dedup/*/dedup*gz
......@@ -352,20 +352,20 @@ This took about 7.5 hours on the machine wurst, with 575GB of peak memory.
### The "merge" step
The merge step can be reproduced as follows (still with `EXP=7` for the
final experiment).
```
```shell
$CADO_BUILD/filter/merge-dl -mat $DATA/purged$EXP.gz -out $DATA/history250_$EXP -target_density 250 -skip 0 -t 28
```
This took about 20 minutes on the machine wurst, with a peak memory of 118GB.
### The "replay" step
Finally the replay step can be reproduced as follows:
```
```shell
$CADO_BUILD/filter/replay-dl -purged $DATA/purged$EXP.gz -his $DATA/history250_$EXP.gz -out $DATA/dlp240.matrix$EXP.250.bin -index $DATA/dlp240.index$EXP.gz -ideals $DATA/dlp240.ideals$EXP.gz
```
### Computing the right-hand side.
This is done with a program called `sm`. There are several variants of this program, and several ways to invoke it. Here is the command line that we used. Note that we use the file `$DATA/dlp240.index$EXP.gz` that was just created by the above step.
```
```shell
$CADO_BUILD/filter/sm -poly dlp240.poly -purged $DATA/purged$EXP.gz -index $DATA/dlp240.index$EXP.gz -out $DATA/dlp240.matrix$EXP.250.sm -ell 62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651 -t 56
```
This took about four hours on the machine wurst.
......@@ -404,7 +404,7 @@ This script needs the `MPI`, `DATA`, `matrix`, and `CADO_BUILD` variables to be
set. It can be used as follows, where `$matrix` points to one of the
matrices that have been produced by the filter code (after the `replay`
step). For this quick bench, the right-hand-side file is not necessary.
```
```shell
export matrix=$DATA/dlp240.matrix7.250.bin
export DATA
export CADO_BUILD
......@@ -434,7 +434,7 @@ We decided to use the block Wiedemann parameters `m=48` and `n=16`,
running on 4-node jobs.
The first part of the computation can be done with these scripts:
```
```shell
export matrix=$DATA/dlp240.matrix7.250.bin
export DATA
export CADO_BUILD
......@@ -453,14 +453,14 @@ needed. Each of these 16 sequences needs about 90 days to complete (in practice,
from where they left off. E.g., if the latest of the `V1-2.*` files in
`$DATA` is `V1-2.86016`, then the job for sequence 1 can be restarted
with:
```
```shell
./dlp240-linalg-3-krylov.sh sequence=1 start=86016
```
Cheap sanity checks can be done periodically with the following script,
which does all checks it can do (note that the command is happy if it
finds _no_ check to do as well!).
```
```shell
export matrix=$DATA/dlp240.matrix7.250.bin
export DATA
export CADO_BUILD
......@@ -473,7 +473,7 @@ later steps. After step `5-acollect` below, a file named `A0-16.0-3016704` with
size 240950181888 bytes will be in `$DATA`. Step `6-lingen` below runs on
36 nodes, and completes in approximately one week (periodic
checkpoint/restart is supported).
```
```shell
export matrix=$DATA/dlp240.matrix7.250.bin
export DATA
export CADO_BUILD
......@@ -535,14 +535,14 @@ individual logarithm computations.
This is achieved by the `reconstructlog-dl` program.
```
```shell
$CADO_BUILD/filter/reconstructlog-dl -ell 62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651 -mt 28 -log $DATA/K.sols0-1.0.txt -out $DATA/dlp240.reconstructlog.dlog -renumber $DATA/dlp240.renumber.gz -poly dlp240.poly -purged $DATA/purged7.gz -ideals $DATA/p240.ideals7.gz -relsdel $DATA/relsdel7.gz -nrels 2380725637
```
As written, this command line takes an annoyingly large amount of time
(several weeks). It is possible to reduce this time by precomputing
uncompressed files `$DATA/purged7_withsm.txt` and `$DATA/relsdel7_withsm.txt` that have the Schirokauer maps already computed. These can be computed well ahead of time with the `sm_append` program, which also works with MPI and scales very well. An example command line is:
```
```shell
$MPI/bin/mpiexec [[your favorite mpiexec args]] $CADO_BUILD/filter/sm_append -ell 62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651 -poly $HERE/p240.poly -b 4096 -in "/grvingt/zimmerma/dlp240/filter/purged7.gz" -out "${HERE}/purged7.withsm.txt"
```
We did that in 8 hours on 16 grvingt nodes. Note that the files
......
......@@ -88,7 +88,7 @@ See the [`polyselect.md`](polyselect.md) file in this repository.
And the winner is the [`rsa240.poly`](rsa240.poly) file:
```
```shell
cat > rsa240.poly <<EOF
n: 124620366781718784065835044608106590434820374651678805754818788883289666801188210855036039570272508747509864768438458621054865537970253930571891217684318286362846948405301614416430468066875699415246993185704183030512549594371372159029236099
poly0: -105487753732969860223795041295860517380,17780390513045005995253
......@@ -117,7 +117,7 @@ computation of the factor base is done with the following command. Here,
`-t 16` specifies the number of threads (more is practically useless, since
gzip soon becomes the limiting factor).
```
```shell
$CADO_BUILD/sieve/makefb -poly rsa240.poly -side 1 -lim 2100000000 -maxbits 16 -t 16 -out $DATA/rsa240.fb1.gz
```
......@@ -129,7 +129,7 @@ The hint file is [`rsa240.hint`](rsa240.hint), and has a weird format.
The following basically says "Three algebraic large primes for special-q
less than 2^31, and two otherwise."
```
```shell
cat > rsa240.hint <<EOF
30@1 1.0 1.0 A=32 1800000000,36,72,2.2 2100000000,37,111,3.2
31@1 1.0 1.0 A=32 1800000000,36,72,2.2 2100000000,37,111,3.2
......@@ -143,7 +143,7 @@ relations on-the-fly. In the output of the command line below, only the
number of unique relations per special-q matters. The timing does not
matter.
```
```shell
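# Flag summary (our reading; cado-nfs's own parameter documentation is authoritative):
# -random-sample 1024 sieves 1024 special-q picked at random in [q0,q1];
# -dup together with -dup-qmin enables on-the-fly duplicate detection, so only
# unique relations are counted; -hint-table applies the per-special-q
# parameters from the rsa240.hint file described above.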
$CADO_BUILD/sieve/las -poly rsa240.poly -fb1 $DATA/rsa240.fb1.gz -lim0 1800000000 -lim1 2100000000 -lpb0 36 -lpb1 37 -q0 8e8 -q1 7.4e9 -dup -dup-qmin 0,800000000 -sqside 1 -A 32 -mfb0 72 -mfb1 111 -lambda0 2.2 -lambda1 3.2 -random-sample 1024 -t auto -bkmult 1,1l:1.15,1s:1.4,2s:1.1 -v -bkthresh1 90000000 -adjust-strategy 2 -fbc /tmp/rsa240.fbc -hint-table rsa240.hint
```
......@@ -159,8 +159,8 @@ using the logarithmic integral function as an approximation of the number
of degree-1 prime ideals below a bound. In
[Sagemath](https://www.sagemath.org/) code, this gives:
```
[sage]
```python
# sage
ave_rel_per_sq = 19.6 ## pick value output by las
number_of_sq = log_integral(7.4e9) - log_integral(8e8)
tot_rels = ave_rel_per_sq * number_of_sq
......@@ -203,7 +203,7 @@ pass `-nq 0`).
In order to measure the cost of sieving in the special-q subrange where
sieving is used on both sides, the typical command-line is as follows:
```
```shell
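# No -dup here: this run only measures raw sieving time on 1024 random
# special-q. The wall-clock time reported by `time` is converted below to
# core.seconds per special-q (multiply by the number of cores, divide by 1024).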
time $CADO_BUILD/sieve/las -poly rsa240.poly -fb1 $DATA/rsa240.fb1.gz -lim0 1800000000 -lim1 2100000000 -lpb0 36 -lpb1 37 -q0 8e8 -q1 2.1e9 -sqside 1 -A 32 -mfb0 72 -mfb1 111 -lambda0 2.2 -lambda1 3.2 -random-sample 1024 -t auto -bkmult 1,1l:1.15,1s:1.4,2s:1.1 -v -bkthresh1 90000000 -adjust-strategy 2 -fbc /tmp/rsa240.fbc
```
......@@ -233,8 +233,8 @@ gives `4554.3*32/1024=142.32` core.seconds per special-q.
Finally, it remains to multiply by the number of special-q in this
subrange. We get (in Sagemath):
```
[sage]
```python
# sage
cost_in_core_sec=(log_integral(2.1e9)-log_integral(8e8))*4554.3*32/1024
cost_in_core_hours=cost_in_core_sec/3600
cost_in_core_years=cost_in_core_hours/24/365
......@@ -252,7 +252,7 @@ contains the product of all primes to be extracted. (Note the `-batch1`
option is mandatory, even if for our parameters, no file is produced on
side 1.)
```
```shell
$CADO_BUILD/sieve/ecm/precompbatch -poly rsa240.poly -lim0 0 -lim1 2100000000 -batch0 $DATA/rsa240.batch0 -batch1 $DATA/rsa240.batch1 -batchlpb0 31 -batchlpb1 30
```
......@@ -277,7 +277,7 @@ machine.
When the paths are properly set (either by having `CADO_BUILD` and
`DATA` set correctly, or by tweaking the script), a typical invocation
is as follows:
```
```shell
./rsa240-sieve-batch.sh -q0 2100000000 -q1 2100100000
```
The script prints on stdout the start and end date, and in the output of
......@@ -295,18 +295,20 @@ phase takes negligible time. Both for the benchmark and in production it
is then necessary to have jobs taking at least a few hours.
On our sample machine, here is an example of a benchmark:
```
```shell
./rsa240-sieve-batch.sh -q0 2100000000 -q1 2100100000 > /tmp/sieve-batch.out
[ wait ... ]
# [ wait ... ]
start=$(date -d "`grep "^Starting" /tmp/sieve-batch.out | head -1 | awk -F " at " '//{print $2}'`" +%s)
end=$(date -d "`grep "^End" /tmp/sieve-batch.out | tail -1 | awk -F " at " '//{print $2}'`" +%s)
nb_q=`grep "# Discarded 0 special-q's out of" /tmp/log/las.2100000000-2100100000.out | awk '{print $(NF-1)}'`
echo -n "Cost in core.sec per special-q: "; echo "($end-$start)/$nb_q*32" | bc -l
Cost in core.sec per special-q: 67.43915571828559121248
# Cost in core.sec per special-q: 67.43915571828559121248
```
[sage]
```python
# sage
cost_in_core_sec=(log_integral(7.4e9)-log_integral(2.1e9))*67.4
cost_in_core_hours=cost_in_core_sec/3600
cost_in_core_years=cost_in_core_hours/24/365
......@@ -359,7 +361,7 @@ variables, the script below also requires that the `MPI` shell variable
be set and `export`-ed, so that `$MPI/bin/mpiexec` can actually run MPI
programs. In all likelihood, this script needs to be tweaked depending on
the specifics of how MPI programs should be run on the target platform.
```
```shell
nrows=300000000 density=200 nthreads=32 ./rsa240-linalg-0a-estimate_linalg_time_coarse_method_b.sh
```
......@@ -457,7 +459,7 @@ This script needs the `MPI`, `DATA`, `matrix`, and `CADO_BUILD` variables to be
set. It can be used as follows, where `$matrix` points to one of the
matrices that have been produced by the filter code (after the `replay`
step).
```
```shell
export matrix=$DATA/rsa240.matrix11.200.bin
export DATA
export CADO_BUILD
......@@ -482,7 +484,7 @@ giving rise to `n/64=4` sequences to be computed independently. We used
8-node jobs.
The first part of the computation can be done with these scripts:
```
```shell
export matrix=$DATA/rsa240.matrix11.200.bin
export DATA
export CADO_BUILD
......@@ -501,14 +503,14 @@ needed. Each of these 4 sequences needs about 25 days to complete. Jobs can be i
from where they left off. E.g., if the latest of the `V64-128.*` files in
`$DATA` is `V64-128.86016`, then the job for sequence 1 can be restarted
with:
```
```shell
./rsa240-linalg-3-krylov.sh sequence=1 start=86016
```
Cheap sanity checks can be done periodically with the following script,
which does all checks it can do (note that the command is happy if it
finds _no_ check to do as well!).
```
```shell
export matrix=$DATA/rsa240.matrix11.200.bin
export DATA
export CADO_BUILD
......@@ -520,7 +522,7 @@ Once this is done, data must be collated before being processed by the
later steps. After step `5-acollect` below, a file named `A0-256.0-1654784` with
size 27111981056 bytes will be in `$DATA`. Step `6-lingen` below runs on
16 nodes, and completes in slightly less than 10 hours.
```
```shell
export matrix=$DATA/rsa240.matrix11.200.bin
export DATA
export CADO_BUILD
......@@ -559,7 +561,7 @@ After having successfully followed the steps above, a file named
Let `W` be the kernel vector computed by the linear algebra step.
The characters step transforms this kernel vector into dependencies.
We used the following command on the machine `wurst`:
```
```shell
$CADO_BUILD/linalg/characters -poly rsa240.poly -purged $DATA/purged11.gz -index $DATA/rsa240.index11.gz -heavyblock $DATA/rsa240.matrix11.200.dense.bin -out $DATA/rsa240.kernel -ker $DATA/W -lpb0 36 -lpb1 37 -nchar 50 -t 56
```
This gave after a little more than one hour 21 dependencies
......@@ -569,7 +571,7 @@ This gave after a little more than one hour 21 dependencies
The following command line can be used to process dependencies `start` to
`start+t-1`, using `t` threads (one thread for each dependency):
```
```shell
$CADO_BUILD/sqrt/sqrt -poly rsa240.poly -prefix $DATA/rsa240.dep.gz -side0 -side1 -gcd -dep $start -t $t
```
The `stdout` file contains one line per dependency, either FAIL or a
......