Commit 81cebb32 authored by Emmanuel Thomé's avatar Emmanuel Thomé

WIP update 250 and have it more or less in sync with 240

parent 2f71bf42
......@@ -153,16 +153,16 @@ be computed exactly or estimated using the logarithmic integral function.
For the target interval, there are 3.67e9 special-q.
```python
# [sage]
ave_rel_per_sq = 0.61 ## pick value output by las
number_of_sq = 3.67e9
tot_rels = ave_rel_per_sq * number_of_sq
print (tot_rels)
```
This estimate (2.2G relations) can be made more precise by increasing the
number of special-q that are sampled for sieving. It is also possible to
have different nodes sample different sub-ranges of the global range to
get the result faster. We consider that sampling 1024 special-qs is
enough to get a reliable estimate.
## Estimating the cost of sieving
......@@ -189,6 +189,12 @@ Then a typical benchmark is as follows:
time $CADO_BUILD/sieve/las -v -poly dlp240.poly -t auto -fb0 $DATA/dlp240.fb0.gz -allow-compsq -qfac-min 8192 -qfac-max 100000000 -allow-largesq -A 31 -lim1 0 -lim0 536870912 -lpb0 35 -lpb1 35 -mfb1 250 -mfb0 70 -batchlpb0 29 -batchlpb1 28 -batchmfb0 70 -batchmfb1 70 -lambda1 5.2 -lambda0 2.2 -batch -batch1 $DATA/dlp240.batch1 -sqside 0 -bkmult 1.10 -q0 150e9 -q1 300e9 -fbc /tmp/dlp240.fbc -random-sample 2048
```
(note: the first time this command line is run, it takes some time to
create the "cache" file `/tmp/dlp240.fbc`. If you want to avoid this, you
may run the command with `-random-sample 2048` replaced by
`-random-sample 0` first, which will _only_ create the cache file. Then
run the command above.)
On our sample machine, the result of the above line is:
```
real 22m19.032s
......@@ -204,7 +210,7 @@ Finally, it remains to multiply by the number of special-q in this
subrange. We get (in Sagemath):
```python
# [sage]
cost_in_core_sec=3.67e9*20.9
cost_in_core_hours=cost_in_core_sec/3600
cost_in_core_years=cost_in_core_hours/24/365
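# Sanity check: the 20.9 core.sec per special-q used above can be
# rederived from the 22m19s (= 1339.032 s) wall-clock benchmark,
# assuming the sample machine has 32 cores (an assumption here) and
# the 2048 special-qs sampled by the command line:
cost_per_sq = 1339.032 * 32 / 2048  # close to 20.9 core.sec per special-q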
......
......@@ -25,8 +25,8 @@ The cado-nfs documentation should be followed, in order to obtain a
complete build. Note in particular that some of the experiments below
require the use of the [hwloc](https://www.open-mpi.org/projects/hwloc/)
library, and also some MPI implementation. [Open
MPI](https://www.open-mpi.org/) is routinely used for tests, but cado-nfs
also works with Intel MPI, for instance. The bottom line is that although
these external pieces of software are marked as _optional_ for cado-nfs,
they must be regarded as real prerequisites for large experiments.
......@@ -109,13 +109,14 @@ To estimate the number of relations produced by a set of parameters:
- We create a "hint" file where we tell which strategy to use for which
special-q size.
- We random-sample in the global q-range, using sieving and not batch:
this produces the same relations. This is slower but `-batch` is
currently incompatible with on-line (on-the-fly) duplicate removal.
Here is what it gives with the parameters that were used in the computation.
The computation of the factor base is done with the following command.
Here, `-t 16` specifies the number of threads (more is practically
useless, since gzip soon becomes the limiting factor).
```shell
$CADO_BUILD/sieve/makefb -poly rsa240.poly -side 1 -lim 2100000000 -maxbits 16 -t 16 -out $DATA/rsa240.fb1.gz
......@@ -160,17 +161,17 @@ of degree-1 prime ideals below a bound. In
[Sagemath](https://www.sagemath.org/) code, this gives:
```python
# [sage]
ave_rel_per_sq = 19.6 ## pick value output by las
number_of_sq = log_integral(7.4e9) - log_integral(8e8)
tot_rels = ave_rel_per_sq * number_of_sq
print (tot_rels)
```
This estimate (5.9G relations) can be made more precise by increasing the
number of special-q that are sampled for sieving. It is also possible to
have different nodes sample different sub-ranges of the global range to
get the result faster. We consider that sampling 1024 special-qs is
enough to get a reliable estimate.
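For readers without Sagemath at hand, the same estimate can be reproduced
in plain Python. The helper below approximates the logarithmic integral by
its asymptotic series (a sketch of ours, not part of cado-nfs; at these
magnitudes the truncated series is accurate to far more digits than needed):

```python
import math

def li(x):
    # li(x) ~ (x / ln x) * sum_k k! / (ln x)^k, the standard asymptotic
    # expansion of the logarithmic integral, truncated after 12 terms.
    z = math.log(x)
    term, total = 1.0, 0.0
    for k in range(12):
        total += term
        term *= (k + 1) / z
    return x / z * total

ave_rel_per_sq = 19.6                      # value output by las
number_of_sq = li(7.4e9) - li(8e8)         # about 3.0e8 special-q
tot_rels = ave_rel_per_sq * number_of_sq   # about 5.9e9, matching the text
print(tot_rels)
```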
## Estimating the cost of sieving
......@@ -189,14 +190,14 @@ conditions were reached during production as well, most of the time.
In production there is no need to activate the on-the-fly duplicate
removal which is supposed to be cheap but maybe not negligible. There is
also no need to pass the hint file, since we are going to run the siever
on different parts of the q-range, and on each of them the parameters are
constant. Finally, during a benchmark, it is important to emulate the
fact that the cached factor base (the `/tmp/rsa240.fbc` file) is
precomputed and hot (i.e., cached in memory by the OS and/or the
hard-drive), because this is the situation in production; for this, it
suffices to start a first run and interrupt it as soon as the cache is
written (or pass `-nq 0`).
#### Cost of 2-sided sieving in the q-range [8e8,2.1e9]
......@@ -207,11 +208,11 @@ sieving is used on both sides, the typical command-line is as follows:
time $CADO_BUILD/sieve/las -poly rsa240.poly -fb1 $DATA/rsa240.fb1.gz -lim0 1800000000 -lim1 2100000000 -lpb0 36 -lpb1 37 -q0 8e8 -q1 2.1e9 -sqside 1 -A 32 -mfb0 72 -mfb1 111 -lambda0 2.2 -lambda1 3.2 -random-sample 1024 -t auto -bkmult 1,1l:1.15,1s:1.4,2s:1.1 -v -bkthresh1 90000000 -adjust-strategy 2 -fbc /tmp/rsa240.fbc
```
(note: the first time this command line is run, it takes some time to
create the "cache" file `/tmp/rsa240.fbc`. If you want to avoid this, you
may run the command with `-random-sample 1024` replaced by
`-random-sample 0` first, which will _only_ create the cache file. Then
run the command above.)
While `las` tries to print some running times, some start-up or finish
tasks might be skipped; furthermore the CPU-time gets easily confused by
......@@ -234,7 +235,7 @@ Finally, it remains to multiply by the number of special-q in this
subrange. We get (in Sagemath):
```python
# [sage]
cost_in_core_sec=(log_integral(2.1e9)-log_integral(8e8))*4554.3*32/1024
cost_in_core_hours=cost_in_core_sec/3600
cost_in_core_years=cost_in_core_hours/24/365
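# The factor 4554.3*32/1024 above is the per-special-q cost: 4554.3 s of
# wall-clock time for the 1024 sampled special-qs, scaled by the 32 cores
# of the sample machine (the core count is an assumption here):
cost_per_sq = 4554.3 * 32 / 1024  # close to 142.3 core.sec per special-q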
......@@ -247,32 +248,32 @@ With this experiment, we get therefore about 279 core.years for this sub-range.
#### Cost of 1-sided sieving + batch in the q-range [2.1e9,7.4e9]
For special-qs larger than 2.1e9, since we are using batch smoothness
detection on side 0, we have to precompute the `rsa240.batch0` file which
contains the product of all primes to be extracted. (Note that the
`-batch1` option is mandatory, even if for our parameters, no file is
produced on side 1.)
```shell
$CADO_BUILD/sieve/ecm/precompbatch -poly rsa240.poly -lim0 0 -lim1 2100000000 -batch0 $DATA/rsa240.batch0 -batch1 $DATA/rsa240.batch1 -batchlpb0 31 -batchlpb1 30
```
Then, we can use the [`rsa240-sieve-batch.sh`](rsa240-sieve-batch.sh)
shell script given in this repository. This launches:
- one instance of `las`, which does the sieving on side 1 and prints the
survivors to files;
- 6 instances of the `finishbatch` program. Those instances process the
files as they are produced, do the batch smoothness detection, and
produce relations.
The script takes two command-line arguments `-q0 xxx` and `-q1 xxx`,
which describe the range of special-q to process. Temporary files are put
in the `/tmp` directory by default.
In order to run [`rsa240-sieve-batch.sh`](rsa240-sieve-batch.sh) on your
own machine, there are some variables to adjust at the beginning of the
script. Two examples are already given, so this should be easy to
imitate. The number of instances of `finishbatch` can also be adjusted
depending on the number of cores available on the machine.
When the paths are properly set (either by having `CADO_BUILD` and
`DATA` set correctly, or by tweaking the script), a typical invocation
......@@ -281,38 +282,39 @@ is as follows:
./rsa240-sieve-batch.sh -q0 2100000000 -q1 2100100000
```
The script prints on stdout the start and end date. The number of
special-qs that have been processed can be found in the output of `las`,
in `$DATA/log/las.${q0}-${q1}.out`. From this information one can again
deduce the cost in core.seconds to process one special-q, and then the
overall cost of sieving the q-range [2.1e9,7.4e9].
The design of this script requires a rather long range of
special-q to handle for each run of `rsa240-sieve-batch.sh`. Indeed,
during the last minutes, the `finishbatch` jobs need to take care of the
last survivor files while `las` is no longer running, so that the node is
not fully occupied. If the `rsa240-sieve-batch.sh` job takes a few hours,
this fade-out phase takes negligible time. Both for the benchmark and in
production it is then necessary to have jobs taking at least a few hours.
On our sample machine, here is an example of a benchmark:
```shell
./rsa240-sieve-batch.sh -q0 2100000000 -q1 2100100000 > /tmp/rsa240-sieve-batch.out
# [ wait ... ]
start=$(date -d "`grep "^Starting" /tmp/rsa240-sieve-batch.out | head -1 | awk -F " at " '//{print $2}'`" +%s)
end=$(date -d "`grep "^End" /tmp/rsa240-sieve-batch.out | tail -1 | awk -F " at " '//{print $2}'`" +%s)
nb_q=`grep "# Discarded 0 special-q's out of" /tmp/log/las.2100000000-2100100000.out | awk '{print $(NF-1)}'`
echo -n "Cost in core.sec per special-q: "; echo "($end-$start)/$nb_q*32" | bc -l
# Cost in core.sec per special-q: 67.43915571828559121248
```
```python
# [sage]
cost_in_core_sec=(log_integral(7.4e9)-log_integral(2.1e9))*67.4
cost_in_core_hours=cost_in_core_sec/3600
cost_in_core_years=cost_in_core_hours/24/365
print (cost_in_core_hours, cost_in_core_years)
# (4.46604511452076e6, 509.822501657621)
```
With this experiment, we get 67.4 core.sec per special-q, and therefore
......@@ -496,13 +498,14 @@ export MPI
./rsa240-linalg-3-krylov.sh sequence=2 start=0
./rsa240-linalg-3-krylov.sh sequence=3 start=0
```
where the last 4 lines (steps `3-krylov`) correspond to the 4 "sequences"
(vector blocks numbered `0-64`, `64-128`, `128-192`, and `192-256`).
These sequences can be run concurrently on different sets of nodes, with
no synchronization needed. Each of these 4 sequences needs about 25 days
to complete. Jobs can be interrupted, and must simply be restarted
exactly from where they left off. E.g., if the latest of the `V64-128.*`
files in `$DATA` is `V64-128.86016`, then the job for sequence 1 can be
restarted with:
```shell
./rsa240-linalg-3-krylov.sh sequence=1 start=86016
```
......@@ -519,9 +522,10 @@ export MPI
```
Once this is done, data must be collated before being processed by the
later steps. After step `5-acollect` below, a file named
`A0-256.0-1654784` with size 27111981056 bytes will be in `$DATA`. Step
`6-lingen` below runs on 16 nodes, and completes in slightly less than 10
hours.
```shell
export matrix=$DATA/rsa240.matrix11.200.bin
export DATA
......
# Additional info on filtering for RSA-240
Filtering was run exclusively on the machine `wurst`.
A first step of the filtering process in cado-nfs is to create the
so-called "renumber table", as follows.
```
......
#!/bin/bash
hn=`hostname`
set -e
: ${DATA?missing}
: ${CADO_BUILD?missing}
: ${wdir="/tmp"}
: ${result_dir="$DATA"}
set +e
# batch this many survivors per file, to be sent to finishbatch
# 16M survivors create a product tree 0.75 times the product tree of the primes
# 10M survivors create a product tree 0.48 times the product tree of the primes
filesize=10000000
# A qrange of 100000 should (easily) fit in 2 hours.
......
......@@ -2,30 +2,15 @@
set -e
hn=`hostname`
: ${DATA?missing}
: ${CADO_BUILD?missing}
: ${wdir="/tmp"}
: ${result_dir="$DATA"}
set +e
# batch this many survivors per file, to be sent to finishbatch
# 10M survivors create a product tree 0.5 times the product tree of the primes
filesize=10000000
# A qrange of 100000 should (easily) fit in 2 hours.
......@@ -84,9 +69,9 @@ loop_finishbatch() {
# run finishbatch on it
echo -n "[ $id ]: Starting finishbatch on $file.$id at "; date
$CADO_BUILD/sieve/ecm/finishbatch -poly \
$DATA/rsa250.poly -lim0 0 -lim1 2147483647 \
-lpb0 36 -lpb1 37 -batch0 $DATA/rsa250.batch0\
-batchlpb0 31 -batchmfb0 72 -batchlpb1 30 -batchmfb1 74 -doecm\
-ncurves 80 -t 8 -in "$workdir/running/$file.$id" \
> "$resdir/$file"
......@@ -108,10 +93,10 @@ done
echo -n "Starting las at "; date
$CADO_BUILD/sieve/las \
-poly $DATA/rsa250.poly \
-fb1 $DATA/rsa250.fb1.gz \
-fbc $DATA/rsa250.fbc \
-lim0 0 -lim1 2147483647 -lpb0 36 -lpb1 37 -sqside 1 -A 33 \
-mfb0 250 -mfb1 74 -lambda0 5.0 -lambda1 2.2 \
-bkmult 1,1l:1.15,1s:1.5,2s:1.1 -bkthresh1 80000000 \
......