Commit 81cebb32 (cado-nfs/records), authored Jun 04, 2020 by Emmanuel Thomé

WIP update 250 and have it more or less in sync with 240

parent 2f71bf42 · 6 changed files
dlp240/README.md
...
...
@@ -153,16 +153,16 @@ be computed exactly or estimated using the logarithmic integral function.
For the target interval, there are 3.67e9 special-q.
```python
# [sage]
ave_rel_per_sq = 0.61   ## pick value output by las
number_of_sq = 3.67e9
tot_rels = ave_rel_per_sq * number_of_sq
print(tot_rels)
```
This estimate (2.2G relations) can be made more precise by increasing the
number of special-q that are sampled for sieving. It is also possible to
have different nodes sample different sub-ranges of the global range to
get the result faster. We consider that sampling 1024 special-qs is
enough to get a reliable estimate.
## Estimating the cost of sieving
...
...
@@ -189,6 +189,12 @@ Then a typical benchmark is as follows:
```shell
time $CADO_BUILD/sieve/las -v -poly dlp240.poly -t auto \
    -fb0 $DATA/dlp240.fb0.gz -allow-compsq -qfac-min 8192 \
    -qfac-max 100000000 -allow-largesq -A 31 -lim1 0 -lim0 536870912 \
    -lpb0 35 -lpb1 35 -mfb1 250 -mfb0 70 -batchlpb0 29 -batchlpb1 28 \
    -batchmfb0 70 -batchmfb1 70 -lambda1 5.2 -lambda0 2.2 -batch \
    -batch1 $DATA/dlp240.batch1 -sqside 0 -bkmult 1.10 \
    -q0 150e9 -q1 300e9 -fbc /tmp/dlp240.fbc -random-sample 2048
```
(Note: the first time this command line is run, it takes some time to
create the "cache" file `/tmp/dlp240.fbc`. If you want to avoid this, you
may first run the command with `-random-sample 2048` replaced by
`-random-sample 0`, which will _only_ create the cache file, and then
run the command above.)
On our sample machine, the result of the above line is:
```
real 22m19.032s
```
...
...
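For reference, the per-special-q cost of 20.9 core.sec used below can be
recovered from this timing. This is a sketch, not from the original text:
the 32-core normalization is an assumption, borrowed from the rsa240
benchmarks later on this page.

```python
# [sage]
# Hypothetical derivation of the 20.9 core.sec per special-q figure.
wall_clock_sec = 22*60 + 19.032   # "real 22m19.032s" from the run above
cores = 32                        # ASSUMPTION: 32-core sample machine, as for rsa240
sampled_sq = 2048                 # -random-sample 2048 in the command above
print(wall_clock_sec * cores / sampled_sq)   # about 20.9
```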
@@ -204,7 +210,7 @@ Finally, it remains to multiply by the number of special-q in this
subrange. We get (in Sagemath):
```python
# [sage]
cost_in_core_sec = 3.67e9 * 20.9
cost_in_core_hours = cost_in_core_sec / 3600
cost_in_core_years = cost_in_core_hours / 24 / 365
```
...
...
rsa240/README.md
...
...
@@ -25,8 +25,8 @@ The cado-nfs documentation should be followed, in order to obtain a
complete build. Note in particular that some of the experiments below
require the use of the [hwloc](https://www.open-mpi.org/projects/hwloc/)
library, and also some MPI implementation.
[Open MPI](https://www.open-mpi.org/) is routinely used for tests, but
cado-nfs also works with Intel MPI, for instance. The bottom line is that
although these external pieces of software are marked as _optional_ for
cado-nfs, they must be regarded as real prerequisites for large
experiments.
...
...
@@ -109,13 +109,14 @@ To estimate the number of relations produced by a set of parameters:
- We create a "hint" file where we tell which strategy to use for which
  special-q size.
- We random-sample in the global q-range, using sieving and not batch:
  this produces the same relations. This is slower, but `-batch` is
  currently incompatible with on-line (on-the-fly) duplicate removal.
Here is what it gives with the parameters that were used in the computation.
The computation of the factor base is done with the following command.
Here, `-t 16` specifies the number of threads (more is practically
useless, since gzip soon becomes the limiting factor).
```shell
$CADO_BUILD/sieve/makefb -poly rsa240.poly -side 1 -lim 2100000000 \
    -maxbits 16 -t 16 -out $DATA/rsa240.fb1.gz
```
...
...
@@ -160,17 +161,17 @@ of degree-1 prime ideals below a bound. In
[Sagemath](https://www.sagemath.org/) code, this gives:
```python
# [sage]
ave_rel_per_sq = 19.6   ## pick value output by las
number_of_sq = log_integral(7.4e9) - log_integral(8e8)
tot_rels = ave_rel_per_sq * number_of_sq
print(tot_rels)
```
This estimate (5.9G relations) can be made more precise by increasing the
number of special-q that are sampled for sieving. It is also possible to
have different nodes sample different sub-ranges of the global range to
get the result faster. We consider that sampling 1024 special-qs is
enough to get a reliable estimate.
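When sampling is distributed in this way, the per-node averages have to
be combined into a single estimate. Here is a minimal Sagemath sketch,
with made-up illustrative per-node figures; the variable names and sample
values are not from the original text.

```python
# [sage]
# Hypothetical combination of per-node sampling results.
# Each entry: (number of sampled special-q on that node, avg relations per special-q).
samples = [(512, 19.4), (512, 19.8)]   # made-up illustrative values
ave_rel_per_sq = sum(n*a for n, a in samples) / sum(n for n, a in samples)
number_of_sq = log_integral(7.4e9) - log_integral(8e8)
print(ave_rel_per_sq * number_of_sq)
```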
## Estimating the cost of sieving
...
...
@@ -189,14 +190,14 @@ conditions were reached during production as well, most of the time.
In production there is no need to activate the on-the-fly duplicate
removal, which is supposed to be cheap but maybe not negligible. There is
also no need to pass the hint file, since we are going to run the siever
on different parts of the q-range, and on each of them the parameters are
constant. Finally, during a benchmark, it is important to emulate the
fact that the cached factor base (the `/tmp/rsa240.fbc` file) is
precomputed and hot (i.e., cached in memory by the OS and/or the
hard-drive), because this is the situation in production; for this, it
suffices to start a first run and interrupt it as soon as the cache is
written (or pass `-nq 0`).
#### Cost of 2-sided sieving in the q-range [8e8,2.1e9]
...
...
@@ -207,11 +208,11 @@ sieving is used on both sides, the typical command-line is as follows:
```shell
time $CADO_BUILD/sieve/las -poly rsa240.poly -fb1 $DATA/rsa240.fb1.gz \
    -lim0 1800000000 -lim1 2100000000 -lpb0 36 -lpb1 37 \
    -q0 8e8 -q1 2.1e9 -sqside 1 -A 32 -mfb0 72 -mfb1 111 \
    -lambda0 2.2 -lambda1 3.2 -random-sample 1024 -t auto \
    -bkmult 1,1l:1.15,1s:1.4,2s:1.1 -v -bkthresh1 90000000 \
    -adjust-strategy 2 -fbc /tmp/rsa240.fbc
```
(Note: the first time this command line is run, it takes some time to
create the "cache" file `/tmp/rsa240.fbc`. If you want to avoid this, you
may first run the command with `-random-sample 1024` replaced by
`-random-sample 0`, which will _only_ create the cache file, and then run
the command above.)
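Concretely, the cache-priming run can look as follows. This is a sketch:
it simply repeats the benchmark command above with `-random-sample 0`, as
the note describes.

```shell
# One-time run: only builds the factor-base cache /tmp/rsa240.fbc, sieves nothing.
$CADO_BUILD/sieve/las -poly rsa240.poly -fb1 $DATA/rsa240.fb1.gz \
    -lim0 1800000000 -lim1 2100000000 -lpb0 36 -lpb1 37 \
    -q0 8e8 -q1 2.1e9 -sqside 1 -A 32 -mfb0 72 -mfb1 111 \
    -lambda0 2.2 -lambda1 3.2 -random-sample 0 -t auto \
    -bkmult 1,1l:1.15,1s:1.4,2s:1.1 -v -bkthresh1 90000000 \
    -adjust-strategy 2 -fbc /tmp/rsa240.fbc
```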
While `las` tries to print some running times, some start-up or finish
tasks might be skipped; furthermore the CPU-time gets easily confused by
...
...
@@ -234,7 +235,7 @@ Finally, it remains to multiply by the number of special-q in this
subrange. We get (in Sagemath):
```python
# [sage]
cost_in_core_sec = (log_integral(2.1e9) - log_integral(8e8)) * 4554.3 * 32 / 1024
cost_in_core_hours = cost_in_core_sec / 3600
cost_in_core_years = cost_in_core_hours / 24 / 365
```
...
...
@@ -247,32 +248,32 @@ With this experiment, we get therefore about 279 core.years for this sub-range.
#### Cost of 1-sided sieving + batch in the q-range [2.1e9,7.4e9]
For special-qs larger than 2.1e9, since we are using batch smoothness
detection on side 0, we have to precompute the `rsa240.batch0` file which
contains the product of all primes to be extracted. (Note that the
`-batch1` option is mandatory, even though, for our parameters, no file
is produced on side 1.)
```shell
$CADO_BUILD/sieve/ecm/precompbatch -poly rsa240.poly -lim0 0 \
    -lim1 2100000000 -batch0 $DATA/rsa240.batch0 \
    -batch1 $DATA/rsa240.batch1 -batchlpb0 31 -batchlpb1 30
```
Then, we can use the [`rsa240-sieve-batch.sh`](rsa240-sieve-batch.sh)
shell script given in this repository. This launches:
- one instance of `las`, which does the sieving on side 1 and prints the
  survivors to files;
- 6 instances of the `finishbatch` program. Those instances process the
  files as they are produced, do the batch smoothness detection, and
  produce relations.
The script takes two command-line arguments `-q0 xxx` and `-q1 xxx`,
which describe the range of special-q to process. Temporary files are put
in the `/tmp` directory by default.
In order to run [`rsa240-sieve-batch.sh`](rsa240-sieve-batch.sh) on your
own machine, there are some variables to adjust at the beginning of the
script. Two examples are already given, so this should be easy to
imitate. The number of instances of `finishbatch` can also be adjusted
depending on the number of cores available on the machine.
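Those variables live at the head of the script (see the rsa240 version of
the script further down this page); a condensed sketch, with the comments
added here for orientation only:

```shell
: ${DATA?missing}         # must be set: where factor base, batch files and results live
: ${CADO_BUILD?missing}   # must be set: path to the cado-nfs build tree
: ${wdir="/tmp"}          # working directory for temporary survivor files
: ${result_dir="$DATA"}   # where the produced relations end up
```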
When the paths are properly set (either by having `CADO_BUILD` and
`DATA` set correctly, or by tweaking the script), a typical invocation
...
...
@@ -281,38 +282,39 @@ is as follows:
```shell
./rsa240-sieve-batch.sh -q0 2100000000 -q1 2100100000
```
The script prints the start and end dates on stdout, and the output of
`las`, which can be found in `$DATA/log/las.${q0}-${q1}.out`, records the
number of special-qs that have been processed. From this information one
can again deduce the cost in core.seconds to process one special-q, and
then the overall cost of sieving the q-range [2.1e9,7.4e9].
The design of this script requires a rather long range of special-q to
handle for each run of `rsa240-sieve-batch.sh`. Indeed, during the last
minutes, the `finishbatch` jobs need to take care of the last survivor
files while `las` is no longer running, so that the node is not fully
occupied. If the `rsa240-sieve-batch.sh` job takes a few hours, this
fade-out phase takes negligible time. Both for the benchmark and in
production it is then necessary to have jobs taking at least a few hours.
On our sample machine, here is an example of a benchmark:
```shell
./rsa240-sieve-batch.sh -q0 2100000000 -q1 2100100000 > /tmp/rsa240-sieve-batch.out
# [ wait ... ]
start=$(date -d "`grep "^Starting" /tmp/rsa240-sieve-batch.out | head -1 | awk -F " at " '//{print $2}'`" +%s)
end=$(date -d "`grep "^End" /tmp/rsa240-sieve-batch.out | tail -1 | awk -F " at " '//{print $2}'`" +%s)
nb_q=`grep "# Discarded 0 special-q's out of" /tmp/log/las.2100000000-2100100000.out | awk '{print $(NF-1)}'`
echo -n "Cost in core.sec per special-q: "; echo "($end-$start)/$nb_q*32" | bc -l
# Cost in core.sec per special-q: 67.43915571828559121248
```
```python
# [sage]
cost_in_core_sec = (log_integral(7.4e9) - log_integral(2.1e9)) * 67.4
cost_in_core_hours = cost_in_core_sec / 3600
cost_in_core_years = cost_in_core_hours / 24 / 365
print(cost_in_core_hours, cost_in_core_years)
# (4.46604511452076e6, 509.822501657621)
```
With this experiment, we get 67.4 core.sec per special-q, and therefore
...
...
@@ -496,13 +498,14 @@ export MPI
```shell
./rsa240-linalg-3-krylov.sh sequence=2 start=0
./rsa240-linalg-3-krylov.sh sequence=3 start=0
```
where the last 4 lines (steps `3-krylov`) correspond to the 4 "sequences"
(vector blocks numbered `0-64`, `64-128`, `128-192`, and `192-256`).
These sequences can be run concurrently on different sets of nodes, with
no synchronization needed. Each of these 4 sequences needs about 25 days
to complete. Jobs can be interrupted, and must simply be restarted
exactly from where they left off. E.g., if the latest of the `V64-128.*`
files in `$DATA` is `V64-128.86016`, then the job for sequence 1 can be
restarted with:
```shell
./rsa240-linalg-3-krylov.sh sequence=1 start=86016
```
...
...
@@ -519,9 +522,10 @@ export MPI
Once this is done, data must be collated before being processed by the
later steps. After step `5-acollect` below, a file named `A0-256.0-1654784`
with size 27111981056 bytes will be in `$DATA`. Step `6-lingen` below
runs on 16 nodes, and completes in slightly less than 10 hours.
```shell
export matrix=$DATA/rsa240.matrix11.200.bin
export DATA
```
...
...
rsa240/filtering.md
# Additional info on filtering for RSA-240
Filtering was run exclusively on the machine `wurst`.
A first step of the filtering process in cado-nfs is to create the
so-called "renumber table", as follows.
```
...
...
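The renumber-table command itself is truncated in this diff. For
orientation only: in cado-nfs this table is produced by the `freerel`
program, along the lines of the sketch below. The flag values and file
names are assumptions based on the rsa240 parameters elsewhere on this
page, not a quote of the original file.

```shell
# Sketch only; the original command is not shown in this diff.
$CADO_BUILD/sieve/freerel -poly rsa240.poly \
    -renumber $DATA/rsa240.renumber.gz \
    -out $DATA/rsa240.freerel.gz -lpb0 36 -lpb1 37
```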
rsa240/rsa240-sieve-batch.sh
```shell
#!/bin/bash
hn=`hostname`
set -e
: ${DATA?missing}
: ${CADO_BUILD?missing}
: ${wdir="/tmp"}
: ${result_dir="$DATA"}
set +e
# batch that number of survivors per file, to be sent to finishbatch
# 16M survivors create a product tree 0.75 times the product tree of the primes
# 10M survivors create a product tree 0.48 times the product tree of the primes
filesize=10000000
# A qrange of 100000 should (easily) fit in 2 hours.
```
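The `: ${VAR?missing}` lines make the script fail fast when a required
environment variable is unset. A minimal sketch of the behaviour (not
part of the original script):

```shell
# ':' evaluates its arguments and succeeds; the ${DATA?missing} expansion
# aborts a non-interactive shell with "DATA: missing" if DATA is unset.
unset DATA
( : ${DATA?missing} ) || echo "the script would have stopped here"
DATA=/path/to/data
: ${DATA?missing}   # passes silently once DATA is set
```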
...
...
rsa250/README.md (diff collapsed, not shown)
rsa250/sieve-batch.sh → rsa250/rsa250-sieve-batch.sh (renamed)
...
...
@@ -2,30 +2,15 @@
```shell
set -e
hn=`hostname`
: ${DATA?missing}
: ${CADO_BUILD?missing}
: ${wdir="/tmp"}
: ${result_dir="$DATA"}
set +e
# batch that number of survivors per file, to be sent to finishbatch
# 10M survivors create a product tree 0.5 times the product tree of the primes
filesize=10000000
# A qrange of 100000 should (easily) fit in 2 hours.
```
...
...
@@ -84,9 +69,9 @@ loop_finishbatch() {
```shell
# run finishbatch on it
echo -n "[$id]: Starting finishbatch on $file.$id at "; date
$CADO_BUILD/sieve/ecm/finishbatch -poly \
    $DATA/rsa250.poly -lim0 0 -lim1 2147483647 \
    -lpb0 36 -lpb1 37 -batch0 $DATA/rsa250.batch0 \
    -batchlpb0 31 -batchmfb0 72 -batchlpb1 30 -batchmfb1 74 -doecm \
    -ncurves 80 -t 8 -in "$workdir/running/$file.$id" \
    > "$resdir/$file"
```
...
...
@@ -108,10 +93,10 @@ done
```shell
echo -n "Starting las at "; date
$CADO_BUILD/sieve/las \
    -poly $DATA/rsa250.poly \
    -fb1 $DATA/rsa250.fb1.gz \
    -fbc $DATA/rsa250.fbc \
    -lim0 0 -lim1 2147483647 -lpb0 36 -lpb1 37 -sqside 1 -A 33 \
    -mfb0 250 -mfb1 74 -lambda0 5.0 -lambda1 2.2 \
    -bkmult 1,1l:1.15,1s:1.5,2s:1.1 -bkthresh1 80000000 \
```
...
...