cado-nfs / records / Commits / 80d86bbe

Commit 80d86bbe authored May 30, 2020 by Emmanuel Thomé

still work in progress

parent f1ce13c7
Changes 5
dlp240/README.md
# DLP-240
This repository contains information to reproduce the DLP240 discrete
logarithm record.
...
...
@@ -42,7 +42,7 @@ A typical command line for an individual work unit was:
```
$CADO_BUILD/polyselect/dlpolyselect -N 124620366781718784065835044608106590434820374651678805754818788883289666801188210855036039570272508747509864768438458621054865537970253930571891217684318286362846948405301614416430468066875699415246993185704183030512549594371372159029285303 -df 4 -dg 3 -area 2.0890720927744e+20 -Bf 34359738368.0 -Bg 34359738368.0 -bound 150 -modm 1000003 -modr 42 -t 8
```
where `-modr 42` gives the index of the task, and all tasks between 0 and
1000002 were run.
A ranking of all the computed pairs was based on MurphyE as computed
by `dlpolyselect` with the parameters
...
...
@@ -63,18 +63,18 @@ makes little sense to report the total number of CPU-years really used.
The calendar time was 18 days.
When a node of `grvingt` is fully charged with 8 jobs of 8 threads, then
one task as above is processed in 1200 wall clock seconds on average.
This must be multiplied by the 4 physical cores it uses to get
the number of core-seconds per `modr` value. And we have 10^6 of them to
process. This adds up to 152 core.years for the whole polynomial selection.
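As a quick sanity check (ours, not part of the original tooling), the 152 core.years figure follows directly from the numbers above:

```python
# Cost of polynomial selection: modm = 1000003 tasks (one per modr value),
# each taking 1200 wall-clock seconds on 4 physical cores.
tasks = 1_000_003
core_seconds = 1200 * 4 * tasks
core_years = core_seconds / (3600 * 24 * 365)
print(round(core_years))  # 152
```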
Some sample-sieving was done on the top 100 polynomials according to
MurphyE. Although there is a clear correlation between the efficiency of
a polynomial pair and its MurphyE value, the ranking is definitely not perfect.
In particular, the best ranked polynomial pair according to MurphyE finds
10% fewer relations than the (truly) best ones.
Additional sample-sieving was performed on the few best candidates. With
a test on 128,000 special-q, 3 polynomials could not be separated.
We ended up using the following [`dlp240.poly`](dlp240.poly):
...
...
@@ -105,8 +105,8 @@ Here is what it gives with final parameters used in the computation. Here,
gzip soon becomes the limiting factor).
```
$CADO_BUILD/sieve/makefb -poly dlp240.poly -side 0 -lim 536870912 -maxbits 16 -t 16 -out $DATA/dlp240.fb0.gz
$CADO_BUILD/sieve/makefb -poly dlp240.poly -side 1 -lim 268435456 -maxbits 16 -t 16 -out $DATA/dlp240.fb1.gz
```
These files have size 209219374 and 103814592 bytes, respectively. They
...
...
@@ -118,7 +118,7 @@ number of unique relations per special-q matters. The timing does not
matter.
```
$CADO_BUILD/sieve/las -poly dlp240.poly -fb0 $DATA/dlp240.fb0.gz -fb1 $DATA/dlp240.fb1.gz -lim0 536870912 -lim1 268435456 -lpb0 35 -lpb1 35 -q0 150e9 -q1 300e9 -dup -dup-qmin 150000000000,0 -sqside 0 -A 31 -mfb0 70 -mfb1 70 -lambda0 2.2 -lambda1 2.2 -random-sample 1024 -allow-compsq -qfac-min 8192 -qfac-max 100000000 -allow-largesq -bkmult 1.10 -t auto -v -fbc /tmp/dlp240.fbc
```
In less than half an hour on our target machine `grvingt`, this gives
...
...
@@ -139,33 +139,34 @@ number_of_sq = 3.67e9
tot_rels = ave_rel_per_sq * number_of_sq
print (tot_rels)
```
This estimate of 2.2e9 relations can be made more precise by increasing
the number of special-q that are sampled for sieving. It is also possible
to have different nodes sampling different subranges of the global range
to get the result faster. We consider that sampling 1024 special-q is
enough to get a reliable estimate.
## Estimating the cost of sieving
In production there is no need to activate the on-the-fly duplicate
removal, which is supposed to be cheap but maybe not negligible, and it is
important to emulate the fact that the cached factor base (the
`/tmp/dlp240.fbc` file) is precomputed and hot (i.e., cached in memory by
the OS and/or the hard drive), because this is the situation in
production; for this, it suffices to start a first run and interrupt it
as soon as the cache is written. Of course, we use the batch smoothness
detection on side 1 and we have to precompute the product of all primes
to be extracted. This means that, on the other hand, the file
`$DATA/dlp240.fb1.gz` is _not_ needed in production (we only used it for
the estimation of the number of unique relations).
```
$CADO_BUILD/sieve/ecm/precompbatch -poly dlp240.poly -lim1 0 -lim0 536870912 -batch0 /dev/null -batch1 $DATA/dlp240.batch1 -batchlpb0 29 -batchlpb1 28
```
Then a typical benchmark is as follows:
```
time $CADO_BUILD/sieve/las -v -poly dlp240.poly -t auto -fb0 $DATA/dlp240.fb0.gz -allow-compsq -qfac-min 8192 -qfac-max 100000000 -allow-largesq -A 31 -lim1 0 -lim0 536870912 -lpb0 35 -lpb1 35 -mfb1 250 -mfb0 70 -batchlpb0 29 -batchlpb1 28 -batchmfb0 70 -batchmfb1 70 -lambda1 5.2 -lambda0 2.2 -batch -batch1 $DATA/dlp240.batch1 -sqside 0 -bkmult 1.10 -q0 150e9 -q1 300e9 -fbc /tmp/dlp240.fbc -random-sample 2048
```
On our sample machine, the result of the above line is:
...
...
@@ -174,8 +175,8 @@ real 22m19.032s
user 1315m10.459s
sys 5m56.262s
```
Then the `22m19.032s` value must be appropriately scaled in order to
convert it to physical core-seconds. For instance, in our
case, since there are 32 cores and we sieved 2048 special-q, this gives
`(22*60+19.0)*32/2048=20.9` core.seconds per special-q.
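The conversion can be written out explicitly (a quick check of the arithmetic, not part of the original scripts):

```python
# 22m19.032s of wall-clock time on a 32-core node, 2048 sampled special-q.
wall_seconds = 22 * 60 + 19.032
cost_per_sq = wall_seconds * 32 / 2048
print(f"{cost_per_sq:.1f} core-seconds per special-q")  # 20.9
```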
...
...
@@ -213,7 +214,8 @@ random distribution). This reports an anticipated time of about 2.65
seconds per iteration (running on 4 nodes of the `grvingt` cluster).
To obtain timings in a different way, the following procedure can also be
used, maybe as a complement to the above, to generate a complete fake
matrix of the required size with the
[`generate_random_matrix.sh`](generate_random_matrix.sh) script (which
takes well over an hour), and measure the time for 128 iterations (which
takes only a few minutes). Within the script
...
...
@@ -225,52 +227,51 @@ adjusted to the users' needs.
DATA=$DATA CADO_BUILD=$CADO_BUILD MPI=$MPI nrows=37000000 density=250 nthreads=32 ./dlp240-linalg-0a-estimate_linalg_time_coarse_method_b.sh
```
This second method reports about 3.1 seconds per iteration. Allowing for
some inaccuracy, these experiments are sufficient to build confidence
that the time per iteration in the krylov (a.k.a. "sequence") step of
block Wiedemann is close to 3 seconds per iteration. The time per
iteration in the mksol (a.k.a. "evaluation") step is in the same
ballpark. The time for krylov+mksol can then be estimated as the product
of this timing with `(1+n/m+1/n)*N`, with `N` the number of rows, and `m`
and `n` the block Wiedemann parameters (we chose `m=48` and `n=16`).
Applied to our use case, this gives an anticipated cost of
`(1+n/m+1/n)*N*3*4*32/3600/24/365=628` core-years for Krylov+Mksol (4 and
32 representing the fact that we used 4-node jobs with 32 physical cores
per node).
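This estimate can be reproduced from the numbers in this document (`N` is the 37M-row matrix size used in the fake-matrix benchmark above; the figure truncates to 628):

```python
# Anticipated Krylov+Mksol cost, in core-years, for block Wiedemann.
N = 37_000_000          # number of rows of the matrix
m, n = 48, 16           # block Wiedemann parameters
secs_per_iter = 3       # measured time per iteration
cores = 4 * 32          # 4-node jobs, 32 physical cores per node
iterations = (1 + n / m + 1 / n) * N
core_years = iterations * secs_per_iter * cores / 3600 / 24 / 365
print(int(core_years))  # 628
```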
The "lingen" (linear generator) step of block Wiedemann was perceived as
the main potential stumbling block for the computation. We had to ensure
that it would be doable with the resources we had. To this end, a
"tuning" of the lingen program can be done with the `--tune` flag, so as
to get an advance look at the cpu and memory requirements for that step.
These tests were sufficient to convince us that we had several possible
parameter settings to choose from, and that this computation was doable.
## Validating the claimed sieving results
The benchmark command lines above can be used almost as-is for
reproducing the full computation. It is just necessary to remove the
`-random-sample` option and to adjust the `-q0` and `-q1` parameters in
order to create many small work units that in the end cover exactly the
global q-range.
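Splitting the global q-range into work units could be sketched as follows (a hypothetical helper of our own, not one of the original scripts; the real distribution was handled by dedicated servers):

```python
def work_units(q0, q1, step):
    """Yield (lo, hi) subranges that exactly cover [q0, q1)."""
    lo = q0
    while lo < q1:
        hi = min(lo + step, q1)
        yield lo, hi
        lo = hi

# The global range [150e9, 300e9) cut into 1G pieces.
units = list(work_units(150 * 10**9, 300 * 10**9, 10**9))
print(len(units))  # 150
```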
Since we do not expect anyone to spend as much computing resources again
to perform exactly the same computation, we provide in the
[`dlp240-rel_count`](dlp240-rel_count) file the count of how many
(non-unique) relations were produced for each 1G special-q subrange.
We can then have a visual plot of this data, as shown in
[`dlp240-plot_rel_count.pdf`](dlp240-plot_rel_count.pdf), where the
x-coordinate denotes the special-q (in multiples of 1G). The plot is
very regular except for special-q's around 150G and 225G. The
irregularities in these areas correspond to the beginning of the
computation when we were still adjusting our scripts. We had two
independent servers in charge of distributing sieving tasks, one dealing
with [150G,225G], and the other one dealing with [225G,300G].
In order to validate our computation, it is possible to recompute only
one of the subranges (not one in the irregular areas) and check that the
number of relations is the one we report. This still requires significant
resources. If only a single node is available for the validation, it is
...
...
@@ -286,34 +287,68 @@ follows.
The filtering follows the same general workflow as in the [rsa240
case](../rsa240/filtering.md), with some notable changes:
- not one, but two programs must be used to generate important companion
files beforehand:
```
$CADO_BUILD/numbertheory/badideals -poly dlp240.poly -ell 62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651 -badidealinfo $DATA/dlp240.badidealinfo -badideals $DATA/dlp240.badideals
$CADO_BUILD/sieve/freerel -poly dlp240.poly -renumber $DATA/dlp240.renumber.gz -lpb0 35 -lpb1 35 -out $DATA/dlp240.freerel.gz -badideals $DATA/dlp240.badideals -lcideals -t 32
```
- the command-line flags `-dl -badidealinfo $DATA/dlp240.badidealinfo` must be added to the `dup2` program.
- the `merge` and `replay` programs must be replaced by `merge-dl` and
`replay-dl`, respectively
- the `replay-dl` command line lists an extra output file
`dlp240.ideals` that is extremely important for the rest of the
computation.
### Duplicate removal

Duplicate removal was done in the default cado-nfs way. We did several
filtering runs as relations kept arriving. For each of these runs, the
integer shell variable `$EXP` was increased by one (starting from 1).
In the command below, `$new_files` is expected to expand to a file
containing a list of file names of new relations (relative to `$DATA`) to
add to the stored set of relations.
```
mkdir -p $DATA/dedup/{0..3}
$CADO_BUILD/filter/dup1 -prefix dedup -out $DATA/dedup/ -basepath $DATA -filelist $new_files -n 2 > $DATA/dup1.$EXP.stdout 2> $DATA/dup1.$EXP.stderr
grep '^# slice.*received' $DATA/dup1.$EXP.stderr > $DATA/dup1.$EXP.per_slice.txt
```
This first pass takes about 3 hours. Numbers of relations per slice are
printed by the program and must be saved for later use (hence the
`$DATA/dup1.$EXP.per_slice.txt` file).
The second pass of duplicate removal works independently on each of the
non-overlapping slices (the number of slices can thus be used as a sort
of time-memory trade-off).
```
for i in {0..3} ; do
  nrels=`awk '/slice '$i' received/ { x+=$5 } END { print x; }' $DATA/dup1.*.per_slice.txt`
  $CADO_BUILD/filter/dup2 -nrels $nrels -renumber $DATA/dlp240.renumber.gz $DATA/dedup/$i/dedup*gz -dl -badidealinfo $DATA/dlp240.badidealinfo > $DATA/dup2.$EXP.$i.stdout 2> $DATA/dup2.$EXP.$i.stderr
done
```
### "purge", a.k.a. singleton and "clique" removal.
Here is the command line of the last filtering run that we used (revision `492b804fc`), with `EXP=7`:
```
nrels=$(awk '/remaining/ { x+=$4; } END { print x }' $DATA/dup2.$EXP.[0-3].stderr)
colmax=$(awk '/INFO: size = / { print $5 }' $DATA/dup2.$EXP.0.stderr)
$CADO_BUILD/filter/purge -out $DATA/purged$EXP.gz -nrels $nrels -outdel $DATA/relsdel$EXP.gz -keep 3 -col-min-index 0 -col-max-index $colmax -t 56 -required_excess 0.0 $DATA/dedup/*/dedup*gz
```
where the trailing arguments list the files with unique relations (output of `dup2`).
This took about 7.5 hours on the machine `wurst`, with 575GB of peak memory.
### The "merge" step

The merge step can be reproduced as follows (still with `EXP=7` for the
final experiment):
```
$CADO_BUILD/filter/merge-dl -mat $DATA/purged$EXP.gz -out $DATA/history250_$EXP -target_density 250 -skip 0 -t 28
```
and took about 20 minutes on the machine `wurst`, with a peak memory of 118GB.
### The "replay" step
Finally the replay step can be reproduced as follows:
```
$CADO_BUILD/filter/replay-dl -purged $DATA/purged$EXP.gz -his $DATA/history250_$EXP.gz -out $DATA/dlp240.matrix$EXP.250.bin -index $DATA/dlp240.index$EXP.gz -ideals $DATA/dlp240.ideals$EXP.gz
```
## Estimating linear algebra time more precisely, and choosing parameters
...
...
dlp240/rel_count.pdf → dlp240/dlp240-plot_rel_count.pdf
File moved
dlp240/rel_count → dlp240/dlp240-rel_count
File moved
rsa240/README.md
...
...
@@ -84,7 +84,7 @@ provide guidance for all possible setups.
## Searching for a polynomial pair
See the [`polyselect.md`](polyselect.md) file in this repository.
And the winner is the [`rsa240.poly`](rsa240.poly) file:
...
...
@@ -225,11 +225,10 @@ real 75m54.351s
user 4768m41.853s
sys 43m15.877s
```
Then the `75m54.351s=4554.3s` value must be appropriately scaled in order
to convert it into physical core-seconds. For instance, in our case,
since there are 32 physical cores and we sieved 1024 special-q, this
gives `4554.3*32/1024=142.32` core.seconds per special-q.
Finally, it remains to multiply by the number of special-q in this
subrange. We get (in Sagemath):
...
...
rsa240/filtering.md
...
...
@@ -11,15 +11,15 @@ versions of cado-nfs changed the format of this file.)
## duplicate removal
Duplicate removal was done with revision `50ad0f1fd` of cado-nfs.
cado-nfs proceeds through two passes. We used the default cado-nfs
setting which, on the first pass, splits the input into `2^2=4`
independent slices, with no overlap. cado-nfs supports doing this step
in an incremental way, so that we assume below that the shell variable
`EXP` expands to an integer indicating the filtering experiment number.
In the command below, `$new_files` is expected to expand to a file
containing a list of file names of new relations (relative to `$DATA`) to
add to the stored set of relations.
```
mkdir -p $DATA/dedup/{0..3}
$CADO_BUILD/filter/dup1 -prefix dedup -basepath $DATA -filelist $new_files -out $DATA/dedup/ -n 2 > $DATA/dup1.$EXP.stdout 2> $DATA/dup1.$EXP.stderr
...
...