Commit 7f6cbd1e authored by Emmanuel Thomé

more reproducibility data

parent 80d86bbe
......@@ -196,6 +196,12 @@ we obtain about 2430 core.years for the total sieving time.
## Estimating linear algebra time (coarsely)
Linear algebra works with MPI. For this section, as well as all linear
algebra-related sections, we assume that you built cado-nfs with MPI
enabled (i.e., the `MPI` shell variable was set to the path of your MPI
installation), and that `CADO_BUILD` points to the directory where the
corresponding binaries were built.
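
Concretely, this amounts to having something like the following in the
environment when running the scripts below (a sketch; the paths are
placeholders):
```
export MPI=/path/to/your/mpi/installation
export CADO_BUILD=/path/to/cado-nfs/build
```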
To determine ahead of time the linear algebra time for a sparse binary
matrix with (say) 37M rows/columns and 250 non-zero entries per row, it
is possible to _stage_ a real set-up, just for the purpose of
......@@ -230,14 +236,14 @@ DATA=$DATA CADO_BUILD=$CADO_BUILD MPI=$MPI nrows=37000000 density=250 nthreads=3
This second method reports about 3.1 seconds per iteration. Allowing for
some inaccuracy, these experiments are sufficient to build confidence
that the time per iteration in the krylov (a.k.a. "sequence") step of
block Wiedemann is close to seconds per iteration. The time per
iteration in the mksol (a.k.a. "evaluation") step is in the same
ballpark. The time for krylov+mksol can then be estimated as the product
of this timing with `(1+n/m+1/n)*N`, with `N` the number of rows, and `m`
and `n` the block Wiedemann parameters (we chose `m=48` and `n=16`).
Applied to our use case, this gives an anticipated cost of
block Wiedemann is close to 3 seconds per iteration, perhaps slightly less.
The time per iteration in the mksol (a.k.a. "evaluation") step is in the
same ballpark. The time for krylov+mksol can then be estimated as the
product of this timing with `(1+n/m+1/n)*N`, with `N` the number of rows,
and `m` and `n` the block Wiedemann parameters (we chose `m=48` and
`n=16`). Applied to our use case, this gives an anticipated cost of
`(1+n/m+1/n)*N*3*4*32/3600/24/365=628` core-years for Krylov+Mksol (4 and
32 representing the fact that we used 4-node jobs with 32-physical cores
32 representing the fact that we used 4-node jobs with 32 physical cores
per node).
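
As a quick sanity check of this arithmetic, the formula can be evaluated
directly (a sketch, using the same values as above: `N=37e6`, `m=48`,
`n=16`, 3 seconds per iteration, 4 nodes of 32 physical cores each):
```
# (1+n/m+1/n)*N iterations, 3 s each, on 4*32=128 cores, converted to core-years.
echo '(1+16/48+1/16)*37000000*3*4*32/3600/24/365' | bc -l
# ~628.9
```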
......@@ -285,8 +291,8 @@ managable number of large files (150 files of 3.2GB each). These had to
undergo filtering in order to produce a linear system. The process is as
follows.
The filtering follows the same general workflow as in the [rsa-240
case](../rsa240/filtering.md), with some notable changes:
The filtering follows roughly the same general workflow as in the
[rsa-240 case](../rsa240/filtering.md), with some notable changes:
- not one, but two programs must be used to generate important companion
files beforehand:
```
......@@ -299,6 +305,8 @@ case](../rsa240/filtering.md), with some notable changes:
- the `replay-dl` command line lists an extra output file
`dlp240.ideals` that is extremely important for the rest of the
computation.
- as the linear system to be solved is inhomogeneous, another program
must be called in order to compute the right-hand side of the system.
### Duplicate removal
......@@ -311,28 +319,32 @@ add to the stored set of relations.
```
mkdir -p $DATA/dedup/{0..3}
$CADO_BUILD/filter/dup1 -prefix dedup -out $DATA/dedup/ -basepath $DATA -filelist $new_files -n 2 > $DATA/dup1.$EXP.stdout 2> $DATA/dup1.$EXP.stderr
grep '^# slice.*received' $DATA/dup1.$EXP.stderr $DATA/dup1.$EXP.per_slice.txt
grep '^# slice.*received' $DATA/dup1.$EXP.stderr > $DATA/dup1.$EXP.per_slice.txt
```
This first pass takes about 3 hours. Numbers of relations per slice are
printed by the program and must be saved for later use (hence the
`$DATA/dup1.$EXP.per_slice.txt` file).
The second pass of duplicate removal works independently on each of the
non-overlapping slices (the number of slices can thus be used as a sort
of time-memory tradeoff.
non-overlapping slices. The number of slices can thus be used as a sort
of time-memory tradeoff (here, `-n 2` tells the program to do `2^2=4`
slices).
```
for i in {0..3} ; do
nrels=`awk '/slice '$i' received/ { x+=$5 } END { print x; }' $DATA/dup1.*.per_slice.txt`
$CADO_BUILD/filter/dup2 -nrels $nrels -renumber $DATA/rsa240.renumber $DATA/dedup/$i/dedup*gz -dl -badidealinfo $DATA/dlp240.badidealinfo > $DATA/dup2.$EXP.$i.stdout 2> $DATA/dup2.$EXP.$i.stderr
$CADO_BUILD/filter/dup2 -nrels $nrels -renumber $DATA/dlp240.renumber.gz -dl -badidealinfo $DATA/dlp240.badidealinfo $DATA/dedup/$i/dedup*gz > $DATA/dup2.$EXP.$i.stdout 2> $DATA/dup2.$EXP.$i.stderr
done
```
(Note: in newer versions of cado-nfs, after June 2020, the `-badidealinfo
$DATA/dlp240.badidealinfo` arguments to the `dup2` program must be
replaced by `-poly dlp240.poly`.)
### "purge", a.k.a. singleton and "clique" removal.
### The "purge" step, a.k.a. singleton and "clique" removal.
Here is the command line of the last filtering run that we used (revision `492b804fc`), with `EXP=7`:
```
nrels=$(awk '/remaining/ { x+=$4; } END { print x }' $DATA/dup2.$EXP.[0-3].stderr)
colmax=$(awk '/INFO: size = / { print $5 }' $DATA/dup2.$EXP.0.stderr)
colmax=2960421140
$CADO_BUILD/filter/purge -out $DATA/purged$EXP.gz -nrels $nrels -outdel $DATA/relsdel$EXP.gz -keep 3 -col-min-index 0 -col-max-index $colmax -t 56 -required_excess 0.0 $DATA/dedup/*/dedup*gz
```
This took about 7.5 hours on the machine wurst, with 575GB of peak memory.
......@@ -343,7 +355,7 @@ final experiment).
```
$CADO_BUILD/filter/merge-dl -mat $DATA/purged$EXP.gz -out $DATA/history250_$EXP -target_density 250 -skip 0 -t 28
```
and took about 20 minutes on the machine wurst, with a peak memory of 118GB.
This took about 20 minutes on the machine wurst, with a peak memory of 118GB.
### The "replay" step
Finally the replay step can be reproduced as follows:
......@@ -351,10 +363,167 @@ Finally the replay step can be reproduced as follows:
$CADO_BUILD/filter/replay-dl -purged $DATA/purged$EXP.gz -his $DATA/history250_$EXP.gz -out $DATA/dlp240.matrix$EXP.250.bin -index $DATA/dlp240.index$EXP.gz -ideals $DATA/dlp240.ideals$EXP.gz
```
### Computing the right-hand side.
This is done with a program called `sm`. There are several variants of this program, and several ways to invoke it. Here is the command line that we used. Note that we use the file `$DATA/dlp240.index$EXP.gz` that was just created by the above step.
```
$CADO_BUILD/filter/sm -poly dlp240.poly -purged $DATA/purged$EXP.gz -index $DATA/dlp240.index$EXP.gz -out $DATA/dlp240.matrix$EXP.250.sm -ell 62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651 -t 56
```
This took about four hours on the machine wurst.
## Estimating linear algebra time more precisely, and choosing parameters
The filtering output is controlled by a wealth of tunable parameters.
However, at a very coarse-grained level, we focus on two of them:
* _when_ we decide to stop relation collection.
* _how dense_ we want the final matrix to be.
Sieving more is expected to have a beneficial impact on the matrix size,
but this benefit can become marginal, eventually reaching a point of
diminishing returns. Allowing for a denser matrix also makes it possible
to have fewer rows in the final matrix, which is good for various
memory-related concerns.
We did several filtering experiments based on the DLP-240 data set, as
relations kept coming in. For each of these experiments, we give the
number of raw relations, the number of relations after the initial
"purge" step, as well as the number of rows of the final matrix after
"merge", for target densities d=100, d=150, and d=200.
| | rels | purged | d=150 | d=200 | d=250
| -------------|-------|--------|-------|-------|------
| experiment 4 | 2.07G | 1.87G | 51.6M | 46.1M | (skip)
| experiment 5 | 2.30G | 1.59G | 45.2M | 40.4M | (skip)
| experiment 7 | 2.38G | 1.50G | 42.9M | 38.9M | 36.2M
Each of these experiments produced a matrix, and it was possible to run a
few iterations of each, in order to guide the final choice. For this,
a single command line is sufficient. For consistency with the other
scripts, it is also provided as a script in this repository, namely
[`dlp240-linalg-0b-test-few-iterations.sh`](dlp240-linalg-0b-test-few-iterations.sh).
This script needs the `MPI`, `DATA`, `matrix`, and `CADO_BUILD` variables to be
set. It can be used as follows, where `$matrix` points to one of the
matrices that have been produced by the filter code (after the `replay`
step). For this quick bench, the right-hand-side file is not necessary.
```
export matrix=$DATA/dlp240.matrix7.250.bin
export DATA
export CADO_BUILD
export MPI
./dlp240-linalg-0b-test-few-iterations.sh
```
This can be used to decide what density should be preferred (the curve is
usually pretty flat anyway).
With the chosen matrix for the DLP-240 computation (experiment 7, density
250), we observed an average time of roughly 2.6 seconds per iteration
on the `grvingt` platform, subject to some variations (which is somewhat
on the optimistic end of our expected timing range).
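
As a rough cross-check (a sketch; it assumes each Krylov sequence runs
about `(1/n+1/m)*N` iterations, consistent with the length `3016704` of
the `A` file mentioned below), this timing also matches the per-sequence
wall time of roughly 90 days reported in the next section:
```
# Wall time of one Krylov sequence at 2.6 s/iteration, with N=36190697, m=48, n=16.
echo '(1/16+1/48)*36190697*2.6/86400' | bc -l
# ~90.7 days
```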
## Reproducing the linear algebra results
The input to the linear algebra step consists of four files.
- The matrix file `$DATA/dlp240.matrix7.250.bin` (69 GB)
- The companion file `$DATA/dlp240.matrix7.250.rw.bin` (139 MB)
- The companion file `$DATA/dlp240.matrix7.250.cw.bin` (139 MB)
- The right-hand side file `$DATA/dlp240.matrix7.250.sm` (33 GB)
This linear system has 36190697 rows and columns.
We decided to use the block Wiedemann parameters `m=48` and `n=16`,
running on 4-node jobs.
The first part of the computation can be done with these scripts:
```
export matrix=$DATA/dlp240.matrix7.250.bin
export DATA
export CADO_BUILD
export MPI
./dlp240-linalg-1-prep.sh
./dlp240-linalg-2-secure.sh
./dlp240-linalg-3-krylov.sh sequence=0 start=0
./dlp240-linalg-3-krylov.sh sequence=1 start=0
...
./dlp240-linalg-3-krylov.sh sequence=15 start=0
```
where the last 16 lines (steps `3-krylov`) correspond to the 16 "sequences"
(vector blocks numbered `0-1`, `1-2`, up to `15-16`). These sequences can
be run concurrently on different sets of nodes, with no synchronization
needed. Each of these 16 sequences needs about 90 days to complete (in
practice, we used a different platform than the one we report timings
for, but the timings and calendar time were in the same ballpark). Jobs
can be interrupted, and must simply be restarted exactly from where they
left off. E.g., if the latest of the `V1-2.*` files in `$DATA` is
`V1-2.86016`, then the job for sequence 1 can be restarted with:
```
./dlp240-linalg-3-krylov.sh sequence=1 start=86016
```
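
If desired, the latest checkpoint index can be picked up automatically (a
hypothetical helper, not part of the scripts in this repository; it
assumes the checkpoint index is the numeric suffix of the `V1-2.*` file
names):
```
# Restart sequence 1 from its most recent V1-2.* checkpoint in $DATA.
latest=$(ls $DATA/V1-2.* | sed 's/.*\.//' | sort -n | tail -n 1)
./dlp240-linalg-3-krylov.sh sequence=1 start=$latest
```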
Cheap sanity checks can be done periodically with the following script,
which does all checks it can do (note that the command is happy if it
finds _no_ check to do as well!)
```
export matrix=$DATA/dlp240.matrix7.250.bin
export DATA
export CADO_BUILD
export MPI
./dlp240-linalg-4-check-krylov.sh
```
Once this is done, data must be collated before being processed by the
later steps. After step `5-acollect` below, a file named `A0-16.0-3016704` with
size 240950181888 bytes will be in `$DATA`. Step `6-lingen` below runs on
36 nodes, and completes in approximately one week (periodic
checkpoint/restart is supported).
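
The expected size of this file can be recomputed as a sanity check (a
sketch; it assumes each of the `m*n` entries of a coefficient is stored
as 13 64-bit words, i.e. 104 bytes, as dictated by the 240-digit prime):
```
# 48*16 entries per coefficient, 13*8=104 bytes per entry, 3016704 coefficients.
echo '48*16*13*8*3016704' | bc
# 240950181888
```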
```
export matrix=$DATA/dlp240.matrix7.250.bin
export DATA
export CADO_BUILD
export MPI
./dlp240-linalg-5-acollect.sh
./dlp240-linalg-6-lingen.sh
./dlp240-linalg-7-check-lingen.sh
./dlp240-linalg-8-mksol.sh start=0
./dlp240-linalg-8-mksol.sh start=32768
./dlp240-linalg-8-mksol.sh start=65536
# ... 67 other commands of the same kind (70 in total) ...
./dlp240-linalg-8-mksol.sh start=2260992
./dlp240-linalg-9-finish.sh
```
All steps `8-mksol.sh` above can be run in parallel (they use the `V*`
files produced in steps `3-krylov` above as a means to jump-start the
computation in the middle). Each uses 8 nodes and takes about 13 hours to
complete (1.43 seconds per iteration). Note that in order to bench the
mksol timings ahead of time, it is possible to create fake files with
random data, named as follows
```
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.0-1
-rw-rw-r-- 1 ethome users 104 Nov 3 2019 F.sols0-1.0-1.rhs
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.1-2
-rw-rw-r-- 1 ethome users 104 Nov 3 2019 F.sols0-1.1-2.rhs
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.10-11
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.11-12
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.12-13
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.13-14
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.14-15
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.15-16
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.2-3
-rw-rw-r-- 1 ethome users 104 Nov 3 2019 F.sols0-1.2-3.rhs
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.3-4
-rw-rw-r-- 1 ethome users 104 Nov 3 2019 F.sols0-1.3-4.rhs
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.4-5
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.5-6
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.6-7
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.7-8
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.8-9
-rw-rw-r-- 1 ethome users 235239680 Nov 3 2019 F.sols0-1.9-10
```
(the size above is the final size. For a quick test, it is sufficient to
replace the big files with files of size
`13*8*32768=3407872` bytes.)
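
A possible way to create such quick-test files (a sketch; it only relies
on the sizes given above and on the fact that, as in the listing, only
the first four blocks have a `.rhs` companion; random content is only
good for a timing bench):
```
for k in {0..15}; do
  dd if=/dev/urandom of=$DATA/F.sols0-1.$k-$((k+1)) bs=32768 count=104     # 13*8*32768 bytes
done
for k in {0..3}; do
  dd if=/dev/urandom of=$DATA/F.sols0-1.$k-$((k+1)).rhs bs=104 count=1     # one 104-byte entry
done
```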
After having successfully followed the steps above, a file named
`W.sols0-1` will be in `$DATA`. This file represents a kernel vector.
## Reproducing the individual logarithm result
In principle the general script `scripts/descent.py` in the `cado-nfs`
......@@ -364,7 +533,7 @@ optimized for large computations, and in particular it starts by reading
the whole database of known discrete logarithms into central memory, which
is slow and requires a machine with a huge amount of memory.
In the file [`howto-descent.txt`](howto-descent.txt), we explain what we did to make our lives
In the file [`howto-descent.md`](howto-descent.md), we explain what we did to make our lives
simpler with this step. We do not claim full reproducibility here, since
this is admittedly hackish (a small C program is also given, which searches
in the database file without having an in-memory image). In any case,
......
......@@ -47,7 +47,6 @@ cargs=(
sequential_cache_read=2
wdir=$DATA
matrix="$matrix"
rhs="$rhs"
cpubinding="Package=>1x2 L2Cache*16=>16x2 PU*2=>1x1"
)
......@@ -64,4 +63,6 @@ if [ "${#cache_files[@]}" != 256 ] ; then
done
"${mpi[@]}" $CADO_BUILD/linalg/bwc/dispatch "${cargs[@]}" ys=0..1
fi
# dispatch doesn't grok rhs, let's add it last.
cargs+=(rhs="$rhs")
"${mpi[@]}" $CADO_BUILD/linalg/bwc/prep "${cargs[@]}"
......@@ -24,7 +24,7 @@ set -ex
export DISPLAY= # safety precaution
mpi=(
$MPI/bin/mpiexec -n 16
$MPI/bin/mpiexec -n 36
-machinefile $OAR_NODEFILE
--map-by node
--mca plm_rsh_agent oarsh
......@@ -33,17 +33,18 @@ $MPI/bin/mpiexec -n 16
--mca mtl ^psm2,ofi,cm
--mca btl '^openib'
)
mkdir $DATA/cp || :
args=(
m=48 n=16
prime=62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651
mpi=4x4
mpi=6x6
wdir=$DATA
--afile A0-16.0-3016704
--ffile F
--split-output-file 1
tuning_timing_cache_filename="`dirname $0`"/dlp240-linalg-6-lingen-tim240.txt
tuning_schedule_filename="`dirname $0`"/dlp240-linalg-6-lingen-ts240.txt
max_ram=12
max_ram=128
# thr=8x4 (do not specify thr)
tree_stats_max_nesting=3
basecase_keep_until=1.1
......
#!/bin/bash
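# Parse optional key=value arguments (e.g. DATA=/path) into shell variables.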
for x in "$@" ; do
if [[ $x =~ ^[0-9a-zA-Z_/.-]+=[0-9a-zA-Z_/.-]+$ ]] ; then
eval $x
fi
done
for f in CADO_BUILD DATA ; do
if ! [ "${!f}" ] ; then
echo "\$$f must be defined" >&2
exit 1
fi
done
for f in lingen_verify_checkpoints_p_13 ; do
if ! [ -x "$CADO_BUILD/linalg/bwc/$f" ] ; then
echo "missing binary $CADO_BUILD/linalg/bwc/$f ; compile it first" >&2
exit 1
fi
done
set -ex
# This is not an MPI program, but we need to know the MPI setting in
# order to properly detect files that were saved in multiple pieces.
args=(
m=48 n=16
prime=62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651
mpi=4x4
wdir=$DATA/cp
)
$CADO_BUILD/linalg/bwc/lingen_verify_checkpoints_p_13 "${args[@]}"
Recompile cado-nfs:
===================
Recompile cado-nfs
==================
1) Increase the value of NB_MAX_METHODS in the #define in the
sieve/ecm/facul.hpp file. (600 should be enough)
1. Increase the value of `NB_MAX_METHODS` in the `#define` in the
`sieve/ecm/facul.hpp` file. (600 should be enough)
2) Make sure that you have GMP-ECM is installed and your machine and
2. Make sure that GMP-ECM is installed on your machine and
visible from cado-nfs's compilation toolchain.
You might have to add something like
```
GMPECM=/path/to/gmpecm
```
in your local.sh in order to activate the compilation of
misc/descent_init_Fp which depends on GMP-ECM. Note that it might be
necessary to configure gmp-ecm with the --enable-shared option so that it
`misc/descent_init_Fp` which depends on GMP-ECM. Note that it might be
necessary to configure gmp-ecm with the `--enable-shared` option so that it
is recognized by cmake.
3) As usual for a large experiment, add the following in local.sh
3. As usual for a large experiment, add the following in `local.sh`
```
FLAGS_SIZE="-DSIZEOF_P_R_VALUES=8 -DSIZEOF_INDEX=8"
```
4) Compile...
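
For reference, the `local.sh` additions from steps 2 and 3 can be combined
as follows (a sketch; the GMP-ECM path is a placeholder):
```
# local.sh additions for this computation.
GMPECM=/path/to/gmpecm
FLAGS_SIZE="-DSIZEOF_P_R_VALUES=8 -DSIZEOF_INDEX=8"
```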
Prepare a working directory:
============================
Prepare a working directory
===========================
```
# Choose a place
wdir=/path/to/dlp240
......@@ -34,39 +39,45 @@ dlp240.renumber
dlp240.roots0.gz
dlp240.roots1.gz
dlp240.hint
```
Prepare the target:
===================
Prepare the target
==================
In what follows, `$target` must be a shell variable containing the target
written in decimal. It can be, for instance:
0) Harcoded target:
0. Hardcoded target:
```
target="123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100"
```
1) Target taken from a sentence:
1. Target taken from a sentence:
```
target_str="The magic words are still Squeamish Ossifrage"
target_hex=`echo -n $target_str | xxd -p -c 256`
target_hex=${target_hex^^}
target=`echo "ibase=16; $target_hex" | BC_LINE_LENGTH=0 bc`
```
2) Target taken from a formula computed with Magma / Sage:
2. Target taken from a formula computed with Magma / Sage:
```
target=`echo "Ceiling(2^793*Pi(RealField(250)));" | magma -b`
```
Set the short name for the target:
==================================
The descent.py script uses a short name for the target. Since we are cheating
with this, we need to use this internal name:
```
short_target="${target:0:10}...${target: -10}"
```
Initialize the descent:
=======================
```
# choose the path to your CADO_NFS build directory
CADO_BUILD=/path/to/cado-nfs/build
......@@ -94,10 +105,11 @@ fac_v = 3,0 7,6 41,22 113,105 401,246 24184471581643,22596272999833 176604796267
Youpi: e = 585233622 is a winner
Total CPU time: 453555.0 s
EOF
```
Create the todo file:
=====================
Create the todo file
====================
```
cd $wdir
cat <<EOF | magma -b
Init := Split(Read("dlp240.descent.${short_target}.upper.init"));
......@@ -117,16 +129,18 @@ for x in Out do
fprintf "dlp240.descent.${short_target}.upper.todo", "%o %o %o\n", x[1],x[2],x[3];
end for;
EOF
```
Run las_descent manually:
=========================
Run las_descent manually
========================
```
$CADO_BUILD/sieve/las_descent --renumber $wdir/dlp240.renumber --log $wdir/dlp240.reconstructlog.dlog --recursive-descent --allow-largesq --never-discard --fb1 $wdir/dlp240.roots1.gz --poly $wdir/dlp240.poly --fb0 $wdir/dlp240.roots0.gz --descent-hint-table $wdir/dlp240.hint --I 16 --lim0 1073741824 --lim1 1073741824 --lpb0 35 --mfb0 105 --lpb1 35 --mfb1 105 -t 16 --todo $wdir/dlp240.descent.${short_target}.upper.todo -v -bkmult 1,1s:1.09123 > $wdir/dlp240.descent.${short_target}.middle.rels 2>&1
```
Extract all the known logs of the ideals involved in the descent trees
and create a fake .dlog file:
======================================================================
Extract all the known logs of the ideals involved in the descent trees and create a fake .dlog file
===================================================================================================
```
cd $wdir
tempfile1=`mktemp`
tempfile2=`mktemp`
......@@ -180,13 +194,14 @@ cat $tempfile2 | xargs ./a.out dlp240.reconstructlog.dlog > $tempfile3
head -100000 dlp240.reconstructlog.dlog > dlp240.crafted.dlog
cat $tempfile3 | grep -v "not" >> dlp240.crafted.dlog
tail -10 dlp240.reconstructlog.dlog >> dlp240.crafted.dlog
```
Call the python script that glues things together:
==================================================
Call the python script that glues things together
=================================================
```
ln -s $wdir/dlp240.crafted.dlog $wdir/dlp240.dlog
# most parameters are not used, since we have precomputed everything!
$CADO_BUILD/scripts/descent.py --target $target --gfpext 1 --prefix dlp240 --datadir $wdir --cadobindir $CADO_BUILD --descent-hint $wdir/dlp240.hint --init-I 10 --init-ncurves 5 --init-lpb 22 --init-lim 1500 --init-mfb 34 --init-tkewness 100000 --I 16 --lpb0 35 --lpb1 35 --mfb0 70 --mfb1 70 --lim0 1073741824 --lim1 1073741824 --ell $ell
```
......@@ -318,6 +318,12 @@ we obtain about 510 core.years for this sub-range.
## Estimating linear algebra time (coarsely)
Linear algebra works with MPI. For this section, as well as all linear
algebra-related sections, we assume that you built cado-nfs with MPI
enabled (i.e., the `MPI` shell variable was set to the path of your MPI
installation), and that `CADO_BUILD` points to the directory where the
corresponding binaries were built.
The matrix size for RSA-240 is about 282M, with density 200 per row.
However, it is possible, and actually useful, to have an idea of the
computational cost of linear algebra before the matrix is actually ready,
......@@ -368,7 +374,7 @@ number of rows, and `m` and `n` the block Wiedemann parameters (we chose
`m=512` and `n=256`). Applied to our use case, this gives an anticipated
cost of `(1+n/m+64/n)*(N/64)*1.3*8*32/3600/24/365=86.6` core-years for
Krylov+Mksol (8 and 32 representing the fact that we used 8-node jobs
with 32-physical cores per node).
with 32 physical cores per node).
Because the parallel code for the "lingen" (linear generator) step of
block Wiedemann was not ready when the computation started, we did no
......@@ -471,9 +477,13 @@ iteration on the `grvingt` platform, subject to some variations.
The scripts above are of course part of a more general picture that does
the full block Wiedemann algorithm.
We decided to use the block Wiedemann parameters `m=512` and `n=256`,
giving rise to `n/64=4` sequences to be computed independently. We used
8-node jobs.
The first part of the computation can be done with these scripts:
```
export matrix=/data/experiment/chosen_matrix.bin
export matrix=$DATA/rsa240.matrix11.200.bin
export DATA
export CADO_BUILD
export MPI
......@@ -499,7 +509,7 @@ Cheap sanity checks can be done periodically with the following script,
which does all checks it can do (note that the command is happy if it
finds _no_ check to do as well!)
```
export matrix=/data/experiment/chosen_matrix.bin
export matrix=$DATA/rsa240.matrix11.200.bin
export DATA
export CADO_BUILD
export MPI
......@@ -511,7 +521,7 @@ later steps. After step `5-acollect` below, a file named `A0-256.0-1654784` with
size 27111981056 bytes will be in `$DATA`. Step `6-lingen` below runs on
16 nodes, and completes in slightly less than 10 hours.
```
export matrix=/data/experiment/chosen_matrix.bin
export matrix=$DATA/rsa240.matrix11.200.bin
export DATA
export CADO_BUILD
export MPI
......@@ -521,8 +531,8 @@ export MPI
./rsa240-linalg-8-mksol.sh start=0
./rsa240-linalg-8-mksol.sh start=32768
./rsa240-linalg-8-mksol.sh start=65536
# ... 28 other commands of the same kind (31 in total) ...
./rsa240-linalg-8-mksol.sh start=983040
# ... 31 other commands of the same kind (34 in total) ...
./rsa240-linalg-8-mksol.sh start=1081344
./rsa240-linalg-9-finish.sh
```
......@@ -539,7 +549,7 @@ follows
-rw-r--r-- 1 ethome users 564674048 Nov 20 21:47 F.sols0-64.64-128
```
(the size above is the final size. For a quick test, a size of
`512*64/64*32768=2097152` bytes would be enough.)
`64*64/8*32768=16777216` bytes would be enough.)
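
For example (a sketch; only the file size matters for the bench, and the
file name mirrors the listing above):
```
dd if=/dev/urandom of=$DATA/F.sols0-64.64-128 bs=32768 count=512   # 64*64/8*32768 bytes
```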
After having successfully followed the steps above, a file named
`W.sols0-64` will be in `$DATA`. This file represents a kernel vector.
......
......@@ -3,13 +3,13 @@
A first step of the filtering process in cado-nfs is to create the
so-called "renumber table", as follows.
```
$CADO_BUILD/sieve/freerel -poly rsa240.poly -renumber $DATA/rsa240.renumber -lpb0 36 -lpb1 37 -out $DATA/rsa240.freerel -t 32
$CADO_BUILD/sieve/freerel -poly rsa240.poly -renumber $DATA/rsa240.renumber.gz -lpb0 36 -lpb1 37 -out $DATA/rsa240.freerel -t 32
```
where `-t 32` specifies the number of threads. This was done with revision
`30a5f3eae` of cado-nfs, and takes several hours. (Note that newer
versions of cado-nfs changed the format of this file.)
## duplicate removal
## Duplicate removal
Duplicate removal was done with revision `50ad0f1fd` of cado-nfs.
cado-nfs proceeds through two passes. We used the default cado-nfs
......@@ -23,27 +23,32 @@ add to the stored set of relations.
```
mkdir -p $DATA/dedup/{0..3}
$CADO_BUILD/filter/dup1 -prefix dedup -basepath $DATA -filelist $new_files -out $DATA/dedup/ -n 2 > $DATA/dup1.$EXP.stdout 2> $DATA/dup1.$EXP.stderr
grep '^# slice.*received' $DATA/dup1.$EXP.stderr $DATA/dup1.$EXP.per_slice.txt
grep '^# slice.*received' $DATA/dup1.$EXP.stderr > $DATA/dup1.$EXP.per_slice.txt
```
This first pass takes about 6 hours. Numbers of relations per slice are
printed by the program and must be saved for later use (hence the
`$DATA/dup1.$EXP.per_slice.txt` file).
The second pass of duplicate removal works independently on each of the
non-overlapping slices (the number of slices can thus be used as a sort
of time-memory tradeoff).
non-overlapping slices. The number of slices can thus be used as a sort
of time-memory tradeoff.
```
for i in {0..3} ; do
nrels=`awk '/slice '$i' received/ { x+=$5 } END { print x; }' $DATA/dup1.*.per_slice.txt`
$CADO_BUILD/filter/dup2 -nrels $nrels -renumber $DATA/rsa240.renumber $DATA/dedup/$i/dedup*gz > $DATA/dup2.$EXP.$i.stdout 2> $DATA/dup2.$EXP.$i.stderr
$CADO_BUILD/filter/dup2 -nrels $nrels -renumber $DATA/rsa240.renumber.gz $DATA/dedup/$i/dedup*gz > $DATA/dup2.$EXP.$i.stdout 2> $DATA/dup2.$EXP.$i.stderr
done
```
(Note: in newer versions of cado-nfs, after June 2020, the `dup2`
program also requires the argument `-poly rsa240.poly`.)
## "purge", a.k.a. singleton and "clique" removal.
## The "purge" step, a.k.a. singleton and "clique" removal.
This step was done with revision `50ad0f1fd` of cado-nfs. We assume below
that `$EXP` is consistent with the latest pass of duplicate removal that
was done following the steps above.
```
nrels=$(awk '/remaining/ { x+=$4; } END { print x }' $DATA/dup2.$EXP.[0-3].stderr)
colmax=$(awk '/INFO: size = / { print $5 }' $DATA/dup2.$EXP.0.stderr)
colmax=8460702956
$CADO_BUILD/filter/purge -out purged$EXP.gz -nrels $nrels -keep 160 -col-min-index 0 -col-max-index $colmax -t 56 -required_excess 0.0 $DATA/dedup/*/dedup*gz
```
......