Commit 78eb0fdc authored by Emmanuel Thomé

more rsa250-linalg stuff

parent 81cebb32
@@ -9,7 +9,7 @@ Several chapters are covered.
* [Searching for a polynomial pair](#searching-for-a-polynomial-pair)
* [Estimating the number of (unique) relations](#estimating-the-number-of-unique-relations)
* [Estimating the cost of sieving](#estimating-the-cost-of-sieving)
* [Estimating the linear algebra time (coarsely)](#estimating-the-linear-algebra-time-coarsely)
* [Validating the claimed sieving results](#validating-the-claimed-sieving-results)
* [Reproducing the filtering results](#reproducing-the-filtering-results)
* [Duplicate removal](#duplicate-removal)
@@ -17,7 +17,7 @@ Several chapters are covered.
* [The "merge" step](#the-merge-step)
* [The "replay" step](#the-replay-step)
* [Computing the right-hand side](#computing-the-right-hand-side)
* [Estimating the linear algebra time more precisely, and choosing parameters](#estimating-the-linear-algebra-time-more-precisely-and-choosing-parameters)
* [Reproducing the linear algebra results](#reproducing-the-linear-algebra-results)
* [Back-substituting the linear algebra result in collected relations](#back-substituting-the-linear-algebra-result-in-collected-relations)
* [Reproducing the individual logarithm result](#reproducing-the-individual-logarithm-result)
@@ -143,7 +143,9 @@ $CADO_BUILD/sieve/las -poly dlp240.poly -fb0 $DATA/dlp240.fb0.gz -fb1 $DATA/dlp2
In less than half an hour on our target machine `grvingt`, this gives
the estimate that 0.61 unique relations per special-q can be expected
based on these parameters. (Note that this test can also be done in
parallel over several nodes, using the `-seed [[seed value]]` argument in
order to vary the random picks.)
In order to deduce an estimate of the total number of (de-duplicated)
relations, it remains to multiply the average number of relations per
@@ -220,7 +222,7 @@ print (cost_in_core_hours, cost_in_core_years)
With this experiment, we get 20.9 core.sec per special-q, and therefore
we obtain about 2430 core.years for the total sieving time.
## Estimating the linear algebra time (coarsely)
Linear algebra works with MPI. For this section, as well as all linear
algebra-related sections, we assume that you built cado-nfs with MPI
@@ -283,11 +285,10 @@ parameter settings to choose from, and that this computation was doable.
## Validating the claimed sieving results
The benchmark command lines above can be used almost as is to reproduce
the full computation. It is just necessary to remove the `-random-sample`
option and to adjust the `-q0` and `-q1` parameters in order to create
many small work units that in the end cover exactly the global q-range.
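For illustration, here is a minimal sketch of such a split (the q-range
bounds, the step, and the elided `las` options are placeholders, not the
values used in the computation):
```shell
# Hypothetical work-unit generator: cover [q0_global,q1_global) with
# consecutive sub-ranges of width $step (all three values below are
# placeholders).
q0_global=150000000 q1_global=300000000 step=10000000
for ((q0=q0_global; q0<q1_global; q0+=step)) ; do
    echo "$CADO_BUILD/sieve/las -poly dlp240.poly ... -q0 $q0 -q1 $((q0+step))"
done
```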
Since we do not expect anyone to spend as many computing resources to
perform exactly the same computation again, we provide in the
@@ -325,7 +326,7 @@ The filtering follows roughly the same general workflow as in the
$CADO_BUILD/numbertheory/badideals -poly dlp240.poly -ell 62310183390859392032917522304053295217410187325839402877409394441644833400594105427518019785136254373754932384219229310527432768985126965285945608842159143181423474202650807208215234033437849707623496592852091515256274797185686079514642651 -badidealinfo $DATA/dlp240.badidealinfo -badideals $DATA/dlp240.badideals
$CADO_BUILD/sieve/freerel -poly dlp240.poly -renumber $DATA/dlp240.renumber.gz -lpb0 35 -lpb1 35 -out $DATA/dlp240.freerel.gz -badideals $DATA/dlp240.badideals -lcideals -t 32
```
- the command line flags `-dl -badidealinfo $DATA/dlp240.badidealinfo` must be added to the `dup2` program.
- the `merge` and `replay` programs must be replaced by `merge-dl` and
`replay-dl`, respectively
- the `replay-dl` command line lists an extra output file
@@ -397,7 +398,7 @@ $CADO_BUILD/filter/sm -poly dlp240.poly -purged $DATA/purged$EXP.gz -index $DATA
```
This took about four hours on the machine wurst.
## Estimating the linear algebra time more precisely, and choosing parameters
The filtering output is controlled by a wealth of tunable parameters.
However, at a very coarse-grained level, we focus on two of them:
......
@@ -8,10 +8,10 @@ Several chapters are covered.
* [Searching for a polynomial pair](#searching-for-a-polynomial-pair)
* [Estimating the number of (unique) relations](#estimating-the-number-of-unique-relations)
* [Estimating the cost of sieving](#estimating-the-cost-of-sieving)
* [Estimating the linear algebra time (coarsely)](#estimating-the-linear-algebra-time-coarsely)
* [Validating the claimed sieving results](#validating-the-claimed-sieving-results)
* [Reproducing the filtering results](#reproducing-the-filtering-results)
* [Estimating the linear algebra time more precisely, and choosing parameters](#estimating-the-linear-algebra-time-more-precisely-and-choosing-parameters)
* [Reproducing the linear algebra results](#reproducing-the-linear-algebra-results)
## Software prerequisites, and reference hardware configuration
@@ -150,7 +150,9 @@ $CADO_BUILD/sieve/las -poly rsa240.poly -fb1 $DATA/rsa240.fb1.gz -lim0 180000000
In slightly more than an hour on our target machine `grvingt`, this gives
the estimate that 19.6 unique relations per special-q can be expected
based on these parameters. (Note that this test can also be done in
parallel over several nodes, using the `-seed [[seed value]]` argument in
order to vary the random picks.)
In order to deduce an estimate of the total number of (de-duplicated)
relations, it remains to multiply the average number of relations per
@@ -166,6 +168,7 @@ ave_rel_per_sq = 19.6 ## pick value output by las
number_of_sq = log_integral(7.4e9) - log_integral(8e8)
tot_rels = ave_rel_per_sq * number_of_sq
print (tot_rels)
# 5.88556387364565e9
```
This estimate (5.9G relations) can be made more precise by increasing the
number of special-q that are sampled for sieving. It is also possible to
@@ -202,7 +205,7 @@ written (or pass `-nq 0`).
#### Cost of 2-sided sieving in the q-range [8e8,2.1e9]
In order to measure the cost of sieving in the special-q subrange where
sieving is used on both sides, the typical command line is as follows:
```shell
time $CADO_BUILD/sieve/las -poly rsa240.poly -fb1 $DATA/rsa240.fb1.gz -lim0 1800000000 -lim1 2100000000 -lpb0 36 -lpb1 37 -q0 8e8 -q1 2.1e9 -sqside 1 -A 32 -mfb0 72 -mfb1 111 -lambda0 2.2 -lambda1 3.2 -random-sample 1024 -t auto -bkmult 1,1l:1.15,1s:1.4,2s:1.1 -v -bkthresh1 90000000 -adjust-strategy 2 -fbc /tmp/rsa240.fbc
@@ -265,7 +268,7 @@ shell script given in this repository. This launches:
files as they are produced, do the batch smoothness detection, and
produce relations.
The script takes two command line arguments `-q0 xxx` and `-q1 xxx`,
which describe the range of special-q to process. Temporary files are put
in the `/tmp` directory by default.
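For instance (the script name below is illustrative; use the script
shipped in this repository, and pick the bounds of the work unit you
want to process):
```shell
./rsa240-sieve-batch.sh -q0 2100000000 -q1 2110000000
```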
@@ -320,7 +323,7 @@ print (cost_in_core_hours, cost_in_core_years)
With this experiment, we get 67.4 core.sec per special-q, and therefore
we obtain about 510 core.years for this sub-range.
## Estimating the linear algebra time (coarsely)
Linear algebra works with MPI. For this section, as well as all linear
algebra-related sections, we assume that you built cado-nfs with MPI
@@ -387,10 +390,10 @@ going to be minor anyway.
## Validating the claimed sieving results
The benchmark command lines above can be used almost as is to reproduce
the full computation. It is just necessary to remove the `-random-sample`
option and to adjust the `-q0` and `-q1` parameters in order to create
many small work units that in the end cover exactly the global q-range.
Since we do not expect anyone to spend as many computing resources to
perform exactly the same computation again, we provide in the
@@ -426,7 +429,7 @@ The filtering step in cado-nfs proceeds through several steps.
The file [`filtering.md`](filtering.md) in this repository gives more
information on these steps.
## Estimating the linear algebra time more precisely, and choosing parameters
The filtering output is controlled by a wealth of tunable parameters.
However, at a very coarse-grained level, we focus on two of them:
......
@@ -100,9 +100,11 @@ matter.
$CADO_BUILD/sieve/las -poly rsa250.poly -fb1 $DATA/rsa250.fb1.gz -lim0 2147483647 -lim1 2147483647 -lpb0 36 -lpb1 37 -q0 1e9 -q1 12e9 -dup -dup-qmin 0,1000000000 -sqside 1 -A 33 -mfb0 72 -mfb1 111 -lambda0 2.2 -lambda1 3.2 -random-sample 1024 -t auto -bkmult 1,1l:1.25975,1s:1.5,2s:1.1 -v -bkthresh1 80000000 -adjust-strategy 2 -fbc /tmp/rsa250.fbc -hint-table rsa250.hint
```
In approximately three hours on our target machine `grvingt`, this gives
the estimate that 12.7 unique relations per special-q can be expected
based on these parameters. (Note that this test can also be done in
parallel over several nodes, using the `-seed [[seed value]]` argument in
order to vary the random picks.)
In order to deduce an estimate of the total number of (de-duplicated)
relations, it remains to multiply the average number of relations per
@@ -118,7 +120,7 @@ ave_rel_per_sq = 12.7 ## pick value output by las
number_of_sq = log_integral(12e9) - log_integral(1e9)
tot_rels = ave_rel_per_sq * number_of_sq
print (tot_rels)
# 6.23205306433878e9
```
This estimate (6.2G relations) can be made more precise by increasing the
number of special-q that are sampled for sieving. It is also possible to
@@ -155,7 +157,7 @@ written (or pass `-nq 0`).
#### Cost of 2-sided sieving in the q-range [1e9,4e9]
In order to measure the cost of sieving in the special-q subrange where
sieving is used on both sides, the typical command line is as follows:
```shell
time $CADO_BUILD/sieve/las -poly rsa250.poly -fb1 $DATA/rsa250.fb1.gz -lim0 2147483647 -lim1 2147483647 -lpb0 36 -lpb1 37 -q0 1e9 -q1 4e9 -sqside 1 -A 33 -mfb0 72 -mfb1 111 -lambda0 2.2 -lambda1 3.2 -random-sample 1024 -t auto -bkmult 1,1l:1.25975,1s:1.5,2s:1.1 -v -bkthresh1 80000000 -adjust-strategy 2 -fbc /tmp/rsa250.fbc
@@ -182,7 +184,7 @@ sys 70m15.469s
Then the `128m8.106=7688.1s` value must be appropriately scaled in order
to convert it into physical core-seconds. For instance, in our case,
since there are 32 physical cores and we sieved 1024 special-qs, this
gives `(128*60+8.1)*32/1024=240.25` core.seconds per special-q.
Finally, it remains to multiply by the number of special-q in this
subrange. We get (in Sagemath):
@@ -219,7 +221,7 @@ shell script given in this repository. This launches:
files as they are produced, do the batch smoothness detection, and
produce relations.
The script takes two command line arguments `-q0 xxx` and `-q1 xxx`,
which describe the range of special-q to process. Temporary files are put
in the `/tmp` directory by default.
@@ -276,22 +278,39 @@ we obtain about 1300 core.years for this sub-range.
## Estimating the linear algebra time (coarsely)
The general description of the [RSA-240
case](../rsa240/README.md#estimating-the-linear-algebra-time-coarsely)
applies identically to RSA-250.
We reproduce here the command lines, adapted to the RSA-250 case. To
estimate the time per iteration of a matrix with (say) 400M
rows/columns and density 250, it is possible to do the following.
```shell
nrows=400000000 density=250 nthreads=32 ./rsa250-linalg-0a-estimate_linalg_time_coarse_method_b.sh
```
This reports about **TODO** seconds per iteration, yielding an
anticipated cost of
`(1+n/m+64/n)*(N/64)*`**TODO**`*16*32/3600/24/365=`**TODO** core-years
for Krylov+Mksol.
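As a sketch of how this formula can be evaluated (in Sagemath or plain
Python; `N` and `t` are placeholders standing in for the matrix dimension
and for the TODO per-iteration time above, and the `16*32` factor
presumably accounts for 16-node jobs on the 32-core nodes):
```python
m, n = 1024, 512     # block Wiedemann parameters (chosen further down)
N = 400000000        # matrix dimension used in this coarse estimate
t = 1.5              # PLACEHOLDER for the measured seconds per iteration
iters = (1 + n/m + 64/n) * (N / 64)    # total Krylov+Mksol iterations
core_years = iters * t * 16 * 32 / 3600 / 24 / 365
print(core_years)
```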
## Validating the claimed sieving results
The benchmark command lines above can be used almost as is to reproduce
the full computation. It is just necessary to remove the `-random-sample`
option and to adjust the `-q0` and `-q1` parameters in order to create
many small work units that in the end cover exactly the global q-range.
Since we do not expect anyone to spend as many computing resources to
perform exactly the same computation again, we provide in the
[`rsa250-rel_count`](rsa250-rel_count) file the count of how many (non-unique)
relations were produced for each 100M special-q sub-range.
We can then have a visual plot of this data, as shown in
[`rsa250-plot_rel_count.pdf`](rsa250-plot_rel_count.pdf) where we see the
drop in the number of relations produced per special-q when changing the
strategy. The plot is very regular on the two sub-ranges.
In order to validate our computation, it is possible to re-compute only
one of the sub-ranges and check that the number of relations is the one
@@ -314,19 +333,72 @@ peak memory.
The merge step can be reproduced as follows (revision `eaeb2053d`):
```shell
$CADO_BUILD/filter/merge -out $DATA/history5 -t 112 -target_density 250 -mat $DATA/purged3.gz -skip 32
```
and took about 3 hours on the machine wurst, with a peak memory of 1500GB.
Finally the replay step can be reproduced as follows:
```shell
$CADO_BUILD/filter/replay -purged $DATA/purged3.gz -his $DATA/history5 -out $DATA/rsa250.matrix.250.bin
```
The matrix that we eventually selected has 404711409 rows, and an average
density of 250 coefficients per row.
## Estimating the linear algebra time more precisely, and choosing parameters
Again, the general description of the [RSA-240
case](../rsa240/README.md#estimating-the-linear-algebra-time-more-precisely-and-choosing-parameters)
applies identically to RSA-250.
In order to bench the time per iteration for the selected matrix, we can
use the following script.
```shell
export matrix=$DATA/rsa250.matrix.250.bin
export DATA
export CADO_BUILD
export MPI
./rsa250-linalg-0b-test-few-iterations.sh
```
This reports an average of 1.34 seconds per iteration.
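Plugging this measurement into the cost formula from the coarse
estimation section (a sketch, under the same assumption on the `16*32`
core scaling) gives:
```python
m, n = 1024, 512      # block Wiedemann parameters (see next section)
N = 404711409         # dimension of the selected matrix
t = 1.34              # measured seconds per iteration
iters = (1 + n/m + 64/n) * (N / 64)
print(iters * t * 16 * 32 / 3600 / 24 / 365)   # about 224 core-years
```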
## Reproducing the linear algebra results
We decided to use the block Wiedemann parameters `m=1024` and `n=512`,
giving rise to `n/64=8` sequences to be computed independently. We used
16-node jobs.
The first part of the computation can be done with these scripts:
```shell
export matrix=$DATA/rsa250.matrix.250.bin
export DATA
export CADO_BUILD
export MPI
./rsa250-linalg-1-prep.sh
./rsa250-linalg-2-secure.sh
./rsa250-linalg-3-krylov.sh sequence=0 start=0
# [...] 6 more
./rsa250-linalg-3-krylov.sh sequence=7 start=0
```
And the rest:
```shell
export matrix=$DATA/rsa250.matrix.250.bin
export DATA
export CADO_BUILD
export MPI
./rsa250-linalg-5-acollect.sh
./rsa250-linalg-6-lingen.sh
# there is no step 7, for consistency with the discrete log case.
./rsa250-linalg-8-mksol.sh start=0
./rsa250-linalg-8-mksol.sh start=32768
./rsa250-linalg-8-mksol.sh start=65536
# ... 21 other commands of the same kind (25 in total) ...
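# (for the record: mksol needs about N/n = 404711409/512 ~ 790452
# iterations in total, and chunks of 32768 iterations -- the
# checkpoint_precious value used in the secure step -- indeed give
# ceil(790452/32768) = 25 jobs)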
./rsa250-linalg-8-mksol.sh start=786432
./rsa250-linalg-9-finish.sh
```
## Reproducing the characters step
Let **W** be the kernel vector computed by the linear algebra step.
......
#!/bin/bash
# This script is buggy, as of commit 8a72ccdde in cado-nfs.
# There are two problems.
# * The locally generated random matrices do not fit the claimed
# distribution.
#
# For a matrix with 300M rows/cols and expected density 200, generated
# over 8*64 threads, it seems that the "density" parameter must be
# multiplied by the ratio 5/8=0.625, i.e. set to 125, in order to obtain
# a matrix with the desired characteristics.
#
# * Furthermore, the timings of the communication step seem to be
# somewhat off compared to what we get in production runs (production
# runs achieve better performance), and we have no explanation for it.
#
for x in "$@" ; do
if [[ $x =~ ^[0-9a-zA-Z_/.-]+=[0-9a-zA-Z_/.-]+$ ]] ; then
eval $x
fi
done
for f in CADO_BUILD MPI ; do
if ! [ "${!f}" ] ; then
echo "\$$f must be defined" >&2
exit 1
fi
done
for f in krylov ; do
if ! [ -x "$CADO_BUILD/linalg/bwc/$f" ] ; then
echo "missing binary $CADO_BUILD/linalg/bwc/$f ; compile it first" >&2
exit 1
fi
done
set -ex
: ${nrows=400000000}
: ${density=250}
export DISPLAY= # safety precaution
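# MPI launcher: 16 processes, one per node, using the OAR node file and
# oarsh, over Omni-Path (ofi/psm2); the openib btl is explicitly
# disabled.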
mpi=(
$MPI/bin/mpiexec -n 16
-machinefile $OAR_NODEFILE
--map-by node
--mca plm_rsh_agent oarsh
--mca mtl ofi
--mca mtl_ofi_prov psm2
--mca btl '^openib'
)
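# krylov bench arguments: a 4x4 process grid with 4x16=64 threads per
# process, on a random matrix generated on the fly; m=128 n=64 are
# small bench parameters, and 128 iterations are timed (start=0,
# end=128).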
kargs=(
m=128 n=64 prime=2
wdir=/tmp
mpi=4x4 thr=4x16
sequential_cache_build=16
ys=0..64
random_matrix=$nrows,density=$density
cpubinding="Package=>2x1 L2Cache*16=>2x16"
start=0 interval=128 end=128
)
"${mpi[@]}" $CADO_BUILD/linalg/bwc/krylov "${kargs[@]}"
#!/bin/bash
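# Time a few krylov iterations on a pre-generated random test matrix:
# generate $DATA/test.bin once if it is missing, build the balancing
# caches with dispatch, then run 128 timed iterations.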
for x in "$@" ; do
if [[ $x =~ ^[0-9a-zA-Z_/.-]+=[0-9a-zA-Z_/.-]+$ ]] ; then
eval $x
fi
done
for f in CADO_BUILD DATA MPI ; do
if ! [ "${!f}" ] ; then
echo "\$$f must be defined" >&2
exit 1
fi
done
for f in random_matrix mf_scan2 dispatch krylov ; do
if ! [ -x "$CADO_BUILD/linalg/bwc/$f" ] ; then
echo "missing binary $CADO_BUILD/linalg/bwc/$f ; compile it first" >&2
exit 1
fi
done
: ${nrows=400000000}
: ${density=250}
export nthreads=32 matrixname="$DATA/test.bin"
export matrix="$matrixname"
if ! [ -f $matrixname ] ; then
export nrows
export density
"`dirname $0`"/generate_random_matrix.sh
fi
set -ex
export DISPLAY= # safety precaution
mpi=(
$MPI/bin/mpiexec -n 16
-machinefile $OAR_NODEFILE
--map-by node
--mca plm_rsh_agent oarsh
--mca mtl ofi
--mca mtl_ofi_prov psm2
--mca btl '^openib'
)
cargs=(
m=128 n=64 prime=2
mpi=4x4 thr=4x16
sequential_cache_build=16
sequential_cache_read=2
ys=0..64
wdir=$DATA
static_random_matrix="$DATA/test.bin"
cpubinding="Package=>2x1 L2Cache*16=>2x16"
)
cbase=$(basename $matrix .bin).16x64
cdir=$DATA/$cbase
cache_files=()
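# the 16x64 balancing yields 16*64=1024 per-thread cache files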
if [ -d "$cdir" ] ; then cache_files=(`find $cdir/ -name $cbase.*.bin`) ; fi
if [ "${#cache_files[@]}" != 1024 ] ; then
# regenerate caches. this tends to have a negative effect on the
# runtime, so we'll do _just_ that at first, and restart the full
# process afterwards.
"${mpi[@]}" $CADO_BUILD/linalg/bwc/dispatch "${cargs[@]}"
fi
"${mpi[@]}" $CADO_BUILD/linalg/bwc/krylov "${cargs[@]}" start=0 interval=128 end=128
#!/bin/bash
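# Time a few (128) krylov iterations on the real matrix $matrix,
# rebuilding the balancing caches with dispatch first if they are
# missing.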
for x in "$@" ; do
if [[ $x =~ ^[0-9a-zA-Z_/.-]+=[0-9a-zA-Z_/.-]+$ ]] ; then
eval $x
fi
done
for f in CADO_BUILD DATA MPI matrix ; do
if ! [ "${!f}" ] ; then
echo "\$$f must be defined" >&2
exit 1
fi
done
for f in dispatch krylov ; do
if ! [ -x "$CADO_BUILD/linalg/bwc/$f" ] ; then
echo "missing binary $CADO_BUILD/linalg/bwc/$f ; compile it first" >&2
exit 1
fi
done
set -ex
export DISPLAY= # safety precaution
mpi=(
$MPI/bin/mpiexec -n 16
-machinefile $OAR_NODEFILE
--map-by node
--mca plm_rsh_agent oarsh
--mca mtl ofi
--mca mtl_ofi_prov psm2
--mca btl '^openib'
)
cargs=(
m=128 n=64 prime=2
mpi=4x4 thr=4x16
sequential_cache_build=16
sequential_cache_read=2
wdir=$DATA
static_random_matrix="$matrix"
cpubinding="Package=>2x1 L2Cache*16=>2x16"
)
cbase=$(basename $matrix .bin).16x64
cdir=$DATA/$cbase
cache_files=()
if [ -d "$cdir" ] ; then cache_files=(`find $cdir/ -name $cbase.*.bin`) ; fi
if [ "${#cache_files[@]}" != 1024 ] ; then
# regenerate caches. this tends to have a negative effect on the
# runtime, so we'll do _just_ that at first, and restart the full
# process afterwards.
for f in "${matrix%bin}"{rw,cw}".bin" ; do
if ! (cd "$DATA" ; [ -f "$f" ]) ; then echo "$f must exist" >&2 ; exit 1 ; fi
done
"${mpi[@]}" $CADO_BUILD/linalg/bwc/dispatch "${cargs[@]}" ys=0..64
fi
"${mpi[@]}" $CADO_BUILD/linalg/bwc/krylov "${cargs[@]}" ys=0..64 start=0 interval=128 end=128
#!/bin/bash
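# Block Wiedemann "prep" step (m=1024, n=512) on $matrix: compute the
# starting vectors, rebuilding the balancing caches first if needed.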
for x in "$@" ; do
if [[ $x =~ ^[0-9a-zA-Z_/.-]+=[0-9a-zA-Z_/.-]+$ ]] ; then
eval $x
fi
done
for f in CADO_BUILD matrix MPI DATA ; do
if ! [ "${!f}" ] ; then
echo "\$$f must be defined" >&2
exit 1
fi
done
for f in prep ; do
if ! [ -x "$CADO_BUILD/linalg/bwc/$f" ] ; then
echo "missing binary $CADO_BUILD/linalg/bwc/$f ; compile it first" >&2
exit 1
fi
done
set -ex
export DISPLAY= # safety precaution
mpi=(
$MPI/bin/mpiexec -n 16
-machinefile $OAR_NODEFILE
--map-by node
--mca plm_rsh_agent oarsh
--mca mtl ofi
--mca mtl_ofi_prov psm2
--mca btl '^openib'
)
cargs=(
m=1024 n=512 prime=2
mpi=4x4 thr=4x16
sequential_cache_build=16
sequential_cache_read=2
wdir=$DATA
matrix="$matrix"
cpubinding="Package=>2x1 L2Cache*16=>2x16"
)
cbase=$(basename $matrix .bin).16x64
cdir=$DATA/$cbase
cache_files=()
if [ -d "$cdir" ] ; then cache_files=(`find $cdir/ -name $cbase.*.bin`) ; fi
if [ "${#cache_files[@]}" != 1024 ] ; then
# regenerate caches. this tends to have a negative effect on the
# runtime, so we'll do _just_ that at first, and restart the full
# process afterwards.
for f in "${matrix%bin}"{rw,cw}".bin" ; do
if ! (cd "$DATA" ; [ -f "$f" ]) ; then echo "$f must exist" >&2 ; exit 1 ; fi
done
"${mpi[@]}" $CADO_BUILD/linalg/bwc/dispatch "${cargs[@]}" ys=0..64
fi
"${mpi[@]}" $CADO_BUILD/linalg/bwc/prep "${cargs[@]}"
#!/bin/bash
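# Block Wiedemann "secure" step: precompute the check data used to
# verify the krylov and mksol computations at the distances given by
# check_stops below.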
for x in "$@" ; do
if [[ $x =~ ^[0-9a-zA-Z_/.-]+=[0-9a-zA-Z_/.-]+$ ]] ; then
eval $x
fi
done
for f in CADO_BUILD matrix MPI DATA ; do
if ! [ "${!f}" ] ; then
echo "\$$f must be defined" >&2
exit 1
fi
done
for f in secure ; do
if ! [ -x "$CADO_BUILD/linalg/bwc/$f" ] ; then
echo "missing binary $CADO_BUILD/linalg/bwc/$f ; compile it first" >&2
exit 1
fi
done
set -ex
export DISPLAY= # safety precaution
mpi=(
$MPI/bin/mpiexec -n 16
-machinefile $OAR_NODEFILE
--map-by node
--mca plm_rsh_agent oarsh
--mca mtl ofi
--mca mtl_ofi_prov psm2
--mca btl '^openib'
)
cargs=(
m=1024 n=512 prime=2
mpi=4x4 thr=4x16
sequential_cache_build=16
sequential_cache_read=2
wdir=$DATA
matrix="$matrix"
cpubinding="Package=>2x1 L2Cache*16=>2x16"
)
# check at various distances. Also include checks that are shifted by a
# constant number of iterations (64 here)
sargs=(
interval=2048
checkpoint_precious=32768
check_stops=64,2048,2112,8192,32768,32832
)
cbase=$(basename $matrix .bin).16x64
cdir=$DATA/$cbase