bwc: multi_matrix crashes in prep
Work on the double_matrix branch will eventually require bwc to work with several (here, two) matrices. Currently, running prep with multi_matrix=1 fails as follows:
charles@dell:~/cado-nfs/work$ ../build/dell/linalg/bwc/prep multi_matrix=1 matrix=$PWD/L.bin,$PWD/R.bin wdir=$PWD/bwc mn=64 balancing_options=reorder=columns thr=2x2
[...]
Creating balancing file L.2x2/L.2x2.bin
# Warning: parameter reorder is checked by this program but is undocumented.
Warning. More columns than rows. There could be bugs.
/home/charles/cado-nfs/work/L.bin: 8868 rows 16446 cols (7578 extra cols) weight 37831
Padding to a square matrix
Padding to 8868+7580=16448 rows which is 2 blocks of 2*4112=8224 rows
read /home/charles/cado-nfs/work/L.rw.bin in 0.0 s (2755.2 MB / s)
Padding to 16446+2=16448 columns which is 2 blocks of 2*4112=8224 columns
read /home/charles/cado-nfs/work/L.cw.bin in 0.0 s (5518.4 MB / s)
16446 columns ; avg 2.3 sdev 1.9 [scan time 0.0 s]
sort time 0.0 s
heap fill time 0.0
column slice 0, span=8224, weight=18915
column slice 1, span=8224, weight=18916
Writing balancing data to L.2x2/L.2x2.bin
Creating balancing file R.2x2/R.2x2.bin
# Warning: parameter reorder is checked by this program but is undocumented.
/home/charles/cado-nfs/work/R.bin: 16446 rows 8676 cols (7770 extra rows) weight 201362
Padding to a square matrix
Padding to 16446+2=16448 rows which is 2 blocks of 2*4112=8224 rows
read /home/charles/cado-nfs/work/R.rw.bin in 0.0 s (5016.7 MB / s)
Padding to 8676+7772=16448 columns which is 2 blocks of 2*4112=8224 columns
read /home/charles/cado-nfs/work/R.cw.bin in 0.0 s (4281.2 MB / s)
8676 columns ; avg 23.2 sdev 63.8 [scan time 0.0 s]
sort time 0.0 s
heap fill time 0.0
column slice 0, span=8224, weight=100681
column slice 1, span=8224, weight=100681
Writing balancing data to R.2x2/R.2x2.bin
Now trying to load matrix cache files
J0 dell done reading (result=0)
Matrix dispatching starts
Beginning balancing with 1 readers for file /home/charles/cado-nfs/work/L.bin
Job 0 is reader number 0
Job 0 (reader number 0) reads rows 0 to 8868 and expects 182.42 kB
pass 1, J0 (reader 0/1): 16.00 kB in 0.0s, 151.74 MB/s
[...]
pass 2, J0 (reader 0/1): 182.42 kB in 0.0s, 162.68 MB/s (done)
Matrix: total 8868 rows 16446 cols 37831 coeffs
Now trying to load matrix cache files
J0 dell done reading (result=0)
Matrix dispatching starts
Beginning balancing with 1 readers for file /home/charles/cado-nfs/work/R.bin
Job 0 is reader number 0
Job 0 (reader number 0) reads rows 0 to 16446 and expects 0.83 MB
pass 1, J0 (reader 0/1): 16.02 kB in 0.0s, 270.02 MB/s
[...]
pass 2, J0 (reader 0/1): 0.83 MB in 0.0s, 178.57 MB/s (done)
Matrix: total 16446 rows 8676 cols 201362 coeffs
Matrix rank is at most 16446 (based on zero columns and rows encountered)
// Random generator seeded with 1690702356
// Generating new x,y vector pair (trial # 0)
Creating random vector V0-64.0... done [69.28 kB in 0.00s, 0.80 GB/s]
Loading V0-64.0 ... done [69.28 kB in 0.00s, 310.48 MB/s]
// generated V0-64.0 (trial # 0)
code BUG() : condition w->siblings failed in matmul_top_mul_cpu at /home/charles/cado-nfs/linalg/bwc/matmul_top.c:2250 -- Abort
code BUG() : condition w->siblings failed in matmul_top_mul_cpu at /home/charles/cado-nfs/linalg/bwc/matmul_top.c:2250 -- Abort
code BUG() : condition w->siblings failed in matmul_top_mul_cpu at /home/charles/cado-nfs/linalg/bwc/matmul_top.c:2250 -- Abort
*** Error: caught signal "Aborted"
code BUG() : condition w->siblings failed in matmul_top_mul_cpu at /home/charles/cado-nfs/linalg/bwc/matmul_top.c:2250 -- Abort
*** Error: caught signal "Aborted"
*** Error: caught signal "Aborted"
*** Error: caught signal "Aborted"
======= Backtrace: =========
../build/dell/linalg/bwc/prep(+0x2a0464) [0x563eb19d6464]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bcf0) [0x7f4ae083bcf0]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x11b) [0x7f4ae089226b]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x16) [0x7f4ae083bc46]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd7) [0x7f4ae08227fc]
../build/dell/linalg/bwc/prep(matmul_top_mul_cpu+0x10d) [0x563eb179292d]
../build/dell/linalg/bwc/prep(matmul_top_mul+0x91) [0x563eb1792bc1]
../build/dell/linalg/bwc/prep(_Z9prep_progP20parallelizing_info_sP12param_list_sPv+0x476) [0x563eb1786586]
../build/dell/linalg/bwc/prep(pi_go_helper_func+0x7f) [0x563eb178973f]
/lib/x86_64-linux-gnu/libc.so.6(+0x90402) [0x7f4ae0890402]
/lib/x86_64-linux-gnu/libc.so.6(+0x11f590) [0x7f4ae091f590]
Aborted (core dumped)
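For reference, the padding figures in the log above can be recomputed from the logged shapes. The Python sketch below assumes that the padded size is max(nrows, ncols) rounded up to a multiple of the thread count (thr=2x2, so 4). That rule is a guess that happens to reproduce the numbers printed here; it is not taken from the balancing code. `round_up` and `padding` are hypothetical helpers. The sketch shows that L (8868x16446) and R (16446x8676) both end up padded to the same 16448x16448 square, and that L's column count equals R's row count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Matrix shape as printed by prep (rows x cols)."""
    name: str
    nrows: int
    ncols: int


def round_up(n: int, a: int) -> int:
    """Smallest multiple of a that is >= n."""
    return -(-n // a) * a


def padding(m: Shape, nh: int, nv: int) -> dict:
    """Square padding for a thr=nh x nv run.

    ASSUMPTION: the padded size is max(nrows, ncols) rounded up to a
    multiple of nh*nv. This reproduces the log above but is not taken
    from the cado-nfs source.
    """
    n = round_up(max(m.nrows, m.ncols), nh * nv)
    return {
        "size": n,
        "extra_rows": n - m.nrows,
        "extra_cols": n - m.ncols,
        "per_thread": n // (nh * nv),  # the "4112" in the log
        "block": n // nh,              # the "8224" in the log
    }


L = Shape("L", 8868, 16446)
R = Shape("R", 16446, 8676)
NH, NV = 2, 2  # thr=2x2

for m in (L, R):
    p = padding(m, NH, NV)
    print(f"{m.name}: {m.nrows}+{p['extra_rows']}={p['size']} rows, "
          f"{m.ncols}+{p['extra_cols']}={p['size']} cols, "
          f"{NH} blocks of {NV}*{p['per_thread']}={p['block']}")

# Whether bwc chains the matrices as L*R is not confirmed here; this only
# checks that the logged shapes allow that product (inner dimension 16446).
print("L*R inner dimensions agree:", L.ncols == R.nrows)
```

The printed lines match the "Padding to ..." messages for L and R in the log.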
The matrices are attached: R.rw.bin, R.dense.rw.bin, R.dense.cw.bin, R.dense.bin, R.cw.bin, R.bin, L.rw.bin, L.cw.bin, L.bin.
Steps to reproduce the matrices:
- Check out the `double_matrix` branch, compile, and run `./cado-nfs.py 90377629292003121684002147101760858109247336549001090677693 workdir=$PWD/foo`
- Run ``build/`hostname`/filter/replay-dblemat -purged $PWD/foo/c60.purged.gz -his $PWD/foo/c60.history.gz -outL $PWD/foo/L.bin -outR $PWD/foo/R.bin`` (this step is why the `double_matrix` branch is required)