Wrong computations with GPUs in testing spotrf
Hello,
I tried to use a GPU on dalton's william0. With the GPU enabled, some runs randomly report -1 as the gflops result. As far as I can tell, this means something went wrong during the operation, but what?
Output with one GPU:
% STARPU_SCHED=dmda STARPU_NCPU=14 ~/chameleon/build/testing/chameleon_stesting -o potrf -n 4800:50000:3200 -H --gpus 1 130
Id Function threads gpus P Q nb uplo n lda seedA time gflops
0 spotrf 14 1 1 1 320 Upper 4800 4800 846930886 4.656610e-02 7.918963e+02
1 spotrf 14 1 1 1 320 Upper 8000 8000 1681692777 1.617732e-01 1.055173e+03
2 spotrf 14 1 1 1 320 Upper 11200 11200 1714636915 3.946546e-01 1.186790e+03
3 spotrf 14 1 1 1 320 Upper 14400 14400 1957747793 7.997773e-01 1.244636e+03
4 spotrf 14 1 1 1 320 Upper 17600 17600 424238335 1.437493e+00 1.264293e+03
5 spotrf 14 1 1 1 320 Upper 20800 20800 719885386 2.356611e+00 1.272952e+03
6 spotrf 14 1 1 1 320 Upper 24000 24000 1649760492 3.603367e+00 1.278884e+03
7 spotrf 14 1 1 1 320 Upper 27200 27200 596516649 5.235331e+00 -1.000000e+00
8 spotrf 14 1 1 1 320 Upper 30400 30400 1189641421 7.324176e+00 -1.000000e+00
9 spotrf 14 1 1 1 320 Upper 33600 33600 1025202362 9.892633e+00 1.278215e+03
10 spotrf 14 1 1 1 320 Upper 36800 36800 1350490027 1.295008e+01 1.282825e+03
11 spotrf 14 1 1 1 320 Upper 40000 40000 783368690 1.668555e+01 1.278599e+03
12 spotrf 14 1 1 1 320 Upper 43200 43200 1102520059 2.103423e+01 -1.000000e+00
Output without GPU:
% STARPU_SCHED=dmda STARPU_NCPU=14 ~/chameleon/build/testing/chameleon_stesting -o potrf -n 4800:50000:3200 -H
Id Function threads gpus P Q nb uplo n lda seedA time gflops
0 spotrf 14 0 1 1 320 Upper 4800 4800 846930886 1.087851e-01 3.389759e+02
1 spotrf 14 0 1 1 320 Upper 8000 8000 1681692777 4.046456e-01 4.218474e+02
2 spotrf 14 0 1 1 320 Upper 11200 11200 1714636915 1.068933e+00 4.381679e+02
3 spotrf 14 0 1 1 320 Upper 14400 14400 1957747793 2.255813e+00 4.412740e+02
4 spotrf 14 0 1 1 320 Upper 17600 17600 424238335 4.075885e+00 4.458942e+02
5 spotrf 14 0 1 1 320 Upper 20800 20800 719885386 6.708160e+00 4.471947e+02
6 spotrf 14 0 1 1 320 Upper 24000 24000 1649760492 1.024576e+01 4.497753e+02
7 spotrf 14 0 1 1 320 Upper 27200 27200 596516649 1.493925e+01 4.490356e+02
8 spotrf 14 0 1 1 320 Upper 30400 30400 1189641421 2.079964e+01 4.502617e+02
9 spotrf 14 0 1 1 320 Upper 33600 33600 1025202362 2.808611e+01 4.502196e+02
10 spotrf 14 0 1 1 320 Upper 36800 36800 1350490027 3.714362e+01 4.472555e+02
11 spotrf 14 0 1 1 320 Upper 40000 40000 783368690 4.779727e+01 4.463462e+02
12 spotrf 14 0 1 1 320 Upper 43200 43200 1102520059 5.993126e+01 4.484269e+02
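To pick out the failing sizes from longer runs, here is a minimal Python sketch that scans output like the above and lists the rows whose gflops value is negative. It is not part of Chameleon; `failed_runs` is a hypothetical helper, and it assumes the 13 whitespace-separated columns shown in the header (Id first, n ninth, gflops last).

```python
def failed_runs(output):
    """Return (id, n) for each result row whose gflops column is negative."""
    failed = []
    for line in output.splitlines():
        fields = line.split()
        # Result rows have 13 columns and start with an integer Id;
        # this skips the header line and any unrelated text.
        if len(fields) != 13 or not fields[0].isdigit():
            continue
        try:
            n = int(fields[8])          # column "n"
            gflops = float(fields[12])  # column "gflops"
        except ValueError:
            continue
        if gflops < 0:
            failed.append((int(fields[0]), n))
    return failed


sample = """\
Id Function threads gpus P Q nb uplo n lda seedA time gflops
6 spotrf 14 1 1 1 320 Upper 24000 24000 1649760492 3.603367e+00 1.278884e+03
7 spotrf 14 1 1 1 320 Upper 27200 27200 596516649 5.235331e+00 -1.000000e+00
"""
print(failed_runs(sample))  # [(7, 27200)]
```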
I used GDB to get the value of hres when the result is wrong:
Id Function threads gpus P Q nb uplo n lda seedA time gflops
0 spotrf 14 1 1 1 320 Upper 4800 4800 846930886 3.694001e-02 9.982542e+02
1 spotrf 14 1 1 1 320 Upper 8000 8000 1681692777 1.220934e-01 1.398099e+03
2 spotrf 14 1 1 1 320 Upper 11200 11200 1714636915 2.988072e-01 1.567472e+03
3 spotrf 14 1 1 1 320 Upper 14400 14400 1957747793 6.008164e-01 1.656798e+03
4 spotrf 14 1 1 1 320 Upper 17600 17600 424238335 1.091363e+00 1.665270e+03
5 spotrf 14 1 1 1 320 Upper 20800 20800 719885386 7.253219e+00 4.135893e+02
6 spotrf 14 1 1 1 320 Upper 24000 24000 1649760492 6.708895e+00 6.868922e+02
7 spotrf 14 1 1 1 320 Upper 27200 27200 596516649 8.508491e+00 7.884186e+02
Thread 1 "chameleon_stest" hit Breakpoint 2, CHAMELEON_spotrf_Tile (uplo=ChamUpper, A=0x55555702ae70) at /home/pswartva/chameleon/build/compute/spotrf.c:211
211 chameleon_sequence_destroy( chamctxt, sequence );
(gdb) p status
$4 = 23041
(gdb) c
Continuing.
8 spotrf 14 1 1 1 320 Upper 30400 30400 1189641421 6.025625e+01 -1.000000e+00
^C
Thread 1 "chameleon_stest" received signal SIGINT, Interrupt.
The documentation of CHAMELEON_zpotrf_Tile states:
* @retval >0 if i, the leading minor of order i of A is not positive definite, so the
* factorization could not be completed, and the solution has not been computed.
But the matrix is generated to be positive definite, isn't it? So is there a bug somewhere? It seems to occur only when using GPUs.
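To make the return-value convention concrete, here is a small pure-Python sketch (not Chameleon code). `cholesky_info` is a hypothetical unblocked upper Cholesky that returns 0 on success or the order i of the first leading minor that is not positive definite. `make_spd` builds a symmetric, strictly diagonally dominant matrix, which is positive definite; that Chameleon's generator works the same way is an assumption. `locate_tile` maps a status value to a zero-based diagonal tile index, assuming status follows the convention quoted above and the tile size is nb.

```python
import math
import random


def cholesky_info(a):
    """Upper Cholesky (A = U^T U) on a copy of a; returns a potrf-style info code."""
    n = len(a)
    u = [row[:] for row in a]
    for j in range(n):
        d = u[j][j] - sum(u[k][j] ** 2 for k in range(j))
        if d <= 0.0 or math.isnan(d):
            return j + 1  # leading minor of order j+1 is not positive definite
        u[j][j] = math.sqrt(d)
        for c in range(j + 1, n):
            s = sum(u[k][j] * u[k][c] for k in range(j))
            u[j][c] = (u[j][c] - s) / u[j][j]
    return 0


def make_spd(n, seed):
    """Symmetric matrix with entries in [-0.5, 0.5] plus n on the diagonal (assumed scheme)."""
    rng = random.Random(seed)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.uniform(-0.5, 0.5)
        a[i][i] += n  # diagonal bump makes the matrix strictly diagonally dominant
    return a


def locate_tile(info, nb):
    """Map a 1-based info value to (zero-based diagonal tile index, row offset in that tile)."""
    k = info - 1
    return k // nb, k % nb


a = make_spd(8, seed=1)
print(cholesky_info(a))         # 0: the factorization succeeds
a[5][5] = -1.0                  # simulate one corrupted diagonal entry
print(cholesky_info(a))         # 6: first failing leading minor
print(locate_tile(23041, 320))  # (72, 0)
```

Under these assumptions, the status 23041 from the GDB session with nb = 320 corresponds to the very first row of diagonal tile 72 (zero-based), since 23041 - 1 = 72 * 320. The failure therefore sits exactly on a tile boundary. For a matrix that really is positive definite, a nonzero status suggests that the data reaching the diagonal-tile factorization was already wrong, rather than that the input was not positive definite.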
StarPU configuration:
../configure --prefix=/home/pswartva/starpu-build/ --disable-mpi --disable-opencl --disable-fortran --disable-build-doc --enable-blas-lib=mkl --with-mkl-cflags=-I/usr/include/mkl --with-mkl-ldflags="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl"
Chameleon configuration:
cmake .. -DCHAMELEON_USE_MPI=OFF -DCHAMELEON_ENABLE_EXAMPLE=OFF -DCHAMELEON_ENABLE_TESTING=ON -DBLA_VENDOR=Intel10_64lp_seq -DCHAMELEON_USE_CUDA=ON -DCMAKE_BUILD_TYPE=Debug