Fix overflow in flop computation in testings
With large matrices, testing_zlacpy reports negative Gflop/s values:
% ./testing/chameleon_stesting -o lacpy --n 3200:60000:6400 -H
Id Function threads gpus P Q mtxfmt nb uplo m n lda ldb seedA tsub time gflops
0 slacpy 36 0 1 1 0 320 Upper 3200 3200 3200 3200 846930886 0.000000e+00 1.656305e-03 1.236874e+01
1 slacpy 36 0 1 1 0 320 Upper 9600 9600 9600 9600 1681692777 0.000000e+00 2.398928e-02 7.684232e+00
2 slacpy 36 0 1 1 0 320 Upper 16000 16000 16000 16000 1714636915 0.000000e+00 5.037857e-02 1.016369e+01
3 slacpy 36 0 1 1 0 320 Upper 22400 22400 22400 22400 1957747793 0.000000e+00 1.102050e-01 9.106347e+00
4 slacpy 36 0 1 1 0 320 Upper 28800 28800 28800 28800 424238335 0.000000e+00 2.285272e-01 7.259258e+00
5 slacpy 36 0 1 1 0 320 Upper 35200 35200 35200 35200 719885386 0.000000e+00 2.889825e-01 8.575434e+00
6 slacpy 36 0 1 1 0 320 Upper 41600 41600 41600 41600 1649760492 0.000000e+00 4.286759e-01 8.074172e+00
7 slacpy 36 0 1 1 0 320 Upper 48000 48000 48000 48000 596516649 0.000000e+00 5.470726e-01 -7.278446e+00
8 slacpy 36 0 1 1 0 320 Upper 54400 54400 54400 54400 1189641421 0.000000e+00 6.930918e-01 -3.853899e+00
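The problem comes from an integer overflow in the N*(N+1) multiplication here. With a 32-bit int, the product first exceeds INT_MAX (2,147,483,647) at N = 48000 (48000*48001 = 2,304,048,000), which is exactly where the negative values start in the log above. At N = 41600 it still fits (41600*41601 = 1,730,601,600).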
I fixed this with the following MR. Is this the best way to solve it, and if so, should I apply the same change to all the other flops_* functions?