Weird behavior when using StarPU

@fpruvost and @ltaief reported that the actual wall-clock time of the Chameleon timing programs is much longer than the time printed in their output when using StarPU. The problem does not appear when using Quark. (@agullo, @thibault)

With StarPU and 15 threads on a 32-core machine at KAUST:

$ time numactl --interleave=all ./time_dpotrf_tile --n_range=10000:10000:10000 --k=10000 --nb=320 --threads=15 --nowarmup
Profiling throught FxT has not been enabled in StarPU runtime (configure StarPU with --with-fxt)
#
# CHAMELEON 0.9.1, ./time_dpotrf_tile
# Nb threads: 15
# Nb GPUs:    0
# NB:         320
# IB:         32
# eps:        1.110223e-16
#
#     M       N  K/NRHS   seconds   Gflop/s Deviation
  10000   10000   10000     0.778    428.57 +-   0.00  

real	0m0.937s
user	0m12.296s
sys	0m0.426s

And with 31 threads:

$ time numactl --interleave=all ./time_dpotrf_tile --n_range=10000:10000:10000 --k=10000 --nb=320 --threads=31 --nowarmup
Profiling throught FxT has not been enabled in StarPU runtime (configure StarPU with --with-fxt)
#
# CHAMELEON 0.9.1, ./time_dpotrf_tile
# Nb threads: 31
# Nb GPUs:    0
# NB:         320
# IB:         32
# eps:        1.110223e-16
#
#     M       N  K/NRHS   seconds   Gflop/s Deviation
  10000   10000   10000     0.462    721.70 +-   0.00  

real	0m5.742s
user	2m53.004s
sys	0m3.008s

By contrast, when using Quark (here through PLASMA rather than Chameleon, but the behavior is similar):

$ time numactl --interleave=all ./time_dgemm_tile --n_range=10000:10000:10000 --k=10000 --nb=320 --threads=31 --nowarmup
#
# PLASMA 2.8.0, ./time_dgemm_tile
# Nb threads: 31
# NB:         320
# IB:         32
# eps:        1.110223e-16
#
#     M       N  K/NRHS   seconds   Gflop/s Deviation
  10000   10000   10000     2.282    876.24      0.00

real	0m2.516s
user	1m8.134s
sys	0m1.449s
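
To make the discrepancy easier to compare across runs, here is a minimal Python sketch that subtracts the time reported by each program from the `real` time measured by `time`, and divides the CPU time (`user` + `sys`) by `real` to estimate how many cores were busy on average. The numbers are copied from the three outputs above; the helper names `parse_time` and `summarize` are made up for this illustration. Note that the PLASMA run uses `time_dgemm_tile` rather than `time_dpotrf_tile`, so only the gap is comparable, not the absolute times.

```python
def parse_time(text):
    """Convert a field printed by `time`, such as '2m53.004s', to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def summarize(reported, real, user, sys_):
    """Return (seconds outside the reported time, average number of busy cores)."""
    wall = parse_time(real)
    cpu = parse_time(user) + parse_time(sys_)
    return wall - reported, cpu / wall


# (label, seconds printed by the program, real, user, sys), taken from the runs above
runs = [
    ("StarPU, 15 threads (dpotrf)", 0.778, "0m0.937s", "0m12.296s", "0m0.426s"),
    ("StarPU, 31 threads (dpotrf)", 0.462, "0m5.742s", "2m53.004s", "0m3.008s"),
    ("PLASMA/Quark, 31 threads (dgemm)", 2.282, "0m2.516s", "1m8.134s", "0m1.449s"),
]

for label, reported, real, user, sys_ in runs:
    gap, busy = summarize(reported, real, user, sys_)
    print(f"{label}: {gap:.3f} s outside the reported time, ~{busy:.1f} cores busy")
```

With these numbers, the untimed portion is about 0.16 s for StarPU with 15 threads, about 5.28 s for StarPU with 31 threads, and about 0.23 s for PLASMA with 31 threads. In the 31-thread StarPU run roughly 30 cores stay busy over the whole 5.7 s, which suggests that the workers are consuming CPU outside the timed region; the numbers alone do not identify the cause.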