Reuse performance model: impossible to calibrate ?plgsy kernels
Dear Chameleon Developers,
I am currently experiencing trouble with performance model calibration when targeting CUDA GPUs (A100) with the StarPU runtime. In my runs, I use parallel workers.
It seems it is no longer possible to reuse performance models (e.g. with the DMDA scheduler) without re-running StarPU calibration: performance measurements for the ?plgsy kernels are not found from one run to the next, so calibration is forced on every run:
[...][__starpu_history_based_job_expected_perf] Warning: model dplgsy is not calibrated enough for cpu0_impl0 (Comb6) size 31490048 footprint cb131111 (only 0 measurements), forcing calibration for this run. Use the STARPU_CALIBRATE environment variable to control this. You probably need to run again to continue calibrating the model, until this warning disappears.
Calibration is forced even when STARPU_CALIBRATE is set to 0. No such warning is printed for the other codelets (?getrf_nopiv, ?gemm, and so on).
I am not able to reproduce the problem when I do not use parallel workers, in case that helps narrow it down.
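For reference, here is a sketch of how I check the stored model between runs. It assumes a standard StarPU installation (the `starpu_perfmodel_display` tool and the default `~/.starpu/sampling` model directory); the exact paths on my machine may differ.

```shell
# Environment used for the runs described above.
export STARPU_SCHED=dmda      # DMDA scheduler, which relies on the perf models
export STARPU_CALIBRATE=0     # calibration explicitly disabled

# After a first (calibrating) run, the dplgsy model should be stored under
# $STARPU_PERF_MODEL_DIR if set, otherwise under ~/.starpu/sampling.
ls ~/.starpu/sampling/codelets/*/dplgsy*

# Inspect what StarPU has recorded for the dplgsy symbol.
starpu_perfmodel_display -s dplgsy
```

In my case the model file exists after the first run, but the measurements for the parallel-worker combination (Comb6 in the warning above) do not seem to be found again on subsequent runs.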
Best regards,