Bugfix - Load imbalance with QR/LQ algorithms
-
Fix the data distribution of the D matrix, which is used by the GPU kernels and by StarPU to break the anti-dependency between the upper and lower parts of the diagonal tiles in the QR/LQ algorithms. This D matrix was stored only on process 0, which created both memory and computation imbalance.
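To illustrate the imbalance, here is a minimal sketch that counts how many diagonal tiles each process owns. It assumes a 2D block-cyclic layout on a P x Q process grid with row-major ranks. That layout is an assumption made for illustration, since this message does not state which distribution the fix uses, and the function names are hypothetical, not taken from the code base.

```python
def owner_2d_block_cyclic(m, n, p, q):
    """Rank owning tile (m, n) on a p x q grid (assumed row-major rank order)."""
    return (m % p) * q + (n % q)


def diagonal_tile_counts(num_tiles, p, q, all_on_rank0=False):
    """Count the diagonal tiles D(k, k) held by each rank.

    all_on_rank0=True models the behaviour described above as the bug;
    False models the assumed 2D block-cyclic distribution.
    """
    counts = {rank: 0 for rank in range(p * q)}
    for k in range(num_tiles):
        rank = 0 if all_on_rank0 else owner_2d_block_cyclic(k, k, p, q)
        counts[rank] += 1
    return counts


if __name__ == "__main__":
    # 12 diagonal tiles on a 2 x 3 grid, before and after distributing D.
    print(diagonal_tile_counts(12, 2, 3, all_on_rank0=True))
    print(diagonal_tile_counts(12, 2, 3))
```

With everything on rank 0, that process holds all 12 tiles and runs every kernel that touches them. Under the assumed cyclic layout on this 2 x 3 grid, each of the 6 ranks holds 2 tiles.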
-
Fix the workspace sizes with StarPU, which were 4 times larger than expected.