Starpu/alloc on the fly (!140) · Merge requests · solverstack / Chameleon · GitLab

Mathieu Faverge requested to merge faverge/chameleon:starpu/alloc_on_the_fly into master Jan 30, 2019

Change how the workspaces are allocated so that they can be allocated on the fly.

The objectives are:

Allocate less memory in QR-like algorithms, as only the useful T tiles are allocated.
Be more asynchronous in some algorithms, such as QR again or the norms, by avoiding the sequence_wait that was required at the end of the call before freeing the allocated workspaces. This is also used in the upcoming SUMMA algorithms.

The changes are:

Switch from geadd to axpy in the norm computations, as axpy may be optimized by MKL.
Switch the workspaces from global allocation to per-tile allocation.
Update the QR kernels that generate the T tiles so that they set the tile to 0 first. This can no longer be done through a global memset, and to avoid allocating the complete matrix, the initialization is moved into the codelets so that only the touched tiles are initialized.
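
The idea behind the last two changes can be illustrated with a small stand-alone sketch in plain C. It is not the Chameleon or StarPU code: the names `lazy_ws_t`, `lazy_ws_get` and `toy_kernel_make_T` are hypothetical, and the runtime's on-demand allocation is only imitated here with `malloc` on first access. It assumes IEEE 754 doubles, so that `memset` to 0 yields 0.0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical workspace descriptor: an mt x nt grid of ib x nb tiles.
 * This is an illustration, not a Chameleon data structure. */
typedef struct {
    int mt, nt;        /* number of tile rows / tile columns */
    int ib, nb;        /* rows / columns of each tile */
    double **tiles;    /* one pointer per tile, NULL until first use */
    size_t allocated;  /* number of tiles allocated so far */
} lazy_ws_t;

/* Create the descriptor only: no tile memory is allocated here. */
int lazy_ws_init(lazy_ws_t *ws, int mt, int nt, int ib, int nb)
{
    assert(mt > 0 && nt > 0 && ib > 0 && nb > 0);
    ws->mt = mt; ws->nt = nt; ws->ib = ib; ws->nb = nb;
    ws->allocated = 0;
    ws->tiles = malloc((size_t)mt * (size_t)nt * sizeof(double *));
    if (ws->tiles == NULL) return -1;
    for (size_t i = 0; i < (size_t)mt * (size_t)nt; i++) ws->tiles[i] = NULL;
    return 0;
}

/* Return tile (m, n), allocating it on the fly the first time it is used.
 * The memory is deliberately NOT zeroed here. */
double *lazy_ws_get(lazy_ws_t *ws, int m, int n)
{
    assert(m >= 0 && m < ws->mt && n >= 0 && n < ws->nt);
    double **slot = &ws->tiles[(size_t)n * (size_t)ws->mt + (size_t)m];
    if (*slot == NULL) {
        *slot = malloc((size_t)ws->ib * (size_t)ws->nb * sizeof(double));
        if (*slot != NULL) ws->allocated++;
    }
    return *slot;
}

/* Stand-in for a kernel that generates a T tile: since no global memset
 * exists anymore, the kernel zeroes its own tile before writing to it. */
int toy_kernel_make_T(lazy_ws_t *ws, int m, int n)
{
    double *T = lazy_ws_get(ws, m, n);
    if (T == NULL) return -1;
    memset(T, 0, (size_t)ws->ib * (size_t)ws->nb * sizeof(double));
    T[0] = 1.0; /* placeholder for the real computation */
    return 0;
}

/* Free every tile that was allocated, then the pointer table. */
void lazy_ws_destroy(lazy_ws_t *ws)
{
    for (size_t i = 0; i < (size_t)ws->mt * (size_t)ws->nt; i++) free(ws->tiles[i]);
    free(ws->tiles);
    ws->tiles = NULL;
    ws->allocated = 0;
}
```

In this sketch, running `toy_kernel_make_T` on the four diagonal tiles of a 4 x 4 grid allocates 4 tiles instead of 16, and each of them is zeroed by the kernel that touches it rather than by a single memset over the whole workspace.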

Edited Jan 31, 2019 by Mathieu Faverge