Starpu/alloc on the fly
Change how the workspaces are allocated so that they can be allocated on the fly.
The objectives are:
- Allocate less memory in QR-like algorithms, since only the T tiles that are actually used are allocated.
- Be more asynchronous in some algorithms, such as QR and the norm computations, by removing the sequence_wait that was required at the end of the call before freeing the allocated workspaces. This is also used in the upcoming SUMMA algorithms.
The changes are:
- switch from geadd to axpy in norm computations, as axpy may be optimized by MKL
- switch workspaces from global allocation to tile allocation
- update the QR kernels that generate the T tiles to set them to 0 first. This can no longer be done through a global memset, and to avoid a complete allocation of the matrix, the initialization is moved into the codelets so that only the touched tiles are initialized.
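The idea behind the last two changes can be illustrated with a minimal plain-C sketch. It is not the code of this merge request and does not use the StarPU or Chameleon APIs: the names `lazy_ws_t`, `lazy_ws_init`, `lazy_ws_get_tile`, `lazy_ws_destroy`, and `fake_qr_kernel` are hypothetical, and the column-major `ib x nb` tile layout is an assumption. It only shows tiles being allocated on first touch, with the zeroing done inside the kernel instead of by a global memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tiled workspace: one pointer per tile, NULL until touched. */
typedef struct {
    int mt, nt;        /* number of tile rows and tile columns */
    int ib, nb;        /* dimensions of one T tile (ib x nb, column-major) */
    double **tiles;    /* mt * nt tile pointers, NULL means "not allocated" */
    size_t allocated;  /* number of tiles allocated so far */
} lazy_ws_t;

/* Set up the descriptor only; no tile storage is allocated here. */
int lazy_ws_init(lazy_ws_t *ws, int mt, int nt, int ib, int nb)
{
    size_t count = (size_t)mt * (size_t)nt;
    ws->mt = mt; ws->nt = nt; ws->ib = ib; ws->nb = nb;
    ws->allocated = 0;
    ws->tiles = malloc(count * sizeof(double *));
    if (ws->tiles == NULL) return -1;
    for (size_t i = 0; i < count; i++) ws->tiles[i] = NULL;
    return 0;
}

/* Return tile (m, n), allocating it on first touch. The content is left
 * uninitialized on purpose: the kernel is responsible for zeroing it. */
double *lazy_ws_get_tile(lazy_ws_t *ws, int m, int n)
{
    assert(m >= 0 && m < ws->mt && n >= 0 && n < ws->nt);
    double **slot = &ws->tiles[(size_t)n * (size_t)ws->mt + (size_t)m];
    if (*slot == NULL) {
        *slot = malloc((size_t)ws->ib * (size_t)ws->nb * sizeof(double));
        if (*slot != NULL) ws->allocated++;
    }
    return *slot;
}

/* Stand-in for a QR kernel that produces a T tile: it zeroes the tile
 * first, then writes its result (here a placeholder unit diagonal). */
void fake_qr_kernel(double *T, int ib, int nb)
{
    memset(T, 0, (size_t)ib * (size_t)nb * sizeof(double));
    for (int i = 0; i < ib && i < nb; i++)
        T[(size_t)i + (size_t)i * (size_t)ib] = 1.0;
}

/* Free only the tiles that were actually allocated. */
void lazy_ws_destroy(lazy_ws_t *ws)
{
    size_t count = (size_t)ws->mt * (size_t)ws->nt;
    for (size_t i = 0; i < count; i++) free(ws->tiles[i]);
    free(ws->tiles);
    ws->tiles = NULL;
    ws->allocated = 0;
}
```

In this sketch, a caller that touches a single tile of a 4 x 4 tiled workspace ends up with one allocated tile rather than sixteen, and that tile is zeroed by the kernel that writes it.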
Edited by Mathieu Faverge
changed milestone to %Chameleon 1.0.0
mentioned in issue #34 (closed)
- Resolved by Mathieu Faverge
Thanks a lot @faverge for noticing this!
Indeed, the dependency lengths were very wrong; I thought I had caught them all when rebasing my original PR... Thanks for fixing them!
- Resolved by Mathieu Faverge
added 1 commit
enabled an automatic merge when the pipeline for c100ef0f succeeds
mentioned in commit fa6d78a3