Data eviction in STARPU_CPU_RAM with accelerators and without disks
Is your feature request related to a problem? Please describe.
Nowadays, it is not rare to find compute nodes whose CPU RAM is smaller than the aggregated GPU RAM. For some classes of algorithms, being able to allocate as much data as possible is critical to reach a significant fraction of peak performance.
However, these algorithms might still need to perform some operations on the CPU, possibly over the whole data set. In this case, we currently seem to face a problem: data in StarPU can be evicted from STARPU_CPU_RAM only if a disk is available (for out-of-core). Hence, when the total amount of memory required to perform CPU tasks exceeds the amount of CPU RAM, StarPU enters a memory reclaiming phase and tries to evict a memory block that is not evictable. This results in a deadlock.
Describe the solution you'd like
We would like StarPU to be able to evict data from STARPU_CPU_RAM when GPUs are available, even without a disk.
Describe alternatives you've considered
Manually managing a set of temporary buffers on the CPU to pipeline the execution of CPU tasks is certainly possible, but it implies additional memory copies.
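For illustration, here is a minimal sketch of what such a hand-managed staging scheme could look like, in plain C and without any StarPU call. The names `staged_cpu_pass`, `cpu_kernel_fn` and `double_in_place` are hypothetical, and `memcpy` merely stands in for whatever transfer would really bring the data to the CPU. The sketch is sequential, so it only shows the buffer bookkeeping and the copy overhead, not actual overlap of transfers and computation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CPU kernel: works in place on one staged chunk of n elements. */
typedef void (*cpu_kernel_fn)(double *chunk, size_t n);

/* Example kernel used for illustration only: doubles every element. */
void double_in_place(double *chunk, size_t n)
{
    for (size_t i = 0; i < n; i++)
        chunk[i] *= 2.0;
}

/*
 * Run `kernel` over `total` elements of `data`, chunk by chunk, through a
 * ring of `nbuf` staging buffers of `chunk_elems` elements each.
 * Returns the number of bytes copied in and out of the staging buffers
 * (the overhead mentioned above), or (size_t)-1 on invalid arguments or
 * allocation failure.
 */
size_t staged_cpu_pass(double *data, size_t total, size_t chunk_elems,
                       size_t nbuf, cpu_kernel_fn kernel)
{
    size_t allocated = 0;        /* number of ring slots successfully allocated */
    size_t slot = 0;             /* ring slot used for the current chunk */
    size_t copied = (size_t)-1;  /* stays at the error value until setup succeeds */
    double **ring;

    if (chunk_elems == 0 || nbuf == 0 || kernel == NULL)
        return (size_t)-1;

    ring = malloc(nbuf * sizeof *ring);
    if (ring == NULL)
        return (size_t)-1;

    for (; allocated < nbuf; allocated++) {
        ring[allocated] = malloc(chunk_elems * sizeof **ring);
        if (ring[allocated] == NULL)
            goto cleanup;
    }

    copied = 0;
    for (size_t off = 0; off < total; off += chunk_elems) {
        size_t n = (total - off < chunk_elems) ? total - off : chunk_elems;
        double *buf = ring[slot];

        memcpy(buf, data + off, n * sizeof *buf);  /* extra copy: stage in */
        kernel(buf, n);                            /* CPU task on the staged chunk */
        memcpy(data + off, buf, n * sizeof *buf);  /* extra copy: write back */

        copied += 2 * n * sizeof *buf;
        slot = (slot + 1) % nbuf;                  /* round-robin over the ring */
    }

cleanup:
    for (size_t i = 0; i < allocated; i++)
        free(ring[i]);
    free(ring);
    return copied;
}
```

With this sketch, `staged_cpu_pass(v, 5, 2, 2, double_in_place)` on an array `v` of five doubles reports `2 * 5 * sizeof(double)` copied bytes: every element is copied once in and once out, which is exactly the overhead we would like to avoid.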
Additional context
We work on a well-known linear algebra application that uses StarPU.