We have explained how StarPU can overlap computation and data transfers
thanks to DMAs. This is however only possible when CUDA has control over the
application buffers. The application should thus use <a href=""><tt>starpu_malloc()</tt></a>
when allocating its buffers, so that asynchronous DMAs from and to them are possible.
Take the vector example again and change its allocation so that it uses
<a href=""><tt>starpu_malloc()</tt></a>.
