@@ -419,9 +419,9 @@ have been easily ported to StarPU by simply using <tt>starpu_insert_task</tt>.
<p>
Take the vector example again, and add partitioning support to it, using the
matrix-matrix multiplication as an example. Here we will use the
<a href="http://runtime.bordeaux.inria.fr/StarPU/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
<a href="/files/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
@@ -443,14 +443,14 @@ This is based on StarPU's documentation
<p>
We have explained how StarPU can overlap computation and data transfers
thanks to DMAs. This is however only possible when CUDA has control over the
application buffers. The application should thus use <a href="http://runtime.bordeaux.inria.fr/StarPU/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
application buffers. The application should thus use <a href="/files/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
when allocating its buffer, to permit asynchronous DMAs from and to
it.
</p>
<p>
Take the vector example again, and fix the allocation, to make it use
@@ -463,9 +463,9 @@ have been easily ported to StarPU by simply using <tt>starpu_insert_task</tt>.
<p>
Take the vector example again, and add partitioning support to it, using the
matrix-matrix multiplication as an example. Here we will use the
<a href="http://runtime.bordeaux.inria.fr/StarPU/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
<a href="/files/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
@@ -487,14 +487,14 @@ This is based on StarPU's documentation
<p>
We have explained how StarPU can overlap computation and data transfers
thanks to DMAs. This is however only possible when CUDA has control over the
application buffers. The application should thus use <a href="http://runtime.bordeaux.inria.fr/StarPU/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
application buffers. The application should thus use <a href="/files/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
when allocating its buffer, to permit asynchronous DMAs from and to
it.
</p>
<p>
Take the vector example again, and fix the allocation, to make it use
@@ -467,9 +467,9 @@ have been easily ported to StarPU by simply using <tt>starpu_insert_task</tt>.
<p>
Take the vector example again, and add partitioning support to it, using the
matrix-matrix multiplication as an example. Here we will use the
<a href="http://starpu.gforge.inria.fr/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
<a href="/files/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
@@ -491,14 +491,14 @@ This is based on StarPU's documentation
<p>
We have explained how StarPU can overlap computation and data transfers
thanks to DMAs. This is however only possible when CUDA has control over the
application buffers. The application should thus use <a href="http://starpu.gforge.inria.fr/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
application buffers. The application should thus use <a href="/files/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
when allocating its buffer, to permit asynchronous DMAs from and to
it.
</p>
<p>
Take the vector example again, and fix the allocation, to make it use
@@ -405,9 +405,9 @@ have been easily ported to StarPU by simply using <tt>starpu_insert_task</tt>.
<p>
Take the vector example again, and add partitioning support to it, using the
matrix-matrix multiplication as an example. Here we will use the
<a href="http://starpu.gforge.inria.fr/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
<a href="/files/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
@@ -429,14 +429,14 @@ This is based on StarPU's documentation
<p>
We have explained how StarPU can overlap computation and data transfers
thanks to DMAs. This is however only possible when CUDA has control over the
application buffers. The application should thus use <a href="http://starpu.gforge.inria.fr/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
application buffers. The application should thus use <a href="/files/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
when allocating its buffer, to permit asynchronous DMAs from and to
it.
</p>
<p>
Take the vector example again, and fix the allocation, to make it use
@@ -413,9 +413,9 @@ have been easily ported to StarPU by simply using <tt>starpu_insert_task</tt>.
<p>
Take the vector example again, and add partitioning support to it, using the
matrix-matrix multiplication as an example. Here we will use the
<a href="http://starpu.gforge.inria.fr/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
<a href="/files/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
@@ -437,14 +437,14 @@ This is based on StarPU's documentation
<p>
We have explained how StarPU can overlap computation and data transfers
thanks to DMAs. This is however only possible when CUDA has control over the
application buffers. The application should thus use <a href="http://starpu.gforge.inria.fr/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
application buffers. The application should thus use <a href="/files/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
when allocating its buffer, to permit asynchronous DMAs from and to
it.
</p>
<p>
Take the vector example again, and fix the allocation, to make it use
@@ -531,9 +531,9 @@ have been easily ported to StarPU by simply using <tt>starpu_insert_task</tt>.
<p>
Take the vector example again, and add partitioning support to it, using the
matrix-matrix multiplication as an example. Here we will use the
<a href="http://starpu.gforge.inria.fr/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
<a href="/files/doc/html/group__API__Data__Partition.html#ga212189d3b83dfa4e225609b5f2bf8461"><tt>starpu_vector_filter_block()</tt></a> filter function. You can see the list of
@@ -577,14 +577,14 @@ This is based on StarPU's documentation
<p>
We have explained how StarPU can overlap computation and data transfers
thanks to DMAs. This is however only possible when CUDA has control over the
application buffers. The application should thus use <a href="http://starpu.gforge.inria.fr/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
application buffers. The application should thus use <a href="/files/doc/html/group__API__Standard__Memory__Library.html#ga49603eaea3b05e8ced9ba1bd873070c3"><tt>starpu_malloc()</tt></a>
when allocating its buffer, to permit asynchronous DMAs from and to
it.
</p>
<p>
Take the vector example again, and fix the allocation, to make it use