Use asynchronous data acquire/release in data_getoncpu
With StarPU, the call to RUNTIME_desc_getoncpu() can generate a large block of data transfers, visible at the end of this trace for example.
This can be fixed by using the asynchronous version of the data acquire/release functions, and calling RUNTIME_desc_getoncpu() before morse_sequence_wait().
diff --git a/compute/zpotrf.c b/compute/zpotrf.c
index e69c04b..a5772ea 100644
--- a/compute/zpotrf.c
+++ b/compute/zpotrf.c
@@ -211,8 +211,8 @@ int MORSE_zpotrf_Tile(MORSE_enum uplo, MORSE_desc_t *A)
}
morse_sequence_create(morse, &sequence);
MORSE_zpotrf_Tile_Async(uplo, A, sequence, &request);
- morse_sequence_wait(morse, sequence);
RUNTIME_desc_getoncpu(A);
+ morse_sequence_wait(morse, sequence);
status = sequence->status;
morse_sequence_destroy(morse, sequence);
diff --git a/runtime/starpu/control/runtime_descriptor.c b/runtime/starpu/control/runtime_descriptor.c
index b148888..adf3323 100644
--- a/runtime/starpu/control/runtime_descriptor.c
+++ b/runtime/starpu/control/runtime_descriptor.c
@@ -312,8 +312,11 @@ int RUNTIME_desc_getoncpu( MORSE_desc_t *desc )
continue;
}
- starpu_data_acquire(*handle, STARPU_R);
- starpu_data_release(*handle);
+ /* Use the async acquire/release version
+ * this only works because we know that starpu_data_handle_t
+ * is actually a pointer */
+ starpu_data_acquire_cb(*handle, STARPU_R,
+ starpu_data_release, *handle);
handle++;
}
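The comment in the hunk above notes that passing starpu_data_release directly as the callback only works because the handle type is a pointer. A possible alternative, sketched below, is a small wrapper whose signature matches the void (*)(void *) callback type exactly. To keep the sketch compilable on its own, the names mock_handle_t, mock_data_release, mock_data_acquire_cb and mock_wait_all are hypothetical stand-ins, not StarPU or MORSE functions; in the real code the wrapper would call starpu_data_release and be passed to starpu_data_acquire_cb. The sketch also shows why the submission has to happen before the wait: the callbacks only run once the wait drains them.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for starpu_data_handle_t (an opaque pointer). */
typedef struct mock_data { int released; } *mock_handle_t;

/* Queue of callbacks submitted but not yet executed. */
typedef struct { void (*cb)(void *); void *arg; } pending_t;
static pending_t pending[MAX_PENDING];
static size_t npending = 0;

/* Stand-in for the release call: takes a typed handle. */
static void mock_data_release(mock_handle_t h) { h->released = 1; }

/* Stand-in for an asynchronous acquire: records the callback and
 * returns immediately instead of blocking the caller. */
static int mock_data_acquire_cb(mock_handle_t h, void (*cb)(void *), void *arg)
{
    (void)h;
    if (npending == MAX_PENDING) return -1;
    pending[npending].cb = cb;
    pending[npending].arg = arg;
    npending++;
    return 0;
}

/* Stand-in for the sequence wait: runs every callback submitted so far. */
static void mock_wait_all(void)
{
    for (size_t i = 0; i < npending; i++) pending[i].cb(pending[i].arg);
    npending = 0;
}

/* Wrapper with the exact callback signature, so no function-pointer
 * type mismatch is needed when registering it. */
static void release_cb(void *arg) { mock_data_release((mock_handle_t)arg); }

/* Submits one asynchronous acquire/release per non-NULL handle and
 * returns the number of handles submitted. */
static int getoncpu_sketch(mock_handle_t *handles, size_t n)
{
    assert(handles != NULL);
    int submitted = 0;
    for (size_t i = 0; i < n; i++) {
        if (handles[i] == NULL) continue;
        if (mock_data_acquire_cb(handles[i], release_cb, handles[i]) == 0)
            submitted++;
    }
    return submitted;
}
```

Calling getoncpu_sketch() and then mock_wait_all() mirrors the reordered calls in zpotrf.c: nothing is released until the wait runs the queued callbacks.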
This needs to be done for every function in compute/, and potentially for other runtimes.
With this fix, the transfer block vanishes; see this trace.