Chameleon issues
https://gitlab.inria.fr/solverstack/chameleon/-/issues
Updated: 2022-11-11T10:00:42+01:00

Restore nobigmat option?
https://gitlab.inria.fr/solverstack/chameleon/-/issues/94
Updated: 2022-11-11T10:00:42+01:00
Author: THIBAULT Samuel (samuel.thibault@inria.fr)

When running with StarPU and StarPU's NUMA support enabled, allocating the whole matrix in one chunk will not fly if it doesn't fit NUMA node zero. The nobigmat option of the previous testing infrastructure was useful to let StarPU allocate on the fly, by passing CHAMELEON_MAT_ALLOC_TILE to CHAMELEON_Desc_Create. Could we restore this option?
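To make the failure mode concrete, here is a toy capacity model. It is not StarPU or Chameleon code: `fits_one_chunk`, `fits_per_tile`, and the per-node free-memory array are hypothetical names, and the first-fit placement of tiles is an assumption made for illustration (StarPU's real placement policy may differ).

```c
#include <stddef.h>

/* Toy model: a single-chunk allocation must fit entirely in node 0. */
int fits_one_chunk(const size_t *node_free, size_t matrix_bytes)
{
    return matrix_bytes <= node_free[0];
}

/* Toy model: tiles are placed one by one on the first node that still
 * has room. First-fit is an assumption, not StarPU's documented policy. */
int fits_per_tile(const size_t *node_free, int nnodes,
                  size_t tile_bytes, size_t ntiles)
{
    size_t placed = 0;

    if (tile_bytes == 0) {
        return 0; /* reject a meaningless tile size */
    }
    for (int n = 0; n < nnodes && placed < ntiles; n++) {
        size_t room = node_free[n] / tile_bytes; /* whole tiles that fit */
        size_t left = ntiles - placed;
        placed += (room < left) ? room : left;
    }
    return placed == ntiles;
}
```

In this model, a matrix larger than node zero fails the single-chunk check but can still pass the per-tile check when the other nodes have room, which is the situation the nobigmat option addressed.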
@pswartva

Check pause / resume
https://gitlab.inria.fr/solverstack/chameleon/-/issues/105
Updated: 2022-02-17T09:16:59+01:00
Author: Mathieu Faverge

Check that the runtime system is not paused when waiting for the end of a sequence.

Milestone: Chameleon 1.2.0
Assignee: Mathieu Faverge

Introduce QR trees in TPGQRT and TPQRT algorithms for QDWH
https://gitlab.inria.fr/solverstack/chameleon/-/issues/36
Updated: 2020-12-02T10:26:07+01:00
Author: Mathieu Faverge

Everything is in the title. We should first add the parameterized version of those algorithms, so we can work on dedicated trees for those operations.

Milestone: Chameleon 1.1.0

Example in --help option
https://gitlab.inria.fr/solverstack/chameleon/-/issues/96
Updated: 2020-03-13T11:44:45+01:00
Author: Ghost User

Hello,
You should add the README example `./testing/chameleon_stesting -H -o gemm -t 2 -m 2000 -n 2000 -k 2000` at the end of the --help output as an example.

Default number of threads is 1 in new testing
https://gitlab.inria.fr/solverstack/chameleon/-/issues/87
Updated: 2020-01-10T16:03:52+01:00
Author: Philippe SWARTVAGHER

When running `mpirun -n 2 -nodelist jack0,jack1 -DSTARPU_FXT_TRACE=1 -DSTARPU_FXT_PREFIX=$(pwd)/ ~/chameleon/build/new-testing/snew-testing -o potrf -H`, I get the following output:
```
# jack0: WARNING- InfinibandVerbs: device = mlx4_0; port 2 is not active.
# jack1: WARNING- InfinibandVerbs: device = mlx4_0; port 2 is not active.
[starpu][starpu_initialize] Warning: StarPU was configured with --with-fxt, which slows down a bit, limits scalability and makes worker initialization sequential
[starpu][starpu_initialize] Warning: StarPU was configured with --with-fxt, which slows down a bit, limits scalability and makes worker initialization sequential
# pioman: WARNING- Ignoring call to piom_ltask_set_bound_thread_indexes as PIOM_DEDICATED_WAIT=0.
Id Function threads gpus P Q nb uplo n lda seedA time gflops
# pioman: WARNING- Ignoring call to piom_ltask_set_bound_thread_indexes as PIOM_DEDICATED_WAIT=0.
0 spotrf 1 0 1 2 320 Upper 1000 1000 1804289383 1.952684e-02 1.709614e+01
Connection to jack1 closed.
```
The default number of threads used is 1. In the previous timing binaries, the default was the number of workers allowed by StarPU. If I specify `-t [x]`, the provided number of threads is used correctly.
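The behaviour expected here (an explicit `-t` wins, otherwise fall back to what the runtime allows) can be sketched as follows. This is a minimal illustration, not the testing code: `resolve_nthreads` and both of its parameters are hypothetical, and the worker count is assumed to be supplied by the caller from the runtime.

```c
/* Hypothetical helper, not part of Chameleon.
 * requested       : value given with -t, or <= 0 if the option was absent
 * runtime_workers : number of workers the runtime allows (assumed to be
 *                   queried by the caller), or <= 0 if unknown */
int resolve_nthreads(int requested, int runtime_workers)
{
    if (requested > 0) {
        return requested;       /* an explicit -t always wins */
    }
    if (runtime_workers > 0) {
        return runtime_workers; /* default of the previous timing binaries */
    }
    return 1;                   /* last-resort fallback */
}
```

With this rule, running without `-t` would report the runtime's worker count in the `threads` column instead of 1.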
Is it a bug?

Milestone: Chameleon 1.0.0
Assignee: Mathieu Faverge

Improve workspace management
https://gitlab.inria.fr/solverstack/chameleon/-/issues/34
Updated: 2019-03-01T00:48:10+01:00
Author: Mathieu Faverge

Workspaces such as the TT and TS matrices could have a lower memory footprint if, in the StarPU case, we replaced their descriptors with descriptors without data that are allocated at first use.

Milestone: Chameleon 0.9.2

Use asynchronous data acquire/release in data_getoncpu
https://gitlab.inria.fr/solverstack/chameleon/-/issues/59
Updated: 2018-02-02T21:19:33+01:00
Author: VILLEVEYGOUX Léo

With StarPU, the call to `RUNTIME_desc_getoncpu()` could generate a big
data transfer block, visible at the end of
[this trace](/uploads/0dfea6830a5fb35ac7305ba386021786/before.paje.trace)
for example.
This can be fixed by using the asynchronous version of data acquire/release
functions, and calling `RUNTIME_desc_getoncpu()` before `morse_sequence_wait()`.
```diff
diff --git a/compute/zpotrf.c b/compute/zpotrf.c
index e69c04b..a5772ea 100644
--- a/compute/zpotrf.c
+++ b/compute/zpotrf.c
@@ -211,8 +211,8 @@ int MORSE_zpotrf_Tile(MORSE_enum uplo, MORSE_desc_t *A)
}
morse_sequence_create(morse, &sequence);
MORSE_zpotrf_Tile_Async(uplo, A, sequence, &request);
- morse_sequence_wait(morse, sequence);
RUNTIME_desc_getoncpu(A);
+ morse_sequence_wait(morse, sequence);
status = sequence->status;
morse_sequence_destroy(morse, sequence);
diff --git a/runtime/starpu/control/runtime_descriptor.c b/runtime/starpu/control/runtime_descriptor.c
index b148888..adf3323 100644
--- a/runtime/starpu/control/runtime_descriptor.c
+++ b/runtime/starpu/control/runtime_descriptor.c
@@ -312,8 +312,11 @@ int RUNTIME_desc_getoncpu( MORSE_desc_t *desc )
continue;
}
- starpu_data_acquire(*handle, STARPU_R);
- starpu_data_release(*handle);
+ /* Use the async acquire/release version
+ * this only works because we know that starpu_data_handle_t
+ * is actually a pointer */
+ starpu_data_acquire_cb(*handle, STARPU_R,
+ starpu_data_release, *handle);
handle++;
}
```
This needs to be done for every function in `compute/`,
and potentially for other runtimes.
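The control flow that the diff relies on can be illustrated with a self-contained toy model. Nothing below is StarPU or Chameleon code: `toy_acquire_cb`, `toy_release`, `toy_getoncpu`, and `toy_sequence_wait` are hypothetical stand-ins that only mimic the shape of the pattern, namely registering one callback per tile without blocking and letting the later wait complete them all.

```c
#define TOY_MAX_PENDING 64

typedef void (*toy_callback_t)(void *arg);

typedef struct {
    toy_callback_t cb;
    void *arg;
} toy_request_t;

static toy_request_t toy_pending[TOY_MAX_PENDING];
static int toy_npending = 0;

/* Non-blocking acquire: records the callback and returns immediately. */
int toy_acquire_cb(toy_callback_t cb, void *arg)
{
    if (toy_npending == TOY_MAX_PENDING) {
        return -1; /* queue full */
    }
    toy_pending[toy_npending].cb = cb;
    toy_pending[toy_npending].arg = arg;
    toy_npending++;
    return 0;
}

/* Callback playing the role of the release: flags the tile as on CPU. */
void toy_release(void *arg)
{
    *(int *)arg = 1;
}

/* Queue one acquire per tile without waiting for any of them. */
int toy_getoncpu(int *tile_on_cpu, int ntiles)
{
    for (int i = 0; i < ntiles; i++) {
        if (toy_acquire_cb(toy_release, &tile_on_cpu[i]) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Stand-in for the sequence wait: completes every queued request and
 * returns how many were completed. */
int toy_sequence_wait(void)
{
    int done = toy_npending;
    for (int i = 0; i < toy_npending; i++) {
        toy_pending[i].cb(toy_pending[i].arg);
    }
    toy_npending = 0;
    return done;
}
```

Calling `toy_getoncpu()` first and `toy_sequence_wait()` second mirrors the reordered calls in `MORSE_zpotrf_Tile`: no tile is flagged until the wait runs, so the requests are not serialized one blocking call at a time.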
With this fix applied, the transfer block vanishes, see
[this trace](/uploads/8867ab15d8d1c7b36ebce248c26de909/after.paje.trace).

Assignee: Mathieu Faverge

Diagonal copy support
https://gitlab.inria.fr/solverstack/chameleon/-/issues/33
Updated: 2017-07-04T11:22:53+02:00
Author: Mathieu Faverge

All data descriptors for temporary copies of the diagonal, used to release dependencies on the lower/upper parts, should be moved to the driver level to avoid synchronization steps when possible. This is already done in the new HQR kernels but should be done in:
* [x] pzgelqf.c
* [x] pzgelqfrh.c
* [x] pzgeqrf.c
* [x] pzgeqrfrh.c
* [x] pzhetrd_he2hb.c
* [x] pztpgqrt.c
* [x] pzunglq.c
* [x] pzunglqrh.c
* [x] pzungqr.c
* [x] pzungqrrh.c
* [x] pzunmlq.c
* [x] pzunmlqrh.c
* [x] pzunmqr.c
* [x] pzunmqrrh.c

Milestone: Chameleon 1.0.0
Assignee: BOUCHERIE Raphael