Chameleon issues
https://gitlab.inria.fr/solverstack/chameleon/-/issues
2022-02-17T09:16:59+01:00

https://gitlab.inria.fr/solverstack/chameleon/-/issues/105
Check pause / resume
2022-02-17T09:16:59+01:00
Mathieu Faverge
Check that the runtime system is not paused when waiting for the end of a sequence.
Chameleon 1.2.0
Mathieu Faverge

https://gitlab.inria.fr/solverstack/chameleon/-/issues/87
Default number of threads is 1 in new testing
2020-01-10T16:03:52+01:00
Philippe SWARTVAGHER
When running `mpirun -n 2 -nodelist jack0,jack1 -DSTARPU_FXT_TRACE=1 -DSTARPU_FXT_PREFIX=$(pwd)/ ~/chameleon/build/new-testing/snew-testing -o potrf -H`, I get the following output:
```
# jack0: WARNING- InfinibandVerbs: device = mlx4_0; port 2 is not active.
# jack1: WARNING- InfinibandVerbs: device = mlx4_0; port 2 is not active.
[starpu][starpu_initialize] Warning: StarPU was configured with --with-fxt, which slows down a bit, limits scalability and makes worker initialization sequential
[starpu][starpu_initialize] Warning: StarPU was configured with --with-fxt, which slows down a bit, limits scalability and makes worker initialization sequential
# pioman: WARNING- Ignoring call to piom_ltask_set_bound_thread_indexes as PIOM_DEDICATED_WAIT=0.
Id Function threads gpus P Q nb uplo n lda seedA time gflops
# pioman: WARNING- Ignoring call to piom_ltask_set_bound_thread_indexes as PIOM_DEDICATED_WAIT=0.
0 spotrf 1 0 1 2 320 Upper 1000 1000 1804289383 1.952684e-02 1.709614e+01
Connection to jack1 closed.
```
The default number of threads used is 1. In the previous timing binaries, the default was the number of workers allowed by StarPU. If I specify `-t [x]`, the provided number of threads is used correctly.
Is it a bug?
Chameleon 1.0.0
Mathieu Faverge

https://gitlab.inria.fr/solverstack/chameleon/-/issues/72
Would need matrix name for tracing, benchmarking, etc. tools
2023-07-03T16:00:22+02:00
THIBAULT Samuel
samuel.thibault@inria.fr
Hello,
When using tracing, benchmarking, etc. tools with StarPU, we have the tile coordinates thanks to the call to starpu_data_set_coordinates. We would however also need to call starpu_data_set_name to provide a name for the matrix. For instance with time_zgemm_tile, 3 matrices are used (A, B, C), and if tracing tools only provide the tile coordinates, one doesn't know whether it's A, B, or C.
AIUI, that would require adding a string parameter to PASTE_CODE_ALLOCATE_MATRIX_TILE (or perhaps automatically stringifying the descA parameter?) and passing it as a new parameter to CHAMELEON_Desc_Create_* and to chameleon_desc_init. Or would you prefer to introduce another function, which PASTE_CODE_ALLOCATE_MATRIX_TILE would call after Desc_Create?
Samuel
Chameleon 1.3.0

https://gitlab.inria.fr/solverstack/chameleon/-/issues/58
Add the PLTMG matrix generator from PLASMA/DPLASMA
2023-05-30T13:31:52+02:00
Mathieu Faverge
Chameleon 1.3.0

https://gitlab.inria.fr/solverstack/chameleon/-/issues/39
Adding gflop for each task
2023-05-30T13:30:44+02:00
THIBAULT Samuel
samuel.thibault@inria.fr
It'd be useful to add the amount of GFlop for each StarPU codelet, this way:
starpu_task_insert(..., STARPU_FLOPS, MULS(nb) + ADDS(nb), ...);
Chameleon 1.3.0

https://gitlab.inria.fr/solverstack/chameleon/-/issues/36
Introduce QR trees in TPGQRT and TPQRT algorithms for QDWH
2020-12-02T10:26:07+01:00
Mathieu Faverge
Everything is in the title: we should first add the parameterized version of those algorithms, so we can work on dedicated trees for those operations.
Chameleon 1.1.0

https://gitlab.inria.fr/solverstack/chameleon/-/issues/35
Add HQR support within SVD/EVD reduction algorithms
2023-05-30T13:27:33+02:00
Mathieu Faverge
Once issue #33 has been solved, we should add support for automatic trees within the SVD/EVD reduction to band algorithms.
Chameleon 1.3.0
LISITO Alycia

https://gitlab.inria.fr/solverstack/chameleon/-/issues/34
Improve workspace management
2019-03-01T00:48:10+01:00
Mathieu Faverge
Workspaces such as the TT and TS matrices could have a lower memory footprint if, in the StarPU case, we changed their descriptors to descriptors without data that are allocated at first use.
Chameleon 0.9.2

https://gitlab.inria.fr/solverstack/chameleon/-/issues/33
Diagonal copy support
2017-07-04T11:22:53+02:00
Mathieu Faverge
All data descriptors for temporary copies of the diagonal, used to release dependencies on the lower/upper parts, should be moved to the driver level to avoid synchronization steps when possible.
This is already done in the new HQR kernels but should be done in:
* [x] pzgelqf.c
* [x] pzgelqfrh.c
* [x] pzgeqrf.c
* [x] pzgeqrfrh.c
* [x] pzhetrd_he2hb.c
* [x] pztpgqrt.c
* [x] pzunglq.c
* [x] pzunglqrh.c
* [x] pzungqr.c
* [x] pzungqrrh.c
* [x] pzunmlq.c
* [x] pzunmlqrh.c
* [x] pzunmqr.c
* [x] pzunmqrrh.c
Chameleon 1.0.0
BOUCHERIE Raphael

https://gitlab.inria.fr/solverstack/chameleon/-/issues/14
Add truncated SVD support
2023-05-30T13:27:46+02:00
Mathieu Faverge
Following the discussion with Duygu Kan during PLA 2017
Chameleon 1.3.0
Mathieu Faverge

https://gitlab.inria.fr/solverstack/chameleon/-/issues/107
Add default parameter values in testings' help
2024-02-06T12:04:20+01:00
Philippe SWARTVAGHER
It would be nice if the help of the testing binaries could tell what the default value of each parameter is, when parameters are not specified on the CLI.

https://gitlab.inria.fr/solverstack/chameleon/-/issues/98
xlatms
2020-10-12T17:21:03+02:00
Mathieu Faverge
- [ ] Check why mode 5 creates NaN numbers with "s" and seedA = 2112255763
- [ ] Add support for all distributions of random numbers

https://gitlab.inria.fr/solverstack/chameleon/-/issues/96
Example in --help option
2020-03-13T11:44:45+01:00
Ghost User
Hello,
You should add the README example `./testing/chameleon_stesting -H -o gemm -t 2 -m 2000 -n 2000 -k 2000` at the end of the --help output.

https://gitlab.inria.fr/solverstack/chameleon/-/issues/94
Restore nobigmat option?
2022-11-11T10:00:42+01:00
THIBAULT Samuel
samuel.thibault@inria.fr
When running with StarPU and StarPU's NUMA support enabled, allocating the whole matrix in one chunk will not fly if it doesn't fit NUMA node zero. The nobigmat option of the previous testing infrastructure was useful to let StarPU allocate on the fly, by passing CHAMELEON_MAT_ALLOC_TILE to CHAMELEON_Desc_Create. Could we restore this option?
@pswartva

https://gitlab.inria.fr/solverstack/chameleon/-/issues/59
Use asynchronous data acquire/release in data_getoncpu
2018-02-02T21:19:33+01:00
VILLEVEYGOUX Léo
With StarPU, the call to `RUNTIME_desc_getoncpu()` could generate a big
data transfer block, visible at the end of
[this trace](/uploads/0dfea6830a5fb35ac7305ba386021786/before.paje.trace)
for example.
This can be fixed by using the asynchronous version of data acquire/release
functions, and calling `RUNTIME_desc_getoncpu()` before `morse_sequence_wait()`.
```diff
diff --git a/compute/zpotrf.c b/compute/zpotrf.c
index e69c04b..a5772ea 100644
--- a/compute/zpotrf.c
+++ b/compute/zpotrf.c
@@ -211,8 +211,8 @@ int MORSE_zpotrf_Tile(MORSE_enum uplo, MORSE_desc_t *A)
}
morse_sequence_create(morse, &sequence);
MORSE_zpotrf_Tile_Async(uplo, A, sequence, &request);
- morse_sequence_wait(morse, sequence);
RUNTIME_desc_getoncpu(A);
+ morse_sequence_wait(morse, sequence);
status = sequence->status;
morse_sequence_destroy(morse, sequence);
diff --git a/runtime/starpu/control/runtime_descriptor.c b/runtime/starpu/control/runtime_descriptor.c
index b148888..adf3323 100644
--- a/runtime/starpu/control/runtime_descriptor.c
+++ b/runtime/starpu/control/runtime_descriptor.c
@@ -312,8 +312,11 @@ int RUNTIME_desc_getoncpu( MORSE_desc_t *desc )
continue;
}
- starpu_data_acquire(*handle, STARPU_R);
- starpu_data_release(*handle);
+ /* Use the async acquire/release version
+ * this only works because we know that starpu_data_handle_t
+ * is actually a pointer */
+ starpu_data_acquire_cb(*handle, STARPU_R,
+ starpu_data_release, *handle);
handle++;
}
```
This needs to be done for every function in `compute/`,
and potentially for other runtimes.
With this fix the transfer block vanishes, see
[this trace](/uploads/8867ab15d8d1c7b36ebce248c26de909/after.paje.trace).
Mathieu Faverge

https://gitlab.inria.fr/solverstack/chameleon/-/issues/23
Add parameter to codelets for task locality
2017-08-23T15:35:13+02:00
Mathieu Faverge
Right now the task locality is based on the RW data of each codelet. Some codelets attempt to change the task locality by looking at the data sizes. This should not be done at the codelet level, where the amount of information is really low, but at the algorithm level.
Thus, we need to add a locality parameter to the codelets, and set this parameter in all algorithms. By default, I suggest we keep the owner-computes rule, and then we will discuss the possibility of changing this rule for trsv-like functions, as is done now at the codelet level.