starpu merge requests
https://gitlab.inria.fr/starpu/starpu/-/merge_requests
2023-08-25T11:48:24+02:00

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/104
examples/lu: allow the applications to run with several iterations
2023-08-25T11:48:24+02:00
Nathalie Furmento

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/101
Add new scheduler darts (Data-Aware Reactive Task Scheduling)
2023-09-25T11:23:49+02:00
Nathalie Furmento

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/99
Add STARPU_MPI_GPUDIRECT support to HIP
2023-08-24T21:39:27+02:00
Loris

* Tested with HIP-ROCm and HIP-CUDA
* For HIP-CUDA, we disable GPUDirect if we can't detect that MPI is compiled with CUDA support (as is done for CUDA)
* For HIP-ROCm, however, I propose to enable it at the user's request even if we can't detect ROCm support.

The reason is that MPIX_Query_rocm_support isn't available in the current release of OpenMPI, but in our case we use UCX to provide ROCm support. While having GPU support in both MPI and UCX is preferred (for collectives), we can still have GPU-aware features without support in MPI (afaik). We could also detect ROCm support using ucp_context_query, but this should already be done in MPIX_Query_cuda_support, so we can expect it will be the same for ROCm.
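The enable/disable rule described in the bullets above can be sketched as a small decision function. This is only an illustration, not StarPU's code: the enum, the function names, and the warning text are hypothetical, and the sketch assumes GPUDirect is only considered when the user asks for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names for illustration; none of these are StarPU identifiers. */
enum hip_backend { HIP_BACKEND_CUDA, HIP_BACKEND_ROCM };

/* Decide whether GPUDirect should be used.
 * user_requested: the user asked for GPUDirect.
 * mpi_gpu_support_detected: MPI reported GPU support for this backend. */
bool gpudirect_enabled(enum hip_backend backend,
                       bool user_requested,
                       bool mpi_gpu_support_detected)
{
	if (!user_requested)
		return false; /* assumption of this sketch: opt-in only */

	if (mpi_gpu_support_detected)
		return true;  /* support confirmed, nothing more to decide */

	if (backend == HIP_BACKEND_CUDA)
		return false; /* cannot confirm CUDA-aware MPI: disable */

	/* HIP-ROCm: support cannot be detected, so trust the user's
	 * request and say so. */
	fprintf(stderr, "warning: ROCm support in MPI could not be detected; "
	                "enabling GPUDirect because it was requested\n");
	return true;
}

/* Small self-check covering the three cases from the description. */
void gpudirect_self_check(void)
{
	assert(!gpudirect_enabled(HIP_BACKEND_CUDA, true, false));
	assert(gpudirect_enabled(HIP_BACKEND_CUDA, true, true));
	assert(gpudirect_enabled(HIP_BACKEND_ROCM, true, false));
}
```

Calling `gpudirect_self_check()` from a `main` function exercises the three cases; the HIP-ROCm case prints the warning on standard error.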
Instead of querying UCX, I added a warning to inform the user about what StarPU is doing.

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/96
Fix hip memory pinning and hipblas configure issue
2023-05-09T17:08:39+02:00
Loris

- Fix hip memory pinning issue, should improve performance for both cuda and rocm backends.
- Fix hipblas configure issue where we would use the wrong hipblas.h headers (only concerns hip for cuda backend)

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/95
mpi: merge detached_requests_mutex and progress_mutex
2023-05-09T15:38:33+02:00
Nathalie Furmento

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/94
Resolve "MPI: fix starpu_wait_for_all() to wait for detached requests"
2023-05-05T16:26:19+02:00
Nathalie Furmento

Closes #32

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/93
Resolve "Fortran MPI task_insert interface is not up-to-date"
2023-04-25T16:55:35+02:00
Nathalie Furmento

Closes #31

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/92
Add a pipeline to test chameleon
2023-04-24T12:53:27+02:00
Nathalie Furmento

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/91
fix pinned memory leak for hip
2023-04-12T16:38:11+02:00
Loris

Fix huge memory leak for HIP driver.
- hipHostFree was never called because _starpu_can_submit_cuda_task was used instead of _starpu_can_submit_hip_task.
- It might be useful to implement a HIP version of free_pinned_cl to free memory in a task, as we do for CUDA?

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/90
mpi/tags: add a tag management systems to allow the application to book a set...
2023-05-04T13:45:16+02:00
Mathieu Faverge

Backport the tag management system from pastix and chameleon (solverstack/chameleon!373) directly into StarPU to enable its use through multiple libraries in the same application.

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/89
StarpuPY: fix master slave mode
2023-03-15T09:49:08+01:00
Nathalie Furmento

starpupy: only keep 1 version of the execute.sh script and make it possible to pass parameters after the program filename

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/88
Fix hipblas configure to allow compiling when hipblas isn't found
2023-03-17T12:36:37+01:00
Loris

- Disable hipblas examples when hipblas isn't found
- Only check rocblas.h and rocblas.so when compiling for AMD target
- Add missing header include of `cublas.h` when compiling for Nvidia target
- Reactivate `example/mult` test as it was ported to HIP but the compilation for HIP was disabled during some previous rebase to master
- Fix logic of `AC_CHECK_` for hip and hipblas

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/85
relax sanity check on pending scheduling op while attempting data transfer progression
2023-02-20T10:46:26+01:00
AUMAGE Olivier

Address GitHub issue 6:
- `Assertion failed: (0 && "!worker->state_sched_op_pending"), function __starpu_datawizard_progress, file datawizard.c, line 130.`
- In the backtrace below, `state_sched_op_pending` is set at frame #18 / `_starpu_push_task_to_workers()` around the call to the sched policy function.
```
* thread #12, name = 'CPU 0', stop reason = hit program assert
frame #0: 0x000000019742e868 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x0000000197465cec libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x000000019739e2c8 libsystem_c.dylib`abort + 180
frame #3: 0x000000019739d620 libsystem_c.dylib`__assert_rtn + 272
* frame #4: 0x0000000102b79338 libstarpu-1.4.1.dylib`__starpu_datawizard_progress(may_alloc=_STARPU_DATAWIZARD_DO_NOT_ALLOC, push_requests=1) at datawizard.c:130:2
frame #5: 0x0000000102b794a8 libstarpu-1.4.1.dylib`_starpu_datawizard_progress(may_alloc=_STARPU_DATAWIZARD_DO_NOT_ALLOC) at datawizard.c:159:2
frame #6: 0x0000000102b90b70 libstarpu-1.4.1.dylib`_starpu_allocate_interface(handle=0x000000012d814200, replicate=0x000000012d814358, dst_node=0, is_prefetch=STARPU_TASK_PREFETCH, only_fast_alloc=0) at memalloc.c:1604:3
frame #7: 0x0000000102b910d4 libstarpu-1.4.1.dylib`_starpu_allocate_memory_on_node(handle=0x000000012d814200, replicate=0x000000012d814358, is_prefetch=STARPU_TASK_PREFETCH, only_fast_alloc=0) at memalloc.c:1665:21
frame #8: 0x0000000102b70044 libstarpu-1.4.1.dylib`_starpu_create_request_to_fetch_data(handle=0x000000012d814200, dst_replicate=0x000000012d814358, mode=STARPU_W, task=0x00000001687af9d0, is_prefetch=STARPU_TASK_PREFETCH, async=1, callback_func=0x0000000000000000, callback_arg=0x0000000000000000, prio=0, origin="task_prefetch_data_on_node") at coherency.c:660:8
frame #9: 0x0000000102b70bc0 libstarpu-1.4.1.dylib`_starpu_fetch_data_on_node(handle=0x000000012d814200, node=0, dst_replicate=0x000000012d814358, mode=STARPU_W, detached=1, task=0x00000001687af9d0, is_prefetch=STARPU_TASK_PREFETCH, async=1, callback_func=0x0000000000000000, callback_arg=0x0000000000000000, prio=0, origin="task_prefetch_data_on_node") at coherency.c:874:6
frame #10: 0x0000000102b70d7c libstarpu-1.4.1.dylib`task_prefetch_data_on_node(handle=0x000000012d814200, node=0, replicate=0x000000012d814358, mode=STARPU_W, task=0x00000001687af9d0, prio=0) at coherency.c:897:9
frame #11: 0x0000000102b7173c libstarpu-1.4.1.dylib`_starpu_prefetch_task_input_prio(task=0x00000001687af9d0, target_node=-1, worker=0, prio=0, prefetch=STARPU_PREFETCH) at coherency.c:1020:4
frame #12: 0x0000000102b7181c libstarpu-1.4.1.dylib`starpu_prefetch_task_input_prio(task=0x00000001687af9d0, target_node=-1, worker=0, prio=0) at coherency.c:1033:9
frame #13: 0x0000000102b71b08 libstarpu-1.4.1.dylib`starpu_prefetch_task_input_for_prio(task=0x00000001687af9d0, worker=0, prio=0) at coherency.c:1070:9
frame #14: 0x0000000102b71b9c libstarpu-1.4.1.dylib`starpu_prefetch_task_input_for(task=0x00000001687af9d0, worker=0) at coherency.c:1078:9
frame #15: 0x0000000102b48548 libstarpu-1.4.1.dylib`push_task_on_best_worker(task=0x00000001687af9d0, best_workerid=0, predicted=2.0157669999999999, predicted_transfer=0, prio=0, sched_ctx_id=0) at deque_modeling_policy_data_aware.c:373:3
frame #16: 0x0000000102b49a0c libstarpu-1.4.1.dylib`_dmda_push_task(task=0x00000001687af9d0, prio=0, sched_ctx_id=0, da=1, simulate=0, sorted_decision=0) at deque_modeling_policy_data_aware.c:755:10
frame #17: 0x0000000102b49cc8 libstarpu-1.4.1.dylib`dmda_push_task(task=0x00000001687af9d0) at deque_modeling_policy_data_aware.c:789:9
frame #18: 0x0000000102b2d2b8 libstarpu-1.4.1.dylib`_starpu_push_task_to_workers(task=0x00000001687af9d0) at sched_policy.c:778:11
frame #19: 0x0000000102b2ca58 libstarpu-1.4.1.dylib`_starpu_repush_task(j=0x0000000169c63c00) at sched_policy.c:650:8
frame #20: 0x0000000102b2c010 libstarpu-1.4.1.dylib`_starpu_push_task(j=0x0000000169c63c00) at sched_policy.c:544:9
frame #21: 0x0000000102acbb10 libstarpu-1.4.1.dylib`_starpu_enforce_deps_starting_from_task(j=0x0000000169c63c00) at jobs.c:991:8
frame #22: 0x0000000102af7bd0 libstarpu-1.4.1.dylib`_starpu_notify_cg(pred=0x0000000169c61e00, cg=0x000060001739a800) at cg.c:277:6
frame #23: 0x0000000102af8084 libstarpu-1.4.1.dylib`_starpu_notify_cg_list(pred=0x0000000169c61e00, successors=0x0000000169c62020) at cg.c:377:3
frame #24: 0x0000000102b021d0 libstarpu-1.4.1.dylib`_starpu_notify_task_dependencies(j=0x0000000169c61e00) at task_deps.c:66:2
frame #25: 0x0000000102af8510 libstarpu-1.4.1.dylib`_starpu_notify_dependencies(j=0x0000000169c61e00) at dependencies.c:32:2
frame #26: 0x0000000102acaa58 libstarpu-1.4.1.dylib`_starpu_handle_job_termination(j=0x0000000169c61e00) at jobs.c:542:3
frame #27: 0x0000000102c1ff7c libstarpu-1.4.1.dylib`_starpu_cpu_driver_execute_task(cpu_worker=0x0000000102cdc748, task=0x00000001687aeee0, j=0x0000000169c61e00) at driver_cpu.c:576:3
frame #28: 0x0000000102c20138 libstarpu-1.4.1.dylib`_starpu_cpu_driver_run_once(cpu_worker=0x0000000102cdc748) at driver_cpu.c:614:9
frame #29: 0x0000000102c20890 libstarpu-1.4.1.dylib`_starpu_cpu_worker(arg=0x0000000102cdc748) at driver_cpu.c:732:3
frame #30: 0x000000019746606c libsystem_pthread.dylib`_pthread_start + 148
```

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/84
define un/pack_meta ops for data interfaces
2023-02-03T15:19:44+01:00
Nathalie Furmento

- this allows master slave mode to use interfaces with dynamic contents

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/83
Resolve "Troubles into _starpu_mpi_wrapup_data()"
2023-02-06T17:43:04+01:00
Antoine Jego

Closes #30

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/81
Starpurm example
2023-01-30T15:48:52+01:00
AUMAGE Olivier

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/79
Fix deadlock in sendrecv_gemm_bench
2023-01-21T16:59:46+01:00
Philippe SWARTVAGHER

Probably introduced by d5a18aee3ec5fcc21db416195026fa9020d81253.
Reverts 20ddd0954, c70e0cc7c, 23b5757e0 and f511e3c41.
We cannot do `starpu_pause()` in the main thread and submit tasks in the thread that performs the network ping-pongs!

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/78
fix redux wrapup when not enough contributors
2023-01-30T16:31:48+01:00
Antoine Jego

This MR fixes wrapup of distributed-memory reduction patterns when not enough contributors are involved.
When a redux pattern only involves one node (typically, the owner of the result), that node exits the submission of the reduction pattern without removing the related entry. The entry should be evicted in this case.

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/77
Sendrecv benchmark with GPUDirect option
2023-01-18T13:56:53+01:00
Matthieu Kuhn

Add an option to allocate buffers on the CUDA device for the sendrecv benchmark.

https://gitlab.inria.fr/starpu/starpu/-/merge_requests/76
Asynchronous HIP driver
2023-02-10T12:46:43+01:00
Matthieu Kuhn

Similarly to the CUDA driver, enable the HIP driver to run asynchronously.