mpi all_reduce (!68) · starpu / starpu

Antoine Jego requested to merge all_redux into replicated_tasks Oct 20, 2022

This MR targets replicated_tasks because it can be combined with alternative_source.

StarPU lacks an all-reduce. While this operation can be done with a reduce followed by a broadcast, a dedicated operation that needs fewer steps may be worthwhile.

This MR proposes something simple, akin to the butterfly pattern in an FFT: at each step, every participant exchanges its partial result with a partner and combines the two. It handles a number of contributions that is not a power of 2 by adding an extra step. The literature on MPI collectives from the 2000s describes many all-reduce patterns, some of which split the data in half at each step (recursive halving); this makes sense for matrices and could be achieved with data partitioning in StarPU. These patterns trade latency against bandwidth.
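The butterfly idea can be illustrated with a single-process simulation. This is a minimal sketch, not the MR's StarPU code, and it uses no StarPU or MPI API: `butterfly_allreduce` is a hypothetical helper, the reduction is assumed to be an integer sum, and `vals[r]` stands in for rank `r`'s buffer. A non-power-of-2 count is handled here with one common scheme (surplus ranks fold their data into a partner before the butterfly and get the result back after it), which may differ from the extra step used in the MR.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: simulated all-reduce (sum) over p "ranks" using a
 * butterfly (recursive doubling) exchange. On entry vals[r] is rank r's
 * contribution; on exit every entry holds the global sum.
 * Returns the number of communication steps, or -1 on allocation failure. */
static int butterfly_allreduce(int *vals, int p)
{
    assert(vals != NULL && p >= 1);

    /* pof2 = largest power of 2 not exceeding p */
    int pof2 = 1;
    while (pof2 <= p / 2)
        pof2 *= 2;

    int steps = 0;

    /* Extra step before the butterfly (only when p is not a power of 2):
     * each surplus rank r >= pof2 sends its contribution to r - pof2. */
    if (p > pof2) {
        for (int r = pof2; r < p; r++)
            vals[r - pof2] += vals[r];
        steps++;
    }

    /* Butterfly among ranks 0..pof2-1: in the round with distance `mask`,
     * rank r exchanges partial results with rank r ^ mask and both keep
     * the combined value. The snapshot models concurrent exchanges. */
    int *snap = malloc(sizeof *snap * (size_t)pof2);
    if (snap == NULL)
        return -1;
    for (int mask = 1; mask < pof2; mask <<= 1) {
        memcpy(snap, vals, sizeof *snap * (size_t)pof2);
        for (int r = 0; r < pof2; r++)
            vals[r] = snap[r] + snap[r ^ mask];
        steps++;
    }
    free(snap);

    /* Extra step after the butterfly: return the result to surplus ranks. */
    if (p > pof2) {
        for (int r = pof2; r < p; r++)
            vals[r] = vals[r - pof2];
        steps++;
    }
    return steps;
}
```

For example, with `int v[6] = {1, 2, 3, 4, 5, 6};`, `butterfly_allreduce(v, 6)` returns 4 and leaves 21 in every entry. Note that every butterfly round moves the whole buffer, which is where the halving variants mentioned above save bandwidth at the cost of more steps.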

The present implementation should be latency-optimal, i.e. it takes fewer communication steps than a reduce followed by a broadcast.
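The latency claim can be made concrete by counting communication rounds under a simple model, assumed here rather than taken from the MR: each rank sends and receives one message per round, reduce and broadcast each run along a binomial tree, and the butterfly handles a non-power-of-2 count with one extra round before and one after. The helpers below (`floor_log2`, `is_pow2`, `ceil_log2`, `reduce_bcast_steps`, `butterfly_steps`) are hypothetical names used only for this sketch.

```c
#include <assert.h>

/* floor(log2(p)) for p >= 1: index of the highest set bit */
static int floor_log2(int p)
{
    int k = 0;
    while (p >> (k + 1))
        k++;
    return k;
}

static int is_pow2(int p) { return (p & (p - 1)) == 0; }

/* ceil(log2(p)) for p >= 1 */
static int ceil_log2(int p) { return floor_log2(p) + !is_pow2(p); }

/* Binomial-tree reduce then binomial-tree broadcast:
 * ceil(log2 p) rounds per phase. */
static int reduce_bcast_steps(int p)
{
    assert(p >= 1);
    return 2 * ceil_log2(p);
}

/* Butterfly: log2 p rounds when p is a power of 2; otherwise the largest
 * power of 2 below p runs the butterfly, plus one fold-in round before
 * and one fold-out round after (assumed scheme). */
static int butterfly_steps(int p)
{
    assert(p >= 1);
    return floor_log2(p) + (is_pow2(p) ? 0 : 2);
}
```

Under this model, 8 ranks need 3 rounds with the butterfly against 6 for reduce + broadcast, and 6 ranks need 4 against 6; for any count of 2 or more the butterfly needs strictly fewer rounds. The model ignores message size, so it says nothing about the bandwidth side of the trade-off.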

Edited Oct 20, 2022 by Antoine Jego
