mpi all_reduce

Antoine Jego requested to merge all_redux into replicated_tasks

This MR targets replicated_tasks since it can be tied to the use of alternative_source.

StarPU lacks an all-reduce. While this operation could be done with a reduce followed by a broadcast, it might be interesting to have a dedicated operation that takes fewer steps.

This MR proposes something simple, akin to the butterfly pattern in an FFT. It handles a number of contributions that is not a power of 2 by adding an extra step. The literature on MPI collectives from the 2000s describes many all-reduce patterns, some of which halve the results at each step (this makes sense with matrices, and could be achieved with partitioning in StarPU). A trade-off exists between latency and bandwidth.
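As a rough illustration of that trade-off, the usual α-β cost model can be written down for the two families of patterns. The symbols are assumptions introduced here, not notation from this MR: α is the per-message latency, β the per-byte transfer time, n the size of one contribution, and p a power-of-2 number of contributions. Computation cost is ignored.

```latex
% Butterfly (recursive doubling): log2(p) steps, full-size message at each step
T_{\mathrm{butterfly}} \approx \log_2 p \,(\alpha + n\beta)

% Halving-based pattern (reduce-scatter followed by all-gather):
% twice as many steps, but each step moves only part of the data
T_{\mathrm{halving}} \approx 2\log_2 p \,\alpha + 2\,\frac{p-1}{p}\, n\beta
```

Under this model the butterfly wins when the latency term dominates (small contributions), and the halving-based pattern wins when the bandwidth term dominates (large contributions).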

The present implementation should be latency-optimal, i.e. it takes fewer communication steps than a reduce followed by a broadcast.
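The pattern can be sketched as a small simulation that runs without MPI or StarPU. This is not the code of this MR: `butterfly_allreduce` is a hypothetical name, ranks are simulated as list indices, and the handling of a non-power-of-2 count (folding the surplus ranks in before the butterfly and sending the result back afterwards) is one common approach assumed here, which may differ from the extra step used in the MR.

```python
import operator


def butterfly_allreduce(values, op=operator.add):
    """Simulate an all-reduce over len(values) ranks.

    Returns (results, steps): the value held by each rank at the end,
    and the number of communication steps that were needed.
    """
    n = len(values)
    if n == 0:
        return [], 0
    buf = list(values)
    steps = 0

    # Largest power of two that does not exceed n.
    p = 1 << (n.bit_length() - 1)
    extra = n - p

    # Pre-step (assumption): surplus ranks p..n-1 fold their value
    # into ranks 0..extra-1 so the butterfly runs on p ranks.
    if extra:
        for r in range(extra):
            buf[r] = op(buf[r], buf[p + r])
        steps += 1

    # Butterfly: at distance d, rank r exchanges with rank r XOR d.
    d = 1
    while d < p:
        nxt = list(buf)
        for r in range(p):
            partner = r ^ d
            lo, hi = min(r, partner), max(r, partner)
            # Fixed operand order so both partners compute the same value.
            nxt[r] = op(buf[lo], buf[hi])
        buf = nxt
        steps += 1
        d <<= 1

    # Post-step (assumption): surplus ranks receive the final result.
    if extra:
        for r in range(extra):
            buf[p + r] = buf[r]
        steps += 1

    return buf, steps


results, steps = butterfly_allreduce([1, 2, 3, 4, 5, 6])
print(results, steps)  # [21, 21, 21, 21, 21, 21] 4
```

With 8 contributions the simulation needs 3 steps, and with 6 it needs 4. For comparison, a binomial-tree reduce followed by a binomial-tree broadcast would take about 2 * ceil(log2(n)) steps, i.e. 6 in both cases.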

  • docs
  • example
    • "simple" all-reduce, providing a benchmark
    • all-reduce + alternative_source ?
  • Fortran interface
Edited by Antoine Jego
