Fix REDUX usage
While discussing with Mathieu this morning, we realized that there is apparently some confusion about REDUX. Chameleon is currently using STARPU_MPI_REDUX, but I guess that was not what was intended.
STARPU_MPI_REDUX has the same semantics as STARPU_REDUX in terms of code. Notably, starpu_mpi_redux_data still needs to be called to collect the results properly: gathering the results automatically when a task uses the data in non-redux mode is not implemented yet. (It shouldn't be very complex to do in starpu_mpi_task_insert; it's just that nobody has taken up the task.)
The difference is that STARPU_MPI_REDUX uses one buffer per MPI rank, so contributions run in parallel only between ranks, while STARPU_REDUX uses one buffer per worker, which gives complete parallelism (at the expense of memory use).
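To make the trade-off concrete, here is a toy model in plain C. It is not StarPU code: the `cluster_shape` struct and both function names are made up for illustration, and it only counts how many reduction buffers each mode would need under the description above (one per worker versus one per rank). The number of buffers is also the number of contributions that can proceed concurrently.

```c
#include <stddef.h>

/* Hypothetical description of a run; not a StarPU type. */
typedef struct {
    size_t ranks;            /* number of MPI ranks */
    size_t workers_per_rank; /* StarPU workers on each rank (assumed uniform) */
} cluster_shape;

/* STARPU_REDUX: one buffer per worker, so every worker of every rank
 * can accumulate into its own copy at the same time. */
size_t buffers_with_starpu_redux(cluster_shape c)
{
    return c.ranks * c.workers_per_rank;
}

/* STARPU_MPI_REDUX: one buffer per MPI rank, so only one contribution
 * per rank can be in flight at a time. */
size_t buffers_with_starpu_mpi_redux(cluster_shape c)
{
    return c.ranks;
}
```

For example, with 4 ranks of 8 workers each, this model gives 32 buffers for STARPU_REDUX and 4 for STARPU_MPI_REDUX, so the memory cost and the available parallelism both differ by a factor equal to the number of workers per rank.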
So all in all, I believe that as of now Chameleon should still be using STARPU_REDUX, and should continue to #undef STARPU_REDUX when in MPI mode, until somebody implements the automatic reduction.