Fix REDUX usage

While discussing with Mathieu this morning, we realized that there was apparently some confusion about REDUX. Chameleon is currently using STARPU_MPI_REDUX, but I guess that was not what was intended.

STARPU_MPI_REDUX has the same semantics as STARPU_REDUX in terms of code. Notably, starpu_mpi_redux_data still needs to be called to collect the results properly: automatically gathering the results when a task uses the data in non-redux mode is not implemented yet (it shouldn't be very complex to do in starpu_mpi_task_insert, it's just that nobody has taken up the task).

The difference is that STARPU_MPI_REDUX uses one buffer per MPI rank, and thus only allows parallelism between ranks, while STARPU_REDUX uses one buffer per worker, and thus allows complete parallelism (at the expense of memory use).
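To put rough numbers on that trade-off, here is a small toy model in plain C. It is not StarPU code: the enum and both function names are made up for illustration, and it assumes every rank has the same number of workers and that every buffer is actually allocated, so treat the result as an upper bound rather than what StarPU really does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are part of StarPU. */
enum redux_mode {
    MODEL_MPI_REDUX, /* one accumulation buffer per MPI rank */
    MODEL_REDUX      /* one accumulation buffer per worker   */
};

/* Number of private accumulation buffers across the whole run.
 * In this model it is also the number of contributions that can
 * be accumulated at the same time. */
size_t model_buffer_count(enum redux_mode mode, size_t ranks,
                          size_t workers_per_rank)
{
    assert(ranks > 0 && workers_per_rank > 0);
    return mode == MODEL_MPI_REDUX ? ranks : ranks * workers_per_rank;
}

/* Memory spent on those buffers when the reduced data is `bytes` bytes. */
size_t model_buffer_bytes(enum redux_mode mode, size_t ranks,
                          size_t workers_per_rank, size_t bytes)
{
    return model_buffer_count(mode, ranks, workers_per_rank) * bytes;
}
```

For instance, with 4 ranks of 8 workers each, the model gives 4 buffers in the per-rank case and 32 in the per-worker case, so 8 times more concurrent contributions for 8 times more buffer memory.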

So all in all, I believe that as of now Chameleon should still be using STARPU_REDUX, and continue to #undef STARPU_REDUX when in MPI mode, until somebody implements the automatic reduction.
