This:
- Changes the traces of data requests
- Adds priorities to data requests from MPI:
  - By adding a priority to starpu_data_acquire_on_node_cb_sequential_consistency_sync_jobids
  - By adding the new function starpu_mpi_irecv_detached_prio
- Adds an option (STARPU_MPI_EARLYDATA_ALLOCATE) to the MPI driver to perform early data request allocations and avoid blocking for too long.
- Adds an option (STARPU_CUDA_ONLY_FAST_ALLOC_OTHER_MEMNODES) so that CUDA workers do not perform slow allocations on other memnodes (pinned RAM allocations).
  - This keeps the CUDA workers from being slowed down at the beginning of the execution.
- Removes datawizard_progress from fetch_data_on_node as it can fail.
- Adds priorities to data requests from _starpu_fetch_task_input