Introduce slurm-mpirun scheduler
This MR was motivated by a recent issue encountered by T. Terraz on the MUSE cluster at Meso@LR. This Slurm-based cluster does not support heterogeneous job submission with `srun`. To work around this, the idea is to use `mpirun` and its MPMD syntax while still submitting jobs indirectly (i.e. through `sbatch` scripts).
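For reference, Open MPI's MPMD syntax lets a single `mpirun` start several applications (app contexts) separated by `:`. The helper below is a minimal sketch that assembles such a command line; the function name and the `./client` arguments are hypothetical and not taken from the scheduler code.

```python
import shlex
from typing import List, Sequence, Tuple


def build_mpmd_command(
    apps: Sequence[Tuple[int, Sequence[str]]],
    global_options: Sequence[str] = (),
) -> List[str]:
    """Return an Open MPI MPMD command line as an argument list.

    Each (ntasks, argv) pair becomes one "-n <ntasks> <argv...>" app context;
    consecutive app contexts are separated by ":".
    """
    if not apps:
        raise ValueError("at least one application is required")
    cmd: List[str] = ["mpirun", *global_options]
    for i, (ntasks, argv) in enumerate(apps):
        if ntasks < 1:
            raise ValueError(f"app context {i} must request at least one task")
        if i > 0:
            cmd.append(":")  # MPMD separator between app contexts
        cmd += ["-n", str(ntasks), *argv]
    return cmd


if __name__ == "__main__":
    # Two hypothetical clients, two MPI tasks each, started by one mpirun.
    cmd = build_mpmd_command([(2, ["./client", "0"]), (2, ["./client", "1"])])
    print(shlex.join(cmd))  # mpirun -n 2 ./client 0 : -n 2 ./client 1
```

Returning an argument list rather than a string keeps quoting explicit; `shlex.join` is only used when the line has to be written into a script.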
The adopted solution is to implement a new scheduler, `slurm-openmpi`, which inherits from the `slurm` one and submits jobs by combining the `openmpi` and `slurm` schedulers.
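A minimal sketch of the inheritance idea, assuming a simplified base class: the subclass keeps the parent's `sbatch` submission and only swaps the launcher used inside the batch script. The class and method names below are hypothetical and do not reflect the actual scheduler API.

```python
from typing import List


class SlurmScheduler:
    """Hypothetical, simplified stand-in for the existing slurm scheduler."""

    def submit_command(self, script_path: str) -> List[str]:
        # Jobs are always submitted indirectly, through an sbatch script.
        return ["sbatch", script_path]

    def launcher(self) -> List[str]:
        # Inside the script, MPI steps are started with srun.
        return ["srun"]


class SlurmOpenMPIScheduler(SlurmScheduler):
    """Inherits sbatch submission; launches with mpirun instead of srun."""

    def launcher(self) -> List[str]:
        return ["mpirun"]
```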
To keep things as flexible as possible without making any assumption about resource usage, the configuration must comply with the following:

- `#SBATCH`-level options correspond to those passed through `scheduler_arg_client`,
- `mpirun`-level options correspond to those passed through `scheduler_command_options`.
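As an illustration of this split, the sketch below writes a batch script in which each `scheduler_arg_client` entry becomes an `#SBATCH` directive and the `scheduler_command_options` entries are passed to `mpirun` ahead of the MPMD segments. It assumes both settings are lists of option strings and that the MPMD segments are already formatted; the function name is hypothetical, and the actual configuration format and script layout of `slurm-openmpi` may differ.

```python
from typing import Sequence


def make_batch_script(
    scheduler_arg_client: Sequence[str],
    scheduler_command_options: Sequence[str],
    mpmd_segments: Sequence[str],
) -> str:
    """Render a hypothetical sbatch script for one MPMD mpirun launch."""
    if not mpmd_segments:
        raise ValueError("at least one MPMD segment is required")
    lines = ["#!/bin/bash"]
    # Each scheduler_arg_client entry becomes one #SBATCH directive, verbatim.
    lines += [f"#SBATCH {opt}" for opt in scheduler_arg_client]
    # scheduler_command_options are global mpirun options, placed before
    # the ":"-separated MPMD segments.
    launch = ["mpirun", *scheduler_command_options, " : ".join(mpmd_segments)]
    lines.append(" ".join(launch))
    return "\n".join(lines) + "\n"
```

For example, `make_batch_script(["--ntasks=4", "--time=00:30:00"], ["--bind-to", "core"], ["-n 2 ./client 0", "-n 2 ./client 1"])` yields a script whose last line is `mpirun --bind-to core -n 2 ./client 0 : -n 2 ./client 1`. Nothing is derived from one option set to the other, which mirrors the "no assumption" choice above.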
WARNING: full flexibility means that the total number of tasks won't be derived automatically. Making sure that `group_size`, `#SBATCH --ntasks` and `mpirun -n` are consistent is the user's responsibility.
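Since this check is left to the user, a small helper like the one below can catch mismatches before submission. It is a hypothetical sketch that assumes each member of a group is started as one MPMD segment, so `group_size` should equal the number of segments and `#SBATCH --ntasks` should equal the sum of their `-n` values; adjust it if your jobs map tasks differently.

```python
from typing import Sequence


def check_task_counts(
    group_size: int, sbatch_ntasks: int, mpirun_n: Sequence[int]
) -> None:
    """Raise ValueError if the three task counts disagree.

    Assumption (hypothetical): one MPMD segment per group member, so
    len(mpirun_n) == group_size and sum(mpirun_n) == sbatch_ntasks.
    """
    if len(mpirun_n) != group_size:
        raise ValueError(
            f"expected {group_size} mpirun segments, got {len(mpirun_n)}"
        )
    total = sum(mpirun_n)
    if total != sbatch_ntasks:
        raise ValueError(
            f"#SBATCH --ntasks={sbatch_ntasks} but mpirun segments "
            f"request {total} tasks"
        )
```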
NOTE: using `slurm-openmpi` fixes the virtual cluster unit-group limitation!