Add the slurm virtual cluster to the CI
This MR adds steps for testing the slurm schedulers in the pipeline, including slurm, slurm-semiglobal and slurm-openmpi.
This is done by adding a new LXD runner that sits on Maiko. Creating the cluster and activating the runner is semi-automated, but a user still needs to register the runner manually with gitlab.inria.fr. All instructions for this process and the source files are added in a new repository located here.
- set up new virtual cluster with instructions
- add new LXD runner accessible by GitLab CI
- create new CI stages employing LXD for slurm, slurm-semiglobal and slurm-openmpi
- create a virtual environment that can be reused to speed up CI stages
- create instructions for rebuilding the cluster and adding the runner
- ensure fault tolerance is tested in slurm
- add an update script to the virtual-cluster-ci repository to make updating dependencies easier than rebuilding the entire cluster
- test new iterative stats checkpointing