Jobs on a Supercomputer with Slurm
The Plafrim supercomputer lets users request a dedicated account for running their gitlab-ci jobs. This is explained here: https://plafrim-users.gitlabpages.inria.fr/doc/#gitlab-ci.
Runner installation on the supercomputer
Suppose we have obtained a Plafrim account, "gitlab-gitlabci-gallery", dedicated to this project. First, register the runner to be used on the supercomputer:
ssh gitlab-gitlabci-gallery@plafrim
# gitlab-runner executable is already installed on plafrim
module add tools/gitlab-runner/14.7.0
# register the runner
gitlab-runner register
# register your specific runner with the appropriate information, see https://docs.gitlab.com/runner/register/#linux
# Example:
# instance URL: https://gitlab.inria.fr/,
# registration token: GR13489413XJvSphSc7fb2N2pgt4y,
# description: devel01.plafrim.cluster,
# tags: plafrim,
# executor: shell
Enter the URL and the token found in the GitLab web interface (Settings -> CI/CD -> Runners -> Specific runners -> Set up a specific runner manually). Set tags such as the project name, guix, plafrim, shell, etc. Choose shell as the executor.
To increase the number of jobs that can run concurrently, edit the file
~/.gitlab-runner/config.toml
and change the value of concurrent, e.g.
concurrent = 10
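The same edit can be scripted. This is a minimal sketch, assuming the config file already contains a top-level line starting with "concurrent =" and that sed accepts the -i.bak form; the helper name set_concurrent is illustrative and not part of gitlab-runner:

```shell
#!/bin/sh
# Hypothetical helper: set the top-level "concurrent" value in a runner config file.
# Assumes the file already has a line starting with "concurrent".
set_concurrent() {
    config_file=$1
    value=$2
    # Rewrite the matching line in place, keeping a .bak backup of the original file.
    sed -i.bak "s/^concurrent *=.*/concurrent = ${value}/" "$config_file"
}

# Usage: set_concurrent ~/.gitlab-runner/config.toml 10
```

A running gitlab-runner process may need to be restarted to pick up the new value.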
Then launch gitlab-runner in user mode so that your runner waits for new jobs triggered by GitLab:
ssh gitlab-gitlabci-gallery@plafrim
tmux
module add tools/gitlab-runner/14.7.0 tools/git/2.36.0 tools/gitlab-ci
gitlab-runner run &
# or use the available script on plafrim: gitlab-runner-keep-alive
# detach from the tmux shell: ctrl+b, d
# you can re-attach to it with: tmux attach
The runner should then appear in your GitLab project under Settings -> CI/CD -> Runners -> Available specific runners.
Source code
The gitlab-ci jobs are defined in .gitlab-ci.yml; see the results on the CI/CD -> Pipelines page (remember to enable the CI/CD feature in Settings -> General -> Visibility, project features, permissions).
Two jobs are defined with a parallel matrix (see this example):
- one using salloc,
- and another one using sbatch.
sbatch job submission is asynchronous, in the sense that it returns immediately without waiting for the job completion, see this discussion.
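Because sbatch returns as soon as the job is queued, a CI job that calls it has to wait for completion itself, otherwise the pipeline would end before the benchmark has run. One way to do this is the --wait option of sbatch. The following is a minimal sketch and not necessarily what this project's .gitlab-ci.yml does: the batch script name pingpong.sl, the --nodes=2 setting, and the helper submit_cmd are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the Slurm command line for a given submission mode.
# "pingpong.sl" and "--nodes=2" are assumed values, not taken from the project.
submit_cmd() {
    mode=$1
    case "$mode" in
        # salloc blocks until the allocation is granted and the command has finished.
        salloc) echo "salloc --nodes=2 mpiexec IMB-MPI1 PingPong" ;;
        # sbatch returns right after queuing; --wait makes it block until the job ends.
        sbatch) echo "sbatch --wait --nodes=2 pingpong.sl" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Usage inside a CI job script (requires Slurm): eval "$(submit_cmd sbatch)"
```

With --wait, sbatch exits with the exit code of the batch job, so a failing benchmark also fails the gitlab-ci job.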
Both jobs do the same thing: they run a "pingpong" test from the Intel MPI benchmarks package, see the command mpiexec IMB-MPI1 PingPong.
The pipeline is triggered by the schedule rule, so it is not executed each time a branch is updated but only once a day at a fixed time. The date, time, and repetition can be configured differently, see cron. The pipeline can also be launched manually by clicking on the Play button in the schedule panel.
The Slurm job queue may be busy, so the job can take a while to start. Hence, we use a timeout of 24h for the gitlab-ci job, since it is triggered every 24 hours.
The kind of node to use (here a Slurm parameter, see the --constraint flag) is chosen through a CI/CD variable, arbitrarily named CONS, defined in the schedule panel (default is "bora").
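Inside the job script, the variable can be forwarded to Slurm with a shell default. This is a minimal sketch, assuming the job passes CONS directly to --constraint; the helper name constraint_flag is illustrative:

```shell
#!/bin/sh
# Hypothetical helper: build the --constraint flag from the CONS CI/CD variable,
# falling back to "bora" when CONS is unset or empty.
constraint_flag() {
    echo "--constraint=${CONS:-bora}"
}

# Usage (requires Slurm): salloc "$(constraint_flag)" mpiexec IMB-MPI1 PingPong
```

Keeping the default in the script means the job still targets a valid node type when the pipeline is started without setting CONS.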
Note that the software environment is provided by GNU Guix, but programs can also be installed manually in the home directory of the "gitlab-gitlabci-gallery" account.