# Dynamic pool of virtual machines hosted on ci.inria.fr using Terraform

## Why should we use a dynamic pool of virtual machines?
GitLab shared runners allow pipelines to be executed on virtual machines deployed on the fly. However, some jobs have requirements that shared runners do not cover: Windows or macOS, more CPUs, more memory, more disk space, large data sets, etc. The cloud of virtual machines provided by ci.inria.fr covers these needs by providing large resources for customizable virtual machines.

However, these virtual machines consume resources and electric power all day long as long as they are on, even if they are actually used only for a few minutes each time a commit is pushed. A good practice is to prepare virtual machine templates instead of keeping virtual machines running, and to instantiate the virtual machines only when they are used. Doing so consumes less energy, frees resources for other users of the platform, and even allows the project to reasonably deploy more resources, but only when they are needed.
## Prerequisites

This project has the same prerequisites as those listed for the [terraform](https://gitlab.inria.fr/gitlabci_gallery/orchestration/terraform) project.
## The Terraform configuration file `main.tf`

The Terraform configuration file `main.tf` is similar to the configuration file described for the [terraform](https://gitlab.inria.fr/gitlabci_gallery/orchestration/terraform) project. There is one additional variable: `runner_count`, of type `number`.
variable "runner_count" {
type = number
}
The variable `runner_count` has two purposes:

- It allows deploying virtual machines conditionally. Indeed, one can pass `-var runner_count=0` to `terraform plan` in order to destroy the virtual machine(s).
- It allows deploying many virtual machines if needed. For instance, this example deploys 3 copies of the template virtual machine, to run three jobs in parallel.

It is worth noticing that even if you do not need many copies of a virtual machine (either because you need only one virtual machine, or because you need virtual machines with different templates or characteristics), such a `runner_count` variable is still useful to pass either `1` or `0`, depending on whether the virtual machines should be deployed or destroyed, as the sketch below illustrates.
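For illustration, here is how `runner_count` can drive both scaling and teardown by hand (a minimal sketch assuming a plain `terraform` CLI and an already initialized working directory; the pipeline below uses the `gitlab-terraform` wrapper instead):

```sh
# Deploy three copies of the runner virtual machine...
terraform plan -var runner_count=3 -out plan.tfplan
terraform apply plan.tfplan

# ...and later destroy them all by asking for zero copies.
terraform plan -var runner_count=0 -out plan.tfplan
terraform apply plan.tfplan
```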
The virtual machines themselves are specified below.
resource "cloudstack_instance" "runner" {
count = var.runner_count
name = "gitlabcigallery-terraform-runner-${count.index}"
service_offering = "Custom"
template = "ubuntu-20.04-lts"
zone = "zone-ci"
details = {
cpuNumber = 1
memory = 1024
}
expunge = true
user_data = templatefile("cloud-init.sh.tftpl", {
index = count.index
REGISTRATION_TOKEN = var.REGISTRATION_TOKEN
SSH_PUBLIC_KEY = var.SSH_PUBLIC_KEY
})
connection {
type = "ssh"
host = self.name
user = "ci"
private_key = file("id_rsa")
bastion_host = "ci-ssh.inria.fr"
bastion_user = "gter001"
bastion_private_key = file("id_rsa")
}
provisioner "remote-exec" {
when = destroy
inline = ["sudo gitlab-runner unregister --all-runners || true"]
}
}
In comparison to the [terraform](https://gitlab.inria.fr/gitlabci_gallery/orchestration/terraform) project, the additional property `count` specifies the number of virtual machines to be deployed (set by the input variable `runner_count`). We then use the index of the virtual machine, available through `count.index` (which ranges from 0 to `count` - 1), to suffix `name`, so that each virtual machine is named uniquely, and we pass the index to the template file so that each runner can be registered with a distinct tag `runner-${index}` by the script `cloud-init.sh.tftpl`. The destroy-time `remote-exec` provisioner unregisters the runner from GitLab just before its virtual machine is destroyed.
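The script `cloud-init.sh.tftpl` is not reproduced in this section. As a rough, hypothetical sketch of what it has to do (assuming `gitlab-runner` is preinstalled in the template image and a Docker executor is used, neither of which is confirmed here), it must at least authorize the SSH public key used by the destroy-time provisioner and register the runner with its per-instance tag:

```sh
#!/bin/sh
# Hypothetical sketch of cloud-init.sh.tftpl; ${index},
# ${REGISTRATION_TOKEN} and ${SSH_PUBLIC_KEY} are substituted by
# templatefile() before the script runs on the instance.

# Authorize the key used by Terraform's SSH connection (for the
# destroy-time provisioner) for the "ci" user.
mkdir -p /home/ci/.ssh
echo "${SSH_PUBLIC_KEY}" >> /home/ci/.ssh/authorized_keys

# Register this instance against GitLab, tagged with its index so that
# jobs can target one specific copy of the virtual machine.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.inria.fr/" \
  --registration-token "${REGISTRATION_TOKEN}" \
  --executor "docker" \
  --docker-image "alpine" \
  --tag-list "terraform,docker,runner-${index}"
```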
## The pipeline specification file `.gitlab-ci.yml`

In comparison to the [terraform](https://gitlab.inria.fr/gitlabci_gallery/orchestration/terraform#the-pipeline-specification-file-gitlab-ciyml) project, we suppress the `build` stage: the plan and the deployment are performed in the same `deploy` phase, which is no longer manual. Indeed, contrary to the terraform project, the deployment of the infrastructure is now necessarily linked to the subsequent execution of the pipeline on this infrastructure, because the infrastructure is deployed only during this pipeline and destroyed at its end.
There is an additional `cleanup` stage that destroys the runners at the end of the pipeline. The `cleanup` job has the property `when: always`, so that it is executed even when previous jobs fail.
The stages are then as follows.
```yaml
stages:
  - validate
  - deploy
  - execute
  - cleanup
```
Every job that uses the Terraform configuration file needs to copy the file referred to by `SSH_PRIVATE_KEY` into the file `id_rsa`. To copy the file in the `validate` job without overriding the whole script, we use the `before_script` key.
```yaml
validate:
  tags:
    - linux
    - small
  extends: .terraform:validate
  before_script:
    - cp $SSH_PRIVATE_KEY id_rsa
```
The `deploy` phase begins by deploying 0 runners (i.e., it destroys all possibly existing runners): usually, no runners should have been deployed before, so this should normally be a no-op, but it cleans the environment in case the cleanup phase of a previous pipeline failed. Then, new runners are deployed, 3 in this example.
```yaml
deploy:
  stage: deploy
  tags:
    - linux
    - small
  script:
    - cp $SSH_PRIVATE_KEY id_rsa
    - gitlab-terraform plan -var runner_count=0
    - gitlab-terraform apply
    - gitlab-terraform plan -var runner_count=3
    - gitlab-terraform apply
```
The `execute` phase uses a matrix (`parallel:matrix`) to run jobs in parallel on these three runners, by specifying the `runner-$index` tag to distinguish them.
```yaml
execute:
  stage: execute
  image: alpine
  parallel:
    matrix:
      - index: [0, 1, 2]
  tags:
    - terraform
    - docker
    - runner-$index
  script:
    - echo Greetings from runner $index!
```
There is an additional `cleanup` job that is always executed (even if previous jobs failed) and destroys all the runners by setting `runner_count=0`.
```yaml
cleanup:
  stage: cleanup
  tags:
    - linux
    - small
  script:
    - cd "${TF_ROOT}"
    - cp $SSH_PRIVATE_KEY id_rsa
    - gitlab-terraform plan -var runner_count=0
    - gitlab-terraform apply
  when: always
```