Using Terraform to specify a pool of virtual machines hosted on ci.inria.fr
Why use Terraform to specify the pool of virtual machines used by continuous integration?
Terraform is open-source software developed by HashiCorp to declaratively specify the state of a cluster of virtual machines.
Virtual machines are specified by a collection of text-based .tf
files, which can be stored and versioned in the project repository
itself.
The arguments in favor of such an approach are similar to the
arguments in favor of having a versioned specification for CI/CD
pipelines (.gitlab-ci.yml
, Jenkinsfile
, etc.):
-
having both cluster configuration and pipeline specification versioned in the project repository allows developers to keep track of the history of the continuous integration environment: if something breaks, it is easier to see what has changed, in order to fix it or to roll back to a working version.
-
the specification is a reliable part of the documentation of the continuous integration environment, and the history makes it easier to follow its evolution, in particular when the maintenance burden is shared among multiple developers.
-
the specification makes the environment easily reproducible for other projects or other platforms, and common parts can be shared and factored out. In particular, if the environment is lost (for instance, if virtual machines are accidentally destroyed), it can be automatically restored.
Moreover, the Terraform language provides us with a convenient API to interact with CloudStack (the platform behind ci.inria.fr), for instance to create and destroy virtual machines on demand. This particular point is illustrated in the project terraform-dynamic.
The Terraform configuration language is documented on the HashiCorp website.
Prerequisites
In addition to CI/CD features and shared runners (see the Prerequisites section in the intro project), projects using Terraform need:
-
A project on ci.inria.fr: see the tutorial to create a new project. (Note: Terraform never touches virtual machines that it did not create itself, and these virtual machines do not prevent it from creating its own virtual machines, unless there are naming conflicts. Therefore, you don't need to start from an empty project.)
-
A dedicated user on ci.inria.fr, which has the Slave Admin role on this project. We recommend creating this user specifically for this project (this requires a specific email address, or a specific suffix for mail accounts that support it).
-
The CloudStack API and secret keys should be added as variables
CLOUDSTACK_API_KEY
and CLOUDSTACK_SECRET_KEY
of type Variable in CI/CD settings. See the intro project for details on how to set a CI/CD variable. To get the CloudStack API and secret keys:
  - Go to https://sesi-cloud-ctl1.inria.fr/
  - Log in as the dedicated user, with
  ci/project-name
  as domain.
  - On the left sidebar, go to Accounts.
  - Select admins, then View Users.
  - Select the dedicated user.
  - Copy the fields API Key and Secret key.
-
The GitLab runner registration token should be added as a variable
TF_VAR_REGISTRATION_TOKEN
of type Variable in CI/CD settings. This token allows virtual machines deployed by Terraform to register themselves as GitLab runners to execute jobs in the project pipeline. The TF_VAR_
prefix makes Terraform bind the value to the var.REGISTRATION_TOKEN
variable inside Terraform configuration files (Terraform documentation). To get the registration token:
  - On the left sidebar in the GitLab interface, go to Settings → CI/CD.
  - Expand Runners.
  - In the Specific runners section, you will find it after the label And this registration token:. There is a button Copy token just after the token to copy it to the clipboard.
(Note: in this example, the same project is used for hosting
the Terraform configuration files and the build steps that are
executed on the virtual machines maintained by Terraform, but
this is not necessarily the case. A distinct project can be
dedicated to maintaining the infrastructure with Terraform, with
a variable
REGISTRATION_TOKEN
that points to another project dedicated to the build itself.)
-
This project needs a pair of passphrase-less SSH private/public keys so that the GitLab shared runner can connect to the deployed runners and unregister them from GitLab before deletion. You can use the following command to create such a key pair in the current directory (files
id_rsa
and id_rsa.pub
): ssh-keygen -t rsa -b 4096 -f id_rsa -N ""
  - The contents of the private key file
  id_rsa
  should be added as a variable SSH_PRIVATE_KEY
  of type File in CI/CD settings. See the intro project for details on how to set a CI/CD variable.
  - The public key file
  id_rsa.pub
  should be registered on the ci.inria.fr portal to allow the dedicated user to connect to the hosted virtual machines (see the portal documentation for details on how to register a public key on the portal). The contents of the public key file should also be added as a variable TF_VAR_SSH_PUBLIC_KEY of type Variable in CI/CD settings: the Terraform configuration file main.tf
  substitutes the public key in the cloud-init template cloud-init.yaml.tftpl
  , to register the key in the file ~ci/.ssh/authorized_keys
  in deployed virtual machines.
-
The repository contains a
backend.tf
file for connecting Terraform with GitLab. It may be convenient to delete it locally in order to run terraform
directly on the development machine and experiment before committing changes to GitLab: in this setting, we prefer to use the default (local) backend. However, we do not want git
to keep track of this deletion: after removing backend.tf
, we can run git update-index --assume-unchanged backend.tf
locally so that git
ignores this change.
-
The pipeline defined in
.gitlab-ci.yml
contains a non-blocking step fmt
to check that *.tf
files are formatted according to the Terraform guidelines. Automatic reformatting can be performed locally with the command terraform fmt
. The file .pre-commit-config.yaml
defines a pre-commit hook to validate configuration files and perform automatic reformatting before each commit: you may enable it by installing pre-commit and initializing your repository with the command pre-commit install
.
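As a point of reference for the backend.tf item above, a backend file that delegates state storage to GitLab typically declares an empty http backend. The sketch below is an assumption about what such a file looks like, not a copy of the file in this repository.

```hcl
# Sketch of a backend.tf for GitLab-managed Terraform state
# (assumed contents; check the actual file in the repository).
terraform {
  backend "http" {
    # Intentionally empty: the state address and credentials are supplied
    # at init time, for instance through -backend-config options or the
    # TF_HTTP_* environment variables.
  }
}
```

When this file is removed locally, terraform init falls back to the default local backend and keeps the state in a terraform.tfstate file in the working directory.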
The Terraform configuration file main.tf
The configuration file main.tf
contains some
sections to set up CloudStack as a resource provider, and then the
specification of the resources themselves.
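The provider setup mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual file: the api_url value is a placeholder, and the sketch assumes that the provider reads the keys from the CLOUDSTACK_API_KEY and CLOUDSTACK_SECRET_KEY environment variables defined in the prerequisites.

```hcl
terraform {
  required_providers {
    # Declare the CloudStack provider so that `terraform init` downloads it.
    cloudstack = {
      source = "cloudstack/cloudstack"
    }
  }
}

provider "cloudstack" {
  # Placeholder endpoint (assumption): replace it with the API URL
  # of the CloudStack instance you are targeting.
  api_url = "https://cloudstack.example.org/client/api"

  # api_key and secret_key are deliberately not set here, so that the
  # secrets stay out of the repository: they are expected to come from
  # the CLOUDSTACK_API_KEY and CLOUDSTACK_SECRET_KEY environment variables.
}
```

Keeping the keys out of the configuration means the same files can be used unchanged in the pipeline and on a development machine, as long as both environment variables are exported.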
As explained in the previous section,
the secret REGISTRATION_TOKEN
is passed through a variable.
variable "REGISTRATION_TOKEN" {
  type      = string
  sensitive = true
}
The variable is marked as sensitive to prevent Terraform from showing its value in output (Terraform documentation).
The SSH public key is passed through the variable SSH_PUBLIC_KEY
.
variable "SSH_PUBLIC_KEY" {
  type = string
}
The value of the SSH_PUBLIC_KEY
variable will be stored in the file
~ci/.ssh/authorized_keys
in virtual machines, so that Terraform can
connect to the virtual machines with the private key to unregister the
runners before destroying the machines.
In this example, we set up three resources:
a virtual machine running on Ubuntu 20.04,
a virtual machine running on Windows 10, and
a virtual machine running on Mac OS X 15.
The three virtual machines register themselves as runners on gitlab.inria.fr:
the Ubuntu machine provides a docker
executor,
Windows and Mac OS X provide shell
executors
(powershell
for Windows, bash
for Mac OS X).
Ubuntu 20.04 virtual machine
resource "cloudstack_instance" "ubuntu" {
  ## It is a good practice to have the "{project name}-" prefix
  ## in VM names.
  name             = "gitlabcigallery-terraform-ubuntu"
  service_offering = "Custom"
  template         = "ubuntu-20.04-lts"
  zone             = "zone-ci"
  details = {
    cpuNumber = 2
    memory    = 2048
  }
  expunge = true
  user_data = templatefile("cloud-init.yaml.tftpl", {
    REGISTRATION_TOKEN = var.REGISTRATION_TOKEN
    SSH_PUBLIC_KEY     = var.SSH_PUBLIC_KEY
  })
  connection {
    type                = "ssh"
    host                = self.name
    user                = "ci"
    private_key         = file("id_rsa")
    bastion_host        = "ci-ssh.inria.fr"
    bastion_user        = "gter001"
    bastion_private_key = file("id_rsa")
  }
  provisioner "remote-exec" {
    when   = destroy
    inline = ["sudo gitlab-runner unregister --all-runners || true"]
  }
}
-
ubuntu
is an identifier for the resource, which can be used to refer to it elsewhere in the Terraform configuration;
-
gitlabcigallery-terraform-ubuntu
is the name of the virtual machine: by convention, the prefix gitlabcigallery-terraform
is the name of the project on ci.inria.fr.
- The service offering
Custom
allows us to specify the characteristics of the virtual machine in the details
section:
  -
  cpuNumber
  should be between 1 and 16 (cores),
  -
  memory
  should be between 1024 and 24576 (MB).
-
template
can refer to a template by name or ID. The available templates can be listed on the ci.inria.fr portal in the virtual machine creation form (portal documentation). We rely here on the fact that cloud-init
is installed in the template and takes the CloudStack user data into account (CloudStack documentation for cloud-init support). We could also use a remote-exec
provisioner to connect to the virtual machine via SSH and execute an initialization script on first boot: we will use this method with Windows and Mac OS X virtual machines, since cloud-init
is not available on these systems.
- There is only one zone,
zone-ci
, and expunge
should be set to true
to ask CloudStack to destroy the virtual machine immediately when Terraform needs to replace it (by default, virtual machines are kept for 24 hours after deletion, which prevents Terraform from recreating a machine with the same name).
-
user_data
contains the configuration passed to cloud-init
, which is applied at the first boot of the virtual machine. The templatefile
function reads the cloud-init configuration from the file cloud-init.yaml.tftpl
and substitutes ${REGISTRATION_TOKEN}
with the value of the variable passed to Terraform.
- We also pass
SSH_PUBLIC_KEY
to the template file to have its value written to the ~ci/.ssh/authorized_keys
file. We configure the connection via SSH to the runner: gter001
is the login of the dedicated user on ci.inria.fr, and the pipeline specification described below makes sure that the private key is written to the file id_rsa
. We cannot use a variable to pass the path to this file, since the connection is used by a destroy provisioner, which cannot refer to variables. This destroy provisioner executes gitlab-runner unregister
before the destruction of the virtual machine; failures are ignored in case the gitlab-runner
command was not yet installed when the destruction occurs.
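To illustrate how the resource identifier is used elsewhere in the configuration, the following sketch declares two outputs that refer to the Ubuntu machine defined above. The output names are hypothetical and are not part of the project.

```hcl
# Hypothetical outputs (not part of the project): they show that the
# resource is addressed as <resource type>.<identifier>.<attribute>.
output "ubuntu_vm_name" {
  description = "Name of the Ubuntu virtual machine, as set in the resource"
  value       = cloudstack_instance.ubuntu.name
}

output "ubuntu_vm_id" {
  description = "Identifier assigned by CloudStack once the machine exists"
  value       = cloudstack_instance.ubuntu.id
}
```

The values are printed at the end of terraform apply and can be displayed again later with terraform output.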
Windows 10 virtual machine
resource "cloudstack_instance" "windows" {
  ## It is a good practice to have the "{project name}-" prefix
  ## in VM names.
  name             = "gitlabcigallery-terraform-windows"
  service_offering = "Custom"
  template         = "windows10-vs2022-runner"
  zone             = "zone-ci"
  details = {
    cpuNumber = 2
    memory    = 2048
  }
  expunge = true
  connection {
    type                = "ssh"
    host                = self.name
    user                = "ci"
    password            = "ci"
    bastion_host        = "ci-ssh.inria.fr"
    bastion_user        = "gter001"
    bastion_private_key = file("id_rsa")
    target_platform     = "windows"
  }
  provisioner "remote-exec" {
    inline = [<<-EOF
      gitlab-runner start
      gitlab-runner register --non-interactive --tag-list terraform,windows --executor shell --shell powershell --url https://gitlab.inria.fr --registration-token ${var.REGISTRATION_TOKEN}
    EOF
    ]
  }
  provisioner "remote-exec" {
    when   = destroy
    inline = ["gitlab-runner unregister --all-runners || true"]
  }
}
-
It is essential to specify
target_platform = "windows"
for the SSH connection to work.
-
The gitlab-runner
service is not started on boot by default in the template: we start the service explicitly before registering the runner.
Mac OS X 15 virtual machine
resource "cloudstack_instance" "macos" {
  ## It is a good practice to have the "{project name}-" prefix
  ## in VM names.
  name             = "gitlabcigallery-terraform-macos"
  service_offering = "Custom"
  template         = "osx-15-runner"
  zone             = "zone-ci"
  details = {
    cpuNumber = 2
    memory    = 2048
  }
  expunge = true
  connection {
    type                = "ssh"
    host                = self.name
    user                = "ci"
    password            = "ci"
    bastion_host        = "ci-ssh.inria.fr"
    bastion_user        = "gter001"
    bastion_private_key = file("id_rsa")
  }
  provisioner "remote-exec" {
    inline = [<<-EOF
      set -ex
      (
      export PATH=/usr/local/bin:$PATH
      gitlab-runner register --non-interactive --tag-list terraform,macos --executor shell --url https://gitlab.inria.fr --registration-token ${var.REGISTRATION_TOKEN}
      ) >~/log.txt 2>&1
    EOF
    ]
  }
  provisioner "remote-exec" {
    when = destroy
    inline = [<<-EOF
      export PATH=/usr/local/bin:$PATH
      gitlab-runner unregister --all-runners || true
    EOF
    ]
  }
}
-
In the
remote-exec
provisioner, outputs are redirected to ~/log.txt
to ease debugging, since they are not shown in the GitLab log.
-
The executable
gitlab-runner
is in /usr/local/bin
, which is added to PATH
by .bashrc
(or .zshrc
), which is not sourced when commands are executed non-interactively via SSH. Therefore, we add /usr/local/bin
explicitly to PATH
before executing gitlab-runner
(we could have sourced the file with . ~/.bashrc
instead).
The cloud-init configuration file cloud-init.yaml.tftpl
The cloud-init configuration file
cloud-init.yaml.tftpl
sets up the following:
- the user
ci
can execute sudo
without password, so that the destroy provisioner is able to unregister the runners;
- the SSH public key is registered as an authorized key for
ci
;
- by default, password authentication is disabled for
ci
(you may add lock_passwd: false
to enable it again, see the documentation);
-
gitlab-runner
and docker.io
are installed on the virtual machine, and the runner is registered on gitlab.inria.fr.
The configuration file should begin with the following line.
#cloud-config
You may provide a shell script with the appropriate shebang (#!/bin/sh
) instead.
The pipeline specification file .gitlab-ci.yml
The pipeline specification file .gitlab-ci.yml
relies on the template
Terraform/Base.gitlab-ci.yml
,
provided by gitlab.com.
The usage of this template is described in the GitLab documentation.
This template uses a Docker image
$CI_TEMPLATE_REGISTRY_HOST/gitlab-org/terraform-images/releases/1.1:v0.43.0
, where
CI_TEMPLATE_REGISTRY_HOST
is by default set to registry.gitlab.com
:
to use it on shared runners, we have mirrored this Docker image locally to circumvent
quotas.
The image is available on registry.gitlab.inria.fr
(which we use for CI_TEMPLATE_REGISTRY_HOST
),
in the project gitlab-org/terraform-images
.
There are five stages:
stages:
  - validate
  - build
  - deploy
  - execute
  - destroy
-
In the stage
validate
, the step validate
checks that there is no error in *.tf
files and the step fmt
checks that they are properly formatted according to the guidelines (this fmt
step is non-blocking: the step can fail without stopping the pipeline).
-
In the stage
build
, the homonymous step plans the modifications to apply to CloudStack to conform to the configuration files. The plan is stored as an artifact.
-
In the stage
deploy
, the homonymous step applies the plan. This step is configured to be triggered manually: it can be triggered by clicking on Play
in the project Pipelines page (on the left sidebar in the GitLab interface, go to CI/CD → Pipelines).
-
In the stage
execute
, the homonymous step executes a command using the docker-based GitLab runner hosted in the deployed virtual machine. (Note: we chose in this example to perform the execute
stage in the same project, but we could have chosen to register the runner to another project by adjusting the REGISTRATION_TOKEN
variable.)
-
In the stage
destroy
, the homonymous step is to be run manually and executes gitlab-terraform destroy
, which destroys the virtual machines (and these virtual machines have remote-exec
destroy provisioners that unregister them from gitlab.inria.fr).
Every job that uses the Terraform configuration file needs to copy
the file referred to by SSH_PRIVATE_KEY
into the file id_rsa
.
To copy the file without overriding the whole script,
we use the before_script
key:
defining the before_script
key at top-level, outside any job,
causes the file to be copied before every job.
before_script:
  - cp "$SSH_PRIVATE_KEY" id_rsa
Ignored files in .gitignore
The file .gitignore
instructs git to ignore the local files
generated by the terraform
command, in case this command is used
locally for experimentation.