Commit 0634b898 authored by SIMONIN Matthieu

Introduce environment control tutorial

parent 0da7556f
%% Cell type:markdown id:86083fcd-c5b4-42c0-80f6-218784cb5647 tags:
# Resources selection and environment control
Get the resources that fit your needs in terms of server characteristics, networks, disks, and operating systems. Controlling what you get is a first step towards experiment reproducibility.
---
- Website: https://discovery.gitlabpages.inria.fr/enoslib/index.html
- Instant chat: https://framateam.org/enoslib
- Source code: https://gitlab.inria.fr/discovery/enoslib
---
## Prerequisites
<div class="alert alert-block alert-warning">
<ul>
<li>⚠️ Make sure you've run the one-time setup for your environment</li>
<li>⚠️ Make sure you're running this notebook under the right kernel</li>
</ul>
</div>
%% Cell type:code id:0ff9540e-e238-4a4c-829a-7ada3ea4cf49 tags:
``` python
import enoslib as en
# Display some general information about the library
en.check()
# Enable rich logging
_ = en.init_logging()
```
%% Cell type:markdown id:d8f785dd-460b-4d61-a425-af64f136cfb8 tags:
## General considerations
Grid'5000 uses the [OAR](https://oar.imag.fr) scheduler behind the scenes. The scheduler has powerful resource selection capabilities. You can refer to [some of the Grid'5000 tutorials](https://www.grid5000.fr/w/Getting_Started#Discovering,_visualizing_and_reserving_Grid'5000_resources) to explore them.
EnOSlib exposes a higher-level interface for selecting resources, based on the [Grid'5000 REST API](https://api.grid5000.fr/) (which wraps OAR).
In EnOSlib you can reserve compute resources, networks (provided by Grid'5000) and disks, with the following assumptions:
- Nodes are reserved as a whole (unlike OAR, which also supports reserving part of a node)
- Networks are those offered by Grid'5000 (Layer 3 subnets and Layer 2 VLANs, possibly spanning multiple sites)
- Local disks are reserved with their associated machines
## Nodes selection
### By cluster name
In EnOSlib you can reserve nodes by specifying the cluster name. All the available clusters are summarized in [the hardware page](https://www.grid5000.fr/w/Hardware#Clusters) of the Grid'5000 documentation.
EnOSlib also supports so-called multisite experiments (experiments spanning different sites). To illustrate this, let's reserve nodes from different sites. A multisite experiment requires synchronizing jobs on different sites; EnOSlib handles this for you.
%% Cell type:code id:ef09e268-dd91-493f-b02c-896d65b8fc73 tags:
``` python
job_name = "multisite"
conf = (
    en.G5kConf.from_settings(job_type=[], job_name=job_name, walltime="0:10:00")
    # For convenience, we use the site name as role
    .add_machine(roles=["rennes", "intel"], cluster="paravance", nodes=1)
    .add_machine(roles=["lille", "amd"], cluster="chiclet", nodes=1)
)
provider = en.G5k(conf)
```
%% Cell type:code id:f309e434-a245-4af0-94bc-c297907c5f10 tags:
``` python
roles, networks = provider.init()
```
%% Cell type:code id:b7df9949-8096-4e3b-a632-f873a2d74859 tags:
``` python
en.run_command("cat /proc/cpuinfo", roles=roles)
```
%% Cell type:code id:bb205e6f-06d2-4d22-a16b-1a04d95c2c91 tags:
``` python
provider.destroy()
```
%% Cell type:markdown id:5a7d423f-261f-4387-8c9f-b4773452aaa0 tags:
### By server names
On Grid’5000, machines belonging to a given cluster are normally homogeneous. But it is impossible to provide absolute guarantees about it: for instance, physical disks may have different performance characteristics across nodes of a cluster even though they share the same vendor and model. For this reason, experimenters may need to reproduce an experiment several times using the exact same hardware.
This is possible by specifying nodes by their exact names. By default, all the servers specified this way are reserved, unless you set a target number of nodes with the `nodes` parameter.
<div class="alert alert-warning">
In the following cell, make sure to change the server list; otherwise your reservation may conflict with those of other users.
</div>
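The node names used in this tutorial follow a `<cluster>-<index>.<site>.grid5000.fr` pattern, so a short helper can split a name into its parts, for instance to double-check a server list before submitting it. This is only a minimal sketch: `parse_node_name` and `NodeName` are hypothetical names, not part of EnOSlib, and the pattern is assumed from the names shown here.

```python
import re
from typing import NamedTuple


class NodeName(NamedTuple):
    cluster: str
    index: int
    site: str


# Assumed pattern: <cluster>-<index>.<site>.grid5000.fr
_NODE_RE = re.compile(
    r"^(?P<cluster>[a-z0-9]+)-(?P<index>\d+)\.(?P<site>[a-z]+)\.grid5000\.fr$"
)


def parse_node_name(fqdn: str) -> NodeName:
    """Split a node name into cluster, index and site (hypothetical helper)."""
    match = _NODE_RE.match(fqdn)
    if match is None:
        raise ValueError(f"unexpected node name: {fqdn!r}")
    return NodeName(match["cluster"], int(match["index"]), match["site"])


servers = ["paravance-19.rennes.grid5000.fr", "paravance-20.rennes.grid5000.fr"]
parsed = [parse_node_name(s) for s in servers]
# Sanity check for this example: both servers share one cluster and one site
assert len({(p.cluster, p.site) for p in parsed}) == 1
print(parsed)
```

A name that does not match the assumed pattern raises `ValueError`, which makes typos in the server list visible before any reservation is attempted.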
%% Cell type:code id:b4b559ee-c2f3-4ee5-b600-6a536c1ff44a tags:
``` python
job_name = "specific-server"
conf = (
    en.G5kConf.from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(
        roles=["compute"],
        servers=["paravance-19.rennes.grid5000.fr", "paravance-20.rennes.grid5000.fr"],
    )
)
provider = en.G5k(conf)
```
%% Cell type:code id:a54af459-1cd7-4dd0-a7e6-b951e07eb763 tags:
``` python
roles, networks = provider.init()
```
%% Cell type:markdown id:c665fddd-b18f-4b2f-bda6-7a43683bd49d tags:
## Non-default network selection
%% Cell type:markdown id:e8dfc6c8-a144-4fc7-9398-b92d5b11e2d4 tags:
In all of the above we get the default network resource (the "production network"). This network is shared with other users.
There are two other types of networks:
- `subnets`, which can be used if you need to assign extra addresses to your nodes (e.g. for virtual machines)
- `kavlans`, which are isolated layer 2 networks. Using this network type currently requires an extra step after getting the resources: **a deployment** of a full OS on the node.
<div class="alert alert-warning">
The number of kavlans is limited:
<ul>
<li>kavlan-local: 3 per site (non-routed network)</li>
<li>kavlan: 3 per site (routed network)</li>
<li>kavlan-global: 1 per site (allows multisite, isolated experiments)</li>
</ul>
</div>
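To make the `subnets` use case mentioned above more concrete, the sketch below hands out addresses from a subnet to virtual machines using Python's standard `ipaddress` module. The CIDR `10.158.0.0/22` and the VM names are placeholders chosen for illustration; on Grid'5000 the actual range is whatever your reservation returns.

```python
import ipaddress
from itertools import islice

# Placeholder subnet: the real range comes from your reservation
subnet = ipaddress.ip_network("10.158.0.0/22")
vm_names = ["vm-1", "vm-2", "vm-3"]  # hypothetical virtual machines

# hosts() skips the network and broadcast addresses
addresses = list(islice(subnet.hosts(), len(vm_names)))
assignment = dict(zip(vm_names, addresses))

for name, address in assignment.items():
    print(f"{name} -> {address}/{subnet.prefixlen}")
```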
%% Cell type:code id:86b1cfc2-5390-4092-9226-7512d8c85490 tags:
``` python
import logging
job_name = "vlan"
private_net = en.G5kNetworkConf(type="kavlan", roles=["private"], site="rennes")
conf = (
    en.G5kConf.from_settings(
        job_name=job_name,
        job_type=["deploy"],
        env_name="debian11-nfs",
        walltime="0:20:00",
    )
    .add_network_conf(private_net)
    .add_machine(
        roles=["roleA"], cluster="paravance", nodes=1, primary_network=private_net
    )
    .finalize()
)
provider = en.G5k(conf)
roles, networks = provider.init()
```
%% Cell type:code id:fb1f62bc-a137-439e-9c77-4c25d136b405 tags:
``` python
# checking the networks we got
networks
```
%% Cell type:code id:63556a22-4cbb-41c9-8d79-62e3a27db788 tags:
``` python
# Checking the ips of the nodes
roles = en.sync_info(roles, networks)
roles
```
%% Cell type:code id:464899d6-a7a9-4b65-99b7-504898cc14f8 tags:
``` python
# Show kavlan subnet
print("Kavlan subnet:", networks["private"][0].network)
# The nodes use this kavlan network for all traffic
# (the network is interconnected at layer-3 with the rest of Grid'5000)
results = en.run_command("ip route get 9.9.9.9", roles=roles["roleA"])
for result in results:
    print(f"{result.stdout}")
```
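To check programmatically that traffic leaves through the kavlan, one option is to extract the source address from the `ip route get` output and test whether it falls inside the kavlan subnet. The sketch below uses only the standard library. The sample output line and the subnet are made-up placeholders, and `source_address` is a hypothetical helper; in the notebook you would feed it the `stdout` and the subnet obtained in the cell above.

```python
import ipaddress
import re
from typing import Optional


def source_address(route_output: str) -> Optional[ipaddress.IPv4Address]:
    """Return the address following the `src` keyword, or None if absent."""
    match = re.search(r"\bsrc\s+(\d+\.\d+\.\d+\.\d+)", route_output)
    return ipaddress.IPv4Address(match.group(1)) if match else None


# Placeholder values for illustration only
sample_output = "9.9.9.9 via 10.24.63.254 dev eno1 src 10.24.0.12 uid 0"
kavlan_subnet = ipaddress.ip_network("10.24.0.0/18")

src = source_address(sample_output)
print(src, "in kavlan subnet:", src is not None and src in kavlan_subnet)
```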
%% Cell type:code id:9b44fe33-beef-4318-aafc-f40318840c0f tags:
``` python
# release resources
provider.destroy()
```
%% Cell type:markdown id:6b036ad6-a3ee-418d-a5aa-7a7795bbe14c tags:
## Disk reservation primer
Grid’5000 has a [disk reservation](https://www.grid5000.fr/w/Disk_reservation) feature: on several clusters, reserving secondary disks is mandatory if you want to use them in your experiments.
The disk reservation feature addresses different use cases:
- benchmarking of storage
- long term storage of data local to the node computing them
Let's see how it works in the following cells.
<div class="alert alert-warning">
Make sure to specify a cluster that supports this feature -- refer to the documentation
</div>
%% Cell type:code id:428a44f4-c9c8-4bfe-8dc1-f7126f49c36c tags:
``` python
job_name = "disks"
conf = en.G5kConf.from_settings(
    job_name=job_name, job_type=[], walltime="0:30:00"
).add_machine(
    roles=["storage"],
    cluster="gros",
    nodes=1,
)
with en.G5k(conf) as (roles, _):
    results = en.run_command("lsblk", roles=roles)
```
%% Cell type:code id:9f5f7790-1d96-49bf-8206-b5debb3b18fb tags:
``` python
# no extra disk available
print(results[0].stdout)
```
%% Cell type:code id:a854cc54-23e2-474f-9f80-b456295efd2b tags:
``` python
job_name = "disks"
conf = en.G5kConf.from_settings(
    job_name=job_name, job_type=[], walltime="0:30:00"
).add_machine(
    roles=["storage"],
    cluster="gros",
    nodes=1,
    reservable_disks=True,
)
with en.G5k(conf) as (roles, _):
    results = en.run_command("lsblk", roles=roles)
results
```
%% Cell type:code id:4f376c24-b595-4118-9416-4d559f2e4e38 tags:
``` python
# another disk is available
print(results[0].stdout)
```
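Comparing the two `lsblk` outputs by eye works for a single node; with several nodes a small parser can help. The sketch below lists the devices whose `TYPE` column is `disk`. The sample text is a made-up placeholder in the default `lsblk` column layout (ASCII tree style), not actual output from a Grid'5000 node, and `list_disks` is a hypothetical helper; in the notebook you would pass it `results[0].stdout`.

```python
from typing import List


def list_disks(lsblk_output: str) -> List[str]:
    """Return the names of devices whose TYPE column is 'disk'."""
    lines = lsblk_output.strip().splitlines()
    header = lines[0].split()
    type_col = header.index("TYPE")
    disks = []
    for line in lines[1:]:
        fields = line.split()
        # Partition rows have TYPE 'part' and are skipped
        if len(fields) > type_col and fields[type_col] == "disk":
            disks.append(fields[0])
    return disks


# Placeholder output for illustration only
sample = """\
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 447.1G  0 disk
|-sda1   8:1    0   3.7G  0 part [SWAP]
`-sda2   8:2    0 443.4G  0 part /
sdb      8:16   0 894.3G  0 disk
"""
print(list_disks(sample))
```

Running the helper on the output of both reservations and comparing the two lists shows which devices the disk reservation added.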