Commit cbdcd806 authored by SIMONIN Matthieu
%% Cell type:markdown id:33965d92-545e-4cc9-add1-261770b5a00e tags:
# Observability service
Third-party software stacks to keep an eye on your experiment or to gather some metrics.
This tutorial is about instrumenting the deployed nodes yourself. Note that Grid'5000 also provides ways to get data from your job using [a REST API](https://www.grid5000.fr/w/Monitoring_Using_Kwollect).
---
- Website: https://discovery.gitlabpages.inria.fr/enoslib/index.html
- Instant chat: https://framateam.org/enoslib
- Source code: https://gitlab.inria.fr/discovery/enoslib
---
## Prerequisites
<div class="alert alert-block alert-warning">
<ul>
<li>⚠️ Make sure you've run the one time setup for your environment</li>
<li>⚠️ Make sure you're running this notebook under the right kernel</li>
</ul>
</div>
%% Cell type:code id:779ea70c-aacd-45eb-9472-b9d17a16fd3a tags:
``` python
import enoslib as en
en.check()
```
%% Cell type:markdown id:0016ac0d-1a38-409c-9899-89c8a21ff14b tags:
## EnOSlib's observability service
---
A `Service` in EnOSlib is a third-party software stack that is commonly used among experimenters.
In particular, EnOSlib has some Services which deal with the problem of getting insight into what's running on remote nodes.
A Service is a Python object which exposes three main methods:
- `deploy`: deploys the service
- `destroy`: stops and removes the service
- `backup`: retrieves some state of the service (e.g. monitoring information)
Usually a service is used as follows:
```python
service = Service(*args, **kwargs)
service.deploy()
...
# do stuffs
...
service.backup()
service.destroy()
```
But it's sometimes useful to use a context manager when working with a service:
```python
with Service(*args, **kwargs) as service:
...
# do stuffs
...
```
This allows for
- running the service for some time, depending on what's inside the context manager
- cleaning up (and backing up) automatically at the end
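Under the hood, this pattern relies on Python's context-manager protocol (`__enter__`/`__exit__`). A minimal sketch with a hypothetical `DummyService` (not an actual EnOSlib class) shows how `deploy` runs on entry and `backup`/`destroy` run on exit, even if the body raises:

```python
# Hypothetical DummyService, for illustration only: it records which
# methods are called instead of touching any remote node.
class DummyService:
    def __init__(self, name):
        self.name = name
        self.events = []

    def deploy(self):
        self.events.append("deploy")

    def backup(self):
        self.events.append("backup")

    def destroy(self):
        self.events.append("destroy")

    def __enter__(self):
        # entering the context deploys the service
        self.deploy()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # backup/destroy run even if the body raised an exception
        self.backup()
        self.destroy()
        return False

with DummyService("monitoring") as service:
    service.events.append("do stuff")

print(service.events)  # ['deploy', 'do stuff', 'backup', 'destroy']
```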
---
<div class="alert alert-info">
There are different EnOSlib services for different purposes (network emulation, docker deployment, orchestrator deployment ...).
You can check <a href="https://discovery.gitlabpages.inria.fr/enoslib/apidoc/index.html">the documentation</a>.
</div>
%% Cell type:markdown id:worse-equity tags:
## Common setup
%% Cell type:code id:stylish-bahamas tags:
``` python
import enoslib as en
# Enable rich logging
_ = en.init_logging()
```
%% Cell type:code id:dressed-mentor tags:
``` python
conf = (
en.G5kConf.from_settings(job_type=[], job_name="enoslib_observability")
.add_machine(
roles=["control", "xp"], cluster="parasilo", nodes=1
)
.add_machine(
roles=["agent", "xp"], cluster="parasilo", nodes=1
)
.finalize()
)
conf
```
%% Cell type:code id:corrected-analysis tags:
``` python
provider = en.G5k(conf)
roles, networks = provider.init()
roles
```
%% Cell type:markdown id:final-light tags:
### A simple load generator
We'll install a simple load generator: `stress`, available in the Debian packages.
%% Cell type:code id:marked-sport tags:
``` python
with en.actions(roles=roles["agent"]) as a:
a.apt(name="stress", state="present")
```
%% Cell type:markdown id:satellite-burlington tags:
## Monitoring with dstat
Dstat is a simple monitoring tool: https://github.com/dstat-real/dstat#information
It runs as a single process and collects metrics from various sources.
That makes it a good candidate for getting quick insight into resource consumption during an experiment.
The EnOSlib implementation lets you easily
- start Dstat processes on remote machines and dump the metrics into a CSV file (that's the purpose of the `deploy` method of the Dstat service)
- retrieve all the CSV files (one per remote node) on your local machine (that's the purpose of the `backup` method)
- stop every remote Dstat process (that's the purpose of the `destroy` method)
%% Cell type:markdown id:auburn-torture tags:
### Capture
Let's start with a single capture implemented using a context manager.
The context manager runs `deploy` when entering, and `backup/destroy` when exiting.
%% Cell type:code id:excellent-saudi tags:
``` python
# Start a capture on all nodes
# - stress on some nodes
import time
with en.Dstat(nodes=roles["xp"]) as d:
time.sleep(5)
en.run_command("stress --cpu 4 --timeout 10", roles=roles["agent"])
time.sleep(5)
```
%% Cell type:markdown id:announced-basis tags:
### Visualization
All the CSV files are available under the `backup_dir`, inside subdirectories named after the corresponding remote host alias:
```bash
<backup_dir> / host1 / ... / <metrics>.csv
             / host2 / ... / <metrics>.csv
```
The following Python lines recursively look for any CSV file inside these directories and build a DataFrame and a visualization.
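For illustration, here is a stdlib-only sketch of that recursive collection (a hypothetical `collect_csvs` helper, not the EnOSlib internals; in practice you would use `en.Dstat.to_pandas`): it walks each host subdirectory, reads every CSV, and tags each row with the host it came from.

```python
# Hypothetical helper: gather all CSV rows under <backup_dir>/<host>/...
# and label each row with its host alias.
import csv
from pathlib import Path

def collect_csvs(backup_dir):
    rows = []
    backup_dir = Path(backup_dir)
    for host_dir in sorted(p for p in backup_dir.iterdir() if p.is_dir()):
        for csv_file in sorted(host_dir.rglob("*.csv")):
            with csv_file.open() as f:
                for row in csv.DictReader(f):
                    row["host"] = host_dir.name
                    rows.append(row)
    return rows
```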
%% Cell type:code id:adopted-vocabulary tags:
``` python
import pandas as pd
import seaborn as sns
df = en.Dstat.to_pandas(d.backup_dir)
df
```
%% Cell type:code id:rough-campus tags:
``` python
# let's plot the metrics!
sns.lineplot(data=df, x="epoch", y="usr", hue="host", markers=True, style="host")
```
%% Cell type:markdown id:fatal-center tags:
## Monitoring with Telegraf / InfluxDB (or Prometheus) / Grafana
%% Cell type:code id:finished-individual tags:
``` python
monitoring = en.TIGMonitoring(collector=roles["control"][0], agent=roles["agent"], ui=roles["control"][0])
monitoring
```
%% Cell type:code id:982e398e-c0da-4451-b0e8-b074262295ad tags:
``` python
monitoring.deploy()
```
%% Cell type:code id:fossil-closer tags:
``` python
en.run_command("stress --cpu 24 --timeout 60", roles=roles["agent"], background=True)
```
%% Cell type:markdown id:a6c0ea91-eb00-4be2-af27-d2b35a8e78fa tags:
<div class="alert alert-info">
💡 Accessing a service inside Grid'5000 isn't straightforward.
The following depends on your environment.
</div>
%% Cell type:markdown id:edcfb503-308e-4e40-82f9-73d1110af16f tags:
<div class="alert alert-info">
💡 Run the following in a terminal on your local computer
<p>
-> This requires that your SSH key is set up.
<br/>
This can be done by <a href="https://api.grid5000.fr/ui/account">managing your account</a> on Grid'5000
</p>
</div>
%% Cell type:code id:3e07f4ec-064f-4896-a4a9-7b7951470718 tags:
``` python
print(f"""
Access the UI at {monitoring.ui.address}:3000 (admin/admin)
---
tip1: create an SSH port forwarding -> ssh -NL 3000:{monitoring.ui.address}:3000 access.grid5000.fr (and point your browser to http://localhost:3000)
tip2: use a SOCKS proxy -> ssh -ND 2100 access.grid5000.fr (and point your browser to http://{monitoring.ui.address}:3000)
tip3: use the Grid'5000 VPN
""")
```
%% Cell type:markdown id:4ea72b0d-854b-45f7-9f40-13bf00a5eed7 tags:
<div class="alert alert-info">
💡 EnOSlib provides a way to programmatically create the tunnel if this notebook runs on your laptop.
However, this doesn't apply if the notebook is running on a frontend node or a compute node inside Grid'5000.
</div>
%% Cell type:code id:canadian-maryland tags:
``` python
# If you are running this notebook outside of Grid'5000 (e.g from your local machine), you can access the dashboard by creating a tunnel
# This doesn't apply if you are running this notebook from the frontend or a node inside Grid5000
tunnel = en.G5kTunnel(address=monitoring.ui.address, port=3000)
local_address, local_port, _ = tunnel.start()
print(f"The service is running at http://localhost:{local_port} (admin:admin)")
# wait some time
import time
time.sleep(60)
# don't forget to close it
tunnel.close()
```
%% Cell type:markdown id:ethical-creation tags:
To make sure the tunnel is closed, you can use a context manager: the tunnel will be closed automatically when exiting the context manager.
%% Cell type:code id:stunning-turkish tags:
``` python
import time
with en.G5kTunnel(address=monitoring.ui.address, port=3000) as (_, local_port, _):
print(f"The service is running at http://localhost:{local_port}")
time.sleep(60)
```
%% Cell type:markdown id:33b1fa5e-6303-4dfc-8d07-b7364939b1d7 tags:
## Packet sniffing with tcpdump
### Capture
%% Cell type:code id:integrated-feedback tags:
``` python
# start a capture
# - on all the interfaces of the nodes (ifnames=["any"])
# - dumping ICMP traffic only
# - for the duration of the commands (here a client is pinging the server)
with en.TCPDump(
hosts=roles["xp"], ifnames=["any"], options="icmp"
) as t:
backup_dir = t.backup_dir
_ = en.run(f"ping -c10 {roles['control'][0].address}", roles["agent"])
```
%% Cell type:markdown id:4c60271e-a59d-4237-beba-979c8e38a67f tags:
### Visualization
%% Cell type:code id:virtual-memorial tags:
``` python
from scapy.all import rdpcap
import tarfile
# Example:
# build a dictionary of (alias, interface) -> list of packets decoded by scapy
decoded_pcaps = dict()
for host in roles["xp"]:
host_dir = backup_dir / host.alias
t = tarfile.open(host_dir / "tcpdump.tar.gz")
t.extractall(host_dir / "extracted")
# get all extracted pcap for this host
pcaps = (host_dir / "extracted").rglob("*.pcap")
for pcap in pcaps:
decoded_pcaps.setdefault((host.alias, pcap.with_suffix("").name),
rdpcap(str(pcap)))
# Displaying some packets
for (host, ifs), packets in decoded_pcaps.items():
print(host, ifs)
packets[0].show()
packets[1].show()
```
%% Cell type:markdown id:993b61cd-c168-4d2a-b8ac-d451cddb7723 tags:
### Capture on a specific network
You can start a capture on a dedicated network by specifying it to TCPDump.
This will sniff all the packets that go through an interface configured on this specific network.
You need to call `sync_info` first to enable the translation (network logical name) -> interface name.
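Conceptually, this translation matches each interface's IP address against the network's subnet. A minimal stdlib sketch with hypothetical data (not the EnOSlib internals):

```python
# Hypothetical example: an interface belongs to a logical network when
# its IP address falls inside that network's subnet.
import ipaddress

def interfaces_on_network(host_interfaces, subnet):
    """host_interfaces: {ifname: ip address}; subnet: CIDR string."""
    net = ipaddress.ip_network(subnet)
    return [
        ifname
        for ifname, ip in host_interfaces.items()
        if ipaddress.ip_address(ip) in net
    ]

# Only eth1 is configured on 10.0.0.0/24 in this made-up host
ifaces = {"eth0": "172.16.45.3", "eth1": "10.0.0.12"}
print(interfaces_on_network(ifaces, "10.0.0.0/24"))  # ['eth1']
```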
%% Cell type:code id:281ed635-b2f2-4548-997d-85f32e595593 tags:
``` python
roles = en.sync_info(roles, networks)
```
%% Cell type:code id:3d9815b9-4602-44e8-9992-60445424cf22 tags:
``` python
# start a capture
# - on all the interfaces configured on the my_network network
# - dumping ICMP traffic only
# - for the duration of the commands (here a client is pinging the server)
with en.TCPDump(
hosts=roles["xp"], networks=networks["my_network"], options="icmp"
) as t:
backup_dir = t.backup_dir
_ = en.run(f"ping -c10 {roles['control'][0].address}", roles["agent"])
```
%% Cell type:code id:4480a37a-16bc-443d-bcc7-efcdbeff00b8 tags:
``` python
from scapy.all import rdpcap
import tarfile
# Example:
# build a dictionary of (alias, interface) -> list of packets decoded by scapy
decoded_pcaps = dict()
for host in roles["xp"]:
host_dir = backup_dir / host.alias
t = tarfile.open(host_dir / "tcpdump.tar.gz")
t.extractall(host_dir / "extracted")
# get all extracted pcap for this host
pcaps = (host_dir / "extracted").rglob("*.pcap")
for pcap in pcaps:
decoded_pcaps.setdefault((host.alias, pcap.with_suffix("").name),
rdpcap(str(pcap)))
# Displaying some packets
for (host, ifs), packets in decoded_pcaps.items():
print(host, ifs)
packets[0].show()
packets[1].show()
```
%% Cell type:markdown id:separated-briefing tags:
## Cleaning
%% Cell type:code id:structured-motor tags:
``` python
provider.destroy()
```