Commit 56db5dd1 authored by BADTS Thomas
New Kwollect monitoring documentation

parent f9000b0b
......@@ -24,3 +24,4 @@ Tutorials
monitoring-energy-dstat-g5k-iotlab
monitoring-tig-g5k
monitoring-tig-chameleon
monitoring-kwollect
********************************
Kwollect monitoring on Grid'5000
********************************
.. _kwollect_monitoring:
In this tutorial, we show how to use ``e2clab`` with the kwollect monitoring service deployed on Grid'5000 (see the `kwollect documentation`_).
Kwollect is particularly useful because it gives you access to many metrics that are already available on the nodes, such as:
- wattmetre power consumption
- metrics from the **Board Management Controller** (bmc)
- metrics from **Prometheus node exporter**
- etc
You can find all the metrics available on each Grid'5000 cluster in the `kwollect documentation`_.
In this short example, we will show **how to**:
- Fetch some of those metrics for your experiments
- Analyze them
- Push your own custom metrics to the monitoring stack
Experiment Artifacts
====================
The artifacts repository contains the E2Clab configuration files:
.. code-block:: console
git clone https://gitlab.inria.fr/E2Clab/examples/monitoring-kwollect.git
cd monitoring-kwollect
The structure of the experimental setup looks like this:
.. code-block:: none
kwollect_monitoring/ # SCENARIO_DIR
├── artifacts/ # ARTIFACTS_DIR
│ └── push_metrics.sh
├── .e2c_env
├── layers_services.yaml
├── networks.yaml
├── workflow.yaml
├── workflow_env.yaml
├── analysis.py
├── requirements.txt
├── README.md
└── ...
Defining the Experimental Environment
=====================================
Experiment definition
---------------------
Notice that the experiment artifacts contain a ``.e2c_env`` file:
.. literalinclude:: ../../examples/monitoring-kwollect/.e2c_env
:caption: .e2c_env
This defines environment variables that will be passed to the ``e2clab`` CLI.
As you can see in the documentation (:ref:`e2clab-cli`), we can define environment variables for the ``SCENARIO_DIR`` and ``ARTIFACTS_DIR`` arguments.
In this example we define ``E2C_SCENARIO_DIR`` and ``E2C_ARTIFACTS_DIR``, as well as whether we want the debug logs.
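To check what such a file declares before running the CLI, a small parser can help. The sketch below assumes that ``.e2c_env`` uses plain ``KEY=VALUE`` lines with optional ``#`` comments; this layout and the sample values are assumptions, and ``parse_env`` is a hypothetical helper, not part of ``e2clab``.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    ASSUMPTION: the env file uses a plain dotenv-like layout.
    """
    variables: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        variables[key.strip()] = value.strip().strip('"').strip("'")
    return variables


# Hypothetical file content, for illustration only.
SAMPLE = """
# scenario layout
E2C_SCENARIO_DIR=.
E2C_ARTIFACTS_DIR='./artifacts'
"""

env = parse_env(SAMPLE)
missing = [k for k in ("E2C_SCENARIO_DIR", "E2C_ARTIFACTS_DIR") if k not in env]
print(env, "missing:", missing)
```

On a real scenario you would pass the content of the file instead, for example ``parse_env(Path(".e2c_env").read_text())``.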
Layers & Services Configuration
-------------------------------
This example ``layers_services.yaml`` file defines our experiment's deployment.
We make a reservation on the ``paradoxe`` cluster and define a new section called ``kwollect`` (whose definition you can find in :ref:`layers_services_schema`).
Within the ``kwollect`` section we define two things:
- The metrics we want to pull from kwollect in the ``metrics`` option
- ``wattmetre_power_watt``
- ``bmc_node_power_watt``
- ``my_e2clab_metric``
- The timeframe we want to pull the metrics from, defined by the ``step`` option
- We pull the metrics recorded during the time it took to run the ``launch`` part of the workflow
Learn more about those options in the documentation section dedicated to :ref:`kwollect metrics`.
Another **important detail**: the `kwollect documentation`_ mentions that not all metrics are recorded from the nodes by default.
To make sure that the ``bmc_node_power_watt`` metric and our custom ``my_e2clab_metric`` are collected by the kwollect monitoring stack, we have to activate them with the ``monitor`` option in the ``g5k`` section of the ``environment``.
.. literalinclude:: ../../examples/monitoring-kwollect/layers_services.yaml
:language: yaml
:name: layers_services.yaml
:caption: layers_services.yaml
:linenos:
Network Configuration
---------------------
No network emulation needed:
.. literalinclude:: ../../examples/monitoring-kwollect/networks.yaml
:language: yaml
:name: networks.yaml
:caption: networks.yaml
:linenos:
Workflow Configuration
----------------------
In this simple monitoring demonstration, we are just going to run some stress tests on the hosts with the ``stress`` command.
We also run the ``push_metrics.sh`` script in the background on the ``fog`` host to demonstrate how to publish and collect custom metrics with kwollect.
In this case, ``push_metrics.sh`` simply sets ``my_e2clab_metric`` to a new value every 15 seconds.
This is useful if you need to monitor some values in real time, or if you want to fetch your own metrics together with the rest of the data from the kwollect API.
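The periodic update performed by ``push_metrics.sh`` can be sketched in Python as follows. This is not the content of the script. The drop-in file path and the ``kwollect_custom{_metric_id="..."}`` line format are assumptions to check against the **Pushing custom metrics** section of the `kwollect documentation`_, and all function names are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path

# ASSUMPTION: file read by the node's metric collector; verify the actual
# location in the kwollect documentation before relying on it.
DEFAULT_PROM_FILE = Path("/var/lib/prometheus/node-exporter/kwollect.prom")


def format_metric(name: str, value: float) -> str:
    # ASSUMPTION: Prometheus-style text line expected for custom metrics.
    return f'kwollect_custom{{_metric_id="{name}"}} {value}\n'


def write_metric(name: str, value: float, target: Path = DEFAULT_PROM_FILE) -> None:
    """Write the metric atomically so a reader never sees a partial file."""
    target = Path(target)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent)
    with os.fdopen(fd, "w") as handle:
        handle.write(format_metric(name, value))
    os.chmod(tmp_name, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_name, target)


def push_periodically(name, values, period_s=15.0,
                      target=DEFAULT_PROM_FILE, sleep=time.sleep):
    """Publish each value in turn, waiting period_s seconds between updates."""
    for index, value in enumerate(values):
        if index:
            sleep(period_s)
        write_metric(name, value, target)
```

Calling ``push_periodically("my_e2clab_metric", range(10))`` on a node would publish ten successive values, one every 15 seconds. The ``sleep`` parameter exists only so the loop can be exercised without waiting.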
The ``{{ env_time }}`` variables refer to application configurations defined in the :ref:`workflow_env.yaml` file.
When running the ``long`` application configuration, the ``stress`` commands last "60s"; with the ``short`` configuration they last "30s".
To learn more about how ``workflow.yaml`` and ``workflow_env.yaml`` work together, you can read the following documentation: :ref:`workflow env`.
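The substitution of ``{{ env_time }}`` per application configuration can be pictured with the sketch below. It only illustrates the idea and is not how E2Clab renders its workflow. The ``APP_CONFIGS`` mapping, the ``env_`` prefixing rule, and the ``stress`` command line are assumptions chosen to mirror the 30s and 60s durations mentioned above.

```python
import re

# Hypothetical per-configuration values mirroring the durations in the text.
APP_CONFIGS = {
    "short": {"time": "30s"},
    "long": {"time": "60s"},
}

# Matches placeholders such as "{{ env_time }}".
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(command: str, app_conf: str) -> str:
    """Replace each placeholder with the value of the chosen configuration."""
    # ASSUMPTION: a key "time" is exposed to the workflow as "env_time".
    variables = {f"env_{key}": value for key, value in APP_CONFIGS[app_conf].items()}
    return PLACEHOLDER.sub(lambda match: variables[match.group(1)], command)


# Hypothetical workflow command, for illustration only.
TEMPLATE = "stress --cpu 4 --timeout {{ env_time }}"

for conf in ("short", "long"):
    print(conf, "->", render(TEMPLATE, conf))
```

Running the workflow once per configuration therefore yields two different ``stress`` durations from a single workflow definition.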
.. literalinclude:: ../../examples/monitoring-kwollect/workflow.yaml
:language: yaml
:name: workflow.yaml
:caption: workflow.yaml
:linenos:
.. literalinclude:: ../../examples/monitoring-kwollect/workflow_env.yaml
:language: yaml
:name: workflow_env.yaml
:caption: workflow_env.yaml
:linenos:
Running & Verifying Experiment Execution
========================================
Run the experiment
------------------
Use the command below to run this example:
.. code-block:: bash
e2clab deploy --app_conf short,long
The command runs the whole workflow once for each of the two application configurations, ``long`` and ``short``.
Check metrics in real-time
--------------------------
During the experiment's execution, you can open the kwollect monitoring dashboard for the Rennes site at https://api.grid5000.fr/stable/sites/rennes/metrics/dashboard and enter the ``Job ID`` corresponding to the deployment of the experiment on Grid’5000.
You can find that job ID in the logs while the experiment is running.
.. code-block:: none
[E2C,KWOLLECT] Access kwollect metric dashboard for job <JOB ID>: https://api.grid5000.fr/stable/sites/rennes/metrics/dashboard
.. figure:: monitoring-kwollect/grafana_kwollect.png
:width: 100%
:align: center
Example of visualization on the kwollect dashboard interface
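The dashboard address follows a per-site pattern, so it can be built for any site. The sketch below also builds a query URL for the raw metrics endpoint. The ``/metrics`` path with ``job_id`` and ``metrics`` query parameters is an assumption based on the `kwollect documentation`_ and should be checked there; both helper names are hypothetical.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.grid5000.fr/stable/sites"


def dashboard_url(site: str) -> str:
    """Dashboard address for a site, following the pattern shown above."""
    return f"{API_ROOT}/{site}/metrics/dashboard"


def metrics_url(site: str, job_id: int, metrics: list[str]) -> str:
    """Query URL for the metrics of one job.

    ASSUMPTION: the endpoint accepts 'job_id' and a comma-separated
    'metrics' parameter; verify against the kwollect documentation.
    """
    query = urlencode({"job_id": job_id, "metrics": ",".join(metrics)}, safe=",")
    return f"{API_ROOT}/{site}/metrics?{query}"


print(dashboard_url("rennes"))
print(metrics_url("rennes", 1234, ["wattmetre_power_watt", "bmc_node_power_watt"]))
```

The job ID ``1234`` is a placeholder; use the one printed in the ``[E2C,KWOLLECT]`` log line.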
Analyze experiment metrics
--------------------------
At the end of the deployment, the metrics that you requested are fetched from the kwollect API and saved in a csv file in your result directory, which should look like:
.. code-block:: none
YYMMDD-HHMMSS/
├── long/
│ ├── monitoring-data/
│ │ └── kwollect-data/
│ └── workflow-validate.out
├── short/
│ └── ...
├── e2clab.err
├── e2clab.log
├── layers_services-validate.yaml
└── workflow-validate.out
You will get output data for both the ``long`` and ``short`` configurations of the experiment.
Power consumption
~~~~~~~~~~~~~~~~~
We provide a simple Python script to visualize the data that was pulled from the kwollect API.
.. note::
It is best to run the following commands inside a Python virtual environment.
.. code-block:: console
pip install -r requirements.txt
analysis.py YYYYMMDD-hhmmss/long/monitoring-data/kwollect-data/<site>.yaml
.. figure:: monitoring-kwollect/data_analysis.png
:width: 100%
:align: center
Example of power consumption monitoring on Grid’5000 nodes
We can clearly see the rise in power consumption caused by the ``stress`` command.
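Beyond plotting, the same data can be reduced to a few numbers per node. The sketch below is not the provided ``analysis.py``. It assumes the records have already been loaded as dictionaries with ``timestamp`` (seconds), ``device_id``, ``metric_id`` and ``value`` keys, which is an assumption about the data layout, and it uses made-up sample values.

```python
from collections import defaultdict


def energy_joules(samples):
    """Trapezoidal integration of (timestamp_s, watts) pairs."""
    ordered = sorted(samples)
    return sum(
        (t1 - t0) * (p0 + p1) / 2
        for (t0, p0), (t1, p1) in zip(ordered, ordered[1:])
    )


def summarize(records, metric_id):
    """Mean power and total energy per device for one metric."""
    per_device = defaultdict(list)
    for record in records:
        if record["metric_id"] == metric_id:
            per_device[record["device_id"]].append(
                (record["timestamp"], record["value"])
            )
    return {
        device: {
            "mean_watt": sum(power for _, power in samples) / len(samples),
            "energy_joule": energy_joules(samples),
        }
        for device, samples in per_device.items()
    }


# Made-up records for a hypothetical node, for illustration only.
RECORDS = [
    {"timestamp": 0, "device_id": "paradoxe-1", "metric_id": "wattmetre_power_watt", "value": 100.0},
    {"timestamp": 10, "device_id": "paradoxe-1", "metric_id": "wattmetre_power_watt", "value": 200.0},
    {"timestamp": 20, "device_id": "paradoxe-1", "metric_id": "wattmetre_power_watt", "value": 200.0},
    {"timestamp": 0, "device_id": "paradoxe-1", "metric_id": "my_e2clab_metric", "value": 1.0},
]

summary = summarize(RECORDS, "wattmetre_power_watt")
print(summary)
```

Integrating power over time gives the energy used during the ``stress`` run, which is often easier to compare between the ``short`` and ``long`` configurations than raw curves.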
.. note::
To know more about the caveats of power monitoring on Grid’5000, check the following link: https://www.grid5000.fr/w/Unmaintained:Power_Monitoring_Devices#measurement_artifacts_and_pitfalls
Custom metric
~~~~~~~~~~~~~
We can also check that our custom metric ``my_e2clab_metric`` has been captured by kwollect,
first by opening the online dashboard and entering ``my_e2clab_metric`` in the ``Metric name`` selection:
.. figure:: monitoring-kwollect/my_e2clab_metric.png
We can also check that it was indeed pulled from the API and saved in our experiment results:
.. code-block:: bash
cat YYYYMMDD-hhmmss/long/monitoring-data/kwollect-data/<site>.yaml | grep "my_e2clab_metric"
.. note::
To learn more about custom metrics and the timestamp at which they are recorded, check the **Pushing custom metrics** section in the `kwollect documentation`_.
Free the computing resources
----------------------------
Once you are done, you may kill your job on the Grid’5000 platform using the following command:
.. code-block:: bash
e2clab destroy
.. _kwollect documentation: https://www.grid5000.fr/w/Monitoring_Using_Kwollect
docs/examples/monitoring-kwollect/data_analysis.png

97.7 KiB

docs/examples/monitoring-kwollect/grafana_kwollect.png

154 KiB

docs/examples/monitoring-kwollect/my_e2clab_metric.png

28.1 KiB

......@@ -304,6 +304,8 @@ Also possibile using the ``deploy`` command:
:language: yaml
:linenos:
.. _workflow env:
workflow_env.yaml and application configuration
-----------------------------------------------
......
......@@ -221,6 +221,8 @@ the monitoring profiles in the dashboard through this link
average: 4
.. _kwollect metrics:
Monitoring Grid'5000 using Kwollect metrics
===========================================
......
......@@ -52,6 +52,7 @@ class MonitoringManager(Manager):
# TODO: Add different requirements when using TIG / TPG
SCHEMA = {
"title": "Monitoring manager",
"description": "Definition of the monitoring capabilities",
"type": "object",
"properties": {
......
......@@ -36,6 +36,7 @@ class MonitoringIoTManager(Manager):
_MONITORING_PROFILE = {
"description": "https://www.iot-lab.info/testbed/resources/monitoring",
"title": "FIT IoT-LAB monitoring manager",
"type": "array",
"items": {
"type": "object",
......
......@@ -40,7 +40,7 @@ class MonitoringKwollectManager(Manager):
SCHEMA = {
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
- "title": "Grid5000 kwollect monitoring Schema",
+ "title": "Grid5000 kwollect monitoring manager",
"properties": {
METRICS: {
"description": "Metrics to pull from job, '[all]' to pull all metrics",
......
......@@ -33,6 +33,7 @@ class ProvenanceManager(Manager):
SCHEMA = {
"description": "Definition of the provenance data capture capabilities",
"title": "Provenance manager",
"type": "object",
"properties": {
PROVENANCE_SVC_PROVIDER: {
......
......@@ -108,9 +108,9 @@ def load_yaml_file(file: Path) -> dict[str, str]:
except FileNotFoundError as e:
raise E2clabFileError(file, "File does not exist") from e
except yaml.YAMLError as e:
- raise E2clabFileError(file, "Yaml syntax error in file") from e
+ raise E2clabFileError(file, f"Yaml syntax error in file: {e}") from e
except Exception as e:
- raise E2clabFileError(file, "Unknown error") from e
+ raise E2clabFileError(file, f"Unknown error {e}") from e
def validate_conf(conf_file: Path, type: str) -> bool:
......
Subproject commit 02fd58a70f7f71bdd17a65722b71416c691c4870
Subproject commit cb7a98aee64ebec660054fc67c166d5fad927928