New Kwollect monitoring documentation

Commit 56db5dd1 authored 3 months ago by BADTS Thomas
--- a/docs/examples/index.rst
+++ b/docs/examples/index.rst
@@ -24,3 +24,4 @@ Tutorials
    monitoring-energy-dstat-g5k-iotlab
    monitoring-tig-g5k
    monitoring-tig-chameleon
+    monitoring-kwollect
--- a/docs/examples/monitoring-kwollect.rst
+++ b/docs/examples/monitoring-kwollect.rst
+********************************
+Kwollect monitoring on Grid'5000
+********************************
+
+.. _kwollect_monitoring:
+
+In this tutorial, we show how to use ``e2clab`` with the kwollect monitoring service deployed on Grid'5000 (see the `kwollect documentation`_).
+
+Kwollect is particularly useful because it gives you access to many metrics that are already available on the nodes, such as:
+
+- wattmetre power consumption
+- metrics from the **Board Management Controller** (bmc)
+- metrics from **Prometheus node exporter**
+- etc
+
+You can find all the available metrics for every Grid'5000 cluster in the `kwollect documentation`_.
+
+In this short example, we will show **how to**:
+
+- Fetch some of those metrics for your experiments
+- Analyze them
+- Push your own custom metrics to the monitoring stack
+
+
+Experiment Artifacts
+====================
+
+The artifacts repository contains the E2Clab configuration files:
+
+.. code-block:: console
+
+    git clone https://gitlab.inria.fr/E2Clab/examples/monitoring-kwollect.git
+    cd monitoring-kwollect
+
+The structure of the experimental setup looks like this:
+
+
+.. code-block:: none
+
+    kwollect_monitoring/        # SCENARIO_DIR
+    ├── artifacts/              # ARTIFACTS_DIR
+    │   └── push_metrics.sh
+    ├── .e2c_env
+    ├── layers_services.yaml
+    ├── networks.yaml
+    ├── workflow.yaml
+    ├── workflow_env.yaml
+    ├── analysis.py
+    ├── requirements.txt
+    ├── README.md
+    └── ...
+
+Defining the Experimental Environment
+=====================================
+
+Experiment definition
+---------------------
+
+Notice that the experiment artifacts contain a ``.e2c_env`` file:
+
+.. literalinclude:: ../../examples/monitoring-kwollect/.e2c_env
+    :caption: .e2c_env
+
+This defines environment variables that will be passed to the ``e2clab`` CLI.
+As you can see in the documentation (:ref:`e2clab-cli`), we can define environment variables for the ``SCENARIO_DIR`` and ``ARTIFACTS_DIR`` arguments.
+
+In this example, we define ``E2C_SCENARIO_DIR`` and ``E2C_ARTIFACTS_DIR``, as well as whether we want debug logs.
+
+
+Layers & Services Configuration
+-------------------------------
+
+This example ``layers_services.yaml`` file defines our experiment's deployment.
+We make a reservation on the ``paradoxe`` cluster and define a new section called ``kwollect`` (whose definition you can find in :ref:`layers_services_schema`).
+
+Within the ``kwollect`` section we define two things:
+
+- The metrics we want to pull from kwollect in the ``metrics`` option
+    - ``wattmetre_power_watt``
+    - ``bmc_node_power_watt``
+    - ``my_e2clab_metric``
+- The time frame from which we want to pull the metrics, defined by the ``step`` option
+    - We pull the metrics recorded during the time it took to run the ``launch`` part of the workflow
+
+Learn more about those options in the documentation section dedicated to :ref:`kwollect metrics`.
+
+Another **important detail**: the `kwollect documentation`_ mentions that not all metrics are recorded from the nodes by default.
+To make sure that the ``bmc_node_power_watt`` metric and our custom ``my_e2clab_metric`` are collected by the kwollect monitoring stack, we have to activate them with the ``monitor`` option in the ``g5k`` section of the ``environment``.
+
+
+.. literalinclude:: ../../examples/monitoring-kwollect/layers_services.yaml
+    :language: yaml
+    :name: layers_services.yaml
+    :caption: layers_services.yaml
+    :linenos:
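+
+To give an idea of what pulling those metrics for a job involves, here is a minimal Python sketch that only builds a query URL for the kwollect API.
+It is not how ``e2clab`` does it internally.
+The base URL is derived from the dashboard address used in this tutorial; the ``build_metrics_url`` helper is hypothetical, and the query parameter names (``job_id``, ``metrics``, ``start_time``, ``end_time``) are assumptions to check against the `kwollect documentation`_.
+
+.. code-block:: python
+
+    from urllib.parse import urlencode
+
+    # Assumed endpoint: the dashboard URL of this tutorial without "/dashboard"
+    KWOLLECT_API = "https://api.grid5000.fr/stable/sites/{site}/metrics"
+
+
+    def build_metrics_url(site, job_id, metrics, start_time=None, end_time=None):
+        """Build a kwollect query URL (parameter names are assumptions)."""
+        params = {"job_id": job_id, "metrics": ",".join(metrics)}
+        # Optional bounds, e.g. the start and end of the ``launch`` step
+        if start_time is not None:
+            params["start_time"] = start_time
+        if end_time is not None:
+            params["end_time"] = end_time
+        return KWOLLECT_API.format(site=site) + "?" + urlencode(params)
+
+
+    # 1234 is a placeholder job ID
+    url = build_metrics_url(
+        "rennes", 1234, ["wattmetre_power_watt", "bmc_node_power_watt"]
+    )
+    print(url)
+
+With ``e2clab``, you do not need to build such requests yourself: the ``metrics`` and ``step`` options above are enough.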
+
+
+Network Configuration
+---------------------
+
+No network emulation needed:
+
+.. literalinclude:: ../../examples/monitoring-kwollect/network.yaml
+    :language: yaml
+    :name: networks.yaml
+    :caption: networks.yaml
+    :linenos:
+
+
+Workflow Configuration
+----------------------
+
+In this simple monitoring demonstration, we are just going to run some stress tests on the hosts with the ``stress`` command.
+We also run the ``push_metrics.sh`` script in the background of the ``fog`` host to demonstrate how to publish and collect custom metrics with kwollect.
+
+In this case, the ``push_metrics.sh`` script simply sets ``my_e2clab_metric`` to a new value every 15 seconds.
+This can be useful if you need to monitor some values in real time, or if you want to fetch those metrics together with the rest of the data from the kwollect API.
+
+The ``{{ env_time }}`` variables refer to application configurations defined in the :ref:`workflow_env.yaml` file.
+When running the ``long`` application configuration, the ``stress`` commands will last "60s", and "30s" when using the ``short`` configuration.
+
+To learn more about how ``workflow.yaml`` and ``workflow_env.yaml`` work together, you can read the following documentation: :ref:`workflow env`.
+
+.. literalinclude:: ../../examples/monitoring-kwollect/workflow.yaml
+   :language: yaml
+   :name: workflow.yaml
+   :caption: workflow.yaml
+   :linenos:
+
+.. literalinclude:: ../../examples/monitoring-kwollect/workflow_env.yaml
+   :language: yaml
+   :name: workflow_env.yaml
+   :caption: workflow_env.yaml
+   :linenos:
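+
+The contents of ``push_metrics.sh`` are not reproduced here, but the idea of publishing a custom metric can be sketched in Python.
+This is only an illustration under stated assumptions: the helpers ``format_metric_line`` and ``write_metric`` are hypothetical, and the line format (a ``kwollect_custom`` entry carrying a ``_metric_id`` label) is our reading of the **Pushing custom metrics** section of the `kwollect documentation`_, which also gives the actual file to write on a Grid'5000 node.
+
+.. code-block:: python
+
+    import os
+    import tempfile
+    from pathlib import Path
+
+
+    def format_metric_line(metric_id: str, value: float) -> str:
+        # Assumed text format for a kwollect custom metric
+        return f'kwollect_custom{{_metric_id="{metric_id}"}} {value}\n'
+
+
+    def write_metric(prom_file: Path, metric_id: str, value: float) -> None:
+        # Write to a temporary file first, then rename it, so that a reader
+        # never sees a half-written file
+        fd, tmp_name = tempfile.mkstemp(dir=prom_file.parent)
+        with os.fdopen(fd, "w") as tmp:
+            tmp.write(format_metric_line(metric_id, value))
+        os.replace(tmp_name, prom_file)
+
+
+    # Demo in a temporary directory (not the real path used on a node)
+    demo_file = Path(tempfile.mkdtemp()) / "kwollect.prom"
+    write_metric(demo_file, "my_e2clab_metric", 42)
+    print(demo_file.read_text(), end="")
+
+On a node, a loop calling ``write_metric`` followed by ``time.sleep(15)`` would mimic the 15-second update period described above.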
+
+
+Running & Verifying Experiment Execution
+========================================
+
+Run the experiment
+------------------
+
+Use the command below to run this example:
+
+.. code-block:: bash
+
+    e2clab deploy --app_conf short,long
+
+The command runs the whole workflow once for each of the two application configurations, ``long`` and ``short``.
+
+Check metrics in real-time
+--------------------------
+
+During the experiment's execution, you can open the kwollect monitoring dashboard for the Rennes site at https://api.grid5000.fr/stable/sites/rennes/metrics/dashboard and enter the ``Job ID`` corresponding to the deployment of the experiment on Grid’5000.
+
+You can find that job ID in the logs while running the experiment.
+
+.. code-block:: none
+
+    [E2C,KWOLLECT] Access kwollect metric dashboard for job <JOB ID>: https://api.grid5000.fr/stable/sites/rennes/metrics/dashboard
+
+.. figure:: monitoring-kwollect/grafana_kwollect.png
+    :width: 100%
+    :align: center
+
+    Example of visualization on the kwollect dashboard interface
+
+Analyze experiment metrics
+--------------------------
+
+At the end of the deployment, the metrics that you requested are fetched from the kwollect API and saved in a csv file in your result directory, which should look like:
+
+.. code-block:: none
+
+    YYYYMMDD-hhmmss/
+    ├── long/
+    │   ├── monitoring-data/
+    │   │   └── kwollect-data/
+    │   └── workflow-validate.out
+    ├── short/
+    │   └── ...
+    ├── e2clab.err
+    ├── e2clab.log
+    ├── layers_services-validate.yaml
+    └── workflow-validate.out
+
+You will get output data for both the ``long`` and ``short`` configurations of the experiment.
+
+Power consumption
+~~~~~~~~~~~~~~~~~
+
+We provide a simple Python script to visualize the data that was pulled from the kwollect API.
+
+.. note::
+
+    It is best to run the following commands inside a Python virtual environment.
+
+.. code-block:: console
+
+    pip install -r requirements.txt
+    python analysis.py YYYYMMDD-hhmmss/long/monitoring-data/kwollect-data/<site>.yaml
+
+.. figure:: monitoring-kwollect/data_analysis.png
+    :width: 100%
+    :align: center
+
+    Example of power consumption monitoring on Grid’5000 nodes
+
+We can clearly see the rise in power consumption caused by the ``stress`` command.
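+
+Beyond plotting, a quick numerical summary can help compare runs.
+The sketch below is not the ``analysis.py`` script: it averages one metric per node over a list of in-memory records.
+The record keys (``timestamp``, ``device_id``, ``metric_id``, ``value``), the node names and the sample values are all hypothetical; adapt them to the actual layout of your result file.
+
+.. code-block:: python
+
+    from collections import defaultdict
+    from statistics import mean
+
+
+    def mean_by_device(records, metric_id):
+        """Average the values of one metric, grouped by device."""
+        values = defaultdict(list)
+        for rec in records:
+            if rec["metric_id"] == metric_id:
+                values[rec["device_id"]].append(float(rec["value"]))
+        return {device: mean(vals) for device, vals in values.items()}
+
+
+    # Made-up sample data, for illustration only
+    records = [
+        {"timestamp": "2024-01-01T10:00:00", "device_id": "paradoxe-1",
+         "metric_id": "wattmetre_power_watt", "value": 100.0},
+        {"timestamp": "2024-01-01T10:00:01", "device_id": "paradoxe-1",
+         "metric_id": "wattmetre_power_watt", "value": 300.0},
+        {"timestamp": "2024-01-01T10:00:00", "device_id": "paradoxe-2",
+         "metric_id": "wattmetre_power_watt", "value": 150.0},
+        {"timestamp": "2024-01-01T10:00:00", "device_id": "paradoxe-1",
+         "metric_id": "bmc_node_power_watt", "value": 90.0},
+    ]
+
+    print(mean_by_device(records, "wattmetre_power_watt"))
+    # {'paradoxe-1': 200.0, 'paradoxe-2': 150.0}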
+
+.. note::
+
+    To know more about the caveats of power monitoring on Grid’5000, check the following link: https://www.grid5000.fr/w/Unmaintained:Power_Monitoring_Devices#measurement_artifacts_and_pitfalls
+
+Custom metric
+~~~~~~~~~~~~~
+
+We can also check that our custom metric ``my_e2clab_metric`` has been captured by kwollect,
+first by opening the online dashboard and entering ``my_e2clab_metric`` in the ``Metric name`` selection:
+
+.. figure:: monitoring-kwollect/my_e2clab_metric.png
+
+
+We can also check that it was indeed pulled from the API and saved in our experiment results:
+
+.. code-block:: bash
+
+    grep "my_e2clab_metric" YYYYMMDD-hhmmss/long/monitoring-data/kwollect-data/<site>.yaml
+
+.. note::
+
+    To learn more about custom metrics and the timestamp at which they are recorded, check the **Pushing custom metrics** section in the `kwollect documentation`_.
+
+Free the computing resources
+----------------------------
+
+Once you are done, you may kill your job on the Grid’5000 platform using the following command:
+
+.. code-block:: bash
+
+    e2clab destroy
+
+.. _kwollect documentation: https://www.grid5000.fr/w/Monitoring_Using_Kwollect
--- a/docs/examples/monitoring-kwollect/data_analysis.png
+++ b/docs/examples/monitoring-kwollect/data_analysis.png
--- a/docs/examples/monitoring-kwollect/grafana_kwollect.png
+++ b/docs/examples/monitoring-kwollect/grafana_kwollect.png
--- a/docs/examples/monitoring-kwollect/my_e2clab_metric.png
+++ b/docs/examples/monitoring-kwollect/my_e2clab_metric.png
--- a/docs/exp_workflow/index.rst
+++ b/docs/exp_workflow/index.rst
@@ -304,6 +304,8 @@ Also possibile using the ``deploy`` command:
    :language: yaml
    :linenos:

+.. _workflow env:
+
 workflow_env.yaml and application configuration
 -----------------------------------------------


--- a/docs/monitoring/index.rst
+++ b/docs/monitoring/index.rst
@@ -221,6 +221,8 @@ the monitoring profiles in the dashboard through this link
         average: 4


+.. _kwollect metrics:
+
 Monitoring Grid'5000 using Kwollect metrics
 ===========================================


--- a/e2clab/managers/monitoring.py
+++ b/e2clab/managers/monitoring.py
@@ -52,6 +52,7 @@ class MonitoringManager(Manager):

    # TODO: Add different requirements when using TIG / TPG
    SCHEMA = {
+        "title": "Monitoring manager",
        "description": "Definition of the monitoring capabilities",
        "type": "object",
        "properties": {

--- a/e2clab/managers/monitoring_iot.py
+++ b/e2clab/managers/monitoring_iot.py
@@ -36,6 +36,7 @@ class MonitoringIoTManager(Manager):

    _MONITORING_PROFILE = {
        "description": "https://www.iot-lab.info/testbed/resources/monitoring",
+        "title": "FIT IoT-LAB monitoring manager",
        "type": "array",
        "items": {
            "type": "object",

--- a/e2clab/managers/monitoring_kwollect.py
+++ b/e2clab/managers/monitoring_kwollect.py
@@ -40,7 +40,7 @@ class MonitoringKwollectManager(Manager):
    SCHEMA = {
        "$schema": "https://json-schema.org/draft/2019-09/schema",
        "type": "object",
-        "title": "Grid5000 kwollect monitoring Schema",
+        "title": "Grid5000 kwollect monitoring manager",
        "properties": {
            METRICS: {
                "description": "Metrics to pull from job, '[all]' to pull all metrics",

--- a/e2clab/managers/provenance.py
+++ b/e2clab/managers/provenance.py
@@ -33,6 +33,7 @@ class ProvenanceManager(Manager):

    SCHEMA = {
        "description": "Definition of the provenance data capture capabilities",
+        "title": "Provenance manager",
        "type": "object",
        "properties": {
            PROVENANCE_SVC_PROVIDER: {

--- a/e2clab/utils.py
+++ b/e2clab/utils.py
@@ -108,9 +108,9 @@ def load_yaml_file(file: Path) -> dict[str, str]:
    except FileNotFoundError as e:
        raise E2clabFileError(file, "File does not exist") from e
    except yaml.YAMLError as e:
-        raise E2clabFileError(file, "Yaml syntax error in file") from e
+        raise E2clabFileError(file, f"Yaml syntax error in file: {e}") from e
    except Exception as e:
-        raise E2clabFileError(file, "Unknown error") from e
+        raise E2clabFileError(file, f"Unknown error {e}") from e


 def validate_conf(conf_file: Path, type: str) -> bool:

--- a/monitoring-kwollect @ cb7a98ae
+++ b/monitoring-kwollect @ cb7a98ae
-Subproject commit 02fd58a70f7f71bdd17a65722b71416c691c4870
+Subproject commit cb7a98aee64ebec660054fc67c166d5fad927928