Commit c4550349 authored by LEPAGE Gaetan

NEW VERSION: now a single script for the backend

parent 6d53e4c2
Showing with 187 additions and 644 deletions
# Name for your project
project_name: rl_hm
# Inria username
username: galepage
# Name of your Inria workstation
pc_name: alya
# Location of the project on the remote computer
project_remote_path: /scratch/alya/galepage/.remi_projects/rl_hm
# Bastion used to ssh into Inria resources
bastion:
  hostname: bastion.inrialpes.fr
  username: galepage
# Desktop background jobs
background:
  # Which backend to use (`screen` or `tmux`)
  backend: screen
  # Whether to keep the session alive after the job has ended.
  # It lets you attach to the session to see the program output.
  # If 'false', the session will be closed when the job is over and stdout/stderr will be lost.
  # CAUTION: If true, you will have to manually re-attach and close the session.
  keep_session_alive: false
# Virtual environment
virtual_env:
  # Enable the virtual environment
  enabled: false
  # Which virtual environment backend to use (`conda` or `virtualenv`)
  type: virtualenv
  # For `virtualenv` or `conda` virtual environments, you can specify a custom path.
  path: venv/
  # The name of your virtual environment (for `conda` environments)
  name: my_conda_env
  # For `conda` environments, path to a `yaml` configuration file
  conda_env_file: environment.yaml
  # For `conda` environments, you may specify a python version
  python_version: 3.9
# Singularity container options
singularity:
  # The name of the 'recipe' file (`.def`) to build the singularity container.
  def_file_name: container.def
  # The name of the singularity image.
  output_sif_name: container.sif
  # A dictionary of binds for the singularity container.
  # If the value is empty (''), the mount point is the same as the path on the host.
  # By default, the project folder is bound within the singularity container: this configuration
  # then allows you to add extra locations.
  # Example:
  #   /path_on_host/my_data: /path_in_container/my_data
  bindings:
# Oarsub options (for more details on `oarsub`, please refer to
# https://oar.imag.fr/docs/latest/user/commands/oarsub.html).
oarsub:
  # Job name
  job_name: rl_hm
  # Number of cpus requested.
  num_cpus: 1
  # Number of cpu cores requested.
  # If the value is 0, all the cores of the requested cpus will be used.
  num_cpu_cores: 0
  # Number of GPUs requested.
  # If the value is 0, no GPU will be requested (CPU only).
  num_gpus: 1
  # The maximum allowed duration for your job.
  walltime: '72:00:00'
  # The name of the requested cluster (perception, mistis, thoth...)
  cluster_name: perception
  # Optionally specify the id of a specific node (gpu3, node2...)
  host_id:
  # If the options above are too restrictive for your use case, you may
  # directly provide a property list that will be passed to `oarsub` with the
  # `-p` flag.
  custom_property_query:
  # Whether to schedule the job in the besteffort queue.
  besteffort: true
  # Whether to set the job as idempotent (see the oarsub documentation for more details).
  idempotent: false
# Remote servers
# Remote servers are applications that run on a remote computer and can be accessed from your local
# browser thanks to remi.
# Two such servers are supported right now:
# - Jupyter notebook
# - TensorBoard
remote_servers:
  # The command to run for opening the local browser (`<browser_cmd> <url>`)
  browser_cmd: firefox
  # Jupyter notebook
  jupyter:
    # The port (local and remote) for the server
    port: 8080
    # If true, automatically open the jupyter notebook in the local browser.
    open_browser: true
  # TensorBoard
  tensorboard:
    # The port (local and remote) for TensorBoard
    port: 9090
    # Directory from where to run tensorboard.
    logdir: 'output/'
    # If true, automatically open TensorBoard in the local browser.
    open_browser: true
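For reference, here is a minimal sketch of reading this configuration with PyYAML; the file location `.remi/config.yaml` is an assumption for illustration, not something defined in this commit:

```python
# Hypothetical snippet: load the remi configuration shown above and inspect
# the requested OAR resources (the file path is assumed, not defined here).
import yaml

with open('.remi/config.yaml') as config_file:
    config: dict = yaml.safe_load(config_file)

print(config['oarsub']['num_gpus'])   # 1
print(config['oarsub']['walltime'])   # 72:00:00
```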
.remi
output/
notebooks/
.git
__pycache__
.ipynb_checkpoints
logs
.envrc
.DS_Store
.*.swp
*.egg-info/
**/__pycache__/
.idea
.mypy_cache/
venv/
*.sif
build-temp-*
venv/
@@ -6,23 +6,21 @@ Link: [robotlearn.gitlabpages.inria.fr/cluster-monitor](https://robotlearn.gitlabpages.inria.fr/cluster-monitor)
## Implementation overview
The cluster monitor counts three entities:
- The **data fetcher** is running directly on an inria workstation (currently `alya`). It is
performing `ssh` commands to other nodes (especially `access1-cp`) to gather the cluster state.\
**Code:** `rl_hm/data_fetcher/`
- The **backend server** is running on a Linux server external to Inria and is receiving the **data
fetcher** updates through a TCP socket connection running over an SSH tunnel.
It exposes a [Socket.IO](https://socket.io/) server to the web clients that connect to it.
As soon as it receives an update from the **data fetcher**, it pushes it to all of the connected
clients through the _socket-io_ connection.\
**Code:** `rl_hm/backend/`
- Finally, the web clients run a javascript application that connects to the _socket-io_ server
(**backend**) and updates the html page with the reveived data.\
The cluster monitor comprises two entities:
- The **backend** server runs on an Inria machine (`perception.inrialpes.fr`).
It performs `ssh` commands to other nodes (especially `access1-cp`) to gather the cluster
state.\
It also exposes a [Socket.IO](https://socket.io/) server to the web clients that connect to it,
and pushes the cluster information to all of the connected clients through the _socket-io_
connection (a minimal client sketch is given below).\
**Code:** `backend/`
- The **frontend** is a JavaScript application that connects to the _socket-io_ server
(**backend**) and updates the HTML page with the received data.\
**Code:** `public/`
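For illustration, here is a minimal sketch of a client receiving these updates with the `python-socketio` client library. The hostname, the port `8888` and the `update` event name are taken from the backend code in this commit, but treat the snippet as an assumption rather than an official client:

```python
# Minimal sketch: subscribe to the backend's Socket.IO 'update' events.
# Assumes the backend from this commit is reachable on port 8888.
import socketio

sio = socketio.Client()

@sio.on('update')
def on_update(data: dict) -> None:
    # 'data' is the cluster/workstations state dictionary pushed by the backend.
    print(f"received an update with {len(data)} top-level keys")

sio.connect('http://perception.inrialpes.fr:8888')
sio.wait()  # keep listening for pushed updates
```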
## Acknowlegment
## Acknowledgment
- **David Emukpere:** for his extensive advice and help with web development and infrastructure.
- [**Tanguy Lepage:**](https://tanguylepage.com/) for the front-end HTML/CSS design.
- [**Tanguy Lepage:**](https://tanguylepage.com/) for the frontend HTML/CSS design.
- [**Anand Ballou:**](https://team.inria.fr/robotlearn/team-members/anand-ballou/) for his advice
and help with the cluster data fetching.
File moved
File moved
@@ -31,7 +31,7 @@ USERNAMES: list[str] = [
'dmeng',
'wguo',
'xbie',
'lgomezca',
'adupless',
'bbasavas',
]
@@ -58,11 +58,13 @@ def _run_remote_command_and_fetch_dict(update_cmd: list[str]) -> dict:
update_cmd = ['ssh', 'access1-cp'] + update_cmd
try:
cmd_output: subprocess.CompletedProcess = subprocess.run(args=update_cmd,
check=False,
capture_output=True,
text=True,
timeout=TIMEOUT)
cmd_output: subprocess.CompletedProcess = subprocess.run(
args=update_cmd,
check=False,
capture_output=True,
text=True,
timeout=TIMEOUT
)
return_code: int = cmd_output.returncode
stdout: str = cmd_output.stdout
@@ -128,9 +130,10 @@ def _fetch_and_parse_oarnodes_dict() -> tuple[dict, list[str]]:
if gpu_device not in job['gpu_ids']:
job['gpu_ids'].append(gpu_device)
else:
node_dict['running_jobs'][job_id] = {'nb_cores': 1,
'gpu_ids': [gpu_device]
}
node_dict['running_jobs'][job_id] = {
'nb_cores': 1,
'gpu_ids': [gpu_device]
}
jobs_list.append(job_id)
@@ -169,9 +172,11 @@ def _fetch_and_parse_oarstat_dict(jobs_list: list[str]) -> dict:
start_time: datetime = datetime.fromtimestamp(job_dict['startTime'])
hours, mins, secs = _extract_walltime(job_dict['message']).split(':')
walltime: timedelta = timedelta(hours=int(hours),
minutes=int(mins),
seconds=int(secs))
walltime: timedelta = timedelta(
hours=int(hours),
minutes=int(mins),
seconds=int(secs)
)
max_time: datetime = start_time + walltime
@@ -225,7 +230,9 @@ def update() -> dict:
jobs_dict: dict = _fetch_and_parse_oarstat_dict(jobs_list=jobs_list)
cluster_dict: dict = _merge(nodes_dict=nodes_dict,
jobs_dict=jobs_dict)
cluster_dict: dict = _merge(
nodes_dict=nodes_dict,
jobs_dict=jobs_dict
)
return cluster_dict
import socket
import json
#!/usr/bin/env python3
from logging import Logger, getLogger
import time
from datetime import datetime
from rl_hm._logging import init_logger
from backend._logging import init_logger
from backend import cluster
import socketio # type: ignore
import eventlet # type: ignore
@@ -12,46 +15,56 @@ eventlet.monkey_patch()
init_logger()
LOGGER: Logger = getLogger('rl_hm.web_app')
DATA_FETCHER_PORT: int = 9999
REFRESH_TIME: int = 1
DATETIME_FORMAT: str = '%m/%d/%Y-%H:%M:%S'
SOCKET_IO_PORT: int = 8888
# 16_384
MAX_MESSAGE_LENGTH: int = 2 ** 14
socket_io: socketio.Server = socketio.Server(async_mode='eventlet',
cors_allowed_origins='*')
socket_io: socketio.Server = socketio.Server(
async_mode='eventlet',
cors_allowed_origins='*'
)
num_connected_clients: int = 0
data_fetcher_socket: socket.socket = socket.socket(family=socket.AF_INET,
type=socket.SOCK_STREAM)
def _update_loop(socket_io: socketio.Server) -> None:
def _update_loop(socket_io: socketio.Server,
data_fetcher_socket: socket.socket) -> None:
LOGGER.info("connecting to data fetcher socket on port %i", DATA_FETCHER_PORT)
data_fetcher_socket.connect(('localhost', DATA_FETCHER_PORT))
LOGGER.info("succesfully connected to data fetcher")
# Tracking
init_time: datetime = datetime.now()
step_counter: int = 1
while True:
received_message_bytes: bytes = data_fetcher_socket.recv(MAX_MESSAGE_LENGTH)
decoded_string: str = received_message_bytes.decode()
try:
payload_dict = json.loads(decoded_string)
except json.decoder.JSONDecodeError as json_exception:
# if the received data is corrupted, just ignore this packet
LOGGER.error("JSONDecodeError while parsing received payload: %s", str(json_exception))
LOGGER.error("received payload: %s", decoded_string)
LOGGER.info("%i clients are currently connected", num_connected_clients)
if num_connected_clients > 0:
LOGGER.info("--> updating")
LOGGER.info("step n°%i", step_counter)
LOGGER.info("started: %s", init_time.strftime(DATETIME_FORMAT))
# If the update fails, just ignore it and try again next time.
try:
payload_dict: dict = cluster.update()
# payload_dict: dict = {}
socket_io.emit(
event='update',
data=payload_dict
)
continue
except Exception as exception:
LOGGER.error("Exception:", exc_info=exception)
source: str = payload_dict.pop('source')
LOGGER.info("Received update for '%s'", source)
# Optionally wait before updating again
if REFRESH_TIME > 0:
LOGGER.info("waiting %is before updating again", REFRESH_TIME)
time.sleep(REFRESH_TIME)
# TODO use Socket.IO rooms to emit update
socket_io.emit(event='update',
data=payload_dict)
step_counter += 1
@socket_io.on('connect')
@@ -63,7 +76,6 @@ def callback_connect(*args) -> None:
if num_connected_clients == 1:
LOGGER.info("This is the first client: starting to fetch cluster updates")
data_fetcher_socket.sendall('cluster_start'.encode())
@socket_io.on('disconnect')
@@ -75,20 +87,24 @@ def callback_disconnect(*args) -> None:
if num_connected_clients == 0:
LOGGER.info("This was the last client: interrupting cluster updates fetching")
data_fetcher_socket.sendall('cluster_stop'.encode())
def main() -> None:
# Run the update loop that gets updates from the data fetcher.
socket_io.start_background_task(target=_update_loop,
socket_io=socket_io,
data_fetcher_socket=data_fetcher_socket)
socket_io.start_background_task(
target=_update_loop,
socket_io=socket_io
)
LOGGER.info("Starting WSGIApp")
app: socketio.ASGIApp = socketio.WSGIApp(socket_io)
eventlet.wsgi.server(eventlet.listen(('0.0.0.0', SOCKET_IO_PORT)),
app)
eventlet.wsgi.server(
eventlet.listen(
('0.0.0.0', SOCKET_IO_PORT)
),
app
)
if __name__ == '__main__':
#!/bin/sh
rsync -rav \
--delete \
--exclude "*.mypy_cache/" \
--exclude "venv/" \
--exclude ".git/" \
--exclude "*/__pycache__/" \
. alya:~/rl_hm_2/
# . server:~/rl_hm_2/
# Resources
## Deployment
- [GitLab Pages](https://docs.gitlab.com/ee/user/project/pages/)
## Flask and websockets
- [Doc Flask](https://flask.palletsprojects.com/en/2.0.x/)
- [Doc Flask-SocketIO](https://flask-socketio.readthedocs.io/en/latest/index.html)
- [WebSockets in Python](https://www.fullstackpython.com/websockets.html)
- [Building apps using Flask-SocketIO and JavaScript Socket.IO](https://medium.com/@abhishekchaudhary_28536/building-apps-using-flask-socketio-and-javascript-socket-io-part-1-ae448768643)
- [Implementation of WebSocket using Flask Socket IO in Python](https://www.includehelp.com/python/implementation-of-websocket-using-flask-socket-io-in-python.aspx)
- [Implement a WebSocket Using Flask and Socket-IO(Python)](https://medium.com/swlh/implement-a-websocket-using-flask-and-socket-io-python-76afa5bbeae1)
- [Easy WebSockets with Flask and Gevent](https://blog.miguelgrinberg.com/post/easy-websockets-with-flask-and-gevent)
- [Flask-SocketIO, Background Threads , Jquery, Python Demo](https://timmyreilly.azurewebsites.net/flask-socketio-and-more/)
- [Flask-socketio: Emitting from background thread to the second room blocks the first room](https://bleepcoder.com/flask-socketio/383418577/emitting-from-background-thread-to-the-second-room-blocks)
## General Web development
- [JavaScript basics](https://developer.mozilla.org/en-US/docs/Learn/Getting_started_with_the_web/JavaScript_basics)
- [OpenClassrooms JavaScript](https://openclassrooms.com/fr/courses/5664271-learn-programming-with-javascript)
- [OpenClassrooms HTML/CSS](https://openclassrooms.com/fr/courses/1603881-apprenez-a-creer-votre-site-web-avec-html5-et-css3/1604361-creez-votre-premiere-page-web-en-html)
eventlet
python-socketio
termcolor
#!/usr/bin/env python3
from multiprocessing import Process, Queue
import time
from datetime import datetime
import logging
import socket
import json
from typing import Callable
from rl_hm._logging import init_logger
from rl_hm.data_fetcher import workstations, cluster
init_logger()
WS_REFRESH_TIME: int = 5
CLUSTER_REFRESH_TIME: int = 0
DATETIME_FORMAT: str = '%m/%d/%Y-%H:%M:%S'
HOST: str = 'localhost'
PORT: int = 9999
def _update_loop(update_fn: Callable,
queue: Queue,
socket_connection: socket.socket,
task_name: str,
refresh_time: int = 0) -> None:
# Wait for the 'start' instruction before running the loop
while queue.get() != 'start':
pass
# Tracking
init_time: datetime = datetime.now()
step_counter: int = 1
logger: logging.Logger = logging.getLogger('rl_hm.data_fetcher - ' + task_name)
while True:
# If a 'stop' message was received (no client connected anymore), pause the loop
if not queue.empty():
message: str = queue.get()
if message == 'stop':
# Wait for the 'start' instruction to resume
while queue.get() != 'start':
pass
logger.info("--> updating")
logger.info("step n°%i", step_counter)
logger.info("started: %s", init_time.strftime(DATETIME_FORMAT))
# If the update fails, just ignore it and try again next time.
try:
update_dict: dict = update_fn()
update_dict['source'] = task_name
data: str = json.dumps(update_dict)
logger.debug('len(data) = %i', len(data))
data_bytes: bytes = data.encode()
logger.debug('len(data_bytes) = %i', len(data_bytes))
socket_connection.sendall(data_bytes)
except Exception as exception:
logger.error("Exception:", exc_info=exception)
continue
# Optionally wait before updating again
if refresh_time > 0:
logger.info("waiting %is before updating again", refresh_time)
time.sleep(refresh_time)
step_counter += 1
def main() -> None:
logger: logging.Logger = logging.getLogger('rl_hm.data_fetcher')
# Create a socket (SOCK_STREAM means a TCP socket)
sock: socket.socket = socket.socket(family=socket.AF_INET,
type=socket.SOCK_STREAM)
logger.info("Binding TCP socket on port %i", PORT)
not_bound: bool = True
while not_bound:
try:
sock.bind((HOST, PORT))
not_bound = False
except OSError:
# Wait before retrying
logger.warning("Port seems to be busy. Retrying...")
time.sleep(1)
# Listen for client
logger.info("Listening for connections")
sock.listen()
socket_connection: socket.socket
addr: tuple[str, int]
logger.info("Waiting for the web app to connect to socket...")
socket_connection, addr = sock.accept()
logger.info("Got connection. Addr: %s, Port: %i", addr[0], addr[1])
ws_queue: Queue = Queue()
cluster_queue: Queue = Queue()
with socket_connection:
# Create two processes: workstations and cluster
ws_process: Process = Process(target=_update_loop,
args=(workstations.update,
ws_queue,
socket_connection,
'workstations',
WS_REFRESH_TIME))
cluster_process: Process = Process(target=_update_loop,
args=(cluster.update,
cluster_queue,
socket_connection,
'cluster',
CLUSTER_REFRESH_TIME))
ws_process.start()
cluster_process.start()
while True:
received_message_bytes: bytes = socket_connection.recv(10000)
decoded_string: str = received_message_bytes.decode()
if decoded_string == 'cluster_start':
logger.info('Starting cluster updates')
cluster_queue.put('start')
elif decoded_string == 'cluster_stop':
logger.info('Pausing cluster updates')
cluster_queue.put('stop')
elif decoded_string == 'ws_start':
logger.info('Starting workstations updates')
ws_queue.put('start')
elif decoded_string == 'ws_stop':
logger.info('Pausing workstations updates')
ws_queue.put('stop')
else:
logger.warning("Unhandled message: %s", decoded_string)
if __name__ == '__main__':
main()
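For illustration, here is a hedged sketch of the control protocol implemented by the data fetcher above: a single TCP client (the backend) connects on port 9999, sends the `cluster_start`/`cluster_stop` (or `ws_start`/`ws_stop`) control strings, and receives JSON payloads tagged with a `source` key. The helper name is hypothetical and the single `recv()` is a simplification (large payloads may need to be read in a loop):

```python
# Hypothetical consumer of the data fetcher's TCP socket (sketch only).
import json
import socket


def request_one_cluster_update(host: str = 'localhost', port: int = 9999) -> dict:
    with socket.create_connection((host, port)) as connection:
        # Ask the fetcher to start producing cluster updates...
        connection.sendall('cluster_start'.encode())
        # ...read one JSON payload (may be truncated for very large states)...
        payload: str = connection.recv(2 ** 14).decode()
        # ...and pause the updates again.
        connection.sendall('cluster_stop'.encode())
    return json.loads(payload)
```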
"""
TODO
"""
# type: ignore
import json
import xml.etree.ElementTree as ET # noqa: N817
from typing import Any
def parse_cpu_output(lscpu_output: str,
mpstat_output: str) -> dict[str, Any]:
cpu_dict: dict[str, Any] = {}
lscpu_fields_list: list[dict] = json.loads(lscpu_output)['lscpu']
for item in lscpu_fields_list:
if item['field'] == 'CPU(s):':
cpu_dict['num_threads'] = int(item['data'])
elif item['field'] == 'Core(s) per socket:':
cores_per_socket: int = int(item['data'])
elif item['field'] == 'Socket(s):':
cpu_dict['num_sockets'] = int(item['data'])
elif item['field'] == 'Model name:':
cpu_dict['model_name'] = item['data']
elif item['field'] == 'CPU MHz:':
cpu_dict['current_freq'] = float(item['data'])
elif item['field'] == 'CPU max MHz:':
cpu_dict['max_freq'] = float(item['data'])
elif item['field'] == 'CPU min MHz:':
cpu_dict['min_freq'] = float(item['data'])
cpu_dict['num_cores'] = cpu_dict['num_sockets'] * cores_per_socket
# print(cpu_dict)
mpstat_list: list[dict] = \
json.loads(mpstat_output)['sysstat']['hosts'][0]['statistics'][0]['cpu-load']
cpu_dict['cores'] = {}
for cpu_stat in mpstat_list:
usage: float = round(cpu_stat['usr'] + cpu_stat['sys'], 2)
if cpu_stat['cpu'] == 'all':
cpu_dict['global_usage'] = usage
else:
cpu_dict['cores'][cpu_stat['cpu']] = usage
return cpu_dict
def parse_free_output(free_output: str) -> dict[str, Any]:
ram_dict: dict[str, Any] = {}
# print(free_output)
output_lines: list[str] = free_output.splitlines()
mem_list: list[str] = output_lines[1].split()
ram_dict['memory'] = {
'total': mem_list[1],
'used': mem_list[2],
'free': mem_list[3]
}
swap_list: list[str] = output_lines[2].split()
ram_dict['swap'] = {
'total': swap_list[1],
'used': swap_list[2],
'free': swap_list[3]
}
return ram_dict
def parse_nvidia_output(nvidia_output: str) -> dict[str, Any]:
gpu_dict: dict = {}
# print(nvidia_output)
tree_root: ET.Element = ET.fromstring(nvidia_output)
gpu_dict['driver_version'] = tree_root.find('driver_version').text # type: ignore
gpu_dict['cuda_version'] = tree_root.find('cuda_version').text # type: ignore
gpu_dict['num_gpus'] = int(tree_root.find('attached_gpus').text) # type: ignore
gpu_dict['gpus'] = {}
for gpu_id, gpu in enumerate(tree_root.findall('gpu')):
utilization_node: ET.Element = gpu.find('utilization') # type: ignore
memory_node: ET.Element = gpu.find('fb_memory_usage') # type: ignore
temperature_node: ET.Element = gpu.find('temperature') # type: ignore
gpu_dict['gpus'][gpu_id] = {
'model_name': gpu.find('product_name').text, # type: ignore
'memory': {
'total': memory_node.find('total').text, # type: ignore
'used': memory_node.find('used').text, # type: ignore
'free': memory_node.find('free').text, # type: ignore
'util': utilization_node.find('memory_util').text.replace(' %', '') # type: ignore
},
'fan': gpu.find('fan_speed').text, # type: ignore
'usage': utilization_node.find('gpu_util').text.replace(' %', ''), # type: ignore
'temp': int(temperature_node.find('gpu_temp').text.replace(' C', '')) # type: ignore
}
assert len(gpu_dict['gpus']) == gpu_dict['num_gpus']
return gpu_dict
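As a quick sanity check of the parsers above, here is a usage sketch for `parse_free_output` with made-up `free -h` output (the numbers are illustrative only):

```python
# Illustrative only: the dictionary shape produced by parse_free_output.
sample_free_output = (
    "              total        used        free      shared  buff/cache   available\n"
    "Mem:           62Gi        10Gi        40Gi       1.0Gi        12Gi        50Gi\n"
    "Swap:          15Gi          0B        15Gi\n"
)
print(parse_free_output(sample_free_output))
# {'memory': {'total': '62Gi', 'used': '10Gi', 'free': '40Gi'},
#  'swap': {'total': '15Gi', 'used': '0B', 'free': '15Gi'}}
```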
matplotlib
numpy
import csv
import subprocess
from datetime import datetime
from multiprocessing import Pool
from collections import namedtuple
from socket import gethostname
import logging
from . import parsing
USERNAME: str = 'aballou'
TIMEOUT: int = 20
COMMANDS: list[str] = [
'lscpu -J',
'mpstat -o JSON -P ALL',
'free -h',
'nvidia-smi -q -x'
]
Workstation: namedtuple = namedtuple('Workstation',
['hostname', 'office', 'user'])
def _register_workstations() -> list[Workstation]:
ws_list: list[Workstation] = []
with open('workstations.csv', newline='') as csv_workstation_list:
csv_reader = csv.reader(csv_workstation_list, delimiter=',', quotechar='|')
for row in csv_reader:
hostname, office, user = row
ws_list.append(Workstation(hostname=hostname,
office=office,
user=user))
return ws_list
def _get_update_cmd(hostname: str) -> list[str]:
update_cmd: list[str] = []
# No need to ssh on the current workstation.
if hostname != gethostname():
update_cmd = ['ssh', hostname,
'export', 'PATH=$HOME/.local/bin:$PATH;']
else:
update_cmd = ['bash', '-c']
update_cmd.append('')
for cmd in COMMANDS:
update_cmd[-1] += f"echo '>>>{cmd.split()[0]}' && {cmd} && "
# Remove the unnecessary trailing ' && '
update_cmd[-1] = update_cmd[-1][:-4]
return update_cmd
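To make the command construction concrete, this is approximately what `_get_update_cmd` returns for a hypothetical remote workstation named `gpu7` (hostname chosen for illustration):

```python
# Approximate return value of _get_update_cmd(hostname='gpu7'), given COMMANDS above:
['ssh', 'gpu7', 'export', 'PATH=$HOME/.local/bin:$PATH;',
 "echo '>>>lscpu' && lscpu -J && "
 "echo '>>>mpstat' && mpstat -o JSON -P ALL && "
 "echo '>>>free' && free -h && "
 "echo '>>>nvidia-smi' && nvidia-smi -q -x"]
```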
def _fetch_state(workstation: Workstation) -> dict:
state_dict: dict = {
'office': workstation.office,
'user': workstation.user,
}
timestamp = datetime.fromtimestamp(0)
logger: logging.Logger = logging.getLogger(__name__ + f".{workstation.hostname}")
# Run the command
try:
cmd_output: subprocess.CompletedProcess = subprocess.run(
args=_get_update_cmd(hostname=workstation.hostname),
check=False,
capture_output=True,
text=True,
timeout=TIMEOUT)
return_code: int = cmd_output.returncode
stdout: str = cmd_output.stdout
# stderr: str = cmd_output.stderr
if return_code > 0:
logger.error("Update command failed with return code %i", return_code)
# print("\nstdout:\n", stdout)
# print("\nstderr:\n", stderr)
elif return_code == 0:
logger.info("Update was succesfull")
# Split the output
output_list: list[str] = stdout.splitlines()
output_dict: dict[str, str] = {}
command: str = ''
for line in output_list:
if line.startswith('>>>'):
command = line[3:]
output_dict[command] = ''
else:
output_dict[command] += line + '\n'
# Parse each section's output
state_dict['cpu'] = parsing.parse_cpu_output(lscpu_output=output_dict['lscpu'],
mpstat_output=output_dict['mpstat'])
state_dict['ram'] = parsing.parse_free_output(free_output=output_dict['free'])
# TODO this code is unreachable (we are in the case where return_code == 0)
# `nvidia-smi` can occasionally fail.
# This should not prevent the other information from being parsed.
if return_code in [9, 255]:
state_dict['gpu'] = {
'error_code': return_code,
'stderr': output_dict['nvidia-smi']
}
else:
state_dict['gpu'] = \
parsing.parse_nvidia_output(nvidia_output=output_dict['nvidia-smi'])
timestamp = datetime.now()
except subprocess.TimeoutExpired:
logger.warning("Update command for has timed out")
state_dict['last_updated'] = timestamp.strftime('%m/%d/%Y-%H:%M:%S')
state_dict = {workstation.hostname: state_dict}
return state_dict
def update() -> dict:
ws_list: list[Workstation] = _register_workstations()
with Pool(len(ws_list)) as pool:
result_list: list[dict] = pool.map(_fetch_state, ws_list)
state_dict: dict = {}
for result in result_list:
state_dict.update(result)
return state_dict
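For reference, `update()` returns a dictionary keyed by hostname; here is a hedged sketch of its shape for one hypothetical workstation, with field names taken from `_fetch_state` and the parsing helpers (all values are placeholders):

```python
# Approximate shape of update()'s return value (placeholder values).
{
    'alya': {
        'office': '<office>',
        'user': '<user>',
        'cpu': {...},   # from parsing.parse_cpu_output
        'ram': {...},   # from parsing.parse_free_output
        'gpu': {...},   # from parsing.parse_nvidia_output (or an error_code entry)
        'last_updated': '<MM/DD/YYYY-HH:MM:SS>',
    },
}
```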
#!/bin/sh
export PYTHONPATH=.
python3 cluster_monitor/main.py
#!/bin/sh
tmux
python rl_hm/web_app/app.py &
python -m http.server 8000 --directory rl_hm/web_app/static &
trap 'echo signal received!; kill $(jobs -p); wait;' SIGINT SIGTERM
wait
# TODO
- [x] Make cluster and workstations updates mutually asynchronous.
- [x] Have two independent concurrent loops for updates
- [x] Send two distinct packages
## Web app
- [ ] CAS Inria --> Actually, [GitLab pages](https://docs.gitlab.com/ee/user/project/pages/)
should be perfect to host the frontend.
- [ ] Display the packet timestamp in the web app
- [ ] Webapp startup script\
Tunnel command: `ssh -N -L 9999:localhost:9999 alya`
- [ ] Favicon
### Resources
- Deployment
- [GitLab Pages](https://docs.gitlab.com/ee/user/project/pages/)
- Flask and websockets
- [Doc Flask](https://flask.palletsprojects.com/en/2.0.x/)
- [Doc Flask-SocketIO](https://flask-socketio.readthedocs.io/en/latest/index.html)
- [WebSockets in Python](https://www.fullstackpython.com/websockets.html)
- [Building apps using Flask-SocketIO and JavaScript Socket.IO](https://medium.com/@abhishekchaudhary_28536/building-apps-using-flask-socketio-and-javascript-socket-io-part-1-ae448768643)
- [Implementation of WebSocket using Flask Socket IO in Python](https://www.includehelp.com/python/implementation-of-websocket-using-flask-socket-io-in-python.aspx)
- [Implement a WebSocket Using Flask and Socket-IO(Python)](https://medium.com/swlh/implement-a-websocket-using-flask-and-socket-io-python-76afa5bbeae1)
- [Easy WebSockets with Flask and Gevent](https://blog.miguelgrinberg.com/post/easy-websockets-with-flask-and-gevent)
- [Flask-SocketIO, Background Threads , Jquery, Python Demo](https://timmyreilly.azurewebsites.net/flask-socketio-and-more/)
- [Flask-socketio: Emitting from background thread to the second room blocks the first room](https://bleepcoder.com/flask-socketio/383418577/emitting-from-background-thread-to-the-second-room-blocks)
- General Web development
- [JavaScript basics](https://developer.mozilla.org/en-US/docs/Learn/Getting_started_with_the_web/JavaScript_basics)
- [OpenClassrooms JavaScript](https://openclassrooms.com/fr/courses/5664271-learn-programming-with-javascript)
- [OpenClassrooms HTML/CSS](https://openclassrooms.com/fr/courses/1603881-apprenez-a-creer-votre-site-web-avec-html5-et-css3/1604361-creez-votre-premiere-page-web-en-html)