Resolve "Add external processes for controlling and removing Scaleway resources" (!66) · sed-paris / VisioManager

HUYNH Kim-Tam requested to merge 109-health-monitoring into dev Apr 01, 2021

Description

The web application already uses a scheduler to monitor and remove Scaleway resources (in particular, rogue servers). However, the security audit revealed two weaknesses:

- If the web application server crashes, there is no mechanism for controlling Scaleway resources or for alerting administrators so that they can fix it.
- The number of authorized VMs is not limited.

Changelog

Closes #109, Closes #67

- Add a limit on the global number of resources that can be used (the user is notified by email)
- Add a threshold on the number of resources in use; once it is reached, an email alert is sent to the admins
- Add backend APIs for monitoring, and an external program that uses these APIs
- Include config info (dev, qualif) in emails; with the production config, this info is not displayed
- Fix email recipients when visio creation fails
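A minimal Python sketch of how the global limit and the admin alert threshold could interact, assuming both are plain integer counts of resources in use and that the check runs before one more resource is created. `ResourceQuota` and `Decision` are hypothetical names, not the project's code; the field names only mirror the `limit` and `threshold` config parameters, and sending the emails is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """Outcome of checking a new resource request against the quotas (hypothetical)."""
    OK = "ok"
    ALERT_ADMINS = "alert_admins"  # threshold reached: allow creation, caller should email admins
    REFUSE = "refuse"              # global limit reached: refuse creation, caller should email user


@dataclass(frozen=True)
class ResourceQuota:
    # Hypothetical config holder mirroring the "limit" and "threshold" parameters.
    limit: int
    threshold: int

    def __post_init__(self) -> None:
        # The alert threshold only makes sense at or below the hard limit.
        if not 0 < self.threshold <= self.limit:
            raise ValueError("expected 0 < threshold <= limit")

    def check(self, in_use: int) -> Decision:
        """Decide what to do before creating one more resource when `in_use` already exist."""
        if in_use >= self.limit:
            return Decision.REFUSE
        if in_use + 1 >= self.threshold:
            return Decision.ALERT_ADMINS
        return Decision.OK
```

For example, with `limit=10` and `threshold=8`, a request made while 7 resources exist is allowed but triggers an admin alert, and one made while 10 exist is refused.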

How to test

- Modify the Scaleway config to set your own threshold, create enough visios to reach it, and verify that the email has been sent (including the dev info).
- You can also change the alert interval if you want to test it.
- Use the external program (monitor.sh) to test the new backend APIs.
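The MR's external program is a shell script (monitor.sh); the sketch below expresses the same idea in Python. It assumes a hypothetical JSON endpoint at `STATUS_URL` that returns a `servers` count, and a hypothetical `max_servers` bound; the real routes and payload added by this MR are not described here. The exit codes follow the common Nagios-style convention (0 OK, 1 warning, 2 critical) so that a scheduler or monitoring tool can act on them.

```python
import json
import sys
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical endpoint and payload; the actual monitoring routes are not listed in this MR.
STATUS_URL = "http://localhost:8000/api/monitoring/status"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET `url` and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_service(fetch: Callable[[str], dict] = fetch_json,
                  url: str = STATUS_URL,
                  max_servers: int = 10) -> int:
    """Return an exit code: 0 healthy, 1 too many servers, 2 service unreachable."""
    try:
        status = fetch(url)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Covers connection failures, timeouts and malformed JSON.
        print(f"CRITICAL: cannot reach {url}: {exc}", file=sys.stderr)
        return 2
    servers = int(status.get("servers", 0))
    if servers > max_servers:
        print(f"WARNING: {servers} servers running (max {max_servers})", file=sys.stderr)
        return 1
    print(f"OK: {servers} servers running")
    return 0
```

From a script run by cron or a similar scheduler, call `sys.exit(check_service())`. Passing `fetch` as a parameter keeps the decision logic testable without a live server.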

Misc

The email alert interval is set in tools/wsgi.py: an alert is sent at most once every hour, counted from the last alert.
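A minimal sketch of the "one hour since the last alert" rule, assuming the interval is measured from the last alert actually sent. `AlertThrottle` is a hypothetical name and is not the code in tools/wsgi.py; the current time is passed in explicitly to keep the logic testable.

```python
from datetime import datetime, timedelta
from typing import Optional


class AlertThrottle:
    """Hypothetical helper: allow an alert only if `interval` has elapsed since the last one."""

    def __init__(self, interval: timedelta = timedelta(hours=1)) -> None:
        self.interval = interval
        self.last_sent: Optional[datetime] = None

    def should_send(self, now: datetime) -> bool:
        # Suppress the alert while still inside the interval after the previous one.
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return False
        # Record this alert as sent; the next window starts from `now`.
        self.last_sent = now
        return True
```

This keeps the last-sent time in memory, so it resets when the process restarts and is not shared between WSGI worker processes; persisting the timestamp (e.g., in the database) could avoid duplicate alerts in those cases.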

Deployment tasks related to this MR:

- Add the new parameters (limit, threshold) to the qualif/prod configs
- Add a scheduled job (e.g., twice per day) that runs the external program to check service availability

Edited Apr 01, 2021 by HUYNH Kim-Tam
