
Resolve "Add external processes for controlling and removing Scaleway resources"

HUYNH Kim-Tam requested to merge 109-health-monitoring into dev

Description

The web application already uses a scheduler to monitor and remove Scaleway resources (in particular rogue servers). However, the security audit revealed the following weaknesses:

  • If the web application server crashes, there is no mechanism for controlling Scaleway resources or alerting administrators so they can fix the problem.
  • The number of authorized VMs is not limited.

Changelog

Closes #109, Closes #67

  • Add a limit on the global number of resources that can be used (an email is sent to the user when it is reached)
  • Add a threshold: once a certain number of resources is in use, email alerts are sent to admins
  • Add backend APIs for monitoring, and an external program that uses these APIs
  • Include config info (dev, qualif) in emails; this info is not displayed in emails sent with the production config
  • Fix email recipients when visio creation fails
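
The limit and the threshold from the first two changelog items interact roughly as follows. This is a minimal Python sketch, not the code in this MR: `ResourceBounds`, `check_new_resource` and `Decision` are hypothetical names, and it assumes the threshold is compared against the resource count including the one being created.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """Outcome of checking a new resource request against the configured bounds."""
    ALLOW = "allow"
    ALLOW_AND_ALERT_ADMINS = "allow_and_alert_admins"
    REFUSE_AND_NOTIFY_USER = "refuse_and_notify_user"


@dataclass(frozen=True)
class ResourceBounds:
    # Hypothetical fields mirroring the new "limit" and "threshold" config parameters.
    limit: int      # hard cap on the global number of resources
    threshold: int  # admins are alerted once this many resources are in use

    def __post_init__(self) -> None:
        if not 0 < self.threshold <= self.limit:
            raise ValueError("expected 0 < threshold <= limit")


def check_new_resource(in_use: int, bounds: ResourceBounds) -> Decision:
    """Decide what to do when a user asks for one more resource (e.g. a visio server)."""
    if in_use >= bounds.limit:
        # One more would exceed the global limit: refuse and email the user.
        return Decision.REFUSE_AND_NOTIFY_USER
    if in_use + 1 >= bounds.threshold:
        # The new resource reaches (or stays above) the threshold: alert admins.
        return Decision.ALLOW_AND_ALERT_ADMINS
    return Decision.ALLOW
```

With `ResourceBounds(limit=10, threshold=8)`, a request made while 7 resources are in use is accepted but triggers an admin alert, and a request made while 10 are in use is refused.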

How to test

  • Modify the Scaleway config to set your own threshold, create enough visios to reach it, and verify that the email has been sent (including the dev config info)
  • You can also change the alert interval if you want to test it.
  • Use the external program (monitor.sh) to test the new backend APIs.

Misc

The email alert interval is set in tools/wsgi.py to one hour: a new alert is sent only if an hour has passed since the last one.
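
As a rough illustration of this interval, assuming it behaves as a simple "at most one alert per interval since the last one" throttle, a hypothetical `AlertThrottle` class (not taken from tools/wsgi.py) could look like this:

```python
from datetime import datetime, timedelta
from typing import Optional


class AlertThrottle:
    """Allow an admin alert only if `interval` has elapsed since the last alert sent."""

    def __init__(self, interval: timedelta = timedelta(hours=1)) -> None:
        self.interval = interval
        self.last_sent: Optional[datetime] = None

    def should_send(self, now: datetime) -> bool:
        if self.last_sent is None or now - self.last_sent >= self.interval:
            # Record this send so that alerts within the next interval are suppressed.
            self.last_sent = now
            return True
        return False
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the method, keeps the throttle easy to test with a shortened interval, which matches the suggestion to change the interval when testing.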

Deployment-side tasks related to this MR:

  • Update the qualif/prod configs with the new parameters (limit, threshold)
  • Add a scheduled job (e.g., twice per day) that runs this external program to check service availability
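
The actual external program is monitor.sh. For illustration only, here is a Python sketch of the kind of check it could perform. The endpoint URL, the JSON fields `servers` and `limit`, and all function names are hypothetical, since the MR does not document the monitoring API. The key point is that the check runs outside the web application, so an unreachable server is itself reported as a failure through a non-zero exit code.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Hypothetical endpoint: the monitoring API paths are not documented in this MR.
STATUS_URL = "https://visio.example.org/api/monitoring/status"


def fetch_status(url: str, timeout: float = 10.0) -> dict:
    """GET the monitoring endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_service(fetch: Callable[[str], dict], url: str = STATUS_URL) -> Tuple[int, str]:
    """Return (exit_code, message); a non-zero code lets the scheduler flag a failure."""
    try:
        status = fetch(url)
    except (OSError, ValueError) as exc:
        # URLError and timeouts are OSErrors; bad JSON raises ValueError.
        # An unreachable web application is exactly the case the audit flagged.
        return 2, f"CRITICAL: monitoring API unreachable ({exc})"
    # Hypothetical response shape: {"servers": <int>, "limit": <int>}
    servers, limit = status.get("servers", 0), status.get("limit", 0)
    if limit and servers > limit:
        return 1, f"WARNING: {servers} servers running, limit is {limit}"
    return 0, f"OK: {servers}/{limit} servers"


def main() -> int:
    code, message = check_service(fetch_status)
    print(message)
    return code
```

A scheduled job would run this with `sys.exit(main())`. Injecting `fetch` keeps the decision logic testable without network access, and the distinct exit codes (2 for an unreachable API, 1 for too many servers) let a wrapper decide how to alert administrators.
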
Edited by HUYNH Kim-Tam
