Resolve "Add external processes for controlling and removing Scaleway resources"
Description
The web application already uses a scheduler to monitor and remove Scaleway resources (in particular, rogue servers). The security audit revealed two weaknesses:
- If the web application server crashes, there is no mechanism for controlling Scaleway resources or alerting administrators to fix the problem.
- The number of authorized VMs is not limited.
Changelog
Closes #109, Closes #67
- Add a limit on the global number of resources that can be used (an email is sent to the user)
- Add a threshold for sending email alerts to admins once a given number of resources is in use
- Add backend APIs for monitoring and an external program which uses these APIs
- Include config info in emails for the dev and qualif configs. The production config does not display this info in the email.
- Fix email recipients when visio creation fails
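The limit and the alert threshold described above can be read as two independent checks on the number of resources in use. The following is a minimal sketch of that reading, not the code from this MR: the names `QuotaDecision`, `check_quota`, `in_use`, `limit` and `threshold` are hypothetical, and the exact comparison operators are an assumption.

```python
from dataclasses import dataclass


@dataclass
class QuotaDecision:
    allow_creation: bool  # False once the global limit is reached
    alert_admins: bool    # True once the alert threshold is reached


def check_quota(in_use: int, limit: int, threshold: int) -> QuotaDecision:
    """Decide what to do before creating one more resource.

    Hypothetical helper: `in_use` is the current number of resources,
    `limit` the global maximum, `threshold` the admin alert level.
    """
    if threshold > limit:
        raise ValueError("threshold must not exceed limit")
    return QuotaDecision(
        allow_creation=in_use < limit,     # refuse creation at the limit
        alert_admins=in_use >= threshold,  # warn admins before the limit
    )
```

Keeping the two checks separate lets the threshold act as an early warning while creation is still allowed.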
How to test
- Modify the Scaleway config to set your own threshold, create enough visios to reach it, and verify that the email has been sent and contains the dev info.
- You can also change the alert interval if you want to test it.
- Use the external program (monitor.sh) to test the new backend APIs.
Misc
The email alert interval is set in tools/wsgi.py to one hour since the last alert.
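An interval counted "since the last alert" amounts to a simple throttle: an alert is sent only if none was sent within the interval. The sketch below illustrates that idea under stated assumptions. The class `AlertThrottle` and its method are hypothetical and are not the code in tools/wsgi.py.

```python
from datetime import datetime, timedelta
from typing import Optional


class AlertThrottle:
    """Allow at most one alert per interval (hypothetical helper)."""

    def __init__(self, interval: timedelta = timedelta(hours=1)):
        self.interval = interval
        self.last_sent: Optional[datetime] = None  # no alert sent yet

    def should_send(self, now: datetime) -> bool:
        # Send if no alert was ever sent, or the interval has elapsed
        # since the last alert that was actually sent.
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.last_sent = now
            return True
        return False
```

For testing, constructing the throttle with a short interval such as `timedelta(seconds=30)` avoids waiting a full hour.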
Deployment tasks related to this MR:
- modify the qualif/prod configs to add the new parameters (limit, threshold)
- add a scheduled job (e.g., twice per day) that runs the external program to check service availability
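The scheduled availability check can be as small as one HTTP request mapped to a success or failure result. The following Python sketch shows the idea only. It is not monitor.sh, whose contents are not shown here, and the function name `service_is_up` and any URL passed to it are assumptions.

```python
import urllib.error
import urllib.request


def service_is_up(url: str, timeout: float = 10.0,
                  opener=urllib.request.urlopen) -> bool:
    """Return True if `url` answers with a 2xx status.

    `opener` is injectable so the check can be tested without a network;
    the monitoring URL itself is an assumption, not an API from this MR.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, timeouts and DNS failures.
        return False
```

A scheduled job could call `service_is_up` with the monitoring URL and exit with a non-zero status on failure, so that the scheduler can report the outage.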
Edited by HUYNH Kim-Tam