Fix server time-out issue with the slurm_semiglobal scheduler
Because server jobs were initialized with a `RUNNING` state, the launcher expected life signals from the server within `2 * server_ping_interval` seconds of its own start. As a result, a time-out was detected even when the server job had not actually started running yet. Nothing guarantees that the server job starts before this delay expires, since its start time depends on GPU availability on the cluster.
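A minimal sketch of the failure mode, assuming a time-out rule of the form "a `RUNNING` server is timed out once it has been silent for more than `2 * server_ping_interval` seconds". The `JobState` enum and the `server_timed_out` helper are hypothetical illustrations, not the launcher's actual code, and the strict `>` comparison is an assumption.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical subset of job states, for illustration only.
    WAITING = "waiting"
    RUNNING = "running"


def server_timed_out(state: JobState, last_signal: float, now: float,
                     server_ping_interval: float) -> bool:
    """Hypothetical check: only a RUNNING server is expected to send life
    signals, and it times out after 2 * server_ping_interval of silence
    since its last signal (or since the reference time it started with)."""
    if state is not JobState.RUNNING:
        return False
    return now - last_signal > 2 * server_ping_interval


# Old behaviour: the server job starts out RUNNING with the launcher start
# time (t = 0) as its reference, so a job still queued for a GPU at t = 25
# with a 10 s ping interval is already reported as timed out.
print(server_timed_out(JobState.RUNNING, last_signal=0.0, now=25.0,
                       server_ping_interval=10.0))  # True

# With a WAITING initial state, the same situation is not a time-out.
print(server_timed_out(JobState.WAITING, last_signal=0.0, now=25.0,
                       server_ping_interval=10.0))  # False
```

The only difference between the two calls is the initial state, which is exactly what this MR changes.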
This MR solves the issue by using `WAITING` as the initial job state instead of `RUNNING`. In addition, the server job state is no longer monitored through the scheduler function `_update_jobs_impl`; it is monitored solely via the connection and the messages handled by the state machine.
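The resulting behaviour can be sketched as a small event-driven state machine: the server job starts in `WAITING`, switches to `RUNNING` on its first connection, and only then becomes subject to the ping time-out; scheduler polling plays no part. The `ServerMonitor` class, its `on_*` handlers and the `TIMED_OUT` state are hypothetical names chosen for illustration, not the launcher's real API, and the time-out rule (`> 2 * server_ping_interval` since the last signal) is an assumption.

```python
from enum import Enum
from typing import Optional


class JobState(Enum):
    # Hypothetical job states; TIMED_OUT is illustrative only.
    WAITING = "waiting"
    RUNNING = "running"
    TIMED_OUT = "timed_out"


class ServerMonitor:
    """Hypothetical sketch: the server job state is driven only by
    connection and message events, never by scheduler polling."""

    def __init__(self, server_ping_interval: float) -> None:
        self.server_ping_interval = server_ping_interval
        # Start in WAITING: no life signal is expected yet, however long
        # the job stays queued waiting for a GPU.
        self.state = JobState.WAITING
        self.last_signal: Optional[float] = None

    def on_connection(self, now: float) -> None:
        # The first connection proves the server is actually running.
        self.state = JobState.RUNNING
        self.last_signal = now

    def on_message(self, now: float) -> None:
        # Any message from a running server counts as a life signal.
        if self.state is JobState.RUNNING:
            self.last_signal = now

    def on_tick(self, now: float) -> None:
        # Time-outs are only checked once the server has connected.
        if self.state is JobState.RUNNING and self.last_signal is not None:
            if now - self.last_signal > 2 * self.server_ping_interval:
                self.state = JobState.TIMED_OUT


monitor = ServerMonitor(server_ping_interval=10.0)
monitor.on_tick(now=300.0)        # still queued: stays WAITING
print(monitor.state)              # JobState.WAITING
monitor.on_connection(now=310.0)
monitor.on_message(now=325.0)
monitor.on_tick(now=340.0)        # 15 s since last signal: no time-out
print(monitor.state)              # JobState.RUNNING
monitor.on_tick(now=350.0)        # 25 s of silence > 20 s: time-out
print(monitor.state)              # JobState.TIMED_OUT
```

Keeping the server state out of the scheduler's polling loop means a long queue time can never be mistaken for a silent server; only the absence of messages after a successful connection can.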