

Proper server job cancellation for slurm semiglobal

SCHOULER Marc requested to merge improve-semiglobal-scheduler into develop

The recent introduction of server fault tolerance requires proper handling of server job cancellation with slurm-semiglobal. This MR introduces the formal notion of a hybrid scheduler, which behaves as follows:

  • submission: all jobs are submitted as direct processes. Although the server job goes through the batch scheduler, the launcher does not monitor whether its submission succeeds (no calls to _run_process_asynchronously or _wait_for_process, and no ProcessCompletion_ event is created).
  • updates: client job updates rely on the status of their associated subprocess, while server job updates rely solely on received PING messages.
  • cancellation: client jobs are cancelled through their associated subprocess, while the server job is properly killed via the scheduler. Both the direct and the indirect cancellation steps are applied.
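The three rules above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the code from this MR: the names HybridScheduler, Job, JobKind, JobState, submit, update, on_ping, cancel, batch_id and ping_timeout are all assumptions. So is the reading that the "direct" step means stopping the local subprocess and the "indirect" step means asking Slurm to cancel the job with scancel. The sketch deliberately does not reproduce _run_process_asynchronously, _wait_for_process or ProcessCompletion_.

```python
import enum
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class JobKind(enum.Enum):
    CLIENT = "client"
    SERVER = "server"


class JobState(enum.Enum):
    RUNNING = "running"
    TERMINATED = "terminated"
    UNKNOWN = "unknown"      # server: no recent PING, so its status cannot be told
    CANCELLED = "cancelled"


@dataclass
class Job:
    kind: JobKind
    process: subprocess.Popen           # local process (for the server, e.g. the sbatch call)
    batch_id: Optional[str] = None      # hypothetical: Slurm job ID of the server job
    last_ping: Optional[float] = None   # time of the last PING received from the server
    state: JobState = JobState.RUNNING


def _run_checked(argv: Sequence[str]) -> None:
    # Run a scheduler command and raise if it fails.
    subprocess.run(list(argv), check=True)


class HybridScheduler:
    """Hypothetical sketch: every job is a direct process, the server is cancelled via Slurm."""

    def __init__(self, ping_timeout: float,
                 run_command: Callable[[Sequence[str]], None] = _run_checked):
        self.ping_timeout = ping_timeout
        self._run_command = run_command  # injectable so scancel can be faked

    def submit(self, kind: JobKind, argv: Sequence[str],
               batch_id: Optional[str] = None) -> Job:
        # Submission: start a direct process and return immediately. Nothing waits
        # on it and no completion event is created, even for the server job.
        return Job(kind, subprocess.Popen(list(argv)), batch_id)

    def on_ping(self, job: Job, now: Optional[float] = None) -> None:
        # Record the reception time of a PING from the server.
        job.last_ping = time.monotonic() if now is None else now

    def update(self, job: Job, now: Optional[float] = None) -> JobState:
        if job.state is JobState.CANCELLED:
            return job.state
        if job.kind is JobKind.CLIENT:
            # Client jobs: the status of the associated subprocess is authoritative.
            done = job.process.poll() is not None
            job.state = JobState.TERMINATED if done else JobState.RUNNING
        else:
            # Server job: only PING receptions count, never the local process.
            now = time.monotonic() if now is None else now
            fresh = (job.last_ping is not None
                     and now - job.last_ping <= self.ping_timeout)
            job.state = JobState.RUNNING if fresh else JobState.UNKNOWN
        return job.state

    def cancel(self, job: Job) -> None:
        # Direct step: stop the local subprocess if it is still alive.
        if job.process.poll() is None:
            job.process.terminate()
            job.process.wait()
        # Indirect step (server only): ask the batch scheduler to kill the job.
        if job.kind is JobKind.SERVER and job.batch_id is not None:
            self._run_command(["scancel", job.batch_id])
        job.state = JobState.CANCELLED
```

Passing run_command in the constructor keeps the Slurm call out of the core logic, so the policy can be exercised with a fake runner that only records the commands it would issue. Because the server state is derived from PING timestamps alone, a finished sbatch process never marks the server as terminated in this sketch.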
