Experiments at scale
Regardless of the scheduler (OpenMPI or OAR), the job_limit feature crashes the launcher under the following conditions:
job_limit=8
sampling_size=300
This MR aims to solve this problem by putting postponed JobSubmission messages inside a postponed_job_list instead of into the event queue.
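As an illustration only, the idea can be sketched as below. This is not the launcher's actual code: the `Launcher` class, its `handle`, `on_job_finished` and `run` methods, the `peak_running` counter and the `simulate` helper are all hypothetical names chosen for the sketch. Only `job_limit`, `sampling_size`, `JobSubmission` and `postponed_job_list` come from this description. The sketch assumes that a submission arriving while `job_limit` jobs are running is parked in a separate list and resubmitted once a slot frees up.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobSubmission:
    """Hypothetical stand-in for the launcher's job-submission message."""
    job_id: int


@dataclass
class Launcher:
    """Toy launcher (assumed structure, not the real implementation)."""
    job_limit: int
    running: set = field(default_factory=set)
    event_queue: deque = field(default_factory=deque)
    postponed_job_list: deque = field(default_factory=deque)
    submitted: list = field(default_factory=list)
    peak_running: int = 0

    def handle(self, msg: JobSubmission) -> None:
        if len(self.running) >= self.job_limit:
            # Park the message outside the event queue so the event loop
            # does not see it again until a slot is free.
            self.postponed_job_list.append(msg)
            return
        self.running.add(msg.job_id)
        self.submitted.append(msg.job_id)
        self.peak_running = max(self.peak_running, len(self.running))

    def on_job_finished(self, job_id: int) -> None:
        self.running.discard(job_id)
        # A slot is free: resubmit postponed jobs in arrival order.
        while self.postponed_job_list and len(self.running) < self.job_limit:
            self.handle(self.postponed_job_list.popleft())

    def run(self) -> None:
        # Drain the event queue; postponed messages never re-enter it.
        while self.event_queue:
            self.handle(self.event_queue.popleft())


def simulate(job_limit: int = 8, sampling_size: int = 300) -> Launcher:
    launcher = Launcher(job_limit=job_limit)
    launcher.event_queue.extend(JobSubmission(i) for i in range(sampling_size))
    launcher.run()
    # Pretend running jobs complete one at a time, lowest id first.
    while launcher.running:
        launcher.on_job_finished(min(launcher.running))
    return launcher
```

Calling `simulate(job_limit=8, sampling_size=300)` reproduces the settings listed above in this toy model: the event queue empties after a single pass, and the postponed submissions are released only as running jobs finish.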