
Client termination from finalization message

SCHOULER Marc requested to merge finalize-api into develop

This MR was made as part of the work on the SC2023 paper. Its purpose is to add a termination signal to the melissa_finalize function of the API. This becomes necessary when the expected number of time-steps is not known a priori, which can happen when the client termination condition is not based on the simulation time.

For instance, with Code-Saturne, if we want each client to simulate 10 periods of von Kármán vortices, the total number of time-steps may vary from one client to another and cannot be anticipated. In this case, the API can inform the server that a given client is done: as soon as the server receives a message whose time-step is a negative number (in our case, the number of time-steps sent), it knows that this client has terminated.
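A minimal sketch of this signalling convention follows. The Message layout and the helpers make_termination_message, is_termination and sent_count are hypothetical and do not reflect Melissa's actual wire format; encoding the count as -n_sent (so that it is always negative while still carrying the number of time-steps sent) is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    # Hypothetical client-to-server message layout, for illustration only.
    client_id: int
    time_step: int
    field_name: str
    data: List[float] = field(default_factory=list)


def make_termination_message(client_id: int, n_sent: int) -> Message:
    """Build the message a client would send from its finalize call.

    Assumption: time_step = -n_sent, which is negative for n_sent >= 1 and
    still lets the server recover how many time-steps the client sent.
    """
    if n_sent < 1:
        raise ValueError("a client must have sent at least one time-step")
    return Message(client_id=client_id, time_step=-n_sent, field_name="termination")


def is_termination(msg: Message) -> bool:
    # Regular time-steps are non-negative, so a negative value flags termination.
    return msg.time_step < 0


def sent_count(msg: Message) -> int:
    # Recover the number of time-steps the client reports having sent.
    return -msg.time_step
```

For example, a client that sent 42 time-steps would emit a message with time_step == -42, and the server would recover 42 with sent_count.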

This will require the following modifications:

  • sending a message through the server socket inside the melissa_finalize function [API side],
  • checking for the "termination" field name and updating the termination monitoring accordingly [DL server side],
  • making sure this breaks neither fault tolerance nor training (e.g., by introducing a deadlock risk) [pass CI].
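The server-side bookkeeping could look roughly like the sketch below. TerminationMonitor, on_message and all_done are hypothetical names, not the DL server's actual code. The sketch assumes the termination message carries -n_sent in its time-step, and it only marks a client as finished once all announced time-steps have arrived, in case the termination message overtakes data messages (also an assumption); waiting on this condition is one place where a deadlock risk could appear if a data message were lost.

```python
from typing import Dict, Set


class TerminationMonitor:
    """Hypothetical server-side tracking of which clients are done."""

    def __init__(self) -> None:
        self.received: Dict[int, Set[int]] = {}  # client_id -> time-steps seen
        self.announced: Dict[int, int] = {}  # client_id -> count from termination msg
        self.finished: Set[int] = set()

    def on_message(self, client_id: int, time_step: int, field_name: str) -> None:
        if field_name == "termination":
            # Assumed encoding: time_step == -(number of time-steps sent).
            self.announced[client_id] = -time_step
        else:
            # A set, so several fields per time-step are counted only once.
            self.received.setdefault(client_id, set()).add(time_step)
        self._update(client_id)

    def _update(self, client_id: int) -> None:
        # Done only when the termination message has arrived AND every
        # announced time-step has been received.
        expected = self.announced.get(client_id)
        if expected is not None and len(self.received.get(client_id, set())) >= expected:
            self.finished.add(client_id)

    def all_done(self, started_clients: Set[int]) -> bool:
        # True once every launched client has been marked finished.
        return started_clients <= self.finished
```

With this design, the server never needs num_samples to decide when data collection is over; it only needs the set of clients it launched.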

Notes:

  • the main point of this solution is that it stays compatible with the way solvers are instrumented with Melissa,
  • this allows clients to produce different numbers of time-steps, which is fine for DL but does not make sense for SA,
  • this allows users to omit num_samples from the configuration file.

Regarding this last point, an immediate drawback is that an inconsistent watermark value will only be detected once all clients are done.
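The sketch below illustrates why the check is delayed. It assumes the watermark is the number of samples the buffer must hold before training starts; watermark_error is a hypothetical helper, not part of Melissa. When num_samples is set, an impossible watermark can be rejected up front; without it, the total number of samples is only known after every client has terminated.

```python
from typing import Optional


def watermark_error(watermark: int, num_samples: Optional[int],
                    received: int, all_clients_done: bool) -> Optional[str]:
    """Return an error message if the watermark can never be reached, else None.

    Hypothetical helper; assumes the watermark is the number of samples the
    buffer must hold before training starts.
    """
    if num_samples is not None:
        # num_samples known a priori: the check can run before any client starts.
        if watermark > num_samples:
            return f"watermark {watermark} exceeds num_samples {num_samples}"
        return None
    # num_samples unknown: the total is only known once every client is done.
    if all_clients_done and received < watermark:
        return f"watermark {watermark} exceeds the {received} samples produced"
    return None
```

With num_samples omitted, watermark_error(100, None, 50, False) returns None while clients are still running, and the error only surfaces once all_clients_done becomes True.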

Edited by SCHOULER Marc