Revise model parameters serialization

MR description

This MR addresses issue #481 (closed) about revising the way model parameters are serialized when exchanged between the nodes and the researcher (in both directions). On the side, it also performs some further polishing of backend code and minimal API additions in the wake of issue #472 (closed).

The core modification is the following:

  • Add fedbiomed.common.serializers.Serializer and use it to exchange both model parameters and side information through (overloaded-)msgpack files.
    • Use it to dump and load model parameters that are to be shared.
    • Use it to dump and load aggregator parameters that are to be shared.
    • Use it to dump and load parameters as part of the Experiment breakpoint system.
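
The hook-based idea behind an "overloaded" serialization format can be sketched as follows. This is not the actual fedbiomed.common.serializers.Serializer code: the sketch assumes the standard-library json module as a stand-in for msgpack (msgpack's packb and unpackb accept default and object_hook arguments with the same names), uses array.array as a stand-in for tensors, and the names SketchSerializer, _encode and _decode are hypothetical.

```python
import array
import json
import os
import tempfile
from typing import Any


class SketchSerializer:
    """Hypothetical stand-in, NOT the Fed-BioMed Serializer."""

    @staticmethod
    def _encode(obj: Any) -> Any:
        # Called only for objects the base format cannot encode natively.
        if isinstance(obj, array.array):
            return {
                "__type__": "array",
                "typecode": obj.typecode,
                "values": obj.tolist(),
            }
        raise TypeError(f"Cannot serialize object of type {type(obj).__name__}")

    @staticmethod
    def _decode(obj: dict) -> Any:
        # Called for every decoded mapping; rebuild the tagged ones.
        if obj.get("__type__") == "array":
            return array.array(obj["typecode"], obj["values"])
        return obj

    @classmethod
    def dump(cls, obj: Any, path: str) -> None:
        """Write `obj` to `path`, encoding unsupported types via `_encode`."""
        with open(path, "w", encoding="utf-8") as file:
            json.dump(obj, file, default=cls._encode)

    @classmethod
    def load(cls, path: str) -> Any:
        """Read an object from `path`, restoring tagged types via `_decode`."""
        with open(path, "r", encoding="utf-8") as file:
            return json.load(file, object_hook=cls._decode)


# Round-trip model parameters together with side information.
params = {"layer.weight": array.array("d", [0.5, -1.25]), "n_samples": 128}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "params.json")
    SketchSerializer.dump(params, path)
    restored = SketchSerializer.load(path)
```

Keeping the type-specific logic in two hooks means that unsupported objects fail loudly at dump time, and that loading a file never executes arbitrary code, unlike pickle.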

With more details:

  • Model API:
    • Add Model.set_weights method, as a counterpart to Model.get_weights.
    • Replace with Model.export, documented to be targeted at exporting models in unsafe formats for local re-use.
    • Replace Model.load with Model.reload, as a counterpart to Model.export (note: "import" is a reserved name).
  • TrainingPlan API:
    • Revise BaseTrainingPlan.get_model_params, to merely interface Model.get_weights.
    • Add BaseTrainingPlan.set_model_params, that interfaces Model.set_weights.
    • Add BaseTrainingPlan.after_training_params (which pre-existed at children level), designed to optionally extend get_model_params.
    • Remove all load and save methods from training plan children classes.
    • Add BaseTrainingPlan.export_model and BaseTrainingPlan.import_model, that interface Model.export and Model.reload.
  • Job backend:
    • Revise Job.update_parameters, removing out-of-scope functionalities.
      • Use BaseTrainingPlan.get_model_params and set_model_params to access or modify current weights.
      • Use Serializer.dump to write down parameter files that are to be shared.
    • Revise Job.upload_aggregator_args.
      • Use Serializer.dump to write down parameter files that are to be shared.
      • No longer call update_parameters to perform serialization and uploads of non-model-parameters data.
  • Job and Experiment breakpoint-parsing backend:
    • Use Serializer.dump and Serializer.load to dump and load parameters that need saving.
    • Keep the main JSON breakpoint file, that holds paths to parameters' dump files.
    • Stop using training plans to clumsily access supposedly-specific (de)serializers that all rely on pickle.
  • Round backend:
    • Use Serializer.load and Serializer.dump to load and dump model and aggregator parameters.
    • Use BaseTrainingPlan.set_model_params and after_training_params to assign and access model parameters.
    • Load all aggregator parameters via Serializer.load up front, rather than deferring their parsing to later code.
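
The delegation pattern and the breakpoint layout listed above can be illustrated with a minimal sketch. It is not the Fed-BioMed implementation: SketchModel, SketchTrainingPlan, save_breakpoint and load_breakpoint are hypothetical names, weights are plain lists of floats, the JSON key names are invented for illustration, and json stands in for Serializer.dump and Serializer.load when writing the parameter file.

```python
import json
import os
import tempfile
from typing import Dict, List

Weights = Dict[str, List[float]]


class SketchModel:
    """Hypothetical model wrapper exposing get_weights / set_weights."""

    def __init__(self, weights: Weights) -> None:
        self._weights = {name: list(vals) for name, vals in weights.items()}

    def get_weights(self) -> Weights:
        return {name: list(vals) for name, vals in self._weights.items()}

    def set_weights(self, weights: Weights) -> None:
        self._weights = {name: list(vals) for name, vals in weights.items()}


class SketchTrainingPlan:
    """Hypothetical training plan that merely interfaces the model."""

    def __init__(self, model: SketchModel) -> None:
        self._model = model

    def get_model_params(self) -> Weights:
        return self._model.get_weights()

    def set_model_params(self, params: Weights) -> None:
        self._model.set_weights(params)

    def after_training_params(self) -> Weights:
        # Assumption: subclasses may override this to extend get_model_params.
        return self.get_model_params()


def save_breakpoint(plan: SketchTrainingPlan, folder: str, round_idx: int) -> str:
    """Dump parameters to their own file; keep only its path in the JSON state."""
    params_path = os.path.join(folder, f"params_round_{round_idx}.json")
    with open(params_path, "w", encoding="utf-8") as file:
        json.dump(plan.get_model_params(), file)  # stand-in for Serializer.dump
    state_path = os.path.join(folder, "breakpoint.json")
    with open(state_path, "w", encoding="utf-8") as file:
        json.dump({"round": round_idx, "params_path": params_path}, file)
    return state_path


def load_breakpoint(plan: SketchTrainingPlan, state_path: str) -> int:
    """Read the JSON state, then reload parameters from the path it holds."""
    with open(state_path, "r", encoding="utf-8") as file:
        state = json.load(file)
    with open(state["params_path"], "r", encoding="utf-8") as file:
        plan.set_model_params(json.load(file))  # stand-in for Serializer.load
    return state["round"]


with tempfile.TemporaryDirectory() as folder:
    trained = SketchTrainingPlan(SketchModel({"fc.weight": [0.1, 0.2], "fc.bias": [0.0]}))
    state_path = save_breakpoint(trained, folder, round_idx=3)
    resumed = SketchTrainingPlan(SketchModel({"fc.weight": [0.0, 0.0], "fc.bias": [1.0]}))
    resumed_round = load_breakpoint(resumed, state_path)
```

In this layout the calling code only ever touches get_model_params and set_model_params, so the serialization format can change without any training plan subclass being involved.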

As a side effort, I fixed the maths for Scaffold. The commit that does that was cherry-picked into MR !197 (merged) and may be removed from the current branch if needed.

Regarding unit tests:

  • I wrote unit tests with 100% coverage for Serializer.
  • I added unit tests for Model.set_weights.
  • I turned unit tests for the removed load / save methods into tests for Model.export, Model.reload, BaseTrainingPlan.export_model and BaseTrainingPlan.import_model.
  • I updated (and sometimes trimmed) existing tests for Experiment, Job and Round.

I ran the 101 notebook with validation, and also tested the use of Scaffold on the MNIST dataset. Both experiments run, and converge as expected.

Closes #481 (closed)

Edited by ANDREY Paul
