Revise model parameters serialization
This MR addresses issue #481 (closed), which is about revising how model parameters are serialized when they are exchanged between the nodes and the researcher (in both directions). Along the way, it also further polishes backend code and makes minimal API additions following up on issue #472 (closed).
The core modification is the following:
- Add `fedbiomed.common.serializers.Serializer` and use it to exchange both model parameters and side information through (overloaded) msgpack files:
  - Use it to dump and load model parameters that are to be shared.
  - Use it to dump and load aggregator parameters that are to be shared.
  - Use it to dump and load parameters as part of the `Experiment` breakpoint system.
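The "overloaded" serialization idea can be illustrated with a minimal sketch. It is not fedbiomed's `Serializer`: to stay stdlib-only, it swaps msgpack for `json`, which exposes the same kind of hooks (`default` to encode unsupported types as tagged dicts, `object_hook` to decode them back). It also uses `array.array` as a stand-in for framework tensors. The `__type__` tag and the `dumps`/`loads` helpers are assumptions of this sketch.

```python
import array
import json
import os
import tempfile
from typing import Any


class Serializer:
    """Sketch of an 'overloaded' serializer (hypothetical, not fedbiomed's code).

    Uses stdlib json in place of msgpack; both accept a `default` hook to
    encode unsupported types and an `object_hook` to decode them back.
    """

    @staticmethod
    def _default(obj: Any) -> Any:
        # Encode array.array (stand-in for numpy/torch tensors) as a tagged dict.
        if isinstance(obj, array.array):
            return {"__type__": "array", "typecode": obj.typecode, "data": obj.tolist()}
        raise TypeError(f"Cannot serialize object of type {type(obj).__name__}")

    @staticmethod
    def _object_hook(dct: dict) -> Any:
        # Decode tagged dicts back into their original type; leave others as-is.
        if dct.get("__type__") == "array":
            return array.array(dct["typecode"], dct["data"])
        return dct

    @classmethod
    def dumps(cls, obj: Any) -> bytes:
        return json.dumps(obj, default=cls._default).encode("utf-8")

    @classmethod
    def loads(cls, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"), object_hook=cls._object_hook)

    @classmethod
    def dump(cls, obj: Any, path: str) -> None:
        with open(path, "wb") as file:
            file.write(cls.dumps(obj))

    @classmethod
    def load(cls, path: str) -> Any:
        with open(path, "rb") as file:
            return cls.loads(file.read())


# Usage: dump model parameters plus side information, then load them back.
params = {"fc.weight": array.array("d", [0.5, -1.0]), "fc.bias": array.array("d", [0.1])}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.json")
    Serializer.dump({"params": params, "n_samples": 32}, path)
    loaded = Serializer.load(path)
```

With msgpack, the same hooks would be passed to `msgpack.packb(..., default=...)` and `msgpack.unpackb(..., object_hook=...)`. Because decoding only rebuilds explicitly supported types, loading such a file does not execute arbitrary code, unlike unpickling.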
With more details:
- Model API:
  - Add the `Model.set_weights` method, as a counterpart to `Model.get_weights`.
  - Replace `Model.save` with `Model.export`, documented as targeted at exporting models in unsafe formats for local re-use.
  - Replace `Model.load` with `Model.reload`, as a counterpart to `Model.export` (note: `import` is a reserved name).
- TrainingPlan API:
  - Revise `BaseTrainingPlan.get_model_params` so that it merely interfaces `Model.get_weights`.
  - Add `BaseTrainingPlan.set_model_params`, which interfaces `Model.set_weights`.
  - Add `BaseTrainingPlan.after_training_params` (which previously existed only in child classes), designed to optionally extend `get_model_params`.
  - Remove all `load` and `save` methods from training plan child classes.
  - Add `BaseTrainingPlan.export_model` and `BaseTrainingPlan.import_model`, which interface `Model.export` and `Model.reload`.
- Job backend:
  - Revise `Job.update_parameters`, removing out-of-scope functionalities.
    - Use `BaseTrainingPlan.get_model_params` and `set_model_params` to access or modify current weights.
    - Use `Serializer.dump` to write down parameter files that are to be shared.
  - Revise `Job.upload_aggregator_args`.
    - Use `Serializer.dump` to write down parameter files that are to be shared.
    - No longer call `update_parameters` to perform serialization and uploads of non-model-parameters data.
- Job and Experiment breakpoint-parsing backend:
  - Use `Serializer.dump` and `Serializer.load` to dump and load parameters that need saving.
  - Keep the main JSON breakpoint file, which holds paths to the parameters' dump files.
  - Stop clumsily going through training plans to access supposedly specific (de)serializers, which all relied on pickle.
- Round backend:
  - Use `Serializer.load` and `Serializer.dump` to load and dump model and aggregator parameters.
  - Use `BaseTrainingPlan.set_model_params` and `after_training_params` to assign and access model parameters.
  - Load all aggregator parameters via `Serializer.load` upfront, rather than deferring their parsing to later code.
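The Model and TrainingPlan API changes listed above can be sketched with hypothetical `ToyModel` and `ToyTrainingPlan` classes that hold plain Python lists. This illustrates the delegation pattern under those assumptions and is not fedbiomed's `Model` or `BaseTrainingPlan`; in particular, the key check in `set_weights` is an addition of this sketch.

```python
import copy
import os
import pickle
import tempfile
from typing import Dict, List

Weights = Dict[str, List[float]]


class ToyModel:
    """Hypothetical model wrapper holding named weight lists (not fedbiomed's Model)."""

    def __init__(self) -> None:
        self._weights: Weights = {"w": [0.0, 0.0], "b": [0.0]}

    def get_weights(self) -> Weights:
        # Return a copy so callers cannot mutate the model's state in place.
        return copy.deepcopy(self._weights)

    def set_weights(self, weights: Weights) -> None:
        # Counterpart to get_weights; the key check is specific to this sketch.
        if set(weights) != set(self._weights):
            raise KeyError("weights keys do not match the model's parameters")
        self._weights = copy.deepcopy(weights)

    def export(self, filename: str) -> None:
        # Unsafe (pickle-based) format, meant only for local re-use.
        with open(filename, "wb") as file:
            pickle.dump(self._weights, file)

    def reload(self, filename: str) -> None:
        # Counterpart to export; only reload files from trusted sources.
        with open(filename, "rb") as file:
            self._weights = pickle.load(file)


class ToyTrainingPlan:
    """Hypothetical training plan delegating parameter access to its model."""

    def __init__(self) -> None:
        self._model = ToyModel()

    def get_model_params(self) -> Weights:
        return self._model.get_weights()

    def set_model_params(self, params: Weights) -> None:
        self._model.set_weights(params)

    def after_training_params(self) -> Weights:
        # Default: plain model params; child classes may extend this.
        return self.get_model_params()

    def export_model(self, filename: str) -> None:
        self._model.export(filename)

    def import_model(self, filename: str) -> None:
        self._model.reload(filename)


# Usage: assign weights, export them locally, and re-import into a fresh plan.
plan = ToyTrainingPlan()
plan.set_model_params({"w": [1.0, 2.0], "b": [0.5]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    plan.export_model(path)
    other = ToyTrainingPlan()
    other.import_model(path)
```

Keeping parameter access in the model wrapper means the training plan only forwards calls, so the components that exchange parameters never need to know how a given framework stores its weights.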
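The breakpoint layout described above, with a main JSON file that holds paths to separate parameter dump files, might look like the following sketch. The function names, file names and JSON keys are hypothetical, and plain `json` stands in for `Serializer.dump` and `Serializer.load`.

```python
import json
import os
import tempfile
from typing import Any, Dict


def _dump_params(params: Dict[str, Any], path: str) -> None:
    # Stand-in for Serializer.dump (plain JSON to stay stdlib-only).
    with open(path, "w", encoding="utf-8") as file:
        json.dump(params, file)


def _load_params(path: str) -> Dict[str, Any]:
    # Stand-in for Serializer.load.
    with open(path, "r", encoding="utf-8") as file:
        return json.load(file)


def save_breakpoint(folder: str, round_id: int,
                    model_params: Dict[str, Any],
                    aggregator_params: Dict[str, Any]) -> str:
    """Write parameter dump files, then a main JSON file holding their paths."""
    model_path = os.path.join(folder, "model_params.json")
    aggr_path = os.path.join(folder, "aggregator_params.json")
    _dump_params(model_params, model_path)
    _dump_params(aggregator_params, aggr_path)
    state = {
        "round": round_id,
        "model_params_path": model_path,
        "aggregator_params_path": aggr_path,
    }
    bkpt_path = os.path.join(folder, "breakpoint.json")
    with open(bkpt_path, "w", encoding="utf-8") as file:
        json.dump(state, file, indent=2)
    return bkpt_path


def load_breakpoint(bkpt_path: str) -> Dict[str, Any]:
    """Read the main JSON file, then load each parameter file it points to."""
    with open(bkpt_path, "r", encoding="utf-8") as file:
        state = json.load(file)
    return {
        "round": state["round"],
        "model_params": _load_params(state["model_params_path"]),
        "aggregator_params": _load_params(state["aggregator_params_path"]),
    }


# Usage: save a breakpoint at round 3, then restore it.
with tempfile.TemporaryDirectory() as tmp:
    bkpt = save_breakpoint(tmp, 3, {"w": [1.0, 2.0]}, {"server_lr": 1.0})
    restored = load_breakpoint(bkpt)
```

Keeping the main file small and human-readable while delegating parameter payloads to dedicated dump files lets the breakpoint index stay plain JSON, whatever format the parameters use.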
As a side effort, I fixed the maths for Scaffold. The commit that does that was cherry-picked into MR !197 (merged) and may be removed from the current branch if needed.
Regarding unit tests:
- I wrote unit tests with 100% coverage for `Serializer`.
- I added unit tests for `Model.set_weights`.
- I turned unit tests for the removed `load`/`save` methods into tests for `Model.export`, `Model.reload`, `BaseTrainingPlan.export_model` and `BaseTrainingPlan.import_model`.
- I updated (and sometimes trimmed) existing tests for `Experiment`, `Job` and `Round`.
I ran the 101 notebook with validation, and also tested the use of Scaffold on the MNIST dataset. Both experiments run and converge as expected.
Closes #481 (closed)