OBSOLETE: Fed-BioMed merge requests
https://gitlab.inria.fr/fedbiomed/fedbiomed/-/merge_requests
Feed updated: 2023-05-25T09:43:27+02:00

**Merge request !220: Feature/447 Add researcher-side (optional) Optimizer**
https://gitlab.inria.fr/fedbiomed/fedbiomed/-/merge_requests/220
Author: ANDREY Paul | Updated: 2023-05-25T09:43:27+02:00

This MR adds the possibility to set up an Optimizer on the researcher side, so as to refine aggregated model updates received from nodes prior to applying them to the global model. See related issue #447.
Adding an `Optimizer` on the researcher side enables the following features to be plugged into an Experiment:
- Set up some momentum or an adaptive optimizer, effectively implementing the so-called FedAvgM or FedOpt algorithms.
- Set up a server-side learning rate, some weight decay, or any of the previous algorithms in a modular way. Most notably, the choice of using such refinements is decoupled from the choice of aggregation rule, so that any combination of recipes inspired by the literature may be set up.
- Set up the Scaffold algorithm via the declearn backend (see the sketch after this list):
  - Plug a `ScaffoldClient` module into the `Optimizer` used by nodes (defined in the experiment's `TrainingPlan`).
  - Plug a `ScaffoldServer` module into the `Optimizer` used by the researcher (directly in `Experiment`).
  - This improves over the legacy implementation of Scaffold in the following ways:
    - It covers both scikit-learn and torch (and would cover tensorflow and jax/haiku if Fed-BioMed supported them).
    - It is more generic (notably enabling nodes to run distinct numbers of local optimization steps).
    - It is decoupled from the choice of aggregation rule (so one could, for example, use GradientMaskedAveraging and Scaffold together).
    - Its backend mechanisms may be re-used to implement other state-synchronization-based algorithms, such as FedDyn.
  - *Warning*: this does not work yet, as this MR does not implement the sharing of Optimizer auxiliary variables (see issue #467).
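A minimal sketch of that pairing, assuming the `Optimizer` wrapper from MR !187 (import path as per that MR) and declearn's `ScaffoldClientModule`/`ScaffoldServerModule`; the mapping from the `ScaffoldClient`/`ScaffoldServer` names above to these declearn classes is an assumption:

```python
from declearn.optimizer.modules import ScaffoldClientModule, ScaffoldServerModule
from fedbiomed.common.optimizer import Optimizer

# Node side: declared within the experiment's TrainingPlan.
node_optimizer = Optimizer(lr=0.001, modules=[ScaffoldClientModule()])

# Researcher side: passed to the Experiment as its (optional) Optimizer.
researcher_optimizer = Optimizer(lr=1.0, modules=[ScaffoldServerModule()])
```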
This MR goes for the following implementation (see the wiring sketch below):
- Add an optional `researcher_optimizer` instantiation parameter to `Experiment`, defaulting to None.
- Add the associated getter and setter methods.
- When a researcher optimizer is set, perform the associated steps as part of the `Experiment.run_once` method.
- Include the researcher optimizer in the `Experiment` breakpoint system.
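A hedged sketch of the resulting wiring, here implementing a FedAvgM-style setup via declearn's momentum module; `researcher_optimizer` is the parameter named above, while `MyTrainingPlan` and the training arguments are hypothetical placeholders:

```python
from declearn.optimizer.modules import MomentumModule
from fedbiomed.common.optimizer import Optimizer
from fedbiomed.researcher.experiment import Experiment

exp = Experiment(
    training_plan_class=MyTrainingPlan,  # hypothetical user-defined training plan
    training_args={"epochs": 1},
    round_limit=10,
    # Server-side learning rate plus momentum: a FedAvgM-style refinement.
    researcher_optimizer=Optimizer(lr=1.0, modules=[MomentumModule(beta=0.9)]),
)
exp.run()
```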
Closes issue #447.

Milestone: Generalize the backend of the training plan | Assignee: CANSIZ Sergen

**Merge request !209: Feature/513 Fix warnings in `TorchModel.set_weights` and BatchNorm layers' handling**
https://gitlab.inria.fr/fedbiomed/fedbiomed/-/merge_requests/209
Author: ANDREY Paul | Updated: 2023-05-04T11:47:19+02:00

This MR addresses issue #513, which is about `TorchModel.set_weights` generating warnings when the model's state dict comprises tensors or values that are not part of the model's parameters.
The suggested fix merely adds a secondary filter on these warnings, making it expected that input weights do not cover these values (and may even exclude non-trainable weights), for coherence with the outputs of the `get_weights` counterpart method.
**Edit**: the initial issue emerged from the presence of BatchNorm layers, whose handling matters in a federated context. As such, and based on the exchanges that can be found on the issue's page, this MR was also made to:
- share non-`torch.nn.Parameter` model states at the end of rounds so that they are aggregated
- implement a new training argument (an optional boolean flag defaulting to True) to enable not sharing these values (although, in the current state of things, that is not recommended; see #529). A snippet illustrating the underlying BatchNorm behaviour follows.
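For context, BatchNorm running statistics are buffers rather than `torch.nn.Parameter`s, so a parameter-only exchange silently drops them. The snippet below is plain PyTorch, independent of Fed-BioMed, and just shows the mismatch:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8))
params = set(dict(model.named_parameters()))
state = set(model.state_dict())
# Buffers such as running_mean/running_var (and num_batches_tracked)
# appear in the state dict but not among the parameters.
print(state - params)
# e.g. {'1.running_mean', '1.running_var', '1.num_batches_tracked'}
```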
Milestone: Generalize the backend of the training plan | Assignee: CREMONESI Francesco

**Merge request !193: Revise model parameters serialization**
https://gitlab.inria.fr/fedbiomed/fedbiomed/-/merge_requests/193
Author: ANDREY Paul | Updated: 2023-04-05T19:20:12+02:00

**MR description**
This MR addresses issue #481 about revising the way model parameters are serialized when exchanged between the nodes and the researcher (in both directions). Along the way, it also performs further polishing of backend code and minimal API additions in the wake of issue #472.
The core modification is the following (a minimal serializer sketch follows this list):
- Add `fedbiomed.common.serializers.Serializer` and use it to exchange both model parameters and side information through (overloaded) msgpack files.
  - Use it to dump and load model parameters that are to be shared.
  - Use it to dump and load aggregator parameters that are to be shared.
  - Use it to dump and load parameters as part of the Experiment breakpoint system.
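The actual `Serializer` lives at `fedbiomed.common.serializers`; the sketch below only illustrates the msgpack-overloading idea with a numpy hook (the tagged-dict format and the method bodies are assumptions, not the real implementation):

```python
import msgpack
import numpy as np

class Serializer:
    """Sketch of a msgpack (de)serializer overloaded for numpy arrays."""

    @staticmethod
    def _default(obj):
        # Encode numpy arrays as tagged dicts that msgpack can handle natively.
        if isinstance(obj, np.ndarray):
            return {"__np__": True, "dtype": obj.dtype.name,
                    "shape": list(obj.shape), "data": obj.tobytes()}
        raise TypeError(f"Cannot serialize object of type {type(obj)}.")

    @staticmethod
    def _object_hook(obj):
        # Decode tagged dicts back into numpy arrays; pass other maps through.
        if obj.get("__np__"):
            return np.frombuffer(obj["data"], dtype=obj["dtype"]).reshape(obj["shape"])
        return obj

    @classmethod
    def dump(cls, obj, path):
        with open(path, "wb") as file:
            file.write(msgpack.packb(obj, default=cls._default))

    @classmethod
    def load(cls, path):
        with open(path, "rb") as file:
            return msgpack.unpackb(file.read(), object_hook=cls._object_hook)
```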
In more detail (a sketch of the resulting method pairing follows this list):
- Model API:
  - Add the `Model.set_weights` method, as a counterpart to `Model.get_weights`.
  - Replace `Model.save` with `Model.export`, documented as targeted at exporting models in unsafe formats for local re-use.
  - Replace `Model.load` with `Model.reload`, as a counterpart to `Model.export` (note: "import" is a reserved name).
- TrainingPlan API:
  - Revise `BaseTrainingPlan.get_model_params` to merely interface `Model.get_weights`.
  - Add `BaseTrainingPlan.set_model_params`, which interfaces `Model.set_weights`.
  - Add `BaseTrainingPlan.after_training_params` (which pre-existed at children level), designed to optionally extend `get_model_params`.
  - Remove all `load` and `save` methods from training plan children classes.
  - Add `BaseTrainingPlan.export_model` and `BaseTrainingPlan.import_model`, which interface `Model.export` and `Model.reload`.
- Job backend:
  - Revise `Job.update_parameters`, removing out-of-scope functionalities.
    - Use `BaseTrainingPlan.get_model_params` and `set_model_params` to access or modify current weights.
    - Use `Serializer.dump` to write down parameter files that are to be shared.
  - Revise `Job.upload_aggregator_args`.
    - Use `Serializer.dump` to write down parameter files that are to be shared.
    - No longer call `update_parameters` to perform serialization and uploads of non-model-parameters data.
- Job and Experiment breakpoint-parsing backend:
  - Use `Serializer.dump` and `Serializer.load` to dump and load parameters that need saving.
  - Keep the main JSON breakpoint file, which holds paths to parameters' dump files.
  - Stop using training plans to clumsily access supposedly-specific (de)serializers that all rely on pickle.
- Round backend:
  - Use `Serializer.load` and `Serializer.dump` to load and dump model and aggregator parameters.
  - Use `BaseTrainingPlan.set_model_params` and `after_training_params` to assign and access model parameters.
  - Have aggregator parameters all be loaded via `Serializer.load` rather than delaying their parsing to later code.
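A hedged sketch of how the renamed methods pair up after this MR; `training_plan` stands for any training plan instance, and the file name is illustrative:

```python
# In-memory access to model weights, bypassing any on-disk format:
weights = training_plan.get_model_params()  # interfaces Model.get_weights
training_plan.set_model_params(weights)     # interfaces Model.set_weights

# On-disk persistence in framework-specific (unsafe) formats, for local re-use:
training_plan.export_model("my_model_dump")  # interfaces Model.export
training_plan.import_model("my_model_dump")  # interfaces Model.reload
```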
As a side effort, I fixed the maths for Scaffold. The commit that does that was cherry-picked into MR !197 and may be removed from the current branch if needed.
Regarding unit tests:
- I wrote unit tests with 100% coverage for `Serializer`.
- I added unit tests for `Model.set_weights`.
- I turned unit tests for the removed `load` / `save` methods into tests for `Model.export`, `Model.reload`, `BaseTrainingPlan.export_model` and `BaseTrainingPlan.import_model`.
- I updated (and sometimes trimmed) existing tests for `Experiment`, `Job` and `Round`.
I ran the 101 notebook with validation, and also tested the use of Scaffold on the MNIST dataset. Both experiments run and converge as expected.
Closes #481.

Milestone: Generalize the backend of the training plan | Assignee: BOUILLARD Yannick

**Merge request !192: Polish 'Model' API and backend code**
https://gitlab.inria.fr/fedbiomed/fedbiomed/-/merge_requests/192
Author: ANDREY Paul | Updated: 2023-03-08T11:34:59+01:00

This MR is an extension to MR !188, which addressed issue #472 about implementing a new 'Model' abstraction layer and refactoring framework-specific code from the current 'TrainingPlan' classes into it and its framework- or model-specific subclasses.
This MR implements the following changes:
- Disambiguate class attributes from instance attributes in 'Model' and its subclasses.
  - Declare instance attributes as part of `__init__` methods and class attributes at the proper scope.
  - Use the `ClassVar` type hint to further highlight that some attributes are defined at class level.
- Add the `Model._model_type` private class attribute and use it for init-time type checking (see the sketch after this list).
  - This class attribute enables specifying which classes of models may be wrapped.
  - It was used to refactor type-checking code into the shared parent `Model.__init__`.
- Revise `Model.get_weights` and `Model.get_gradients` signatures.
  - Replace the generic `return_type` argument with `as_vector: bool = False`.
  - This was done in accordance with @scansiz, with issue #481 in mind.
- Revise `BaseSkLearnModel.train` backend and remove superfluous private attributes.
- Revise `TorchModel.init_params`.
  - Previously, the declared type hint and the actual type of this attribute did not match.
- Perform an overall cleaning of import statements' ordering, type hints and code formatting.
- Rename the scikit-learn backend `Models` dict to `SKLEARN_MODELS` for PEP 8 compliance.
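A sketch of the class-attribute pattern described above; the attribute and class names come from this MR, while the method bodies are illustrative:

```python
from typing import Any, ClassVar, Type

import torch

class Model:
    """Sketch of the shared model-wrapper base class."""

    _model_type: ClassVar[Type[Any]]  # which classes of models may be wrapped

    def __init__(self, model: Any) -> None:
        # Init-time type check, refactored into the shared parent __init__.
        if not isinstance(model, self._model_type):
            raise TypeError(
                f"Expected an instance of {self._model_type}, got {type(model)}."
            )
        self.model = model

class TorchModel(Model):
    _model_type: ClassVar[Type[Any]] = torch.nn.Module
```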
Milestone: Generalize the backend of the training plan | Assignee: CANSIZ Sergen

**Merge request !188: Feature/472 create model abstraction for declearn integration**
https://gitlab.inria.fr/fedbiomed/fedbiomed/-/merge_requests/188
Author: BOUILLARD Yannick | Updated: 2023-05-22T15:42:31+02:00

**MR description**
Implements the model abstraction for PyTorch and scikit-learn.
Closes #472.
Here is a list of points that I found difficult to implement (and that need a closer look):
1. BaseSklearnModel:
   - batch_size computation / reset that is done internally
   - sklearn model `n_iter` attribute increment/decrement (hidden behind methods)
   - `apply_updates` method: slightly modified from the one in the PoC, so that it adds gradients instead of replacing them, in the same spirit as the `apply_updates` of PyTorch. Check whether the computation is correct.
   - toolbox classes, which implement some methods using multiple inheritance
2. TorchModel:
   - `train` method not implemented: should we compute the loss in this method?
   - is the `get_gradients` method correct?
   - saving the state of the TorchModel: initial parameters are saved in a specific attribute
     - make sure it handles frozen layers (see the sketch below)
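For the frozen-layer point, one possible shape of an additive `apply_updates` on the torch side; this is a sketch under the assumption that updates arrive as named tensors, not the actual `TorchModel` code:

```python
from typing import Dict

import torch

def apply_updates(model: torch.nn.Module, updates: Dict[str, torch.Tensor]) -> None:
    """Add named updates to the model weights, leaving frozen layers untouched."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            # Frozen layers have requires_grad=False and receive no update.
            if param.requires_grad and name in updates:
                param.add_(updates[name])
```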
More broadly speaking:
- appropriate raising of exceptions: some exceptions may not be caught
- appropriate naming of variables
- create one module per class (i.e. one class per file) rather than having 3 classes in a single file
**Developer Certificate Of Origin (DCO)**
By opening this merge request, you agree to the
[Developer Certificate of Origin (DCO)](https://gitlab.inria.fr/fedbiomed/fedbiomed/-/blob/develop/CONTRIBUTING.md#fed-biomed-developer-certificate-of-origin-dco).
This DCO essentially means that:
- you offer the changes under the same license agreement as the project,
- you have the right to do that, and
- you did not steal somebody else's work.
**License**
Project code files should begin with these comment lines to help trace their origin:
```
# This file is originally part of Fed-BioMed
# SPDX-License-Identifier: Apache-2.0
```
Code files can be reused from another project with a compatible non-contaminating license.
They shall retain the original license and copyright mentions.
The `CREDIT.md` file and `credit/` directory shall be completed and updated accordingly.
**Guidelines for MR review**
General:
* take a glance at the [DoD](https://fedbiomed.gitlabpages.inria.fr/latest/developer/Fed-BioMed_DoD.pdf)
* check the [coding rules and coding style](https://fedbiomed.gitlabpages.inria.fr/latest/developer/usage_and_tools/#coding-style)
* check docstrings (e.g. run `tests/docstrings/check_docstrings`)
Specific to some cases:
* update all conda envs consistently (`development` and `vpn`, Linux and MacOS)
* if the researcher is modified (e.g. new attributes in classes), check whether breakpoints need an update (`breakpoint`/`load_breakpoint` in `Experiment()`, `save_state`/`load_state` in aggregators, strategies, secagg, etc.)

Milestone: Generalize the backend of the training plan | Assignee: CANSIZ Sergen

**Merge request !187: Create Fed-BioMed Optimizer class wrapping Declearn Optimizer**
https://gitlab.inria.fr/fedbiomed/fedbiomed/-/merge_requests/187
Author: ANDREY Paul | Updated: 2023-04-26T07:41:35+02:00

**MR description**
Closes #473
This MR aims at implementing a new `fedbiomed.common.optimizer.Optimizer` class that interfaces the [declearn](https://gitlab.inria.fr/magnet/declearn/declearn2)-provided optimization features for use in Fed-BioMed, as part of the current effort to redesign part of the TrainingPlan code.
Currently, this MR (a wrapping sketch follows the task list below):
- Implements the new `Optimizer` class
- Implements a new `FedbiomedOptimizerError` class and a dedicated error code (`FB620`)
- Adds declearn as a third-party dependency in the conda environments
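A sketch of the wrapping idea, assuming declearn's `Optimizer(lrate=..., w_decay=..., modules=...)` signature; the Fed-BioMed-side argument names and class body are illustrative, not the merged implementation:

```python
from typing import Any, List, Optional

from declearn.optimizer import Optimizer as DeclearnOptimizer

class Optimizer:
    """Sketch of a Fed-BioMed class wrapping the declearn Optimizer."""

    def __init__(
        self, lr: float, decay: float = 0.0, modules: Optional[List[Any]] = None
    ) -> None:
        # Delegate the actual optimization logic to declearn.
        self._optimizer = DeclearnOptimizer(
            lrate=lr, w_decay=decay, modules=modules or []
        )
```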
Task list:
- [x] implement `Optimizer`
- [x] implement unit tests
- [ ] discuss potential design changes based on related work

Milestone: Generalize the backend of the training plan | Assignee: ANDREY Paul