Mentions légales du service

Skip to content
Snippets Groups Projects
Verified Commit 5cf654bf authored by ANDREY Paul's avatar ANDREY Paul
Browse files

Add a guide about the Optimizer API to the docs.

parent 3b0ce5fe
No related branches found
No related tags found
No related merge requests found
Pipeline #891224 skipped
...@@ -2,4 +2,5 @@ ...@@ -2,4 +2,5 @@
- [Overview of the Federated Learning process](./fl_process.md) - [Overview of the Federated Learning process](./fl_process.md)
- [Overview of the declearn API](./package.md) - [Overview of the declearn API](./package.md)
- [Hands-on usage](./usage.md) - [Hands-on usage](./usage.md)
- [Guide to the Optimizer API](./optimizer.md)
- [Local Differential Privacy capabilities](./local_dp.md) - [Local Differential Privacy capabilities](./local_dp.md)
...@@ -12,5 +12,7 @@ This guide is structured this way: ...@@ -12,5 +12,7 @@ This guide is structured this way:
Description of the declearn code's structure and of the main APIs. Description of the declearn code's structure and of the main APIs.
- [Hands-on usage](./usage.md):<br/> - [Hands-on usage](./usage.md):<br/>
Guide on how to set up your own federated learning task using declearn. Guide on how to set up your own federated learning task using declearn.
- [Guide to the Optimizer API](./optimizer.md):<br/>
API, design principles and practical how-tos of the declearn Optimizer.
- [Local Differential Privacy capabilities](./local_dp.md):<br/> - [Local Differential Privacy capabilities](./local_dp.md):<br/>
Description of the local-DP features of declearn. Description of the local-DP features of declearn.
# Declearn Optimizer - API, design principles and practical how-tos
This guide provides a comprehensive introduction to the Declearn `Optimizer`,
which is one of the core features of the package. It is aimed at both end-users
and developers, as it tackles both the practical use of our API, its advanced
details (including some limitations), its underlying design principles and the
way to build upon it to implement your own federated optimization algorithms
and applications.
As an alternative or a complement to this guide, one may refer to the
[API reference](../api-reference/optimizer/index.md) of `declearn.optimizer`
for a code-driven and docstrings-based exhaustive view on the abstractions,
concrete classes and utils exposed by DecLearn. Here are links to the abstract
base classes' API reference:
[`Optimizer`](../api-reference/optimizer/Optimizer.md),
[`OptiModule`](../api-reference/optimizer/modules/OptiModule.md),
[`Regularizer`](../api-reference/optimizer/regularizers/Regularizer.md).
## Introduction
In short, `declearn.optimizers.Optimizer` is a class designed to provide with
a single entry-point to define SGD-based optimization algorithms, agnostic to
the machine learning framework used to define the model and its weights' data
structure. It is designed around a plug-in system that enables combining some
unitary algorithm pieces into complex optimizers, that leave the opportunity
for both developers and end-users to write up their own plug-ins. It is also
meant to provide with capabilities that are specific to the federated or even
decentralized learning settings, while remaining compatible with the "basic"
centralized setting.
Although it was designed as part of Declearn and in articulation with the rest
of our APIs, our `Optimizer` may be re-used in other projects, and has notably
been made part of [Fed-BioMed](https://fedbiomed.org/), another Inria-spawned
project that implements a Federated Learning solution specifically targetted at
cross-silo settings for healthcare institutions and applications.
## Overview
### Structure
Declearn provides with a unified entry-point to define SGD-based optimizers:
`declearn.optimizer.Optimizer`.
- It provides with the most basic SGD algorithm pieces: it scales input
gradients into updates based on a learning rate, and optionally adds a
decoupled weight decay term.
- In addition, it enables setting up pipelines of plug-ins, that are applied
sequentially to the input gradients so as to refine them prior to applying
the learning rate scaling and adding the weight decay term.
There are two types of plug-ins to an `Optimizer`:
- `declearn.optimizer.regularizers.Regularizer`:
- A `Regularizer` plug-in implements a loss regularization term.
- It receives gradients and model weights as inputs, and returns the modified
gradients (usually: gradients to which a weight-based term was added).
- Examples include generic regularization terms, such as the Lasso (L1) and
Ridge (L2) ones, but also FL-specific ones, such as FedProx.
- You can list currently-available regularizers (and their names) by calling
`declearn.optimizer.list_optim_regularizers()`.
- In most cases, a `Regularizer` actually computes the derivative of the
desired regularization term based on model weights and/or some internal
state, and adds the results to input gradients.
- `declearn.optimizer.modules.OptiModule`:
- An `OptiModule` plug-in implements a given gradients-altering algorithm.
- It receives gradients as inputs, and returns them after some processing.
- Examples include norm-based gradient clipping, adaptive algorithms such as
RMSProp or Adam, or the FL-specific Scaffold algorithm, that manages and
applies a correction term based on quantities computed federatively during
training.
- You can list currently-available optimodules (and their names) by calling
`declearn.optimizer.list_optim_modules()`.
- Additional mechanisms enable setting up some information sharing between
paired client-side and server-side modules in the federated context.
These can be used to set up FL-specific algorithms such as Scaffold.
### Practical use
The syntax to set up an `Optimizer` instance is:
- `optim = Optimizer(lrate, w_decay, regularizers, modules)`
- By default, `w_decay=0.0` (no weight decay), and `regularizers` and `modules`
are `None`, resulting in a Vanilla SGD optimizer.
- Plug-ins (whether `regularizers` or `modules`) are input as a list, each
element of which specifies a plug-in, either as:
- an instance of the desired plug-in
(e.g. `AdamModule(beta_1=0.9, beta_2=0.99)`)
- a tuple with the name and hyper-parameters of the plug-in
(e.g. `("adam", {"beta_1": 0.9, "beta_2": 0.99})`)
- a string providing only the plug-in's name, resulting in default
hyper-parameter values to be used (e.g. `"adam"`)
An `Optimizer` has a configuration and a state, that may be (de)serialized:
- `optim.get_config` may be used to return a JSON-serializable configuration.
- `Optimizer.from_config` may be used to instantiate an optimizer from its
configuration dict.
- `optim.get_state` may be used to access the current states of an optimizer
(made recursively of those of its plug-ins).
- `optim.set_state` may be used to reset the states of an optimizer to given
values.
- The `declearn.utils.json_load` and `json_dump` utils may be used to save and
load the configuration and state dictionaries to and from JSON files.
### Framework agnosticity
Both the `Optimizer` and its plug-in components operate on the `Model` and
`Vector` APIs of Declearn, enabling their code to be written agnostic to the
machine learning framework (and actual model architecture), so that the (exact)
same algorithms are made available for all supported frameworks.
The only exceptions to that principle are a few framework-specific plug-ins
that provide end-users with adapters to interface framework-specific optimizer
tools, which were designed to facilitate the transition to Declearn as well as
ease efforts to experiment with using less-usual algorithms in a federated
context - in hope that such efforts will eventually result in implementing such
algorithms into "proper" framework-agnostic Declearn plug-ins.
## Design principles
The design choices for the Declearn `Optimizer` were based on the following
objectives and rationale:
1. Provide with framework-agnostic rather than framework-wise implementations
2. Provide with combinable bricks rather than a myriad of optimizer subclasses
3. Enable end-users to write up their own bricks with full tooling support
### Provide with framework-agnostic implementations
The first objective comes from the observation that while each of our supported
machine learning frameworks (notably TensorFlow and Torch) provide with their
own optimization API and implementations for a number of algorithms, using them
directly would yield costs of three distinct natures. First, it would mean that
any algorithm that would not be available out-of-the-box (_e.g._ ones that are
specific to federated learning) would have to be implemented once per supported
framework. Second, it would mean that adding support for a new maching learning
framework would require re-implementing each and every pre-existing algorithm
(or suffer some asymetry of capabilities across frameworks, which goes against
the ambition of Declearn). Finally, it would mean that discrepancies between
framework-specific third-party implementations would be borne in Declearn -
_e.g._ the fact that the Torch and TensorFlow implementations of Adam differ,
or that thay do not abide by the same definition of weight decay, would be
kept, which again goes against our ambition to see the choice of framework as a
mere implementation detail.
### Provide with combinable bricks
The second objective comes from the fact that many research papers on federated
optimization provide with specifications to modify a given point in SGD-based
or existing algorithms, that are in fact open to be combined with other bricks
and/or re-use a common algorithmic backbone. While many frameworks (including
Torch and TensorFlow) choose to implement optimization algorithms as subclasses
of an abstract optimizer (with possible shared backend code), Declearn takes a
reverse approach, where a single `Optimizer` class is provided, the instances
of which are populated with plug-ins that implement unitary algorithmic bricks
and are combinable into complex algorithms. One may for instance use a FedProx
loss regularization term together with some gradient clipping and an adaptive
algorithm that scales the resulting gradients based on some momentum. While
this means that end-users can and may write up non-sensical algorithms, we
decided in favor of bearing this risk rather than limit or complexify the
process through which one may configure their desired optimizer and/or run
some experiments on how some bricks interact with each other.
### Enable end-users to write up their own bricks
The third objective comes from the fact that we do not expect Declearn to ever
cover the entire range of algorithms that end-users may want to use; especially
as it aims to be used for research purposes. It is therefore primordial to us
that end-users may easily write up their own plug-ins and use them, whether in
simulated FL experiments or in real-life deployments, with full support by the
rest of the Declearn machinery. A type-registration system was therefore built
to enable offering the same support for third-party plug-ins as for the ones
that are integrated as part of the main Declearn package. In practice, this
means that new plug-ins can be defined in the context of a specific use-cases
as well as distributed via third-party Declearn add-on packages (in the vein
of what `tensorflow_addons` used to do for TensorFlow). It is also possible to
have our unit tests run for third-party plug-ins (as detailed below in this
guide).
## Practical examples
### SGD-M optimizer
This sets up an optimizer that uses SGD with Momentum, using the default
hyper-parameters of the `MomentumModule`:
```python
import declearn
optim = declearn.optimizer.Optimizer(lrate=0.01, modules=["momentum"])
```
This sets up a similar optimizer, with 0.8 Nesterov Momentum:
```python
import declearn
from declearn.optimizer.modules import MomentumModule
momentum = declearn.optimizer.modules.MomentumModule(beta=0.8, nesterov=True)
optim = declearn.optimizer.Optimizer(lrate=0.01, modules=[momentum])
```
This does exactly the same, with a different, just-as-valid syntax:
```python
import declearn
momentum = ("momentum", {"beta": 0.8, "nesterov": True})
optim = declearn.optimizer.Optimizer(lrate=0.01, modules=[momentum])
```
### Complex optimizer (FedProx & AdamW with gradient clipping)
This sets up an Optimizer with FedProx regularization (to constraint client
drift related to data heterogeneity), L2-norm-based gradient clipping (to avoid
exploding gradients), and an AdamW adaptive algorithm (made of Adam and a
decoupled weight decay term).
In this case, we use a 0.001 learning rate, 0.01 weight decay, a 1.0 clipping
threshold, 0.01 alpha for FedProx (mu in the original paper) and default values
for Adam's hyper-parameters (beta1=0.9, beta2=0.99 and epsilon=1e-7).
```python
import declearn
optim = declearn.optimizer.Optimizer(
lrate=0.001,
w_decay=0.01,
regularizers=[("fedprox", {"alpha": 0.01})]
modules=[
("l2-clipping", {"max_norm": 1.0}),
("adam", {"beta1": 0.9, "beta2": 0.99, "eps": 1e-7}),
],
)
```
Since all plug-ins hyper-parameters specified here are the default ones, we
could be less explicit in the instantiation instructions and go with:
```python
import declearn
optim = declearn.optimizer.Optimizer(
lrate=0.001,
w_decay=0.01,
regularizers=["fedprox"],
modules=["l2-clipping", "adam"],
)
```
Of course, one may be even more explicit by manually instantiating the plug-ins
and passing them to the `Optimizer` constructor:
```python
import declearn
fedprox = declearn.optimizers.regularizers.FedProx(alpha=0.01)
l2_clip = declearn.optimizers.modules.L2Clipping(max_norm=1.0)
adam = declearn.optimizers.modules.AdamModule(beta1=0.9, beta2=0.99, eps=1e-7)
optim = declearn.optimizer.Optimizer(
lrate=0.001,
w_decay=0.01,
regularizers=[fedprox],
modules=[l2_clip, adam],
)
```
### Scaffold
[Scaffold](https://arxiv.org/abs/1910.06378) is a Federated Learning algorithm
that aims at improving over vanilla FedAvg in contexts where data heterogeneity
leads to clients having distinct optimal model solutions, the average of which
does not correspond to the global optimum. It relies on the introduction of
correction terms to the locally-computed gradients, that are computed based on
both a shared global state and client-wise ones.
To implement Scaffold in Declearn, one needs to set up both server-side and
client-side OptiModule plug-ins. The client-side module is in charge of both
correcting input gradients and computing the required quantities to update the
states at the end of each training round, while the server-side module merely
manages the computation and distribution of the global and correction states.
The following snippet sets up a pair of client-side and server-side optimizers
that implement Scaffold, here with a 0.001 learning rate on the client side and
a 0.9 one on the server side, which are both arbitrary values to adjust to your
actual use-case.
```python
import declearn
client_opt = declearn.optimizer.Optimizer(
lrate=0.001,
modules=[("scaffold-client")],
)
server_opt = declearn.optimizer.Optimizer(
lrate=0.9,
modules=[("scaffold-server")],
)
```
In practice, both of these instances (or their configuration) should be wrapped
as part of the `declearn.main.config.FLOptimConfig` that would be provided at
instantiation to the server-side `declearn.main.FederatedServer` instance used
for orchestrating the federated learning process.
For more details on how the paired modules exchange information, see the
section on [auxiliary variables](#optimodule-auxiliary-variables) below.
### Interface a TensorFlow or Torch Optimizer
In general, we strongly advise end-users to make use of the Declearn-specific
optimizer plug-ins, whether provided as part of Declearn or written as part of
your use-case code or a third-party extension library. There might however be
cases where one _really_ wants to use a specific optimizer written using the
framework-specific API of TensorFlow or Torch, _e.g._ because it is a very
specific, experimental and/or complex algorithm that is either too hard or
not mature enough to be worth the effort re-implementing using the Declearn
syntax. In that case, you may use a Declearn-provided interface to wrap that
framework-specific object into an `OptiModule` plug-in instance.
Note that in both cases, the learning rate defined as part of the TensorFlow
or Torch optimizer will be overridden in favor of that defined by the Declearn
optimizer into which it is being plugged.
Here is an example code snippet in Torch:
```python
import declearn
import declearn.model.torch
import torch
plugin = declearn.model.torch.TorchOptiModule(
torch.optim.RAdam, # note that this is a type, not an instance
# you may pass any RAdam instantiation kwargs here
)
optim = declearn.optimizer.Optimizer(
lrate=0.001,
modules=[plugin],
)
```
And here is an example code snippet in TensorFlow:
```python
import declearn
import declearn.model.tensorflow
import tensorflow as tf
tf_opt = tf.optimizers.Nadam()
plugin = declearn.model.tensorflow.TensorFlowOptiModule(tf_opt)
optim = declearn.optimizer.Optimizer(
lrate=0.001,
modules=[plugin],
)
```
## Integration in the Declearn Federated Learning process
In Declearn, the Federated Learning optimization problem is formalized as
follows:
- An orchestrating server is in charge of learning a set of model parameters
$\theta$ using gradient descent, based on data that is distributed among an
ensemble of clients, each of which holds a dataset $\mathcal{D}_i$.
- At each training round:
- (a subset of) the clients receive the current model
weights $\theta^{(t)}$, and perform a number of stochastic gradient
descent steps based on their dataset, that result in $K_i$ local weights
updates:
- $\theta_i^{(t, k + 1)} = \theta^{(t, k)} - client\_opt(\theta_i^{(t, k)}, \nabla\theta_i^{(t, k)}(B_{i, k}))$.
- the server collects the resulting local models $\theta_i^{t + 1}$,
performs an aggregation into a single set of weights, and then conducts
an optimization step, using these aggregated weights as a proxy to the
actual gradients of the model:
- $\hat{\nabla}\theta^{(t)}(\cup_i \mathcal{D}_i) = aggregator(\{\theta_i^{(t + 1)}\}_i)$
- $\theta^{(t+1)} = \theta^{(t)} - server\_opt(\theta^{(t)}, \hat{\nabla}\theta^{(t)}(\cup_i \mathcal{D}_i))$
As such, a federated optimzation comprises three structural components:
- An [`Aggregator`](../api-reference/aggregator/Aggregator.md), that
defines how client-wise model updates are combined into the approximate
gradients of the global model. In vanilla FedAvg, client-wise updates are
averaged, with optional weighting based on the number of steps taken by
the clients.
- A client-side `Optimizer`, that defines the algorithm used to conduct
local stochastic gradient descent steps. In vanilla FedAvg, this is a
vanilla SGD optimizer with a given learning rate.
- A server-side `Optimizer`, that defines how the aggregated weights are
transformed into updates to the global model at the end of the round. In
vanilla FedAvg, the model is replaced with the average of client-wise new
ones; this is equivalent to using a vanilla SGD optimzier with a 1.0
learning rate.
In addition to these components, a number of additional hyper-parameters may
intervene, such as the number of training rounds, the number of training steps
per round (which may be defined in number of steps, number of epochs or even in
training duration), or the clients-selection strategy (something which is yet
to be extended in Declearn).
In Declearn, we use the following configuration-specification tools to set up
the federated optimization algorithm, which is entirely defined by the server
and applied by its clients:
- [`FLOptimConfig`](../api-reference/main/config/FLOptimConfig.md) defines the
aggregator, client-side optimizer and server-side one.
- [`FLRunConfig`](../api-reference/main/config/FLRunConfig.md) defines other
hyper-parameters, such as the number of rounds and their effort constraints.
If you are not using our orchestration classes, bearing in mind the way how we
have structured and formalized federated learning should be helpful in using
properly our `Optimizer` (and optionally `Aggregator`) APIs as part of your
own processing flow.
## Advanced topics
### `OptiModule` auxiliary variables
#### What are auxiliary variables?
In most cases, optimization is merely a function of a model's weights, its
gradients based on some inputs, some hyper-parameters and some local state
variables. In federated learning however, some algorithms require the use of
state variables or hyper-parameters that may change through time co-dependently
between clients. An example is the [Scaffold](https://arxiv.org/abs/1910.06378)
algorithm, that requires each and every client to apply a specific correction
term to their local gradients, that is updated based on both local computations
and a shared state that depends on the aggregation of client-wise statistics.
Declearn introduces the notion of "auxiliary variables" to cover such cases:
- Each and every `OptiModule` subclass may define a pair of routines to emit
and receive such variables, which are structured information of any nature.
This is done by implementing the `collect_aux_var` and `process_aux_var`
API-defined methods.
- When needed, a pair of server-side and client-side modules may be implemented
and made to exchange their auxiliary variables with each other. This is done
by having these paired subclasses share the same (unique) `aux_name` string
class attribute.
- The packaging and distribution of module-wise auxiliary variables is done by
`Optimizer.collect_aux_var` and `process_aux_var`, which orchestrate calls to
the plugged-in modules' methods of the same name.
- The management and compartementalization of client-wise auxiliary variables
information is also automated as part of `declearn.main.FederatedServer`, to
prevent information leakage between clients.
#### OptiModule and Optimizer auxiliary variables API
At the level of any `OptiModule`:
- `OptiModule.collect_aux_var` should output a dict that may either have a
simple `{key: value}` structure (for server-purposed or shared-across-clients
information), or a nested `{client_name: {key: value}}` structure (that is to
be split in order to send distinct information to the clients).
- `OptiModule.process_aux_var` should expect a dict that has the same structure
as that emitted by `collect_aux_var` (of this module class, or of a
counterpart paired one).
At the level of a wrapping `Optimizer`:
- `Optimizer.collect_aux_var` emits a `{module_aux_name: module_emitted_dict}`
dict.
- `Optimizer.process_aux_var` expects a `{module_aux_name: module_emitted_dict}`
dict as well.
As a consequence, you should note that:
- An `Optimizer` should not contain multiple auxiliary-variables-using modules
that have the same `name` or `aux_name`.
- If you are using our `Optimizer` within your own orchestration code (_i.e._
outside of our `FederatedServer` / `FederatedClient` main classes), it is up
to you to handle the restructuration of auxiliary variables to ensure that
(a) each client gets its own information (and not that of others), and that
(b) client-wise auxiliary variables are concatenated properly for the
server-side optimizer to process.
#### Integration to the Declearn FL process
On the server side:
- `Optimizer.collect_aux_var` is called at the start of a training round,
to emit auxiliary variables that should be sent to participating clients
alongside with the current model weights.
- `Optimizer.process_aux_var` is called at the end of a training round, to
process client-emitted information prior to post-processing aggregated
model weights into updates to the global model's weights.
On the client side:
- `Optimizer.process_aux_var` is called at the start of a training round;
to process server-emitted information prior to processing the sequence
of locally-computed gradients in order to update the local model weights.
- `Optimizer.collect_aux_var` is called at the end of a training round, to
emit auxiliary variables that should be known by the server and used to
process the aggregated model weights and/or send back new information at
the start of the next training round.
### The `Regularizer.on_round_start` method
`Regularizer` plug-ins expose an `on_round_start` method, that takes no
argument and is designed to be called at the beginning of each training round.
Depending on the implemented algorithm, is may be used to reset or update some
internal state(s); it is notably used in the FedProx plug-in.
As a developer, you may use this feature in your custom `Regularizer` plug-ins.
As an end-user, you do not need to worry about it when using the Declearn main
orchestration tools. If you write training loops or FL processes on your own,
you should however not omit to call that method.
### Differential Privacy
If you want to use Differential Privacy in Declearn, we strongly advise that
you read [our dedicated guide](./local_dp.md). In short, when DP features are
set up as part of the configuration of a federated learning process conducted
using our main orchestration classes, the adjustment of everything that needs
be is automated, _including_ the setup of a properly-calibrated noise-addition
module as part of the optimizers used.
If you are _not_ using our orchestrating class, _nor_ our DP-specific training
manager (`declearn.main.utils.DPTrainingManager`), then you may want to set up
noise addition as part of your `Optimizer`. This can be done by plugging in a
`NoiseModule` subclass instance (e.g. `GaussianNoiseModule`), which should in
general be placed at the very beginning of your modules pipeline. You are then
in charge of conducting gradient clipping to control the sensitivity of your
inputs, _e.g._ by using the `Model.compute_batch_gradients` `max_norm` kwarg
for sample-level DP, by plugging a `L2Clipping` module prior to the noise
addition one to clip batch-averaged gradients based on their L2 norm, or by
using your own solutions to compute and clip gradients that are input to the
declearn-provided `Optimizer` you use.
### Use in Fed-BioMed
[Fed-BioMed](https://fedbiomed.org/) is another Inria-spawned project that
implements a Federated Learning solution in Python, that specifically targets
cross-silo settings for healthcare institutions and applications.
Starting with version 4.4, Fed-BioMed has been integrating the Declearn
`Optimizer` as an alternative to framework-specific optimization tools.
Moreover, some algorithms specific to Federated Learning, such as FedProx and
Scaffold, are being delegated to Declearn (which takes over custom
implementations that are being deprecated).
This is illustrative of how the Declearn optimization API can benefit other
projects, even without adopting all of the other things Declearn has to offer.
In the case of Fed-BioMed, a custom interface that wraps our Optimizer has been
implemented (with our direct support), enabling to bound our API behind theirs,
and therefore to have both APIs evolve at their own pace in the future.
### How to implement and register your own plug-in
#### Implementing a Regularizer or an OptiModule
To implement a plug-in (whether a `Regularizer` or an `OptiModule`), one should
simply inherit from the abstract base class (or from any intermediate class),
declare the abstract `name` class attribute (with one that is not already in
use) and implement the abstract `run` method, and optionally define, overload
or overcharge any method that needs be.
For pairs of `OptiModule` classes that are designed to be plugged on the client
and server side respectively, and exchange auxiliary variables with each other
between training rounds, one will also need to define the `aux_name` class
attribute, and implement the `process_aux_var` and `collect_aux_var` methods
of both classes so that their specifications match.
#### Type-registration system
For (de)serialization purposes, Declearn maintains so-called type registries,
which are dynamic mappings that link some custom types with some unique names.
This mechanism notably enables specifying optimizer plug-ins as a string and a
dict of kwargs, which are mappable into an instance of the target class - this
can be used when instantiating an `Optimizer`, and is most importantly used to
share the client-side `Optimizer`'s configuration from the orchestrating server
to its clients as part of a Federated Learning process.
Custom, _i.e._ non-declearn-provided plug-in classes should therefore be added
to that registry, so that they can be handled as part of these mechanisms just
as any declearn-provided class would. Luckily, this is automated so that it is
done by default whenever a subclass of `OptiModule` or `Regularizer` is
declared. In other words, **your custom plug-in subtypes will be registered by
default**, under their `name` class attribute (hence the requirement for it to
be unique). However, if for some reason you want to prevent type-registration,
you may do it by adding `register=False` to the inheritance instructions of the
class; e.g. `class MyModule(OptiModule, register=False):`.
Note that for your custom types to be parsable by clients, they will need to
execute the code that declares them as part of the script that sets up and runs
their `declearn.main.FederatedClient`. This can be done by distributing your
types as part of a third-party library (somehow acting as a Declearn add-on)
that is imported as part of clients' script, or by ensuring that their source
code is included as part of these scripts. This may feel complicated, but is in
fact a design choice not to introduce too hastily mechanisms by which clients
would be made to execute arbitrary code or imports based on server demands. As
a result, and for now, we rely on end-users agreeing with each other on what
code will be executed by clients, using processes that are external to Declearn
and expected to match security and trust constraints that are unknown to us and
that ought to differ depending on the application setting.
## Caveats and Open Topics
Here is a non-exhaustive list of things that are not available in Declearn, but
that we are aware of and considering adding in the future. If you want to
contribute ideas, code or new topics to discuss, feel free to let us know,
either by opening a GitHub or GitLab issue, or by sending us an e-mail!
### Learning Rate Scheduling
Learning rate scheduling is a common practice in machine and especially in deep
learning. At the moment, Declearn does not (yet) provide a proper API to do so.
We expect to implement proper scheduling tools in the future.
A work-around that end-users may implement is to declare an `OptiModule`
subclass that keeps track of the number of iterations, and scales input
gradients based on it and on the desired scheduling formula. If that plug-in is
placed last in your pipeline, it will operate right before the learning rate is
applied, and therefore yield the desired adaptation to its value.
### Layer-wise Learning Rate
In some applications, it may make sense to apply distinct learning rates to
subsets of your model's weights. This is notably something that Torch allows
for deep neural networks. At the moment, such an approach is not possible in
Declearn, as the same learning rate (and optimization algorithms) will be
applied to all of the model weights.
A work-around that may be used, but requires tailoring to your application and
model architecture, would be to implement a custom `OptiModule` that operates
on the input gradients' `Vector.coefs` data array values, filtering them by
name. This would neither be elegant nor practical, but would work.
## How-Tos
Here is a non-exhaustive list of things that are available in Declearn, but
that new-comers may not find out about as easily as they wish. If you have
questions, or see things that you think should be listed here, feel free to
let us know either by pushing changes to this documentation's source files,
opening an issue on GitLab or GitHub, or sending us an e-mail!
### Gradient-Clipping
In some cases, you might want to clip your batch-averaged gradients, _e.g._ to
prevent exploding gradients issues. This is possible in Declearn, thanks to a
couple of `OptiModule` subclasses: `L2Clipping` (name: `'l2-clipping'`) clips
arrays of weights based on their L2-norm, while `L2GlobalClipping` (name:
`'l2-global-clipping`) clips all weights based on their global L2-norm (as if
concatenated into a single array).
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment