FedOpt aggregation strategies
Introduction
The purpose of this issue is to highlight the work in progress regarding the implementation of new FL algorithms / aggregation strategies, consisting of federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, which we group in the FedOpt family.
Why use FedOpt?
Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. FedAvg has indeed been observed to suffer from convergence issues in some settings, caused by client drift (local client models moving away from the globally optimal model) and a lack of adaptivity (Karimireddy et al., 2019).
In non-federated settings, adaptive optimization methods have had notable success in combating such issues, and they can likewise significantly improve the performance of federated learning. Intuitively, they incorporate knowledge of past iterations to perform more informed optimization steps (Reddi et al., 2020).
FedOpt optimization follows this scheme:
(1) Clients (nodes in Fed-BioMed) perform multiple epochs of training using a client optimizer to minimize the loss on their local data (as is done for all other strategies such as FedAvg or SCAFFOLD).
(2) The server (researcher in Fed-BioMed) updates its global model by applying a gradient-based server optimizer to the average of the clients' model updates.
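As a minimal sketch of this two-step scheme (plain NumPy on a toy linear model, not Fed-BioMed's actual API; `fedopt_round`, `grad_mse`, and `server_step` are hypothetical names introduced here for illustration):

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of the mean-squared error of a linear model (toy loss)."""
    return 2 * X.T @ (X @ w - y) / len(y)

def fedopt_round(w_global, client_data, server_step=lambda d: d,
                 client_lr=0.01, local_steps=5):
    """One FedOpt round (illustrative sketch):
    (1) each client runs local SGD starting from the global model,
    (2) the server applies `server_step` to the averaged client update."""
    deltas = []
    for X, y in client_data:
        w = w_global.copy()
        for _ in range(local_steps):        # (1) client optimizer: plain SGD
            w -= client_lr * grad_mse(w, X, y)
        deltas.append(w - w_global)         # this client's model update
    avg_delta = np.mean(deltas, axis=0)     # average of the clients' updates
    return w_global + server_step(avg_delta)  # (2) server optimizer step
```

With the identity `server_step` this reduces to plain FedAvg; plugging in an adaptive, stateful `server_step` (Adagrad-, Yogi-, or Adam-style) yields the FedOpt variants.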
Note: As explained in the paper, FedOpt is specialized to settings where the server optimizer is an adaptive optimization method (one of Adagrad, Yogi, or Adam) and the client optimizer is SGD. By using adaptive methods (which generally require maintaining state) on the server and SGD on the clients, these methods have the same communication cost as FedAvg and work in cross-device settings.
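For reference, the server-side update rules from Reddi et al., 2020, where Δₜ is the average of the clients' model updates, η the server learning rate (server_lr), τ the adaptivity parameter (tau), and β₁, β₂ the moment decay rates (beta1, beta2):

```math
\begin{aligned}
\Delta_t &= \frac{1}{|\mathcal{S}|}\sum_{i \in \mathcal{S}} \left(x_i^t - x_t\right), \qquad
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\,\Delta_t \\
v_t &= v_{t-1} + \Delta_t^2 \quad \text{(FedAdagrad)} \\
v_t &= v_{t-1} - (1 - \beta_2)\,\Delta_t^2 \odot \mathrm{sign}\!\left(v_{t-1} - \Delta_t^2\right) \quad \text{(FedYogi)} \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\,\Delta_t^2 \quad \text{(FedAdam)} \\
x_{t+1} &= x_t + \eta\,\frac{m_t}{\sqrt{v_t} + \tau}
\end{aligned}
```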
Accomplished work
Branch link: https://gitlab.inria.fr/fedbiomed/fedbiomed/-/tree/poc/fedopt
- In the fedbiomed/researcher folder:
  - New file: `fedopt.py` — implements the FedOpt family of strategies (including FedAdam, FedYogi, and FedAdagrad). It takes strategy, server_lr (server learning rate), tau, beta1, and beta2 as initialization parameters (see the usage sketch after this list).
  - `functional.py` — handles the internal state of the FedOpt strategies (momentum, second moment) and calculates the new updates of the server's aggregated model.
  - `experiment.py` — adapted to the FedOpt requirements for a training round. A new method, `_calc_delta_aggregated_params`, allows us to retrieve the aggregated and averaged updates of the local models. A strategy-dependent transformation is then applied to these updates, and the result is added to the initial server state (via the new `_update_params` method) to obtain the final aggregated server model at the end of the round (see the sketch after this list).
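By way of illustration, instantiating one of these strategies could look like the following (a hypothetical sketch: the class name `FedOpt` and the exact signature are assumptions, only the parameter names come from the description above):

```python
# Hypothetical usage sketch; the actual class name and signature on the
# poc/fedopt branch may differ. The import path assumes the new
# fedbiomed/researcher/fedopt.py file mentioned above.
from fedbiomed.researcher.fedopt import FedOpt

strategy = FedOpt(
    strategy="FedYogi",  # one of FedAdagrad / FedYogi / FedAdam
    server_lr=0.1,       # server-side learning rate (eta)
    tau=1e-3,            # adaptivity parameter
    beta1=0.9,           # first-moment (momentum) decay rate
    beta2=0.99,          # second-moment decay rate
)
```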
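And the round logic described for `experiment.py` amounts, schematically, to the following (a NumPy sketch of what `_calc_delta_aggregated_params` and `_update_params` are described as doing, not the actual implementation):

```python
import numpy as np

def calc_delta_aggregated_params(local_params, server_params):
    """Average the local models and return their delta w.r.t. the current
    server model (what _calc_delta_aggregated_params is described as doing)."""
    avg_local = {name: np.mean([p[name] for p in local_params], axis=0)
                 for name in server_params}
    return {name: avg_local[name] - server_params[name]
            for name in server_params}

def update_params(server_params, transformed_delta):
    """Add the strategy-transformed updates to the initial server state
    (what the new _update_params method is described as doing)."""
    return {name: server_params[name] + transformed_delta[name]
            for name in server_params}
```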