FedOpt aggregation strategies
Introduction
The purpose of this issue is to highlight the work in progress regarding the implementation of new FL algorithms / aggregation strategies, consisting of federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, which we group in the FedOpt family.
Why use FedOpt?
Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. FedAvg has indeed been observed to suffer from convergence issues in some settings, caused by client drift (local client models moving away from the globally optimal model) and a lack of adaptivity (Karimireddy et al., 2019).
In non-federated settings, adaptive optimization methods have had notable success in combating such issues, and they can likewise significantly improve the performance of federated learning. Intuitively, they incorporate knowledge of past iterations to perform more informed optimization steps (Reddi et al., 2020).
FedOpt optimization follows this scheme:
(1) Clients (nodes in Fed-BioMed) perform multiple epochs of training using a client optimizer to minimize the loss on their local data (as is done for all other strategies such as FedAvg or SCAFFOLD).
(2) The server (researcher in Fed-BioMed) updates its global model by applying a gradient-based server optimizer to the average of the clients' model updates.
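As a minimal sketch of this two-step scheme (plain NumPy on a toy linear model, not Fed-BioMed's actual API; `fedopt_round`, `grad_mse`, and `server_step` are hypothetical names introduced here for illustration):

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of the mean-squared error of a linear model (toy loss)."""
    return 2 * X.T @ (X @ w - y) / len(y)

def fedopt_round(w_global, client_data, server_step=lambda d: d,
                 client_lr=0.01, local_steps=5):
    """One FedOpt round (illustrative sketch):
    (1) each client runs local SGD starting from the global model,
    (2) the server applies `server_step` to the averaged client update."""
    deltas = []
    for X, y in client_data:
        w = w_global.copy()
        for _ in range(local_steps):        # (1) client optimizer: plain SGD
            w -= client_lr * grad_mse(w, X, y)
        deltas.append(w - w_global)         # this client's model update
    avg_delta = np.mean(deltas, axis=0)     # average of the clients' updates
    return w_global + server_step(avg_delta)  # (2) server optimizer step
```

With the identity `server_step` this reduces to plain FedAvg; plugging in an adaptive, stateful `server_step` (Adagrad-, Yogi-, or Adam-style) yields the FedOpt variants.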
Note: As explained in the paper, FedOpt is specialized to settings where the server optimizer is an adaptive optimization method (one of Adagrad, Yogi, or Adam) and the client optimizer is SGD. By using adaptive methods (which generally require maintaining state) on the server and SGD on the clients, these methods have the same communication cost as FedAvg and work in cross-device settings.
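For reference, the server-side update rules from Reddi et al., 2020, where Δₜ is the average of the clients' model updates, η the server learning rate (server_lr), τ the adaptivity parameter (tau), and β₁, β₂ the moment decay rates (beta1, beta2):

```math
\begin{aligned}
\Delta_t &= \frac{1}{|\mathcal{S}|}\sum_{i \in \mathcal{S}} \left(x_i^t - x_t\right), \qquad
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\,\Delta_t \\
v_t &= v_{t-1} + \Delta_t^2 \quad \text{(FedAdagrad)} \\
v_t &= v_{t-1} - (1 - \beta_2)\,\Delta_t^2 \odot \mathrm{sign}\!\left(v_{t-1} - \Delta_t^2\right) \quad \text{(FedYogi)} \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\,\Delta_t^2 \quad \text{(FedAdam)} \\
x_{t+1} &= x_t + \eta\,\frac{m_t}{\sqrt{v_t} + \tau}
\end{aligned}
```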
Accomplished work
Branch link: https://gitlab.inria.fr/fedbiomed/fedbiomed/-/tree/poc/fedopt
- In the fedbiomed/researcher folder:
  - New file: `fedopt.py` — implements the FedOpt family of strategies (including FedAdam, FedYogi, and FedAdagrad). It takes strategy, server_lr (server learning rate), tau, beta1, and beta2 as initialization parameters (see the usage sketch after this list).
  - `functional.py` — handles the internal state of the FedOpt strategies (momentum, second moment) and calculates the new updates of the server's aggregated model.
  - `experiment.py` — adapted to the FedOpt requirements for a training round. A new method, `_calc_delta_aggregated_params`, allows us to retrieve the aggregated and averaged updates of the local models. A strategy-dependent transformation is then applied to these updates, and the result is added to the initial server state (via the new `_update_params` method) to obtain the final aggregated server model at the end of the round (see the sketch after this list).
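By way of illustration, instantiating one of these strategies could look like the following (a hypothetical sketch: the class name `FedOpt` and the exact signature are assumptions, only the parameter names come from the description above):

```python
# Hypothetical usage sketch; the actual class name and signature on the
# poc/fedopt branch may differ. The import path assumes the new
# fedbiomed/researcher/fedopt.py file mentioned above.
from fedbiomed.researcher.fedopt import FedOpt

strategy = FedOpt(
    strategy="FedYogi",  # one of FedAdagrad / FedYogi / FedAdam
    server_lr=0.1,       # server-side learning rate (eta)
    tau=1e-3,            # adaptivity parameter
    beta1=0.9,           # first-moment (momentum) decay rate
    beta2=0.99,          # second-moment decay rate
)
```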
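And the round logic described for `experiment.py` amounts, schematically, to the following (a NumPy sketch of what `_calc_delta_aggregated_params` and `_update_params` are described as doing, not the actual implementation):

```python
import numpy as np

def calc_delta_aggregated_params(local_params, server_params):
    """Average the local models and return their delta w.r.t. the current
    server model (what _calc_delta_aggregated_params is described as doing)."""
    avg_local = {name: np.mean([p[name] for p in local_params], axis=0)
                 for name in server_params}
    return {name: avg_local[name] - server_params[name]
            for name in server_params}

def update_params(server_params, transformed_delta):
    """Add the strategy-transformed updates to the initial server state
    (what the new _update_params method is described as doing)."""
    return {name: server_params[name] + transformed_delta[name]
            for name in server_params}
```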