Implement Secure Aggregation
At the moment, declearn only supports clear-text sharing of model weights, evaluation metrics and optimizer auxiliary variables. This issue is about tracking the effort to implement Secure Aggregation tools and mechanisms that would enable recovering aggregated quantities on the server side without revealing client-wise information.
Starting point
To do so, we need to understand and/or design:
- Which SecAgg algorithm(s) we want to cover.
- How to set things up for Secure Aggregation.
- For each targeted algorithm, what needs to be set up?
- Which cryptographic tools and primitives should we use?
- How should we integrate this setup as part of, or in addition to, the existing setup tools?
- How to encrypt quantities that need it.
- How can we make sure not to miss quantities that require protection?
- How should we handle Vector structures' encryption?
- When should encryption occur as part of our process? (Hypothesis: at serialization/sharing-with-server time)
- How to aggregate and decrypt received quantities.
- When should aggregation / decryption occur?
- How can and should SecAgg interact with existing code structures? (e.g. our Aggregator API)
Note that this effort should be carried out taking into account our integration in Fed-BioMed, which already has some SecAgg capabilities, currently using the Joye-Libert scheme. Work is under way to let Fed-BioMed run SecAgg over a declearn-backed implementation of Scaffold, which may or may not result in some things being implemented directly in declearn.
Current advances
Which SecAgg algorithm(s) we want to cover.
Currently, two algorithms are implemented:
- "Joye-Libert" SecAgg, based on homomorphic summation from this paper.
- "Masking" SecAgg, based on values masking using secret PRNG keys, as proposed in this paper.
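To illustrate the idea behind the "Masking" approach, here is a minimal sketch in which every pair of clients shares a secret seed, expands it into a pseudo-random mask, and applies it with opposite signs so that the masks cancel out in the server-side sum. This is not declearn's implementation: the function names, the modulus, and the use of Python's non-cryptographic `random` module are assumptions made for readability, and inputs are assumed to be already quantized into integers.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this (assumed) value


def pairwise_mask(seed: int, size: int) -> list[int]:
    """Expand a shared secret seed into a pseudo-random mask."""
    rng = random.Random(seed)  # NOT cryptographically secure; illustration only
    return [rng.randrange(MODULUS) for _ in range(size)]


def mask_values(
    client: int, values: list[int], seeds: dict[tuple[int, int], int]
) -> list[int]:
    """Mask a client's integer vector with one mask per peer."""
    masked = list(values)
    for (low, high), seed in seeds.items():
        if client not in (low, high):
            continue
        sign = 1 if client == low else -1  # the two peers use opposite signs
        mask = pairwise_mask(seed, len(values))
        masked = [(m + sign * k) % MODULUS for m, k in zip(masked, mask)]
    return masked


def sum_masked(all_masked: list[list[int]]) -> list[int]:
    """Server-side sum: pairwise masks cancel out, leaving the true sum."""
    return [sum(column) % MODULUS for column in zip(*all_masked)]


# Three clients, each holding a small vector of (already quantized) integers.
inputs = {0: [1, 2, 3], 1: [10, 20, 30], 2: [100, 200, 300]}
seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}  # one secret seed per client pair
masked = {c: mask_values(c, v, seeds) for c, v in inputs.items()}
total = sum_masked(list(masked.values()))
print(total)  # [111, 222, 333]
```

The server only ever sees the masked vectors, yet their sum equals the sum of the clear inputs. A real deployment would need a cryptographically secure PRNG and a way to handle client dropout, both of which are left out here.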
How to set things up for Secure Aggregation.
Work is still in progress on designing a shared configuration and setup scheme for SecAgg.
What is already done:
- Each SecAgg algorithm has a first implementation of routines enabling its setup between a server and a (subset of) clients.
- The X3DH protocol was implemented and is currently used as a key component to set up and exchange secrets across peers.
- For now, we assume a public key infrastructure, where clients have already generated and exchanged long-lived identity keys, enabling them to trust one another even though communications are centralized by the server.
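As a rough illustration of how two clients can agree on a secret through a server that only relays public values, here is a single toy Diffie-Hellman exchange. It is deliberately not X3DH, which combines several such exchanges and relies on the identity keys mentioned above; the group parameters are far too small for real use, and all names are hypothetical.

```python
import hashlib
import secrets

# Toy group parameters (assumption for illustration; insecure in practice).
PRIME = 2**127 - 1  # a Mersenne prime
GENERATOR = 3


def generate_keypair() -> tuple[int, int]:
    """Return a (private, public) pair of integers."""
    private = secrets.randbelow(PRIME - 2) + 1  # in [1, PRIME - 2]
    public = pow(GENERATOR, private, PRIME)
    return private, public


def derive_shared_seed(private: int, peer_public: int) -> int:
    """Derive a 64-bit seed from the Diffie-Hellman shared secret."""
    shared = pow(peer_public, private, PRIME)
    digest = hashlib.sha256(shared.to_bytes(16, "big")).digest()
    return int.from_bytes(digest[:8], "big")


# Each client generates a key pair; the server only relays the public halves.
alice_private, alice_public = generate_keypair()
bob_private, bob_public = generate_keypair()
relayed_by_server = {"alice": alice_public, "bob": bob_public}

seed_alice = derive_shared_seed(alice_private, relayed_by_server["bob"])
seed_bob = derive_shared_seed(bob_private, relayed_by_server["alice"])
```

Both clients end up with the same seed without it ever transiting through the server. Without authenticated identity keys, however, the server could substitute its own public values, which is why the public key infrastructure assumption matters.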
How to encrypt quantities that need it.
Thanks to the Aggregate API introduced in declearn 2.4.0, complex quantities that arise from modular components (whether model updates and their metadata, optimizer auxiliary variables, or evaluation metrics) are wrapped into a container that defines both cleartext and SecAgg aggregation rules.
As a result, the SecAgg encryption can take place after the usual collection of values that are to be shared, at the moment when a high-level client component packages them into a message being sent to the server (or, in the future, to any subset of peer clients). Thanks to that, low-level components do not need to be aware of SecAgg being used nor to access any SecAgg controller or secret information.
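The separation described above can be sketched as follows: a low-level component builds a plain container, and only the high-level packaging step decides whether to encrypt it. Everything here is hypothetical (`ToyMetrics`, `package_message`, and the additive-offset "encrypter" are illustration-only stand-ins, not declearn classes or a real SecAgg scheme).

```python
from dataclasses import asdict, dataclass
from typing import Callable, Optional

Encrypt = Callable[[int], int]


@dataclass
class ToyMetrics:
    """Hypothetical container built by a low-level component, unaware of SecAgg."""

    loss_sum: int
    n_samples: int


def package_message(data: ToyMetrics, encrypt: Optional[Encrypt] = None) -> dict:
    """High-level packaging: encrypt each field only if SecAgg is enabled."""
    content = asdict(data)
    if encrypt is None:
        return {"secagg": False, "content": content}
    return {"secagg": True, "content": {k: encrypt(v) for k, v in content.items()}}


def make_toy_encrypter(secret: int, modulus: int = 2**32) -> Encrypt:
    """Stand-in for a real SecAgg encrypter: adds a secret offset modulo `modulus`."""
    return lambda value: (value + secret) % modulus


# The low-level code producing `metrics` is identical in both cases.
metrics = ToyMetrics(loss_sum=42, n_samples=10)
clear = package_message(metrics)
protected = package_message(metrics, encrypt=make_toy_encrypter(secret=123456))
```

Only `package_message` ever touches the encrypter, so the secret never needs to be passed down to the component that computed the metrics.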
How to aggregate and decrypt received quantities.
Again, thanks to the Aggregate API, shared quantities (including encrypted ones) are now designed to be aggregated in an iterative manner, whether by a centralizing server or by gossip across the chain of clients (paving the way for decentralized learning). Most importantly, they are aggregated prior to being fed to specific controllers that finalize them (Aggregator to produce global model updates, MetricSet to compute federated metrics, Optimizer to update some quantities, e.g. the Scaffold shared state...).
As a result, the SecAgg aggregate-decryption can be made to occur in place of the cleartext aggregation, and components do not need to be aware of SecAgg occurring nor have access to any SecAgg controller. Thanks to the Aggregate.prepare_for_secagg method, SecAgg-incompatible quantities can be marked as such generically, as can specific aggregation rules for values that need to remain in cleartext in parallel to encrypted ones.
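A minimal sketch of this flow is given below. It only borrows the `prepare_for_secagg` name from the text above: the signature, the returned pair of dictionaries, the `NotImplementedError` convention, and the toy masks are all assumptions for illustration, not declearn's actual API.

```python
from dataclasses import dataclass

MODULUS = 2**32  # assumed modulus for the toy additive masking


@dataclass
class ToyAggregate:
    """Hypothetical stand-in for a shareable container (not declearn's class)."""

    total: int      # must be protected; aggregated by (encrypted) summation
    max_steps: int  # stays in cleartext; aggregated with `max`

    def prepare_for_secagg(self) -> tuple[dict, dict]:
        """Split fields into (to-be-encrypted, kept-in-cleartext)."""
        return {"total": self.total}, {"max_steps": self.max_steps}


@dataclass
class ToyIncompatible:
    """Hypothetical quantity that cannot be securely aggregated."""

    values: list

    def prepare_for_secagg(self) -> tuple[dict, dict]:
        raise NotImplementedError("This quantity does not support SecAgg.")


def client_message(data: ToyAggregate, mask: int) -> tuple[dict, dict]:
    """Client side: encrypt protected fields, pass cleartext ones through."""
    secagg_fields, clear_fields = data.prepare_for_secagg()
    encrypted = {k: (v + mask) % MODULUS for k, v in secagg_fields.items()}
    return encrypted, clear_fields


def secagg_aggregate(messages: list[tuple[dict, dict]]) -> dict:
    """Server side: sum encrypted fields (masks cancel), max cleartext ones."""
    summed: dict = {}
    clear: dict = {}
    for encrypted, clear_fields in messages:
        for key, val in encrypted.items():
            summed[key] = (summed.get(key, 0) + val) % MODULUS
        for key, val in clear_fields.items():
            clear[key] = max(clear.get(key, val), val)
    return {**summed, **clear}


clients = [ToyAggregate(total=5, max_steps=3), ToyAggregate(total=7, max_steps=8)]
masks = [987654321, -987654321]  # toy masks that cancel out once summed
messages = [client_message(c, m) for c, m in zip(clients, masks)]
result = secagg_aggregate(messages)
print(result)  # {'total': 12, 'max_steps': 8}
```

The controller that later consumes `result` receives the same values it would have received from cleartext aggregation, so it needs no knowledge of SecAgg. A container such as `ToyIncompatible` fails early and explicitly instead of leaking its values in cleartext.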
Note that this effort should be carried out taking into account our integration in Fed-BioMed.
Prior to finalizing the release of DecLearn v2.4.0, SecAgg over optimizer auxiliary variables was implemented for Fed-BioMed, powered by the new Aggregate (and specifically AuxVar) API. The PR is still open and will need further work, as important refactoring is happening within Fed-BioMed and SecAgg refactoring is also planned on their side. Nonetheless, our new design proved to be combinable with their SecAgg implementation with limited effort, using code that does not require hacking through theirs. This also acted as a playground to further test a SecAgg-specific wrapper structure around encrypted Aggregate data, which is being implemented in DecLearn (with a slightly different API from the one proposed for Fed-BioMed, but the same overall design).