Write up release notes for 'v2.4.0'.

7715ab91 · ANDREY Paul · e7fae4b5 · 7715ab91 · 7715ab91
Verified Commit 7715ab91 authored 1 year ago by ANDREY Paul
--- a/docs/release-notes/SUMMARY.md
+++ b/docs/release-notes/SUMMARY.md
+- [v2.4.0](v2.4.0.md)
 - [v2.3.2](v2.3.2.md)
 - [v2.3.1](v2.3.1.md)
 - [v2.3.0](v2.3.0.md)

--- a/docs/release-notes/v2.4.0.md
+++ b/docs/release-notes/v2.4.0.md
+# declearn v2.4.0
+
+Released: XX/XX/XXXX
+
+**Important notice**:<br/>
+DecLearn 2.4 derogates to SemVer by revising some of the major DecLearn
+component APIs.
+
+This is mitigated in two ways:
+
+- No changes relate to the main process setup code, meaning that end-users
+  that do not use custom components (aggregator, optimodule, metric, etc.)
+  should not see any difference, and their code will work as before (as an
+  illustration, our examples' code remains unchanged).
+- Key methods that were deprecated in favor of new API ones are kept for
+  two more minor versions, and are still tested to work as before.
+
+Any end-user encountering issues due to the released or planned evolutions of
+DecLearn is invited to contact us via GitLab, GitHub or e-mail so that we can
+provide with assistance and/or update our roadmap so that changes do not hinder
+the usability of DecLearn 2.x for research and applications.
+
+## New version policy (and future roadmap)
+
+As noted above, v2.4 does not fully abide by SemVer rules. In the future, more
+partially-breaking changes and API revisions may be introduced, incrementally
+paving the way towards the next major release, while trying as much as possible
+not to break end-user code.
+
+To avoid unforeseen incompatibilities and cryptic bugs from arsing, from this
+version onwards, the server and clients are expected and verified to use the
+same `major.minor` version of DecLearn.
+This policy may be updated in the future, e.g. to specify that clients may
+have a newer minor version than the server (and most probably not the other
+way around).
+
+To avoid unhappy surprises, we are starting to maintain a public roadmap on
+our GitLab. Although it may change, it should provide interested users (notably
+those that are interested in developing custom components or processes on top
+of DecLearn) with a way to anticipate changes, and voice any concerns or advice
+they might have.
+
+
+## Revise all aggregation APIs
+
+### Revise the overal design for aggregation and introduce `Aggregate` API
+
+This release introduces the `Aggregate` API, which is based on an abstract base
+dataclass acting as a template for data structures that require sharing across
+peers and aggregation.
+
+The `declearn.utils.Aggregate` ABC acts as a shared ancestor providing with
+a base API and shared backend code to define data structures that:
+
+  - are serializable to and deserializable from JSON, and may therefore be
+    preserved across network communications
+  - are aggregatable into an instance of the same structure
+  - use summation as the default aggregation rule for fields, which is
+    overridable by redefining the `default_aggregate` method
+  - can implement custom `aggregate_<field.name>` methods to override the
+    default summation rule
+  - implement a `prepare_for_secagg` method that
+      - enables defining which fields merely require sum-aggregation and need
+        encryption when using SecAgg, and which fields are to be preserved in
+        cleartext (and therefore go through the usual default or custom
+        aggregation methods)
+      - can be made to raise a `NotImplementedError` when SecAgg cannot be
+        achieved on a data structure
+
+This new ABC currently has three main children:
+
+  - `AuxVar`: replaces plain dict for `Optimizer` auxiliary variables
+  - `MetricState`: replaces plain dict for `Metric` intermediate states
+  - `ModelUpdates`: replaces sharing of updates as `Vector` and `n_steps`
+
+Each of this is defined jointly with another (pre-existing, revised) API for
+components that (a) produce `Aggregate` data structures based on some input
+data and/or computations; (b) produce some output results based on a received
+`Aggregate` structure, meant to result from the aggregation of multiple peers'
+produced data.
+
+### Revise `Aggregator` API, introducing `ModelUpdates`
+
+The `Aggregator` API was revised to make use of the new `ModelUpdates` data
+structure (inheriting `Aggregate`).
+
+- `Aggregator.prepare_for_sharing` pre-processes an input `Vector` containing
+  raw model updates and an integer indicating the number of local SGD steps
+  into a `ModelUpdates` structure.
+- `Aggregator.finalize_updates` receives a `ModelUpdates` resulting from the
+  aggregation of peers' instances, and performs final computations to produce
+  a `Vector` of aggregated model updates.
+- The legacy `Aggregator.aggregate` method is deprecated (but still works).
+
+### Revise auxiliary variables for `Optimizer`, introducing `AuxVar`
+
+The `OptiModule` API (and, consequently, `Optimizer`) was revised as to the
+design and signature of auxiliary variables related methods, to make use of
+the new `AuxVar` data structure (inheriting `Aggregate`).
+
+- `OptiModule.collect_aux_var` now emits either `None` or an `AuxVar` instance
+  (the precise type of which is module-dependent), instead of a mere dict.
+- `OptiModule.process_aux_var` now expects a proper-type `AuxVar` instance
+  that _already_ aggregates clients' data, externalizing the aggregation rules
+  to the `AuxVar` subtypes, while keeping the finalization logic part of the
+  `OptiModule` subclasses.
+- `Optimizer.collect_aux_var` therefore emits a `{name: aux_var}` dict.
+- `Optimizer.process_aux_var` therefore expects a `{name: aux_var}` dict,
+  rather than having distinct signatures on the client and server sides.
+- It is now expected that server-side components will send the _same_ data
+  to all clients, rather than allow sending client-wise values.
+
+The backend code of `ScaffoldClientModule` and `ScaffoldServerModule` was
+heavily revised to alter the distribution of information and computations:
+
+- Client-side modules are now the sole owners of their local state, and send
+  sum-aggregatable updates to the server, that are therefore SecAgg-compatible.
+- The server consequently shares the same information with all clients, namely
+  the current global state.
+- To keep track of the (possibly growing with time) number of unique client,
+  clients generate a random uuid that is sent with their state updates and
+  preserved in cleartext when SecAgg is used.
+- As a consequence, the server component knows which clients contributed to a
+  given round, but receives an aggregate of local updates rather than the
+  client-wise state values.
+
+### Revise `Metric` API, introducing `ModelState`
+
+The `Metric` API was revised to make use of the new `MetricState` data
+structure (inheriting `Aggregate`).
+
+- `Metric.build_initial_states` generates a "zero-state" `MetricState` instance
+  (it replaces the previously-private `_build_states` method that returned a
+  dict).
+- `Metric.get_states` returns a (Metric-type-dependent) `MetricState`
+  instance, instead of a mere dict.
+- `Metric.set_states` assigns an incoming `MetricState` into the instance, that
+  may be finalized into results using the unchanged `get_result` method.
+- The legacy `Metric.agg_states` is deprecated, in favor of `set_states` (but
+  it still works).
+
+
+## Revise backend communications and messaging APIs
+
+This release introduces some important backend changes to the communication
+and messaging APIs of DecLearn, resulting in more robust code (that is also
+easier to test and maintain), more efficient message parsing (possibly-costly
+de-serialization is now delayed to a time posterior to validity verification)
+and the extensibility of application messages, enabling to easily define and
+use custom message structures in downstream applications.
+
+The most important API change is that network communication endpoints now
+return `SerializedMessage` instances rather than `Message` ones.
+
+### New `declearn.communication.api.backend` submodule
+
+- Introduce a new `ActionMessage` minimal API under its `actions`
+  submodule, that defines hard-coded, lightweight and easy-to-parse
+  data structures designed to convey information and content across
+  network communications agnostic to the content's nature.
+- Revise and expose the `MessagesHandler` util, that now builds on
+  the `ActionMessage` API to model remote calls and answer them.
+- Move the `declearn.communication.messaging.flags` submodule to
+  `declearn.communication.api.backend.flags`.
+
+### New `declearn.messaging` submodule
+
+- Revise the `Message` API to make it extendable, with automated
+  type-registration of subclasses by default.
+- Introduce `SerializedMessage` as a wrapper for received messages,
+  that parses the exact `Lessage` subtype (enabling logic tests and
+  message filtering) but delays actual content de-serialization and
+  `Message` object recovery (enabling to prevent undue resources use
+  for unwanted messages that end up being discarded).
+- Move most existing `Message` subclasses to the new submodule, for
+  retro-compatibility purposes. In DecLearn 3.0 these will probably
+  be re-dispatched to make it clear that concrete messages only make
+  sense in the context of specific multi-agent processes.
+- Drop backend-oriented `Message` subclasses that are replaced with
+  the new `ActionMessage` backbone structures.
+- Deprecate the `declearn.communication.messaging` submodule, that
+  is temporarily maintained, re-exporting moved contents as well as
+  deprecated message types (which are bound to be rejected if sent).
+
+### Revise `NetworkClient` and `NetworkServer`
+
+- Have message-receiving methods return `SerializedMessage` instances
+  rather than finalized de-serialized `Message` ones.
+- Quit sending and expecting 'data_info' with registration requests.
+- Rename `NetworkClient.check_message` into `recv_message` (keep
+  the former as an alias, with a `DeprecationWarning`).
+- Improve the use of (optional) timeouts when sending or expecting
+  messages and overall exceptions handling:
+  - `NetworkClient.recv_message` may either raise a `TimeoutError`
+    (in case of timeout) or `RuntimeError` (in case of rejection).
+  - `NeworkServer.send_messages` and `broadcast_message` quietly
+    stop waiting for clients to collect messages after the (opt.)
+    timeout delay has passed. Messages may still be collected.
+  - `NetworkServer.wait_for_messages` no longer accepts a timeout.
+  - `NetworkServer.wait_for_messages_with_timeout` implements the
+    possibility to setup a timeout. It returns both received client
+    replies and a list of clients that failed to answer.
+  - All timeouts can now be specified as float values (which is
+    mostly useful for testing purposes or simulated environments).
+- Add a `heartbeat` instantiation parameter, with a default value of
+  1 second, that is passed to the underlying `MessagesHandler`. In
+  simulated contexts (including tests), setting a low heartbeat can
+  cut runtime down significantly.
+
+
+## Usability updates
+
+A few minor changes were made in hope that they can improve DecLearn usability
+for end-users.
+
+### Record and save training losses
+
+The `Model` API was updated so that `Model.compute_batch_gradients` now records
+the computed batch-averaged model loss as a float value in an internal buffer,
+and the newly-introduced `Model.collect_training_losses` method enables getting
+all stored values (and purging the buffer on the way).
+
+The `FederatedClient` was consequently updated to collect and export training
+loss values at the end of each and every training round when a `Checkpointer`
+is attached to it (otherwise, values are purged from memory but not recorded to
+disk).
+
+### Add a verbosity option for `FederatedClient` and `TrainingManager`.
+
+`FederatedClient` and `TrainingManager` now both accept a `verbose: bool=True`
+instantiatation keyword argument, that changes:
+
+- (a) the default logger verbosity level: if `logger=None` is also passed,
+  the default logger will have 'info'-level verbosity if `verbose` and
+  `declearn.utils.LOGGING_LEVEL_MAJOR`-level if `not verbose`, so that only
+  evaluation metrics and errors are logged.
+- (b) the optional display of a progressbar when conducting training or
+  evaluation rounds; if `not verbose`, no progressbar is used.
+
+### Add 'TrainingManager.(train|evaluate)_under_constraints' to the API.
+
+These public methods enable running training or evaluation rounds without
+relying on `Message` structures to specify parameters nor collect results.
+
+### Modularize 'run_as_processes' input specs.
+
+The `declearn.utils.run_as_processes` util was modularized to that routines
+can be specified in various ways. Previously, they could only be passed as
+`(func, args)` tuples. Now, they can either be passed as `(func, args)`,
+`(func, kwargs)` or `(func, args, kwargs)`, where `args` is still a tuple
+of positional arguments, and `kwargs` is a dict of keyword ones.