Need more robust handling of secagg training errors
When training `notebooks/101_getting-started.ipynb` with:
- node1: SECURE_AGGREGATION=True FORCE_SECURE_AGGREGATION=True
- node2: SECURE_AGGREGATION=True FORCE_SECURE_AGGREGATION=False

node1 fails with:
```
2023-04-17 18:16:48,257 fedbiomed INFO - Error message received during training: FB300: undetermined node error - FB314: Node round error: Node requires to apply secure aggregation but Secure aggregation context for the training is not defined.
2023-04-17 18:16:48,259 fedbiomed INFO - Downloading model params after training on node_c14d0b6b-23ea-4b81-aebd-e956e68f4ba4 - from http://localhost:8844/media/uploads/2023/04/17/node_params_5d2a9ef6-81b6-4ee5-b7ce-bd5cebb32505.mpk
2023-04-17 18:16:48,283 fedbiomed DEBUG - download of file node_params_d302f411-57c1-413f-a855-3005b262ddf6.mpk successful, with status code 200
2023-04-17 18:16:48,289 fedbiomed ERROR - FB408: node did not answer during training (node = node_5d16974d-c308-463b-8efe-06cd22703b8e)
2023-04-17 18:16:48,290 fedbiomed CRITICAL - FB408: node did not answer during training
```
For robustness' sake, it would be better to try/except the error raised by `Round._configure_secagg` (here: https://gitlab.inria.fr/fedbiomed/fedbiomed/-/blob/develop/fedbiomed/node/round.py#L250) and return a `TrainingReply` rather than an Error message. For example:
```python
try:
    secagg_arguments = {} if secagg_arguments is None else secagg_arguments
    self._use_secagg = self._configure_secagg(
        secagg_servkey_id=secagg_arguments.get('secagg_servkey_id'),
        secagg_biprime_id=secagg_arguments.get('secagg_biprime_id'),
        secagg_random=secagg_arguments.get('secagg_random')
    )
except FedbiomedRoundError:
    return self._send_round_reply(success=False, message=...)
```
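A minimal, self-contained sketch of the pattern (with stand-in classes, not Fed-BioMed's actual `Round`/`TrainingReply` API): catching the configuration error and returning a failed reply means the researcher side gets an answer for the round instead of hitting the FB408 "node did not answer" timeout.

```python
# Hypothetical stand-ins to illustrate the proposed fix; names and
# signatures only approximate the real fedbiomed.node.round module.

class FedbiomedRoundError(Exception):
    """Stand-in for fedbiomed.common.exceptions.FedbiomedRoundError."""


class Round:
    def _configure_secagg(self, secagg_servkey_id, secagg_biprime_id,
                          secagg_random):
        # Simulate the failing case: the node forces secure aggregation
        # but no secagg context was sent for the training.
        if secagg_servkey_id is None or secagg_biprime_id is None:
            raise FedbiomedRoundError(
                "Node requires to apply secure aggregation but Secure "
                "aggregation context for the training is not defined."
            )
        return True

    def _send_round_reply(self, success, message=""):
        # Stand-in for sending a TrainingReply back to the researcher.
        return {"success": success, "msg": message}

    def run(self, secagg_arguments=None):
        try:
            secagg_arguments = {} if secagg_arguments is None else secagg_arguments
            self._use_secagg = self._configure_secagg(
                secagg_servkey_id=secagg_arguments.get('secagg_servkey_id'),
                secagg_biprime_id=secagg_arguments.get('secagg_biprime_id'),
                secagg_random=secagg_arguments.get('secagg_random'),
            )
        except FedbiomedRoundError as exc:
            # Report the failure as a (failed) TrainingReply so the
            # researcher's round terminates cleanly instead of timing out.
            return self._send_round_reply(success=False, message=str(exc))
        return self._send_round_reply(success=True)
```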
In some cases, successive such failures with one node doing secagg and one node not doing secagg (could not reproduce) put the Fed-BioMed instance in an inconsistent state (the message queues for the nodes and the server had to be cleaned to restore a coherent state).
Edited by VESIN Marc