FLamby integration (!137) · Merge requests · OBSOLETE_Fed-BioMed / OBSOLETE_Fed-BioMed

AYED Samy-Safwan requested to merge poc/flamby into develop Aug 25, 2022

Merge request regarding FLamby integration into Fed-BioMed.

This MR introduces the following changes:

The FLamby library is now installed by default with Fed-BioMed
FlambyDataset makes FLamby datasets usable in Fed-BioMed
DataLoadingPlan and DataLoadingBlocks infrastructure to accompany a FlambyDataset
updated CLI to support adding FLamby datasets
one additional notebook and unit tests
documentation to be reviewed in fedbiomed.gitlabpages.inria.fr!59

Things to look out for:

The lifecycles of FlambyDataset and DataLoadingPlan are tightly linked. To enforce this link, I chose to override set_dlp in FlambyDataset, which means that calling set_dlp on a FlambyDataset also has the important side effect of initializing the FLamby FedClass. It was the simplest solution I could think of (both in terms of implementation and user-friendliness), but it's not very elegant.
Contrary to my original plan, I dropped the idea of a FlambyTrainingPlan. I think it's best to ask the researcher to write a training_data function, just as they do with every other training plan. All of the FLamby logic is contained in the FlambyDataset class.
FLamby is installed automatically, but no datasets are included. The researcher must manually navigate to FLamby's datasets folder and run a Python script to download them. This is a limitation of FLamby, which does not allow automating the process (presumably because downloading also involves accepting the license).
FLamby datasets may have their own dependencies. I did not want to clutter our conda environments with potentially useless packages, so I leave it up to the researcher to install the additional dependencies for the specific FLamby datasets they wish to use. It's not very elegant, but at least it doesn't force us to add roughly 10 more packages to our conda envs just because one researcher may use some dataset.
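
To make the first point easier to review, here is a minimal sketch of the set_dlp override pattern. It uses stand-in classes only: BaseDataset, FakeFedClass, FlambyLikeDataset and the "center_id" key are hypothetical names invented for illustration, not the actual Fed-BioMed or FLamby code. It only shows how setting the loading plan doubles as the initialization step, and why using the dataset before that point has to fail loudly.

```python
from typing import Optional, Tuple


class DataLoadingPlan(dict):
    """Hypothetical stand-in: maps configuration keys to values."""


class FakeFedClass:
    """Hypothetical stand-in for a FLamby federated dataset class."""

    def __init__(self, center: int) -> None:
        self.center = center
        # Three dummy samples tagged with the center they belong to.
        self._samples = [(center, i) for i in range(3)]

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int) -> Tuple[int, int]:
        return self._samples[idx]


class BaseDataset:
    """Hypothetical base class: set_dlp only stores the plan."""

    def __init__(self) -> None:
        self._dlp: Optional[DataLoadingPlan] = None

    def set_dlp(self, dlp: DataLoadingPlan) -> None:
        self._dlp = dlp


class FlambyLikeDataset(BaseDataset):
    """Overrides set_dlp so that setting the plan also builds the wrapped class."""

    def __init__(self) -> None:
        super().__init__()
        self._fed_class: Optional[FakeFedClass] = None

    def set_dlp(self, dlp: DataLoadingPlan) -> None:
        super().set_dlp(dlp)
        # Side effect: the wrapped dataset is created here, not in __init__.
        self._fed_class = FakeFedClass(center=dlp["center_id"])

    def _require_init(self) -> FakeFedClass:
        if self._fed_class is None:
            raise RuntimeError("call set_dlp() before using the dataset")
        return self._fed_class

    def __len__(self) -> int:
        return len(self._require_init())

    def __getitem__(self, idx: int) -> Tuple[int, int]:
        return self._require_init()[idx]


dataset = FlambyLikeDataset()
dataset.set_dlp(DataLoadingPlan(center_id=2))
first_sample = dataset[0]
```

The trade-off is visible in the sketch: the caller has a single entry point, but a method named like a setter silently performs construction, which is the inelegance mentioned above.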

Edited Sep 29, 2022 by CREMONESI Francesco
