FLamby integration
Merge request regarding FLamby integration into Fed-BioMed.
This MR introduces the following changes:
- The FLamby library is now installed by default with Fed-BioMed
- `FlambyDataset`, which makes FLamby datasets usable in Fed-BioMed
- `DataLoadingPlan` and `DataLoadingBlocks` infrastructure to accompany a `FlambyDataset`
- updated CLI to support adding FLamby datasets
- one additional notebook and unit tests
- documentation to be reviewed in fedbiomed.gitlabpages.inria.fr!59
Things to look out for:
- The lifecycles of a `FlambyDataset` and its `DataLoadingPlan` are tightly linked. I opted to override `set_dlp` in `FlambyDataset` in order to enforce this link, but this means that `set_dlp` for a `FlambyDataset` also has the important side effect of initializing the FLamby FedClass. It was the simplest solution I could think of (both in terms of implementation and user-friendliness), but it's not very elegant.
- In contrast to my original idea, I abandoned the plan to have a `FlambyTrainingPlan`. I think it's best to ask the researcher to write a `training_data` function, just like they do with every other training plan. All of the FLamby logic is contained in the `FlambyDataset` class.
- FLamby is installed automatically, but no datasets are included. The researcher must manually navigate to the datasets folder of FLamby and run a Python script to download them. This is a limitation of FLamby, which does not allow automating the process (presumably because it also involves accepting the license).
- FLamby datasets may have their own dependencies. I did not want to clutter our conda environments with potentially useless packages, so I leave it up to the researcher to install the additional dependencies for the specific FLamby datasets that they wish to use. It's not very elegant, but at least it doesn't force us to add roughly 10 more packages to our conda envs just because one researcher may use some dataset.
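
To make the first point under "Things to look out for" easier to review, here is a minimal sketch of the pattern, not the actual code in this MR. All class names ending in `Sketch` and the `fed_class_factory` key are hypothetical stand-ins invented for illustration. It only shows the idea that overriding `set_dlp` both stores the plan and, as a side effect, builds the wrapped dataset.

```python
from typing import Any, Optional


class DataLoadingPlanSketch(dict):
    """Hypothetical stand-in for a DataLoadingPlan: a plain mapping of settings."""


class DatasetWithDlpSketch:
    """Hypothetical base class: set_dlp only stores the plan."""

    def __init__(self) -> None:
        self._dlp: Optional[DataLoadingPlanSketch] = None

    def set_dlp(self, dlp: DataLoadingPlanSketch) -> None:
        self._dlp = dlp


class FlambyDatasetSketch(DatasetWithDlpSketch):
    """Hypothetical dataset whose wrapped object exists only once a plan is set."""

    def __init__(self) -> None:
        super().__init__()
        # Stands in for the FLamby FedClass instance; unknown until the plan arrives.
        self._wrapped: Optional[Any] = None

    def set_dlp(self, dlp: DataLoadingPlanSketch) -> None:
        super().set_dlp(dlp)
        # Side effect: the plan says which dataset to build, so build it right away.
        # "fed_class_factory" is an assumed key, used here for illustration only.
        self._wrapped = dlp["fed_class_factory"]()

    def _require_wrapped(self) -> Any:
        if self._wrapped is None:
            raise RuntimeError("call set_dlp() before using the dataset")
        return self._wrapped

    def __len__(self) -> int:
        return len(self._require_wrapped())

    def __getitem__(self, index: int) -> Any:
        return self._require_wrapped()[index]


if __name__ == "__main__":
    dataset = FlambyDatasetSketch()
    # A list stands in for the real wrapped dataset.
    plan = DataLoadingPlanSketch(fed_class_factory=lambda: [10, 20, 30])
    dataset.set_dlp(plan)
    print(len(dataset), dataset[1])
```

The trade-off mentioned above is visible here: the dataset is unusable until `set_dlp` has been called, and the call does more than its name suggests.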