FLamby integration

AYED Samy-Safwan requested to merge poc/flamby into develop

Merge request regarding FLamby integration into Fed-BioMed.

This MR introduces the following changes:

  • The FLamby library is now installed by default with Fed-BioMed
  • FlambyDataset makes FLamby datasets usable in Fed-BioMed
  • DataLoadingPlan and DataLoadingBlocks infrastructure to accompany a FlambyDataset
  • updated CLI to support adding FLamby datasets
  • one additional notebook and unit tests
  • documentation to be reviewed in fedbiomed.gitlabpages.inria.fr!59

Things to look out for:

  • The lifecycles of FlambyDataset and the DataLoadingPlan are tightly linked. To enforce this link I chose to override set_dlp in FlambyDataset, which means that set_dlp for a FlambyDataset also has the important side effect of initializing the FLamby FedClass. It was the simplest solution I could think of (both in terms of implementation and user-friendliness), but it's not very elegant.
  • Contrary to my original plan, I dropped the idea of a FlambyTrainingPlan. I think it's best to ask the researcher to write a training_data function, just as they do with every other training plan. All of the FLamby logic is contained in the FlambyDataset class.
  • FLamby is installed automatically, but no datasets are included. The researcher must manually navigate to the datasets folder of FLamby and run a Python script to download them. This is a limitation of FLamby, which does not allow automating the process (presumably because it also involves accepting the license).
  • FLamby datasets may have their own dependencies. I did not want to clutter our conda environments with potentially unused packages, so I leave it up to the researcher to install the additional dependencies for the specific FLamby datasets they wish to use. It's not very elegant, but at least it doesn't force us to add roughly 10 more packages to our conda envs just because one researcher may use some dataset.
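
For reviewers, the set_dlp side effect described in the first point above can be pictured with the minimal sketch below. It is not the code in this MR: `BaseDataset`, `FlambyLikeDataset`, the `center_id` key and the `fed_class_factory` argument are all hypothetical stand-ins, and the real Fed-BioMed and FLamby classes are deliberately not imported. It only shows the pattern of an overridden setter that also builds the wrapped dataset.

```python
from typing import Any, Callable, Optional


class DataLoadingPlan(dict):
    """Hypothetical stand-in for a data loading plan: a plain mapping of settings."""


class BaseDataset:
    """Hypothetical base class whose set_dlp only stores the plan."""

    def __init__(self) -> None:
        self._dlp: Optional[DataLoadingPlan] = None

    def set_dlp(self, dlp: DataLoadingPlan) -> None:
        self._dlp = dlp


class FlambyLikeDataset(BaseDataset):
    """Hypothetical dataset that cannot exist without a data loading plan."""

    def __init__(self, fed_class_factory: Callable[..., Any]) -> None:
        super().__init__()
        # Assumption: the factory builds the wrapped per-center dataset.
        self._fed_class_factory = fed_class_factory
        self._fed_class: Optional[Any] = None

    def set_dlp(self, dlp: DataLoadingPlan) -> None:
        super().set_dlp(dlp)
        # Side effect: setting the plan also initializes the wrapped dataset,
        # so the two objects can never get out of sync.
        self._fed_class = self._fed_class_factory(center=dlp["center_id"])

    def __len__(self) -> int:
        if self._fed_class is None:
            raise RuntimeError("set_dlp must be called before using the dataset")
        return len(self._fed_class)
```

With this shape, a caller that forgets to set the plan gets an explicit error rather than a half-initialized dataset, which is the trade-off mentioned above: simple to use, at the cost of a setter that does more than its name suggests.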
Edited by CREMONESI Francesco
