# Introduction
Standard machine learning approaches require a centralized dataset to train a model. In certain scenarios, such as the biomedical field, this is not straightforward for several reasons:
* Privacy concerns:
  * General Data Protection Regulation (GDPR): [General Data Protection Regulation (GDPR) – Official Legal Text](https://gdpr-info.eu/)
  * California Consumer Privacy Act (CCPA): [California Consumer Privacy Act (CCPA) | State of California - Department of Justice - Office of the Attorney General](https://oag.ca.gov/privacy/ccpa)
* Ethical committee approval
* Transferring data to a centralized location

These obstacles slow down research in healthcare and limit the generalizability of models.
## Federated Learning
Federated learning (FL) is a machine learning paradigm for training a model without centralizing the data. Its aim is to train higher-quality models by accessing more data than a centralized approach could gather, while keeping the data securely decentralized.
### Infrastructure of a federated learning setting in healthcare
A common federated learning scenario in healthcare is shown below:
![](./fl-graph.png)
Hospitals (a.k.a. clients) across several geographical locations hold data of interest to a researcher. These data can be "made available" for local training, but only the trained model is authorized to be shared with a trusted third party (e.g., a research center). Once all the models are gathered, they are **aggregated** into a single global model; several techniques have been proposed for this step. In iterative schemes, the global model is then sent back to the clients for another round of local training. The final **aggregated model** can be used for the intended task (e.g., as a neural network for segmentation).
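The workflow above can be illustrated with a toy simulation. This is a minimal sketch, assuming a linear least-squares model trained by plain gradient descent on synthetic data; the `Hospital` class and the `local_train` and `aggregate` functions are hypothetical names, and the unweighted mean in `aggregate` is only a placeholder for the aggregation schemes discussed in the next section.

```python
import numpy as np


class Hospital:
    """Hypothetical client: holds private data and only ever returns model parameters."""

    def __init__(self, X, y):
        self._X = X  # private features, never sent to the research center
        self._y = y  # private targets, never sent to the research center

    def local_train(self, w, lr=0.1, epochs=50):
        """Gradient descent on a least-squares loss, starting from the global weights w."""
        w = w.copy()
        n = len(self._y)
        for _ in range(epochs):
            grad = self._X.T @ (self._X @ w - self._y) / n
            w -= lr * grad
        return w  # only the parameters leave the hospital


def aggregate(client_weights):
    """Placeholder aggregation: unweighted mean of the client parameters."""
    return np.mean(client_weights, axis=0)


# Synthetic data for three hospitals of different sizes (assumption for illustration).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
hospitals = []
for n_k in (30, 50, 80):
    X = rng.normal(size=(n_k, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n_k)
    hospitals.append(Hospital(X, y))

# Several communication rounds: local training, then aggregation by the trusted party.
global_w = np.zeros(2)
for _ in range(5):
    updates = [h.local_train(global_w) for h in hospitals]
    global_w = aggregate(updates)

print(global_w)  # close to true_w, although no raw data was ever pooled
```

The design keeps the data as private attributes of each client: the only value crossing the client boundary is the parameter vector returned by `local_train`.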
### Theoretical background
A critical question in FL is how to aggregate the models submitted by the clients. The main problem lies in finding the best set of **parameters** for the global model as a function of the clients' submissions.
In canonical form:
$$
\min_w F(w), \quad \textrm{where} \quad F(w) := \sum_{k=1}^{m} p_k F_k(w)
$$
where $m$ is the total number of clients, $p_k \geq 0$ with $\sum_k p_k = 1$, and $F_k$ is the local objective function of the $k$-th client. The weight $p_k$ determines the impact (contribution) of each client on the global model.
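To see how the weights $p_k$ shape the solution, consider a toy case where each local objective is a quadratic $F_k(w) = \frac{1}{2}\lVert w - c_k \rVert^2$, so that client $k$ "prefers" the parameter vector $c_k$. Then $\nabla F(w) = \sum_k p_k (w - c_k) = w - \sum_k p_k c_k$, and the global minimizer is the weighted mean $\sum_k p_k c_k$. The centers `centers` and weights `p` below are made-up values for illustration only; this is a minimal sketch, not part of the course material.

```python
import numpy as np

# Hypothetical local objectives F_k(w) = 0.5 * ||w - c_k||^2, one row of `centers` per client.
centers = np.array([[0.0, 0.0],
                    [4.0, 0.0],
                    [0.0, 4.0]])
p = np.array([0.5, 0.25, 0.25])  # client weights: p_k >= 0 and sum(p) == 1


def global_objective(w):
    """F(w) = sum_k p_k F_k(w)."""
    local = 0.5 * np.sum((w - centers) ** 2, axis=1)  # F_k(w) for every client k
    return float(p @ local)


def global_gradient(w):
    """grad F(w) = sum_k p_k (w - c_k)."""
    return p @ (w - centers)


# Plain gradient descent on the global objective from an arbitrary starting point.
w = np.array([10.0, -10.0])
for _ in range(200):
    w -= 0.1 * global_gradient(w)

print(w)                    # approaches the weighted mean p @ centers
print(global_objective(w))
```

Changing `p` moves the minimizer towards the clients with larger weights, which is exactly the "impact" of each client described above.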
One of the first methods proposed in FL for model aggregation was **Federated Averaging (`FedAvg`)** (McMahan _et al._, 2016). The idea behind it is to define the contribution of each client as $p_k=\frac{n_k}{n}$, where $n_k$ is the number of data points held by client $k$ and $n=\sum_k n_k$ is the total number of observations across all clients.
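The FedAvg aggregation step, i.e. the weighted average $\sum_k \frac{n_k}{n} w_k$ of the client parameters $w_k$, can be sketched in a few lines of NumPy. This is a minimal sketch of the server-side averaging only: in FedAvg, each client first runs several local optimization steps before sending its parameters. The function name `fedavg` and the toy parameter values are hypothetical.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """FedAvg aggregation: weighted mean of client parameters with p_k = n_k / n."""
    sizes = np.asarray(client_sizes, dtype=float)
    p = sizes / sizes.sum()            # p_k = n_k / n, so p_k >= 0 and sum(p) == 1
    stacked = np.stack(client_params)  # shape (m, d): one row per client
    return p @ stacked                 # sum_k p_k * w_k


# Toy example: three clients holding 100, 300 and 600 data points.
params = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 300, 600]
print(fedavg(params, sizes))
```

With weights $p = (0.1, 0.3, 0.6)$, the client with the most data pulls the global model towards its own parameters; with equal dataset sizes, FedAvg reduces to a plain average.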
### Challenges in federated learning
The main challenges in FL are associated with:
- **Communication efficiency:** the number of communication rounds between the clients and the central location needed to train an optimal model.
- **Data heterogeneity:** how to build generalized models with heterogeneous data?
- **Security:** adversarial attacks and data leakage.
## Links
- [Presentation material](https://ecaad164-c957-4008-a451-5e1098ff8953.filesusr.com/ugd/68a50d_a3d074241b3a4342be2fef2413ee61c7.pdf)
- [Colab notebook - part 1](https://colab.research.google.com/drive/1_uemRwNuok1wop6wP2Aiokn0KQgcwfr1?usp=sharing)
- [Colab notebook - part 2](https://colab.research.google.com/drive/1PiUee4n8T7pIhDV5zDEqhsK5jXvDYHpO?usp=sharing)
- [Colab notebook - part 3](https://colab.research.google.com/drive/1kIbrUtNH_WIPQX5vLyzRjs5CTgKA2CMT?usp=sharing)
- [Colab notebook - part 4](https://colab.research.google.com/drive/10wEN9eqdE9Z7CtvhRFgsL3gAzunZGlee?usp=sharing)
---
## References
1. **Konečný, J., McMahan, H. B., et al. (2016).** *Federated learning: Strategies for improving communication efficiency*. arXiv preprint arXiv:1610.05492.
2. **Li, T., Sahu, A. K., et al. (2018).** *Federated optimization in heterogeneous networks*. arXiv preprint arXiv:1812.06127.
3. **Li, T., Sahu, A. K., Talwalkar, A., & Smith, V. (2020).** *Federated learning: Challenges, methods, and future directions*. IEEE Signal Processing Magazine, 37(3), 50-60.
# Introduction
This lecture covers the statistical background required to perform association analysis in typical studies of heterogeneous information. We will introduce the notion of statistical association and highlight the standard analysis paradigm in univariate modeling. We will then explore multivariate association models, which generalize the notion of statistical association to high-dimensional data. In particular, we will focus on standard paradigms such as Canonical Correlation Analysis (CCA), Partial Least Squares (PLS), and Reduced Rank Regression (RRR). Finally, we will introduce more advanced analysis frameworks, such as Bayesian and deep association methods. Within this context, we will present the Multi-Channel Variational Autoencoder, recently developed by our group.
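As an illustration of multivariate association, the first canonical correlation between two blocks of variables $X$ and $Y$ can be computed as the largest singular value of the whitened cross-covariance $\Sigma_{xx}^{-1/2}\Sigma_{xy}\Sigma_{yy}^{-1/2}$. The following is a minimal NumPy sketch under that formulation, on synthetic data sharing one latent factor `z`; the helper names `inv_sqrt` and `cca` are hypothetical and not taken from the lecture notebook (scikit-learn also provides an iterative estimator, `sklearn.cross_decomposition.CCA`).

```python
import numpy as np


def inv_sqrt(S, eps=1e-8):
    """Inverse square root of a symmetric positive-definite matrix (eps added for stability)."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T


def cca(X, Y, n_components=1):
    """CCA via SVD of the whitened cross-covariance matrix."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    a = Wx @ U[:, :n_components]     # canonical weights for X
    b = Wy @ Vt.T[:, :n_components]  # canonical weights for Y
    return s[:n_components], a, b    # canonical correlations and weights


# Synthetic "imaging" (X) and "clinical" (Y) blocks linked by one latent factor z.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([rng.normal(size=n), -z + 0.5 * rng.normal(size=n), rng.normal(size=n)])

rho, a, b = cca(X, Y)
print(rho)  # close to 0.8, the correlation induced by the shared factor
```

The projections `X @ a` and `Y @ b` are the canonical variates, and their sample correlation equals the returned canonical correlation.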
## Links
- [Presentation material](https://marcolorenzi.github.io/material/AI4Health_winter_school_part1.pdf).
- [Colab Notebook](https://colab.research.google.com/drive/1GifcqjQ0OB8JdrnooWZ137nmuxAE4T-z?usp=sharing).
- [The (hitchhiker's) guide to Imaging-Genetics](https://marcolorenzi.github.io/material/winter_school/Imaging_Genetics_Book_Chapter.pdf).
This chapter introduces the basics of statistical association models for heterogeneous high-dimensional data, with a specific focus on data analysis in imaging-genetics.
# Handling heterogeneity in the analysis of biomedical information
## 2021 AI4Health practical session
# Fed-BioMed, an open source framework for federated learning in real world healthcare applications
## 2023 AI4Health practical session
This session focuses on the problem of statistical analysis of heterogeneous data in biomedical studies. Through guided examples, we will first introduce the basics of latent variable modelling for the joint analysis of heterogeneous data types (such as imaging, clinical or biological measurements). We will initially focus on linear approaches, such as partial least squares and canonical correlation analysis. We will then present more flexible methods based on recent advances in deep learning and stochastic variational inference, such as the multi-channel variational autoencoder. We will finally address the problem of deploying latent variable models for federated learning in multi-centric studies, where models must account for data-privacy and heterogeneity across datasets.
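Unlike CCA, which maximizes the correlation between latent projections, partial least squares maximizes their covariance. In its simplest SVD form (often called PLS-SVD), the first pair of weights are the leading singular vectors of the cross-covariance matrix $X^\top Y$. The following is a minimal NumPy sketch under that assumption, on synthetic data; `pls_svd` is a hypothetical helper, not code from the session material, and iterative PLS variants (e.g. scikit-learn's `PLSCanonical`) additionally deflate the data between components.

```python
import numpy as np


def pls_svd(X, Y, n_components=1):
    """PLS-SVD: unit-norm weights u, v maximizing cov(X @ u, Y @ v)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    u = U[:, :n_components]     # weights for X (unit norm)
    v = Vt.T[:, :n_components]  # weights for Y (unit norm)
    cov = s[:n_components] / (X.shape[0] - 1)  # covariance of each latent pair
    return u, v, cov


# Synthetic blocks linked by a latent factor z, with a high-variance link on X's first column.
rng = np.random.default_rng(1)
n = 300
z = rng.normal(size=n)
X = np.column_stack([3 * z + rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])

u, v, cov = pls_svd(X, Y)
print(u[:, 0], v[:, 0], cov)
```

Because PLS targets covariance, it is sensitive to the scale of each variable, whereas CCA is scale-invariant; in practice the variables are often standardized before applying PLS.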
This practical session focuses on federated learning (FL) for healthcare applications and is based on Fed-BioMed, an open source framework for deploying FL in real-world use cases. Throughout the session, participants will be introduced to the basics of federated learning and will learn to deploy federated training across a network of clients using the Fed-BioMed software components. We will focus on federating general machine learning approaches for the analysis of medical data (such as tabular data or medical images), using a variety of AI frameworks, from PyTorch to scikit-learn. More advanced topics include the use of privacy-preserving techniques in FL and the definition of custom data types, models and optimisation routines.
## Material usage
## Program
Here you will find the material developed during the practical session. Some of the material consists of text and images that you can download from the upper right corner <i class="fas fa-download"></i>
The workshop lasts 6 hours, broken into 4 slots of 1.5 hours each.
The program of the workshop is:
- Introduction to FL and its importance in medical research ([slides](/fedbiomed-tutorial/slides))
- Fed-BioMed introduction and MedNIST tutorial ([notebook](/fedbiomed-tutorial/intro-tutorial-mednist))
- Hands-on exercise: detecting heart disease from tabular data ([notebook](/fedbiomed-tutorial/tutorial-sklearn-problem))
- Hands-on exercise: segmentation of brain MRI images ([notebook](/fedbiomed-tutorial/brain-segmentation-exercise))
## Using Fed-BioMed during the workshop
## Launch my notebooks
We provide a ready-to-use JupyterHub server. Follow the [instructions](/fedbiomed-tutorial/aws-instructions) to find out how to connect.
You can launch your own environment by clicking here: [![Binder](https://mybinder.org/badge_logo.svg)](http://bit.ly/3iahdfl)
## Community
Keep up to date and ask support questions through our [mailing list](mailto:fedbiomed-support@inria.fr) and our [user Discord channel](https://discord.gg/SWUb7QAS).
**We welcome new contributors!** Check out our [repo](https://github.com/fedbiomed/fedbiomed) if you are interested.
**We are looking for new collaborations!** Share your research ideas through our [mailing list](mailto:fedbiomed-support@inria.fr) or get in touch with [Francesco](mailto:francesco.cremonesi@inria.fr) directly.
sphinxcontrib-bibtex==1.0.0
jupyter-book==0.8.3
sphinxcontrib-bibtex==2.5.0
myst-parser>=0.17.0,<1.0.0
jupyter-book==0.15.1
pandas==1.1.0
seaborn==0.11.0
matplotlib==3.3.1
scipy==1.5.2
torch==1.7.1
torchvision==0.8.2
tqdm==4.50.2
scikit-learn==0.24.0