%% Cell type:markdown id: tags:
This notebook is meant to be run in Google Colab. You can import your local copy of the file from the [colab welcome page](https://colab.research.google.com/).
%% Cell type:markdown id: tags:
# Setting up declearn
%% Cell type:markdown id: tags:
We first clone the repository, to get both the package itself and the `examples` folder we will use in this tutorial, then navigate to the package directory, and finally install the required dependencies.
%% Cell type:code id: tags:
``` python
# You may want to specify a release branch or tag.
!git clone https://gitlab.inria.fr/magnet/declearn/declearn2
```
%% Output
Cloning into 'declearn2'...
warning: redirecting to https://gitlab.inria.fr/magnet/declearn/declearn2.git/
remote: Enumerating objects: 4997, done.
remote: Counting objects: 100% (79/79), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 4997 (delta 39), reused 0 (delta 0), pack-reused 4918
Receiving objects: 100% (4997/4997), 1.15 MiB | 777.00 KiB/s, done.
Resolving deltas: 100% (3248/3248), done.
%% Cell type:code id: tags:
``` python
cd declearn2
```
%% Output
/content/declearn2
%% Cell type:code id: tags:
``` python
# Install the package, with the TensorFlow and Websockets extra dependencies.
# You may want to work in a dedicated virtual environment.
!pip install .[tensorflow,websockets]
```
%% Output
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Processing /content/declearn2
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: cryptography>=35.0 in /usr/local/lib/python3.9/dist-packages (from declearn==2.1.0) (40.0.1)
Requirement already satisfied: scikit-learn>=1.0 in /usr/local/lib/python3.9/dist-packages (from declearn==2.1.0) (1.2.2)
Requirement already satisfied: requests~=2.18 in /usr/local/lib/python3.9/dist-packages (from declearn==2.1.0) (2.27.1)
Requirement already satisfied: pandas>=1.2 in /usr/local/lib/python3.9/dist-packages (from declearn==2.1.0) (1.5.3)
Requirement already satisfied: tomli>=2.0 in /usr/local/lib/python3.9/dist-packages (from declearn==2.1.0) (2.0.1)
Collecting fire>=0.4
Downloading fire-0.5.0.tar.gz (88 kB)
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.3/88.3 kB 1.6 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: typing-extensions>=4.0 in /usr/local/lib/python3.9/dist-packages (from declearn==2.1.0) (4.5.0)
Collecting websockets~=10.1
Downloading websockets-10.4-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (106 kB)
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 106.5/106.5 kB 3.5 MB/s eta 0:00:00
Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.9/dist-packages (from cryptography>=35.0->declearn==2.1.0) (1.15.1)
Requirement already satisfied: six in /usr/local/lib/python3.9/dist-packages (from fire>=0.4->declearn==2.1.0) (1.16.0)
Requirement already satisfied: termcolor in /usr/local/lib/python3.9/dist-packages (from fire>=0.4->declearn==2.1.0) (2.2.0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.9/dist-packages (from pandas>=1.2->declearn==2.1.0) (2022.7.1)
Requirement already satisfied: numpy>=1.20.3 in /usr/local/lib/python3.9/dist-packages (from pandas>=1.2->declearn==2.1.0) (1.22.4)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.9/dist-packages (from pandas>=1.2->declearn==2.1.0) (2.8.2)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests~=2.18->declearn==2.1.0) (2.0.12)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests~=2.18->declearn==2.1.0) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests~=2.18->declearn==2.1.0) (2022.12.7)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests~=2.18->declearn==2.1.0) (3.4)
Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.9/dist-packages (from scikit-learn>=1.0->declearn==2.1.0) (1.10.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.9/dist-packages (from scikit-learn>=1.0->declearn==2.1.0) (3.1.0)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.9/dist-packages (from scikit-learn>=1.0->declearn==2.1.0) (1.2.0)
Requirement already satisfied: pycparser in /usr/local/lib/python3.9/dist-packages (from cffi>=1.12->cryptography>=35.0->declearn==2.1.0) (2.21)
Building wheels for collected packages: fire, declearn
Building wheel for fire (setup.py) ... done
Created wheel for fire: filename=fire-0.5.0-py2.py3-none-any.whl size=116952 sha256=ab01943c400d3267450974ec56a6572193bed40710845edd44623e56c7757799
Stored in directory: /root/.cache/pip/wheels/f7/f1/89/b9ea2bf8f80ec027a88fef1d354b3816b4d3d29530988972f6
Building wheel for declearn (pyproject.toml) ... done
Created wheel for declearn: filename=declearn-2.1.0-py3-none-any.whl size=276123 sha256=4969a91ded8b704c8c9497bcda8f514f847c49098715d659cc8e96a947ec887f
Stored in directory: /tmp/pip-ephem-wheel-cache-fgkx9jiw/wheels/cc/79/79/6586306a117d40a1f8b251a22e50583b8abb2d7e855a62ecf7
Successfully built fire declearn
Installing collected packages: websockets, fire, declearn
Successfully installed declearn-2.1.0 fire-0.5.0 websockets-10.4
%% Cell type:markdown id: tags:
# Running your first experiment
%% Cell type:markdown id: tags:
We are going to train a common model between three simulated clients on the classic [MNIST dataset](http://yann.lecun.com/exdb/mnist/). The input of the model is a set of images of handwritten digits, and the model needs to determine which number between 0 and 9 each image corresponds to.
%% Cell type:markdown id: tags:
## The model
To do this, we will use a simple CNN, defined in `examples/mnist_quickrun/model.py`.
%% Cell type:code id: tags:
``` python
from examples.mnist_quickrun.model import network
network.summary()
```
%% Output
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d (Conv2D)             (None, 26, 26, 8)         80
 max_pooling2d (MaxPooling2D (None, 13, 13, 8)         0
 )
 dropout (Dropout)           (None, 13, 13, 8)         0
 flatten (Flatten)           (None, 1352)              0
 dense (Dense)               (None, 64)                86592
 dropout_1 (Dropout)         (None, 64)                0
 dense_1 (Dense)             (None, 10)                650
=================================================================
Total params: 87,322
Trainable params: 87,322
Non-trainable params: 0
_________________________________________________________________
/content/declearn2/declearn/model/tensorflow/utils/_gpu.py:66: UserWarning: Cannot use a GPU device: either CUDA is unavailable or no GPU is visible to tensorflow.
warnings.warn(
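%% Cell type:markdown id: tags:
Based on the summary above, the network stacks one small convolution block and two dense layers. The following Keras sketch reproduces the listed shapes and parameter counts; the activation functions and dropout rates are assumptions, so refer to `examples/mnist_quickrun/model.py` for the actual definition.
%% Cell type:code id: tags:
``` python
import tensorflow as tf

# A sketch matching the parameter counts printed above; activations and
# dropout rates are assumptions, not the contents of model.py.
sketch = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),  # 8*(3*3*1+1) = 80 params
    tf.keras.layers.MaxPooling2D(2),                  # 26x26 -> 13x13
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),                        # 13*13*8 = 1352 units
    tf.keras.layers.Dense(64, activation="relu"),     # 1352*64+64 = 86592 params
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),  # 64*10+10 = 650 params
])
```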
%% Cell type:markdown id: tags:
## The data
We start by splitting the MNIST dataset between 3 clients and storing the output in the `examples/mnist_quickrun` folder. For this we use an experimental utility provided by `declearn`.
%% Cell type:code id: tags:
``` python
from declearn.dataset import split_data

split_data(folder="examples/mnist_quickrun")
```
%% Output
Downloading MNIST source file train-images-idx3-ubyte.gz.
Downloading MNIST source file train-labels-idx1-ubyte.gz.
Splitting data into 3 shards using the 'iid' scheme.
%% Cell type:markdown id: tags:
The Python code above is equivalent to running `declearn-split examples/mnist_quickrun/` from a shell command line.
%% Cell type:markdown id: tags:
Here is what the first image of the first client looks like:
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import numpy as np

images = np.load("examples/mnist_quickrun/data_iid/client_0/train_data.npy")
sample_img = images[0]
sample_fig = plt.imshow(sample_img, cmap='Greys')
```
%% Output
%% Cell type:markdown id: tags:
For more information on how the `split_data` function works, you can look at the documentation.
%% Cell type:code id: tags:
``` python
print(split_data.__doc__)
```
%% Output
Randomly split a dataset into shards.

The resulting folder structure is:
    folder/
    └─── data*/
        └─── client*/
        │      train_data.* - training data
        │      train_target.* - training labels
        │      valid_data.* - validation data
        │      valid_target.* - validation labels
        └─── client*/
        │      ...

Parameters
----------
folder: str, default = "."
    Path to the folder where to add a data folder
    holding output shard-wise files.
data_file: str or None, default=None
    Optional path to a folder where to find the data.
    If None, default to the MNIST example.
target_file: str or int or None, default=None
    If str, path to the labels file to import, or name of a `data`
    column to use as labels (only if `data` points to a csv file).
    If int, index of a `data` column to use as labels.
    Required if data is not None, ignored if data is None.
n_shards: int
    Number of shards between which to split the data.
scheme: {"iid", "labels", "biased"}, default="iid"
    Splitting scheme(s) to use. In all cases, shards contain mutually-
    exclusive samples and cover the full raw training data.
    - If "iid", split the dataset through iid random sampling.
    - If "labels", split into shards that hold all samples associated
      with mutually-exclusive target classes.
    - If "biased", split the dataset through random sampling according
      to a shard-specific random labels distribution.
perc_train: float, default=0.8
    Train/validation split in each client dataset, must be in the
    ]0,1] range.
seed: int or None, default=None
    Optional seed to the RNG used for all sampling operations.
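%% Cell type:markdown id: tags:
For instance, based on the parameters documented above, one could produce a reproducible five-client iid split (a sketch; it only uses arguments from the docstring):
%% Cell type:code id: tags:
``` python
# Sketch: five iid shards with a fixed seed, so the split is reproducible.
split_data(folder="examples/mnist_quickrun", n_shards=5, scheme="iid", seed=42)
```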
%% Cell type:markdown id: tags:
## Quickrun
We can now run our experiment. As explained in section 2.1 of the [quickstart documentation](https://magnet.gitlabpages.inria.fr/declearn/docs/latest/quickstart), using the `declearn-quickrun` entry point requires a configuration file, some data, and a model:
* A TOML file, to store your experiment configurations. Here: `examples/mnist_quickrun/config.toml`.
* A folder with your data, split by client. Here: `examples/mnist_quickrun/data_iid`.
* A Python model file, declaring your model wrapped in a `declearn` object. Here: `examples/mnist_quickrun/model.py`.

We then only have to run the `quickrun` util with the path to the TOML file:
%% Cell type:code id: tags:
``` python
from declearn.quickrun import quickrun

quickrun(config="examples/mnist_quickrun/config.toml")
```
%% Cell type:markdown id: tags:
The Python code above is equivalent to running `declearn-quickrun examples/mnist_quickrun/config.toml` from a shell command line.
%% Cell type:markdown id: tags:
The output obtained is the combination of the CLI output of our server and our clients, going through:
* `INFO:Server:Starting clients registration process.`: a first registration step, where clients register with the server.
* `INFO:Server:Sending initialization requests to clients.`: the initialization of the objects needed for training, on both the server and client sides.
* `Server:INFO: Initiating training round 1`: the training starts; at each round, every client makes its local update(s) and sends the result to the server, which aggregates them (see the note below).
* `INFO: Initiating evaluation round 1`: the model is evaluated at each round.
* `Server:INFO: Stopping training`: the training is finalized.
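%% Cell type:markdown id: tags:
As a note on the aggregation step: in federated averaging schemes, the server typically combines the updates $\Delta_k$ of $K$ clients into $\Delta = \sum_{k=1}^{K} w_k \Delta_k$, with weights $w_k$ summing to one (for instance proportional to client sample counts). This is a generic sketch of the idea, not an exact account of `declearn`'s configurable aggregators.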
%% Cell type:markdown id: tags:
## Results
You can have a look at the results in the `examples/mnist_quickrun/result_*` folder, including the evolution of metrics during training.
%% Cell type:code id: tags:
``` python
import glob
import os

import pandas as pd

res_file = glob.glob('examples/mnist_quickrun/result*')
res = pd.read_csv(os.path.join(res_file[0], 'server/metrics.csv'))
res_fig = res.plot()
```
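%% Cell type:markdown id: tags:
For instance, to look at the final value of each recorded metric (a minimal sketch, reusing the `res` dataframe loaded above):
%% Cell type:code id: tags:
``` python
# Print the last row of the metrics dataframe, i.e. the final round's values.
print(res.tail(1))
```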
%% Cell type:markdown id: tags:
# Experiment further
You can change the TOML config file to experiment with different strategies.
%% Cell type:markdown id: tags:
For instance, try splitting the data in a very heterogeneous way, by distributing digits between clients in a mutually exclusive way.
%% Cell type:code id: tags:
``` python
split_data(folder="examples/mnist_quickrun", scheme="labels")
```
%% Cell type:markdown id: tags:
And change the `examples/mnist_quickrun/config.toml` file with:
```
[data]
data_folder = "examples/mnist_quickrun/data_labels"
```
%% Cell type:markdown id: tags:
If you run the model as is, you should see a drop in performance.
%% Cell type:code id: tags:
``` python
quickrun(config="examples/mnist_quickrun/config.toml")
```
%% Cell type:markdown id: tags:
Now try modifying the `examples/mnist_quickrun/config.toml` file like this, to implement the [scaffold algorithm](https://arxiv.org/abs/1910.06378), and run the experiment again.
```
[optim]
[optim.client_opt]
lrate = 0.005
modules = ["scaffold-client"]

[optim.server_opt]
lrate = 1.0
modules = ["scaffold-server"]
```
%% Cell type:code id: tags:
``` python
quickrun(config="examples/mnist_quickrun/config.toml")
```
# Demo training task: MNIST in Quickrun Mode
## Overview
**We are going to use the `declearn-quickrun` tool to easily run a simulated
federated learning experiment on the classic
[MNIST dataset](http://yann.lecun.com/exdb/mnist/)**. The input of the model
is a set of images of handwritten digits, and the model needs to determine to
which digit between $0$ and $9$ each image corresponds.
## Setup
A Jupyter Notebook tutorial is provided, which you may import and run on
Google Colab so as to avoid having to set up a local python environment.
Alternatively, you may run the notebook on your personal computer, or follow
its instructions to install declearn and operate the quickrun tools directly
from a shell command line.
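
For reference, the shell commands used throughout the notebook boil down to
the following (you may want to specify a release branch or tag when cloning,
and to work in a dedicated virtual environment):
```
git clone https://gitlab.inria.fr/magnet/declearn/declearn2
cd declearn2
pip install .[tensorflow,websockets]
declearn-split examples/mnist_quickrun/
declearn-quickrun examples/mnist_quickrun/config.toml
```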
## Contents
This example's folder is structured the following way:
```
mnist/
│   config.toml - configuration file for the quickrun FL experiment
│   mnist.ipynb - tutorial for this example, as a jupyter notebook
│   model.py    - python file declaring the model to be trained
└─── data_iid   - mnist data generated with `declearn-split`
└─── results_*  - results generated after running `declearn-quickrun`
```