Demo of brain segmentation model using MedicalFolder dataset
Notebooks showcasing a brain segmentation model (UNet) and the MedicalFolder dataset
The main purpose of this MR is to provide two notebooks with a complex use case: training a brain segmentation model based on the UNet network, with data distributed over 3 centers, using the `MedicalFolderDataset` tools. To make these notebooks work, some changes were made to the Fed-BioMed library code, and additional utilities have also been provided.
The notebooks and related utility scripts have been put in the subfolder `notebooks/medical-image-segmentation`, and we also use the already-existing `notebooks/data` folder as a destination for downloading the data.
As a utility script, we provide `notebooks/medical-image-segmentation/download_and_split_ixi.py` to ease the process of downloading the data for these notebooks and splitting it into separate subfolders according to the center that provided it.
Testing the notebooks
We have two notebooks. The purpose of this duplication is only to show the tradeoff between asking node owners to install additional dependencies (i.e. the `unet` library) and asking the researcher to provide the model code themselves.
To test the notebooks, follow the instructions provided within the notebooks themselves in the folder `notebooks/medical-image-segmentation`.
First, download and split the data with:

```bash
download_and_split_ixi.py -f <path to Fed-BioMed root folder>
```
Then upload the datasets to three nodes (one per center):

```bash
./scripts/fedbiomed_run config <center name>.ini add
```
The data files for each center will be stored in the folder `UniCancer-Centers/<center name>/train/`, with the corresponding demographics file at `UniCancer-Centers/<center name>/train/participants.csv`.
Make sure to enter `ixi-train` as the tag.
Finally, start the three nodes (one per center):

```bash
./scripts/fedbiomed_run config <center name>.ini start
```
Then execute the notebook cells in the order provided.
Changes to Fed-BioMed library
To ease the review process, this section summarizes the main changes to the library that are included in this MR.
Meaning of `data` and `target` in the `TorchTrainingPlan`'s training routine

Before, we made the assumption that `data` and `target` were torch Tensors. To support complex datasets such as the `MedicalFolderDataset`, where the data contains both imaging and tabular modalities, `data` and `target` can now be:
- a torch Tensor
- any nesting of `list`, `tuple` or `dict`, as long as the innermost elements are torch Tensors
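For illustration, a batch for a multimodal dataset might look like the following sketch (the modality names and shapes here are made up, not taken from the actual `MedicalFolderDataset` output):

```python
import torch

# Hypothetical batch of 2 samples: imaging modalities grouped in a dict,
# demographics as a flat tensor, both wrapped in a tuple.
data = (
    {
        "T1": torch.rand(2, 1, 8, 8, 8),  # one image modality
        "T2": torch.rand(2, 1, 8, 8, 8),  # a second image modality
    },
    torch.rand(2, 3),  # tabular demographics features
)
# Targets may also be nested, e.g. a dict with a single segmentation mask.
target = {"label": torch.randint(0, 2, (2, 1, 8, 8, 8))}
```

This structure is valid under the new interpretation because every leaf of the nesting is a torch Tensor.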
As a consequence, we also changed the `send_to_device` function to recursively handle the arbitrary nesting of collections listed above.
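A minimal sketch of the recursive behaviour described above (the actual Fed-BioMed implementation may differ in signature and error handling):

```python
import torch


def send_to_device(data, device):
    """Recursively move tensors nested in lists, tuples and dicts to a device.

    Sketch only: dispatches on the container type and recurses until it
    reaches torch Tensors, which are moved with ``Tensor.to``.
    """
    if isinstance(data, torch.Tensor):
        return data.to(device)
    if isinstance(data, dict):
        return {key: send_to_device(value, device) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        # Preserve the container type (list stays list, tuple stays tuple).
        return type(data)(send_to_device(item, device) for item in data)
    raise TypeError(f"Unsupported type {type(data)} passed to send_to_device")
```

For example, `send_to_device(({"T1": img}, demo), device)` returns a tuple with the same nesting, where every tensor has been moved to `device`.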
Demographics transform in MedicalFolderDataset
The researcher can now specify a function to transform the loaded demographics data. Note that this function serves two purposes:
- allow the researcher to preprocess this data in the way that they like
- convert the demographics data to a torch Tensor, or to something that can be automatically converted to a torch Tensor via the `torch.as_tensor` function

Since the second purpose puts some programming burden on the researcher, an error message reminding them to convert the demographics data to a torch Tensor is output if the automatic conversion attempt fails.
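As an illustration, such a transform could look like the sketch below. The column names (`AGE`, `WEIGHT`) and the shape of the input record are hypothetical; the real demographics file will have its own schema:

```python
import torch


def demographics_transform(demographics: dict) -> torch.Tensor:
    """Illustrative demographics transform (field names are made up).

    Picks two numeric fields from the loaded demographics record and
    returns them as a float tensor, so the training routine receives
    a torch Tensor directly instead of raw tabular data.
    """
    # Drop free-text columns and keep only the numeric fields we need.
    age = float(demographics["AGE"])
    weight = float(demographics["WEIGHT"])
    return torch.tensor([age, weight], dtype=torch.float32)
```

Returning a tensor (or something `torch.as_tensor` can convert) satisfies the second purpose listed above.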
Added `log_interval` to the `Validator` scheme for `TrainingArgs`

Unit tests
- made sure that existing unit tests comply with the new interpretation of `data` and `target`
- added unit tests for `send_to_device` and the other functions that we changed
- some cosmetic fixes to blank lines (unsure whether the original blank lines were intentional)