Demo of brain segmentation model using MedicalFolder dataset
Notebooks showcasing a brain segmentation model (UNet) and the MedicalFolder dataset
The main purpose of this MR is to provide two notebooks with a complex use case: training a brain segmentation model based on the UNet network, with data distributed over 3 centers, using the `MedicalFolderDataset` tools. To make these notebooks work, some changes were made to the Fed-BioMed library code, and additional utilities have also been provided.
The notebooks and related utility scripts have been put in the subfolder `notebooks/medical-image-segmentation`, and we also use the already-existing `notebooks/data` folder as a destination for downloading the data.
As a utility script, we provide `notebooks/medical-image-segmentation/download_and_split_ixi.py` to ease the process of downloading the data for these notebooks and splitting it into separate subfolders according to the center that provided it.
Testing the notebooks
We have two notebooks. The purpose of this duplication is only to show the tradeoff between asking node owners to install additional dependencies (i.e. the `unet` library) and asking the researcher to provide the model code themselves.
To test the notebooks, follow the instructions provided within the notebooks themselves in the folder `notebooks/medical-image-segmentation`.
First, download and split the data with:

```bash
download_and_split_ixi.py -f <path to Fed-BioMed root folder>
```
Then upload the datasets to three nodes (one per center):

```bash
./scripts/fedbiomed_run config <center name>.ini add
```
The data files for each center will be stored in the folder `UniCancer-Centers/<center name>/train/`, with the corresponding demographics file at `UniCancer-Centers/<center name>/train/participants.csv`.
Make sure to enter `ixi-train` as the tag.
Finally, start the three nodes (one per center):

```bash
./scripts/fedbiomed_run config <center name>.ini start
```
Then execute the notebook cells in the order provided.
Changes to Fed-BioMed library
To ease the review process, this section summarizes the main changes to the library that are included in this MR.
Meaning of `data` and `target` in the `TorchTrainingPlan`'s training routine

Before, we made the assumption that `data` and `target` were torch Tensors. To support complex datasets such as the `MedicalFolderDataset`, where the data contains both imaging and tabular modalities, `data` and `target` can now be:
- a torch Tensor
- any nesting of `list`, `tuple` or `dict`, as long as the innermost elements are torch Tensors
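For illustration, a batch for a multimodal dataset might look like the following sketch (the modality names and shapes here are made up, not taken from the actual `MedicalFolderDataset` output):

```python
import torch

# Hypothetical batch of 2 samples: imaging modalities grouped in a dict,
# demographics as a flat tensor, both wrapped in a tuple.
data = (
    {
        "T1": torch.rand(2, 1, 8, 8, 8),  # one image modality
        "T2": torch.rand(2, 1, 8, 8, 8),  # a second image modality
    },
    torch.rand(2, 3),  # tabular demographics features
)
# Targets may also be nested, e.g. a dict with a single segmentation mask.
target = {"label": torch.randint(0, 2, (2, 1, 8, 8, 8))}
```

This structure is valid under the new interpretation because every leaf of the nesting is a torch Tensor.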
As a consequence, we also changed the `send_to_device` function to recursively handle the arbitrary nesting of collections listed above.
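A minimal sketch of the recursive behaviour described above (the actual Fed-BioMed implementation may differ in signature and error handling):

```python
import torch


def send_to_device(data, device):
    """Recursively move tensors nested in lists, tuples and dicts to a device.

    Sketch only: dispatches on the container type and recurses until it
    reaches torch Tensors, which are moved with ``Tensor.to``.
    """
    if isinstance(data, torch.Tensor):
        return data.to(device)
    if isinstance(data, dict):
        return {key: send_to_device(value, device) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        # Preserve the container type (list stays list, tuple stays tuple).
        return type(data)(send_to_device(item, device) for item in data)
    raise TypeError(f"Unsupported type {type(data)} passed to send_to_device")
```

For example, `send_to_device(({"T1": img}, demo), device)` returns a tuple with the same nesting, where every tensor has been moved to `device`.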
Demographics transform in MedicalFolderDataset
The researcher can now specify a function to transform the loaded demographics data. Note that this function serves two purposes:
- allow the researcher to preprocess this data in the way that they like
- convert the demographics data to a torch Tensor, or to something that can be automatically converted to a torch Tensor via the `torch.as_tensor` function

Since the second purpose puts some programming burden on the researcher, an error message reminding them to convert the demographics data to a torch Tensor is output if the automatic conversion attempt fails.
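As an illustration, such a transform could look like the sketch below. The column names (`AGE`, `WEIGHT`) and the shape of the input record are hypothetical; the real demographics file will have its own schema:

```python
import torch


def demographics_transform(demographics: dict) -> torch.Tensor:
    """Illustrative demographics transform (field names are made up).

    Picks two numeric fields from the loaded demographics record and
    returns them as a float tensor, so the training routine receives
    a torch Tensor directly instead of raw tabular data.
    """
    # Drop free-text columns and keep only the numeric fields we need.
    age = float(demographics["AGE"])
    weight = float(demographics["WEIGHT"])
    return torch.tensor([age, weight], dtype=torch.float32)
```

Returning a tensor (or something `torch.as_tensor` can convert) satisfies the second purpose listed above.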
Added `log_interval` to the `Validator` scheme for `TrainingArgs`

Unit tests
- made sure that existing unit tests comply with the new interpretation of `data` and `target`
- added unit tests for `send_to_device` and the other functions that we changed
- some cosmetic fixes to blank lines (unsure whether the original blank lines were intentional)