# MSMultiSpine Dataset Specification

From the challenge proposal:
> Depending on the center and context, any combination of existing MR sequences can be provided. In this challenge, which represents a concrete, complex case of multisequence datasets, we focus on four commonly used sequences: the sagittal T2 (always provided in the challenge and considered as the reference to segment), the sagittal STIR, the sagittal PSIR, and the 3D MP2RAGE. From a methodological point of view, this challenge is a concrete and paradigmatic case of the missing-modalities setting where, depending on the case, some modalities may be missing at inference or training time.

Thus, in a nutshell, pipelines will have access to different sequences depending on the case under study. Moreover, for each sequence, the pipeline will have access to the original nii.gz volume, a preprocessed version of the original volume, and a registered version of the preprocessed volume (see [the dedicated subsection](#original-data-preprocessing-and-registration)). A pipeline will be allowed to use any combination of the available volumes (including, e.g., only the sagittal T2 data, which will always be available).

The training dataset is downloadable via the [Shanoir interface](https://shanoir.irisa.fr/shanoir-ng/welcome). Pipelines will be submitted as Docker images (described [here](#pipeline-interface-dockerization-and-integration-to-vip)) and evaluated by the MS-Multi-Spine team on a dedicated dataset.

The sequence combinations provided in the training set, and those that will be evaluated during the test phase, are described in the following table:

| Subset | Sequence Combination | Training cases | Testing cases |
| ------ | -------------------- | -------------- | ------------- |
| 1      | (t2,stir)            | 50             | 40            |
| 2      | (t2,psir)            | 25             | 20            |
| 3      | (t2,mp2rage)         | 25             | 20            |
| 4      | (t2,stir,mp2rage)    | 0              | 20            |
|        | Overall              | 100            | 100           |

In the remainder of this section, we describe how the data have been generated and structured.

## Original data, preprocessing and registration

In addition to the original spinal cord (SC) acquisitions, two other versions of the data will be provided to the challengers:

* Preprocessed volumes: in a nutshell, for a given case, all raw volumes are sequentially i) reoriented to a sagittal orientation, ii) resampled into a single frame with a fine resolution (0.5 mm³), and iii) zeroed outside a square area of side 35 mm centered (slice-wise) on the spinal cord barycenter.
* Registered and preprocessed volumes: the preprocessed volumes after applying a rigid registration followed by a highly regularized non-linear registration.
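
For intuition, here is a minimal numpy sketch of step iii only. This is an illustrative re-implementation, not the organizers' actual code; the slice-axis convention, the `barycenters` input and the function name are assumptions:

```python
import numpy as np

def zero_outside_square(volume, barycenters, half_side_vox):
    """Zero every voxel outside a square window centered, slice-wise,
    on the spinal cord barycenter (preprocessing step iii).

    volume        -- 3D array; the first axis is assumed to index sagittal slices
    barycenters   -- per-slice (row, col) spinal cord barycenter coordinates
    half_side_vox -- half of the square side in voxels, e.g. 35 for a
                     35 mm side at 0.5 mm resolution
    """
    masked = np.zeros_like(volume)
    for s, (r, c) in enumerate(barycenters):
        r0, r1 = max(0, int(r) - half_side_vox), int(r) + half_side_vox
        c0, c1 = max(0, int(c) - half_side_vox), int(c) + half_side_vox
        masked[s, r0:r1, c0:c1] = volume[s, r0:r1, c0:c1]
    return masked
```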

## Data Structure

The training dataset is structured in the following manner:

```
MS_MultiSpine_dataset
├── rawdata                           # Original raw volumes and lesion masks
│   ├── sub-001
│   ├── ...
└── derivatives                       # Contains preprocessed versions of the volumes
    ├── preprocessed                  # Preprocessed volumes and lesion masks
    │   ├── sub-001
    │   ├── ...
    └── preprocessedAndRegistered     # Preprocessed and registered volumes and masks
        ├── sub-001
        ├── ...
```
Below are examples of filenames:
```
├── sub-001                           # Split 1 : Training, Subset 1 : T2 + STIR
│   ├── 11-001_T2.nii.gz
│   ├── 11-001_STIR.nii.gz
│   └── 11-001_LESIONMASK.nii.gz

├── ...

├── sub-075                           # Split 1 : Training, Subset 2 : T2 + PSIR
│   ├── 12-075_T2.nii.gz
│   ├── 12-075_PSIR.nii.gz
│   └── 12-075_LESIONMASK.nii.gz

├── ...

└── sub-100                           # Split 1 : Training, Subset 3 : T2 + MP2RAGE
    ├── 13-100_T2.nii.gz
    ├── 13-100_MP2RAGE.nii.gz
    └── 13-100_LESIONMASK.nii.gz
```
In practice, all volumes and masks are prefixed by `\d\d-<subject_id>` where:
* the first digit corresponds to the split: 1 for training, 2 for testing.
  Remark: only the training split is shared in this dataset, i.e. only the first 100 subjects.
* the second digit corresponds to the subset:

	* subset 1: case with T2 and STIR
	* subset 2: case with T2 and PSIR
	* subset 3: case with T2 and MP2RAGE
	* subset 4: case with T2, STIR and MP2RAGE (only at testing)
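
For illustration, here is a minimal Python sketch decoding this prefix. The helper is hypothetical (not part of the repository), and the `PREFIX_RE`, `SPLITS` and `SUBSETS` names are assumptions:

```python
import re

# Decode the `\d\d-<subject_id>` prefix described above (illustrative only).
PREFIX_RE = re.compile(
    r"^(?P<split>\d)(?P<subset>\d)-(?P<subject>\d+)_(?P<kind>[A-Z0-9]+)\.nii\.gz$"
)
SPLITS = {"1": "training", "2": "testing"}
SUBSETS = {"1": "T2+STIR", "2": "T2+PSIR", "3": "T2+MP2RAGE", "4": "T2+STIR+MP2RAGE"}

def parse_filename(name):
    m = PREFIX_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    return {
        "split": SPLITS[m.group("split")],
        "subset": SUBSETS[m.group("subset")],
        "subject": m.group("subject"),
        "kind": m.group("kind"),  # e.g. T2, STIR, PSIR, MP2RAGE, LESIONMASK
    }

print(parse_filename("11-001_T2.nii.gz"))
# {'split': 'training', 'subset': 'T2+STIR', 'subject': '001', 'kind': 'T2'}
```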


# MSMultiSpine Model Specification

## Pipeline inputs

During the inference phase, the data will be provided in a folder mounted as a volume in the Docker container and organized in the same way; the only difference is that the data related to a single subject will be provided for each execution:

```
root_folder
├── rawdata                           # Original raw volumes and lesion masks
│   ├── sub-xxx

└── derivatives                       # Contains preprocessed versions of the volumes
    ├── preprocessed                  # Preprocessed volumes and lesion masks
    │   ├── sub-xxx

    └── preprocessedAndRegistered     # Preprocessed and registered volumes and masks
        ├── sub-xxx
```

The `listing_inputs.py` script provided in this repository is a helper to determine which input files exist and their filepaths inside the input folder.
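
As a rough illustration of what such a discovery step can look like (the layout and filename patterns are taken from the Data Structure section above; the function itself is a sketch, not the contents of `listing_inputs.py`):

```python
from pathlib import Path

def list_inputs(root_folder):
    """Sketch: list the nii.gz volumes available for the single subject
    present in the mounted input folder, for each data version."""
    versions = {
        "raw": Path(root_folder) / "rawdata",
        "preprocessed": Path(root_folder) / "derivatives" / "preprocessed",
        "preprocessedAndRegistered":
            Path(root_folder) / "derivatives" / "preprocessedAndRegistered",
    }
    inputs = {}
    for version, base in versions.items():
        # A single sub-xxx folder is expected per execution.
        for subject_dir in sorted(base.glob("sub-*")):
            inputs[version] = sorted(subject_dir.glob("*.nii.gz"))
    return inputs
```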

## Expected pipeline outputs 

Description given in the challenge proposal:

> A given method will be asked to output a label mask as well as a corresponding csv file assigning a probability to each lesion label (one row per lesion).

More specifically, for a given run on a set of sequences, a pipeline is required to output:

* a nii.gz file whose dimensions and orientation matrix equal those of the input rawdata sagittal T2 acquisition (the spatial coverage of the sagittal T2 will always be the one considered for annotating the lesions and evaluating the methods). This nii.gz file must contain only integer values, where each lesion instance in a given case is assigned a unique integer value (≥ 1, 0 being the background).

* a csv file with two columns, `id` and `p`, and one row per instance in the corresponding nii.gz file. `id` must match an instance number from the nii.gz file and `p` must be a float in [0, 1].

The script `data_check.py` takes as input the input data and the corresponding output data and checks that they are consistent with the rules above. Passing this script is required to ensure the compatibility of the method's output with the overall evaluation process.
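
For concreteness, here is a hedged sketch of producing outputs in this format. The thresholding "segmentation" and the constant probability are placeholders only meant to show the expected files, and the paths are hypothetical:

```python
import csv
import nibabel as nib
import numpy as np
from scipy import ndimage

t2_img = nib.load("rawdata/sub-001/11-001_T2.nii.gz")   # hypothetical input path
t2 = t2_img.get_fdata()

# Instance mask: one unique integer label (>= 1) per lesion, 0 = background,
# same dimensions and orientation matrix as the raw sagittal T2.
binary = t2 > np.percentile(t2, 99.5)                   # placeholder segmentation
instances, n_lesions = ndimage.label(binary)            # labels 1..n_lesions
nib.save(nib.Nifti1Image(instances.astype(np.int16), t2_img.affine),
         "sub-001_LESIONMASK.nii.gz")

# CSV: columns `id` and `p`, one row per instance in the nii.gz file.
with open("sub-001_lesions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "p"])
    for label in range(1, n_lesions + 1):
        writer.writerow([label, 0.5])                   # placeholder p in [0, 1]
```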

## Pipeline Interface, Dockerization and Integration to VIP

The Docker image must then be interfaced through the VIP portal; details about this process are provided [here](https://github.com/virtual-imaging-platform/VIP-portal/wiki/Packaging-guide).