# MSMultiSpine Dataset Specification

From the challenge proposal:
> Depending on the center and context, any combination of existing MR sequences can be provided. In this challenge, which represents a concrete, complex case of multisequence datasets, we focus on four commonly used sequences: the sagittal T2 (always provided in the challenge and considered as the reference to segment), the sagittal STIR, the sagittal PSIR, and the 3D MP2RAGE. From a methodological point of view, this challenge is a concrete and paradigmatic case of the missing-modalities setting where, depending on the case, some modalities may be missing at inference or training time.

Thus, in a nutshell, pipelines will have access to different sequences depending on the case under study. Moreover, for each sequence, the pipeline will have access to the original nii.gz volume, a preprocessed version of the original volume, and a registered version of the preprocessed volume (see [the dedicated subsection](#original-data-preprocessing-and-registration)). A pipeline will be allowed to use any combination of the available volumes (including, e.g., only the sagittal T2 data, which will always be available).

The training dataset is downloadable via the [Shanoir interface](https://shanoir.irisa.fr/shanoir-ng/welcome). Pipelines will be submitted as Docker images (described [here](#pipeline-interface-dockerization-and-integration-to-vip)) and evaluated by the MS-Multi-Spine team on a dedicated dataset.

The sequence combinations provided in the training set, and those that will be evaluated during the test phase, are described in the following table:

| Subset | Sequence Combination | Training cases | Testing cases |
| ------ | -------------------- | -------------- | ------------- |
| 1      | (t2,stir)            | 50             | 40            |
| 2      | (t2,psir)            | 25             | 20            |
| 3      | (t2,mp2rage)         | 25             | 20            |
| 4      | (t2,stir,mp2rage)    | 0              | 20            |
|        | Overall              | 100            | 100           |

In the remainder of this section, we describe how the data have been generated and structured.

## Original data, preprocessing and registration

In addition to the original spinal cord (SC) acquisitions, two other versions of the data will be provided to the challengers:

* Preprocessed volumes: in a nutshell, for a given case, all raw volumes are sequentially i) reoriented to a sagittal orientation, ii) resampled into a single frame with a fine resolution (0.5 mm³), and iii) zeroed outside a square area of side 35 mm centered (slice-wise) on the spinal cord barycenter.
* Registered and preprocessed volumes: the preprocessed volumes after applying a rigid registration followed by a highly regularized non-linear registration.
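
For intuition, here is a minimal numpy sketch of step iii only. This is an illustrative re-implementation, not the organizers' actual code; the slice-axis convention, the `barycenters` input and the function name are assumptions:

```python
import numpy as np

def zero_outside_square(volume, barycenters, half_side_vox):
    """Zero every voxel outside a square window centered, slice-wise,
    on the spinal cord barycenter (preprocessing step iii).

    volume        -- 3D array; the first axis is assumed to index sagittal slices
    barycenters   -- per-slice (row, col) spinal cord barycenter coordinates
    half_side_vox -- half of the square side in voxels, e.g. 35 for a
                     35 mm side at 0.5 mm resolution
    """
    masked = np.zeros_like(volume)
    for s, (r, c) in enumerate(barycenters):
        r0, r1 = max(0, int(r) - half_side_vox), int(r) + half_side_vox
        c0, c1 = max(0, int(c) - half_side_vox), int(c) + half_side_vox
        masked[s, r0:r1, c0:c1] = volume[s, r0:r1, c0:c1]
    return masked
```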

## Data Structure

The training dataset is structured in the following manner:

```
MS_MultiSpine_dataset
├── rawdata                           # Original raw volumes and lesion masks
│   ├── sub-001
│   ├── ...
└── derivatives                       # Contains preprocessed versions of the volumes
    ├── preprocessed                  # Preprocessed volumes and lesion masks
    │   ├── sub-001
    │   ├── ...
    └── preprocessedAndRegistered     # Preprocessed and registered volumes and masks
        ├── sub-001
        ├── ...
```
Below are examples of filenames:
```
├── sub-001                           # Split 1 : Training, Subset 1 : T2 + STIR
│   ├── 11-001_T2.nii.gz
│   ├── 11-001_STIR.nii.gz
│   └── 11-001_LESIONMASK.nii.gz

├── ...

├── sub-075                           # Split 1 : Training, Subset 2 : T2 + PSIR
│   ├── 12-075_T2.nii.gz
│   ├── 12-075_PSIR.nii.gz
│   └── 12-075_LESIONMASK.nii.gz

├── ...

└── sub-100                           # Split 1 : Training, Subset 3 : T2 + MP2RAGE
    ├── 13-100_T2.nii.gz
    ├── 13-100_MP2RAGE.nii.gz
    └── 13-100_LESIONMASK.nii.gz
```
In practice, all volumes and masks are prefixed by `\d\d-<subject_id>` where:
* the first digit corresponds to the split: 1 for training, 2 for testing.
  Remark: only the training split is shared in this dataset, i.e. only the first 100 subjects.
* the second digit corresponds to the subset:

	* subset 1: case with T2 and STIR
	* subset 2: case with T2 and PSIR
	* subset 3: case with T2 and MP2RAGE
	* subset 4: case with T2, STIR and MP2RAGE (only at testing)
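
For illustration, here is a minimal Python sketch decoding this prefix. The helper is hypothetical (not part of the repository), and the `PREFIX_RE`, `SPLITS` and `SUBSETS` names are assumptions:

```python
import re

# Decode the `\d\d-<subject_id>` prefix described above (illustrative only).
PREFIX_RE = re.compile(
    r"^(?P<split>\d)(?P<subset>\d)-(?P<subject>\d+)_(?P<kind>[A-Z0-9]+)\.nii\.gz$"
)
SPLITS = {"1": "training", "2": "testing"}
SUBSETS = {"1": "T2+STIR", "2": "T2+PSIR", "3": "T2+MP2RAGE", "4": "T2+STIR+MP2RAGE"}

def parse_filename(name):
    m = PREFIX_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    return {
        "split": SPLITS[m.group("split")],
        "subset": SUBSETS[m.group("subset")],
        "subject": m.group("subject"),
        "kind": m.group("kind"),  # e.g. T2, STIR, PSIR, MP2RAGE, LESIONMASK
    }

print(parse_filename("11-001_T2.nii.gz"))
# {'split': 'training', 'subset': 'T2+STIR', 'subject': '001', 'kind': 'T2'}
```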


# MSMultiSpine Model Specification

## Pipeline inputs

During the inference phase, the data will be provided in a folder mounted as a volume in the Docker container and organized in the same way; the only difference is that the data related to a single subject will be provided for each execution:

```
root_folder
├── rawdata                           # Original raw volumes and lesion masks
│   ├── sub-xxx

└── derivatives                       # Contains preprocessed versions of the volumes
    ├── preprocessed                  # Preprocessed volumes and lesion masks
    │   ├── sub-xxx

    └── preprocessedAndRegistered     # Preprocessed and registered volumes and masks
        ├── sub-xxx
```

The `listing_inputs.py` script provided in this repository is a helper to determine which input files exist and their filepaths inside the input folder.
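
As a rough illustration of what such a discovery step can look like (the layout and filename patterns are taken from the Data Structure section above; the function itself is a sketch, not the contents of `listing_inputs.py`):

```python
from pathlib import Path

def list_inputs(root_folder):
    """Sketch: list the nii.gz volumes available for the single subject
    present in the mounted input folder, for each data version."""
    versions = {
        "raw": Path(root_folder) / "rawdata",
        "preprocessed": Path(root_folder) / "derivatives" / "preprocessed",
        "preprocessedAndRegistered":
            Path(root_folder) / "derivatives" / "preprocessedAndRegistered",
    }
    inputs = {}
    for version, base in versions.items():
        # A single sub-xxx folder is expected per execution.
        for subject_dir in sorted(base.glob("sub-*")):
            inputs[version] = sorted(subject_dir.glob("*.nii.gz"))
    return inputs
```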

## Expected pipeline outputs 

Description given in the challenge proposal:

> A given method will be asked to output a label mask as well as a corresponding csv file assigning a probability to each lesion label (one row per lesion).

More specifically, for a given run on a set of sequences, a pipeline is required to output:

* a nii.gz file whose dimensions and orientation matrix equal those of the input rawdata sagittal T2 acquisition (the spatial coverage of the sagittal T2 will always be the one considered for annotating the lesions and evaluating the methods). This nii.gz file must contain only integer values, where each lesion instance in a given case is assigned a unique integer value (≥ 1, 0 being the background).

* a csv file with two columns, `id` and `p`, and one row per instance in the corresponding nii.gz file. `id` must match an instance number from the nii.gz file and `p` must be a float in [0, 1].

The script `data_check.py` takes as input the input data and the corresponding output data and checks that they are consistent with the rules above. Passing this script is required to ensure the compatibility of the method's output with the overall evaluation process.
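
For concreteness, here is a hedged sketch of producing outputs in this format. The thresholding "segmentation" and the constant probability are placeholders only meant to show the expected files, and the paths are hypothetical:

```python
import csv
import nibabel as nib
import numpy as np
from scipy import ndimage

t2_img = nib.load("rawdata/sub-001/11-001_T2.nii.gz")   # hypothetical input path
t2 = t2_img.get_fdata()

# Instance mask: one unique integer label (>= 1) per lesion, 0 = background,
# same dimensions and orientation matrix as the raw sagittal T2.
binary = t2 > np.percentile(t2, 99.5)                   # placeholder segmentation
instances, n_lesions = ndimage.label(binary)            # labels 1..n_lesions
nib.save(nib.Nifti1Image(instances.astype(np.int16), t2_img.affine),
         "sub-001_LESIONMASK.nii.gz")

# CSV: columns `id` and `p`, one row per instance in the nii.gz file.
with open("sub-001_lesions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "p"])
    for label in range(1, n_lesions + 1):
        writer.writerow([label, 0.5])                   # placeholder p in [0, 1]
```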

## Pipeline Interface, Dockerization and Integration to VIP

The Docker image must then be interfaced through the VIP portal; details about this process are provided [here](https://github.com/virtual-imaging-platform/VIP-portal/wiki/Packaging-guide).