
Build and test Melissa on Jean-Zay (Slurm cluster)

SCHOULER Marc requested to merge build-on-jean-zay into master

This MR gives guidelines on how to build Melissa on Jean-Zay (JZ).

It also includes the fixes that were needed to run Melissa on this cluster.

Build Melissa on Jean-Zay

Loading the pytorch-gpu/py3/1.13.0 module also loads the following modules:

$ module load pytorch-gpu/py3/1.13.0
$ module list
1) cuda/11.2           3) cudnn/8.1.1.33-cuda   5) openmpi/4.1.1-cuda   7) magma/2.5.4-cuda   9) sparsehash/2.0.3        
2) nccl/2.9.6-1-cuda   4) gcc/8.4.1(8.3.1)      6) intel-mkl/2020.4     8) sox/14.4.2        10) pytorch-gpu/py3/1.13.0
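To check from a script that the expected toolchain is loaded, the `module list` output can be parsed into a name/version mapping. The following is a minimal sketch: the helper `parse_module_list` is hypothetical, and it assumes the `N) name/version` layout shown above (it only handles the text, not how it is captured).

```python
from __future__ import annotations

import re

# Matches entries such as "1) cuda/11.2" or "4) gcc/8.4.1(8.3.1)".
# Assumption: each entry is "<index>) <name>/<version>" separated by whitespace;
# everything after the first "/" is kept as the version (e.g. "py3/1.13.0").
_ENTRY = re.compile(r"\d+\)\s+(\S+?)/(\S+)")


def parse_module_list(output: str) -> dict[str, str]:
    """Return a {module name: version} mapping from `module list` output."""
    return {name: version for name, version in _ENTRY.findall(output)}


if __name__ == "__main__":
    sample = (
        "1) cuda/11.2           3) cudnn/8.1.1.33-cuda   5) openmpi/4.1.1-cuda\n"
        "2) nccl/2.9.6-1-cuda   4) gcc/8.4.1(8.3.1)      6) intel-mkl/2020.4\n"
    )
    modules = parse_module_list(sample)
    print(modules["openmpi"])  # 4.1.1-cuda
```

Whether `module` writes its listing to stdout or stderr depends on the modules implementation, so capturing it is left to the caller.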

The module also provides the following Python packages:

$ python3 -c "import numpy; print(numpy.__version__)"
1.23.3
$ python3 -c "import mpi4py; print(mpi4py.__version__)"
3.1.4
$ python3 -c "import zmq; print(zmq.__version__)"
23.2.0
$ python3 -c "import tensorboard; print(tensorboard.__version__)"
2.11.0
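The four one-liners above can be grouped into a single check that compares what is importable with the versions listed in this MR. This is a sketch only; `report_versions` is a hypothetical helper and the expected versions are simply the ones reported above for pytorch-gpu/py3/1.13.0.

```python
from __future__ import annotations

import importlib

# Versions reported in this MR for the pytorch-gpu/py3/1.13.0 module.
EXPECTED = {
    "numpy": "1.23.3",
    "mpi4py": "3.1.4",
    "zmq": "23.2.0",
    "tensorboard": "2.11.0",
}


def report_versions(expected: dict[str, str]) -> dict[str, tuple[str | None, bool]]:
    """Import each package and compare its __version__ with the expected one.

    Returns {package: (installed version, or None if not importable, matches expected)}.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            report[name] = (None, False)
            continue
        found = getattr(module, "__version__", None)
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for pkg, (found, ok) in report_versions(EXPECTED).items():
        print(f"{pkg:12} {str(found or 'missing'):10} {'ok' if ok else 'differs'}")
```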

With these modules loaded, Melissa can be built with the following commands:

git clone https://gitlab.inria.fr/melissa/melissa-combined.git
cd melissa-combined
mkdir build && cd build
module load pytorch-gpu/py3/1.13.0
module load zeromq
module load cmake
cmake -DMELISSA_USER_MODE=ON -DMELISSA_DEVELOP_MODE=ON -DCMAKE_INSTALL_PREFIX=../install ..
make
make install

The only package this should install through pip is jsonschema (you may first need to run pip3 install --user --upgrade pip).
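Before running CMake, one way to confirm that jsonschema is indeed the only missing dependency is to look each package up with `importlib.util.find_spec` and build the corresponding `pip3 install --user` command. A minimal sketch, assuming the package list below (the Python dependencies named in this MR); the helper names are hypothetical. Note that the `zmq` module is distributed on PyPI as `pyzmq`.

```python
from __future__ import annotations

import importlib.util
import shlex

# {import name: pip distribution name}; assumption: these are the Python
# dependencies the build needs, as listed in this MR.
REQUIRED = {
    "numpy": "numpy",
    "mpi4py": "mpi4py",
    "zmq": "pyzmq",
    "tensorboard": "tensorboard",
    "jsonschema": "jsonschema",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for import_name, pip_name in required.items()
            if importlib.util.find_spec(import_name) is None]


def pip_command(packages: list[str]) -> str | None:
    """Build a user-level pip install command, or None if nothing is missing."""
    if not packages:
        return None
    return shlex.join(["pip3", "install", "--user", *packages])


if __name__ == "__main__":
    cmd = pip_command(missing_packages(REQUIRED))
    print(cmd or "all dependencies found")
```

Once the pytorch-gpu/py3/1.13.0 module is loaded, this is expected to print `pip3 install --user jsonschema`.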

Run a Melissa study on Jean-Zay

To run Melissa, the configuration file must provide the Slurm options (account, numbers of tasks X and Y, walltime HH:MM:SS) for the server and the clients, as well as the commands that load the required modules:

"launcher_config": {
    "scheduler_arg_server": [
        "--account=ifg@gpu",
        "--ntasks=X",
        "--time=HH:MM:SS"
    ],
    "scheduler_arg_client": [
        "--account=ifg@cpu",
        "--ntasks=Y",
        "--time=HH:MM:SS"
    ],
    ...
},
"client_config": {
    "preprocessing_commands": [
        "module load pytorch-gpu/py3/1.13.0",
        "module load zeromq"
    ]
},
"server_config": {
    "preprocessing_commands": [
        "module load pytorch-gpu/py3/1.13.0",
        "module load zeromq"
    ]
}
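When several studies are prepared, the Jean-Zay-specific part of the configuration can be generated rather than edited by hand. The sketch below is hypothetical (the helper `jz_fragment` and the project code `abc` are illustrative only); it produces just the keys shown above, so every option elided by `...` still has to be merged in manually, and its walltime check only accepts the HH:MM:SS form used here, although Slurm accepts other formats.

```python
from __future__ import annotations

import json
import re

MODULES = ["module load pytorch-gpu/py3/1.13.0", "module load zeromq"]
_WALLTIME = re.compile(r"^\d{2,}:[0-5]\d:[0-5]\d$")  # HH:MM:SS only


def jz_fragment(project: str, server_tasks: int, client_tasks: int,
                server_time: str, client_time: str) -> dict:
    """Build the Jean-Zay-specific keys of a Melissa configuration (hypothetical helper)."""
    for walltime in (server_time, client_time):
        if not _WALLTIME.match(walltime):
            raise ValueError(f"walltime must be HH:MM:SS, got {walltime!r}")
    return {
        "launcher_config": {
            # As in the fragment above: the server is charged to the GPU
            # allocation, the clients to the CPU allocation.
            "scheduler_arg_server": [f"--account={project}@gpu",
                                     f"--ntasks={server_tasks}",
                                     f"--time={server_time}"],
            "scheduler_arg_client": [f"--account={project}@cpu",
                                     f"--ntasks={client_tasks}",
                                     f"--time={client_time}"],
        },
        "client_config": {"preprocessing_commands": list(MODULES)},
        "server_config": {"preprocessing_commands": list(MODULES)},
    }


if __name__ == "__main__":
    print(json.dumps(jz_fragment("abc", 4, 1, "01:00:00", "00:30:00"), indent=4))
```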
Edited by SCHOULER Marc