Build and test Melissa on Jean-Zay (Slurm cluster)
This MR provides guidelines for building Melissa on Jean-Zay (JZ).
It also includes the fixes made while running Melissa there.
Build Melissa on Jean-Zay
The pytorch-gpu/py3/1.13.0 module comes with the following modules:
$ module load pytorch-gpu/py3/1.13.0
$ module list
1) cuda/11.2 3) cudnn/8.1.1.33-cuda 5) openmpi/4.1.1-cuda 7) magma/2.5.4-cuda 9) sparsehash/2.0.3
2) nccl/2.9.6-1-cuda 4) gcc/8.4.1(8.3.1) 6) intel-mkl/2020.4 8) sox/14.4.2 10) pytorch-gpu/py3/1.13.0
It also provides the following Python packages:
$ python3 -c "import numpy; print(numpy.__version__)"
1.23.3
$ python3 -c "import mpi4py; print(mpi4py.__version__)"
3.1.4
$ python3 -c "import zmq; print(zmq.__version__)"
23.2.0
$ python3 -c "import tensorboard; print(tensorboard.__version__)"
2.11.0
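The four version checks above can also be run in one go. The sketch below is a hypothetical helper (module_versions and REQUIRED are illustrative names, not part of Melissa). It assumes it is run with the python3 provided by the pytorch-gpu/py3/1.13.0 module, and it reports NOT FOUND for any package that fails to import instead of stopping at the first failure.

```python
import importlib

# Python packages this MR reports as provided by pytorch-gpu/py3/1.13.0.
REQUIRED = ["numpy", "mpi4py", "zmq", "tensorboard"]


def module_versions(names):
    """Map each module name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" for modules that do not expose __version__.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in module_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT FOUND'}")
```

Saved under any file name (e.g. the hypothetical check_env.py), it is run with python3 check_env.py after module load pytorch-gpu/py3/1.13.0. Note that importing mpi4py alone does not initialize MPI, so the check is safe to run outside a Slurm job.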
Hence Melissa can be built with the following commands:
git clone https://gitlab.inria.fr/melissa/melissa-combined.git
cd melissa-combined
mkdir build && cd build
module load pytorch-gpu/py3/1.13.0
module load zeromq
module load cmake
cmake -DMELISSA_USER_MODE=ON -DMELISSA_DEVELOP_MODE=ON -DCMAKE_INSTALL_PREFIX=../install ..
make
make install
The only package this should install via pip is jsonschema (you may need to run pip3 install --user --upgrade pip first).
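If the build has to be scripted (for instance to rebuild after pulling changes), the configure, build and install steps above can be driven from Python. This is a sketch under two assumptions: the repository has already been cloned, and pytorch-gpu/py3/1.13.0, zeromq and cmake are already loaded in the calling shell, because module load is a shell function that subprocess cannot invoke. The helper names build_commands and build are hypothetical.

```python
import subprocess
from pathlib import Path

# Same CMake options as the manual build above.
CMAKE_OPTIONS = ["-DMELISSA_USER_MODE=ON", "-DMELISSA_DEVELOP_MODE=ON"]


def build_commands(source_dir):
    """Return (working directory, command) pairs mirroring the manual build."""
    source = Path(source_dir).resolve()
    build_dir = source / "build"
    install_dir = source / "install"  # equivalent of ../install seen from build/
    return [
        (build_dir, ["cmake", *CMAKE_OPTIONS,
                     f"-DCMAKE_INSTALL_PREFIX={install_dir}", str(source)]),
        (build_dir, ["make"]),
        (build_dir, ["make", "install"]),
    ]


def build(source_dir, dry_run=False):
    """Run (or only print) the configure, build and install steps.

    Assumption: the required modules are already loaded in the calling shell.
    """
    for cwd, cmd in build_commands(source_dir):
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            cwd.mkdir(exist_ok=True)
            # check=True stops at the first failing step, like a shell with set -e.
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    build("melissa-combined", dry_run=True)
```

With dry_run=True the script only prints each command and its working directory, which makes it easy to check the paths before running the real build.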
Run a Melissa study on Jean-Zay
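To run Melissa, the study configuration file must pass the Slurm account options to the scheduler and load the same modules as at build time. Replace X, Y and HH:MM:SS with the number of tasks and the wall time of each job; the accounts shown (ifg@gpu, ig@cpu) are project-specific and must be replaced with your own project's GPU and CPU allocations: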
"launcher_config": {
    "scheduler_arg_server": [
        "--account=ifg@gpu",
        "--ntasks=X",
        "--time=HH:MM:SS"
    ],
    "scheduler_arg_client": [
        "--account=ig@cpu",
        "--ntasks=Y",
        "--time=HH:MM:SS"
    ],
    ...
},
"client_config": {
    "preprocessing_commands": [
        "module load pytorch-gpu/py3/1.13.0",
        "module load zeromq"
    ]
},
"server_config": {
    "preprocessing_commands": [
        "module load pytorch-gpu/py3/1.13.0",
        "module load zeromq"
    ]
}
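Rather than editing these entries by hand for every study, they can be added programmatically. The sketch below is hypothetical: add_jean_zay_options, patch_config_file and their parameters are illustrative names, not Melissa API. Only the keys launcher_config, scheduler_arg_server, scheduler_arg_client, client_config, server_config and preprocessing_commands come from the fragment above. It keeps any other scheduler arguments and preprocessing commands already present and only replaces --account, --ntasks and --time.

```python
import json
from pathlib import Path

# Module-loading commands required on Jean-Zay (taken from this MR).
JZ_MODULES = ["module load pytorch-gpu/py3/1.13.0", "module load zeromq"]


def _merge_args(existing, new):
    """Drop existing args that set the same option as one in `new`, then append `new`."""
    keys = {arg.split("=", 1)[0] for arg in new}
    kept = [arg for arg in existing if arg.split("=", 1)[0] not in keys]
    return kept + new


def add_jean_zay_options(config, server_account, client_account,
                         server_ntasks, client_ntasks, server_time, client_time):
    """Add the Jean-Zay scheduler and module options to a config dict (in place)."""
    launcher = config.setdefault("launcher_config", {})
    launcher["scheduler_arg_server"] = _merge_args(
        launcher.get("scheduler_arg_server", []),
        [f"--account={server_account}", f"--ntasks={server_ntasks}",
         f"--time={server_time}"],
    )
    launcher["scheduler_arg_client"] = _merge_args(
        launcher.get("scheduler_arg_client", []),
        [f"--account={client_account}", f"--ntasks={client_ntasks}",
         f"--time={client_time}"],
    )
    # Append the module loads to both preprocessing lists, skipping duplicates.
    for section in ("client_config", "server_config"):
        commands = config.setdefault(section, {}).setdefault("preprocessing_commands", [])
        for command in JZ_MODULES:
            if command not in commands:
                commands.append(command)
    return config


def patch_config_file(path, **options):
    """Read a JSON config file, add the Jean-Zay options and write it back."""
    path = Path(path)
    config = json.loads(path.read_text())
    add_jean_zay_options(config, **options)
    path.write_text(json.dumps(config, indent=4) + "\n")
```

For example, patch_config_file("config.json", server_account="ifg@gpu", client_account="ig@cpu", server_ntasks=4, client_ntasks=1, server_time="01:00:00", client_time="00:30:00") uses illustrative values only. Because the file is rewritten with json.dumps, its original indentation and key order may not be preserved exactly.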