Melissa/Code_Saturne compilation and launching on Jean-Zay
Text reformatted by @cconrads on 2021-05-07
Successful compilation using the Intel compiler, but execution still failing (2021-01-19 Tue)
Context: compilation and execution of code_saturne and melissa (deep) on the Jean-Zay supercomputer
- Modules loaded:
zeromq/4.2.5
cuda/10.1.2
nccl/2.5.6-2-cuda
cudnn/7.6.5.32-cuda-10.1
intel-compilers/19.0.4
openmpi/4.0.2-cuda
tensorflow-gpu/py3/2.3.1+hvd-0.21.0
cmake/3.11.2
- Compilation of a dedicated melissa build with icc to stay compatible with code_saturne (in addition to the base install used for the server)
cmake -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DCMAKE_INSTALL_PREFIX=/linkhome/rech/genini01/upt67eh/work/bruno/melissa/install-code-saturn ../.
make
make install
- Compilation of hdf5-1.10.6 (the hdf5 version coming with tensorflow was giving problems)
cmake -DBUILD_SHARED_LIBS=1 -DCMAKE_INSTALL_PREFIX=/gpfswork/rech/igf/commun/Code_Saturne/hdf5-1.10.6
- Compilation of MED-4.0.0
LANG=en_US.utf-8 LC_ALL=en_US.utf-8 CXX=icc CC=icc ./configure --prefix=/gpfswork/rech/igf/commun/Code_Saturne/med-4.0.0/ --with-med_int=long --with-hdf5=/gpfswork/rech/igf/commun/Code_Saturne/hdf5-1.10.6/ --disable-python
LANG=en_US.utf-8 LC_ALL=en_US.utf-8 make
LANG=en_US.utf-8 LC_ALL=en_US.utf-8 make install
- Compilation of code_saturne 6.3.0
LANG=en_US.utf-8 LC_ALL=en_US.utf-8 CXX=icc CC=icc ./configure --prefix=/gpfswork/rech/igf/commun/Code_Saturne/6.3.0 --with-melissa=/linkhome/rech/genini01/upt67eh/work/bruno/melissa/install-code-saturn --with-zeromq=/gpfslocalsup/spack_soft/zeromq/4.2.5/gcc-8.3.1-kgeshbfhmrekggniahzlmk3jctiblxsu --with-med=/linkhome/rech/genini01/upt67eh/work/Code_Saturne/med-4.0.0/ --with-hdf5=/gpfswork/rech/igf/commun/Code_Saturne/hdf5-1.10.6/ --enable-melissa-as-plugin
LANG=en_US.utf-8 LC_ALL=en_US.utf-8 make
LANG=en_US.utf-8 LC_ALL=en_US.utf-8 make install
Fix 1: Failing test during configure because of the missing -lmpi lib:
LANG=en_US.utf-8 LC_ALL=en_US.utf-8 icc -std=c11 -restrict -funsigned-char -Wall -Wcheck -Wshadow -Wpointer-arith -Wmissing-prototypes -Wuninitialized -Wunused -wd981 -qopenmp -D_POSIX_C_SOURCE=200809L -DNDEBUG -I/gpfslocalsys/intel/parallel_studio_xe_2020_update1_cluster_edition/compilers_and_libraries_2020.1.217/linux/mpi/intel64//include -O -Wl,-export-dynamic -qopenmp -L/gpfslocalsys/intel/parallel_studio_xe_2020_update1_cluster_edition/compilers_and_libraries_2020.1.217/linux/mpi/intel64//lib conftest.c
Fix: patched the configure file to add -lmpi at line 24453:
LIBS="$MPI_LIBS $saved_LIBS -lmpi"
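The configure patch described above can be scripted rather than applied by hand. The following is a minimal sketch, assuming the unpatched assignment read LIBS="$MPI_LIBS $saved_LIBS" (the original line is not recorded in these notes); the function name add_mpi_lib is hypothetical.

```python
import re

# Assumed form of the unpatched assignment in configure (not recorded in the notes).
UNPATCHED = re.compile(r'LIBS="\$MPI_LIBS \$saved_LIBS"')
PATCHED = 'LIBS="$MPI_LIBS $saved_LIBS -lmpi"'


def add_mpi_lib(configure_text):
    """Return configure_text with -lmpi appended to the matching LIBS assignments."""
    # A callable replacement avoids any escape processing of the replacement string.
    return UNPATCHED.sub(lambda match: PATCHED, configure_text)


if __name__ == "__main__":
    # Illustrative two-line excerpt, not the real configure script.
    sample = 'saved_LIBS="$LIBS"\nLIBS="$MPI_LIBS $saved_LIBS"\n'
    print(add_mpi_lib(sample))
```

Applied to the whole file, this patches every matching assignment and not only line 24453, which may or may not be what is wanted. Running it twice leaves the text unchanged, since the patched line no longer matches the pattern.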
Test: try reading the mesh file (as this was failing in previous compilation attempts):
/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/libexec/code_saturne/cs_preprocess --out mesh_input.csm /gpfswork/rech/igf/commun/bruno/examples/vonkarman/MESH/COARSE/Cylinder_7361.med
Fix 2: Error when starting melissa (melissa_launcher option.py)
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay3: SCRIPTS]$ more saturne.1227710.err
Loading zeromq/4.2.5
Loading tensorflow-gpu/py3/2.3.1+hvd-0.21.0
Loading cmake/3.11.2
Traceback (most recent call last):
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/bin/code_saturne", line 85, in <module>
    retcode = cs.execute()
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_script.py", line 95, in execute
    return self.commands[command](options)
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_script.py", line 179, in run
    return cs_run.main(options, self.package)
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_run.py", line 651, in main
    return run(argv, pkg)[0]
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_run.py", line 601, in run
    domains=d)
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_case.py", line 219, in __init__
    cs_exec_environment.set_modules(self.package_compute)
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_exec_environment.py", line 668, in set_modules
    exec(output)
  File "<string>", line 12
    . /gpfslocalsup/pub/anaconda-py3/2020.02/etc/profile.d/conda.sh;
    ^
SyntaxError: invalid syntax
Here code_saturne tries to exec() a shell command where Python code is expected.
- Dirty fix:
- file: /gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_exec_environment.py
- line 662:
for cmd in cmds:
    # Bad trick to skip a shell command that makes everything fail (Bruno 8/1/2021)
    # Note: str.find() returns -1 (truthy) when "." is absent from cmd[0:3]
    # and 0 (falsy) only when cmd starts with ".".
    if cmd.find(".",0,3):
        pass
    else:
        (output, error) = subprocess.Popen([cmd_prefix, 'python'] + cmd.split(),
                                           universal_newlines=True,
                                           stdout=subprocess.PIPE).communicate()
        print(output)
        exec(output)
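A narrower variant of this workaround, as a minimal sketch only: instead of skipping module commands, drop the shell-only lines from the module command output before passing it to exec(). It assumes the offending lines are exactly those starting with ". " (a shell source command), as in the traceback above. The helper name strip_shell_source_lines is hypothetical, the sample output is illustrative, and this variant is untested on Jean-Zay.

```python
def strip_shell_source_lines(output):
    """Drop lines that are shell 'source' commands (". /path/to/script;")."""
    kept = []
    for line in output.splitlines():
        # ". <file>" is POSIX shell syntax for sourcing a file; it is not valid Python.
        if line.lstrip().startswith(". "):
            continue
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Illustrative output only; the real module command output is not shown in the notes.
    sample = (
        "import os\n"
        ". /gpfslocalsup/pub/anaconda-py3/2020.02/etc/profile.d/conda.sh;\n"
        "os.environ['EXAMPLE_VAR'] = 'value'\n"
    )
    cleaned = strip_shell_source_lines(sample)
    exec(cleaned)  # no SyntaxError once the shell line is removed
    print(cleaned)
```

In the patched loop, exec(strip_shell_source_lines(output)) would take the place of exec(output), so the Python lines emitted by the module command are still applied.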
Melissa tests:
- Still failing as of 2021-01-19 Tue
- The error is not very explicit:
- Everything starts, but code_saturne does not connect to the server
- Logs:
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: scratchb]$ cd EXP-2021-01-19-T21-45-49
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: EXP-2021-01-19-T21-45-49]$ cd group0/
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: group0]$ cd rank0/
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: rank0]$ cd SCRIPTS/
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: SCRIPTS]$ ls
runcase.sh saturne.1229691.err saturne.1229691.log saturne.1229707.err saturne.1229707.log server_name.txt
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: SCRIPTS]$ more saturne.1229707.err
Loading zeromq/4.2.5
Loading tensorflow-gpu/py3/2.3.1+hvd-0.21.0
Loading cmake/3.11.2
Warning:
Both case1.xml and setup.xml exist in
/gpfsssd/scratch/rech/igf/commun/bruno/EXP-2021-01-19-T21-45-49/group0/rank0/DATA.
case1.xml will be used for the computation.
Be aware that to follow best practices only one of the two should be present.
solver script exited with status 2.
Error running the calculation.
Check code_saturne log (listing) and error* files for details.
Error in calculation stage.
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: SCRIPTS]$ more saturne.1229707.log
sys.path = [
'/gpfsssd/scratch/rech/igf/commun/bruno/EXP-2021-01-19-T21-45-49/group0/rank0/SCRIPTS',
'/linkhome/rech/genini01/upt67eh/.local',
'/gpfswork/rech/igf/commun/bruno/deepmelissa',
'/gpfswork/rech/igf/commun/bruno/melissa/utility/melissa4py',
'/linkhome/rech/genini01/upt67eh/work/bruno/melissa/install/share/melissa',
'/linkhome/rech/genini01/upt67eh/work/bruno/melissa',
'/linkhome/rech/genini01/upt67eh/work/bruno/melissa/install/share/melissa/launcher',
'/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python37.zip',
'/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7',
'/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7/lib-dynload',
'/linkhome/rech/genini01/upt67eh/.local/lib/python3.7/site-packages',
'/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7/site-packages',
'/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7/site-packages/cdat_info-8.2.1-py3.6.egg',
'/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7/site-packages/horovod-0.21.0-py3.7-linux-x86_64.egg',
'/gpfs7kro/gpfslocalsup/src/pub/anaconda-py3/2020.02/tensorflow-2.3.1+horovod-0.21.0/horovod-0.21.0/.eggs/psutil-5.7.3-py3.7-linux-x86_64.egg',
'/gpfs7kro/gpfslocalsup/src/pub/anaconda-py3/2020.02/tensorflow-2.3.1+horovod-0.21.0/horovod-0.21.0/.eggs/cloudpickle-1.6.0-py3.7.egg',
]
USER_BASE: '/linkhome/rech/genini01/upt67eh/.local' (exists)
USER_SITE: '/linkhome/rech/genini01/upt67eh/.local/lib/python3.7/site-packages' (exists)
ENABLE_USER_SITE: True
19/01/21 21:47:32
code_saturne
************
Version: 6.3.0
Path: /gpfswork/rech/igf/commun/Code_Saturne/6.3.0
Result directory:
/gpfsssd/scratch/rech/igf/commun/bruno/EXP-2021-01-19-T21-45-49/group0/rank0/RESU/20210119-2147_1
****************************************
Compiling user subroutines and linking
****************************************
****************************
Preparing calculation data
****************************
Parallel code_saturne on 2 processes.
***************************
Preprocessing calculation
***************************
**********************
Starting calculation
**********************
This version of Spack (openmpi ~legacylaunchers schedulers=slurm)
is installed without the mpiexec/mpirun commands to prevent
unintended performance issues. See https://github.com/spack/spack/pull/10340
for more details.
If you understand the potential consequences of a misconfigured mpirun, you can
use spack to install 'openmpi+legacylaunchers' to restore the executables.
Otherwise, use srun to launch your MPI executables.
*****************************
Post-calculation operations
*****************************
19/01/21 21:47:34