Created Jan 19, 2021 by RAFFIN Bruno (@braffin, Owner)

Melissa/Code_Saturne compilation and launching on Jean-Zay

Text reformatted by @cconrads on 2021-05-07

Successful compilation using the Intel compiler, but execution still failing (2021-01-19 Tue)

Context: compilation and execution of code_saturne and melissa (deep) on the Jean-Zay supercomputer

  1. Modules loaded:
    1. zeromq/4.2.5
    2. cuda/10.1.2
    3. nccl/2.5.6-2-cuda
    4. cudnn/7.6.5.32-cuda-10.1
    5. intel-compilers/19.0.4
    6. openmpi/4.0.2-cuda
    7. tensorflow-gpu/py3/2.3.1+hvd-0.21.0
    8. cmake/3.11.2
  2. Compilation of a dedicated melissa build with icc, to stay compatible with code_saturne (in addition to the base install used for the server)
    • cmake -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DCMAKE_INSTALL_PREFIX=/linkhome/rech/genini01/upt67eh/work/bruno/melissa/install-code-saturn ../.
    • make
    • make install
  3. Compilation of hdf5-1.10.6 (the hdf5 version shipped with tensorflow caused problems)
    • cmake -DBUILD_SHARED_LIBS=1 -DCMAKE_INSTALL_PREFIX=/gpfswork/rech/igf/commun/Code_Saturne/hdf5-1.10.6
  4. Compilation of MED-4.0.0
    • LANG=en_US.utf-8 LC_ALL=en_US.utf-8 CXX=icc CC=icc ./configure --prefix=/gpfswork/rech/igf/commun/Code_Saturne/med-4.0.0/ --with-med_int=long --with-hdf5=/gpfswork/rech/igf/commun/Code_Saturne/hdf5-1.10.6/ --disable-python
    • LANG=en_US.utf-8 LC_ALL=en_US.utf-8 make
    • LANG=en_US.utf-8 LC_ALL=en_US.utf-8 make install
  5. Code-saturne 6.3.0
    • LANG=en_US.utf-8 LC_ALL=en_US.utf-8 CXX=icc CC=icc ./configure --prefix=/gpfswork/rech/igf/commun/Code_Saturne/6.3.0 --with-melissa=/linkhome/rech/genini01/upt67eh/work/bruno/melissa/install-code-saturn --with-zeromq=/gpfslocalsup/spack_soft/zeromq/4.2.5/gcc-8.3.1-kgeshbfhmrekggniahzlmk3jctiblxsu --with-med=/linkhome/rech/genini01/upt67eh/work/Code_Saturne/med-4.0.0/ --with-hdf5=/gpfswork/rech/igf/commun/Code_Saturne/hdf5-1.10.6/ --enable-melissa-as-plugin
    • LANG=en_US.utf-8 LC_ALL=en_US.utf-8 make
    • LANG=en_US.utf-8 LC_ALL=en_US.utf-8 make install
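Step 5 depends on the install prefixes produced by the earlier steps, so a quick existence check before running configure can catch a mistyped or missing prefix. The following is a minimal sketch and not part of the original procedure: `missing_prefixes` is a hypothetical helper, and the paths are the ones passed to code_saturne's configure above.

```python
import os

# Prefixes passed to code_saturne's configure in step 5 (copied from above).
PREFIXES = {
    "melissa": "/linkhome/rech/genini01/upt67eh/work/bruno/melissa/install-code-saturn",
    "zeromq": "/gpfslocalsup/spack_soft/zeromq/4.2.5/gcc-8.3.1-kgeshbfhmrekggniahzlmk3jctiblxsu",
    "med": "/linkhome/rech/genini01/upt67eh/work/Code_Saturne/med-4.0.0/",
    "hdf5": "/gpfswork/rech/igf/commun/Code_Saturne/hdf5-1.10.6/",
}


def missing_prefixes(prefixes):
    """Return the names whose prefix is not an existing directory."""
    return sorted(name for name, path in prefixes.items()
                  if not os.path.isdir(path))


if __name__ == "__main__":
    for name in missing_prefixes(PREFIXES):
        print("missing prefix for", name, ":", PREFIXES[name])
```

Note that the MED prefix given to code_saturne (under /linkhome/...) differs from the --prefix used in step 4 (under /gpfswork/...). The two may resolve to the same directory on Jean-Zay, but this issue does not say so.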

Fix 1: Test failing during configure because -lmpi is missing from the link line

 LANG=en_US.utf-8 LC_ALL=en_US.utf-8  icc -std=c11 -restrict -funsigned-char -Wall -Wcheck -Wshadow -Wpointer-arith -Wmissing-prototypes -Wuninitialized -Wunused -wd981  -qopenmp   -D_POSIX_C_SOURCE=200809L -DNDEBUG -I/gpfslocalsys/intel/parallel_studio_xe_2020_update1_cluster_edition/compilers_and_libraries_2020.1.217/linux/mpi/intel64//include   -O -Wl,-export-dynamic -qopenmp -L/gpfslocalsys/intel/parallel_studio_xe_2020_update1_cluster_edition/compilers_and_libraries_2020.1.217/linux/mpi/intel64//lib conftest.c

Fix: patched the configure script to add -lmpi at line 24453 (LIBS="$MPI_LIBS $saved_LIBS -lmpi")
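The patch amounts to appending -lmpi to one LIBS assignment in configure. Here is a minimal sketch of that edit, assuming the unpatched line reads LIBS="$MPI_LIBS $saved_LIBS" (inferred from the patched line above, not confirmed in this issue). `add_lmpi` is a hypothetical helper.

```python
def add_lmpi(configure_text):
    """Append -lmpi to the LIBS assignment used by the MPI link test."""
    old = 'LIBS="$MPI_LIBS $saved_LIBS"'        # assumed original content
    new = 'LIBS="$MPI_LIBS $saved_LIBS -lmpi"'  # patched content from the issue
    # Replaces every occurrence; running it twice changes nothing more.
    return configure_text.replace(old, new)
```

Unlike the manual fix, which touched only line 24453, this sketch matches by content and would patch every identical assignment in the file.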

Test: try reading the mesh file (this was failing in previous compilation attempts):

/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/libexec/code_saturne/cs_preprocess  --out mesh_input.csm  /gpfswork/rech/igf/commun/bruno/examples/vonkarman/MESH/COARSE/Cylinder_7361.med

Fix 2: Error when starting melissa (melissa_launcher option.py)

(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay3: SCRIPTS]$ more saturne.1227710.err
Loading zeromq/4.2.5
Loading tensorflow-gpu/py3/2.3.1+hvd-0.21.0
Loading cmake/3.11.2
Traceback (most recent call last):
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/bin/code_saturne", line 85, in <module>
    retcode = cs.execute()
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_script.py", line 95, in execute
    return self.commands[command](options)
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_script.py", line 179, in run
    return cs_run.main(options, self.package)
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_run.py", line 651, in main
    return run(argv, pkg)[0]
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_run.py", line 601, in run
    domains=d)
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_case.py", line 219, in __init__
    cs_exec_environment.set_modules(self.package_compute)
  File "/gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_exec_environment.py", line 668, in set_modules
    exec(output)
  File "<string>", line 12
    . /gpfslocalsup/pub/anaconda-py3/2020.02/etc/profile.d/conda.sh;
    ^
SyntaxError: invalid syntax

Here code_saturne passes a shell command to exec(), which expects Python code.

  • Dirty fix:
    • file: /gpfswork/rech/igf/commun/Code_Saturne/6.3.0/lib/python3.7/site-packages/code_saturne/cs_exec_environment.py
    • line 662:
    for cmd in cmds:
        # Bad trick to skip a shell command that makes everything fail (Bruno 8/1/2021)
        # Note: str.find() returns 0 (falsy) only when cmd begins with "."; any other
        # result (-1, 1 or 2) is truthy, so every cmd that does not begin with "."
        # is skipped here.
        if cmd.find(".", 0, 3):
            pass
        else:
            (output, error) = subprocess.Popen([cmd_prefix, 'python'] + cmd.split(),
                                               universal_newlines=True,
                                               stdout=subprocess.PIPE).communicate()
            print(output)
            exec(output)
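An alternative to skipping commands would be to filter the generated text before it reaches exec(), dropping only the lines that are shell rather than Python. The sketch below is an illustration under stated assumptions, not code from code_saturne: `drop_shell_lines` is a hypothetical helper, and it recognizes only the `. <file>` sourcing form seen in the traceback above.

```python
def drop_shell_lines(output):
    """Remove shell sourcing lines (". /path/to/file;") from generated Python."""
    kept = []
    for line in output.splitlines():
        # A statement starting with ". " is shell syntax for sourcing a file.
        # Assumption: the generated Python never starts a line that way.
        if line.strip().startswith(". "):
            continue
        kept.append(line)
    return "\n".join(kept)
```

With such a helper, `exec(output)` would become `exec(drop_shell_lines(output))`. Whatever the skipped conda.sh line was meant to set up is then not applied to the Python process, which may or may not matter for the run.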

Melissa tests:

  • Still failing as of 2021-01-19 Tue
  • The error is not very explicit:
  • Everything starts, but code_saturne does not connect to the server
  • Logs:
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: scratchb]$ cd EXP-2021-01-19-T21-45-49
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: EXP-2021-01-19-T21-45-49]$ cd group0/
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: group0]$ cd rank0/
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: rank0]$ cd SCRIPTS/
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: SCRIPTS]$ ls
runcase.sh  saturne.1229691.err  saturne.1229691.log  saturne.1229707.err  saturne.1229707.log  server_name.txt
(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: SCRIPTS]$ more saturne.1229707.err
Loading zeromq/4.2.5
Loading tensorflow-gpu/py3/2.3.1+hvd-0.21.0
Loading cmake/3.11.2
Warning:
  Both case1.xml and setup.xml exist in
    /gpfsssd/scratch/rech/igf/commun/bruno/EXP-2021-01-19-T21-45-49/group0/rank0/DATA.
  case1.xml will be used for the computation.
  Be aware that to follow best practices only one of the two should be present.


 solver script exited with status 2.

Error running the calculation.

Check code_saturne log (listing) and error* files for details.

 Error in calculation stage.

(tensorflow-gpu-2.3.1+hvd-0.21.0) [upt67eh@jean-zay1: SCRIPTS]$ more saturne.1229707.log
sys.path = [
    '/gpfsssd/scratch/rech/igf/commun/bruno/EXP-2021-01-19-T21-45-49/group0/rank0/SCRIPTS',
    '/linkhome/rech/genini01/upt67eh/.local',
    '/gpfswork/rech/igf/commun/bruno/deepmelissa',
    '/gpfswork/rech/igf/commun/bruno/melissa/utility/melissa4py',
    '/linkhome/rech/genini01/upt67eh/work/bruno/melissa/install/share/melissa',
    '/linkhome/rech/genini01/upt67eh/work/bruno/melissa',
    '/linkhome/rech/genini01/upt67eh/work/bruno/melissa/install/share/melissa/launcher',
    '/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python37.zip',
    '/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7',
    '/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7/lib-dynload',
    '/linkhome/rech/genini01/upt67eh/.local/lib/python3.7/site-packages',
    '/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7/site-packages',
    '/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7/site-packages/cdat_info-8.2.1-py3.6.egg',
    '/gpfslocalsup/pub/anaconda-py3/2020.02/envs/tensorflow-gpu-2.3.1+hvd-0.21.0/lib/python3.7/site-packages/horovod-0.21.0-py3.7-linux-x86_64.egg',
    '/gpfs7kro/gpfslocalsup/src/pub/anaconda-py3/2020.02/tensorflow-2.3.1+horovod-0.21.0/horovod-0.21.0/.eggs/psutil-5.7.3-py3.7-linux-x86_64.egg',
    '/gpfs7kro/gpfslocalsup/src/pub/anaconda-py3/2020.02/tensorflow-2.3.1+horovod-0.21.0/horovod-0.21.0/.eggs/cloudpickle-1.6.0-py3.7.egg',
]
USER_BASE: '/linkhome/rech/genini01/upt67eh/.local' (exists)
USER_SITE: '/linkhome/rech/genini01/upt67eh/.local/lib/python3.7/site-packages' (exists)
ENABLE_USER_SITE: True
19/01/21 21:47:32

		      code_saturne
		      ************

 Version:   6.3.0
 Path:      /gpfswork/rech/igf/commun/Code_Saturne/6.3.0

 Result directory:
   /gpfsssd/scratch/rech/igf/commun/bruno/EXP-2021-01-19-T21-45-49/group0/rank0/RESU/20210119-2147_1


 ****************************************
  Compiling user subroutines and linking
 ****************************************


 ****************************
  Preparing calculation data
 ****************************

 Parallel code_saturne on 2 processes.


 ***************************
  Preprocessing calculation
 ***************************


 **********************
  Starting calculation
 **********************

This version of Spack (openmpi ~legacylaunchers schedulers=slurm)
is installed without the mpiexec/mpirun commands to prevent
unintended performance issues. See https://github.com/spack/spack/pull/10340
for more details.
If you understand the potential consequences of a misconfigured mpirun, you can
use spack to install 'openmpi+legacylaunchers' to restore the executables.
Otherwise, use srun to launch your MPI executables.

 *****************************
  Post-calculation operations
 *****************************

19/01/21 21:47:34
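The log does contain one concrete diagnostic: the Spack OpenMPI notice printed at the "Starting calculation" stage, which says that mpiexec/mpirun are not installed and that srun should be used instead. Whether this is what makes the solver script exit with status 2 is not established in this issue. As a minimal sketch for spotting that notice across several run directories (the helper names are hypothetical; the marker string is copied from the log above):

```python
# Marker copied from the Spack OpenMPI notice in saturne.1229707.log.
MARKER = "is installed without the mpiexec/mpirun commands"


def has_spack_mpirun_notice(log_text):
    """Return True if the log text contains the Spack 'no mpirun' notice."""
    return MARKER in log_text


def logs_with_notice(paths):
    """Return the subset of log file paths that contain the notice."""
    hits = []
    for path in paths:
        with open(path, errors="replace") as handle:
            if has_spack_mpirun_notice(handle.read()):
                hits.append(path)
    return hits
```

For example, `logs_with_notice(glob.glob("group*/rank*/SCRIPTS/saturne.*.log"))` would list the affected runs, assuming the directory layout shown in the shell session above.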
Edited May 07, 2021 by Christoph Conrads