Invalid communicator when running time_dpotrf_tile
aprun -N 1 -cc none -n 64 /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/build_cray_mpich/timing/time_dpotrf_tile -n 262144 --nb=256 --ib=256 -P 8
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
[starpu][check_bus_config_file] ... done
Rank 1 [Sat Dec 15 01:03:16 2018] [c3-2c2s11n0] Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb1cc11a0, rank=0x2aaaca410aa8) failed
PMPI_Comm_rank(67).: Invalid communicator
Rank 0 [Sat Dec 15 01:03:16 2018] [c3-2c2s10n2] Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb1cc11a0, rank=0x2aaada20faa8) failed
PMPI_Comm_rank(67).: Invalid communicator
Rank 2 [Sat Dec 15 01:03:16 2018] [c3-2c2s11n3] Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb1cc11a0, rank=0x2aaad6410aa8) failed
PMPI_Comm_rank(67).: Invalid communicator
Rank 3 [Sat Dec 15 01:03:16 2018] [c3-2c2s12n0] Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb1cc11a0, rank=0x2aaad6410aa8) failed
PMPI_Comm_rank(67).: Invalid communicator
_pmiu_daemon(SIGCHLD): [NID 05354] [c3-2c2s10n2] [Sat Dec 15 01:03:16 2018] PE RANK 0 exit signal Aborted
[NID 05354] 2018-12-15 01:03:16 Apid 2287814: initiated application termination
_pmiu_daemon(SIGCHLD): [NID 05360] [c3-2c2s12n0] [Sat Dec 15 01:03:16 2018] PE RANK 3 exit signal Aborted
_pmiu_daemon(SIGCHLD): [NID 05359] [c3-2c2s11n3] [Sat Dec 15 01:03:16 2018] PE RANK 2 exit signal Aborted
_pmiu_daemon(SIGCHLD): [NID 05356] [c3-2c2s11n0] [Sat Dec 15 01:03:16 2018] PE RANK 1 exit signal Aborted
Stack trace:
#4 _starpu_mpi_progress_thread_func (arg=0x2aaab0f9e1c0) at /zhome/academic/HLRS/hlrs/hpcjschu/src/starpu/starpu-1.2.6/build/mpi/src/../../../mpi/src/starpu_mpi.c:1371 (at 0x00002aaaaacda59c)
#3 PMPI_Comm_rank () from /opt/cray/pe/lib64/libmpich_intel.so.3 (at 0x00002aaab0af4d62)
#2 MPIR_Err_return_comm () from /opt/cray/pe/lib64/libmpich_intel.so.3 (at 0x00002aaab0b94b6e)
#1 MPIR_Handle_fatal_error () from /opt/cray/pe/lib64/libmpich_intel.so.3 (at 0x00002aaab0b94a32)
#0 MPID_Abort () from /opt/cray/pe/lib64/libmpich_intel.so.3 (at 0x00002aaab0c0b700)
DDT reports argc_argv->comm to be 0x0, passed on line 1371 in starpu_mpi.c:
MPI_Comm_rank(argc_argv->comm, &rank);
The struct looks like this:
*(((struct _starpu_mpi_argc_argv *)(arg))):
{initialize_mpi = 1140850688, argc = 0x4000000000, argv = 0x0, comm = 0x0, fargc = 64, fargv = 0x0}
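The field values in this dump look implausible for their names, so it may help to view them in hexadecimal. The following is only a minimal sketch for decoding the numbers copied from the DDT output above. It assumes that MPICH-derived headers define MPI_COMM_WORLD as the integer handle 0x44000000. That constant and the helper name describe are not taken from the logs. If the assumption holds, the value shown for initialize_mpi matches the communicator handle, which might suggest that the struct is being read with a different layout than it was written with (for example, because of mismatched MPI headers). This is a guess, not a confirmed diagnosis.

```python
# Values copied verbatim from the DDT dump of struct _starpu_mpi_argc_argv.
ddt_fields = {
    "initialize_mpi": 1140850688,
    "argc": 0x4000000000,
    "comm": 0x0,
    "fargc": 64,
}

# Assumption (not from this report): MPICH-derived mpi.h headers define
# MPI_COMM_WORLD as the integer handle 0x44000000.
MPICH_COMM_WORLD = 0x44000000


def describe(fields):
    """Return one line per field, in hex, flagging the assumed handle value."""
    lines = []
    for name, value in fields.items():
        note = ""
        if value == MPICH_COMM_WORLD:
            note = " <- equals assumed MPICH MPI_COMM_WORLD"
        lines.append(f"{name:>15} = {value:#x}{note}")
    return lines


for line in describe(ddt_fields):
    print(line)

# The upper 32 bits of the "argc" pointer value hold a small integer,
# which is unusual for a real pointer.
print("argc high word:", ddt_fields["argc"] >> 32)
```

If the first field really holds the communicator handle, comparing the mpi.h used to build StarPU with the one used to build Chameleon might be a useful next step.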
CMake output:
$ PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HOME/opt-cray/starpu-1.2.6/lib/pkgconfig/ cmake .. -DCHAMELEON_SCHED_STARPU=$HOME/opt-cray/starpu-1.2.6/ -DBLAS_DIR=$MKLROOT -DCHAMELEON_USE_MPI=ON -DCMAKE_EXE_LINKER_FLAGS=-nofor_main -DCMAKE_BUILD_TYPE=Debug
-- Cray Programming Environment 2.5.15 Fortran
-- Cray Programming Environment 2.5.15 C
-- Cray Programming Environment 2.5.15 CXX
-- CHAMELEON_SCHED_STARPU is set to ON: CHAMELEON uses StarPU runtime
To use CHAMELEON with Quark runtime: set CHAMELEON_SCHED_QUARK to ON
To use CHAMELEON with PaRSEC runtime: set CHAMELEON_SCHED_PARSEC to ON
(CHAMELEON_SCHED_STARPU will be disabled)
-- CHAMELEON_USE_CUDA is set to OFF, turn it ON to use CUDA (unsupported by Quark)
-- CHAMELEON_ENABLE_TRACING is set to OFF, turn it ON to use FxT (with StarPU)
-- CHAMELEON_ENABLE_EXAMPLE is set to ON, turn it OFF to avoid building examples
-- CHAMELEON_ENABLE_TESTING is set to ON, turn it OFF to avoid building testing
-- CHAMELEON_ENABLE_TIMING is set to ON, turn it OFF to avoid building timing
-- CHAMELEON_SIMULATION is set to OFF, turn it ON to use SIMULATION mode (only with StarPU compiled with SimGrid)
-- CHAMELEON_ENABLE_PRUNING_STATS is set to OFF, turn it ON to build pruning statistics
-- A cache variable, namely CBLAS_DIR, has been set to specify the install directory of CBLAS
-- A cache variable, namely BLAS_DIR, has been set to specify the install directory of BLAS
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Looking for MKL BLAS: found
-- A library with BLAS API found.
-- BLAS_LIBRARIES /sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_intel_lp64.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_sequential.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_core.so;/usr/lib64/libm.so
-- Looking for cblas_dscal
-- Looking for cblas_dscal - found
-- Looking for cblas: test with blas succeeds
-- Looking for cblas_dscal
-- Looking for cblas_dscal - found
-- A cache variable, namely LAPACKE_DIR, has been set to specify the install directory of LAPACKE
-- A cache variable, namely TMG_DIR, has been set to specify the install directory of TMG
-- A cache variable, namely LAPACK_DIR, has been set to specify the install directory of LAPACK
-- Looking for MKL BLAS: found
-- A library with BLAS API found.
-- BLAS_LIBRARIES /sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_intel_lp64.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_sequential.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_core.so;/usr/lib64/libm.so
-- Looking for Fortran CHEEV
-- Looking for Fortran CHEEV - found
-- Looking for LAPACK in BLAS: found
-- A library with LAPACK API found.
-- LAPACK_LIBRARIES /sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_intel_lp64.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_sequential.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_core.so;/usr/lib64/libm.so
-- Looking for Fortran dlarnv
-- Looking for Fortran dlarnv - found
-- Looking for Fortran dlagsy
-- Looking for Fortran dlagsy - found
-- Looking for tmg: test with lapack succeeds
-- Looking for Fortran dlarnv
-- Looking for Fortran dlarnv - found
-- Looking for Fortran dlagsy
-- Looking for Fortran dlagsy - found
-- Looking for MKL BLAS: found
-- A library with BLAS API found.
-- BLAS_LIBRARIES /sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_intel_lp64.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_sequential.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_core.so;/usr/lib64/libm.so
-- Looking for Fortran CHEEV
-- Looking for Fortran CHEEV - found
-- Looking for LAPACK in BLAS: found
-- A library with LAPACK API found.
-- LAPACK_LIBRARIES /sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_intel_lp64.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_sequential.so;/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_core.so;/usr/lib64/libm.so
-- Looking for LAPACKE_dgeqrf
-- Looking for LAPACKE_dgeqrf - found
-- Looking for LAPACKE_dlascl_work
-- Looking for LAPACKE_dlascl_work - found
-- Looking for lapacke: test with lapack succeeds
-- Looking for LAPACKE_dgeqrf
-- Looking for LAPACKE_dgeqrf - found
-- Looking for LAPACKE_dlatms_work
-- Looking for LAPACKE_dlatms_work - found
-- Looking for LAPACKE_dlascl_work
-- Looking for LAPACKE_dlascl_work - found
-- Add definition CHAMELEON_USE_MPI - Activate MPI in Chameleon
-- Looking for HWLOC - found using PkgConfig
-- Looking for hwloc_topology_init
-- Looking for hwloc_topology_init - found
-- Checking for one of the modules 'starpumpi-1.3'
-- Checking for one of the modules 'starpumpi-1.2'
-- Looking for STARPU - found using PkgConfig
-- Looking for starpu_init
-- Looking for starpu_init - found
-- Add definition CHAMELEON_SCHED_STARPU - Activate StarPU in Chameleon
-- Add definition HAVE_STARPU_IDLE_PREFETCH
-- Add definition HAVE_STARPU_ITERATION_PUSH
-- Add definition HAVE_STARPU_DATA_WONT_USE
-- Add definition HAVE_STARPU_DATA_SET_COORDINATES
-- Add definition HAVE_STARPU_MALLOC_ON_NODE_SET_DEFAULT_FLAGS
-- CHAMELEON_USE_MIGRATE is turned OFF because starpu_mpi_data_migrate not found
-- Add definition HAVE_STARPU_MPI_DATA_REGISTER - Activate use of starpu_mpi_data_register() in Chameleon with StarPU
-- Add definition HAVE_STARPU_MPI_COMM_RANK - Activate use of starpu_mpi_comm_rank() in Chameleon with StarPU
-- Add definition HAVE_STARPU_MPI_CACHED_RECEIVE
-- Add definition HAVE_STARPU_MPI_COMM_GET_ATTR
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/coreblas/include
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/coreblas/include - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/coreblas/compute
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/coreblas/compute - Done
-- A cache variable, namely EZTRACE_DIR, has been set to specify the install directory of EZTRACE
-- Checking for one of the modules 'eztrace'
-- Looking for EZTRACE - not found using PkgConfig.
Perhaps you should add the directory containing eztrace.pc to
the PKG_CONFIG_PATH environment variable.
-- Looking for EZTRACE - PkgConfig not used
-- Looking for eztrace -- eztrace.h not found
-- Looking for eztrace -- lib eztrace not found
-- Could NOT find EZTRACE (missing: EZTRACE_LIBRARIES EZTRACE_WORKS)
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/include
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/include - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/control
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/control - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/compute
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/compute - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/compute
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/compute - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/runtime/starpu
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/runtime/starpu - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/runtime/starpu
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/runtime/starpu - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/runtime/starpu
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/runtime/starpu - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/testing - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/timing
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/timing - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/timing
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/timing - Done
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/timing
-- Generate precision dependencies in /zhome/academic/HLRS/hlrs/hpcjschu/src/chameleon/chameleon_git/timing - Done
Configuration of Chameleon:
BUILDNAME ...........: Linux-amd64-cc-Debug-StarPU-MPI
SITE ................: eslogin005
Compiler: C .........: /opt/cray/pe/craype/2.5.15/bin/cc (Intel)
Compiler: Fortran ...: /opt/cray/pe/craype/2.5.15/bin/ftn (Intel)
Compiler: MPI .......: /opt/cray/pe/craype/2.5.15/bin/cc
compiler flags ......:
Linker: .............: /usr/bin/ld
Build type ..........: Debug
Build shared ........: OFF
CFlags ..............:
LDFlags .............:
EXE LDFlags .........: -nofor_main
Implementation paradigm
CUDA ................: OFF
MPI .................: ON
Runtime specific
PARSEC ..............: OFF
QUARK ...............: OFF
STARPU ..............: /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/starpu-1.2.6/
Kernels specific
BLAS ................: Intel MKL
LAPACK...............: Intel MKL
Trace ...............: OFF
Simulation mode .....: OFF
Binaries to build
documentation ........: OFF
example ..............: ON
testing ..............: ON
timing ...............: ON
CHAMELEON dependencies :
chameleon
coreblas
chameleon_starpu
hqr
/zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/starpu-1.2.6/lib/libstarpumpi-1.2.so
/zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/starpu-1.2.6/lib/libstarpu-1.2.so
/usr/lib64/libhwloc.so
/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_intel_lp64.so
/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_sequential.so
/sw/hazelhen-cle6/hlrs/compiler/intel/Compiler/18.0.1.163/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_core.so
/usr/lib64/libm.so
/usr/lib64/librt.so
INSTALL_PREFIX ......: /usr/local
This build uses Cray MPICH with the Intel 18.0.1 compiler. I did not see this error when using Open MPI on the same machine. The same error also occurs with fewer nodes.
Please let me know if I can provide any additional details.