Weird binding restriction on henri nodes with IB
Start from the following code:
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    system("hwloc-bind --get");   /* binding before MPI_Init() */
    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);
    system("hwloc-bind --get");   /* binding after init + barrier */
    MPI_Finalize();
    return EXIT_SUCCESS;
}
Build it:
mpicc barrier.c
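For reference, the same check can also be done from inside the program through the hwloc C API instead of forking hwloc-bind; here is a minimal standalone sketch (compile with -lhwloc; the mask printed by hwloc_bitmap_asprintf may be formatted slightly differently from hwloc-bind's output):

#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    char *str = NULL;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Query and print the CPU binding of the current process. */
    hwloc_get_cpubind(topo, set, HWLOC_CPUBIND_PROCESS);
    hwloc_bitmap_asprintf(&str, set);
    printf("%s\n", str);

    free(str);
    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    return EXIT_SUCCESS;
}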
If you launch without any specific options, the second call to hwloc-bind --get
reports that the process is bound to a single NUMA node, not to the whole machine
(which is what the first call reports). However, if you specify -DNMAD_DRIVER=tcp,
there is no such binding restriction.
% mpirun -n 2 -nodelist henri0,henri1 ./a.out
0x0000000f,0xffffffff
0x0000000f,0xffffffff
0x00000001,0x11111111
0x00000001,0x11111111
Connection to henri1 closed.
% mpirun -n 2 -DNMAD_DRIVER=tcp -nodelist henri0,henri1 ./a.out
0x0000000f,0xffffffff
0x0000000f,0xffffffff
0x0000000f,0xffffffff
0x0000000f,0xffffffff
The problem does not appear on billy0 and billy1 (regardless of whether the job is launched from a billy or a henri node).
I don't know where the difference comes from; I did notice that the libibverbs version is not the same on the billy and henri nodes.
I managed to dig down and found that the MPI process gets bound to the NUMA node during the call to padico_group_barrier()
in this line.
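As a possible workaround until the cause is found (an untested sketch on my side, assuming that restoring the binding after MPI_Init does not conflict with what the IB driver expects), one could save the binding before MPI_Init and restore it afterwards with hwloc:

#include <mpi.h>
#include <hwloc.h>

int main(int argc, char* argv[])
{
    hwloc_topology_t topo;
    hwloc_bitmap_t saved = hwloc_bitmap_alloc();

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Remember the binding we had before MPI_Init() touches it. */
    hwloc_get_cpubind(topo, saved, HWLOC_CPUBIND_PROCESS);

    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);

    /* Restore the original binding after initialization. */
    hwloc_set_cpubind(topo, saved, HWLOC_CPUBIND_PROCESS);

    MPI_Finalize();
    hwloc_bitmap_free(saved);
    hwloc_topology_destroy(topo);
    return 0;
}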