Add support for registering MPI data types

Merged THIBAULT Samuel requested to merge thibault/chameleon:mpi_register into master

Fixes #97 (closed)

  • added 1 commit

    • e86f4c94 - StarPU: Add support for registering MPI data types


  • Mathieu Faverge resolved all threads


  • added 1 commit


  • @pswartva Have you tested this version in a distributed setting? I'm not sure the CHAM_tile_t structures are correctly initialized on the remote side. If it works, we merge.

  • @pswartva I'm still waiting for your answer before merging this PR :)

  • So...

    I didn't check this PR sooner because I was running into problems that I thought were related to NMAD. On two henri nodes, with InfiniBand:

    % mpirun -n 2 -nodelist henri0,henri1 -DSTARPU_RESERVE_NCPU=2 -DSTARPU_FXT_TRACE=0 ~/chameleon/build/testing/chameleon_stesting -o potrf --n 4800:50000:6400 -H --niter 3 --mtxfmt=1
    # nm_strat_prio: init- max = 2
    # nm_strat_prio: init- max = 2
    [starpu][starpu_initialize] Warning: StarPU was configured with --enable-debug (-O0), and is thus not optimized
    [starpu][starpu_initialize] Warning: StarPU was configured with --enable-spinlock-check, which slows down a bit
    [starpu][starpu_initialize] Warning: StarPU was configured with --with-fxt, which slows down a bit, limits scalability and makes worker initialization sequential
    [starpu][starpu_initialize] Warning: StarPU was configured with --enable-debug (-O0), and is thus not optimized
    [starpu][starpu_initialize] Warning: StarPU was configured with --enable-spinlock-check, which slows down a bit
    [starpu][starpu_initialize] Warning: StarPU was configured with --with-fxt, which slows down a bit, limits scalability and makes worker initialization sequential
     Id Function     threads gpus  P  Q mtxfmt  nb uplo        n   lda       seedA          time        gflops
    /home/pswartva/chameleon/build/testing/chameleon_stesting(+0x630fc)[0x55f696b4f0fc]
    /home/pswartva/starpu-build/lib/libstarpu-1.3.so.0(starpu_data_unpack+0x150)[0x7f76ee5d3bf0]
    /home/pswartva/starpu-build/lib/libstarpumpi-1.3.so.0(_starpu_mpi_handle_request_termination+0x89)[0x7f76ee861039]
    /home/pswartva/starpu-build/lib/libstarpumpi-1.3.so.0(_starpu_mpi_nmad_end_coop_callback+0x34)[0x7f76ee859c68]
    /home/pswartva/pm2/soft/x86_64/lib/libnmad.so(normal_receive_handler+0x173)[0x7f76ee4454a4]
    /home/pswartva/pm2/soft/x86_64/lib/libnmad.so(+0x8423d)[0x7f76ee41323d]
    /home/pswartva/pm2/soft/x86_64/lib/libnmad.so(+0x24b73)[0x7f76ee3b3b73]
    /home/pswartva/pm2/soft/x86_64/lib/libnmad.so(+0x2b412)[0x7f76ee3ba412]
    /home/pswartva/pm2/soft/x86_64/lib/libpioman.so(+0x107d5)[0x7f76ee3817d5]
    /home/pswartva/pm2/soft/x86_64/lib/libpioman.so(piom_ltask_schedule+0x127)[0x7f76ee382503]
    /home/pswartva/pm2/soft/x86_64/lib/libpioman.so(+0xd9a1)[0x7f76ee37e9a1]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x8fb7)[0x7f76e7d14fb7]
    /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f76e780f1af]
    chameleon_stesting: /home/pswartva/chameleon/runtime/starpu/interface/cham_tile_interface.c:335: cti_unpack_data: Assertion `cham_tile_interface->tile.m == dsttile.m' failed.

    Indeed, according to GDB:

    #4  0x00005555555b7145 in cti_unpack_data (handle=0x555555aca8a0, node=0, ptr=0x7ffe901b2890, count=409632) at /home/pswartva/chameleon/runtime/starpu/interface/cham_tile_interface.c:335
    335         STARPU_ASSERT( cham_tile_interface->tile.m == dsttile.m );
    (gdb) p *cham_tile_interface
    $1 = {id = STARPU_MAX_INTERFACE_ID, dev_handle = 140732193460240, flttype = ChamRealFloat, allocsize = 0, tilesize = 409600, tile = {format = 0 '\000', m = 320, n = 320, ld = 320, mat = 0x7ffec4665010}}
    (gdb) p dsttile
    $2 = {format = 0 '\000', m = 0, n = 0, ld = 0, mat = 0x0}

    But this problem disappeared when using TCP instead of InfiniBand, hence my suspicion towards NMAD...

    Anyway, I tried this PR and it works. On top of that, the problem I described above no longer appears, even with InfiniBand!

    So I don't know whether this PR kills two birds with one stone, or whether it is a hard-to-trigger race condition or something else. But this PR seems to work on 2 nodes with NMAD.

  • This PR indeed makes Chameleon avoid the pack/unpack path completely, so it never even reaches the assertion you mentioned. It's not surprising that it "fixes" it :)

    Note that this PR will avoid the extra pack/unpack copy, so it will most probably improve your benchmarks.

    That being said, things should normally work without this PR too, so it would still be useful to open an issue saying "when I comment out starpu_mpi_interface_datatype_register to force pack/unpack, I'm getting this issue". The fact that the received content of dsttile is completely zero with InfiniBand but not with TCP is indeed very puzzling.
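
To make the failure mode discussed above concrete, here is a minimal, self-contained sketch of a pack/unpack path with the same kind of consistency check. It is a simplified model, not the code in cham_tile_interface.c: the tile_t struct and the tile_pack / tile_unpack helpers are hypothetical names, and the buffer layout (a copy of the descriptor followed by the column-major payload) is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tile descriptor. The field names follow the
 * GDB output quoted in this thread, but this is NOT Chameleon's CHAM_tile_t. */
typedef struct {
    int    format;
    int    m, n, ld;   /* rows, columns, leading dimension */
    float *mat;        /* column-major storage */
} tile_t;

/* Pack: copy the descriptor, then the m*n entries column by column, into one
 * contiguous buffer (assumed layout). Returns NULL if allocation fails. */
void *tile_pack(const tile_t *src, size_t *count)
{
    assert(src->ld >= src->m);
    size_t col_bytes = (size_t)src->m * sizeof(float);
    *count = sizeof(tile_t) + (size_t)src->n * col_bytes;

    char *buf = malloc(*count);
    if (buf == NULL)
        return NULL;

    memcpy(buf, src, sizeof(tile_t));
    for (int j = 0; j < src->n; j++)
        memcpy(buf + sizeof(tile_t) + (size_t)j * col_bytes,
               src->mat + (size_t)j * src->ld, col_bytes);
    return buf;
}

/* Unpack: read the received descriptor and compare it with the local tile
 * before copying the payload. Returns 0 on success and -1 on a mismatch
 * (the real code in the backtrace asserts instead of returning an error). */
int tile_unpack(tile_t *dst, const void *buf, size_t count)
{
    const char *p = buf;
    tile_t hdr;

    if (count < sizeof(tile_t))
        return -1;
    memcpy(&hdr, p, sizeof(tile_t));

    /* A zeroed header (m = 0, n = 0) fails here, as in the GDB session. */
    if (hdr.m != dst->m || hdr.n != dst->n)
        return -1;

    size_t col_bytes = (size_t)hdr.m * sizeof(float);
    if (count != sizeof(tile_t) + (size_t)hdr.n * col_bytes)
        return -1;

    for (int j = 0; j < hdr.n; j++)
        memcpy(dst->mat + (size_t)j * dst->ld,
               p + sizeof(tile_t) + (size_t)j * col_bytes, col_bytes);
    return 0;
}
```

In this model, zeroing the first sizeof(tile_t) bytes of a packed buffer before calling tile_unpack reproduces the m = 0 versus m = 320 mismatch seen in GDB. Per the comment above, registering an MPI datatype for the interface skips this pack/unpack path entirely, which is why the check is never reached with this PR.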
