
Fix issues with the current spm to make it work in the distributed case

Merged Mathieu Faverge requested to merge faverge/spm:disp/spm_bases into master

This MR fixes minor issues that did not impact the library in the shared-memory case, but that would have a large impact in the distributed case.

  • Add a spmInitDist() function to initialize an spm with a specific communicator (MPI_COMM_WORLD by default); see the sketch after this list
  • Add a function to compute the glob2loc array when needed.
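
For readers unfamiliar with the API, here is a minimal, hypothetical usage sketch of the new entry point. The exact prototype of spmInitDist() is an assumption (the list above only states its purpose); spmInit(), spmExit(), and the spmatrix_t type are assumed to come from the existing SpM API.

    /* Hypothetical sketch: initialize an spm on a user-chosen communicator.
     * The spmInitDist() prototype below is assumed, not taken from the MR. */
    #include <mpi.h>
    #include <spm.h>

    int main( int argc, char **argv )
    {
        spmatrix_t spm;

        MPI_Init( &argc, &argv );

        /* Assumed to behave like spmInit(), but recording the given
         * communicator instead of defaulting to MPI_COMM_WORLD. */
        spmInitDist( &spm, MPI_COMM_WORLD );

        /* ... build or load the distributed spm here ... */

        spmExit( &spm );
        MPI_Finalize();
        return 0;
    }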
Edited by Mathieu Faverge

Merge request reports


Activity

  • added MPI label

  • added 1 commit

    • 8abc4162 - Fix issue with the actual spm in order to make it works in distributed


  • Mathieu Faverge changed the description

  • @all Good for review

  • added 1 commit

    • d77b8f28 - Fix issue with the actual spm in order to make it works in distributed


  • Cool, it's pretty clean like that!

  • Mathieu Faverge added 4 commits

    • cd91ffce - Update fortran wrapper to handle mpi
    • 17b76c58 - Facto spmInit
    • 0f9a1229 - Update wrapper generator
    • a3431358 - Attempt to make the python interface to work with MPI


  • Mathieu Faverge added 2 commits

  • added 1 commit

    • 95a7adae - Fix issue with the actual spm in order to make it works in distributed


  • added 1 commit

    • ecc7419a - Fix issue with the actual spm in order to make it works in distributed


  • added 1 commit

    • d2ce1602 - Fix issue with the actual spm in order to make it works in distributed


  • added 1 commit

    • e1907231 - Fix issue with the actual spm in order to make it works in distributed


  • Now it is really ready for review :D

  • I got the following error at the end of the MPI tests:

    /usr/local/bin/mpiexec --host localhost:4 "-np" "4" "./spm_convert_tests" "--lap" "p:10:10:10:10.:2." -v2
    
    ...
    SUCCESS
    ...
       -- Check the spm after cycle : SUCCESS
    
    [tthor.local:39306] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 206
    [tthor.local:39305] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 206
     -- All tests PASSED --
    mpiexec(39303,0x70000ba06000) malloc: can't allocate region
    :*** mach_vm_map(size=1125899906846720, flags: 60000100) failed (error code=3)
    mpiexec(39303,0x70000ba06000) malloc: *** set a breakpoint in malloc_error_break to debug
    [tthor:39303] *** Process received signal ***
    [tthor:39303] Signal: Segmentation fault: 11 (11)
    [tthor:39303] Signal code: Address not mapped (1)
    [tthor:39303] Failing at address: 0x0
    [tthor:39303] [ 0] 0   libsystem_platform.dylib            0x00007fff6af405fd _sigtramp + 29
    [tthor:39303] [ 1] 0   ???                                 0x000070000ba05790 0x0 + 123145497368464
    [tthor:39303] [ 2] 0   mca_rml_oob.so                      0x0000000109513db0 orte_rml_oob_send_buffer_nb + 942
    [tthor:39303] [ 3] 0   libopen-rte.40.dylib                0x00000001087739c7 pmix_server_log_fn + 308
    [tthor:39303] [ 4] 0   mca_pmix_pmix3x.so                  0x00000001092ba543 server_log + 850
    [tthor:39303] [ 5] 0   mca_plog_default.so                 0x00000001094c8846 mylog + 503
    [tthor:39303] [ 6] 0   mca_pmix_pmix3x.so                  0x000000010933707f pmix_plog_base_log + 1015
    [tthor:39303] [ 7] 0   mca_pmix_pmix3x.so                  0x0000000109301fda pmix_server_log + 2059
    [tthor:39303] [ 8] 0   mca_pmix_pmix3x.so                  0x00000001092e8ca3 pmix_server_message_handler + 5279
    [tthor:39303] [ 9] 0   mca_pmix_pmix3x.so                  0x000000010933f6d3 OPAL_MCA_PMIX3X_pmix_ptl_base_process_msg + 735
    [tthor:39303] [10] 0   libevent-2.1.7.dylib                0x00000001088c933a event_process_active_single_queue + 635
    [tthor:39303] [11] 0   libevent-2.1.7.dylib                0x00000001088c6712 event_base_loop + 1012
    [tthor:39303] [12] 0   mca_pmix_pmix3x.so                  0x000000010930d320 progress_engine + 26
    [tthor:39303] [13] 0   libsystem_pthread.dylib             0x00007fff6af4c109 _pthread_start + 148
    [tthor:39303] [14] 0   libsystem_pthread.dylib             0x00007fff6af47b8b thread_start + 15
    [tthor:39303] *** End of error message ***
    Segmentation fault: 11
  • This is not an MPI test, and if you do that, all the processes will write to the same file, so I'm not very surprised.

    Is this command part of the tests? If so, it should be removed.

  • added 1 commit

    • c713b860 - Fix issue with the actual spm in order to make it works in distributed


  • Strange name: shm_shm?

    ctest -R shm_shm_python_spm_driver -V
    UpdateCTestConfiguration  from :/Users/ramet/Work/pastix/gitlab/spm/build/DartConfiguration.tcl
    Parse Config file:/Users/ramet/Work/pastix/gitlab/spm/build/DartConfiguration.tcl
    UpdateCTestConfiguration  from :/Users/ramet/Work/pastix/gitlab/spm/build/DartConfiguration.tcl
    Parse Config file:/Users/ramet/Work/pastix/gitlab/spm/build/DartConfiguration.tcl
    Test project /Users/ramet/Work/pastix/gitlab/spm/build
    Constructing a list of tests
    Done constructing a list of tests
    Updating test list for fixtures
    Added 0 tests to meet fixture requirements
    Checking test dependency graph...
    Checking test dependency graph end
    test 167
        Start 167: shm_shm_python_spm_driver
    
    167: Test command: /usr/local/Frameworks/Python.framework/Versions/3.7/bin/python3.7 "/Users/ramet/Work/pastix/gitlab/spm/build/wrappers/python/examples/spm_driver.py"
    167: Environment variables:
    167:  PYTHONPATH=/Users/ramet/Work/pastix/gitlab/spm/build/wrappers/python
    167: Test timeout computed to be: 1500
    167: Traceback (most recent call last):
    167:   File "/Users/ramet/Work/pastix/gitlab/spm/build/wrappers/python/examples/spm_driver.py", line 21, in <module>
    167:     tmp = np.eye(2).dot(np.ones(2))
    167: AttributeError: module 'numpy' has no attribute 'eye'
    167: [tthor.local:40006] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 206
    1/1 Test #167: shm_shm_python_spm_driver ........***Failed    1.29 sec
    
    0% tests passed, 1 tests failed out of 1
    
    Label Time Summary:
    spm    =   1.29 sec*proc (1 test)
    
    Total Test time (real) =   1.35 sec
    
    The following tests FAILED:
    	167 - shm_shm_python_spm_driver (Failed)
    Errors while running CTest

    I may have trouble with my numpy library...

  • added 1 commit

    • b3f2135a - Fix issue with the actual spm in order to make it works in distributed


  • added 1 commit

    • ed5dedd4 - Fix issue with the actual spm in order to make it works in distributed


  • With a clean install of the Python libs, the numpy error is resolved.

  • 100% tests passed, 0 tests failed out of 89
  • Mathieu Faverge mentioned in commit 8fdc0e5d
