Fix issues with the current spm to make it work in the distributed case
This PR fixes minor issues that did not affect the library in the shared-memory case but would have a large impact in the distributed case.
- Add an spmInitDist() function to initialize an spm with a specific communicator (MPI_COMM_WORLD by default)
- Add a function to compute the glob2loc array when needed.
Activity
added MPI label
added 1 commit
- 8abc4162 - Fix issue with the actual spm in order to make it works in distributed
@all Good for review
added 1 commit
- d77b8f28 - Fix issue with the actual spm in order to make it works in distributed
added 2 commits
added 1 commit
- 95a7adae - Fix issue with the actual spm in order to make it works in distributed
added 1 commit
- ecc7419a - Fix issue with the actual spm in order to make it works in distributed
added 1 commit
- d2ce1602 - Fix issue with the actual spm in order to make it works in distributed
added 1 commit
- e1907231 - Fix issue with the actual spm in order to make it works in distributed
I got the following error at the end of the MPI tests:
/usr/local/bin/mpiexec --host localhost:4 "-np" "4" "./spm_convert_tests" "--lap" "p:10:10:10:10.:2." -v2
... SUCCESS ...
-- Check the spm after cycle : SUCCESS
[tthor.local:39306] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 206
[tthor.local:39305] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 206
-- All tests PASSED --
mpiexec(39303,0x70000ba06000) malloc: can't allocate region
:*** mach_vm_map(size=1125899906846720, flags: 60000100) failed (error code=3)
mpiexec(39303,0x70000ba06000) malloc: *** set a breakpoint in malloc_error_break to debug
[tthor:39303] *** Process received signal ***
[tthor:39303] Signal: Segmentation fault: 11 (11)
[tthor:39303] Signal code: Address not mapped (1)
[tthor:39303] Failing at address: 0x0
[tthor:39303] [ 0] 0 libsystem_platform.dylib 0x00007fff6af405fd _sigtramp + 29
[tthor:39303] [ 1] 0 ??? 0x000070000ba05790 0x0 + 123145497368464
[tthor:39303] [ 2] 0 mca_rml_oob.so 0x0000000109513db0 orte_rml_oob_send_buffer_nb + 942
[tthor:39303] [ 3] 0 libopen-rte.40.dylib 0x00000001087739c7 pmix_server_log_fn + 308
[tthor:39303] [ 4] 0 mca_pmix_pmix3x.so 0x00000001092ba543 server_log + 850
[tthor:39303] [ 5] 0 mca_plog_default.so 0x00000001094c8846 mylog + 503
[tthor:39303] [ 6] 0 mca_pmix_pmix3x.so 0x000000010933707f pmix_plog_base_log + 1015
[tthor:39303] [ 7] 0 mca_pmix_pmix3x.so 0x0000000109301fda pmix_server_log + 2059
[tthor:39303] [ 8] 0 mca_pmix_pmix3x.so 0x00000001092e8ca3 pmix_server_message_handler + 5279
[tthor:39303] [ 9] 0 mca_pmix_pmix3x.so 0x000000010933f6d3 OPAL_MCA_PMIX3X_pmix_ptl_base_process_msg + 735
[tthor:39303] [10] 0 libevent-2.1.7.dylib 0x00000001088c933a event_process_active_single_queue + 635
[tthor:39303] [11] 0 libevent-2.1.7.dylib 0x00000001088c6712 event_base_loop + 1012
[tthor:39303] [12] 0 mca_pmix_pmix3x.so 0x000000010930d320 progress_engine + 26
[tthor:39303] [13] 0 libsystem_pthread.dylib 0x00007fff6af4c109 _pthread_start + 148
[tthor:39303] [14] 0 libsystem_pthread.dylib 0x00007fff6af47b8b thread_start + 15
[tthor:39303] *** End of error message ***
Segmentation fault: 11
added 1 commit
- c713b860 - Fix issue with the actual spm in order to make it works in distributed
Strange name: shm_shm?
ctest -R shm_shm_python_spm_driver -V
UpdateCTestConfiguration from :/Users/ramet/Work/pastix/gitlab/spm/build/DartConfiguration.tcl
Parse Config file:/Users/ramet/Work/pastix/gitlab/spm/build/DartConfiguration.tcl
UpdateCTestConfiguration from :/Users/ramet/Work/pastix/gitlab/spm/build/DartConfiguration.tcl
Parse Config file:/Users/ramet/Work/pastix/gitlab/spm/build/DartConfiguration.tcl
Test project /Users/ramet/Work/pastix/gitlab/spm/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 167
    Start 167: shm_shm_python_spm_driver
167: Test command: /usr/local/Frameworks/Python.framework/Versions/3.7/bin/python3.7 "/Users/ramet/Work/pastix/gitlab/spm/build/wrappers/python/examples/spm_driver.py"
167: Environment variables:
167:  PYTHONPATH=/Users/ramet/Work/pastix/gitlab/spm/build/wrappers/python
167: Test timeout computed to be: 1500
167: Traceback (most recent call last):
167:   File "/Users/ramet/Work/pastix/gitlab/spm/build/wrappers/python/examples/spm_driver.py", line 21, in <module>
167:     tmp = np.eye(2).dot(np.ones(2))
167: AttributeError: module 'numpy' has no attribute 'eye'
167: [tthor.local:40006] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 206
1/1 Test #167: shm_shm_python_spm_driver ........***Failed 1.29 sec
0% tests passed, 1 tests failed out of 1
Label Time Summary:
spm = 1.29 sec*proc (1 test)
Total Test time (real) = 1.35 sec
The following tests FAILED:
    167 - shm_shm_python_spm_driver (Failed)
Errors while running CTest
I may have a problem with my numpy library...
added 1 commit
- b3f2135a - Fix issue with the actual spm in order to make it works in distributed
added 1 commit
- ed5dedd4 - Fix issue with the actual spm in order to make it works in distributed
mentioned in commit 8fdc0e5d