Closed
Open
Created Mar 28, 2022 by ANGELELLI Luc (@langelel)

C++ platform works in SimGrid but not in Batsim

Describe the bug
I was trying to create C++ platforms. I got one working with SimGrid, but loading it in Batsim failed. Further tests showed that I cannot get any C++ platform working in Batsim, whereas XML platforms work fine.

Provide information so the bug can be reproduced
Attached are the Nix shells I use to compile and run, the platform in its C++ source (.cpp), compiled shared-object (.so) and XML forms, and other relevant files.

default.nix platform_compiling.nix tuto-env.nix

very_small_platform.cpp very_small_platform.so very_small_platform.xml

test_one_delay_job.json s4u-comm-wait.cpp

I compile the platform inside the platform_compiling.nix shell with:

$CXX -shared -o very_small_platform.so $(pkg-config --libs --cflags simgrid) very_small_platform.cpp

I run SimGrid from simgrid-template-s4u-master. I compiled s4u-comm-wait.cpp with:

$CXX -o s4u-comm-wait $(pkg-config --libs --cflags simgrid) s4u-comm-wait.cpp

and ran:

./s4u-comm-wait ./very_small_platform.so

and

./s4u-comm-wait ./very_small_platform.xml

which gave very similar results (as expected, see logs).

Then I tried something similar in Batsim. In the Nix shell tuto-env.nix, I ran:

batsim -p ./platforms/very_small_platform.xml -w ./workloads/test_one_delay_job.json

and

batsim -p ./platforms/very_small_platform.so -w ./workloads/test_one_delay_job.json

With the XML platform, Batsim ran as expected, but with the C++ platform it crashed with this error:

../src/xbt/config.cpp:255: [root/CRITICAL] Refusing to register the config element 'smpi/iprobe' twice.

As far as I can tell, I'm not using SMPI here.

Logs

  • Running with SimGrid
[nix-shell:~/proj/simgrid-template-s4u-master]$ ./s4u-comm-wait ./very_small_platform.so
[master_host:sender:(1) 0.000000] [s4u_comm_wait/INFO] sleep_start_time : 5.000000 , sleep_test_time : 0.000000
[Jupiter:receiver:(2) 0.000000] [s4u_comm_wait/INFO] sleep_start_time : 1.000000 , sleep_test_time : 0.100000
[Jupiter:receiver:(2) 1.000000] [s4u_comm_wait/INFO] Wait for my first message
[master_host:sender:(1) 5.000000] [s4u_comm_wait/INFO] Send 'Message 0' to 'receiver'
[master_host:sender:(1) 17.041445] [s4u_comm_wait/INFO] Send 'Message 1' to 'receiver'
[Jupiter:receiver:(2) 17.100000] [s4u_comm_wait/INFO] I got a 'Message 0'.
[master_host:sender:(1) 29.141445] [s4u_comm_wait/INFO] Send 'Message 2' to 'receiver'
[Jupiter:receiver:(2) 29.200000] [s4u_comm_wait/INFO] I got a 'Message 1'.
[master_host:sender:(1) 41.241445] [s4u_comm_wait/INFO] Send 'finalize' to 'receiver'
[Jupiter:receiver:(2) 41.300000] [s4u_comm_wait/INFO] I got a 'Message 2'.
[Jupiter:receiver:(2) 41.400000] [s4u_comm_wait/INFO] I got a 'finalize'.

[nix-shell:~/proj/simgrid-template-s4u-master]$ ./s4u-comm-wait ./very_small_platform.xml
[master_host:sender:(1) 0.000000] [s4u_comm_wait/INFO] sleep_start_time : 5.000000 , sleep_test_time : 0.000000
[Jupiter:receiver:(2) 0.000000] [s4u_comm_wait/INFO] sleep_start_time : 1.000000 , sleep_test_time : 0.100000
[Jupiter:receiver:(2) 1.000000] [s4u_comm_wait/INFO] Wait for my first message
[master_host:sender:(1) 5.000000] [s4u_comm_wait/INFO] Send 'Message 0' to 'receiver'
[master_host:sender:(1) 17.643478] [s4u_comm_wait/INFO] Send 'Message 1' to 'receiver'
[Jupiter:receiver:(2) 17.700000] [s4u_comm_wait/INFO] I got a 'Message 0'.
[master_host:sender:(1) 30.343478] [s4u_comm_wait/INFO] Send 'Message 2' to 'receiver'
[Jupiter:receiver:(2) 30.400000] [s4u_comm_wait/INFO] I got a 'Message 1'.
[master_host:sender:(1) 43.043478] [s4u_comm_wait/INFO] Send 'finalize' to 'receiver'
[Jupiter:receiver:(2) 43.100000] [s4u_comm_wait/INFO] I got a 'Message 2'.
[Jupiter:receiver:(2) 43.200000] [s4u_comm_wait/INFO] I got a 'finalize'.
  • Running with Batsim
[nix-shell:~/proj/Batsim]$ batsim -p ./platforms/very_small_platform.xml -w ./workloads/test_one_delay_job.json 
[0.000000] [batsim/INFO] Workload 'w0' corresponds to workload file '/home/defryder/proj/Batsim/./workloads/test_one_delay_job.json'.
[0.000000] [batsim/INFO] Batsim version: 4.1.0
[0.000000] [workload/INFO] Loading JSON workload '/home/defryder/proj/Batsim/./workloads/test_one_delay_job.json'...
[0.000000] [workload/INFO] JSON workload parsed sucessfully. Read 1 jobs and 1 profiles.
[0.000000] [workload/INFO] Checking workload validity...
[0.000000] [workload/INFO] Workload seems to be valid.
[0.000000] [workload/INFO] Removing unreferenced profiles from memory...
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'host/model' to 'ptask_L07'
[0.000000] [batsim/INFO] Checking whether SMPI is used or not...
[0.000000] [machines/INFO] Creating the machines from platform file './platforms/very_small_platform.xml'...
[0.000000] [xbt_cfg/INFO] Switching to the L07 model to handle parallel tasks.
[0.000000] [machines/INFO] Looking for master host 'master_host'
[0.000000] [machines/INFO] The machines have been created successfully. There are 1 computing machines.
[0.000000] [batsim/INFO] Batsim's export prefix is 'out'.
[0.000000] [batsim/INFO] The process 'workload_submitter_w0' has been created.
[0.000000] [batsim/INFO] The process 'server' has been created.
[master_host:Scheduler REQ-REP:(3) 0.000000] [network/INFO] Sending '{"now":0.000000,"events":[{"timestamp":0.000000,"type":"SIMULATION_BEGINS","data":{"nb_resources":1,"nb_compute_resources":1,"nb_storage_resources":0,"allow_compute_sharing":false,"allow_storage_sharing":true,"config":{"redis-enabled":false,"redis-hostname":"127.0.0.1","redis-port":6379,"redis-prefix":"default","profiles-forwarded-on-submission":false,"dynamic-jobs-enabled":false,"dynamic-jobs-acknowledged":false,"profile-reuse-enabled":false,"sched-config":"","forward-unknown-events":false},"compute_resources":[{"id":0,"name":"Jupiter","state":"idle","properties":{"role":""},"zone_properties":{}}],"storage_resources":[],"workloads":{"w0":"/home/defryder/proj/Batsim/./workloads/test_one_delay_job.json"},"profiles":{"w0":{"delay10":{"type":"delay","delay":10}}}}}]}'
^C
[master_host:Scheduler REQ-REP:(3) 0.000000] [ker_engine/INFO] CTRL-C pressed. The current status will be displayed before exit (disable that behavior with option 'debug/verbose-exit').
[master_host:Scheduler REQ-REP:(3) 0.000000] [ker_engine/INFO] 3 actors are still running, waiting for something.
[master_host:Scheduler REQ-REP:(3) 0.000000] [ker_engine/INFO] Legend of the following listing: "Actor <pid> (<name>@<host>): <status>"
[master_host:Scheduler REQ-REP:(3) 0.000000] [ker_engine/INFO] Actor 1 (workload_submitter_w0@master_host) simcall actor::CommIsendSimcall
[master_host:Scheduler REQ-REP:(3) 0.000000] [ker_engine/INFO] Actor 2 (server@master_host) simcall NONE
[master_host:Scheduler REQ-REP:(3) 0.000000] [ker_engine/INFO] Actor 3 (Scheduler REQ-REP@master_host) simcall NONE
Segmentation fault.
Segmentation fault (core dumped)

[nix-shell:~/proj/Batsim]$ batsim -p ./platforms/very_small_platform.so -w ./workloads/test_one_delay_job.json 
[0.000000] [batsim/INFO] Workload 'w0' corresponds to workload file '/home/defryder/proj/Batsim/./workloads/test_one_delay_job.json'.
[0.000000] [batsim/INFO] Batsim version: 4.1.0
[0.000000] [workload/INFO] Loading JSON workload '/home/defryder/proj/Batsim/./workloads/test_one_delay_job.json'...
[0.000000] [workload/INFO] JSON workload parsed sucessfully. Read 1 jobs and 1 profiles.
[0.000000] [workload/INFO] Checking workload validity...
[0.000000] [workload/INFO] Workload seems to be valid.
[0.000000] [workload/INFO] Removing unreferenced profiles from memory...
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'host/model' to 'ptask_L07'
[0.000000] [batsim/INFO] Checking whether SMPI is used or not...
[0.000000] [machines/INFO] Creating the machines from platform file './platforms/very_small_platform.so'...
[0.000000] ../src/xbt/config.cpp:255: [root/CRITICAL] Refusing to register the config element 'smpi/iprobe' twice.
Backtrace (displayed in actor maestro):
(backtrace not set -- did you install Boost.Stacktrace?)
Aborted (core dumped)

Possible fixes
(Share any insight you have about the bug.)
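
A diagnostic sketch rather than a confirmed fix. The error says that 'smpi/iprobe' is being registered a second time. One possible explanation, which is an assumption and is not verified anywhere in this report, is that the compiled platform and the batsim binary each bring in their own copy of SimGrid. The hypothetical helper below, linked_simgrid, only lists the SimGrid libraries that a file depends on according to ldd, so the two can be compared by eye. The platform path is the one used in the commands above.

```shell
# Hypothetical helper (not part of Batsim or SimGrid): print the SimGrid
# libraries that a binary or shared object links against, according to ldd.
linked_simgrid() {
    target="$1"
    if [ ! -e "$target" ]; then
        echo "missing: $target"
        return 1
    fi
    # ldd fails on non-ELF files (e.g. wrapper scripts); fall back to a notice.
    ldd "$target" 2>/dev/null | grep -i simgrid \
        || echo "no simgrid dependency found in $target"
}

# Compare the compiled platform with the batsim executable found in PATH.
linked_simgrid ./platforms/very_small_platform.so || true
if command -v batsim >/dev/null 2>&1; then
    linked_simgrid "$(command -v batsim)"
fi
```

If the two listings point at different SimGrid library paths, or if batsim shows no dynamic SimGrid dependency while the platform does, that would be consistent with the assumption above, though it would not prove it.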
