batsim issues
https://gitlab.inria.fr/batsim/batsim/-/issues · 2018-12-17T16:03:36+01:00

User model for dynamic submission with feedback
https://gitlab.inria.fr/batsim/batsim/-/issues/15 · 2018-12-17T16:03:36+01:00 · MERCIER Michael

Add the possibility to start simulation from a particular state (and enable checkpoint/restart)
https://gitlab.inria.fr/batsim/batsim/-/issues/35 · 2019-02-05T08:22:40+01:00 · MERCIER Michael
If Batsim were able to start from a particular state, keeping the states of jobs that are:
- in queue
- running
- not submitted yet
We would be able to avoid the warmup effect and, more importantly, to checkpoint and restart the simulation.
This would allow, for example, long Batsim simulations to be split into several short runs executed in best-effort mode.
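A checkpoint would at least have to record, for each job, which of the three states above it is in. The following is a minimal sketch under stated assumptions: `JobState`, `JobSnapshot`, `Checkpoint`, `serialize` and `deserialize` are hypothetical names (not Batsim's data structures), job ids are assumed to contain no whitespace, and the remaining work of a running job is assumed to be expressible as a number of seconds (as for a delay profile). It leaves aside the harder part, i.e., saving and restoring SimGrid's internal state (ongoing executions and communications).

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical job states that a checkpoint must distinguish.
enum class JobState { NotSubmitted = 0, Queued = 1, Running = 2 };

// Hypothetical per-job record (not Batsim's actual data structures).
struct JobSnapshot {
    std::string id;              // assumed to contain no whitespace
    JobState state;
    double submit_time;          // when the job is (or was) submitted
    double remaining_work;       // e.g., remaining seconds of a delay profile
    std::vector<int> allocation; // machines held by a running job, empty otherwise
};

struct Checkpoint {
    double simulation_time = 0;
    std::vector<JobSnapshot> jobs;
};

// Text format: the simulation time on the first line, then one job per line:
// <id> <state> <submit_time> <remaining_work> <nb_machines> <machine>...
std::string serialize(const Checkpoint& c) {
    std::ostringstream out;
    out << std::setprecision(17);  // enough digits for doubles to round-trip
    out << c.simulation_time << '\n';
    for (const JobSnapshot& j : c.jobs) {
        // Only running jobs hold machines.
        assert(j.state == JobState::Running || j.allocation.empty());
        out << j.id << ' ' << static_cast<int>(j.state) << ' ' << j.submit_time
            << ' ' << j.remaining_work << ' ' << j.allocation.size();
        for (int machine : j.allocation) out << ' ' << machine;
        out << '\n';
    }
    return out.str();
}

Checkpoint deserialize(const std::string& text) {
    std::istringstream in(text);
    Checkpoint c;
    in >> c.simulation_time;
    JobSnapshot j{};
    int state = 0;
    std::size_t nb_machines = 0;
    while (in >> j.id >> state >> j.submit_time >> j.remaining_work >> nb_machines) {
        j.state = static_cast<JobState>(state);
        j.allocation.assign(nb_machines, 0);
        for (int& machine : j.allocation) in >> machine;
        c.jobs.push_back(j);
    }
    return c;
}
```

On restart, queued jobs could be resubmitted as-is, not-yet-submitted jobs kept in the workload, and running jobs restarted on their recorded allocation with their remaining work; for anything but delay-like profiles, this would only approximate the original execution.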
Do you think it is doable?

Add scheduler version in log (protocol handshake)
https://gitlab.inria.fr/batsim/batsim/-/issues/39 · 2022-01-20T09:03:16+01:00 · Millian Poquet
The Batsim version is currently written in the log and in the ``_schedule.csv`` output file.
It would be interesting to ask the scheduler for its version (i.e., as an ACK to ``SIMULATION_BEGINS``) and to log it.

Add Event logging in Json
https://gitlab.inria.fr/batsim/batsim/-/issues/40 · 2019-03-12T15:51:53+01:00 · MERCIER Michael
Log each internal and external event of Batsim in a file (one event per line, as a JSON object).
Each simulation could be logged this way; Spark application logging is a good example of this.
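A minimal sketch of such a log, in the JSON Lines style (one self-contained JSON object per line, so that a file can be read or replayed event by event). `EventLogger` and `json_escape` are hypothetical names, all field values are written as strings for simplicity, and a real implementation would more likely rely on an existing JSON library than on hand-written escaping.

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Escapes a string so it can be embedded in a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
        case '"': out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
            if (c < 0x20) {  // other control characters must use \uXXXX
                char buf[7];
                std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}

// Hypothetical event logger writing one JSON object per line (JSON Lines).
class EventLogger {
public:
    using Fields = std::vector<std::pair<std::string, std::string>>;

    explicit EventLogger(std::ostream& out) : out_(out) {}

    // Writes {"time":<time>,"type":"<type>",<fields...>} followed by a newline.
    void log(double time, const std::string& type, const Fields& fields = {}) {
        assert(!type.empty());  // every event must have a type
        out_ << "{\"time\":" << time << ",\"type\":\"" << json_escape(type) << '"';
        for (const auto& [key, value] : fields)
            out_ << ",\"" << json_escape(key) << "\":\"" << json_escape(value) << '"';
        out_ << "}\n";
    }

private:
    std::ostream& out_;
};
```

Writing to a `std::ostream` keeps the logger independent of the destination: a file in production, a `std::ostringstream` in tests.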
Advantages:
- It could make checkpoint/restart easier.
- It would enable interactive visualizations, with step-by-step navigation and details on each event of the simulation.

Improve logging when input files are bad.
https://gitlab.inria.fr/batsim/batsim/-/issues/41 · 2018-01-04T12:40:12+01:00 · Millian Poquet
In the following log, the error should be the last printed message.
### Reproduce
``robin batsim_nosched_badinput.yaml``
### Log
```
+ batsim -p /home/carni/proj/batsim/platforms/nosuchplatform.xml -w /home/carni/proj/batsim/workload_profiles/test_workload_profile.json -e /tmp/robin/batsim_nosched_badinput/out --batexec
[0.000000] /home/carni/proj/batsim/src/batsim.cpp:271: [batsim/ERROR] Platform file '/home/carni/proj/batsim/platforms/nosuchplatform.xml' cannot be read.
[0.000000] [batsim/INFO] Workload '44f067' corresponds to workload file '/home/carni/proj/batsim/workload_profiles/test_workload_profile.json'.
```
Millian Poquet

Missing test about jobs execution times
https://gitlab.inria.fr/batsim/batsim/-/issues/43 · 2018-01-22T17:23:19+01:00 · Millian Poquet

Add code style checks
https://gitlab.inria.fr/batsim/batsim/-/issues/45 · 2018-02-01T14:41:28+01:00 · Millian Poquet
There is currently no check of the coding style (apart from Codacy, which complains from time to time).
[clang-format](https://clang.llvm.org/docs/ClangFormat.html) could be used to ensure the style is respected.
The desired style can be defined precisely ([doc](https://clang.llvm.org/docs/ClangFormatStyleOptions.html)).
Checks would be useful at several levels:
- git pre-commit hook, as done in SimGrid [there](https://github.com/simgrid/simgrid/blob/master/tools/git-hooks/clang-format.pre-commit) (needs update and cleanup)
- in the CI

Distribute a fast Batsim version
https://gitlab.inria.fr/batsim/batsim/-/issues/56 · 2021-08-31T19:37:51+02:00 · Millian Poquet
It would be interesting to offer users a fast version of Batsim,
to improve the performance of simulations that are heavily impacted by Batsim's overhead (i.e., with very fast schedulers).
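The TODO about separating internal and external checks (listed below) can be illustrated with the following minimal sketch. `INTERNAL_CHECK`, the `BATSIM_DISABLE_INTERNAL_CHECKS` build flag, `enforce` and `checked_job_size` are all hypothetical names: internal checks guard Batsim's own invariants and can be compiled out of a fast build, whereas checks on external inputs always stay active and raise catchable errors.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Internal check: guards Batsim's own invariants. It can be compiled out with
// the hypothetical BATSIM_DISABLE_INTERNAL_CHECKS flag (and, like any assert,
// with NDEBUG), so a fast build does not pay for it.
#ifdef BATSIM_DISABLE_INTERNAL_CHECKS
#define INTERNAL_CHECK(cond, msg) ((void)0)
#else
#define INTERNAL_CHECK(cond, msg) assert((cond) && (msg))
#endif

// External check ("enforce"): validates input that Batsim does not control
// (workload/platform files, scheduler messages). Always active, and it raises
// a catchable error instead of aborting.
inline void enforce(bool condition, const std::string& message) {
    if (!condition) throw std::runtime_error(message);
}

// Hypothetical example mixing both kinds of checks.
int checked_job_size(int requested, int nb_machines) {
    INTERNAL_CHECK(nb_machines > 0, "the platform should contain machines");
    enforce(requested > 0, "invalid job: its size must be strictly positive");
    enforce(requested <= nb_machines, "invalid job: its size exceeds the number of machines");
    return requested;
}
```

Keeping `enforce` as a plain function rather than a macro makes it explicit that it is never compiled out.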
### TODOs
- SimGrid
  - [ ] Evaluate the performance of the various context switch implementations.
  - [ ] Select other compilation flags (should be easy).
- Batsim
  - [ ] Separate internal and external checks (the good old [assert vs enforce](https://dlang.org/library/std/exception/enforce.html) distinction).
    This way, we could disable all internal checks but keep the ones about external files and processes.
  - [x] Manage optimization in the build system
- Distribution
  - [x] Update the simgrid/batsim and batsim nix expressions

5.0.0

Add parallel job composition
https://gitlab.inria.fr/batsim/batsim/-/issues/64 · 2022-01-20T06:26:32+01:00 · MERCIER Michael
Only sequential composition (a list of tasks executed one after the other) is implemented; we lack the possibility to compose tasks in parallel.
Making the composed profile support this would be great.
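To make the intended semantics concrete, here is a minimal sketch of a composition tree where a sequence lasts as long as the sum of its children, and a parallel composition starts all its children together and ends when the longest one ends (fork-join). `Profile`, `delay`, `sequence`, `parallel` and `duration` are hypothetical names, and only delay leaves are modeled: with computation or communication leaves, parallel branches would compete for the same resources, and the actual duration would have to be computed by SimGrid rather than by taking a maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Profile;
using ProfilePtr = std::shared_ptr<const Profile>;

// Hypothetical composition tree (not Batsim's actual profile types).
struct Profile {
    enum class Kind { Delay, Sequence, Parallel };
    Kind kind;
    double seconds;                   // used by Delay leaves only
    std::vector<ProfilePtr> children; // used by Sequence and Parallel nodes
};

ProfilePtr delay(double seconds) {
    assert(seconds >= 0);
    return std::make_shared<Profile>(Profile{Profile::Kind::Delay, seconds, {}});
}

ProfilePtr sequence(std::vector<ProfilePtr> children) {
    return std::make_shared<Profile>(Profile{Profile::Kind::Sequence, 0, std::move(children)});
}

ProfilePtr parallel(std::vector<ProfilePtr> children) {
    return std::make_shared<Profile>(Profile{Profile::Kind::Parallel, 0, std::move(children)});
}

// Sequence: children run one after the other, so their durations add up.
// Parallel: children start together and the node ends when the last one ends.
double duration(const Profile& p) {
    double result = 0;
    for (const ProfilePtr& child : p.children) {
        const double d = duration(*child);
        result = (p.kind == Profile::Kind::Sequence) ? result + d : std::max(result, d);
    }
    return p.kind == Profile::Kind::Delay ? p.seconds : result;
}
```

Since both kinds of nodes hold children, sequences and parallel compositions can be nested arbitrarily (e.g., a parallel phase followed by a sequential one).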
5.0.0
Millian Poquet

Missing test: dynamic submissions without submission_finished
https://gitlab.inria.fr/batsim/batsim/-/issues/65 · 2021-08-20T14:07:16+02:00 · Millian Poquet
According to @cmommess, Batsim may terminate the simulation even though the scheduler has not sent submission_finished yet.
Can you provide a MWE, @cmommess?
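The invariant such a test should check can be written as a predicate over the simulation state. This is a minimal sketch with a hypothetical `SimulationState` structure and `can_terminate` function, not Batsim's actual termination logic: when dynamic submissions are enabled, finishing all known jobs is not enough, as the scheduler may still submit new ones until it sends submission_finished.

```cpp
#include <cassert>

// Hypothetical summary of what the end of the simulation depends on.
struct SimulationState {
    int running_static_submitters;     // workload submitters that still have jobs to submit
    int unfinished_jobs;               // submitted jobs that have not completed yet
    bool dynamic_submissions_enabled;
    bool submission_finished_received; // has the scheduler sent submission_finished?
};

// The simulation can only end when no new job can appear and all jobs are done.
bool can_terminate(const SimulationState& s) {
    assert(s.running_static_submitters >= 0 && s.unfinished_jobs >= 0);
    const bool no_more_static_jobs = s.running_static_submitters == 0;
    const bool no_more_dynamic_jobs =
        !s.dynamic_submissions_enabled || s.submission_finished_received;
    return no_more_static_jobs && no_more_dynamic_jobs && s.unfinished_jobs == 0;
}
```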

Batexec do not manage properly jobs that are exeeding the number of available resources
https://gitlab.inria.fr/batsim/batsim/-/issues/70 · 2022-01-19T20:16:28+01:00 · MERCIER Michael
When a job in the workload exceeds the number of available resources, batexec tries to allocate non-existent machines here:
https://gitlab.inria.fr/batsim/batsim/blob/master/src/job_submitter.cpp#L407
Leading to an inconsistent message:
```
Cannot get machine 4: it does not exist
```
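A minimal sketch of the check proposed below, performed once when the workload is loaded so that the error names the faulty job instead of a non-existent machine. `WorkloadJob` and `check_jobs_fit` are hypothetical names; since the faulty value comes from a user-provided file, the sketch throws an exception rather than relying on an `assert` that may be compiled out.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical job description, as read from a workload file.
struct WorkloadJob {
    std::string id;
    int requested_resources;
};

// Rejects, at workload-loading time, any job that can never be executed.
void check_jobs_fit(const std::vector<WorkloadJob>& jobs, int nb_machines) {
    assert(nb_machines >= 0);
    for (const WorkloadJob& job : jobs) {
        if (job.requested_resources > nb_machines) {
            throw std::invalid_argument(
                "job '" + job.id + "' requests " + std::to_string(job.requested_resources) +
                " resources, but the platform only has " + std::to_string(nb_machines) +
                " machines");
        }
    }
}
```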
Adding a proper assert checking that each job fits within the total number of available machines would be great.

Reduce error message redundancy in the protocol
https://gitlab.inria.fr/batsim/batsim/-/issues/87 · 2018-11-07T18:02:56+01:00 · Millian Poquet
The `protocol.cpp` file is full of redundant error messages stating that the received message is invalid.
Such error messages should be kept, but cleaner code would not repeat the prefix on each line...
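A minimal sketch of the two-step approach proposed below: checks throw an exception carrying only the meaningful part of the message, and a single wrapper adds the common prefix before rethrowing. `protocol_check`, `parse_nb_resources` and `with_protocol_prefix` are hypothetical helpers, not functions of `protocol.cpp`, and the prefix text is illustrative.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Throws with the meaningful part of the message only (no common prefix).
inline void protocol_check(bool condition, const std::string& message) {
    assert(!message.empty());
    if (!condition) throw std::runtime_error(message);
}

// Hypothetical parsing step: it no longer repeats "Invalid JSON message...".
int parse_nb_resources(int value) {
    protocol_check(value > 0, "'nb_resources' must be strictly positive");
    return value;
}

// The single place where the common prefix is added, before rethrowing.
template <typename ParseFunction>
auto with_protocol_prefix(const std::string& raw_message, ParseFunction parse) {
    try {
        return parse();
    } catch (const std::runtime_error& e) {
        throw std::runtime_error("Invalid JSON message '" + raw_message + "': " + e.what());
    }
}
```

Each parsing function then only states what is wrong, while the context (here, the raw message) is attached in one place.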
One way to improve this is to:
- Use assertions that throw an exception on error, carrying only the meaningful message (without the prefix).
- Surround protocol function calls with try/catch blocks, and rethrow errors with the desired prefix.

Project best practices?
https://gitlab.inria.fr/batsim/batsim/-/issues/91 · 2018-11-06T02:15:00+01:00 · Millian Poquet
Some people have thought about the conditions that make open source projects prosper.
It would be interesting to check how far we are from such conditions; it may also give hints about directions we should invest in.
For example, [this checklist](https://bestpractices.coreinfrastructure.org/en/projects/1845#quality) seems reasonable and helped SimGrid improve.
Batsim 3.0

SMPI profiles do not handle absolute trace filenames
https://gitlab.inria.fr/batsim/batsim/-/issues/97 · 2019-01-15T23:58:54+01:00 · Millian Poquet
I just tried, and it gave me this error:
```
[0.000000] /home/carni/proj/batsim/src/profiles.cpp:621: [root/CRITICAL] Invalid JSON: profile 'LU.S.4' has an invalid 'trace' field ('/tmp/simgrid-template-smpi/NPB3.3-MPI/LU.S.4'), which leads to a non-existent file ('/tmp/meh/.//tmp/simgrid-template-smpi/NPB3.3-MPI/LU.S.4')
```

set_receiver should be called on most (all?) mailboxes
https://gitlab.inria.fr/batsim/batsim/-/issues/101 · 2018-12-14T15:26:31+01:00 · Millian Poquet
[SimGrid doc](https://simgrid.frama.io/simgrid/app_s4u.html#_CPPv3N7simgrid3s4u7Mailbox12set_receiverE8ActorPtr)

Rework data staging job profiles
https://gitlab.inria.fr/batsim/batsim/-/issues/103 · 2022-01-20T08:59:19+01:00 · MOMMESSIN Clement
There are multiple problems with this profile:
- At the moment, the allocation (`res` in the `EXECUTE_JOB` protocol event) is allowed to contain a number of resources different from 2.
- The communication matrix seems to be inverted when the `from` resource id is greater than the `to` resource id (e.g., for a communication from resource 14 to resource 12), BUT the simulated communication goes in the correct direction.
- Some other ugly stuff that @mmercier can explain better?

Add a new batsim output file that contains the simulation begins data
https://gitlab.inria.fr/batsim/batsim/-/issues/104 · 2022-01-20T06:26:13+01:00 · MERCIER Michael
Some of the information available in the `SIMULATION_BEGINS` message is not exported by Batsim but is needed for post-analysis (e.g., workload hash/filename mapping, configuration, resource list and properties, etc.).
I propose that Batsim dump the content of this message directly into a JSON file, so it can be used afterward without parsing the simulation logs (which is what I'm doing right now...).
I propose naming it `<prefix>_metadata.json`, so that it can be extended with any kind of information in the future.

Add a progress bar
https://gitlab.inria.fr/batsim/batsim/-/issues/105 · 2020-01-07T15:16:14+01:00 · MERCIER Michael
We could have a simple progress bar output by default, based for example on the number of jobs to be submitted per workload.
Something like:
```sh
Workload 1: 75% [|||||||||||||||||||||||||||||||||||||||||| ]
Workload 2: 72% [|||||||||||||||||||||||||||||||||||||| ]
```

no-shed option is not working as expected
https://gitlab.inria.fr/batsim/batsim/-/issues/107 · 2022-01-20T05:47:15+01:00 · MERCIER Michael
The `--no-shed` CLI option documentation reads:
```
If set, the jobs in the workloads are
computed one by one, one after the other,
without scheduler nor Redis.
```
But currently, all the jobs launch at time 0 and share the resources.
I've made some changes so that jobs start at their submission time and not before, but resources are still shared: all the jobs are placed on the first hosts instead of being dispatched over the resources or queued.
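One of the "very simple policies" mentioned just below could be strict first-come-first-served (FCFS) without backfilling. This minimal sketch uses hypothetical `Job`, `Placement` and `fcfs` names and is not the code of the `no-sched-fix` branch: each job starts no earlier than its submission time, never before the previously submitted job, and as soon as enough machines are idle; it is then dispatched on the lowest-indexed idle machines, so jobs are queued or spread over the platform instead of sharing the first hosts.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical job description; jobs are assumed sorted by submit_time.
struct Job {
    double submit_time;
    int nb_resources;
    double duration;
};

struct Placement {
    double start_time;
    std::vector<int> machines;
};

// Strict FCFS without backfilling: jobs start in submission order, each one as
// soon as it is submitted and enough machines are idle, on the idle machines
// with the lowest indexes.
std::vector<Placement> fcfs(const std::vector<Job>& jobs, int nb_machines) {
    std::vector<double> free_at(nb_machines, 0.0);  // date at which each machine becomes idle
    std::vector<Placement> placements;
    double previous_start = 0.0;
    for (const Job& job : jobs) {
        assert(job.nb_resources > 0 && job.nb_resources <= nb_machines);
        // The k-th smallest release date is the earliest date at which k machines are idle.
        std::vector<double> dates = free_at;
        const auto kth = dates.begin() + (job.nb_resources - 1);
        std::nth_element(dates.begin(), kth, dates.end());
        const double start = std::max({job.submit_time, previous_start, *kth});

        Placement placement{start, {}};
        for (int m = 0; m < nb_machines &&
                        static_cast<int>(placement.machines.size()) < job.nb_resources; ++m) {
            if (free_at[m] <= start) {
                placement.machines.push_back(m);
                free_at[m] = start + job.duration;
            }
        }
        previous_start = start;
        placements.push_back(placement);
    }
    return placements;
}
```

With strict FCFS, a small job may wait behind a large one even when it would fit on idle machines; a backfilling variant would be another candidate policy.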
The question is: what should we do? The `no-sched-fix` branch contains my patch, and this is the behavior I wanted, but maybe we should consider offering several very simple policies as arguments to the `no-shed` option...

Help scheduler error diagnostic on SimGrid deadlock
https://gitlab.inria.fr/batsim/batsim/-/issues/109 · 2022-01-20T15:41:22+01:00 · Millian Poquet
It is common to face SimGrid deadlocks in Batsim while developing a scheduler.
As SimGrid now makes it possible to react to deadlocks thanks to the `on_deadlock` callback,
we should print a more user-friendly error when this happens, with information such as:
- Common mistakes
- Current scheduler state
- ...

5.0.0