batsim issues
https://gitlab.inria.fr/batsim/batsim/-/issues
2022-01-20T08:59:19+01:00

https://gitlab.inria.fr/batsim/batsim/-/issues/103
Rework data staging job profiles
2022-01-20T08:59:19+01:00 · MOMMESSIN Clement
There are multiple problems with this profile:
- At the moment, the allocation (`res` in the `EXECUTE_JOB` protocol event) is allowed to contain a number of resources other than 2.
- The communication matrix seems to be inverted when the `from` resource id is greater than the `to` resource id (e.g., for a communication from resource 14 to resource 12), BUT the simulated communication goes in the correct direction.
- Some other ugly stuff that @mmercier can explain better?

https://gitlab.inria.fr/batsim/batsim/-/issues/102
Add replay of machine failures
2019-01-28T13:27:52+01:00 · MOMMESSIN Clement
It would be great to have a mechanism to replay machine failures during the simulation.
This could be implemented similarly to the static submission of jobs through workload input files.
An example input JSON file:
```json
{
  "failures": [
    {"machine_id": 0, "failure_start": 0, "failure_end": 1000},
    {"machine_name": "Foo", "failure_start": 100, "failure_end": 5000}
  ]
}
```
Each item in the list gives the name or id of the SimGrid host and the time interval during which the failure occurs.

https://gitlab.inria.fr/batsim/batsim/-/issues/101
set_receiver should be called on most (all?) mailboxes
2018-12-14T15:26:31+01:00 · Millian Poquet
[SimGrid doc](https://simgrid.frama.io/simgrid/app_s4u.html#_CPPv3N7simgrid3s4u7Mailbox12set_receiverE8ActorPtr)

https://gitlab.inria.fr/batsim/batsim/-/issues/98
Expose SimGrid log options to Batsim CLI
2018-11-30T14:34:02+01:00 · Millian Poquet
Similarly to `--sg-cfg`, we should expose a `--sg-log` command-line option.

https://gitlab.inria.fr/batsim/batsim/-/issues/97
SMPI profiles do not handle absolute trace filenames
2019-01-15T23:58:54+01:00 · Millian Poquet
Just tried it and got this error:
```
[0.000000] /home/carni/proj/batsim/src/profiles.cpp:621: [root/CRITICAL] Invalid JSON: profile 'LU.S.4' has an invalid 'trace' field ('/tmp/simgrid-template-smpi/NPB3.3-MPI/LU.S.4'), which leads to a non-existent file ('/tmp/meh/.//tmp/simgrid-template-smpi/NPB3.3-MPI/LU.S.4')
```
The absolute `trace` path is prefixed with a base directory (here `/tmp/meh/./`) instead of being used as-is.

https://gitlab.inria.fr/batsim/batsim/-/issues/96
Tutorials should be on periodic CI
2019-08-12T14:19:29+02:00 · Millian Poquet
There is now at least one Batsim tutorial in the documentation.
However, the tutorials are not run by any CI yet, which will be very embarrassing when future changes break them.
I propose to run the tutorials in a nightly CI, as it is important to make sure kapack remains consistent with them.
Running the tutorials in some CI jobs would not be very hard, as all code blocks are kept separate from the documentation itself.

https://gitlab.inria.fr/batsim/batsim/-/issues/94
[ci] ignore useless files from src?
2022-01-20T05:55:25+01:00 · Millian Poquet
Filtering the sources would prevent Nix from rebuilding the binary when unrelated files (e.g., CI files or documentation) are modified.

https://gitlab.inria.fr/batsim/batsim/-/issues/92
CMake -> Meson?
2019-08-12T14:19:52+02:00 · Millian Poquet
[Meson] is gaining popularity, and we may consider using it to build and test Batsim.
Many big projects are now using it as their main build system, so I do not think we will be blocked by missing features.
IMHO, the main benefit would be improved maintainability.
### Pros
- Concise description.
- Well designed and consistent; the default options are the desired ones (e.g., parallel build/test, warnings enabled).
- Does not reinvent the wheel.
- Uses [Ninja] as its main backend, and therefore generates a **good** Ninja description file.
Ninja is blazing fast, and dependencies between objects are finer-grained than with CMake.
Incremental builds are both saner and faster.
(NB: CMake can also use Ninja, [but not as well as Meson](http://mesonbuild.com/Simple-comparison.html).)
- Uses external tools for dependency management (e.g., [pkg-config]), which I believe is a good choice.
Letting each project determine how it should be used (e.g., by providing a `.pc` file) is probably the best approach.
- **It might be a way to escape the current test hell**.
(read `test/robin` if you do not know what I mean, or just ask @cmommess how easy it is to simply remove tests...)
- [Meson's syntax] provides **usable loops** and data types, such as arrays and **dictionaries**!
- Meson's testing system seems well designed (e.g., wrapping simple tests with valgrind or gdb is trivial)
### Cons
- Less mature. The documentation is not perfect but seems sufficient for our needs.
(NB: CMake documentation is not great, but a lot of help can be found on Stack Overflow-like sites.)
- Having an up-to-date Meson may be hard on some systems [such as NixOS](https://github.com/NixOS/nixpkgs/pull/46020).
- Some work is required to port our CMake setup to Meson.
[Meson]: https://mesonbuild.com
[Meson's syntax]: https://mesonbuild.com/Syntax.html
[Ninja]: https://ninja-build.org/
[pkg-config]: https://www.freedesktop.org/wiki/Software/pkg-config/

https://gitlab.inria.fr/batsim/batsim/-/issues/90
Add batsim to SimGrid's nightly CI
2018-11-29T18:50:10+01:00 · Millian Poquet
As #37 seems on its way to being fixed, we should take some time to include Batsim in SimGrid's continuous integration suite.
### Why?
This would allow SimGrid developers to detect more easily when they break things, as Batsim uses **many** SimGrid features.
This would also be very beneficial for Batsim.
- This would limit the appearance of deep breaks (e.g., #37), as we would see them as soon as they appear (and could then request changes in SimGrid or make them ourselves).
- This would help us keep up to date with SimGrid, as breaks would be detected sooner.
- Knowing whether Batsim works with up-to-date SimGrid would also be beneficial to end users, avoiding the version hell we had with SimGrid's clone (for batsim-2.0.0).
### What?
It would be interesting to know (at least):
- Whether the latest Batsim release works with the latest SimGrid commit.
- Whether the latest Batsim version (master branch) works with the latest SimGrid commit.
### How?
There are many ways to do it. The most obvious one is to repeat what has been done for StarPU.
This has been done on Jenkins (https://ci.inria.fr/simgrid/).
As SimGrid uses many other CI systems, we could also provide something else.
- SimGrid dev repo is now hosted on framagit, which should allow us to easily use our Nix recipes on Gitlab CI.
- We may hack something on Travis (GitHub)
- We may also set up a Hydra infrastructure.

https://gitlab.inria.fr/batsim/batsim/-/issues/87
Reduce error message redundancy in the protocol
2018-11-07T18:02:56+01:00 · Millian Poquet
The `protocol.cpp` file is full of redundant error messages stating that the received message is invalid.
Such error messages should be kept, but cleaner code would not repeat the prefix on every line.
One way to improve this is to:
- Use asserts that throw an exception on error, with only the meaningful message (without the prefix).
- Surround protocol function calls with try/catch blocks, and rethrow errors with the desired prefix.

https://gitlab.inria.fr/batsim/batsim/-/issues/86
Msg_par with different number of resources
2018-10-15T14:28:05+02:00 · MOMMESSIN Clement
When a msg_par job is dynamically submitted with a number of requested resources different from the size of the cpu matrix (or the square root of the size of the com matrix), Batsim executes the job without complaining.

https://gitlab.inria.fr/batsim/batsim/-/issues/84
Remove git submodules?
2018-10-02T14:17:21+02:00 · Millian Poquet
Git submodules remain for batsched and pybatsim, but they are outdated and no longer used by the CI.
Is anyone still using them, or should I remove them?

https://gitlab.inria.fr/batsim/batsim/-/issues/83
Coverage results
2020-02-19T15:52:54+01:00 · Millian Poquet
Batsim should give coverage results for its tests.
This has been implemented in batsched recently, and the same technique can be applied to Batsim.

https://gitlab.inria.fr/batsim/batsim/-/issues/82
Avoid Deadlock when a host has a speed of 0
2018-10-15T14:52:05+02:00 · MERCIER Michael
A check should prevent any allocation involving computation on a host whose speed is 0.

https://gitlab.inria.fr/batsim/batsim/-/issues/81
[doc] bad markdown link to demo
2018-10-25T10:24:46+02:00 · Millian Poquet
The markdown link from the README to the notebook is not rendered correctly by GitLab. @mmercier (free commit ahead!)

https://gitlab.inria.fr/batsim/batsim/-/issues/80
Making tests run in parallel
2018-12-26T12:42:10+01:00 · MERCIER Michael
Add a function that automatically selects a free port so the tests can run in parallel.
It would speed up the tests.
It would also ensure that multiple Batsim instances can run on the same machine without collisions.
I used to write something that does this, which we can use:
https://github.com/oar-team/kameleon/blob/380ca697bad28b120e7df65c6262402f073d8107/contrib/kameleon_bashrc.sh#L208

https://gitlab.inria.fr/batsim/batsim/-/issues/79
Better ci
2018-11-09T16:13:40+01:00 · MERCIER Michael
- [x] Use a lighter Nix-based Docker image for the CI.
This needed some hacks on the base Nix image; see https://github.com/LnL7/nix-docker/pull/24
This branch can be used to build the `oarteam/nix` and `oarteam/batsim_ci` images:
https://github.com/mickours/nix-docker/tree/mickours
- [ ] Use the GitLab CI cache for the Nix store.
**NOT POSSIBLE**, because `/nix` is outside the project scope.
- [X] Use the scripts in `./ci` inside the Nix expressions for tests.
**Done** in kapack with e74bdf3 and in batsim with 9ed97d4.
- [ ] Use the scheduling capabilities of GitLab CI to regularly test with upstream SimGrid.
**DEFERRED** This is discussed in https://gitlab.inria.fr/batsim/batsim/issues/90
- [x] Check that Batsim compiles without warnings with both clang and gcc.

https://gitlab.inria.fr/batsim/batsim/-/issues/77
Sphinx documentation?
2018-10-16T18:12:38+02:00 · Millian Poquet
What about transforming our documentation into a readthedocs-like one?
I didn't like rst much at first sight, but it has very interesting features.
1. Include the content of files.
This is amazing for creating up-to-date, CI-proof tutorials.
2. Clear navigation between parts.
The current architecture is okay, but moving from one file to another is hard.
It is also quite easy to miss a part of the documentation with the current markdown docs.

https://gitlab.inria.fr/batsim/batsim/-/issues/75
Separate Simgrid process tracing and simgrid resource usage tracing in two options
2018-11-17T14:35:43+01:00 · MERCIER Michael
Both are currently controlled by the same option; they should be split into two options.
Also, SimGrid process tracing is buggy when a process is killed; see https://github.com/simgrid/simgrid/issues/285

https://gitlab.inria.fr/batsim/batsim/-/issues/74
add a "no more jobs in workload" event
2018-08-22T17:19:13+02:00 · MERCIER Michael
Schedulers that perform dynamic submission have to notify Batsim that their submission is finished.
However, without knowing whether jobs from the workload(s) will still be submitted in the future, the scheduler cannot tell a pause in the submissions from their end, and is therefore unable to take a decision. That is why a "no more jobs in workload" event is required.