batsim issues
https://gitlab.inria.fr/batsim/batsim/-/issues
2021-05-20T16:13:33+02:00

https://gitlab.inria.fr/batsim/batsim/-/issues/118
Incoherent documentation of csv output
2021-05-20T16:13:33+02:00
Raphaël Bleuse
The latest documentation and the code are out of sync (rev 38068329).
see:
- https://gitlab.inria.fr/batsim/batsim/blob/380683295787300ed9458668c0ab24e0b3f08b0b/src/export.cpp#L970
- https://gitlab.inria.fr/batsim/batsim/blob/380683295787300ed9458668c0ab24e0b3f08b0b/docs/output-jobs.rst (or https://batsim.readthedocs.io/en/latest/output-jobs.html)

https://gitlab.inria.fr/batsim/batsim/-/issues/117
Fix csv columns order for simulation output
2021-05-20T16:58:18+02:00
Raphaël Bleuse
According to the documentation, the simulation results are formatted as csv files (see https://batsim.readthedocs.io/en/latest/output-schedule.html and https://batsim.readthedocs.io/en/latest/output-jobs.html).
There is, however, no guarantee that the order of the columns and their types will not change.
Parsing results would be more robust if Batsim guaranteed such properties for the output format.
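Until such a guarantee exists, a parser can avoid depending on column order by selecting columns by name. The following is a minimal Python sketch, assuming the csv file starts with a header row. The column names `job_id` and `waiting_time` and the sample data are illustrative assumptions only, not a statement of what Batsim actually writes.

```python
import csv
import io

def read_columns(csv_text, wanted):
    """Return rows as dicts restricted to `wanted`, regardless of column order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if a required column is absent from the header.
    missing = [name for name in wanted if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]

# Two hypothetical outputs holding the same data with a different column order.
sample_a = "job_id,waiting_time\n1,0.5\n"
sample_b = "waiting_time,job_id\n0.5,1\n"
```

Both samples yield the same rows when read through `read_columns`, which is the property a column-order regression test could check.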
This mainly requires documenting the order of columns and how it may evolve, and writing some tests to avoid any unintentional regressions.
Millian Poquet

https://gitlab.inria.fr/batsim/batsim/-/issues/115
Package Batsim in some distros
2022-01-20T05:43:40+01:00
Millian Poquet
We only provide Nix packages for now. Packaging Batsim in some distros would lower the entry cost for some users.
NixOS
=====
Everything is already packaged; we just have to put these packages in nixpkgs and then push batsim.
We could also just stay in kapack (slowly moving stuff to our NUR).
Archlinux: Mostly Done
======================
All deps are now available (either in classical repos or in AUR).
A PKGBUILD similar to this one should work, I'll publish it for next Batsim release.
```
# Maintainer: Millian Poquet <millian.poquet@gmail.com>
pkgname=batsim
pkgver=b0f59fd35a49aa331877b30d544a1e3afa4f86ff
pkgrel=1
pkgdesc='An infrastructure simulator that enables the study of resource management techniques.'
arch=('i686' 'x86_64')
url='https://framagit.org/batsim/batsim'
license=('LGPL-3.0')
source=('https://framagit.org/batsim/batsim/-/archive/b0f59fd35a49aa331877b30d544a1e3afa4f86ff/batsim-b0f59fd35a49aa331877b30d544a1e3afa4f86ff.tar.gz')
depends=('simgrid' 'boost' 'intervalset' 'rapidjson' 'pugixml' 'zeromq' 'redox-pkgconfig' 'docopt')
makedepends=('meson' 'ninja' 'pkgconf' 'gtest')
md5sums=('0e8057d057e3d616918b9a27742e490b')
build() {
  cd "${srcdir}/${pkgname}-${pkgver}"
  # Configure the build directory, then compile.
  meson setup --prefix=/usr build
  ninja -C build
}
check() {
  cd "${srcdir}/${pkgname}-${pkgver}"
  meson test -C build
}
package() {
  cd "${srcdir}/${pkgname}-${pkgver}"
  # Install into the packaging root rather than the live system.
  DESTDIR="${pkgdir}" meson install -C build
  rm -rf build
}
```
- [x] boost
- [x] rapidjson
- [x] simgrid: AUR https://aur.archlinux.org/packages/simgrid/
- [x] redox: AUR https://aur.archlinux.org/packages/redox-pkgconfig/
- [x] hiredis
- [x] libev
- [x] libzmq: zeromq
- [x] docopt
- [x] intervalset: AUR https://aur.archlinux.org/packages/intervalset
- [x] pugixml
Debian: **NOPE**
================
Dependencies:
- [x] simgrid: https://packages.debian.org/sid/libsimgrid-dev (3.25 should be available soon)
- [x] boost
- [x] rapidjson: https://packages.debian.org/sid/rapidjson-dev
- [ ] redox: NOPE
- [x] hiredis: https://packages.debian.org/sid/libhiredis-dev
- [x] libev: https://packages.debian.org/sid/libev-dev
- [x] libzmq
- [x] docopt: https://packages.debian.org/sid/libdocopt-dev
- [ ] intervalset: TO DO?
- [x] pugixml: https://packages.debian.org/sid/libpugixml-dev
Millian Poquet

https://gitlab.inria.fr/batsim/batsim/-/issues/114
Enable dynamic host/link cumulative usage probing
2021-08-31T19:38:53+02:00
Millian Poquet
Goal
====
Let the scheduler measure host or link usage so it can adapt its decisions depending on saturation, etc.
Implementation plan
===================
Create SimGrid plugins to monitor hosts/links usage
---------------------------------------------------
- Accumulate usage on state change (examples in energy plugins)
- Define a reset() function in the plugin API (to reset counters to 0)
- (perf issue: dynamically enable the plugin for the specified resources rather than for all of them all the time)
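
The accumulate-on-state-change idea and the reset() function can be sketched as follows. This is plain Python for illustration only: it is not the SimGrid plugin API (a real plugin would be written in C++), and the class and method names are hypothetical.

```python
class UsageAccumulator:
    """Accumulate the usage of one resource between state changes."""

    def __init__(self, now=0.0, rate=0.0):
        self.last_update = now  # time of the last state change
        self.rate = rate        # current usage rate (e.g. bytes per second)
        self.total = 0.0        # usage accumulated since the last reset

    def on_state_change(self, now, new_rate):
        # Account for the time spent at the previous rate, then switch rate.
        self.total += self.rate * (now - self.last_update)
        self.last_update = now
        self.rate = new_rate

    def probe(self, now, reset=False):
        # Bring the counter up to date without changing the rate.
        self.on_state_change(now, self.rate)
        value = self.total
        if reset:
            self.reset()
        return value

    def reset(self):
        # Set the counter back to 0; the current rate is kept.
        self.total = 0.0
```

Keeping one accumulator per monitored resource, created only when a resource is first probed, would be one way to address the performance concern noted in the last bullet.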
Expose it in the batprotocol
----------------------------
Something like:
```json
{
  "timestamp": 10.0,
  "type": "QUERY",
  "data": {
    "requests": {
      "consumed_bytes": {
        "resources": "link42",
        "reset_after_probe": true
      }
    }
  }
}
```
```json
{
  "timestamp": 10.1,
  "type": "ANSWER",
  "data": {
    "requests": {
      "consumed_bytes": {
        "link42": 4096
      }
    }
  }
}
```
Convenient and powerful probes

https://gitlab.inria.fr/batsim/batsim/-/issues/113
CI : cachix push seems broken
2020-02-19T15:46:41+01:00
Millian Poquet
Jobs rebuild some dependencies (gtest, batexpe...) each time, which should not happen.

https://gitlab.inria.fr/batsim/batsim/-/issues/112
EXECUTE_JOB: mapping with non-smpi profiles
2022-01-20T05:37:57+01:00
Millian Poquet
The `mapping` optional field of the `EXECUTE_JOB` event should work for various job profiles, but the current protocol doc says that it only works for `smpi` ones.
I think the doc is wrong, but it would be nice to test such cases (and put them under CI).

https://gitlab.inria.fr/batsim/batsim/-/issues/111
CI : force warning check on clang+gcc
2019-06-05T00:56:24+02:00
Millian Poquet

https://gitlab.inria.fr/batsim/batsim/-/issues/110
Protocol doc: Automatize JSON examples
2022-01-20T06:04:12+01:00
Millian Poquet
Currently (669c383), examples of protocol events are hardcoded in the RST documentation file.
This increases the likelihood of documentation/implementation mismatch.
It would be better to do the same as in the tutorials:
- Generate files from a reproducible simulation.
  Here, at least one JSON file per event type.
- Include the generated files instead of hardcoding examples in the RST file.
- Check in CI that there is no mismatch:
  - Run the simulations and get a new copy of the example files.
  - Make sure the result files match the ones included in the doc.
5.0.0
Millian Poquet

https://gitlab.inria.fr/batsim/batsim/-/issues/109
Help scheduler error diagnostic on SimGrid deadlock
2022-01-20T15:41:22+01:00
Millian Poquet
It is common to face SimGrid deadlocks in Batsim when developing a scheduler.
As SimGrid now enables the reaction to deadlock events thanks to the `on_deadlock` callback,
we should print a more user-friendly error to users when it happens.
- Common mistakes
- Current scheduler state
- ...
5.0.0

https://gitlab.inria.fr/batsim/batsim/-/issues/108
Kill a sequence of delays: crash with "Internal error"
2020-07-29T18:05:27+02:00
Millian Poquet
**Describe the bug**
Killing a sequence of delays can make Batsim crash with an "Internal error"
**Provide information so the bug can be reproduced**
- Grab a copy of Batsim 19b6386 (framagit, pytest branch)
- Replace `pytest.xfail("something seems wrong with sequences")` by `pass` in `./test/test_kill.py`.
- Run `nix-build ./release.nix -A integration_tests`
- Two tests from `test_kill.py` should fail as expected.
Test report can be opened with `firefox ./result/pytest_report.html`
**Logs**
```
...
[master_host:Scheduler REQ-REP:(4) 0.000045] [network/INFO] Received '{"now":10.000045,"events":[{"timestamp":0.000045,"type":"EXECUTE_JOB","data":{"job_id":"d3e758!1","alloc":"0"}},{"timestamp":10.000045,"type":"KILL_JOB","data":{"job_ids":["d3e758!1"]}}]}'
[Bourassa:job_d3e758!1:(5) 0.000060] [jobs_execution/INFO] Sleeping the whole task length
[Bourassa:job_d3e758!1:(5) 10.000060] [jobs_execution/INFO] Sleeping done
[Bourassa:job_d3e758!1:(5) 10.000060] [jobs_execution/INFO] Sleeping the whole task length
[master_host:killer_process:(6) 10.000060] /tmp/nix-build-batsim-3.0.0.drv-2/batsim/src/jobs.cpp:136: [root/CRITICAL] Internal error
(backtrace not set -- did you install Boost.Stacktrace?)
/home/carni/proj/batsim/test/test-out/kill-after10s-killer-small-delaysequences-noredis/cmd/batsim.bash: line 1: 15623 Aborted (core dumped) batsim -p '/home/carni/proj/batsim/platforms/small_platform.xml' -w '/home/carni/proj/batsim/workloads/test_sequence_delay.json' -e '/home/carni/proj/batsim/test/test-out/kill-after10s-killer-small-delaysequences-noredis/batres' --forward-profiles-on-submission
```
[pytest.log](/uploads/4d8fc33c42294b6c664be0c04c0c7097/pytest.log)

https://gitlab.inria.fr/batsim/batsim/-/issues/107
no-shed option is not working as expected
2022-01-20T05:47:15+01:00
MERCIER Michael
The `--no-sched` CLI option documentation reads:
```
If set, the jobs in the workloads are
computed one by one, one after the other,
without scheduler nor Redis.
```
But currently, all the jobs launch at time 0 and share the resources.
I've made some changes to make the jobs start at their submission time and not before, but we still have resource sharing and all the jobs are placed on the first hosts and not dispatched on the resources or queued.
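One possible reading of the documented "one by one, one after the other" behavior is that each job starts at its submission time or when the previous job finishes, whichever is later. The Python sketch below illustrates that reading only. It is not the patch from the branch mentioned here, and the job tuples with known durations are assumptions made for the example.

```python
def sequential_schedule(jobs):
    """Compute (start, finish) per job when jobs run one after the other.

    `jobs` is a list of (job_id, submit_time, duration) tuples.
    """
    schedule = {}
    machine_free_at = 0.0
    # Serve jobs in submission order, never before their submission time.
    for job_id, submit_time, duration in sorted(jobs, key=lambda j: j[1]):
        start = max(submit_time, machine_free_at)
        finish = start + duration
        schedule[job_id] = (start, finish)
        machine_free_at = finish
    return schedule
```

With this policy no two jobs ever overlap, so there is no resource sharing by construction.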
The question is: what should we do? The `no-sched-fix` branch contains my patch and this is the behavior I wanted, but maybe we should consider accepting one of several very simple policies as an argument to the `--no-sched` option...

https://gitlab.inria.fr/batsim/batsim/-/issues/105
Add a progress bar
2020-01-07T15:16:14+01:00
MERCIER Michael
We can have a simple progress bar output by default, based on the number of jobs to be submitted by workload for example.
Something like:
```sh
Workload 1: 75% [|||||||||||||||||||||||||||||||||||||||||| ]
Workload 2: 72% [|||||||||||||||||||||||||||||||||||||| ]
```

https://gitlab.inria.fr/batsim/batsim/-/issues/104
Add a new batsim output file that contains the simulation begins data
2022-01-20T06:26:13+01:00
MERCIER Michael
Some of the information available in the simulation begin message is not exported by Batsim but is needed for post-analysis (e.g., workload hash/filename mapping, configuration, resource list and properties, etc.).
I propose that Batsim dumps the content of this message directly in a json file, so it can be used afterward without parsing simulation logs (this is what I'm doing right now...).
I propose `<prefix>_metadata.json`, so we can extend this with any kind of information in the future.

https://gitlab.inria.fr/batsim/batsim/-/issues/103
Rework data staging job profiles
2022-01-20T08:59:19+01:00
MOMMESSIN Clement
There are multiple problems with this profile:
- At the moment, the alloc (`res` in the `EXECUTE_JOB` protocol event) may contain a number of resources other than 2.
- The communication matrix seems to be inverted when the `from` resource id is greater than the `to` resource id (e.g., for a communication from resource 14 to resource 12), BUT the simulated communication goes in the correct direction.
- Some other ugly stuff that @mmercier can explain better?

https://gitlab.inria.fr/batsim/batsim/-/issues/102
Add replay of machine failures
2019-01-28T13:27:52+01:00
MOMMESSIN Clement
It would be great to have a mechanism to replay machine failures during simulation.
This could be implemented in a similar way as for static submission of jobs using workload input files.
An example of input JSON file:
```
{
  "failures": [
    {"machine_id": 0, "failure_start": 0, "failure_end": 1000},
    {"machine_name": "Foo", "failure_start": 100, "failure_end": 5000}
  ]
}
```
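
A loader for such a file could look like the Python sketch below. It assumes the input is strict JSON (quoted keys, no trailing commas) and that each item names its host by exactly one of `machine_id` or `machine_name`. Both the function name and the validation rules are assumptions made for this example, since the format is only a proposal.

```python
import json

def load_failures(text):
    """Parse and validate a failure-replay description (hypothetical format)."""
    failures = json.loads(text)["failures"]
    for item in failures:
        # Each item must identify the host by exactly one of id or name.
        if ("machine_id" in item) == ("machine_name" in item):
            raise ValueError(f"need exactly one of machine_id/machine_name: {item}")
        if item["failure_start"] >= item["failure_end"]:
            raise ValueError(f"empty failure interval: {item}")
    # Replay order: earliest failure first.
    return sorted(failures, key=lambda f: f["failure_start"])

example = """
{
  "failures": [
    {"machine_name": "Foo", "failure_start": 100, "failure_end": 5000},
    {"machine_id": 0, "failure_start": 0, "failure_end": 1000}
  ]
}
"""
```

Sorting by `failure_start` gives the order in which the failures would be injected during the simulation.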
Each item in the `failures` list gives the name or id of the SG host and the time interval during which the failure occurs.

https://gitlab.inria.fr/batsim/batsim/-/issues/101
set_receiver should be called on most (all?) mailboxes
2018-12-14T15:26:31+01:00
Millian Poquet
[SimGrid doc](https://simgrid.frama.io/simgrid/app_s4u.html#_CPPv3N7simgrid3s4u7Mailbox12set_receiverE8ActorPtr)

https://gitlab.inria.fr/batsim/batsim/-/issues/100
Job ids appear twice in REGISTER_JOB
2022-01-20T06:18:01+01:00
MOMMESSIN Clement
As discussed during the last meeting, the id of a job appears twice in a `REGISTER_JOB` event:
- a `job_id` field in the `data` field of the event
- an `id` field in the job description (`data[job]`) of the event
Second point: the `id` field is sometimes of the form `wload_name!id` and sometimes just `id`.
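To illustrate the two forms, here is a small Python sketch that accepts both `wload_name!id` and a bare `id`. The function name and the `default_workload` parameter are hypothetical. This only shows how a consumer could cope with the inconsistency, not how Batsim handles it.

```python
def split_job_id(job_id, default_workload=None):
    """Split 'wload_name!id' into (workload, id); accept a bare 'id' too."""
    workload, sep, number = str(job_id).partition("!")
    if not sep:
        # Bare identifier: no workload prefix was given.
        return default_workload, workload
    return workload, number
```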
The discussion finished with "we should get rid of the `id` field in the job description", is that correct?
5.0.0

https://gitlab.inria.fr/batsim/batsim/-/issues/99
Improve column names of _jobs.csv
2018-12-11T14:25:25+01:00
Millian Poquet
As @mmercier said, some fields of _jobs.csv are misleading and should be improved, such as `allocated_processors`.
Batsim 3.0

https://gitlab.inria.fr/batsim/batsim/-/issues/98
Expose SimGrid log options to Batsim CLI
2018-11-30T14:34:02+01:00
Millian Poquet
Similarly to `--sg-cfg`, we should expose a `--sg-log` command-line option.

https://gitlab.inria.fr/batsim/batsim/-/issues/97
SMPI profiles do not handle absolute trace filenames
2019-01-15T23:58:54+01:00
Millian Poquet
Just tried and it gave me this error:
`
[0.000000] /home/carni/proj/batsim/src/profiles.cpp:621: [root/CRITICAL] Invalid JSON: profile 'LU.S.4' has an invalid 'trace' field ('/tmp/simgrid-template-smpi/NPB3.3-MPI/LU.S.4'), which leads to a non-existent file ('/tmp/meh/.//tmp/simgrid-template-smpi/NPB3.3-MPI/LU.S.4')
`
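
The error message suggests that the `trace` field is always appended to the workload directory. A path resolution that accepts both relative and absolute filenames could follow the logic below. This is a Python illustration of the idea only (Batsim itself is C++), the function name is hypothetical, and POSIX-style paths are assumed.

```python
import os.path

def resolve_trace_path(workload_dir, trace):
    """Resolve a profile's trace filename against the workload directory."""
    # os.path.join discards earlier components when `trace` is absolute,
    # so absolute filenames are returned unchanged (apart from normalization).
    return os.path.normpath(os.path.join(workload_dir, trace))
```

Relative filenames are still resolved against the workload directory, so existing workloads would keep working.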