Protocol: Use flatbuffers instead of JSON?
For performance reasons, I would like to change Batsim's architecture:
- Do not force the use of a network socket (allow schedulers to be used as libraries instead)
- Enable the use of a custom, stupid and fast discrete event simulator (instead of SimGrid)
These two changes require some modularity within Batsim's code itself, notably to separate protocol de/serialization from the injection of events into the simulation. Currently, Batsim deserializes protocol messages manually (via calls to rapidjson functions) and directly injects events into the simulation. There are no C++ structures corresponding to the protocol messages yet, and writing them by hand would be quite painful.
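As a rough illustration of the separation described above, here is a minimal Python sketch (Batsim itself is C++; Python is only used to keep the example short). Everything in it is hypothetical: the ExecuteJob type, the Decoder interface, the inject function and the simplified JSON layout are assumptions made for the example, not Batsim's actual code or protocol definition. The point is only that the event injection code sees typed messages and never the wire format, so a FlatBuffers-based decoder could replace the JSON one without touching it.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Protocol


# Hypothetical, simplified message type (not Batsim's real protocol definition).
@dataclass(frozen=True)
class ExecuteJob:
    timestamp: float
    job_id: str
    alloc: str


class Decoder(Protocol):
    """Turns raw bytes received from the scheduler into typed messages."""

    def decode(self, payload: bytes) -> List[ExecuteJob]: ...


class JsonDecoder:
    """JSON implementation; a binary decoder would expose the same method."""

    def decode(self, payload: bytes) -> List[ExecuteJob]:
        doc = json.loads(payload)
        messages = []
        for event in doc["events"]:
            if event["type"] != "EXECUTE_JOB":
                raise ValueError(f"unsupported event type: {event['type']}")
            data = event["data"]
            messages.append(
                ExecuteJob(event["timestamp"], data["job_id"], data["alloc"])
            )
        return messages


def inject(
    messages: List[ExecuteJob],
    schedule: Callable[[float, ExecuteJob], None],
) -> None:
    """Hands typed messages to a simulator callback; knows nothing about JSON."""
    for msg in messages:
        schedule(msg.timestamp, msg)


if __name__ == "__main__":
    raw = (
        b'{"events": [{"timestamp": 10.0, "type": "EXECUTE_JOB", '
        b'"data": {"job_id": "w0!1", "alloc": "0-3"}}]}'
    )
    inject(JsonDecoder().decode(raw), lambda t, m: print(t, m))
```

With this split, the simulator backend is just the callback passed to inject, and the transport (socket or direct library call) only has to deliver bytes to a decoder.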
Proposal: Use a serialization library that can generate this tedious code for us. In particular, FlatBuffers seems to fit our needs nicely, as it focuses on performance and usability.
FlatBuffers in a nutshell
Open source (Apache license), developed and maintained by Google, already packaged on most distros (including NixOS).
Concept very similar to protobuf:
- Describe data structures in a domain specific language
- Generate source code (in the programming language of your choice) corresponding to the desired data structures, as well as serialization/deserialization functions.
- Use generated functions in your code.
Pros
- I think that protocol maintainability would be improved.
- Less boilerplate in Batsim and in schedulers, notably in Batsim since JSON is not pleasant to use in C++.
- All the data structures involved in the protocol would be defined in a single versioned file. Currently, this is split between Batsim's code (real code and C++ comments) and the Sphinx documentation.
- Propagating protocol updates to schedulers would become very simple, as most updates would consist of copying the new description file (from Batsim) to the various scheduler implementations.
- The Sphinx doc could focus on pedagogical aspects, rather than being forced to describe data structures (comments are of course possible in the description language).
- Not forced to use a binary format: JSON can still be used. This would allow a --json CLI flag so that Batsim generates JSON protocol messages instead of binary ones. This way, writing schedulers in funny languages (that have no FlatBuffers support yet) would remain possible.
- De/serialization performance would be greatly improved, which is consistent with the current focus.
Cons
- Big protocol break. Even though JSON would remain possible, the message format will most probably break.
- Some work needed to update the scheduler libraries. FlatBuffers is available for all our scheduler libraries (C++, Python and Rust from official support, D from an unofficial package), so the required amount of work would not be huge.
Concern 1: Is JSON usable?
Yes. Here are example codes to de/serialize into JSON with FlatBuffers. This seems simple: we just need some compilation magic to put the description file into Batsim's memory (e.g., as a C string) so that the --json variant is convenient to use.
Concern 2: Won't this make schedulers annoying to compile?
Not much. The FlatBuffers compiler deterministically generates source code in the target language. Let's assume the target language is Python. The generated Python files can be versioned in pybatsim, so that the project stays pure Python and the change remains transparent to users. In particular, it will not annoy users who rely on language-specific package managers: pip install pybatsim will not require installing the FlatBuffers compiler.
Access to the FlatBuffers compiler will still be required to update the de/serialization functions, but this is not a problem as we use Nix for our development environments.