Commit b382ed03 authored by Millian Poquet's avatar Millian Poquet

[doc] remove old doc, squeeze readme

parent 4e47e50f
......@@ -13,133 +13,11 @@ Batsim
[![changelog](https://img.shields.io/badge/doc-changelog-blue.svg)](https://github.com/oar-team/batsim/blob/master/doc/changelog.md)
[![protocol](https://img.shields.io/badge/doc-protocol-blue.svg)](https://github.com/oar-team/batsim/blob/master/doc/proto_description.md)
The next version of the Batsim documentation can be found
[on readthedocs](https://batsim.readthedocs.io/en/latest/) (**under construction !**).
Batsim is an infrastructure simulator that enables the study of resource management techniques.
It can be used for many scenarios:
- Compare various scheduling heuristics (research prototypes or real implementations)
- Study complex phenomena such as network interference, energy consumption (DVFS, shutdown…) or I/O data movements
Batsim is a batch scheduler simulator.
A batch scheduler -- AKA Resources and Jobs Management System (RJMS) --
is a system that manages resources in large-scale computing centers,
notably by scheduling and placing jobs, and by setting up energy policies.
Batsim is open source and distributed under the LGPL-3.0 license.
See [copyright](copyright) for more details.
Please refer to [Batsim's documentation](https://batsim.readthedocs.io/en/latest/) for more.
![Batsim overview figure]
Batsim simulates the behavior of a computing center.
It is designed so that any event-based scheduling algorithm can be plugged into it.
This makes it possible to compare decision algorithms coming from both the
production and academic worlds.
Getting started
---------------
The best way to start to use Batsim, or at least to see how it works, is to have
a look at the [Batsim demo](demo).
**Note**: Other workload and platform examples can be found in the
current repository. More sophisticated (and more up-to-date) platforms can be
found in the [SimGrid repository](https://github.com/simgrid/simgrid).
External References
-------------------
* Chapters 2 and 3 of Millian Poquet's
[PhD manuscript](https://mpoquet.github.io/research/2017-phd-manuscript.pdf)
explain in detail some of Batsim's design choices and how Batsim works
internally. The corresponding
[defense slides](https://mpoquet.github.io/research/2017-phd-slides.pdf)
may also interest you.
* The pre-print of Batsim's scientific publication is available on HAL:
https://hal.inria.fr/hal-01333471v1.
The corresponding [slides](./publications/Batsim\_JSSPP\_2016.pdf), made for
the JSSPP 2016 IPDPS workshop, may also help you understand what Batsim is
and decide whether it is of interest to you.
* Batsim code documentation can be found
[there](http://batsim.gforge.inria.fr/batsim/doxygen).
Quick links
-----------
- Please read our [contribution guidelines](CONTRIBUTING.md) if you want to
contribute to Batsim
- The [changelog](doc/changelog.md) summarizes information about the project
evolution.
- Tutorials show how to use Batsim and how it works:
- The [usage tutorial](doc/tuto_usage.md) explains how to execute a Batsim
simulation, and how to set up a development Docker environment
- The [time tutorial](doc/tuto_time.md) explains how the time is managed in a
Batsim simulation, shows essential protocol communications and gives an
overview of how Batsim works internally
- The [protocol documentation](doc/proto_description.md) defines the protocol
used between Batsim and the scheduling algorithms
Visualisation
-------------
Batsim output files can be visualised using external tools:
- [Evalys](http://evalys.readthedocs.io) can be used to visualise Gantt charts from the Batsim job.csv files
and SWF files
- [Vite] for the Pajé traces
Tools
-----
As Batsim simulations involve multiple processes, they may be tricky to manage.
Some tools already exist to achieve this goal:
- Python tools are located [there](./tools/experiments)
- a more robust and modular approach is being developed
[there](https://gitlab.inria.fr/batsim/batexpe) and is expected to deprecate
the aforementioned Python tools.
You can also find other tools in the [tools](./tools) directory,
for example to convert between the SWF and Batsim workload formats.
Write your own scheduler (or adapt an existing one)
---------------------------------------------------
Schedulers must follow a text-based protocol to communicate with Batsim.
More details about the protocol can be found in the [protocol description].
You may also base your work on existing Batsim-compatible schedulers:
- C++: [batsched][batsched gitlab]
- D: [datsched][datsched gitlab]
- Perl: [there][perl sched repo] (deprecated)
- Python: [pybatsim][pybatsim gitlab]
- Rust: [there][rust sched repo]
Installation
------------
### For users
You can install Batsim (and batsched) using one of the methods described in the
[Install and Run](doc/run_batsim.md) documentation page.
### For developers
It is highly recommended to use the method described in the
[Development environment](doc/dev_batsim.md) page to get everything set up and
running, from compilation to tests.
Executing complete experiments
------------------------------
If you want to run more complex scenarios, taking a look at our
[experiment tools](./tools/experiments) may save you some time! (They may be
deprecated in the future by [batexpe](https://gitlab.inria.fr/batsim/batexpe).)
[Batsim overview figure]: ./doc/batsim_rjms_overview.png
[./publications/Batsim\_JSSPP\_2016.pdf]: ./publications/Batsim_JSSPP_2016.pdf
[Evalys]: https://github.com/oar-team/evalys
[Vite]: http://vite.gforge.inria.fr/
[protocol description]: ./doc/proto_description.md
[oar3]: https://github.com/oar-team/oar3
[pybatsim gitlab]: https://gitlab.inria.fr/batsim/pybatsim
[batsched gitlab]: https://gitlab.inria.fr/batsim/batsched
[datsched gitlab]: https://gitlab.inria.fr/batsim/datsched
[rust sched repo]: https://gitlab.inria.fr/adfaure/schedulers
[perl sched repo]: https://github.com/fernandodeperto/batch-simulator
[batsim ci]: https://gricad-gitlab.univ-grenoble-alpes.fr/batsim/batsim/pipelines
# Deprecated
Batsim changelog has moved.
It can now be found [there](https://batsim.readthedocs.io/en/latest/changelog.html)
as part of the [batsim readthedocs](https://batsim.readthedocs.io/en/latest/index.html).
Batsim Continuous Integration
=============================
The continuous integration (CI) mechanism used in Batsim is based on Gitlab CI.
We are currently using the [GRICAD Gitlab server][GRICAD server]
for this purpose.
![gitlab-ci-arch](https://about.gitlab.com/images/ci/arch-1.jpg "Gitlab CI architecture")
Building and Testing Environment
--------------------------------
The CI uses a controlled Docker environment, which has been built with Kameleon
thanks to [this recipe](../environments/batsim_ci.yaml). This allows us to:
- improve separation of concerns
- avoid installing dependencies within the CI script
- test whether the Batsim Docker environment works, which is nice for users
Gitlab CI script
----------------
The script can be found [there](../.gitlab-ci.yml). It essentially:
- builds Batsim with clang, checking that no warning is thrown
- builds Batsim with gcc, checking that no warning is thrown
- tests whether Batsim works, running different tests via CMake and
Batsim experiment tools
- checks whether the code is fully documented via Doxygen
- deploys the code documentation on
[this gforge page](http://batsim.gforge.inria.fr/). To do so, some SSH
key management is done within the script.
Gitlab Project Configuration
----------------------------
Edit your project configuration page ([there for Batsim](https://gricad-gitlab.univ-grenoble-alpes.fr/batsim/batsim/edit))
and make sure that ``Pipelines`` are enabled.
Some additional CI related configuration can be done in the CI settings page
([there for Batsim](https://gricad-gitlab.univ-grenoble-alpes.fr/batsim/batsim/pipelines/settings)).
Runner Configuration
--------------------
Batsim currently uses Docker runners provided by the GRICAD Gitlab server.
However, as we previously used our very own machines to host the CI runners,
the rest of this section describes how we managed to do it.
First, install the gitlab-ci-runner on the machine which should execute the
various CI operations. It is probably in your favourite package manager, but
more detailed information can be found on
[the Gitlab CI runner installation manual](https://docs.gitlab.com/runner/install/).
When running a runner for the first time, you have to tell it some information
about the server and the project. This information is given
[there for Batsim](https://gricad-gitlab.univ-grenoble-alpes.fr/batsim/batsim/runners).
``` bash
sudo gitlab-ci-multi-runner register
# Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com/):
https://gricad-gitlab.univ-grenoble-alpes.fr/
# Please enter the gitlab-ci token for this runner:
[PROJECT-DEPENDENT TOKEN]
```
[GRICAD server]: https://gricad-gitlab.univ-grenoble-alpes.fr
# Introduction #
[The Batsim protocol](proto_description.md) is used to synchronise Batsim and
the Decision process. However, to keep this protocol as simple as possible,
metadata (notably associated with jobs) are shared via the Redis data storage.
This data storage mechanism associates *values* to *keys*.
These pairs are written (**set**) by Batsim, and can be read (**get**) by the
Decision process.
# Keys Prefix #
Since several Batsim instances can be run at the same time, all the keys
explained in this document must be prefixed by some instance-specific prefix.
At the moment, this prefix is set to the absolute filename of the socket used
in [the Batsim protocol](proto_description.md), followed by a colon ':'.
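
As an illustration of the prefix rule above, here is a minimal Python sketch. The helper name `prefixed_key` and the socket path are hypothetical; the only thing taken from this document is the rule itself (absolute socket filename, then a colon, then the key).

```python
import os


def prefixed_key(socket_path: str, key: str) -> str:
    """Return `key` prefixed by the absolute socket filename and a colon."""
    # Hypothetical helper: it only applies the prefix rule described above.
    return f"{os.path.abspath(socket_path)}:{key}"


# Made-up socket path, used only for illustration.
print(prefixed_key("/tmp/bat_socket", "nb_res"))
```

Two Batsim instances using different sockets would then read and write disjoint sets of keys, even when they share the same Redis server.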
# List of Used keys #
This document gathers the different keys used in the data storage.
## Platform size ##
The size of the simulation platform (the number of computing entities) is set
in the **nb_res** key. The value is a string representation of an unsigned
integer.
## Jobs' Information ##
The jobs are identified by a string.
Let's name **jID** the job identifier of job *j*.
As soon as *j* is submitted within Batsim, the corresponding job and profiles
are set, respectively into the **job_jID** and **profile_jID** keys.
The values associated with these keys are in JSON, detailed below.
### Job JSON details ###
Here is an example of a job JSON description string:
``` json
{
"id":1,
"subtime":10,
"walltime": 100,
"res": 4,
"profile": "2"
}
```
This string is a JSON object, which must contain the following keys:
- **id**, the job number within a given workload
- **subtime**, the minimum time at which the job can have been submitted
- **walltime**, the user-given upper bound on the job execution time
- **res**, the number of required resources
- **profile**, the profile associated with the job
More information can be added into the JSON object, depending on Batsim's
input files.
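
A decision process could check the mandatory keys before using a job description. The following Python sketch assumes the string has already been fetched from the data storage; the function name `parse_job` is hypothetical and not part of any Batsim library.

```python
import json

# Mandatory keys, as listed in this section.
REQUIRED_JOB_KEYS = ("id", "subtime", "walltime", "res", "profile")


def parse_job(job_json: str) -> dict:
    """Parse a job description string and check that mandatory keys exist."""
    job = json.loads(job_json)
    missing = [key for key in REQUIRED_JOB_KEYS if key not in job]
    if missing:
        raise ValueError(f"missing job keys: {missing}")
    # Extra keys are kept untouched, since input files may add more information.
    return job


example = '{"id": 1, "subtime": 10, "walltime": 100, "res": 4, "profile": "2"}'
job = parse_job(example)
print(job["res"], job["profile"])
```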
### Profile JSON details ###
A profile describes how a job should be computed. This information can be
used in clairvoyant schedulers. The profiles can be quite different depending
on the computation model you wish to use. Here are some examples of profile
strings:
``` json
{
"type": "parallel",
"cpu": [5e6,5e6,5e6,5e6],
"com": [5e6,5e6,5e6,5e6,
5e6,5e6,5e6,5e6,
5e6,5e6,5e6,5e6,
5e6,5e6,5e6,5e6]
}
```
``` json
{
"type": "parallel_homogeneous",
"cpu": 10e6,
"com": 1e6
}
```
``` json
{
"type": "composed",
"nb" : 4,
"seq": ["1","2","1"]
}
```
``` json
{
"type": "delay",
"delay": 20.20
}
```
# Setup a Development Environment
## Using Nix (**Recommended**)
See the [install and run](run_batsim.md) page to set up Nix with our
repository.
You can simply enter a shell that comes with all you need to build and
test Batsim with this command at the root of the repository:
```sh
cd batsim
nix-shell shell.nix
```
If it does not work (for old batsim versions) you can do:
```sh
nix-shell /path/to/kapack -A batsim_dev
```
This command will open a new Bash shell with all the environment variables set
correctly to find all the dependencies and build Batsim.
**NOTE**: You can use `nix-shell --pure` to avoid conflicts with already
installed tools.
**WARNING**: The environment created by the `nix-shell` command is heavily
based on environment variables injected in the provided Bash shell. Do NOT
switch to another shell (zsh, fish, ...) because the environment variables will not
be present and the build will fail.
Then you can configure and build Batsim with these commands:
```
rm -rf build
mkdir build
cd build
cmake .. $cmakeFlags
make -j $(nproc)
```
To run the tests, you need to start Redis in the same shell:
```sh
redis-server &
```
Or in another shell:
```sh
nix-shell -p redis --command redis-server
```
Finally, run the tests, excluding the remote test (which requires SSH access to the local machine):
```
ctest --output-on-failure -E 'remote'
```
## Using Docker (**DEPRECATED**)
If you need to change the code of Batsim, you can use the Docker environment ``oarteam/batsim_ci``
and use Docker volumes to make your version of the Batsim code available inside the container.
```bash
# launch a batsim container
docker run -ti -v /home/myuser/mybatrepo:/root/batsim --name batsim_dev oarteam/batsim_ci bash
```
Then, inside the container run the instructions provided in the following part.
With this setting, you can use your own development tools outside the
container to hack the Batsim code and use the container only to build
and test your code.
## Manual installation (not recommended)
Batsim uses [Kameleon](http://kameleon.imag.fr/index.html) to build controlled
environments. These environments allow us to generate Docker containers, which
are used by [our CI][batsim ci] to test
whether Batsim can be built correctly and whether some integration tests pass.
Thus, the most up-to-date information about how to build Batsim dependencies
and Batsim itself can be found in our Kameleon recipes:
- [batsim_ci.yaml](../environments/batsim_ci.yaml), for the dependencies (Debian)
- [batsim.yaml](../environments/batsim.yaml), for Batsim itself (Debian)
- Please note that [the steps directory](../environments/steps/) contains
subcommands that can be used by the recipes.
However, some information is also written below for the sake of simplicity, but
please note it might be outdated.
### Dependencies
Batsim dependencies are listed below:
- SimGrid. The development version is recommended (203ec9f99 for example).
To use SMPI jobs, use commit 587483ebe of
[mpoquet's fork](https://github.com/mpoquet/simgrid/).
To use energy, please consider using the Batsim upstream_sg branch and
SimGrid commit e96681fb8.
- RapidJSON (1.02 or greater)
- Boost 1.62 or greater (system, filesystem, regex, locale)
- C++11 compiler
- Redox (and its dependencies: hiredis and libev)
### Compile and Test
When you have set up your environment (see previous section), you can
go to the already cloned Batsim repository (or clone this repository)
and configure the build.
```sh
cd batsim_repo
rm -rf build
mkdir build
cd build
cmake ..
```
Now you can write your code (**note**: it is recommended to do it in a branch)
and add some tests. Then build and run the tests with:
```sh
make -j $(nproc)
make install
make test
```
# Batsim Job Profiles
In order to know what has to be simulated for each job of a workload, Batsim
uses the notion of job profile. Each job is associated with one profile, but a
profile can be associated with multiple jobs.
Each profile is defined in the workload JSON file in the ``profiles`` section.
The only field common to all profiles is ``type``. Here is a list of all the
profile types supported by Batsim, with an explanation of how they work and how
to use them.
**Note**: You can use scientific notation to represent big numbers, e.g.
``8e3`` for ``8x10^3``.
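
The Python sketch below illustrates both points: looking up the profile of a job in the ``profiles`` section, and reading a value written in scientific notation. The miniature workload is made up, and the top-level ``jobs`` key and the profile name ``wait`` are assumptions made for the example.

```python
import json

# Made-up miniature workload; the "jobs" key and the "wait" name are assumptions.
workload_text = """
{
  "jobs": [{"id": 1, "subtime": 0, "walltime": 100, "res": 4, "profile": "wait"}],
  "profiles": {"wait": {"type": "delay", "delay": 8e3}}
}
"""


def profile_of(workload: dict, job: dict) -> dict:
    """Return the profile referenced by `job` in the workload's profiles section."""
    return workload["profiles"][job["profile"]]


workload = json.loads(workload_text)
profile = profile_of(workload, workload["jobs"][0])
# The JSON parser reads 8e3 as the floating-point number 8000.0.
print(profile["type"], profile["delay"])
```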
## Delay
This is the simplest profile. No job execution is actually simulated: the job
simply lasts for a fixed amount of time. It does **NOT** take the platform into account at all.
### Example
Waiting for 20.20 seconds.
```json
{
"type": "delay",
"delay": 20.20
}
```
## Parallel task
This profile corresponds to a parallel task executed simultaneously on each node
allocated to the job.
### Parameters
- ``cpu``: a vector containing the amount of flops to be computed on
each node.
- ``com``: a vector containing the amount of bytes to be transferred between
nodes. You can see this vector as a matrix where the host in the row sends to the
host in the column. When the row equals the column, it is an intranode communication
using the local loopback interface.
### Example
```json
{
"type": "parallel",
"cpu": [5e6, 0, 0, 0],
"com": [5e6, 0, 0, 0,
5e6,5e6, 0, 0,
5e6,5e6, 0, 0,
5e6,5e6,5e6, 0]
}
```
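
To read the ``com`` vector as a matrix, it can be cut into rows of equal length. This Python sketch assumes a row-major layout (as the line breaks in the example suggest) and a square matrix whose side is the length of ``cpu``; the function name `com_matrix` is hypothetical.

```python
def com_matrix(profile: dict) -> list:
    """Split the flat `com` vector into rows: matrix[sender][receiver]."""
    n = len(profile["cpu"])
    com = profile["com"]
    if len(com) != n * n:
        raise ValueError(f"expected {n * n} values in com, got {len(com)}")
    # Assumption: values are stored row by row (row-major order).
    return [com[row * n:(row + 1) * n] for row in range(n)]


profile = {
    "type": "parallel",
    "cpu": [5e6, 0, 0, 0],
    "com": [5e6, 0, 0, 0,
            5e6, 5e6, 0, 0,
            5e6, 5e6, 0, 0,
            5e6, 5e6, 5e6, 0],
}
matrix = com_matrix(profile)
print(matrix[1][0])  # bytes sent by host 1 to host 0
print(matrix[0][0])  # loopback communication of host 0
```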
## Parallel homogeneous task
This model is a convenient way to generate homogeneous task computation and
communication. The loopback communication is set to 0.
### Parameters
- ``cpu``: the amount of flops to be computed by each node.
- ``com``: the amount of bytes to be sent and received by each node.
### Example
```json
{
"type": "parallel_homogeneous",
"cpu": 10e6,
"com": 1e6
}
```
## Parallel homogeneous task with total amount
This model is a convenient way to generate homogeneous task computation and
communication by giving the total amount of work to be done. The loopback
communication is set to 0. It gives this job the ability to be allocated on
any number of resources while keeping the same amount of work to do.
### Parameters
- ``cpu``: the total amount of flops to be computed, spread over all nodes: each
node will have ``cpu / number of nodes`` flops to compute.
- ``com``: the total amount of bytes to be sent and received, spread over all nodes: each
node will have ``com / number of nodes`` bytes to send and the same
amount to receive.
### Example
```json
{
"type": "parallel_homogeneous_total",
"cpu": 10e6,
"com": 1e6
}
```
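
The per-node amounts follow directly from the division described in the parameters. A minimal Python sketch, where the function name `per_node_amounts` and the allocation size of 4 nodes are made up for illustration:

```python
def per_node_amounts(profile: dict, nb_nodes: int) -> tuple:
    """Return (flops per node, bytes to send per node) for a given allocation."""
    if nb_nodes <= 0:
        raise ValueError("nb_nodes must be positive")
    # Each node gets an equal share of the total work.
    return profile["cpu"] / nb_nodes, profile["com"] / nb_nodes


profile = {"type": "parallel_homogeneous_total", "cpu": 10e6, "com": 1e6}
flops, sent_bytes = per_node_amounts(profile, 4)
print(flops, sent_bytes)
```

With the same profile, an allocation of 8 nodes would halve both per-node values, while the total amount of work stays constant.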
## Composed
This job profile is a list of profiles to be executed in a sequence.
### Parameters
- ``seq``: the list of profiles by name.
- ``repeat`` (optional): the number of times the sequence will be repeated (none by default).
### Example
```json
{
"type": "composed",
"repeat" : 4,
"seq": ["prof1","prof2","prof1"]
}
```
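
The execution order of a composed profile can be pictured by expanding the sequence. This Python sketch makes two assumptions that this document does not state explicitly: a ``repeat`` of 4 means the whole sequence is executed four times, and a missing ``repeat`` means it is executed once. The function name `expand_composed` is hypothetical.

```python
def expand_composed(profile: dict) -> list:
    """Return the flat list of profile names in execution order."""
    # Assumption: a missing "repeat" means the sequence is executed once.
    repeat = profile.get("repeat", 1)
    return list(profile["seq"]) * repeat


profile = {"type": "composed", "repeat": 4, "seq": ["prof1", "prof2", "prof1"]}
order = expand_composed(profile)
print(len(order), order[:3])
```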
## Homogeneous IO to/from a PFS storage (Parallel File System)
Represents an IO transfer between all the nodes of a job's allocation and a
centralized storage tier. The storage tier is represented by one node.
### Parameters
- ``bytes_to_read``: the amount of bytes to read from the PFS to each node.
- ``bytes_to_write``: the amount of bytes to write to the PFS from each node.
- ``storage`` (optional): the name of the storage. It will be mapped to a specific node at job
execution time. The default value is ``pfs``.
### Example
```json
{
"type": "parallel_homogeneous_pfs",
"bytes_to_read": 10e5,
"bytes_to_write": 10e5,
"storage": "nfs"
}
```
## IO staging between two storage tiers
This profile represents an IO transfer between two storage tiers.
### Parameters
- ``nb_bytes``: the amount of bytes to be transferred.
- ``from``: The name of the storage that sends. It will be mapped to a specific node at the job execution time.
- ``to``: The name of the storage that receives. It will be mapped to a specific node at the job execution time.
### Example
```json
{
"type": "data_staging",
"nb_bytes": 10e5,
"from": "pfs",
"to": "nfs"
}
```
# Deprecated
Batsim protocol documentation has moved.
It can now be found [there](https://batsim.readthedocs.io/en/latest/protocol.html)
as part of the [batsim readthedocs](https://batsim.readthedocs.io/en/latest/index.html).
DIA= $(wildcard *.dia)
SVG= $(DIA:.dia=.svg)
PDF= $(SVG:.svg=.pdf)
PNG= $(SVG:.svg=.png)
all: pdf png
pdf: ${PDF}
png: ${PNG}
%.svg: %.dia
	dia -e $@ -t svg $^
%.pdf: %.svg
	inkscape -A $@ $^ -b "#ffffff"
%.png: %.pdf
	inkscape -e $@ $^ -b "#ffffff"
clean:
	rm -f *.pdf *.png
# Install and Run batsim
**Important note**: It is highly recommended to install Batsim with the
provided methods because we are using a specific version of SimGrid and
up-to-date packages (like boost) that may not be easily available in your
distribution yet.
If you are looking for a development setup to be able to compile Batsim and
run the tests, see the [Setup a Development Environment](dev_batsim.md)
documentation.
## Install batsim with Nix (**Recommended**)
First, you need to install Nix. Don't worry, it is pretty straightforward:
```sh
curl https://nixos.org/nix/install | sh
```
Follow the instructions provided at the end of the script: you need to
source a file to get access to the Nix commands:
```sh
source ~/.nix-profile/etc/profile.d/nix.sh
```
Then, get our Nix repository that contains the batsim package:
```sh
git clone https://github.com/oar-team/kapack.git kapack
nix-env --file ./kapack --install batsim
```
Batsim is now available directly:
```sh
batsim --help
```
You can also install Batsched, the scheduler used for the tests and the
examples, with the same mechanism:
```sh
nix-env --file ./kapack -iA batsched
```
## Run batsim directly with docker (Deprecated)
A simple way to run Batsim is to use Docker directly, because you
have nothing to install or configure (except Docker itself).
For example, using the ``oarteam/batsim`` image:
```sh
docker run --net host -u $(id -u):$(id -g) -v $PWD:/data oarteam/batsim -p ./platforms/energy_platform_homogeneous_no_net_32.xml -w ./workload_seed20_200jobs.json -e seed20
```
To make it more understandable, here is the command decomposition:
- ``--net host`` to access external redis server (optional)
- ``--user $(id -u):$(id -g)`` to generate outputs with your own user permission instead of root permission
- ``--volume $PWD:/data`` to share your local folder with batsim so it can
find the platform file and so on: Batsim is running inside docker in the
``/data`` folder.
- ``oarteam/batsim`` image name (you can add a tag to get a specific version, like ``oarteam/batsim:1.2.0``)
- ``--platform plt.xml --workload wl.json ...`` add batsim parameters
Then you can run your own scheduler to make the simulation begin.
# Bonus :)
## Create the docker image with Nix
We use the [Nix package manager](https://nixos.org/nix/) to build a minimal
docker image for batsim.
Get the Nix repository that contains the batsim package [here](https://github.com/oar-team/kapack/):
```sh
git clone https://github.com/oar-team/kapack.git kapack
cd kapack
# For stable version
nix-build . -A batsimDocker
# For latest version from master head
nix-build . -A batsimDocker_git
```
Then you need docker to load the image:
```sh
cat result | docker load
# see it in docker
docker images
# add some tag