Orchestra is a project that aims at providing a set of simple tools to help in performing experimental campaigns in computer science. In particular, we would like to address the following points:

- Running a large set of experiments on a large cluster should be as simple as running one such experiment on your laptop.
- Hyper-parameter search should be left to algorithms rather than students.
- Experimental code should be versioned, and results should be linked to the code version.
- Collaboration on experimental campaigns should be hassle-free.
- Newcomers should be able to crank up their campaign in a day.
A lot of advanced tools exist to handle similar situations. Most of them target very complicated workflows, e.g. DAGs of tasks. Those tools are very powerful but lack the simplicity needed by newcomers. Here, we propose limited but very simple tools to handle one of the most common situations of experimental campaigns: the repeated execution of an experiment on variations of parameters.

To address those points, we work on a few tools developed on the same backbone library:
- expegit: a command-line tool to organize an experimental campaign as a git repository using git-lfs (large file storage), allowing code versioning, results versioning, and collaboration.
- runaway: a command-line tool to execute code on distant hosts such as clusters or clouds, as simply as possible.
- parasearch: a command-line tool to automate hyper-parameter search using a few common algorithms.
- orchestra: a tool to manage the whole lifecycle of an experimental campaign, automating the use of the preceding tools through a simple web ui.
## Status

In 2018, a first prototype of these tools was written, but a foundational design decision prevented them from scaling to massive campaigns. The 0.1.0 release introduced a new backbone library that allows the expected performance. Only a new version of runaway shipped with this release, and we are currently working toward updating the other tools. The project is under active development, and breaking changes should be expected until mid 2020.

## Current release

The current release of the orchestra tools is 0.1.

## Quick Start

The preferred way to use orchestra is through the web interface, but if you prefer the command line, you should be able to go through the following steps after setting up the tools.

Assuming you have a repository containing your experimental code, create a run handle:

```bash
$ cd my_experiment
$ ln -s my_script.py run && chmod +x run
$ git add run && git commit -m "Adds orchestra run handle" && git push
```

Orchestra can then create a campaign repository at a new address for us.
### Sending and Fetching Files with Runaway

As you can imagine, sending and fetching the whole folder back and forth can be resource-consuming. Runaway proposes two ways to mitigate this:
+ Code reuse: the code is sent to the remote as a tarball archive, which is compared with the archives already present on the remote. If the remote already contains this archive, the sending is skipped and the existing archive is used directly. By default, runaway cleans the whole remote execution folder at the end of the run (`--leave=nothing`), but you can leave either the code or the code and the results with `--leave=code` and `--leave=everything` respectively.
+ File ignoring: when packing the files to be sent to the remote, files and folders can be excluded with a `.sendignore` file. This is nothing but a simple text file containing patterns and globs, in the same way as a `.gitignore` file. For example, if you are at the root of a git repository, it is usually worth adding the `.git` folder to the `.sendignore`. The same filtering can be configured for fetching the results back, in the `.fetchignore` file. In an expegit execution, you can configure it to only fetch the `data` folder, for example. Beware not to ignore your `.fetchignore` in your `.sendignore`!
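As an illustration, the two ignore files could be set up like this. The patterns, and the commented invocation at the end, are hypothetical sketches to adapt to your own project:

```shell
# Exclude bulky or irrelevant files from the archive sent to the remote.
cat > .sendignore <<'EOF'
.git
__pycache__
*.pyc
EOF

# Exclude the code from the results fetched back, so that (in an expegit
# execution, for instance) mostly the data folder comes back.
cat > .fetchignore <<'EOF'
run
*.py
EOF

# Hypothetical invocation (the subcommand and profile name are assumptions;
# the --leave flag is described above):
# runaway exec my_cluster run --leave=code
```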
### Automating Expegit and Runaway with Orchestra
The two previous tools handle the results and the run of a single experiment execution. In the case of an experimental campaign, we are, in general, interested in automating those steps to run large batches of experiments. Orchestra automates the creation and execution of expegit executions through a simple interface. You can choose to use orchestra only via the command line, but the preferred way is the web ui:
```bash
$ orchestra -v gui
```
This will launch an application that allows you to generate batches of executions, and to monitor and access the results. Every image (plots, gifs, ...) is rendered in the execution report at `https://localhost:8088/executions/{id}`. This allows you to quickly check the results of a single experiment. As you may expect, the results are automatically synchronized with the remote repository.
Moreover, thanks to the use of git to store the results, the same remote repository can be fed results by multiple Orchestra instances. This allows you to execute experiments from different machines which may not have access to the same computational resources. One such case arises when multiple people collaborate on an experimental campaign with access to different resources. This quickly becomes relevant on clusters, since your number of simultaneous executions may be limited by your user account. With Orchestra, you can multiply this limit by running executions from the computers of the different people working on your campaign.
Among other things, the new backbone library introduced in 0.1.0 brings:

- A shift to `futures`-based concurrency in the whole codebase. From the ssh connection to the resource allocation, slot acquisition, and repository interactions, every blocking operation involved in an execution was made non-blocking. This allows executing concurrently as many executions as allowed by the different resources (ssh connections, schedulers, nodes), rather than by the task scheduler.
- A concurrent model of cluster schedulers. The scheduling of executions, once deferred to remote schedulers such as Slurm, is now managed concurrently from the library. This allows substituting for the platform queue, locally and concurrently.
- A fine-grained slot acquisition model. The placement of execution processes on nodes, once deferred to the scheduler, is now managed by the library. This means that we can acquire 10 nodes and place any number (e.g. the number of threads) of execution processes on every single node independently and concurrently. This allows for a much more intensive use of the acquired resources, while retaining the benefit of managing executions one by one.
- Much more :)
## Install

To install the tools, go to the [release page](https://gitlab.inria.fr/apere/orchestra/wikis/home) of the wiki. There you'll be able to download the binaries of the tools, which you only need to make available in your PATH. As of today, only linux and osx binaries are available.
## Todo
Orchestra is far from being complete. Here are some ideas which may be implemented in the future:
+ Automating the execution of campaign-wide analyses when new results arrive
+ Automating the generation of runs based on the existing results, using exploration, bayesian optimization, or diversity search
+ ...
## About
The orchestra tools are written in Rust. Though this choice was led by pure curiosity at first, the language proved quite handy for producing reliable tools. Besides its strong emphasis on speed and memory safety, Rust benefits from a set of nice abstractions that enforce good practices in software development (ownership, Options, Results, etc.). Moreover, when the right library is available, writing code in Rust takes no longer than with a scripting language such as Python. Because of its young age, Rust does not provide such high-level libraries for `git` and `ssh`, which are