Draft: Docker deploy

BIGAUD Nathan requested to merge docker-deploy into develop

Goals: containerization of declearn objects in two settings:

  • Production setting: containerization on hospital servers
  • Experimental setting: more realistic experiments, deployed at scale on Grid'5000

Production images (done)

Overview of implemented elements

  • Four Dockerfiles, one per framework plus a base one, which are used to build the images and push them to the container registry
  • An images.sh bash script, called by the GitLab CI, which builds and pushes the Docker images when changes are made
  • A heart-uci bash script, which runs docker containers for server and clients using the registry images and launches training
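As an illustration of the build-and-push step, here is a minimal sketch of what a loop like the one in images.sh could look like. It is not the actual script from this merge request: the registry path and the `declearn-<name>` / `<name>.Dockerfile` naming are taken from the manual commands in this description, the `run`, `build_and_push` and `build_all` helpers are hypothetical, and only the `base` and `torch` names appear in this description. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch of a build-and-push loop; NOT the actual images.sh from this merge request.

# Registry path and tag pattern copied from the manual commands in this description.
REGISTRY="registry.gitlab.inria.fr/magnet/declearn/declearn2"

# Run a command, or only print it when DRY_RUN is 1 (the default in this sketch).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and push one image, assuming a <name>.Dockerfile in the current folder.
build_and_push() {
    tag="${REGISTRY}:declearn-$1"
    run docker build -t "${tag}" -f "$1.Dockerfile" .
    run docker push "${tag}"
}

# Build images in the order given; "base" goes first on the assumption
# that the framework images are built on top of it.
build_all() {
    for name in "$@"; do
        build_and_push "${name}"
    done
}
```

Calling `build_all base torch` prints the two build commands and the two push commands; running it with `DRY_RUN=0` after `docker login registry.gitlab.inria.fr` would execute them.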

Workflow:

  • To run the heart-uci example, run sudo bash examples/heart-uci/docker.sh from the declearn2 folder

  • To run, say, a server script in a production setting:

    • Create a run.sh file with the commands you want your server to run once launched
    • Run:
    docker run -dt --gpus all --mount type=bind,source=$PWD,target=/experiment --name server registry.gitlab.inria.fr/magnet/declearn/declearn2:declearn-base
  • To manually build and push to the registry:

    docker login registry.gitlab.inria.fr
    docker build -t registry.gitlab.inria.fr/magnet/declearn/declearn2:declearn-torch -f torch.Dockerfile .
    docker push registry.gitlab.inria.fr/magnet/declearn/declearn2:declearn-torch
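The production steps above can be sketched as two small helpers. This is only an illustration under stated assumptions: `write_run_sh` and `server_cmd` are hypothetical names, `server.py` is a placeholder for whatever script the server should execute, and whether the image runs /experiment/run.sh on start depends on its entrypoint, which is not shown in this description.

```shell
# Sketch of the production steps; nothing here is taken from the repository.

# Write a minimal run.sh into the folder that will be bind-mounted at /experiment.
# "server.py" is a placeholder for the script the server should execute.
write_run_sh() {
    cat > "$1/run.sh" <<'EOF'
#!/usr/bin/env bash
set -e
cd /experiment
python server.py
EOF
    chmod +x "$1/run.sh"
}

# Print the docker run command for a source folder, container name and image variant.
# Whether the container executes /experiment/run.sh on start is an assumption
# about the image entrypoint.
server_cmd() {
    echo "docker run -dt --gpus all" \
        "--mount type=bind,source=$1,target=/experiment" \
        "--name $2" \
        "registry.gitlab.inria.fr/magnet/declearn/declearn2:declearn-$3"
}
```

For example, `write_run_sh "$PWD"` followed by `server_cmd "$PWD" server base` writes the script and prints the matching command, which can be reviewed before being run by hand.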

Large-scale test

Mock image

We have a first version, using mock containers, that scales up to 100 containers but is limited to a single host. To test it with 10 clients, simply run:

cd deploy_mock
bash build.sh 10
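The contents of build.sh are not shown in this description. As a hedged sketch of how a script taking a client count could work, the hypothetical `make_compose` function below prints a compose file with one server service and N client services. The service names and the `mock-image:latest` image are placeholders, not names from the repository.

```shell
# Hypothetical sketch: the real deploy_mock/build.sh is not shown in this description.

# Print a compose file with one server and N client services to stdout.
# IMAGE is a placeholder name, not an image from the registry.
make_compose() {
    n="$1"
    image="${IMAGE:-mock-image:latest}"
    printf 'services:\n'
    printf '  server:\n    image: %s\n' "${image}"
    i=1
    while [ "${i}" -le "${n}" ]; do
        printf '  client-%s:\n    image: %s\n    depends_on:\n      - server\n' "${i}" "${image}"
        i=$((i + 1))
    done
}
```

A possible use is `make_compose 10 > compose.yaml` followed by `docker compose -f compose.yaml up -d`, which starts all services on the local host.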

The next step is to extend this to several machines, deployed on Grid'5000. The goal is to find a tool that lets the docker compose file interact with a cluster as if it were a single machine. Some resources:

  • The most likely solution, as discussed on an email thread with Grid'5000 users, is to use Enos, which lets us book resources and abstracts away the multiplicity of hosts. It can also replace the compose.yaml file (see here), but this would require booking 1000 hosts.
  • Another possible solution is Docker Swarm, Docker's solution for multi-host applications. It is used, for instance, in this tutorial, and once set up it can deploy a compose file directly with docker stack deploy (documentation)
  • There may be a way to use both, as Enos seems to support Swarm
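To make the Swarm option more concrete, here is a minimal sketch of the commands involved in a multi-host rollout. The helper names, host address, token, stack name and service name are placeholders; the functions only print the docker commands, so nothing is executed. The sketch assumes the default Swarm management port (2377) and that stack services are named `<stack>_<service>`.

```shell
# Sketch of a multi-host rollout with Docker Swarm; addresses, the stack name
# and the compose file name are placeholders, and commands are only printed.

# Commands to run on the manager node ($1: address advertised to other nodes).
manager_steps() {
    echo "docker swarm init --advertise-addr $1"
    echo "docker swarm join-token -q worker"
}

# Command to run on each worker node ($1: manager address, $2: worker token).
worker_step() {
    echo "docker swarm join --token $2 $1:2377"
}

# Deploy a compose file as a stack, then scale one of its services
# ($1: compose file, $2: stack name, $3: service name, $4: replica count).
deploy_steps() {
    echo "docker stack deploy --compose-file $1 $2"
    echo "docker service scale $2_$3=$4"
}
```

For instance, `deploy_steps compose.yaml mock client 1000` prints the deploy command followed by `docker service scale mock_client=1000`. Scaling a single replicated service this way is one option; whether it fits the declearn client scripts is left open here.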

Real images

We then need to bring all the pieces together, using the system chosen above but replacing the source image and the template script by a declearn example.

TODO:

Production images

  • Build Dockerfile(s)
  • Build images
  • Build heart-uci script
  • No GPU access: solved on magnet5; magnet8 still requires some setup
  • Small issues
    • Clients do not write to results
    • Server does not close properly
  • Container registry :
    • Need to build image, need access to registry
    • Slim down the image from 8gb
    • Make sure docker.sh pulls from registry
    • Automate docker push in CI

Deploy on Grid'5000

  • Run 1000 mock containers
    • Build example (dataset and model)
    • Explore deployment on Grid'5000
    • Network sharing approach is not scalable; use Docker Swarm
  • Run 1000 actual declearn containers
  • Double check docker-image CI step works as intended
  • Document, lint, and merge
Edited by BIGAUD Nathan
