    refactor the management of swarm/sandbox resources · 0e301e74
    BAIRE Anthony authored
    - add SwarmAbstractionClient: a class that extends docker.Client and
      hides the differences between the Docker remote API and the swarm
      API, so that a single Docker engine can be used like a swarm
    
    - add SharedSwarmClient: a class that extends SwarmAbstractionClient,
      monitors the swarm health and its resources (cpu/mem), and manages
      resource allocation.
      - resources are partitioned into groups (to allow reserving
        resources for higher-priority jobs)
      - two SharedSwarmClient instances can work together over TCP in a
        master/slave configuration (to allow the production and
        qualification platforms to share the same swarm without
        interfering with each other)
    
    - the controller is modified to:
      - use SharedSwarmClient to:
        - wait for the end of a job (in place of DockerWatcher)
        - manage resource reservation (LONG_APPS vs. BIGMEM_APPS vs.
          normal apps) and monitor swarm health (fix #124)
        - NOTE: the resources of the swarm and of the sandbox are now
          managed separately (2 instances of SharedSwarmClient); they
          were previously managed globally, which was suboptimal
      - rely on SwarmAbstractionClient to compute the cpu quotas
      - store the container_id of jobs in the DB (fix #128); this is a
        prerequisite for renaming apps in the future
      - store the class of the job (normal vs. long app) in the container
        name (for the resource management with SharedSwarmClient)
      - read the configuration from a YAML file (/vol/ro/config.yml) for:
        - cpu/mem quotas
        - swarm resource allocation policy
        - master/slave configuration
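
To illustrate how partitioned resource groups and the job class stored in the container name could fit together, here is a minimal sketch. It is not the code from this commit: `container_name`, `job_class_of`, `ResourceGroup`, `GroupedAllocator`, the `job-<class>-<id>` naming scheme, and the class labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical class labels; the commit only mentions LONG_APPS,
# BIGMEM_APPS and normal apps.
JOB_CLASSES = ("normal", "long", "bigmem")


def container_name(job_id: int, job_class: str) -> str:
    """Encode the job class in the container name (assumed naming scheme)."""
    if job_class not in JOB_CLASSES:
        raise ValueError(f"unknown job class: {job_class!r}")
    return f"job-{job_class}-{job_id}"


def job_class_of(name: str) -> str:
    """Recover the job class from a name built by container_name()."""
    prefix, job_class, _job_id = name.split("-", 2)
    if prefix != "job" or job_class not in JOB_CLASSES:
        raise ValueError(f"unexpected container name: {name!r}")
    return job_class


@dataclass
class ResourceGroup:
    cpu: int           # CPU cores set aside for this group
    mem: int           # memory set aside for this group, in MiB
    used_cpu: int = 0
    used_mem: int = 0

    def try_reserve(self, cpu: int, mem: int) -> bool:
        """Reserve resources if the group still has room; report success."""
        if self.used_cpu + cpu > self.cpu or self.used_mem + mem > self.mem:
            return False
        self.used_cpu += cpu
        self.used_mem += mem
        return True

    def release(self, cpu: int, mem: int) -> None:
        """Give resources back to the group, never going below zero."""
        self.used_cpu = max(0, self.used_cpu - cpu)
        self.used_mem = max(0, self.used_mem - mem)


class GroupedAllocator:
    """Route each reservation to the group matching the container's class."""

    def __init__(self, groups: Dict[str, ResourceGroup]) -> None:
        self.groups = groups

    def reserve(self, name: str, cpu: int, mem: int) -> bool:
        return self.groups[job_class_of(name)].try_reserve(cpu, mem)

    def release(self, name: str, cpu: int, mem: int) -> None:
        self.groups[job_class_of(name)].release(cpu, mem)
```

Because each class draws from its own group, a burst of long jobs in this sketch cannot consume the capacity set aside for normal jobs, which is the property the partitioning above is meant to provide.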
controller.py 57.1 KB