1. 14 Nov, 2017 7 commits
    • extend shutdown case (in test_manager_futures) · 9f37db99
      BAIRE Anthony authored
      - add a pending task
      - add a timeout
    • factorisation · 5173f5e1
      BAIRE Anthony authored
      (disable_future_warning)
    • add --verbose · 6a2a886f
      BAIRE Anthony authored
      console log level:
       (default)    ->  WARNING
       -v/--verbose ->  INFO
       -d/--debug   ->  DEBUG
      
      log files:
        /vol/log/controller.log   -> INFO
        /vol/log/debug.log        -> DEBUG  (enabled only with -d/--debug or
                                     when the env var DEBUG is set)
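      A minimal sketch of how the levels listed in this commit message could
      be wired with the standard logging and argparse modules. This is not
      the controller's actual code: the function names (build_parser,
      console_level, debug_log_enabled, configure_logging) are hypothetical,
      and "DEBUG is set" is assumed to mean "present in the environment,
      whatever its value".

```python
import argparse
import logging
import os


def build_parser():
    """Parser exposing only the two flags named in the commit message."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-d", "--debug", action="store_true")
    return parser


def console_level(args):
    """Console level: WARNING by default, INFO with -v, DEBUG with -d."""
    if args.debug:
        return logging.DEBUG
    if args.verbose:
        return logging.INFO
    return logging.WARNING


def debug_log_enabled(args, environ=os.environ):
    """debug.log is written only with -d/--debug or when DEBUG is set.

    Assumption: "set" means the variable exists, regardless of its value.
    """
    return args.debug or "DEBUG" in environ


def configure_logging(args, log_dir="/vol/log", environ=os.environ):
    """Attach one console handler and up to two file handlers to the root."""
    root = logging.getLogger()
    # The root logger passes everything; each handler filters by its level.
    root.setLevel(logging.DEBUG)

    console = logging.StreamHandler()
    console.setLevel(console_level(args))
    root.addHandler(console)

    controller = logging.FileHandler(os.path.join(log_dir, "controller.log"))
    controller.setLevel(logging.INFO)
    root.addHandler(controller)

    if debug_log_enabled(args, environ):
        debug = logging.FileHandler(os.path.join(log_dir, "debug.log"))
        debug.setLevel(logging.DEBUG)
        root.addHandler(debug)
    return root
```

      Setting the root logger to DEBUG and filtering per handler is one way
      to let the console and the two files each keep their own threshold.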
    • refactor the management of swarm/sandbox resources · 0e301e74
      BAIRE Anthony authored
      - add SwarmAbstractionClient: a class that extends docker.Client and
        hides the API differences between the docker remote API and the
        swarm API. Thus a single docker engine can be used like a swarm.
      
      - add SharedSwarmClient: a class that extends SwarmAbstractionClient,
        monitors the swarm's health and resources (cpu/mem), and manages
        resource allocation.
        - resources are partitioned in groups (to allow reserving resources
          for higher priority jobs)
        - two SharedSwarmClient instances can work together over TCP in a master/slave
          configuration (to allow the production and qualification platforms
          to use the same swarm without any interference)
      
      - the controller is modified to:
        - use SharedSwarmClient to:
          - wait for the end of a job (in place of DockerWatcher)
          - manage resource reservation (LONG_APPS vs. BIGMEM_APPS vs. normal
            apps) and monitor swarm health (fix #124)
          - NOTE: resources of the swarm and sandbox are now managed
            separately (2 instances of SharedSwarmClient), whereas they were
            managed globally before (which was suboptimal)
        - rely on SwarmAbstractionClient to compute the cpu quotas
        - store the container_id of jobs in the DB (fix #128); this is a
          prerequisite for renaming apps in the future
        - store the class of the job (normal vs. long app) in the container
          name (for the resource management with SharedSwarmClient)
        - read the configuration from a yaml file (/vol/ro/config.yml) for:
          - cpu/mem quotas
          - swarm resource allocation policy
          - master/slave configuration
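      The idea of partitioning resources into groups, so that capacity can
      be reserved for some classes of jobs, can be sketched as follows. This
      is an illustration only, not the SharedSwarmClient implementation:
      Group, ResourcePool, reserve and release are hypothetical names, the
      group names are made up for the example, and health monitoring and
      the master/slave link over TCP are left out.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Capacity set aside for one class of jobs (hypothetical structure)."""
    cpu: int
    mem: int
    used_cpu: int = 0
    used_mem: int = 0

    def fits(self, cpu, mem):
        """True if the request fits in what is left of this group."""
        return (self.used_cpu + cpu <= self.cpu
                and self.used_mem + mem <= self.mem)


class ResourcePool:
    """Tracks reservations per group; a job never spills into another group."""

    def __init__(self, groups):
        self.groups = groups          # group name -> Group
        self.allocations = {}         # job id -> (group name, cpu, mem)

    def reserve(self, job_id, group_name, cpu, mem):
        """Reserve resources for a job; return False if the group is full."""
        if job_id in self.allocations:
            raise ValueError("job %r already holds a reservation" % (job_id,))
        group = self.groups[group_name]
        if not group.fits(cpu, mem):
            return False
        group.used_cpu += cpu
        group.used_mem += mem
        self.allocations[job_id] = (group_name, cpu, mem)
        return True

    def release(self, job_id):
        """Give back the resources held by a finished job."""
        group_name, cpu, mem = self.allocations.pop(job_id)
        group = self.groups[group_name]
        group.used_cpu -= cpu
        group.used_mem -= mem
```

      For example, with a "normal" group of 4 cpus and a "long" group of 2
      cpus, a normal job asking for all 4 cpus leaves the "long" group
      untouched, so a long job can still be scheduled.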
    • allow extra params in run-coverage · cfa6bb36
      BAIRE Anthony authored
    • BAIRE Anthony's avatar
      a0e22e85
    • BAIRE Anthony's avatar
      908d1dff
  2. 13 Nov, 2017 1 commit
  3. 09 Nov, 2017 9 commits
  4. 27 Sep, 2017 1 commit
  5. 06 Jul, 2017 22 commits