  1. 20 Nov, 2017 4 commits
  2. 16 Nov, 2017 1 commit
  3. 14 Nov, 2017 10 commits
    • update the job container command · 2cfd30b8
      BAIRE Anthony authored
      - to have SIGTERM forwarded to the process
      - to propagate the exit code of the process
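      The two goals above can be sketched as follows. This is only an
      illustration in Python, not the actual container command (which is not
      shown in this log); the function name run_forwarding_sigterm is
      hypothetical, and the 128 + signal mapping is the usual shell
      convention, assumed here.

```python
import signal
import subprocess


def run_forwarding_sigterm(argv):
    """Run argv as a child process, relay SIGTERM to it, return its exit code."""
    child = subprocess.Popen(argv)

    def forward(signum, frame):
        # Relay the signal to the child instead of dying and leaving it behind.
        child.send_signal(signum)

    previous = signal.signal(signal.SIGTERM, forward)
    try:
        returncode = child.wait()
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGTERM, previous)

    # Popen reports "killed by signal N" as -N; map it to 128 + N so the
    # value fits in an exit status (shell convention, an assumption here).
    return 128 - returncode if returncode < 0 else returncode
```

      A wrapper script would pass the returned value to sys.exit() so that
      the container exits with the same status as the wrapped process.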
    • order jobs by id in the job list · 9a4b2818
      BAIRE Anthony authored
      (should be less confusing)
    • add the 'rescheduled' future · 7d5df09e
      BAIRE Anthony authored
      to let the task implementations detect that they are being rescheduled
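      One way a task could use such a future, sketched with asyncio. The
      actual interface is not shown in this log: task_body, demo and the way
      the future is passed in are all hypothetical.

```python
import asyncio


async def task_body(rescheduled):
    """Hypothetical task: work until done, or stop early when rescheduled."""
    work = asyncio.ensure_future(asyncio.sleep(10))  # stands in for real work
    done, _ = await asyncio.wait(
        {work, rescheduled}, return_when=asyncio.FIRST_COMPLETED
    )
    if rescheduled in done:
        # The manager signalled a reschedule: abandon the current attempt.
        work.cancel()
        return "rescheduled"
    return "finished"


async def demo():
    # The manager side: create the future, start the task, then resolve the
    # future to tell the task that it is being rescheduled.
    rescheduled = asyncio.get_running_loop().create_future()
    task = asyncio.ensure_future(task_body(rescheduled))
    await asyncio.sleep(0)  # let the task start waiting
    rescheduled.set_result(None)
    return await task
```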
    • extend shutdown case (in test_manager_futures) · 9f37db99
      BAIRE Anthony authored
      - add a pending task
      - add a timeout
    • factorisation · 5173f5e1
      BAIRE Anthony authored
    • add --verbose · 6a2a886f
      BAIRE Anthony authored
      console log level:
       (default)    ->  WARNING
       -v/--verbose ->  INFO
       -d/--debug   ->  DEBUG
      log files:
        /vol/log/controller.log   -> INFO
        /vol/log/debug.log        -> DEBUG  (disabled unless -d/--debug is
                                     given or the env var DEBUG is set)
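      The level mapping above could be wired up with the standard logging
      module roughly as follows. This is a sketch, not the controller's
      code: build_handlers and its log_dir/environ parameters are
      hypothetical, and "DEBUG is set" is assumed to mean "present and
      non-empty".

```python
import argparse
import logging
import os


def build_handlers(argv, log_dir="/vol/log", environ=None):
    """Return the log handlers implied by the command-line flags in argv."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-d", "--debug", action="store_true")
    args = parser.parse_args(argv)

    # Console: WARNING by default, INFO with -v, DEBUG with -d.
    console = logging.StreamHandler()
    if args.debug:
        console.setLevel(logging.DEBUG)
    elif args.verbose:
        console.setLevel(logging.INFO)
    else:
        console.setLevel(logging.WARNING)
    handlers = [console]

    # controller.log always records INFO and above.
    info_file = logging.FileHandler(os.path.join(log_dir, "controller.log"))
    info_file.setLevel(logging.INFO)
    handlers.append(info_file)

    # debug.log is only created with -d or when DEBUG is set (assumed to
    # mean present and non-empty in the environment).
    if args.debug or environ.get("DEBUG"):
        debug_file = logging.FileHandler(os.path.join(log_dir, "debug.log"))
        debug_file.setLevel(logging.DEBUG)
        handlers.append(debug_file)
    return handlers
```

      The handlers only filter; the logger they are attached to has to be set
      to DEBUG itself so that every record reaches them.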
    • refactor the management of swarm/sandbox resources · 0e301e74
      BAIRE Anthony authored
      - add SwarmAbstractionClient: a class that extends docker.Client and
        hides the API differences between the docker remote API and the
        swarm API. Thus a single docker engine can be used like a swarm
      - add SharedSwarmClient: a class that extends SwarmAbstractionClient
        and monitors the swarm health and its resources (cpu/mem) and manages
        the resource allocation.
        - resources are partitioned in groups (to allow reserving resources
          for higher priority jobs)
        - two SharedSwarmClient instances can work together over TCP in a
          master/slave configuration (to allow the production and
          qualification platforms to use the same swarm without any
          interference)
      - the controller is modified to:
        - use SharedSwarmClient to:
          - wait for the end of a job (in place of DockerWatcher)
          - manage resource reservation (LONG_APPS vs. BIGMEM_APPS vs. normal
            apps) and monitor swarm health (fix #124)
          - NOTE: resources of the swarm and sandbox are now managed
            separately (2 instances of SharedSwarmClient), whereas it was
            global before (this was suboptimal)
        - rely on SwarmAbstractionClient to compute the cpu quotas
        - store the container_id of jobs in the DB (fix #128); this is a
          prerequisite to permit renaming apps in the future
        - store the class of the job (normal vs. long app) in the container
          name (for the resource management with SharedSwarmClient)
        - read the configuration from a yaml file (/vol/ro/config.yml) for:
          - cpu/mem quotas
          - swarm resource allocation policy
          - master/slave configuration
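      The idea of partitioning resources into groups can be illustrated with
      a small sketch. This is not SharedSwarmClient: GroupedResources, its
      methods and the group names are hypothetical, and the real allocation
      policy (read from the yaml configuration) may differ, for example by
      letting groups borrow from each other.

```python
class GroupedResources:
    """Hypothetical cpu/mem pool partitioned into independent groups."""

    def __init__(self, groups):
        # groups maps a group name to its (cpus, mem) capacity.
        self._capacity = {name: tuple(cap) for name, cap in groups.items()}
        self._free = {name: list(cap) for name, cap in groups.items()}

    def allocate(self, group, cpus, mem):
        """Reserve cpus/mem in the given group; return False if it is full."""
        free = self._free[group]
        if cpus > free[0] or mem > free[1]:
            return False
        free[0] -= cpus
        free[1] -= mem
        return True

    def release(self, group, cpus, mem):
        """Give back resources, never exceeding the group's capacity."""
        free = self._free[group]
        cap = self._capacity[group]
        free[0] = min(cap[0], free[0] + cpus)
        free[1] = min(cap[1], free[1] + mem)
```

      Because each group has its own budget, a burst of jobs in one group
      cannot use up the capacity set aside for another.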
    • allow extra params in run-coverage · cfa6bb36
      BAIRE Anthony authored
  4. 13 Nov, 2017 1 commit
  5. 09 Nov, 2017 9 commits
  6. 27 Sep, 2017 1 commit
  7. 06 Jul, 2017 14 commits