1. 09 Apr, 2020 1 commit
    • Ignore the image entrypoint when creating job/sandbox containers · e976883a
      BAIRE Anthony authored
      Given that users can now push any docker image (!204), we will
      run into trouble if the user sets an arbitrary entrypoint (e.g. using
      the ENTRYPOINT instruction in the Dockerfile).
      
      It is wiser to just ignore the entrypoint for the moment.
      
      Note: in the future, we should remove Webapp.entrypoint and use the
      ENTRYPOINT defined in the image instead.
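A minimal sketch of what ignoring the entrypoint can look like at the Docker Engine API level, assuming the container is created from a raw ContainerConfig. The Engine API documents that an `Entrypoint` of exactly `[""]` resets the entrypoint to the default, as if the image had no ENTRYPOINT instruction, so only the command is executed. The helper name `job_container_config` and the image tag are hypothetical; the commit does not show how allgo passes this option.

```python
from typing import Any, Dict, List


def job_container_config(image: str, command: List[str]) -> Dict[str, Any]:
    """Build an Engine API ContainerConfig that ignores the image ENTRYPOINT.

    Hypothetical helper: an Entrypoint of exactly [""] resets the entrypoint
    to the default, so only `command` runs in the container.
    """
    return {
        "Image": image,
        "Entrypoint": [""],  # discard any ENTRYPOINT declared by the pushed image
        "Cmd": list(command),
    }


cfg = job_container_config("myapp:id42", ["/bin/sh", "-c", "echo hello"])
```

The same reset is available on the command line as `docker run --entrypoint ""`.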
  2. 27 Aug, 2019 1 commit
  3. 20 Dec, 2018 1 commit
  4. 26 Sep, 2018 1 commit
    • Allow importing a webapp from a legacy allgo instance · 51f51d9c
      BAIRE Anthony authored
      
      This adds two views:
      
      - WebappImport for importing the webapp (but without the versions).
        The import is allowed if the requesting user has the same email
        as the owner of the imported app. The webapp is created with
        imported=True, which enables the WebappVersionImport view.
      
      - WebappVersionImport for requesting the import of a webapp version.
        This only creates the WebappVersion entry with state=IMPORT
        (the actual import is performed by the controller).
      
      A version may be imported multiple times. In that case, the newly
      imported version overwrites the local version with the same number.
      
      This feature requires:
      - that the rails server implements !138
      - that the docker daemon hosting the sandboxes is configured with
        credentials for pulling from the legacy registry
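The access rules described above (same-email check for WebappImport, version import enabled only on imported webapps, re-import overwriting the local version with the same number) can be sketched in plain Python. This is an illustration under assumptions: the `Webapp` dataclass, the dict standing in for the WebappVersion table, the `READY` state and the function names are hypothetical, and the email comparison is an exact string match.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Webapp:
    name: str
    owner_email: str
    imported: bool = False
    # version number -> state; a hypothetical stand-in for the WebappVersion table
    versions: Dict[str, str] = field(default_factory=dict)


def import_webapp(name: str, requester_email: str, legacy_owner_email: str) -> Webapp:
    # WebappImport: allowed only if the requester has the same email as the legacy owner
    if requester_email != legacy_owner_email:
        raise PermissionError("requester email does not match the legacy owner")
    # imported=True is what enables version imports for this webapp
    return Webapp(name=name, owner_email=requester_email, imported=True)


def request_version_import(app: Webapp, number: str) -> None:
    # WebappVersionImport: only records the request, the controller does the pull
    if not app.imported:
        raise PermissionError("version import is only enabled for imported webapps")
    # importing the same number again overwrites the local entry
    app.versions[number] = "IMPORT"


app = import_webapp("demo", "alice@example.org", "alice@example.org")
app.versions["1.0"] = "READY"       # an existing local version (hypothetical state)
request_version_import(app, "1.0")  # re-import of the same number
```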
  5. 19 Sep, 2018 1 commit
  6. 18 Sep, 2018 5 commits
  7. 17 Sep, 2018 1 commit
    • derive docker tag names from WebappVersion.id · b7b30d3e
      BAIRE Anthony authored
      With this change docker images are no longer
      named as: <Webapp.docker_name>:<WebappVersion.number>
      but       <Webapp.docker_name>:id<WebappVersion.id>
      
      This is only for storage, for the user we still present the image as
      <Webapp.docker_name>:<WebappVersion.number>
      
      There are multiple reasons to do that:
      - this simplifies the controller design, because docker images are no
        longer replaced (once an image is committed with tag 'id<SOMETHING>',
        it won't be modified anymore) -> thus it is no longer necessary to
        track the image state carefully (when pushing/pulling from/to the
        registry)
      - this prevents reusing dangling images from a removed webapp (because we
        now have a strong guarantee that the image tags are unique)
      - this will avoid nasty race conditions when we implement direct 'push'
        to the registry (because we then assign the new image id before the
        manifest is actually pushed; if a push and a commit are done at the same
        time, we will keep the latest one, i.e. the one with the highest id)
      - this will make it easy to implement image recovery: we can keep removed
        images in the registry for some time (eg: 1 month) before they are
        really deleted
      
      Note: the REPLACED state is no longer transient (since we now keep the
      replaced images in the db and since we may still have remaining
      job/sandboxes using them). Maybe we can rename it to DELETED when we
      implement #265.
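The naming scheme can be summarized by two small functions. This is a sketch assuming a minimal `WebappVersion` record with only `id` and `number`; the function names and the `myapp` docker name are hypothetical. `latest_for_number` illustrates the "keep the latest one, i.e. the one with the highest id" rule for a push and a commit racing on the same version number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WebappVersion:
    id: int      # database primary key: unique, never reused
    number: str  # version number shown to the user, e.g. "1.0"


def storage_tag(docker_name: str, version: WebappVersion) -> str:
    # name used in docker and in the registry: <docker_name>:id<id>
    return f"{docker_name}:id{version.id}"


def display_tag(docker_name: str, version: WebappVersion) -> str:
    # name presented to the user: <docker_name>:<number>
    return f"{docker_name}:{version.number}"


def latest_for_number(versions: List[WebappVersion], number: str) -> WebappVersion:
    # when several entries share a number, the highest id wins
    candidates = [v for v in versions if v.number == number]
    return max(candidates, key=lambda v: v.id)


history = [WebappVersion(3, "1.0"), WebappVersion(7, "1.0"), WebappVersion(5, "2.0")]
current = latest_for_number(history, "1.0")
```

Because ids are never reused, a storage tag always designates the same image, which is what makes committed images immutable.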
  8. 12 Sep, 2018 1 commit
  9. 07 Aug, 2018 3 commits
    • controller: always update the redis db after processing a job · dbb92573
      BAIRE Anthony authored
      This way we get redis updates when a job is deleted and this also
      prevents inserting a 'DONE' state when the job is not done
      
      Note: we never update the redis job state key from the django server to
      avoid race conditions
    • Add a redis key to store job results · d4ac4721
      BAIRE Anthony authored
      While this is not needed for the job_detail page (because we just reload
      the page when the job is done), we will need this information for the
      job_list page (because we do not want to reload the job_list each time a
      job terminates)
    • make db state changes atomic on job start & job destroy · 4b3c478b
      BAIRE Anthony authored
      The controller and django can both change the job state, especially when
      it is in the WAITING state (django may delete the job and the controller
      may start the job).
      
      To prevent any inconsistency, we must ensure that these transitions are
      made atomically.
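One common way to make such a transition atomic is a conditional UPDATE (compare-and-swap): the row is only modified if it is still in the expected state, and the affected row count tells the caller whether it won the race. The sketch below uses Python's built-in `sqlite3` so it runs standalone; the `job` table, the `try_transition` helper and the `DELETED` state are hypothetical, and the commit does not say which mechanism allgo uses. With the Django ORM, the same idea can be written for a hypothetical `Job` model as `Job.objects.filter(id=..., state=...).update(state=...)`, which returns the number of rows matched.

```python
import sqlite3


def try_transition(conn: sqlite3.Connection, job_id: int, expected: str, new: str) -> bool:
    """Atomically move job `job_id` from state `expected` to state `new`.

    Check and write happen in a single UPDATE statement, so if the other
    process changed the state first, no row matches and False is returned.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE job SET state = ? WHERE id = ? AND state = ?",
            (new, job_id, expected),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT NOT NULL)")
conn.execute("INSERT INTO job (id, state) VALUES (1, 'WAITING')")
conn.commit()

started = try_transition(conn, 1, "WAITING", "RUNNING")  # controller starts the job
deleted = try_transition(conn, 1, "WAITING", "DELETED")  # django's delete loses the race
```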
  10. 05 Jul, 2018 4 commits
  11. 27 Jun, 2018 7 commits
    • Use the redis db to trigger controller actions · 01dd48e6
      BAIRE Anthony authored
      This commit removes the old notification channel (socket listening
      on port 4567), and uses the redis channel 'notify:controller' instead.
      
      The django job creation views are updated accordingly.
    • Stream job logs and job state updates to the user · 1bb4acf4
      BAIRE Anthony authored
      This commit makes several changes.
      
      In the controller:
      
      - duplicates the logs produced by the jobs. Initially they were only
        stored in allgo.log; now they are also forwarded to the container
        output (using the 'tee' command) so that the controller can read
        them
      
      - add a log_task that reads the logs from docker and feeds them into
        the redis db key "log:job:<ID>" (this is implemented with aiohttp
        in order to be fully asynchronous)
      
      - store the job state in a new redis key "state:job:<ID>"
      
      - send notifications to the redis pubsub 'notify:aio' channel when
        the job state has changed or when new logs are available
      
      In the allgo.aio frontend:
      
      - implement the /aio/jobs/<ID>/events endpoint, which streams all
        job events & logs to the client (as JSON-formatted messages)
      
      In django:
      
      - refactor the JobDetail view and template to update the page
        dynamically for job updates (state/logs)
          - allgo.log is read only when the job is already terminated.
            Otherwise the page uses the /aio/jobs/<ID>/events channel
            to stream the logs
          - the state icon is patched on the page when the state changes,
            except for the DONE state which triggers a full page reload
            (because there are other parts to be updated)
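The controller-side bookkeeping can be sketched as follows, assuming string values. To keep it runnable without a server, `FakeRedis` is a hypothetical in-memory stand-in implementing only APPEND, SET, GETRANGE and PUBLISH (redis-py exposes methods with these names on a real client). The key names `log:job:<ID>`, `state:job:<ID>` and the `notify:aio` channel come from the commit; the JSON notification payload and the helper names are assumptions. Reading with an offset shows how the events endpoint could send a client only the part of the log it has not received yet.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


class FakeRedis:
    """In-memory stand-in for the few redis commands used here (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.published: Dict[str, List[str]] = defaultdict(list)

    def append(self, key: str, value: str) -> int:
        self.data[key] = self.data.get(key, "") + value
        return len(self.data[key])

    def set(self, key: str, value: str) -> None:
        self.data[key] = value

    def getrange(self, key: str, start: int, end: int) -> str:
        # like GETRANGE: `end` is inclusive and -1 means "up to the end"
        # (only -1 or a non-negative `end` is supported in this stand-in)
        value = self.data.get(key, "")
        stop = len(value) if end == -1 else end + 1
        return value[start:stop]

    def publish(self, channel: str, message: str) -> None:
        self.published[channel].append(message)


def on_log_chunk(r: FakeRedis, job_id: int, chunk: str) -> None:
    # log_task: append new output to the job log, then wake up the aio frontend
    r.append(f"log:job:{job_id}", chunk)
    r.publish("notify:aio", json.dumps({"job": job_id, "kind": "log"}))  # assumed payload


def on_state_change(r: FakeRedis, job_id: int, state: str) -> None:
    r.set(f"state:job:{job_id}", state)
    r.publish("notify:aio", json.dumps({"job": job_id, "kind": "state"}))  # assumed payload


def read_new_logs(r: FakeRedis, job_id: int, offset: int) -> Tuple[str, int]:
    # events endpoint: return only what the client has not received yet
    chunk = r.getrange(f"log:job:{job_id}", offset, -1)
    return chunk, offset + len(chunk)


r = FakeRedis()
on_state_change(r, 7, "RUNNING")
on_log_chunk(r, 7, "step 1\n")
on_log_chunk(r, 7, "step 2\n")
text, offset = read_new_logs(r, 7, 0)
rest, offset2 = read_new_logs(r, 7, offset)
```

With a real redis server, values come back as bytes unless the client decodes responses, so offsets would count bytes rather than characters.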
    • update the location of the job files · c5f93183
      BAIRE Anthony authored
      They are now located in the 'django' container and the full path is
      "{DATASTORE}/{JOB_ID}"
    • add a redis client in the controller · 553fee62
      BAIRE Anthony authored
    • remove the factories · 52a9e3ec
      BAIRE Anthony authored
      Fix #185
      
      - webapps are now located directly at the root of the registry
        (not in the /webapp subdir)
      
      - factories are no longer stored in our registry, we directly reference
        images on the official docker registry
    • support registry authentication in the controller · 5a5a06d2
      BAIRE Anthony authored
      because registry authentication is now token-based (instead of using TLS client certificates)
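For reference, the Docker Registry v2 token scheme works as follows: a request without credentials is answered with 401 and a `WWW-Authenticate: Bearer realm="...",service="...",scope="..."` challenge; the client then requests a token from `realm` with the `service` and `scope` parameters (authenticating there), and retries with `Authorization: Bearer <token>`. Below is a minimal sketch of the first step, parsing the challenge and building the token URL. The function names and example hosts are hypothetical, and the commit does not show how the controller implements this.

```python
import re
from typing import Dict
from urllib.parse import urlencode


def parse_bearer_challenge(header: str) -> Dict[str, str]:
    """Parse a 'Bearer k="v",...' WWW-Authenticate value into a dict."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unsupported auth scheme: {scheme!r}")
    # only quoted values are handled, which is what registries send
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_url(challenge: Dict[str, str]) -> str:
    # GET this URL (with credentials) to obtain the bearer token
    query = {k: v for k, v in challenge.items() if k in ("service", "scope")}
    return f"{challenge['realm']}?{urlencode(query)}"


challenge = parse_bearer_challenge(
    'Bearer realm="https://auth.example.org/token",'
    'service="registry.example.org",scope="repository:myapp:pull"'
)
url = token_url(challenge)
```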
    • rename tables as dj_* · a0ecda20
      BAIRE Anthony authored
  12. 12 Apr, 2018 1 commit
  13. 21 Nov, 2017 1 commit
    • fix out of memory message · 06420260
      BAIRE Anthony authored
      this variant is closer to the actual meaning
      (the fact that the limit was reached does not automatically imply
       that the process is starving, we cannot decide how much memory
       a process needs without doing some profiling)
  14. 20 Nov, 2017 5 commits
  15. 16 Nov, 2017 1 commit
  16. 14 Nov, 2017 4 commits
    • update the job container command · 2cfd30b8
      BAIRE Anthony authored
      - to have SIGTERM forwarded to the process
      - to propagate the exit code of the process
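The commit does not show the new command itself. As an illustration of the two goals, here is a sketch of a Python wrapper, assuming the job runs as a child process on a POSIX system: it forwards SIGTERM to the child (so that stopping the container reaches the real process) and returns the child's exit status, mapping death-by-signal to the shell convention of 128 + signal number. `run_forwarding_sigterm` is a hypothetical name.

```python
import signal
import subprocess
import sys
from typing import List


def run_forwarding_sigterm(cmd: List[str]) -> int:
    """Run `cmd`, forward SIGTERM to it, and return its exit status.

    Must be called from the main thread (a requirement of signal.signal).
    """
    proc = subprocess.Popen(cmd)

    def forward(signum, frame):
        proc.send_signal(signum)  # pass the termination request to the child

    previous = signal.signal(signal.SIGTERM, forward)
    try:
        code = proc.wait()
    finally:
        signal.signal(signal.SIGTERM, previous)  # restore the previous handler
    # Popen reports death by signal N as -N; shells report it as 128 + N
    return code if code >= 0 else 128 - code


# usage: propagate a child's exit code
status = run_forwarding_sigterm([sys.executable, "-c", "import sys; sys.exit(3)"])
```

Used as a container command, the wrapper would end with `sys.exit(run_forwarding_sigterm(sys.argv[1:]))` so that the container exits with the job's status.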
    • add the 'rescheduled' future · 7d5df09e
      BAIRE Anthony authored
      to let the task implementations detect that they are being rescheduled
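A minimal asyncio sketch of the idea, assuming a task object that owns a `rescheduled` future, which the scheduler resolves when the task is scheduled again; an implementation waiting on something slow can then stop early. `ControllerTask`, `slow_step` and the use of `asyncio.sleep` as the slow operation are hypothetical; the commit message only states the purpose of the future.

```python
import asyncio
from typing import Tuple


class ControllerTask:
    """Hypothetical task object exposing a 'rescheduled' future."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.rescheduled: asyncio.Future = loop.create_future()

    def reschedule(self) -> None:
        # called by the scheduler when this task is scheduled again
        if not self.rescheduled.done():
            self.rescheduled.set_result(None)


async def slow_step(task: ControllerTask, delay: float) -> str:
    # wait for the slow operation, but give up early if the task is rescheduled
    sleeper = asyncio.ensure_future(asyncio.sleep(delay))
    done, _ = await asyncio.wait(
        {sleeper, task.rescheduled}, return_when=asyncio.FIRST_COMPLETED
    )
    if sleeper in done:
        return "completed"
    sleeper.cancel()  # the task will run again, so stop waiting now
    return "rescheduled"


async def demo() -> Tuple[str, str]:
    loop = asyncio.get_running_loop()
    quiet = ControllerTask(loop)
    first = await slow_step(quiet, 0.01)    # nobody reschedules it
    busy = ControllerTask(loop)
    loop.call_later(0.01, busy.reschedule)  # rescheduled long before 10 s
    second = await slow_step(busy, 10)
    return first, second


result = asyncio.run(demo())
```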
    • factorisation · 5173f5e1
      BAIRE Anthony authored
      (disable_future_warning)
    • refactor the management of swarm/sandbox resources · 0e301e74
      BAIRE Anthony authored
      - add SwarmAbstractionClient: a class that extends docker.Client and
        hides the API differences between the docker remote API and the
        swarm API. Thus a single docker engine can be used like a swarm
      
      - add SharedSwarmClient: a class that extends SwarmAbstractionClient
        and monitors the swarm health and its resources (cpu/mem) and manages
        the resource allocation.
        - resources are partitioned in groups (to allow reserving resources
          for higher priority jobs)
        - two SharedSwarmClient instances can work together over TCP in a master/slave
          configuration (to allow the production and qualification platforms
          to use the same swarm without any interference)
      
      - the controller is modified to:
        - use SharedSwarmClient to:
          - wait for the end of a job (in place of DockerWatcher)
          - manage resource reservation (LONG_APPS vs. BIGMEM_APPS vs normal
            apps) and monitor swarm health (fix #124)
          - NOTE: resources of the swarm and sandbox are now managed
            separately (2 instances of SharedSwarmClient), whereas it was
            global before (this was suboptimal)
        - rely on SwarmAbstractionClient to compute the cpu quotas
        - store the container_id of jobs into the DB (fix #128), this is a
          prerequisite to permit renaming apps in the future
        - store the class of the job (normal vs. long app) in the container
          name (for the resource management with SharedSwarmClient)
        - read the configuration from a yaml file (/vol/ro/config.yml) for:
          - cpu/mem quotas
          - swarm resources allocation policy
          - master/slave configuration
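The group-based reservation can be illustrated with a small allocator. This is a sketch under stated assumptions, not SharedSwarmClient's actual policy: the group names, the job classes and the class-to-groups mapping are hypothetical (in the controller, the allocation policy is read from /vol/ro/config.yml, whose format is not shown here). In this example, long jobs may only use the common group, while normal jobs can fall back to a reserved group, so long jobs can never occupy the whole swarm.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Group:
    cpu: int  # CPUs still available in this group
    mem: int  # memory still available, in MiB


class ResourcePool:
    """Hypothetical allocator: each job class may draw from an ordered list of groups."""

    def __init__(self, groups: Dict[str, Group], policy: Dict[str, List[str]]) -> None:
        self.groups = groups
        self.policy = policy  # job class -> groups it may use, in order of preference

    def allocate(self, job_class: str, cpu: int, mem: int) -> Optional[str]:
        for name in self.policy[job_class]:
            g = self.groups[name]
            if g.cpu >= cpu and g.mem >= mem:
                g.cpu -= cpu
                g.mem -= mem
                return name
        return None  # no group can host the job right now: it has to wait

    def release(self, group: str, cpu: int, mem: int) -> None:
        g = self.groups[group]
        g.cpu += cpu
        g.mem += mem


pool = ResourcePool(
    groups={"common": Group(cpu=4, mem=8192), "reserved": Group(cpu=2, mem=4096)},
    policy={"normal": ["common", "reserved"], "long": ["common"]},
)
first = pool.allocate("long", cpu=4, mem=4096)    # takes all CPUs of the common group
second = pool.allocate("long", cpu=1, mem=1024)   # no room left for another long job
third = pool.allocate("normal", cpu=1, mem=1024)  # falls back to the reserved group
```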
  17. 29 May, 2017 1 commit
  18. 25 Apr, 2017 1 commit