Gricad cluster
Execute the code on the Gricad cluster.
Please note that the cluster request (oarsub command) and singularity container are
configurable from the config file.
Preliminary step:
You need to first configure your SSH access to both bastions of the Gricad cluster:
ssh-copy-id PERSEUS_LOGIN@rotule.univ-grenoble-alpes.fr
ssh-copy-id PERSEUS_LOGIN@trinity.univ-grenoble-alpes.fr
And also on the Gricad cluster you would like to use (here with bigfoot, for example):
ssh-copy-id -o "ProxyCommand ssh -q PERSEUS_LOGIN@access-gricad.univ-grenoble-alpes.fr nc -w 60 %h %p" PERSEUS_LOGIN@bigfoot
Gricad is composed of different clusters (dahu, bigfoot for GPUs, luke, and froggy).
You can configure the one you want to use in the config file.
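The ProxyCommand string in the ssh-copy-id command above is easy to mistype, so it can help to build it once and reuse it. The following is only a minimal sketch: gricad_proxy_command is a hypothetical helper (it is not part of remi), and the access-gricad host name and nc options are copied from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of remi: prints the ProxyCommand string
# used to reach a Gricad cluster through the access-gricad bastion.
# %h and %p are left as-is; ssh replaces them with the target host and port.
gricad_proxy_command() {
    local login="$1"   # your PERSEUS login
    printf 'ssh -q %s@access-gricad.univ-grenoble-alpes.fr nc -w 60 %%h %%p\n' "$login"
}

# Example usage (commented out because it needs network access and a valid login):
# ssh-copy-id -o "ProxyCommand $(gricad_proxy_command PERSEUS_LOGIN)" PERSEUS_LOGIN@bigfoot
# ssh -o "ProxyCommand $(gricad_proxy_command PERSEUS_LOGIN)" PERSEUS_LOGIN@bigfoot hostname
```

Running the last commented line after ssh-copy-id is one way to check that key-based login works before using remi.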
Available subcommands:
remi gricad setup
Set up your project on Gricad.
remi gricad push [-f | --force]
Sync the content of the project directory to the gricad cluster.
If no changes are detected locally, the file sync will not be attempted.
Options:
-f | --force: Run the sync command even if no local changes were detected.
remi gricad pull [-f | --force] [REMOTE_PATH]
Sync the content of the provided REMOTE_PATH directory from the gricad cluster to the local
machine.
This can be used to sync back experimental output that results from a computation done remotely.
If no path is specified, output/ will be used as the default value.
Options:
-f | --force: Do not ask for a confirmation before pulling.
Use with caution: conflicting local files may be overwritten.
remi gricad clean [-f | --force] [REMOTE_PATH]
Clean the content of the provided REMOTE_PATH directory on the gricad cluster.
If no directory is specified, output/ will be used as the default value.
Options:
-f | --force: Do not ask for a confirmation before cleaning.
Use with caution.
remi gricad [script] [OPTIONS]
Run a bash script on the gricad cluster.
This is the default subcommand (and can thus be run using remi gricad).
Options:
-s | --script: The path to a bash script to run.
Default: script.sh
-n | --job-name: A custom name for the cluster job (oarsub’s --name option).
Default: The project name
-g | --gpu-model: GPU model (‘A100’, ‘V100’ or ‘T4’).
Default: The value defined in the gricad/oarsub section of the config file.
--no-push: Do not attempt to sync project files to the gricad cluster.
Examples:
remi gricad: Run script.sh on the cluster.
remi gricad -s training_script.sh: Run training_script.sh on the cluster.
remi gricad command [OPTIONS] COMMAND
Run the specified COMMAND on the cluster.
Options:
-n | --job-name: A custom name for the cluster job (oarsub’s --name option).
Default: The project name
-g | --gpu-model: GPU model (‘A100’, ‘V100’ or ‘T4’).
Default: The value defined in the gricad/oarsub section of the config file.
--no-push: Do not attempt to sync project files to the gricad cluster.
Example:
remi gricad command "./test.sh --number_steps=1000": Run the command ./test.sh --number_steps=1000 on the cluster.
remi gricad interactive [OPTIONS]
Start an interactive session on the cluster. This runs oarsub with the --interactive flag.
Options:
-n | --job-name: A custom name for the cluster job (oarsub’s --name option).
Default: The project name
-g | --gpu-model: GPU model (‘A100’, ‘V100’ or ‘T4’).
Default: The value defined in the gricad/oarsub section of the config file.
--no-push: Do not attempt to sync project files to the gricad cluster.
Example:
remi gricad interactive --no-push: Start an interactive session on the cluster without pushing local changes.
remi gricad recap
Run recap.py on the cluster to list compute nodes information (CPU, GPU…).
remi gricad chandler
Run chandler on the cluster to list compute nodes occupation (free/busy).
remi gricad stat
Get information about your running/planned jobs using oarstat.
remi gricad connect OAR_JOB_ID
Connect to a running job.
Example:
remi gricad connect 6267518
remi gricad kill OAR_JOB_ID
Kill one or multiple running job(s).
Example:
remi gricad kill 6267518 6267519 6267520
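To illustrate how some of the subcommands above fit together, here is a minimal sketch of a wrapper that submits a script and then lists your jobs. run_on_gricad is a hypothetical helper, not part of remi. It assumes that remi is installed, that remi gricad setup has already been run for the project, and that -s and -g can be combined in this order.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of remi: submit a script, then show job status.
# Usage: run_on_gricad [SCRIPT] [GPU_MODEL]
run_on_gricad() {
    local script="${1:-script.sh}"   # same default as the -s option
    local gpu="${2:-}"               # e.g. A100, V100 or T4; empty = config file value
    local args=(-s "$script")
    if [ -n "$gpu" ]; then
        args+=(-g "$gpu")
    fi
    # Submit the job; stop here if the submission fails.
    remi gricad "${args[@]}" || return 1
    # List running/planned jobs (the OAR job id is needed for connect and kill).
    remi gricad stat
}
```

For example, run_on_gricad training_script.sh A100 would call remi gricad -s training_script.sh -g A100 followed by remi gricad stat.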