slurmdbd.conf not working with the docker flavour
The composition written to launch the Slurm database and its associated service works with the nixos-test flavour but not with the docker flavour.
Below is the output of the build, start, and connect steps.
(nixos-compose-d1i0LvFl-py3.8)
[jon@dednox:~/dev/issue2]$ nxc build -f docker
Starting Build
warning: Git tree '/home/jon/dev/nxcSandbox/issue2' is dirty
Loaded image: nxc-docker-base-image:latest
Docker Image loaded
Build completed
(nixos-compose-d1i0LvFl-py3.8)
[jon@dednox:~/dev/issue2]$ nxc start
Starting
Use last build:
/home/jon/dev/nxcSandbox/issue2/nxc/build/composition::docker
docker
deleting VM state directory /run/user/1000/vm-state-dbd
if you want to keep the VM state, pass --keep-vm-state
/nix/store/dml49f1m98qybvj7y965fgkab3px8anf-docker-compose
starting docker-compose
(0.00 seconds)
starting all machines
(0.00 seconds)
dbd: waiting for success: which bash
store_dbd_1 is up-to-date
(0.34 seconds)
running the test script
/nix/store/dml49f1m98qybvj7y965fgkab3px8anf-docker-compose
starting docker-compose
(0.00 seconds)
starting all machines
(0.00 seconds)
can_start_slurmdbd
dbd: must succeed: systemctl restart slurmdbd
store_dbd_1 is up-to-date
(0.39 seconds)
Test "can_start_slurmdbd" failed with error: "unit "slurmdbd.service" reached state "failed""
error: unit "slurmdbd.service" reached state "failed"
Traceback (most recent call last):
File "/home/jon/dev/nixos-compose/nixos_compose/driver.py", line 1225, in driver
exec(test_script, globals())
File "<string>", line 6, in <module>
File "/home/jon/dev/nixos-compose/nixos_compose/driver.py", line 416, in wait_for_unit
retry(check_active)
File "/home/jon/dev/nixos-compose/nixos_compose/driver.py", line 193, in retry
if fn(False):
File "/home/jon/dev/nixos-compose/nixos_compose/driver.py", line 401, in check_active
raise Exception('unit "{}" reached state "{}"'.format(unit, state))
Exception: unit "slurmdbd.service" reached state "failed"
cleaning up
(0.00 seconds)
(nixos-compose-d1i0LvFl-py3.8)
[jon@dednox:~/dev/issue2]$ nxc connect dbd
[root@dbd:/]# systemctl status slurmdbd.service
× slurmdbd.service
Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2021-12-02 12:42:44 UTC; 17s ago
Process: 496 ExecStartPre=/nix/store/3rp3b9p7gb1cwl8a0fpxjrfna6a0n2d9-unit-script-slurmdbd-pre-start/bin/slurmdbd-pre-start (code=exited, status=0/SUCCESS)
Process: 502 ExecStart=/nix/store/y9vrnwxjy8lay28n7szfyn0qm87bs4kn-unit-script-slurmdbd-start/bin/slurmdbd-start (code=exited, status=1/FAILURE)
Main PID: 502 (code=exited, status=1/FAILURE)
CPU: 6ms
Dec 02 12:42:44 dbd systemd[1]: Starting slurmdbd.service...
Dec 02 12:42:44 dbd systemd[1]: Started slurmdbd.service.
Dec 02 12:42:44 dbd slurmdbd[502]: error: Parse error in file /run/slurmdbd/slurmdbd.conf line 1: "DbdHost="
Dec 02 12:42:44 dbd slurmdbd[502]: fatal: Could not open/read/parse slurmdbd.conf file /run/slurmdbd/slurmdbd.conf
Dec 02 12:42:44 dbd systemd[1]: slurmdbd.service: Main process exited, code=exited, status=1/FAILURE
Dec 02 12:42:44 dbd systemd[1]: slurmdbd.service: Failed with result 'exit-code'.
The service could not read or parse the file /run/slurmdbd/slurmdbd.conf. The file seems to be missing or, judging by the parse error on line 1 ("DbdHost="), present with an empty DbdHost value.
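To tell a missing file apart from one that is present but has an empty value (the journal above reports a parse error on line 1, "DbdHost="), a small check like the following could be run inside the container. This is only a sketch: check_conf is a hypothetical helper name, and treating any KEY= line with nothing after the equals sign as suspicious is an assumption about what slurmdbd rejects.

```shell
# Hypothetical helper: report whether a slurmdbd.conf candidate exists
# and whether it contains keys with an empty value (e.g. "DbdHost=").
# Returns 0 if the file looks fine, 1 if it is missing, 2 if a key is empty.
check_conf() {
    conf="$1"
    if [ ! -e "$conf" ]; then
        echo "missing: $conf"
        return 1
    fi
    # Match lines of the form KEY= with nothing (or only blanks) after "=".
    empty=$(grep -nE '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=[[:space:]]*$' "$conf" || true)
    if [ -n "$empty" ]; then
        echo "empty values in $conf:"
        echo "$empty"
        return 2
    fi
    echo "ok: $conf"
    return 0
}
```

For example, check_conf /run/slurmdbd/slurmdbd.conf would print either "missing: ...", the numbered lines whose value is empty, or "ok: ...".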
Another error message, this time from running the slurmdbd binary directly:
[root@dbd:/run/current-system/sw/bin]# /nix/store/w4j74bjzv7crgvfsc9xw6w474p8jaf02-wrappedSlurm/bin/slurmdbd
slurmdbd: No slurmdbd.conf file (/nix/store/fnvb295r6xclvi3l89bk2rnynsb6g3ww-etc-slurm/slurmdbd.conf)
slurmdbd: error: slurmdbd.conf lacks DbdHost parameter, using 'localhost'
slurmdbd: fatal: StorageType must be specified
[root@dbd:/run/current-system/sw/bin]# cat /nix/store/fnvb295r6xclvi3l89bk2rnynsb6g3ww-etc-slurm/slurmdbd.conf
cat: /nix/store/fnvb295r6xclvi3l89bk2rnynsb6g3ww-etc-slurm/slurmdbd.conf: No such file or directory
[root@dbd:/run/current-system/sw/bin]# l /nix/store | grep slurmdbd.conf
-r--r--r-- 1 root root 1.6K Jan 1 1970 1h740k1cqhvji9kdqallkrjs10wrkbiq-slurmdbd.conf.drv
-r--r--r-- 1 root root 1.6K Jan 1 1970 365pbsg7m6b668mly2simavsld5hn6cx-slurmdbd.conf.drv
-r--r--r-- 1 root root 81 Jan 1 1970 4wpw2ymh29449gpxgqbx6f1191s7vphy-slurmdbd.conf
-r--r--r-- 1 root root 84 Jan 1 1970 6159mvvr02bajxv0ljmq9zm3468vxx67-slurmdbd.conf
-r--r--r-- 1 root root 81 Jan 1 1970 b2d1fw0y5z5qmv66yh18ayfj08ywn41n-slurmdbd.conf
-r--r--r-- 1 root root 84 Jan 1 1970 i8k0hd13wf2pfqdyglg0m3dhdzf3nhvw-slurmdbd.conf
-r--r--r-- 1 root root 84 Jan 1 1970 j341wk7wy2kd2xn1zj12sqcgjx99q3n9-slurmdbd.conf
-r--r--r-- 1 root root 1.6K Jan 1 1970 kpnncdd62fakx56lrm6iv7393ja39dpw-slurmdbd.conf.drv
-r--r--r-- 1 root root 84 Jan 1 1970 nnjvrhs5wya0z65iycqjj8h5hvsf0vkp-slurmdbd.conf
-r--r--r-- 1 root root 1.6K Jan 1 1970 wjbw3cki2yvh3i9nrnashi14bvwgi2ha-slurmdbd.conf.drv
-r--r--r-- 1 root root 1.6K Jan 1 1970 y1g0vsgjns5s9hjjy8pd6niwyal7rp2x-slurmdbd.conf.drv
-r--r--r-- 1 root root 1.6K Jan 1 1970 y31vmxyh62xwaz8kz4gansa9fgjmj0y2-slurmdbd.conf.drv
[root@dbd:/run/current-system/sw/bin]# cat /nix/store/4wpw2ymh29449gpxgqbx6f1191s7vphy-slurmdbd.conf
DbdHost=
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageUser=slurm
[root@dbd:/run/current-system/sw/bin]# cat /nix/store/6159mvvr02bajxv0ljmq9zm3468vxx67-slurmdbd.conf
DbdHost=dbd
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageUser=slurm
[root@dbd:/run/current-system/sw/bin]# cat /nix/store/nnjvrhs5wya0z65iycqjj8h5hvsf0vkp-slurmdbd.conf
DbdHost=dbd
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageUser=slurm
Here the slurmdbd binary gives a similar error: it does look in the store, but for a file that does not exist. Looking in the store, there are several slurmdbd.conf files, and they do not all have the same content.
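Rather than running cat on each store path by hand, the candidates can be listed next to their DbdHost value in one pass. The following is a minimal sketch: dbdhost_of is a hypothetical helper name, and it only looks at the first DbdHost= line of each file.

```shell
# Hypothetical helper: for each given slurmdbd.conf, print the DbdHost value
# and the file path, separated by a tab. "(empty)" means the key has no value
# or is absent from the file.
dbdhost_of() {
    for conf in "$@"; do
        # Keep only what follows "DbdHost=" on the first matching line.
        value=$(sed -n 's/^DbdHost=//p' "$conf" | head -n 1)
        [ -n "$value" ] || value="(empty)"
        printf '%s\t%s\n' "$value" "$conf"
    done
}
```

Inside the container this could be called as dbdhost_of /nix/store/*-slurmdbd.conf; the glob matches the generated files listed above and skips the .drv entries.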