

SSH connections through a jump host are not reliable

Here is an example of an error I got while running Enos on G5K with Enoslib 8.1.3, using the automatic SSH jump host. The deployment had 4 hosts, and one of them was reported as unreachable.

INFO:enoslib.log:[G5k] Waiting for the end of deployment [D-805892fc-1b04-4afb-8f99-246002926c8f]

PLAY [all] *********************************************************************************************************************************************************************************************************

TASK [Run dhcp on the nodes] ***************************************************************************************************************************************************************************************
fatal: [gros-44-kavlan-4.nancy.grid5000.fr]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host\r\nConnection closed by UNKNOWN port 65535", "unreachable": true}
changed: [gros-45-kavlan-4.nancy.grid5000.fr]
changed: [gros-30-kavlan-4.nancy.grid5000.fr]
changed: [gros-41-kavlan-4.nancy.grid5000.fr]

PLAY RECAP *********************************************************************************************************************************************************************************************************
gros-30-kavlan-4.nancy.grid5000.fr : ok=1    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
gros-41-kavlan-4.nancy.grid5000.fr : ok=1    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
gros-44-kavlan-4.nancy.grid5000.fr : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
gros-45-kavlan-4.nancy.grid5000.fr : ok=1    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

ERROR:enoslib.api:Unreachable hosts: [_AnsibleExecutionRecord(host='gros-44-kavlan-4.nancy.grid5000.fr', status='UNREACHABLE', task='Run dhcp on the nodes', payload={'unreachable': True, 'msg': 'Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host\r\nConnection closed by UNKNOWN port 65535', 'changed': False})]
[_AnsibleExecutionRecord(host='gros-44-kavlan-4.nancy.grid5000.fr',
status='UNREACHABLE', task='Run dhcp on the nodes', payload={'unreachable':
True, 'msg': 'Failed to connect to the host via ssh:
kex_exchange_identification: Connection closed by remote host\r\nConnection
closed by UNKNOWN port 65535', 'changed': False})]
(CRITICAL cli.py:109)

When I relaunched the script (which reuses the same G5K job), everything went fine, so this was a transient error.

Possible causes: the host took a bit of time to become reachable, there were too many simultaneous connections on the jump host, or some other issue.
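Since a plain relaunch was enough to get past the error, one possible stopgap is to retry the failing step automatically when the error looks transient. Below is a minimal sketch in plain Python, not something Enoslib provides: `retry_transient`, `is_transient` and `TRANSIENT_MARKERS` are hypothetical names, and the markers are simply substrings copied from the error message above. It assumes the failing step raises an exception whose message contains the SSH error text.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Substrings treated as transient SSH failures (copied from the log above;
# this list is an assumption, not an exhaustive classification).
TRANSIENT_MARKERS: Tuple[str, ...] = (
    "kex_exchange_identification",
    "Connection closed by remote host",
)


def is_transient(error: Exception) -> bool:
    """Return True if the error message looks like a transient SSH failure."""
    message = str(error)
    return any(marker in message for marker in TRANSIENT_MARKERS)


def retry_transient(
    action: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 2.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying with exponential backoff on transient errors."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == attempts or not is_transient(error):
                raise
            sleep(delay)
            delay *= backoff
    raise ValueError("attempts must be at least 1")
```

The step that fails (here, the task that runs dhcp on the nodes) would be wrapped in a zero-argument callable and passed as `action`. The backoff also spaces out reconnection attempts, which may help if the jump host is rejecting connections because too many are opened at once. This only hides the symptom and does not explain why the first connection is closed.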

@vparolgu reported a similar issue when using Enoslib directly, where it can fail while gathering facts (see the attached image).
