g5k: using Enoslib through SSH proxy fails when configuring many nodes
When using Enoslib from outside Grid'5000, we need to go through the SSH proxy at access.grid5000.fr. However, this fails when trying to configure many nodes. With 14 nodes, Enoslib fails on one of the nodes at random with this error:
2022-08-30 18:13:50,369 WARNING: terminated: <SshProcess('cat ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys|sudo-g5k tee -a /root/.ssh/authorized_keys', Host('uvb-6.sophia.grid5000.fr'), connection_params={'user': 'user', 'keyfile': '/home/user/.ssh/id_rsa'}, name=cat ~/.ssh/id_rsa.pub ~/.ssh/authori..., started=True, start_date=2022-08-30 18:13:50+02:00, ended=True, end_date=2022-08-30 18:13:50+02:00, killed=False, error=False, error_reason=None, timeouted=False, expect_fail=False, write_error=False, exit_code=65280, ok=False, pid=31818, real cmd=('ssh', '-tt', '-o', 'BatchMode=yes', '-o', 'PasswordAuthentication=no', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'ConnectTimeout=20', '-o', 'User=user', '-i', '/home/user/.ssh/id_rsa', 'uvb-6.sophia.grid5000.fr', 'cat ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys|sudo-g5k tee -a /root/.ssh/authorized_keys'))>
stdout:
stderr:
kex_exchange_identification: read: Connection reset by peer
kex_exchange_identification: Connection closed by remote host
After some analysis, it turns out that Enoslib (through Ansible) opens simultaneous SSH connections to all nodes, and each of these connections creates a separate SSH connection to access.grid5000.fr. When there are too many, some of these connection attempts fail.
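The behaviour described above can probably be reproduced outside Enoslib with a small script. The following is only a sketch under some assumptions: it uses OpenSSH's `-J` (ProxyJump) option to go through access.grid5000.fr, whereas the actual setup may configure the proxy in `~/.ssh/config`; the helper names `build_ssh_command` and `probe` are made up for this illustration; and the remote command is just `true`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

JUMP_HOST = "access.grid5000.fr"  # SSH proxy named in this report


def build_ssh_command(node, user, jump_host=JUMP_HOST):
    """Return an ssh argv that reaches `node` through `jump_host` and runs `true`.

    Assumption: the proxy is reached with -J (ProxyJump); the options mirror
    those visible in the failing command from the log.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        "-o", "ConnectTimeout=20",
        "-J", f"{user}@{jump_host}",
        f"{user}@{node}",
        "true",
    ]


def probe(nodes, user, max_parallel, run=subprocess.run):
    """Run the probe on every node, with at most `max_parallel` ssh
    processes alive at once. Returns a dict {node: exit_code}."""
    def one(node):
        return run(build_ssh_command(node, user), capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map keeps results in the same order as `nodes`
        return dict(zip(nodes, pool.map(one, nodes)))
```

Calling `probe(nodes, "user", max_parallel=len(nodes))` with the 14 reserved nodes and then again with `max_parallel=1` should show whether non-zero exit codes only appear when all connections to the proxy are opened at the same time.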