timeout when pushing big images
When a user commits a sandbox, the controller pushes the new image to the registry. If the image contains many changes (hundreds of MB, possibly several GB), the push fails with a socket timeout:
2018-Jun-06 16:45:11 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/response.py", line 186, in read
data = self._fp.read(amt)
File "/usr/lib/python3.4/http/client.py", line 500, in read
return super(HTTPResponse, self).read(amt)
File "/usr/lib/python3.4/http/client.py", line 529, in readinto
return self._readinto_chunked(b)
File "/usr/lib/python3.4/http/client.py", line 614, in _readinto_chunked
chunk_left = self._read_next_chunk_size()
File "/usr/lib/python3.4/http/client.py", line 552, in _read_next_chunk_size
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python3.4/socket.py", line 371, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/allgo-docker/controller.py", line 101, in report_error
yield
File "/opt/allgo-docker/controller.py", line 1390, in _process
yield from self.run_in_executor(docker_check_error, self.ctrl.sandbox.push, image, tag)
File "/opt/allgo-docker/controller.py", line 336, in run_in_executor
return (yield from run())
File "/usr/lib/python3.4/asyncio/tasks.py", line 472, in _wait_for_one
return f.result() # May raise f.exception().
File "/usr/lib/python3.4/asyncio/futures.py", line 277, in result
raise self._exception
File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/allgo-docker/controller.py", line 76, in docker_check_error
for elem in func(*k, stream=True, **kw):
File "/usr/lib/python3/dist-packages/docker/client.py", line 217, in _stream_helper
data = reader.read(1)
File "/usr/lib/python3/dist-packages/urllib3/response.py", line 201, in read
raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
2018-Jun-06 16:45:11 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/response.py", line 186, in read
data = self._fp.read(amt)
File "/usr/lib/python3.4/http/client.py", line 500, in read
return super(HTTPResponse, self).read(amt)
File "/usr/lib/python3.4/http/client.py", line 529, in readinto
return self._readinto_chunked(b)
File "/usr/lib/python3.4/http/client.py", line 614, in _readinto_chunked
chunk_left = self._read_next_chunk_size()
File "/usr/lib/python3.4/http/client.py", line 552, in _read_next_chunk_size
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python3.4/socket.py", line 371, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/allgo-docker/controller.py", line 286, in _done
hnd.cur.result()
File "/usr/lib/python3.4/asyncio/futures.py", line 277, in result
raise self._exception
File "/usr/lib/python3.4/asyncio/tasks.py", line 235, in _step
result = coro.send(value)
File "/opt/allgo-docker/controller.py", line 1390, in _process
yield from self.run_in_executor(docker_check_error, self.ctrl.sandbox.push, image, tag)
File "/opt/allgo-docker/controller.py", line 336, in run_in_executor
return (yield from run())
File "/usr/lib/python3.4/asyncio/tasks.py", line 472, in _wait_for_one
return f.result() # May raise f.exception().
File "/usr/lib/python3.4/asyncio/futures.py", line 277, in result
raise self._exception
File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/allgo-docker/controller.py", line 76, in docker_check_error
for elem in func(*k, stream=True, **kw):
File "/usr/lib/python3/dist-packages/docker/client.py", line 217, in _stream_helper
data = reader.read(1)
File "/usr/lib/python3/dist-packages/urllib3/response.py", line 201, in read
raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
2018-Jun-06 16:45:11 DEBUG controller task scheduled <controller.PushManager object at 0x7fe9b52b9dd8> 219
2018-Jun-06 16:45:11 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
Fortunately, the controller is resilient and retries the push immediately (and users do not complain ;-)). However, the upload takes a long time to complete (e.g. about 40 min for magritpoc:1, between the sandbox becoming IDLE at 16:38 and the pull to the swarm at 17:17):
2018-Jun-06 16:38:07 INFO controller sandbox 'magritpoc' is now in state 'IDLE'
2018-Jun-06 16:45:11 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 16:45:11 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 16:45:11 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 16:46:12 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 16:46:12 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 16:46:12 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 16:46:12 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 16:46:12 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 16:47:12 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 16:47:12 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 16:47:12 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 16:47:12 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 16:54:55 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 16:54:55 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 16:54:55 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 16:54:55 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 16:55:58 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 16:55:58 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 16:55:58 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:12:05 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 17:13:14 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:13:14 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 17:13:14 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:13:14 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 17:14:14 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:14:14 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 17:14:14 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:14:14 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 17:15:14 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:15:14 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 17:15:14 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:15:14 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 17:16:14 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:16:14 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 17:16:14 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:16:15 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 17:17:15 ERROR controller unable to push version id 219 (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:17:15 ERROR controller task <controller.PushManager object at 0x7fe9b52b9dd8> 219 unhandled exception
2018-Jun-06 17:17:15 ERROR controller unable to pull version id 219 to swarm (urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.)
2018-Jun-06 17:17:15 INFO controller push from the sandbox cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
2018-Jun-06 17:17:44 INFO controller pull to the swarm cargo.irisa.fr:8000/allgo/prod/webapp/magritpoc:1
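The durations above can be checked against the log timestamps. The following Python sketch works on a hand-abridged copy of the log. The event labels (idle, start, fail, pulled) are ours, not the controller's. It computes the time from IDLE to the successful pull and the length of each failed push attempt.

```python
from datetime import datetime

# Hand-abridged from the controller log above (all on 2018-06-06).
# idle   = sandbox 'magritpoc' entered state IDLE
# start  = "push from the sandbox ..."
# fail   = "unable to push version id 219 ..."
# pulled = "pull to the swarm ..." (the push finally succeeded)
EVENTS = """
16:38:07 idle
16:45:11 start
16:46:12 fail
16:46:12 start
16:47:12 fail
16:47:12 start
16:54:55 fail
16:54:55 start
16:55:58 fail
17:12:05 start
17:13:14 fail
17:13:14 start
17:14:14 fail
17:14:14 start
17:15:14 fail
17:15:14 start
17:16:14 fail
17:16:15 start
17:17:15 fail
17:17:15 start
17:17:44 pulled
"""


def parse(text):
    """Return a list of (datetime, event) tuples, one per non-empty line."""
    out = []
    for line in text.strip().splitlines():
        stamp, event = line.split()
        out.append((datetime.strptime(stamp, "%H:%M:%S"), event))
    return out


def attempt_durations(events):
    """Seconds between each push start and the failure that ends it."""
    durations = []
    started = None
    for when, event in events:
        if event == "start":
            started = when
        elif event == "fail" and started is not None:
            durations.append(int((when - started).total_seconds()))
            started = None
    return durations


events = parse(EVENTS)
total = events[-1][0] - events[0][0]
print("idle -> pulled:", total)  # idle -> pulled: 0:39:37
print("failed attempts (s):", attempt_durations(events))
# failed attempts (s): [61, 60, 463, 63, 69, 60, 60, 60, 60]
```

Most failed attempts end roughly 60 s after they start, which points to a fixed read timeout of about one minute. The 463 s attempt presumably kept receiving progress data for a while before it stalled.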
Increasing the timeout of docker-py requests (the timeout argument of the APIClient constructor, 60 seconds by default) should be sufficient. Note that the timeout cannot be tuned on a per-request basis: it is global to the client :-/
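The traceback shows why large pushes are the ones that fail. The push output is streamed (stream=True), and the timeout is enforced on each socket read (socket.timeout raised from recv_into). The push therefore fails as soon as the daemon sends nothing for longer than the timeout, presumably while it is busy uploading a large layer. The total duration of the push is not what triggers the failure.

The following stdlib-only Python sketch simulates this with a socket pair. It does not use docker-py, and the function name stream_survives and the timings are illustrative assumptions.

```python
import socket
import threading
import time


def stream_survives(timeout, pause, chunks=3):
    """Simulate a streamed push: the sender writes `chunks` progress lines and
    waits `pause` seconds before each one after the first. Return True if the
    reader gets all of them, False if a single read waits longer than `timeout`."""
    reader, writer = socket.socketpair()
    reader.settimeout(timeout)  # applies to each blocking read, not to the whole stream
    stream = reader.makefile("rb")

    def sender():
        for i in range(chunks):
            if i:
                time.sleep(pause)  # "daemon" busy: nothing is sent meanwhile
            writer.sendall(b"progress %d\n" % i)

    worker = threading.Thread(target=sender)
    worker.start()
    try:
        for _ in range(chunks):
            stream.readline()
        return True
    except socket.timeout:
        return False
    finally:
        worker.join()  # let the sender finish before closing its peer
        stream.close()
        reader.close()
        writer.close()


# The whole stream takes ~0.4 s (longer than the 0.3 s timeout), but no single gap does.
print(stream_survives(timeout=0.3, pause=0.1, chunks=5))
# One 0.4 s gap exceeds the 0.1 s timeout, so the read fails.
print(stream_survives(timeout=0.1, pause=0.4, chunks=2))
```

With docker-py, the corresponding change would be to pass a larger value when constructing the client, e.g. docker.APIClient(base_url=..., timeout=600). The value 600 is an arbitrary example: it only needs to exceed the longest silence between two progress messages, not the duration of the whole push.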