Enable GPU use on our CI runner and garden around 'pyproject.toml'. (!60)

ANDREY Paul requested to merge fix-ci-gpu into develop Nov 23, 2023

This MR is mostly about recovering and improving GPU use as part of our CI.

In practice, it comprises four distinct types of changes:

Changes specific to our CI runner and its current GPU and CUDA setup:
- Add some tox environments dedicated to our CUDA 11 setup.
- Add some explicit requirements for CUDA toolkit binaries that have been tested to match our system and enable GPU support for Jax, TensorFlow and Torch.
- Disable some unit tests on 'TensorflowOptiModule': for some reason, basic TensorFlow operations run on GPU, but optimizers crash due to a PTX incompatibility.
Generic improvements to our test suite:
- Add environment variables to GPU-enabled tox environments, to prevent Jax and/or TensorFlow from pre-allocating the entire memory of the GPU they use.
- Fix the way the device-selection policy is set depending on the tests.
- Fix 'lint_tests', which previously did not properly run mypy on tests' code.
- Fix some TensorFlow and Jax data being generated on GPU rather than CPU.
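The memory pre-allocation item above can be illustrated in Python. This is a minimal sketch, not the configuration from this MR: the MR sets variables in the tox environments and does not name them here, so the two variables below are an assumption based on the switches documented by Jax and TensorFlow, and `configure_gpu_memory` is a hypothetical helper.

```python
import os

# Assumed variables (not quoted from this MR): the documented switches that
# stop each framework from reserving most of the GPU memory at start-up.
GPU_MEMORY_ENV = {
    "XLA_PYTHON_CLIENT_PREALLOCATE": "false",  # Jax: allocate memory on demand
    "TF_FORCE_GPU_ALLOW_GROWTH": "true",  # TensorFlow: grow allocation as needed
}


def configure_gpu_memory(environ=None):
    """Set the variables above unless they are already defined.

    `environ` defaults to `os.environ`; passing a plain dict is convenient
    for testing. The mapping that was updated is returned.
    """
    env = os.environ if environ is None else environ
    for key, value in GPU_MEMORY_ENV.items():
        # Keep any value chosen by the caller (e.g. by the tox environment).
        env.setdefault(key, value)
    return env
```

Such a helper only has an effect if it is called before Jax or TensorFlow is imported, since both frameworks read these variables when they initialise their GPU backend.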
Generic improvements to our code:
- Fix 'JaxNumpyVector' equality operator on inputs backed by distinct devices.
- Apply some linter-based minor backend fixes.
Improvements to dependency specifications in 'pyproject.toml':
- Better specify some loose dependencies, notably using '~=' wherever suitable.
- Add explicit support for newer versions of some dependencies; namely, dm-haiku and websockets. Future versions will have to go through the same process (identify the new release, read its release notes for breaking changes, verify compatibility by running our test suite, update version specifiers if things go smoothly).
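As a reminder of what the '~=' (compatible release) operator accepts, here is a simplified sketch of its semantics as defined by PEP 440. It only handles final releases made of dot-separated integers (no pre-releases, post-releases or epochs). The helper names are hypothetical, and the version numbers in the usage note are illustrative rather than taken from 'pyproject.toml'.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate: str, base: str) -> bool:
    """Return whether `candidate` satisfies the specifier '~=base'.

    '~=X.Y' means '>=X.Y, ==X.*' and '~=X.Y.Z' means '>=X.Y.Z, ==X.Y.*'.
    """
    cand = parse_release(candidate)
    ref = parse_release(base)
    if len(ref) < 2:
        raise ValueError("'~=' requires at least two release segments.")
    # All but the last segment of the base version must match exactly.
    prefix = ref[:-1]
    # Pad with zeros so that e.g. '2.2' and '2.2.0' compare as equal.
    width = max(len(cand), len(ref))
    cand = cand + (0,) * (width - len(cand))
    ref = ref + (0,) * (width - len(ref))
    return cand >= ref and cand[: len(prefix)] == prefix
```

For instance, `is_compatible("2.5", "2.2")` holds while `is_compatible("3.0", "2.2")` does not, which is why a new major release of a dependency requires an explicit specifier update.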
