Drastically decrease memory usage by switching multiprocessing to forkserver

Théophile BASTIAN requested to merge tbastian/toomanycows into master

Before this MR, the parallelism relied on the fork method for multiprocessing.set_start_method (the default), meaning the multiprocessing was done in the usual, unix-like fashion: all the memory is passed to the child (forked) process in a Copy-on-Write (CoW) fashion.

Yet, Python being Python, it seems that the huge chunks of memory used (up to 15GB observed on a dense benchmark matrix) are actually copied at some point, probably because data structures get re-indexed or otherwise written to behind the scenes (CPython's reference counting, for instance, writes to an object's header even when the object is only read). This leads to a situation where we need roughly 10GB × NB_CPU of RAM, which is often just too much, and we run out of memory.

This MR makes the CorePinnedPool rely on the forkserver method instead: workers are forked from a small server process rather than from the main process, so they do not inherit its memory and there is nothing left to CoW. This required a bit of a revamp.
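
For readers unfamiliar with start methods, here is a minimal sketch of creating a pool on top of forkserver. It is not the CorePinnedPool code from this MR: the helper names forkserver_context and make_pool are hypothetical, and core pinning is left out. It assumes a Unix platform, since forkserver is not available elsewhere.

```python
import multiprocessing as mp


def forkserver_context():
    """Return a multiprocessing context bound to the forkserver start method.

    get_context() keeps the choice local to the caller, instead of changing
    the process-wide default as set_start_method() would.
    """
    return mp.get_context("forkserver")


def make_pool(n_workers):
    """Create a pool whose workers are forked from the fork server.

    Hypothetical helper: workers do not inherit the caller's heap, so every
    piece of data they need must be passed explicitly (and gets pickled).
    """
    return forkserver_context().Pool(processes=n_workers)


if __name__ == "__main__":
    # The guard matters: with forkserver, the main module may be re-imported
    # in other processes, and must not start a new pool when that happens.
    matrix = [[1, 2, 3], [4, 5, 6]]
    with make_pool(2) as pool:
        # Each row is sent to a worker; the builtin sum() does the work.
        print(pool.map(sum, matrix))
```

The trade-off is that arguments and results now travel through pickling, so it pays to send each task only the slice of data it actually needs.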
