Draft version of converting server to multiprocess (!96) · Merge requests · melissa / Melissa

CAULK Robert requested to merge convert-server-to-multiprocess into develop Mar 02, 2023

The goal of this MR is to explore the strengths and weaknesses associated with converting the reception and training from multiple threads to multiple processes.

Some notes:

multiprocessing.Queue cannot be subclassed because it is a factory function (not a class) that creates a queue. The object it returns is a multiprocessing.queues.Queue, but subclassing that class directly introduces issues because it sidesteps the additional setup that the multiprocessing.Queue function performs.

The alternative is to write a new class that holds self.queue = Queue() as an attribute (as shown in this MR).
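A minimal sketch of this composition approach is below. The class name ProcessSafeQueue and its two forwarding methods are hypothetical and are not taken from this MR; only the self.queue attribute mirrors the note above.

```python
import multiprocessing as mp


class ProcessSafeQueue:
    """Hypothetical wrapper: holds a multiprocessing queue instead of subclassing it."""

    def __init__(self, maxsize: int = 0) -> None:
        # mp.Queue() is a factory tied to the default context; it returns a
        # multiprocessing.queues.Queue that is already set up for that context.
        self.queue = mp.Queue(maxsize)

    def put(self, item, block: bool = True, timeout=None) -> None:
        # Forward to the wrapped queue; extra bookkeeping could be added here.
        self.queue.put(item, block, timeout)

    def get(self, block: bool = True, timeout=None):
        # Forward to the wrapped queue and hand the item back to the caller.
        return self.queue.get(block, timeout)
```

Any extra behaviour (counters, filtering, rotation) would then live in the wrapper's methods, leaving the wrapped queue untouched.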

qsize(), full(), and empty() are not reliable with multiprocessing queues. We may need to maintain our own buffer_size counter, protected by locks.
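One way such a counter could look is sketched below, assuming a shared multiprocessing.Value guarded by its built-in lock. The class name CountedQueue and the size() method are hypothetical; only the buffer_size name comes from the note above.

```python
import multiprocessing as mp


class CountedQueue:
    """Hypothetical queue wrapper that tracks its own size in shared memory."""

    def __init__(self, maxsize: int = 0) -> None:
        self.queue = mp.Queue(maxsize)
        # Shared C int; Value() creates an associated lock, exposed by get_lock().
        self.buffer_size = mp.Value("i", 0)

    def put(self, item) -> None:
        # Count first, so the counter never drops below the real item count.
        with self.buffer_size.get_lock():
            self.buffer_size.value += 1
        self.queue.put(item)

    def get(self, timeout=None):
        item = self.queue.get(timeout=timeout)
        with self.buffer_size.get_lock():
            self.buffer_size.value -= 1
        return item

    def size(self) -> int:
        # Read under the lock so the value is not read mid-update.
        with self.buffer_size.get_lock():
            return self.buffer_size.value
```

With this ordering the counter is an upper bound: it may briefly include an item that is still on its way into the queue.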

We cannot move reception to a new process because it needs the server's entire self object to be properly shared between processes. It is better to run train() in a separate process (as shown in this MR).
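The split described here could be sketched as follows. This is an illustration only: the train() body, the STOP sentinel, the two queues, and start_trainer() are all hypothetical stand-ins, not the server code from this MR.

```python
import multiprocessing as mp

STOP = None  # hypothetical sentinel that tells the trainer to exit


def train(batch_queue, result_queue) -> None:
    """Hypothetical stand-in for train(): consume batches until STOP arrives."""
    n_batches = 0
    while True:
        batch = batch_queue.get()
        if batch is STOP:
            break
        n_batches += 1  # a real trainer would run an optimisation step here
    result_queue.put(n_batches)


def start_trainer(batch_queue, result_queue) -> mp.Process:
    """Launch train() in a child process; only the two queues are shared."""
    trainer = mp.Process(target=train, args=(batch_queue, result_queue))
    trainer.start()
    return trainer
```

Reception would stay in the main process and keep full access to self, pushing batches into batch_queue. Under the spawn start method, start_trainer() should be called from code guarded by if __name__ == "__main__".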

TensorBoard (TB) logging and normal logging both become more difficult. Normal logging may be OK with the following (as yet untested):

import multiprocessing_logging

multiprocessing_logging.install_mp_handler()

TensorBoard logging, however, may not work. This needs further investigation.
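For normal logging, a standard-library alternative to the multiprocessing_logging package is the QueueHandler / QueueListener pair. The sketch below is not what this MR uses; the helper names and the "melissa.worker" logger name are hypothetical. It does not address TensorBoard.

```python
import logging
import logging.handlers
import multiprocessing as mp


def make_worker_logger(log_queue, name: str = "melissa.worker") -> logging.Logger:
    """Call inside a child process: records are sent to the queue only."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid handling the same record twice
    return logger


def start_listener(log_queue, *handlers) -> logging.handlers.QueueListener:
    """Call in the main process: a thread drains the queue into the handlers."""
    listener = logging.handlers.QueueListener(log_queue, *handlers)
    listener.start()
    return listener
```

The main process would create log_queue = mp.Queue(), start the listener with its usual file or stream handlers, pass the queue to each child, and call listener.stop() at shutdown.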

Condition(Lock()).notify() and Condition(Lock()).wait() do not work across processes, so the buffer needs to use Semaphores instead (as shown in this MR).
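A rough sketch of a semaphore-based buffer follows, using the classic two-semaphore bounded buffer (one counting free slots, one counting filled slots). The class SemaphoreBuffer and its attributes are hypothetical and may differ from the buffers in this MR.

```python
import multiprocessing as mp


class SemaphoreBuffer:
    """Hypothetical bounded FIFO that blocks on semaphores, not a Condition."""

    def __init__(self, capacity: int) -> None:
        self.queue = mp.Queue()
        self.free_slots = mp.Semaphore(capacity)  # producers wait on this
        self.filled_slots = mp.Semaphore(0)       # consumers wait on this

    def put(self, item, timeout=None) -> bool:
        # Wait for a free slot; give up if the buffer stays full.
        if not self.free_slots.acquire(timeout=timeout):
            return False
        self.queue.put(item)
        self.filled_slots.release()  # wake one waiting consumer
        return True

    def get(self, timeout=None):
        # Wait for a filled slot; raise if the buffer stays empty.
        if not self.filled_slots.acquire(timeout=timeout):
            raise TimeoutError("buffer stayed empty")
        item = self.queue.get()
        self.free_slots.release()  # wake one waiting producer
        return item
```

Here release() plays the role of notify() and acquire() plays the role of wait(), with the semaphore values carrying the state that a Condition would otherwise check under its lock.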

Taken together, these points mean that both buffers need a full rewrite and retesting.

There may also be difficulties using queue.rotate() (see the FIXMEs in this MR).
