Update Buffer (!98) · Merge requests · melissa / Melissa

Lucas Meyer requested to merge memory_buffer into develop Mar 08, 2023

The present MR makes the following major changes to the previous buffer implementation:

Copy the queue.Queue methods the buffer needs, without inheriting from it directly, because it carries additional features we don't want, such as join and task_done.
Replace the queue container, previously a collections.deque, with a plain list. The motivation is that we want random access into the buffer, and the collections.deque documentation states:

Indexed access is O(1) at both ends but slows to O(n) in the middle. For fast random access, use lists instead.
Use mixins to separate components and features, based on a previous suggestion from @rcaulk. I'm not sure my implementation is the most readable, but it lets us add a batch version or a threshold to virtually any queue we want, and test those components independently. I added tests to check that the actual behavior is the expected one.
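The changes listed above can be sketched as follows. This is a minimal illustration, not the code in this MR: the class names `BaseQueue`, `ThresholdMixin`, `RandomSampleMixin` and the hook `_is_sampling_ready` are hypothetical, and the threshold semantics (reads block until the buffer holds more than `threshold` items) are an assumption.

```python
import random
import threading
from typing import Any, List


class BaseQueue:
    """Hypothetical FIFO buffer: reuses the locking pattern of queue.Queue
    (one mutex, two conditions) without inheriting join/task_done."""

    def __init__(self, maxsize: int) -> None:
        self.maxsize = maxsize
        self.queue: List[Any] = []  # plain list: O(1) indexed access anywhere
        self.mutex = threading.Lock()
        self.not_empty = threading.Condition(self.mutex)
        self.not_full = threading.Condition(self.mutex)

    def _is_sampling_ready(self) -> bool:
        # Hook overridden by mixins to decide when reads are allowed.
        return len(self.queue) > 0

    def _get(self) -> Any:
        # FIFO read; note that list.pop(0) is O(n).
        return self.queue.pop(0)

    def put(self, item: Any) -> None:
        with self.not_full:
            while len(self.queue) >= self.maxsize:
                self.not_full.wait()
            self.queue.append(item)
            self.not_empty.notify()

    def get(self) -> Any:
        with self.not_empty:
            while not self._is_sampling_ready():
                self.not_empty.wait()
            item = self._get()
            self.not_full.notify()
            return item


class ThresholdMixin:
    """Assumed semantics: block reads until more than `threshold` items are stored."""

    def __init__(self, *args: Any, threshold: int, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.threshold = threshold

    def _is_sampling_ready(self) -> bool:
        return len(self.queue) > self.threshold


class RandomSampleMixin:
    """Read a uniformly random item; this is where list indexing pays off."""

    def _get(self) -> Any:
        index = random.randrange(len(self.queue))
        # Swap with the last element, then pop it: O(1) removal.
        self.queue[index], self.queue[-1] = self.queue[-1], self.queue[index]
        return self.queue.pop()


# Mixins come first so their overrides take precedence in the MRO.
class ThresholdQueue(ThresholdMixin, BaseQueue):
    pass


class ThresholdRandomQueue(ThresholdMixin, RandomSampleMixin, BaseQueue):
    pass
```

Because each feature lives in its own mixin, it can be tested in isolation on a small single-threaded queue, for example by filling a `ThresholdQueue(maxsize=5, threshold=2)` with three items and checking that exactly one read is allowed. A real buffer would also need a way to drain the remaining items once producers finish, which this sketch leaves out.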
test_buffer.py also makes it possible to compare the performance of different buffer implementations. The experiment generates 500,000 samples using multiple threads. The buffer size is 1,000, the threshold is 200, and the batch size is 16.

Buffer	# Samples Seen	Time (s)
ThresholdQueue	500,000	12
ThresholdReservoirQueue	1,213,000	85
BatchThresholdEvictOnWriteQueue	2,848,088	17

The experiment also shows that we don't have to call torch.from_numpy explicitly, since the default collate_fn converts NumPy arrays to tensors automatically.
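A small check of this behavior, as a sketch: `ArrayDataset` is a hypothetical stand-in for the real dataset, and the shapes are arbitrary. Only the batch size of 16 comes from the experiment above.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ArrayDataset(Dataset):
    """Hypothetical map-style dataset whose samples are NumPy arrays."""

    def __init__(self, n_samples: int, n_features: int) -> None:
        self.data = np.arange(n_samples * n_features, dtype=np.float32).reshape(
            n_samples, n_features
        )

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> np.ndarray:
        # No torch.from_numpy here: samples leave the dataset as NumPy arrays.
        return self.data[index]


# No collate_fn argument, so the DataLoader falls back to its default collate function.
loader = DataLoader(ArrayDataset(n_samples=32, n_features=4), batch_size=16)
batch = next(iter(loader))
```

Here `batch` is a `torch.Tensor` of shape `(16, 4)` even though the dataset only ever returns `np.ndarray` objects.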

Edited Mar 08, 2023 by Lucas Meyer