Update Buffer
This MR:
- Introduces a new buffer that evicts on write and samples randomly.
- Factorizes the buffer classes using mixins.
- Adds tests to check the behaviour and performance of the different buffers.
- Adds type hints so that the mixin classes partially comply with Mypy.
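To make the first point concrete, here is a minimal sketch of a bounded buffer that evicts on write and samples randomly. It is not the code from this MR. The class name `EvictOnWriteQueue` is hypothetical, and two behaviours are assumptions: when the buffer is full, a write overwrites a uniformly random slot, and a read returns a random item without removing it (consistent with the benchmark below counting more samples seen than generated). Locking follows the `queue.Queue` pattern of one mutex shared by a `threading.Condition`.

```python
import queue
import random
import threading
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class EvictOnWriteQueue(Generic[T]):
    """Hypothetical sketch: writes never block, reads sample at random.

    Assumptions (not taken from the MR): a write to a full buffer
    overwrites a random slot; a read does not remove the item it returns.
    """

    def __init__(self, maxsize: int, seed: Optional[int] = None) -> None:
        if maxsize <= 0:
            raise ValueError("maxsize must be positive")
        self.maxsize = maxsize
        self._items: List[T] = []  # a list gives O(1) access at any index
        self._rng = random.Random(seed)
        # Same locking pattern as queue.Queue: one mutex shared by a condition.
        self.mutex = threading.Lock()
        self.not_empty = threading.Condition(self.mutex)

    def qsize(self) -> int:
        with self.mutex:
            return len(self._items)

    def put(self, item: T) -> None:
        with self.not_empty:
            if len(self._items) < self.maxsize:
                self._items.append(item)
            else:
                # Evict on write: overwrite a random slot instead of blocking.
                self._items[self._rng.randrange(self.maxsize)] = item
            self.not_empty.notify()

    def get(self, timeout: Optional[float] = None) -> T:
        with self.not_empty:
            # Block until at least one item is buffered, or time out.
            if not self.not_empty.wait_for(lambda: len(self._items) > 0, timeout):
                raise queue.Empty
            # Random sampling: the item stays in the buffer and can be seen again.
            return self._rng.choice(self._items)


buf = EvictOnWriteQueue(maxsize=3, seed=0)
for i in range(10):
    buf.put(i)
print(buf.qsize())  # 3
```

Because writes never block, producers are not throttled by slow consumers. The trade-off is that a sample can be evicted before any consumer has read it.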
Major changes from the previous buffer implementation:
- Copy the `queue.Queue` methods the buffer needs instead of inheriting from it directly, because `queue.Queue` has additional features we don't want, like `join` and `task_done`.
- Replace the queue container, a `collections.deque`, with a plain `list`. The motivation is that we want fast random access, and the `deque` documentation states: "Indexed access is O(1) at both ends but slows to O(n) in the middle. For fast random access, use lists instead."
- Use mixins to separate components and features, following an earlier suggestion by @rcaulk. I'm not sure my implementation is the most readable, but it lets us add batching or a threshold to virtually any queue we want, and we can test those components independently. I added tests to make sure the actual behavior is the expected one.
- `test_buffer.py` also makes it possible to compare the performance of the different buffer implementations. The experiment generates 500,000 samples using multiple threads. The buffer size is 1,000, the threshold is 200, and the batch size is 16.

  | Buffer | # Samples seen | Time (s) |
  | --- | --- | --- |
  | `ThresholdQueue` | 500,000 | 12 |
  | `ThresholdReservoirQueue` | 1,213,000 | 85 |
  | `BatchThresholdEvictOnWriteQueue` | 2,848,088 | 17 |

  The experiment also shows that we don't need to call `torch.from_numpy` explicitly, as the conversion is done automatically by the default `collate_fn`.
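The mixin layout described above can be sketched as follows. Like `queue.Queue`, the base class keeps locking in the public `put`/`get` methods and delegates storage to `_put`, `_get` and `_qsize` hooks; the mixins override or reuse those hooks, so batching and thresholding are written once and can be combined with any base. This is a hypothetical illustration, not the MR's code: `ListQueue`, `ThresholdMixin`, `BatchMixin` and `BatchThresholdQueue` are invented names, `ThresholdQueue` only borrows its name from the table above, the threshold semantics (readers wait until at least `threshold` items are buffered) are an assumption, and the base omits `queue.Queue`'s blocking-on-full logic for brevity.

```python
import queue
import threading
from typing import Any, List, Optional


class ListQueue:
    """Hypothetical FIFO base: public methods lock, private hooks store."""

    def __init__(self) -> None:
        self._items: List[Any] = []  # plain list, as in the MR
        self.mutex = threading.Lock()
        self.not_empty = threading.Condition(self.mutex)

    # Storage hooks meant to be overridden (queue.Queue uses the same idea).
    def _qsize(self) -> int:
        return len(self._items)

    def _put(self, item: Any) -> None:
        self._items.append(item)

    def _get(self) -> Any:
        return self._items.pop(0)

    def _ready(self) -> bool:
        # Can a reader take an item right now?
        return self._qsize() > 0

    def put(self, item: Any) -> None:
        with self.not_empty:
            self._put(item)
            self.not_empty.notify_all()

    def get(self, timeout: Optional[float] = None) -> Any:
        with self.not_empty:
            if not self.not_empty.wait_for(self._ready, timeout):
                raise queue.Empty
            return self._get()


# The mixins use attributes of the queue they are combined with; mypy flags
# this unless a Protocol (or ignores) describes the expected host class.
class ThresholdMixin:
    """Readers wait until at least `threshold` items are buffered (assumed)."""

    threshold: int = 1

    def _ready(self) -> bool:
        return self._qsize() >= self.threshold


class BatchMixin:
    """Adds get_batch() on top of whatever _ready/_get hooks the queue has."""

    batch_size: int = 1

    def get_batch(self, timeout: Optional[float] = None) -> List[Any]:
        with self.not_empty:
            ok = self.not_empty.wait_for(
                lambda: self._ready() and self._qsize() >= self.batch_size,
                timeout,
            )
            if not ok:
                raise queue.Empty
            # Take the whole batch under a single lock acquisition.
            return [self._get() for _ in range(self.batch_size)]


class ThresholdQueue(ThresholdMixin, ListQueue):
    def __init__(self, threshold: int) -> None:
        super().__init__()
        self.threshold = threshold


class BatchThresholdQueue(BatchMixin, ThresholdMixin, ListQueue):
    def __init__(self, threshold: int, batch_size: int) -> None:
        super().__init__()
        self.threshold = threshold
        self.batch_size = batch_size


q = BatchThresholdQueue(threshold=3, batch_size=2)
for i in range(3):
    q.put(i)
print(q.get_batch())  # [0, 1]
```

The method resolution order does the wiring: in `BatchThresholdQueue`, `ThresholdMixin._ready` shadows `ListQueue._ready`, and `BatchMixin.get_batch` picks it up without knowing which base it sits on, which is what makes each component testable on its own.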