# Handling of `batch_size` is broken
## Update: bug description
I am changing this issue into a bug because I have discovered what I think is unexpected behaviour: when the researcher does not set a `batch_size` argument in their `training_data` function, the batch size from the training arguments is not correctly propagated to `DataManager._loader_arguments`, and therefore not propagated to the data loader.
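For concreteness, here is a minimal, self-contained sketch of the behaviour as I understand it. The `DataManager` below is only a stand-in that models the flow, not the real class, and `training_args` is just a plain dict:

```python
# Toy model of the current flow, NOT the real Fed-BioMed classes; it only
# illustrates how a batch_size set in the training arguments fails to
# reach the loader arguments.

class DataManager:
    """Stand-in that records whatever loader kwargs it receives."""
    def __init__(self, dataset, **loader_arguments):
        self._loader_arguments = loader_arguments

training_args = {"batch_size": 48}       # what the researcher configured

def training_data():                     # no batch_size parameter here
    # Nothing forwards training_args["batch_size"] to the DataManager,
    # so _loader_arguments stays empty and the data loader later falls
    # back to its own default.
    return DataManager(dataset=[0, 1, 2, 3])

dm = training_data()
print(dm._loader_arguments)              # {} -> the configured 48 is lost
```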
## Additional confusing things
To my understanding, the way we currently handle `batch_size` seems a bit confusing. Here are some examples of what I mean:
- Researchers are encouraged to declare `batch_size` as a parameter to their custom `training_data` function. In almost all our notebooks, we encourage them to define a default value, typically something like `def training_data(self, batch_size=48):`. However, this default value will never be taken into account! It will always be superseded by the default value that we set in the `TrainingArgs` validator.
- If the researcher forgets to pass `batch_size` to the `DataManager` returned by their `training_data` function, then whatever they set in the `training_args` will be silently ignored! (Both behaviours are illustrated in the sketch below.)
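To make both points concrete, here is a self-contained reduction. `call_training_data` is a hypothetical stand-in for the framework-side code that invokes the researcher's function, and `DEFAULT_BATCH_SIZE` models the default set by the `TrainingArgs` validator; none of these names are the real API:

```python
# Hypothetical reduction of both problems, not actual framework code.

DEFAULT_BATCH_SIZE = 48   # models the default set by the TrainingArgs validator

def call_training_data(fn, training_args):
    # The framework always passes an explicit batch_size, so any default
    # in the researcher's signature is never used (first bullet).
    return fn(batch_size=training_args.get("batch_size", DEFAULT_BATCH_SIZE))

class DataManager:
    def __init__(self, dataset, **loader_arguments):
        self._loader_arguments = loader_arguments

def training_data(batch_size=16):        # researcher's default: dead code
    # batch_size is accepted but not forwarded to the DataManager, so the
    # value from training_args is silently dropped (second bullet).
    return DataManager(dataset=[0, 1, 2, 3])

dm = call_training_data(training_data, {"batch_size": 128})
print(dm._loader_arguments)              # {} -> 128 silently ignored
```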
I believe we should have a clearer way of handling this. My suggestion (but I'm open to discussion) would be:
- do not ask the researchers to manually put the batch size anywhere. Even better, the `training_data` function should not take any arguments.
- automatically inject the batch size through some interface with the `DataManager` (could be a `set_batch_size` method, an `append_to_loader_kwargs` method, etc.), as sketched below
- the default value for `batch_size` in `TrainingArgs` should be 1, not 48, to avoid issues with smaller datasets.
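A minimal sketch of what this could look like, assuming the `set_batch_size` variant (all names here are suggestions, not existing API, and `DataManager` is again a stand-in):

```python
# Sketch of the proposed flow; set_batch_size is a suggested name, not an
# existing method. The researcher's training_data takes no arguments and
# the framework injects the batch size into the DataManager afterwards.

class DataManager:
    def __init__(self, dataset, **loader_arguments):
        self._loader_arguments = loader_arguments

    def set_batch_size(self, batch_size):
        # Single injection point: the value from TrainingArgs always wins,
        # and the researcher never threads batch_size through by hand.
        self._loader_arguments["batch_size"] = batch_size

def training_data():                     # no arguments, as suggested above
    return DataManager(dataset=[0, 1, 2, 3])

training_args = {"batch_size": 1}        # proposed default: 1, not 48

dm = training_data()
dm.set_batch_size(training_args["batch_size"])
print(dm._loader_arguments)              # {'batch_size': 1}
```

With a single injection point like this, the dead-default and silent-ignore problems above cannot occur by construction.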