# Handling of `batch_size` is broken
## Update: bug description
I am changing this issue into a bug because I have discovered what I think is unexpected behaviour: when the researcher does not set a `batch_size` argument in their `training_data` function, the batch size from the training arguments is not correctly propagated to `DataManager._loader_arguments`, and therefore not propagated to the data loader.
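For concreteness, here is a minimal, self-contained sketch of the behaviour as I understand it. The `DataManager` below is only a stand-in that models the flow, not the real class, and `training_args` is just a plain dict:

```python
# Toy model of the current flow, NOT the real Fed-BioMed classes; it only
# illustrates how a batch_size set in the training arguments fails to
# reach the loader arguments.

class DataManager:
    """Stand-in that records whatever loader kwargs it receives."""
    def __init__(self, dataset, **loader_arguments):
        self._loader_arguments = loader_arguments

training_args = {"batch_size": 48}       # what the researcher configured

def training_data():                     # no batch_size parameter here
    # Nothing forwards training_args["batch_size"] to the DataManager,
    # so _loader_arguments stays empty and the data loader later falls
    # back to its own default.
    return DataManager(dataset=[0, 1, 2, 3])

dm = training_data()
print(dm._loader_arguments)              # {} -> the configured 48 is lost
```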
## Additional confusing things
To my understanding, the way we currently handle `batch_size` seems a bit confusing. Here are some examples of what I mean:
- Researchers are encouraged to declare `batch_size` as a parameter to their custom `training_data` function. In almost all our notebooks, we encourage them to define a default value, typically something like `def training_data(self, batch_size=48):`. However, this default value will never be taken into account! It will always be superseded by the default value that we set in the `TrainingArgs` validator.
- If the researcher forgets to pass `batch_size` to the `DataManager` returned by their `training_data` function, then whatever they set in the `training_args` will be silently ignored! (Both behaviours are illustrated in the sketch below.)
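To make both points concrete, here is a self-contained reduction. `call_training_data` is a hypothetical stand-in for the framework-side code that invokes the researcher's function, and `DEFAULT_BATCH_SIZE` models the default set by the `TrainingArgs` validator; none of these names are the real API:

```python
# Hypothetical reduction of both problems, not actual framework code.

DEFAULT_BATCH_SIZE = 48   # models the default set by the TrainingArgs validator

def call_training_data(fn, training_args):
    # The framework always passes an explicit batch_size, so any default
    # in the researcher's signature is never used (first bullet).
    return fn(batch_size=training_args.get("batch_size", DEFAULT_BATCH_SIZE))

class DataManager:
    def __init__(self, dataset, **loader_arguments):
        self._loader_arguments = loader_arguments

def training_data(batch_size=16):        # researcher's default: dead code
    # batch_size is accepted but not forwarded to the DataManager, so the
    # value from training_args is silently dropped (second bullet).
    return DataManager(dataset=[0, 1, 2, 3])

dm = call_training_data(training_data, {"batch_size": 128})
print(dm._loader_arguments)              # {} -> 128 silently ignored
```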
I believe we should have a clearer way of handling this. My suggestion (but I'm open to discussion) would be:
- do not ask the researchers to manually put the batch size anywhere. Even better, the `training_data` function should not take any arguments.
- automatically inject the batch size through some interface with the `DataManager` (could be a `set_batch_size` method, an `append_to_loader_kwargs` method, etc.), as sketched below
- the default value for `batch_size` in `TrainingArgs` should be 1, not 48, to avoid issues with smaller datasets.
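A minimal sketch of what this could look like, assuming the `set_batch_size` variant (all names here are suggestions, not existing API, and `DataManager` is again a stand-in):

```python
# Sketch of the proposed flow; set_batch_size is a suggested name, not an
# existing method. The researcher's training_data takes no arguments and
# the framework injects the batch size into the DataManager afterwards.

class DataManager:
    def __init__(self, dataset, **loader_arguments):
        self._loader_arguments = loader_arguments

    def set_batch_size(self, batch_size):
        # Single injection point: the value from TrainingArgs always wins,
        # and the researcher never threads batch_size through by hand.
        self._loader_arguments["batch_size"] = batch_size

def training_data():                     # no arguments, as suggested above
    return DataManager(dataset=[0, 1, 2, 3])

training_args = {"batch_size": 1}        # proposed default: 1, not 48

dm = training_data()
dm.set_batch_size(training_args["batch_size"])
print(dm._loader_arguments)              # {'batch_size': 1}
```

With a single injection point like this, the dead-default and silent-ignore problems above cannot occur by construction.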