Incorrect types in AddScalarReply lead to experiment failure

If a value of the wrong type is passed to AddScalarReply or a similar dataclass, an exception is raised and the experiment fails. For example, num_batches can end up as a float (e.g. 2.0) because the user set num_updates: 2.0 in their training_args.

I do not think this should be a critical error, nor that the experiment should fail. Instead, I propose that we catch the exception and continue the experiment.
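To make the proposal concrete, here is a minimal sketch of the catch-and-continue behavior. It makes several assumptions: `ScalarReply` is a hypothetical stand-in for AddScalarReply (not the real class), the validation is assumed to raise `TypeError`, and `try_build_reply` and `run_rounds` are illustrative helpers that do not exist in the codebase.

```python
import logging
from dataclasses import dataclass, fields

logger = logging.getLogger(__name__)


@dataclass
class ScalarReply:
    """Hypothetical stand-in for AddScalarReply; not the real class."""
    num_batches: int
    iteration: int

    def __post_init__(self):
        # Strict type check, mimicking the behavior described in this issue.
        # Assumption: the real validation raises an exception on mismatch;
        # TypeError is used here purely for illustration.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


def try_build_reply(**kwargs):
    """Return a ScalarReply, or None if validation fails (logged, not raised)."""
    try:
        return ScalarReply(**kwargs)
    except TypeError as exc:
        # Downgrade the failure to a warning so the caller can carry on.
        logger.warning("Skipping scalar report: %s", exc)
        return None


def run_rounds(updates):
    """Toy loop: an invalid reply is skipped and the remaining rounds still run."""
    sent = []
    for iteration, num_batches in enumerate(updates):
        reply = try_build_reply(num_batches=num_batches, iteration=iteration)
        if reply is not None:
            sent.append(reply)
    return sent
```

With this approach, a float such as 2.0 in the middle of a run costs one scalar report instead of the whole experiment. Whether to skip the report or coerce the value (e.g. 2.0 to 2) is a separate design choice that this sketch does not settle.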

Edited Apr 19, 2023 by CREMONESI Francesco
