Incorrect types in AddScalarReply lead to experiment failure
If a value of the wrong type is passed to AddScalarReply or a similar dataclass, an exception is raised and the experiment fails. For example, num_batches may be a float (e.g. 2.0) because the user defined num_updates: 2.0 in their training_args.
I do not think this should be a critical error, nor that the experiment should fail. Instead, I propose that we catch the exception and continue the experiment.
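A minimal sketch of the proposed behaviour, under stated assumptions: `ScalarReply` is a hypothetical stand-in for AddScalarReply (the real fields and validation logic may differ), `build_reply_or_skip` is an illustrative helper name, and `TypeError` is assumed as the exception type, which may not match what the actual dataclass raises.

```python
import logging
from dataclasses import dataclass, fields
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class ScalarReply:
    """Hypothetical stand-in for AddScalarReply; the real fields may differ."""
    node_id: str
    num_batches: int
    loss: float

    def __post_init__(self):
        # Strict type check, mimicking the failure described in this issue.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} expected {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


def build_reply_or_skip(**kwargs) -> Optional[ScalarReply]:
    """Build the reply; on a type error, log a warning and return None
    so that the caller can skip this reply instead of aborting."""
    try:
        return ScalarReply(**kwargs)
    except TypeError as exc:  # assumed exception type
        logger.warning("Skipping malformed scalar reply: %s", exc)
        return None
```

With this sketch, `build_reply_or_skip(node_id="n1", num_batches=2.0, loss=0.5)` logs a warning and returns `None`, so the caller can ignore that one reply and keep the experiment running. An alternative would be to coerce integral floats such as 2.0 to `int` before validation, but that changes the input silently and would need to be agreed on separately.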
Edited by CREMONESI Francesco