training/validation/test sets

Original question

How do we now handle the training/validation/(test) sets? Are they kept constant across runs?

Erik's answer

The split between the training and validation sets is randomized, but the config file has an option to set the seed for the PRNG, so as long as the seed is held constant the sets will always be the same. I have a test that validates this. (As I learned in this issue [1], evaluating nets on CUDA can still introduce some non-determinism that has nothing to do with the PRNG, but this does not affect the split into training/validation sets.)

[1] #9 (closed)
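The idea that a fixed seed always reproduces the same training/validation split can be illustrated with a minimal sketch. It assumes samples are identified by sortable IDs; `split_dataset` and its parameters are hypothetical names for illustration, not dnadna's actual API. It uses a private `random.Random` instance so the split does not touch or depend on the global random state.

```python
import random


def split_dataset(sample_ids, validation_fraction, seed):
    """Reproducibly split sample IDs into (training, validation) lists.

    Hypothetical helper: a private random.Random instance is seeded with
    `seed`, so the same seed and the same input order always produce the
    same split, and the global `random` state is neither used nor modified.
    """
    if not 0.0 <= validation_fraction <= 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(sample_ids)  # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    # Sort each subset so the lists are easy to inspect and compare.
    validation = sorted(shuffled[:n_validation])
    training = sorted(shuffled[n_validation:])
    return training, validation


if __name__ == "__main__":
    ids = list(range(100))
    first = split_dataset(ids, validation_fraction=0.2, seed=42)
    second = split_dataset(ids, validation_fraction=0.2, seed=42)
    # Same seed and same inputs give identical training/validation sets.
    assert first == second
    print(len(first[0]), len(first[1]))
```

As noted above, this only covers the split itself; non-determinism from CUDA kernels during evaluation is a separate issue that seeding the split does not address.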

They were constant during Theophile's experiments, i.e. all networks were trained on the same dataset and validated on the same dataset. What about dnadna?

Yes, it should be consistent as long as the 'seed' option is set in the config file (which it usually is by default). I wonder, though, whether it would be worth having separate randomization options for different parts of the process (e.g. the same seed for choosing the validation set, but a different seed for running training).
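The idea of separate seeds per stage could look roughly like the sketch below. `SeedConfig`, `split_seed`, `training_seed`, and the helper functions are hypothetical names, not options dnadna actually has. Each stage gets its own independent `random.Random` generator, so changing the training seed cannot move samples between the training and validation sets.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedConfig:
    """Hypothetical config with one seed per stage (not dnadna's schema)."""
    split_seed: int     # decides which samples go to the validation set
    training_seed: int  # drives per-epoch shuffling during training


def split_ids(ids, validation_fraction, rng):
    """Return (training, validation) lists using the given generator."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return sorted(shuffled[n_validation:]), sorted(shuffled[:n_validation])


def epoch_orders(training_ids, n_epochs, rng):
    """Yield a freshly shuffled copy of the training IDs for each epoch."""
    for _ in range(n_epochs):
        order = list(training_ids)
        rng.shuffle(order)
        yield order


def run(config, ids, validation_fraction=0.2, n_epochs=3):
    # Two independent generators: the split depends only on split_seed,
    # and the epoch ordering depends only on training_seed.
    split_rng = random.Random(config.split_seed)
    training_rng = random.Random(config.training_seed)
    training, validation = split_ids(ids, validation_fraction, split_rng)
    orders = list(epoch_orders(training, n_epochs, training_rng))
    return validation, orders


if __name__ == "__main__":
    ids = list(range(50))
    val_a, orders_a = run(SeedConfig(split_seed=1, training_seed=10), ids)
    val_b, orders_b = run(SeedConfig(split_seed=1, training_seed=11), ids)
    # Same split_seed: the validation set is identical in both runs,
    # even though training used a different seed.
    assert val_a == val_b
```

Keeping one generator per stage makes the dependency explicit: with a single shared generator, any extra random draw added before the split would silently change which samples end up in the validation set.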

Edited Mar 18, 2021 by Jean Cury
