Implement new dataset splits configuration (!96) · Merge requests · Machine learning for population genetics / private / dnadna

E Madison Bray requested to merge embray/issue-14 into master Jun 21, 2021

Currently the split names are hard-coded: each must be one of "training", "validation", "test", or "unused".

Of these, currently only "training" and "validation" are required, because they are the only two used in the code. "Test" can be used for a test set, but we don't currently use it, so it is optional. We have not (to my recollection) discussed whether there are plans to do something explicit with the test set (e.g. the dnadna predict command could have an option to run against the test set).

"Unused" just means that some scenarios are set aside for an otherwise unspecified reason. If the ratios of the splits add up to less than 1, the remaining scenarios go to "unused" by default (generally this is not a desirable situation, but we allow it, and just log a warning).

If the ratios add up to greater than 1, this is an error.
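
As a rough illustration of the rules above, here is a minimal sketch of how the ratios could be checked. The function name `resolve_split_ratios`, the constants, and the plain-dict input are hypothetical and chosen only for this example; they are not the actual dnadna config schema or API.

```python
import logging
import math

logger = logging.getLogger(__name__)

# Hypothetical constants mirroring the hard-coded split names described above.
ALLOWED_SPLITS = ("training", "validation", "test", "unused")
REQUIRED_SPLITS = ("training", "validation")


def resolve_split_ratios(ratios):
    """Validate split ratios and assign any leftover fraction to "unused".

    ``ratios`` maps split names to fractions of the dataset.  This is an
    illustrative helper, not the function dnadna actually uses.
    """
    unknown = sorted(set(ratios) - set(ALLOWED_SPLITS))
    if unknown:
        raise ValueError(f"unknown split names: {unknown}")

    missing = [name for name in REQUIRED_SPLITS if name not in ratios]
    if missing:
        raise ValueError(f"missing required splits: {missing}")

    total = sum(ratios.values())
    sums_to_one = math.isclose(total, 1.0)  # tolerate float rounding

    # Ratios adding up to more than 1 are an error.
    if total > 1 and not sums_to_one:
        raise ValueError(f"split ratios sum to {total}, which is greater than 1")

    resolved = dict(ratios)
    # Ratios adding up to less than 1 are allowed: the remainder goes to
    # "unused" and a warning is logged.
    if total < 1 and not sums_to_one:
        remainder = 1.0 - total
        logger.warning(
            "split ratios sum to %s; assigning the remaining %s to 'unused'",
            total, remainder)
        resolved["unused"] = resolved.get("unused", 0.0) + remainder

    return resolved
```

For example, `resolve_split_ratios({"training": 0.7, "validation": 0.2})` would log a warning and return a mapping in which "unused" holds roughly the remaining 0.1, while `{"training": 0.8, "validation": 0.3}` would raise a `ValueError`.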
