Implement new dataset splits configuration

E. Madison Bray requested to merge embray/issue-14 into master

Currently the split names are hard-coded: each must be one of "training", "validation", "test", or "unused".

Of these, only "training" and "validation" are currently required, because they are the only two used in the code. "Test" can be used for a test set, but we don't currently use it, so it is optional. We have not (to my recollection) discussed whether there are plans to do anything explicit with the test set (e.g. the dnadna predict command could have an option to run against the test set).

"Unused" just means some scenarios are set aside for an otherwise unspecified reason. If the ratios of the splits add up to less than 1, the remaining scenarios go to "unused" by default (generally this is not a desirable situation, but we allow it, and just log a warning).

If the ratios add up to greater than 1, this is an error.
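
The ratio rules above could be checked along these lines. This is a minimal sketch, not the code in this merge request: the function name `normalize_splits`, the two constants, and the floating-point tolerance `tol` are all assumptions made for illustration.

```python
import logging
import math

log = logging.getLogger(__name__)

# Hypothetical constants for illustration; not taken from the dnadna code base.
SPLIT_NAMES = ("training", "validation", "test", "unused")
REQUIRED_SPLITS = ("training", "validation")


def normalize_splits(splits, tol=1e-9):
    """Validate split ratios and return a copy with any remainder in "unused".

    `splits` maps split names to ratios, e.g. {"training": 0.7, "validation": 0.2}.
    """
    unknown = set(splits) - set(SPLIT_NAMES)
    if unknown:
        raise ValueError(f"unknown split names: {sorted(unknown)}")

    missing = [name for name in REQUIRED_SPLITS if name not in splits]
    if missing:
        raise ValueError(f"missing required splits: {missing}")

    total = math.fsum(splits.values())
    if total > 1 + tol:
        # Ratios adding up to more than 1 is an error.
        raise ValueError(f"split ratios add up to {total}, which is greater than 1")

    result = dict(splits)
    remainder = 1 - total
    if remainder > tol:
        # Allowed but usually undesirable: warn and assign the rest to "unused".
        log.warning("split ratios add up to %s; assigning the remaining %s "
                    "to 'unused'", total, remainder)
        result["unused"] = result.get("unused", 0.0) + remainder
    return result
```

For example, `normalize_splits({"training": 0.7, "validation": 0.2})` would log a warning and return the two given ratios plus an "unused" ratio of roughly 0.1, while `{"training": 0.8, "validation": 0.3}` would raise a `ValueError`.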
