Restructuring of the generated training config file (!110) · Merge requests · Machine learning for population genetics / private / dnadna

E Madison Bray requested to merge embray/config/training-config-order-2 into master Jul 13, 2021

This does two major things:

Rather than making the training config file include full copies of what was in the preprocessing and dataset config files, it inherits as much as possible (using the 'inherit' keyword) from the existing preprocessing config file. The learned_params section is one exception: it is still copied, since it is more likely to need editing between training runs.
Improves the ordering of settings in the generated training config file. The result is by no means perfect, since it can be difficult to completely control the order.
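
To illustrate the idea of inheriting from the preprocessing config instead of copying it, here is a minimal sketch in plain Python. It is not the dnadna implementation: the function `resolve_inherit_sketch`, the file name `preprocessing.yml`, and every key except `inherit` and `learned_params` are hypothetical, and the in-memory `FILES` dict stands in for reading files from disk.

```python
from copy import deepcopy


def resolve_inherit_sketch(config, load):
    """Return a new dict in which an 'inherit' key is replaced by the
    contents of the config it names (hypothetical helper, not dnadna's API).

    Inherited keys come first; keys set directly in `config` override them.
    Overrides are whole-value replacements, not deep merges.
    """
    config = dict(config)  # shallow copy so the caller's dict is not modified
    parent_name = config.pop("inherit", None)
    resolved = {}
    if parent_name is not None:
        parent = deepcopy(load(parent_name))
        resolved.update(resolve_inherit_sketch(parent, load))
    for key, value in config.items():
        if isinstance(value, dict):
            value = resolve_inherit_sketch(value, load)
        resolved[key] = value
    return resolved


# Stand-in for config files on disk (all names and values are made up).
FILES = {
    "preprocessing.yml": {
        "dataset": {"data_root": "/data"},
        "preprocessing": {"seed": 0},
    },
}

training = {
    "inherit": "preprocessing.yml",
    # Kept as a full copy, since it is likely to be edited between runs.
    "learned_params": {"param_a": {"type": "regression"}},
}

resolved = resolve_inherit_sketch(training, FILES.__getitem__)
```

In this sketch the training config stays short on disk, and the inherited sections only appear once the config is resolved.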

This also fixes a few algorithmic bugs, particularly in iteration over Config objects, as well as more problems caused by in-place filling of defaults during validation.

I think the difficulty of implementing these changes highlights some shortcomings of the config system, many of them due to its complexity. There are some things that could use rethinking, such as:

How to merge together multiple configuration sources, while maintaining a specific desired ordering of the keys (perhaps defined by the schema?)
Saner handling of configuration defaults, especially defaults inserted into the config during schema validation. In retrospect, filling them in during validation can create a lot of problems, and should maybe be performed by a separate method as a step after initial validation. The same goes for resolving relative filenames, and for other steps that modify the config in place while it is being validated.
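
As a rough sketch of what the two ideas above could look like, the following plain-Python example fills in defaults as a separate step that returns a new dict, then reorders keys to follow the schema. This is only an illustration under stated assumptions: the JSON-Schema-like `SCHEMA` dict, its keys, and the helpers `with_defaults` and `order_by_schema` are all hypothetical and are not part of dnadna.

```python
from copy import deepcopy

# Hypothetical schema; the order of "properties" defines the desired key order.
SCHEMA = {
    "properties": {
        "model_name": {"type": "string"},
        "n_epochs": {"type": "integer", "default": 1},
        "optimizer": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "default": "Adam"},
                "lr": {"type": "number", "default": 0.001},
            },
        },
    }
}


def with_defaults(config, schema):
    """Return a copy of `config` with schema defaults filled in.

    The input is never modified, so validation can stay free of side effects.
    """
    result = deepcopy(config)
    for key, subschema in schema.get("properties", {}).items():
        if key not in result and "default" in subschema:
            result[key] = deepcopy(subschema["default"])
        elif isinstance(result.get(key), dict):
            result[key] = with_defaults(result[key], subschema)
    return result


def order_by_schema(config, schema):
    """Return a copy of `config` with keys in schema order.

    Keys the schema does not mention keep their relative order at the end.
    """
    props = schema.get("properties", {})
    ordered = {}
    for key in props:
        if key in config:
            value = config[key]
            if isinstance(value, dict):
                value = order_by_schema(value, props[key])
            ordered[key] = value
    for key, value in config.items():
        if key not in ordered:
            ordered[key] = value
    return ordered


user_config = {"optimizer": {"lr": 0.01}, "model_name": "cnn"}
final_config = order_by_schema(with_defaults(user_config, SCHEMA), SCHEMA)
```

Because both helpers return new dicts, the config the user wrote is still available unchanged after the defaults and ordering have been applied.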
