[refactoring] factor out dataset configuration from simulation configuration (!5) · Merge requests · Machine learning for population genetics / private / dnadna

and improve distinction between dataset name and model name

This does two things:

1. Separates the dataset configuration (the config parameters needed only for loading data from a dataset) from the simulation config, which extends the dataset config with additional parameters specific to a simulation. This makes it possible to write a config file for reading an arbitrary dataset that is not necessarily tied to a simulation. This addresses issue #13 (closed), specifically this comment: #13 (comment 325174).
2. One thing that was confusing about the original format was the "model_name" parameter of simulation configs. There, the "model" referred to a simulation: the result of running a simulator with some specific parameters. This was confusing because a single simulated dataset is more like an instance of a model. On top of that, the "model_name" of the simulated dataset was also used as the name of any NN model trained on that dataset, on the assumption that the two names were the same.

This change separates the two notions more cleanly: the "model_name" parameter in simulation/dataset configs is now called "dataset_name". Meanwhile, the training config file gets a new, required "model_name" property, which is used for training in the same way "model_name" was before. The model name may now differ from the "dataset_name", or be the same.
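The layering described above can be sketched with Python dataclasses standing in for the config files. This is a minimal illustration, not the project's actual config code: only "dataset_name" and "model_name" come from this merge request, while the fields `data_root`, `n_scenarios`, `n_replicates`, `seed` and the helper `upgrade_legacy_simulation_config` are hypothetical. The helper shows one way an old-style config could be read; this merge request does not say that such a migration exists.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class DatasetConfig:
    """Parameters needed only to load data from a dataset (fields besides dataset_name are hypothetical)."""
    dataset_name: str
    data_root: str


@dataclass
class SimulationConfig(DatasetConfig):
    """A dataset config extended with simulation-specific parameters (all hypothetical)."""
    n_scenarios: int = 1
    n_replicates: int = 1
    seed: int = 0


@dataclass
class TrainingConfig:
    """Training settings: model_name has no default, so it is required, and it is independent of dataset_name."""
    model_name: str
    dataset: DatasetConfig


def upgrade_legacy_simulation_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rename an old-style 'model_name' key to 'dataset_name'."""
    upgraded = dict(raw)  # copy so the caller's dict is left untouched
    if "model_name" in upgraded and "dataset_name" not in upgraded:
        upgraded["dataset_name"] = upgraded.pop("model_name")
    return upgraded


# An arbitrary dataset that was not produced by any simulation.
plain = DatasetConfig(dataset_name="my_dataset", data_root="/data/my_dataset")

# A simulation config written in the old format, where "model_name" named the dataset.
legacy = {"model_name": "constant_pop", "data_root": "/data/sims", "n_scenarios": 100}
sim = SimulationConfig(**upgrade_legacy_simulation_config(legacy))

# The trained network gets its own name, which may differ from the dataset name.
train = TrainingConfig(model_name="cnn_v1", dataset=sim)
```

Because `SimulationConfig` subclasses `DatasetConfig`, any code that only needs to read data can accept either one, while simulation-only code can require the richer type.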
