Improve error messages for errors in the config file

JSON schemas are powerful but complex, and I don't think everybody will take the time to read the schema carefully, so it would be nice to have easy-to-understand error messages.

I'll show three types of scenarios here:

Missing property:

-> In the training config file, the parameter learned_params is missing.

command:

$ dnadna preprocess bactsel/bactsel_training_config.yml

an unexpected error occurred: 'learned_params' is a required property

Failed validating 'required' in schema:
    {'$schema': 'http://json-schema.org/draft-07/schema#',
     'additionalProperties': True,
     'properties': {'SNP_min': {'description': 'minimum number of SNPs '
                                               'each sample should have',
                                'minimum': 1,
                                'type': 'integer'},
                    'batch_size': {'default': 1,
                                   'description': 'sample batch size to '
                                                  'train on',
                                   'minimum': 1,
                                   'type': 'integer'},
                    'cuda_device': {'default': None,
                                    'description': 'specifies the CUDA '
                                                   'device index to use',
                                    'oneOf': [{'minimum': 0,
                                               'type': 'integer'},
                                              {'type': 'null'}]},
                    'dataset_params': {'additionalProperties': True,
                                       'default': {'concat': True,
                                                   'ignore_missing': False},
                                       'description': 'options specific to '
                                                      'the dataset used '
                                                      'for training; e.g. '
                                                      'to apply '
                                                      'augmentations to '
                                                      'the dataset',
                                       'properties': {'concat': {'default': True,
                                                                 'description': 'when '
                                                                                'loading '
                                                                                'SNPs '
                                                                                'from '
                                                                                'a '
                                                                                'dataset, '
                                                                                'concatenate '
                                                                                'the '
                                                                                'positions '
                                                                                'array '
                                                                                'to '
                                                                                'the '
                                                                                'SNP '
                                                                                'matrix '
                                                                                'instead '
                                                                                'of '
                                                                                'multiplying '
                                                                                'by '
                                                                                'it',
                                                                 'type': 'boolean'},
                                                      'ignore_missing': {'default': False,
                                                                         'description': 'ignore '
                                                                                        'missing '
                                                                                        'scenarios '
                                                                                        'or '
                                                                                        'replicates '
                                                                                        'when '
                                                                                        'loading '
                                                                                        'data '
                                                                                        'samples; '
                                                                                        'in '
                                                                                        'the '
                                                                                        'case '
                                                                                        'of '
                                                                                        'missing '
                                                                                        'samples '
                                                                                        'the '
                                                                                        'next '
                                                                                        'one '
                                                                                        'is '
                                                                                        'tried '
                                                                                        'until '
                                                                                        'one '
                                                                                        'is '
                                                                                        'found',
                                                                         'type': 'boolean'},
                                                      'transforms': {'additionalProperties': True,
                                                                     'description': 'dictionary '
                                                                                    'of '
                                                                                    'transforms '
                                                                                    'to '
                                                                                    'apply '
                                                                                    'to '
                                                                                    'the '
                                                                                    'dataset; '
                                                                                    'all '
                                                                                    'optional '
                                                                                    'transforms '
                                                                                    'are '
                                                                                    'disabled '
                                                                                    'by '
                                                                                    'default '
                                                                                    'unless '
                                                                                    'specified '
                                                                                    'here; '
                                                                                    'some '
                                                                                    'transforms '
                                                                                    'may '
                                                                                    'take '
                                                                                    'one '
                                                                                    'or '
                                                                                    'more '
                                                                                    'parameters '
                                                                                    'specified '
                                                                                    'in '
                                                                                    'the '
                                                                                    'value '
                                                                                    'associated '
                                                                                    'with '
                                                                                    'the '
                                                                                    'transform '
                                                                                    'name--if '
                                                                                    'the '
                                                                                    'transform '
                                                                                    'does '
                                                                                    'not '
                                                                                    'take '
                                                                                    'a '
                                                                                    'parameter '
                                                                                    'then '
                                                                                    'just '
                                                                                    'use '
                                                                                    'the '
                                                                                    'value '
                                                                                    'true '
                                                                                    'to '
                                                                                    'enable '
                                                                                    'it',
                                                                     'properties': {'rotate': {'default': False,
                                                                                               'description': 'apply '
                                                                                                              'a '
                                                                                                              'random '
                                                                                                              'rotation '
                                                                                                              'along '
                                                                                                              'the '
                                                                                                              'SNP '
                                                                                                              'axis '
                                                                                                              'of '
                                                                                                              'a '
                                                                                                              'sequence',
                                                                                               'type': 'boolean'},
                                                                                    'subsample': {'description': 'take '
                                                                                                                 'random '
                                                                                                                 'subsamples '
                                                                                                                 'of '
                                                                                                                 'the '
                                                                                                                 'SNP '
                                                                                                                 'matrix; '
                                                                                                                 'the '
                                                                                                                 'argument '
                                                                                                                 'is '
                                                                                                                 'a '
                                                                                                                 'pair '
                                                                                                                 '(min, '
                                                                                                                 'max) '
                                                                                                                 'of '
                                                                                                                 'integers '
                                                                                                                 'giving '
                                                                                                                 'the '
                                                                                                                 'range '
                                                                                                                 'for '
                                                                                                                 'random '
                                                                                                                 'sizes '
                                                                                                                 'of '
                                                                                                                 'the '
                                                                                                                 'subsamples, '
                                                                                                                 'or '
                                                                                                                 'a '
                                                                                                                 'single '
                                                                                                                 'integer '
                                                                                                                 'giving '
                                                                                                                 'a '
                                                                                                                 'fixed '
                                                                                                                 'size '
                                                                                                                 'for '
                                                                                                                 'the '
                                                                                                                 'subsamples',
                                                                                                  'oneOf': [{'items': {'minimum': 1,
                                                                                                                       'type': 'integer'},
                                                                                                             'maxItems': 2,
                                                                                                             'minItems': 2,
                                                                                                             'type': 'array'},
                                                                                                            {'minimum': 1,
                                                                                                             'type': 'integer'}]}},
                                                                     'type': 'object'}},
                                       'type': 'object'},
                    'evaluation_interval': {'default': 1,
                                            'description': 'interval in '
                                                           'the training '
                                                           'loop in which '
                                                           'to perform '
                                                           'model '
                                                           'validation',
                                            'minimum': 1,
                                            'type': 'integer'},
                    'learned_params': {'$ref': 'learned-params.yml',
                                       'description': 'configuration of '
                                                      'parameters to learn '
                                                      'in training'},
                    'learning_rate': {'default': 0.001,
                                      'description': 'the learning rate '
                                                     'for runs using this '
                                                     'configuration',
                                      'exclusiveMinimum': 0,
                                      'type': 'number'},
                    'loader_num_workers': {'default': 0,
                                           'description': 'number of '
                                                          'subprocesses to '
                                                          'use for data '
                                                          'loading',
                                           'minimum': 0,
                                           'type': 'integer'},
                    'maf': {'default': 0,
                            'description': 'minor allele frequency; used '
                                           'during pre-processing',
                            'minimum': 0,
                            'type': 'number'},
                    'model_root': {'default': '.',
                                   'description': 'root directory for all '
                                                  'training runs of this '
                                                  'model / training '
                                                  'configuration',
                                   'format': 'filename!',
                                   'type': 'string'},
                    'n_epochs': {'default': 1,
                                 'description': 'number of epochs over '
                                                'which to repeat the '
                                                'training process',
                                 'minimum': 1,
                                 'type': 'integer'},
                    'n_validation_scenarios': {'default': 1,
                                               'description': 'number of '
                                                              'scenarios '
                                                              'out of the '
                                                              'set of '
                                                              'usable '
                                                              'scenarios '
                                                              'to use for '
                                                              'validation '
                                                              'as opposed '
                                                              'to training',
                                               'minimum': 1,
                                               'type': 'integer'},
                    'net_params': {'additionalProperties': True,
                                   'default': {},
                                   'description': 'options specific to the '
                                                  'neural net model being '
                                                  'trained; these are '
                                                  'passed as keyword '
                                                  "arguments to the net's "
                                                  'constructor (see '
                                                  'dnadna.net module)',
                                   'type': 'object'},
                    'network_name': {'default': 'SPIDNA1',
                                     'description': 'name of the neural '
                                                    'net model to train',
                                     'minLength': 1,
                                     'type': 'string'},
                    'run_name_format': {'default': 'run_{run_id}',
                                        'description': 'format string for '
                                                       'the name given to '
                                                       'this run for a '
                                                       'sequence of runs '
                                                       'of the same model; '
                                                       'the outputs of '
                                                       'each run are '
                                                       'placed in '
                                                       'subdirectories of '
                                                       '<run_path>/<model_name> '
                                                       'with the name of '
                                                       'this run; the '
                                                       'format string can '
                                                       'use the template '
                                                       'variables '
                                                       'model_name and '
                                                       'run_id',
                                        'minLength': 4,
                                        'type': 'string'},
                    'scenario_params_path': {'description': 'path to the '
                                                            'scenario '
                                                            'parameters '
                                                            'file, either '
                                                            'absolute or '
                                                            'relative to '
                                                            'this file',
                                             'format': 'filename',
                                             'minLength': 1,
                                             'type': 'string'},
                    'seed': {'description': 'seed for initializing the '
                                            'PRNG prior to a training run '
                                            'for reproducible results; if '
                                            'unspecified the PRNG chooses '
                                            'its default seeding method',
                             'type': 'integer'},
                    'simulation': {'$ref': 'simulation.yml',
                                   'description': 'the simulation '
                                                  'configuration'},
                    'start_from_last_checkpoint': {'default': False,
                                                   'description': 'if '
                                                                  'true, '
                                                                  'resume '
                                                                  'training '
                                                                  'from a '
                                                                  'snapshot '
                                                                  'of the '
                                                                  'net '
                                                                  'that is '
                                                                  'saved '
                                                                  'each '
                                                                  'epoch',
                                                   'type': 'boolean'},
                    'transform_allel_min_major': {'default': False,
                                                  'type': 'boolean'},
                    'use_cuda': {'default': True,
                                 'description': 'use CUDA-capable GPU '
                                                'where available',
                                 'type': 'boolean'},
                    'weight_decay': {'description': 'the weight decay to '
                                                    'apply to the '
                                                    'training; if ommitted '
                                                    'or zero weight decay '
                                                    'is not applied',
                                     'minimum': 0,
                                     'type': 'number'}},
     'required': ['simulation', 'learned_params'],
     'type': 'object'}

On instance:
    Config({'model_root': '/home/jean/Documents/ML_genetics/dnadna_run/bactsel', 'simulation': {'data_root': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2', 'n_scenarios': 27, 'n_replicates': 10, 'model_name': 'bactsel', 'scenario_params_path': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2/BacterialDemoSelection_paramok', 'data_source': {'format': 'dnadna', 'filename_format': 'scenario_{scenario:05}/BacterialDemoSelection_{scenario:05}_{replicate:03}.npz'}, 'summary_statistics': {'filename_format': 'sumstats/scenario_{scenario}/{model_name}_{scenario}_{type}.csv', 'chromosome_size': 2000000.0, 'ld_options': {'circular': False, 'distance_bins': 19}, 'sfs_options': {'folded': False}, 'sel_options': {'window': 100}}, 'n_samples': 600, 'segment_length': 2000000.0, 'seed': 2}, 'network_name': 'CNN3', 'n_validation_scenarios': 7, 'n_epochs': 1, 'batch_size': 10, 'learning_rate': 0.001, 'evaluation_interval': 50, 'SNP_min': 400, 'run_name_format': 'run_{run_id}', 'use_cuda': True, 'cuda_device': None, 'loader_num_workers': 2, 'seed': 0, 'maf': 0, 'transform_allel_min_major': False, 'dataset_params': {'concat': True, 'ignore_missing': False}, 'start_from_last_checkpoint': False, 'net_params': {}}); run again with --debug to view the full traceback

Since the first thing we see in the terminal is the bottom of the output, none of it helps unless we scroll all the way back up to the first line, "an unexpected error occurred: 'learned_params' is a required property", which on its own would have been enough.
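
As an illustration of the kind of short message that would have been enough, here is a minimal Python sketch. It assumes the error object exposes message and absolute_path attributes, as jsonschema's ValidationError does; the helper name format_config_error and the message wording are hypothetical and not part of dnadna.

```python
from collections import deque
from types import SimpleNamespace


def format_config_error(error, config_filename):
    """Return a one-line description of a config validation error.

    Hypothetical helper: assumes ``error`` exposes ``message`` and
    ``absolute_path`` attributes, as jsonschema's ValidationError does.
    """
    # absolute_path holds the keys/indices leading to the offending value;
    # it is empty when the problem is at the top level of the config.
    location = ".".join(str(part) for part in error.absolute_path)
    return (f"error in config file {config_filename} "
            f"at {location or '<top level>'}: {error.message}")


# Stand-ins for real ValidationError objects, for illustration only.
missing = SimpleNamespace(
    message="'learned_params' is a required property", absolute_path=deque())
nested = SimpleNamespace(
    message="hypothetical message",
    absolute_path=deque(["simulation", "n_scenarios"]))

print(format_config_error(missing, "bactsel/bactsel_training_config.yml"))
# error in config file bactsel/bactsel_training_config.yml at <top level>: 'learned_params' is a required property
print(format_config_error(nested, "bactsel/bactsel_training_config.yml"))
# error in config file bactsel/bactsel_training_config.yml at simulation.n_scenarios: hypothetical message
```

With something like this, the full schema dump could be kept for --debug output only, so the last line in the terminal is the one that actually explains the problem.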

Adding --debug gives:

command:

$ dnadna preprocess bactsel/bactsel_training_config.yml --debug

an unexpected error occurred: 'learned_params' is a required property

Failed validating 'required' in schema:
    {'$schema': 'http://json-schema.org/draft-07/schema#',
     'additionalProperties': True,
     'properties': {'SNP_min': {'description': 'minimum number of SNPs '
                                               'each sample should have',
                                'minimum': 1,
                                'type': 'integer'},
                    'batch_size': {'default': 1,
                                   'description': 'sample batch size to '
                                                  'train on',
                                   'minimum': 1,
                                   'type': 'integer'},
                    'cuda_device': {'default': None,
                                    'description': 'specifies the CUDA '
                                                   'device index to use',
                                    'oneOf': [{'minimum': 0,
                                               'type': 'integer'},
                                              {'type': 'null'}]},
                    'dataset_params': {'additionalProperties': True,
                                       'default': {'concat': True,
                                                   'ignore_missing': False},
                                       'description': 'options specific to '
                                                      'the dataset used '
                                                      'for training; e.g. '
                                                      'to apply '
                                                      'augmentations to '
                                                      'the dataset',
                                       'properties': {'concat': {'default': True,
                                                                 'description': 'when '
                                                                                'loading '
                                                                                'SNPs '
                                                                                'from '
                                                                                'a '
                                                                                'dataset, '
                                                                                'concatenate '
                                                                                'the '
                                                                                'positions '
                                                                                'array '
                                                                                'to '
                                                                                'the '
                                                                                'SNP '
                                                                                'matrix '
                                                                                'instead '
                                                                                'of '
                                                                                'multiplying '
                                                                                'by '
                                                                                'it',
                                                                 'type': 'boolean'},
                                                      'ignore_missing': {'default': False,
                                                                         'description': 'ignore '
                                                                                        'missing '
                                                                                        'scenarios '
                                                                                        'or '
                                                                                        'replicates '
                                                                                        'when '
                                                                                        'loading '
                                                                                        'data '
                                                                                        'samples; '
                                                                                        'in '
                                                                                        'the '
                                                                                        'case '
                                                                                        'of '
                                                                                        'missing '
                                                                                        'samples '
                                                                                        'the '
                                                                                        'next '
                                                                                        'one '
                                                                                        'is '
                                                                                        'tried '
                                                                                        'until '
                                                                                        'one '
                                                                                        'is '
                                                                                        'found',
                                                                         'type': 'boolean'},
                                                      'transforms': {'additionalProperties': True,
                                                                     'description': 'dictionary '
                                                                                    'of '
                                                                                    'transforms '
                                                                                    'to '
                                                                                    'apply '
                                                                                    'to '
                                                                                    'the '
                                                                                    'dataset; '
                                                                                    'all '
                                                                                    'optional '
                                                                                    'transforms '
                                                                                    'are '
                                                                                    'disabled '
                                                                                    'by '
                                                                                    'default '
                                                                                    'unless '
                                                                                    'specified '
                                                                                    'here; '
                                                                                    'some '
                                                                                    'transforms '
                                                                                    'may '
                                                                                    'take '
                                                                                    'one '
                                                                                    'or '
                                                                                    'more '
                                                                                    'parameters '
                                                                                    'specified '
                                                                                    'in '
                                                                                    'the '
                                                                                    'value '
                                                                                    'associated '
                                                                                    'with '
                                                                                    'the '
                                                                                    'transform '
                                                                                    'name--if '
                                                                                    'the '
                                                                                    'transform '
                                                                                    'does '
                                                                                    'not '
                                                                                    'take '
                                                                                    'a '
                                                                                    'parameter '
                                                                                    'then '
                                                                                    'just '
                                                                                    'use '
                                                                                    'the '
                                                                                    'value '
                                                                                    'true '
                                                                                    'to '
                                                                                    'enable '
                                                                                    'it',
                                                                     'properties': {'rotate': {'default': False,
                                                                                               'description': 'apply '
                                                                                                              'a '
                                                                                                              'random '
                                                                                                              'rotation '
                                                                                                              'along '
                                                                                                              'the '
                                                                                                              'SNP '
                                                                                                              'axis '
                                                                                                              'of '
                                                                                                              'a '
                                                                                                              'sequence',
                                                                                               'type': 'boolean'},
                                                                                    'subsample': {'description': 'take '
                                                                                                                 'random '
                                                                                                                 'subsamples '
                                                                                                                 'of '
                                                                                                                 'the '
                                                                                                                 'SNP '
                                                                                                                 'matrix; '
                                                                                                                 'the '
                                                                                                                 'argument '
                                                                                                                 'is '
                                                                                                                 'a '
                                                                                                                 'pair '
                                                                                                                 '(min, '
                                                                                                                 'max) '
                                                                                                                 'of '
                                                                                                                 'integers '
                                                                                                                 'giving '
                                                                                                                 'the '
                                                                                                                 'range '
                                                                                                                 'for '
                                                                                                                 'random '
                                                                                                                 'sizes '
                                                                                                                 'of '
                                                                                                                 'the '
                                                                                                                 'subsamples, '
                                                                                                                 'or '
                                                                                                                 'a '
                                                                                                                 'single '
                                                                                                                 'integer '
                                                                                                                 'giving '
                                                                                                                 'a '
                                                                                                                 'fixed '
                                                                                                                 'size '
                                                                                                                 'for '
                                                                                                                 'the '
                                                                                                                 'subsamples',
                                                                                                  'oneOf': [{'items': {'minimum': 1,
                                                                                                                       'type': 'integer'},
                                                                                                             'maxItems': 2,
                                                                                                             'minItems': 2,
                                                                                                             'type': 'array'},
                                                                                                            {'minimum': 1,
                                                                                                             'type': 'integer'}]}},
                                                                     'type': 'object'}},
                                       'type': 'object'},
                    'evaluation_interval': {'default': 1,
                                            'description': 'interval in '
                                                           'the training '
                                                           'loop in which '
                                                           'to perform '
                                                           'model '
                                                           'validation',
                                            'minimum': 1,
                                            'type': 'integer'},
                    'learned_params': {'$ref': 'learned-params.yml',
                                       'description': 'configuration of '
                                                      'parameters to learn '
                                                      'in training'},
                    'learning_rate': {'default': 0.001,
                                      'description': 'the learning rate '
                                                     'for runs using this '
                                                     'configuration',
                                      'exclusiveMinimum': 0,
                                      'type': 'number'},
                    'loader_num_workers': {'default': 0,
                                           'description': 'number of '
                                                          'subprocesses to '
                                                          'use for data '
                                                          'loading',
                                           'minimum': 0,
                                           'type': 'integer'},
                    'maf': {'default': 0,
                            'description': 'minor allele frequency; used '
                                           'during pre-processing',
                            'minimum': 0,
                            'type': 'number'},
                    'model_root': {'default': '.',
                                   'description': 'root directory for all '
                                                  'training runs of this '
                                                  'model / training '
                                                  'configuration',
                                   'format': 'filename!',
                                   'type': 'string'},
                    'n_epochs': {'default': 1,
                                 'description': 'number of epochs over '
                                                'which to repeat the '
                                                'training process',
                                 'minimum': 1,
                                 'type': 'integer'},
                    'n_validation_scenarios': {'default': 1,
                                               'description': 'number of '
                                                              'scenarios '
                                                              'out of the '
                                                              'set of '
                                                              'usable '
                                                              'scenarios '
                                                              'to use for '
                                                              'validation '
                                                              'as opposed '
                                                              'to training',
                                               'minimum': 1,
                                               'type': 'integer'},
                    'net_params': {'additionalProperties': True,
                                   'default': {},
                                   'description': 'options specific to the '
                                                  'neural net model being '
                                                  'trained; these are '
                                                  'passed as keyword '
                                                  "arguments to the net's "
                                                  'constructor (see '
                                                  'dnadna.net module)',
                                   'type': 'object'},
                    'network_name': {'default': 'SPIDNA1',
                                     'description': 'name of the neural '
                                                    'net model to train',
                                     'minLength': 1,
                                     'type': 'string'},
                    'run_name_format': {'default': 'run_{run_id}',
                                        'description': 'format string for '
                                                       'the name given to '
                                                       'this run for a '
                                                       'sequence of runs '
                                                       'of the same model; '
                                                       'the outputs of '
                                                       'each run are '
                                                       'placed in '
                                                       'subdirectories of '
                                                       '<run_path>/<model_name> '
                                                       'with the name of '
                                                       'this run; the '
                                                       'format string can '
                                                       'use the template '
                                                       'variables '
                                                       'model_name and '
                                                       'run_id',
                                        'minLength': 4,
                                        'type': 'string'},
                    'scenario_params_path': {'description': 'path to the '
                                                            'scenario '
                                                            'parameters '
                                                            'file, either '
                                                            'absolute or '
                                                            'relative to '
                                                            'this file',
                                             'format': 'filename',
                                             'minLength': 1,
                                             'type': 'string'},
                    'seed': {'description': 'seed for initializing the '
                                            'PRNG prior to a training run '
                                            'for reproducible results; if '
                                            'unspecified the PRNG chooses '
                                            'its default seeding method',
                             'type': 'integer'},
                    'simulation': {'$ref': 'simulation.yml',
                                   'description': 'the simulation '
                                                  'configuration'},
                    'start_from_last_checkpoint': {'default': False,
                                                   'description': 'if '
                                                                  'true, '
                                                                  'resume '
                                                                  'training '
                                                                  'from a '
                                                                  'snapshot '
                                                                  'of the '
                                                                  'net '
                                                                  'that is '
                                                                  'saved '
                                                                  'each '
                                                                  'epoch',
                                                   'type': 'boolean'},
                    'transform_allel_min_major': {'default': False,
                                                  'type': 'boolean'},
                    'use_cuda': {'default': True,
                                 'description': 'use CUDA-capable GPU '
                                                'where available',
                                 'type': 'boolean'},
                    'weight_decay': {'description': 'the weight decay to '
                                                    'apply to the '
                                                    'training; if ommitted '
                                                    'or zero weight decay '
                                                    'is not applied',
                                     'minimum': 0,
                                     'type': 'number'}},
     'required': ['simulation', 'learned_params'],
     'type': 'object'}

On instance:
    Config({'model_root': '/home/jean/Documents/ML_genetics/dnadna_run/bactsel', 'simulation': {'data_root': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2', 'n_scenarios': 27, 'n_replicates': 10, 'model_name': 'bactsel', 'scenario_params_path': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2/BacterialDemoSelection_paramok', 'data_source': {'format': 'dnadna', 'filename_format': 'scenario_{scenario:05}/BacterialDemoSelection_{scenario:05}_{replicate:03}.npz'}, 'summary_statistics': {'filename_format': 'sumstats/scenario_{scenario}/{model_name}_{scenario}_{type}.csv', 'chromosome_size': 2000000.0, 'ld_options': {'circular': False, 'distance_bins': 19}, 'sfs_options': {'folded': False}, 'sel_options': {'window': 100}}, 'n_samples': 600, 'segment_length': 2000000.0, 'seed': 2}, 'network_name': 'CNN3', 'n_validation_scenarios': 7, 'n_epochs': 1, 'batch_size': 10, 'learning_rate': 0.001, 'evaluation_interval': 50, 'SNP_min': 400, 'run_name_format': 'run_{run_id}', 'use_cuda': True, 'cuda_device': None, 'loader_num_workers': 2, 'seed': 0, 'maf': 0, 'transform_allel_min_major': False, 'dataset_params': {'concat': True, 'ignore_missing': False}, 'start_from_last_checkpoint': False, 'net_params': {}}); run again with --debug to view the full traceback
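Almost all of the output above is the full schema and the full config instance. The only useful information is the missing key (`learned_params`), and possibly its `description`, which the schema already provides. The sketch below shows how a short message could be built from just those two pieces. `format_missing_required` is a hypothetical helper and is not part of dnadna or jsonschema. It only handles the top-level `required` keyword, and it uses plain dicts so that it runs without extra dependencies. With the jsonschema library, the same pieces should be available on the raised `ValidationError` (`error.validator == "required"`, `error.validator_value`, `error.instance`, `error.schema`). Treat that as a pointer, not a description of how dnadna currently catches the error.

```python
"""Sketch: turn a missing-required-property failure into a one-line message.

Assumption: `format_missing_required` is a hypothetical helper (not dnadna or
jsonschema API); it only checks the top-level 'required' keyword of a schema.
"""


def format_missing_required(config: dict, schema: dict, filename: str) -> list:
    """Return one human-readable line per required key absent from `config`."""
    messages = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key in config:
            continue
        # Reuse the schema's own description so the user learns what the key is for,
        # instead of dumping the whole schema and config instance
        description = properties.get(key, {}).get("description")
        line = f"{filename}: missing required property '{key}'"
        if description:
            line += f" ({description})"
        messages.append(line)
    return messages


# Excerpt of the training-config schema from the output above
schema = {
    "properties": {
        "learned_params": {
            "$ref": "learned-params.yml",
            "description": "configuration of parameters to learn in training",
        },
        "simulation": {
            "$ref": "simulation.yml",
            "description": "the simulation configuration",
        },
    },
    "required": ["simulation", "learned_params"],
}
# Trimmed-down config with 'learned_params' missing, as in the report
config = {"simulation": {"n_scenarios": 27}, "network_name": "CNN3"}

for msg in format_missing_required(config, schema, "bactsel/bactsel_training_config.yml"):
    print(msg)
```

For the config in this report, the message would reduce to a single line naming `learned_params` and its description. The full schema dump could still be shown behind `--debug`, alongside the traceback.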
                                                                                    'here; '
                                                                                    'some '
                                                                                    'transforms '
                                                                                    'may '
                                                                                    'take '
                                                                                    'one '
                                                                                    'or '
                                                                                    'more '
                                                                                    'parameters '
                                                                                    'specified '
                                                                                    'in '
                                                                                    'the '
                                                                                    'value '
                                                                                    'associated '
                                                                                    'with '
                                                                                    'the '
                                                                                    'transform '
                                                                                    'name--if '
                                                                                    'the '
                                                                                    'transform '
                                                                                    'does '
                                                                                    'not '
                                                                                    'take '
                                                                                    'a '
                                                                                    'parameter '
                                                                                    'then '
                                                                                    'just '
                                                                                    'use '
                                                                                    'the '
                                                                                    'value '
                                                                                    'true '
                                                                                    'to '
                                                                                    'enable '
                                                                                    'it',
                                                                     'properties': {'rotate': {'default': False,
                                                                                               'description': 'apply '
                                                                                                              'a '
                                                                                                              'random '
                                                                                                              'rotation '
                                                                                                              'along '
                                                                                                              'the '
                                                                                                              'SNP '
                                                                                                              'axis '
                                                                                                              'of '
                                                                                                              'a '
                                                                                                              'sequence',
                                                                                               'type': 'boolean'},
                                                                                    'subsample': {'description': 'take '
                                                                                                                 'random '
                                                                                                                 'subsamples '
                                                                                                                 'of '
                                                                                                                 'the '
                                                                                                                 'SNP '
                                                                                                                 'matrix; '
                                                                                                                 'the '
                                                                                                                 'argument '
                                                                                                                 'is '
                                                                                                                 'a '
                                                                                                                 'pair '
                                                                                                                 '(min, '
                                                                                                                 'max) '
                                                                                                                 'of '
                                                                                                                 'integers '
                                                                                                                 'giving '
                                                                                                                 'the '
                                                                                                                 'range '
                                                                                                                 'for '
                                                                                                                 'random '
                                                                                                                 'sizes '
                                                                                                                 'of '
                                                                                                                 'the '
                                                                                                                 'subsamples, '
                                                                                                                 'or '
                                                                                                                 'a '
                                                                                                                 'single '
                                                                                                                 'integer '
                                                                                                                 'giving '
                                                                                                                 'a '
                                                                                                                 'fixed '
                                                                                                                 'size '
                                                                                                                 'for '
                                                                                                                 'the '
                                                                                                                 'subsamples',
                                                                                                  'oneOf': [{'items': {'minimum': 1,
                                                                                                                       'type': 'integer'},
                                                                                                             'maxItems': 2,
                                                                                                             'minItems': 2,
                                                                                                             'type': 'array'},
                                                                                                            {'minimum': 1,
                                                                                                             'type': 'integer'}]}},
                                                                     'type': 'object'}},
                                       'type': 'object'},
                    'evaluation_interval': {'default': 1,
                                            'description': 'interval in '
                                                           'the training '
                                                           'loop in which '
                                                           'to perform '
                                                           'model '
                                                           'validation',
                                            'minimum': 1,
                                            'type': 'integer'},
                    'learned_params': {'$ref': 'learned-params.yml',
                                       'description': 'configuration of '
                                                      'parameters to learn '
                                                      'in training'},
                    'learning_rate': {'default': 0.001,
                                      'description': 'the learning rate '
                                                     'for runs using this '
                                                     'configuration',
                                      'exclusiveMinimum': 0,
                                      'type': 'number'},
                    'loader_num_workers': {'default': 0,
                                           'description': 'number of '
                                                          'subprocesses to '
                                                          'use for data '
                                                          'loading',
                                           'minimum': 0,
                                           'type': 'integer'},
                    'maf': {'default': 0,
                            'description': 'minor allele frequency; used '
                                           'during pre-processing',
                            'minimum': 0,
                            'type': 'number'},
                    'model_root': {'default': '.',
                                   'description': 'root directory for all '
                                                  'training runs of this '
                                                  'model / training '
                                                  'configuration',
                                   'format': 'filename!',
                                   'type': 'string'},
                    'n_epochs': {'default': 1,
                                 'description': 'number of epochs over '
                                                'which to repeat the '
                                                'training process',
                                 'minimum': 1,
                                 'type': 'integer'},
                    'n_validation_scenarios': {'default': 1,
                                               'description': 'number of '
                                                              'scenarios '
                                                              'out of the '
                                                              'set of '
                                                              'usable '
                                                              'scenarios '
                                                              'to use for '
                                                              'validation '
                                                              'as opposed '
                                                              'to training',
                                               'minimum': 1,
                                               'type': 'integer'},
                    'net_params': {'additionalProperties': True,
                                   'default': {},
                                   'description': 'options specific to the '
                                                  'neural net model being '
                                                  'trained; these are '
                                                  'passed as keyword '
                                                  "arguments to the net's "
                                                  'constructor (see '
                                                  'dnadna.net module)',
                                   'type': 'object'},
                    'network_name': {'default': 'SPIDNA1',
                                     'description': 'name of the neural '
                                                    'net model to train',
                                     'minLength': 1,
                                     'type': 'string'},
                    'run_name_format': {'default': 'run_{run_id}',
                                        'description': 'format string for '
                                                       'the name given to '
                                                       'this run for a '
                                                       'sequence of runs '
                                                       'of the same model; '
                                                       'the outputs of '
                                                       'each run are '
                                                       'placed in '
                                                       'subdirectories of '
                                                       '<run_path>/<model_name> '
                                                       'with the name of '
                                                       'this run; the '
                                                       'format string can '
                                                       'use the template '
                                                       'variables '
                                                       'model_name and '
                                                       'run_id',
                                        'minLength': 4,
                                        'type': 'string'},
                    'scenario_params_path': {'description': 'path to the '
                                                            'scenario '
                                                            'parameters '
                                                            'file, either '
                                                            'absolute or '
                                                            'relative to '
                                                            'this file',
                                             'format': 'filename',
                                             'minLength': 1,
                                             'type': 'string'},
                    'seed': {'description': 'seed for initializing the '
                                            'PRNG prior to a training run '
                                            'for reproducible results; if '
                                            'unspecified the PRNG chooses '
                                            'its default seeding method',
                             'type': 'integer'},
                    'simulation': {'$ref': 'simulation.yml',
                                   'description': 'the simulation '
                                                  'configuration'},
                    'start_from_last_checkpoint': {'default': False,
                                                   'description': 'if '
                                                                  'true, '
                                                                  'resume '
                                                                  'training '
                                                                  'from a '
                                                                  'snapshot '
                                                                  'of the '
                                                                  'net '
                                                                  'that is '
                                                                  'saved '
                                                                  'each '
                                                                  'epoch',
                                                   'type': 'boolean'},
                    'transform_allel_min_major': {'default': False,
                                                  'type': 'boolean'},
                    'use_cuda': {'default': True,
                                 'description': 'use CUDA-capable GPU '
                                                'where available',
                                 'type': 'boolean'},
                    'weight_decay': {'description': 'the weight decay to '
                                                    'apply to the '
                                                    'training; if ommitted '
                                                    'or zero weight decay '
                                                    'is not applied',
                                     'minimum': 0,
                                     'type': 'number'}},
     'required': ['simulation', 'learned_params'],
     'type': 'object'}

On instance:
    Config({'model_root': '/home/jean/Documents/ML_genetics/dnadna_run/bactsel', 'simulation': {'data_root': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2', 'n_scenarios': 27, 'n_replicates': 10, 'model_name': 'bactsel', 'scenario_params_path': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2/BacterialDemoSelection_paramok', 'data_source': {'format': 'dnadna', 'filename_format': 'scenario_{scenario:05}/BacterialDemoSelection_{scenario:05}_{replicate:03}.npz'}, 'summary_statistics': {'filename_format': 'sumstats/scenario_{scenario}/{model_name}_{scenario}_{type}.csv', 'chromosome_size': 2000000.0, 'ld_options': {'circular': False, 'distance_bins': 19}, 'sfs_options': {'folded': False}, 'sel_options': {'window': 100}}, 'n_samples': 600, 'segment_length': 2000000.0, 'seed': 2}, 'network_name': 'CNN3', 'n_validation_scenarios': 7, 'n_epochs': 1, 'batch_size': 10, 'learning_rate': 0.001, 'evaluation_interval': 50, 'SNP_min': 400, 'run_name_format': 'run_{run_id}', 'use_cuda': True, 'cuda_device': None, 'loader_num_workers': 2, 'seed': 0, 'maf': 0, 'transform_allel_min_major': False, 'dataset_params': {'concat': True, 'ignore_missing': False}, 'start_from_last_checkpoint': False, 'net_params': {}}); run again with --debug to view the full traceback
Traceback (most recent call last):
  File "/home/jean/anaconda3/envs/dnadna/bin/dnadna", line 11, in <module>
    load_entry_point('dnadna', 'console_scripts', 'dnadna')()
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
    raise exc
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 790, in main
    ret2 = cls.run_subcommand(args)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 759, in run_subcommand
    return command_cls.main(command[1:], namespace=args)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
    raise exc
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 782, in main
    ret = cls.run(args)
  File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 653, in run
    args.config)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1357, in from_config_file
    return cls(config=config, validate=validate)
  File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 95, in __init__
    super().__init__(config=config, validate=validate)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1345, in __init__
    config.validate(schema=self.config_schema)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1010, in validate
    validator.validate(self)
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 348, in validate
    raise error
jsonschema.exceptions.ValidationError: 'learned_params' is a required property

Failed validating 'required' in schema:
    {'$schema': 'http://json-schema.org/draft-07/schema#',
     'additionalProperties': True,
     'properties': {'SNP_min': {'description': 'minimum number of SNPs '
                                               'each sample should have',
                                'minimum': 1,
                                'type': 'integer'},
                    'batch_size': {'default': 1,
                                   'description': 'sample batch size to '
                                                  'train on',
                                   'minimum': 1,
                                   'type': 'integer'},
                    'cuda_device': {'default': None,
                                    'description': 'specifies the CUDA '
                                                   'device index to use',
                                    'oneOf': [{'minimum': 0,
                                               'type': 'integer'},
                                              {'type': 'null'}]},
                    'dataset_params': {'additionalProperties': True,
                                       'default': {'concat': True,
                                                   'ignore_missing': False},
                                       'description': 'options specific to '
                                                      'the dataset used '
                                                      'for training; e.g. '
                                                      'to apply '
                                                      'augmentations to '
                                                      'the dataset',
                                       'properties': {'concat': {'default': True,
                                                                 'description': 'when '
                                                                                'loading '
                                                                                'SNPs '
                                                                                'from '
                                                                                'a '
                                                                                'dataset, '
                                                                                'concatenate '
                                                                                'the '
                                                                                'positions '
                                                                                'array '
                                                                                'to '
                                                                                'the '
                                                                                'SNP '
                                                                                'matrix '
                                                                                'instead '
                                                                                'of '
                                                                                'multiplying '
                                                                                'by '
                                                                                'it',
                                                                 'type': 'boolean'},
                                                      'ignore_missing': {'default': False,
                                                                         'description': 'ignore '
                                                                                        'missing '
                                                                                        'scenarios '
                                                                                        'or '
                                                                                        'replicates '
                                                                                        'when '
                                                                                        'loading '
                                                                                        'data '
                                                                                        'samples; '
                                                                                        'in '
                                                                                        'the '
                                                                                        'case '
                                                                                        'of '
                                                                                        'missing '
                                                                                        'samples '
                                                                                        'the '
                                                                                        'next '
                                                                                        'one '
                                                                                        'is '
                                                                                        'tried '
                                                                                        'until '
                                                                                        'one '
                                                                                        'is '
                                                                                        'found',
                                                                         'type': 'boolean'},
                                                      'transforms': {'additionalProperties': True,
                                                                     'description': 'dictionary '
                                                                                    'of '
                                                                                    'transforms '
                                                                                    'to '
                                                                                    'apply '
                                                                                    'to '
                                                                                    'the '
                                                                                    'dataset; '
                                                                                    'all '
                                                                                    'optional '
                                                                                    'transforms '
                                                                                    'are '
                                                                                    'disabled '
                                                                                    'by '
                                                                                    'default '
                                                                                    'unless '
                                                                                    'specified '
                                                                                    'here; '
                                                                                    'some '
                                                                                    'transforms '
                                                                                    'may '
                                                                                    'take '
                                                                                    'one '
                                                                                    'or '
                                                                                    'more '
                                                                                    'parameters '
                                                                                    'specified '
                                                                                    'in '
                                                                                    'the '
                                                                                    'value '
                                                                                    'associated '
                                                                                    'with '
                                                                                    'the '
                                                                                    'transform '
                                                                                    'name--if '
                                                                                    'the '
                                                                                    'transform '
                                                                                    'does '
                                                                                    'not '
                                                                                    'take '
                                                                                    'a '
                                                                                    'parameter '
                                                                                    'then '
                                                                                    'just '
                                                                                    'use '
                                                                                    'the '
                                                                                    'value '
                                                                                    'true '
                                                                                    'to '
                                                                                    'enable '
                                                                                    'it',
                                                                     'properties': {'rotate': {'default': False,
                                                                                               'description': 'apply '
                                                                                                              'a '
                                                                                                              'random '
                                                                                                              'rotation '
                                                                                                              'along '
                                                                                                              'the '
                                                                                                              'SNP '
                                                                                                              'axis '
                                                                                                              'of '
                                                                                                              'a '
                                                                                                              'sequence',
                                                                                               'type': 'boolean'},
                                                                                    'subsample': {'description': 'take '
                                                                                                                 'random '
                                                                                                                 'subsamples '
                                                                                                                 'of '
                                                                                                                 'the '
                                                                                                                 'SNP '
                                                                                                                 'matrix; '
                                                                                                                 'the '
                                                                                                                 'argument '
                                                                                                                 'is '
                                                                                                                 'a '
                                                                                                                 'pair '
                                                                                                                 '(min, '
                                                                                                                 'max) '
                                                                                                                 'of '
                                                                                                                 'integers '
                                                                                                                 'giving '
                                                                                                                 'the '
                                                                                                                 'range '
                                                                                                                 'for '
                                                                                                                 'random '
                                                                                                                 'sizes '
                                                                                                                 'of '
                                                                                                                 'the '
                                                                                                                 'subsamples, '
                                                                                                                 'or '
                                                                                                                 'a '
                                                                                                                 'single '
                                                                                                                 'integer '
                                                                                                                 'giving '
                                                                                                                 'a '
                                                                                                                 'fixed '
                                                                                                                 'size '
                                                                                                                 'for '
                                                                                                                 'the '
                                                                                                                 'subsamples',
                                                                                                  'oneOf': [{'items': {'minimum': 1,
                                                                                                                       'type': 'integer'},
                                                                                                             'maxItems': 2,
                                                                                                             'minItems': 2,
                                                                                                             'type': 'array'},
                                                                                                            {'minimum': 1,
                                                                                                             'type': 'integer'}]}},
                                                                     'type': 'object'}},
                                       'type': 'object'},
                    'evaluation_interval': {'default': 1,
                                            'description': 'interval in '
                                                           'the training '
                                                           'loop in which '
                                                           'to perform '
                                                           'model '
                                                           'validation',
                                            'minimum': 1,
                                            'type': 'integer'},
                    'learned_params': {'$ref': 'learned-params.yml',
                                       'description': 'configuration of '
                                                      'parameters to learn '
                                                      'in training'},
                    'learning_rate': {'default': 0.001,
                                      'description': 'the learning rate '
                                                     'for runs using this '
                                                     'configuration',
                                      'exclusiveMinimum': 0,
                                      'type': 'number'},
                    'loader_num_workers': {'default': 0,
                                           'description': 'number of '
                                                          'subprocesses to '
                                                          'use for data '
                                                          'loading',
                                           'minimum': 0,
                                           'type': 'integer'},
                    'maf': {'default': 0,
                            'description': 'minor allele frequency; used '
                                           'during pre-processing',
                            'minimum': 0,
                            'type': 'number'},
                    'model_root': {'default': '.',
                                   'description': 'root directory for all '
                                                  'training runs of this '
                                                  'model / training '
                                                  'configuration',
                                   'format': 'filename!',
                                   'type': 'string'},
                    'n_epochs': {'default': 1,
                                 'description': 'number of epochs over '
                                                'which to repeat the '
                                                'training process',
                                 'minimum': 1,
                                 'type': 'integer'},
                    'n_validation_scenarios': {'default': 1,
                                               'description': 'number of '
                                                              'scenarios '
                                                              'out of the '
                                                              'set of '
                                                              'usable '
                                                              'scenarios '
                                                              'to use for '
                                                              'validation '
                                                              'as opposed '
                                                              'to training',
                                               'minimum': 1,
                                               'type': 'integer'},
                    'net_params': {'additionalProperties': True,
                                   'default': {},
                                   'description': 'options specific to the '
                                                  'neural net model being '
                                                  'trained; these are '
                                                  'passed as keyword '
                                                  "arguments to the net's "
                                                  'constructor (see '
                                                  'dnadna.net module)',
                                   'type': 'object'},
                    'network_name': {'default': 'SPIDNA1',
                                     'description': 'name of the neural '
                                                    'net model to train',
                                     'minLength': 1,
                                     'type': 'string'},
                    'run_name_format': {'default': 'run_{run_id}',
                                        'description': 'format string for '
                                                       'the name given to '
                                                       'this run for a '
                                                       'sequence of runs '
                                                       'of the same model; '
                                                       'the outputs of '
                                                       'each run are '
                                                       'placed in '
                                                       'subdirectories of '
                                                       '<run_path>/<model_name> '
                                                       'with the name of '
                                                       'this run; the '
                                                       'format string can '
                                                       'use the template '
                                                       'variables '
                                                       'model_name and '
                                                       'run_id',
                                        'minLength': 4,
                                        'type': 'string'},
                    'scenario_params_path': {'description': 'path to the '
                                                            'scenario '
                                                            'parameters '
                                                            'file, either '
                                                            'absolute or '
                                                            'relative to '
                                                            'this file',
                                             'format': 'filename',
                                             'minLength': 1,
                                             'type': 'string'},
                    'seed': {'description': 'seed for initializing the '
                                            'PRNG prior to a training run '
                                            'for reproducible results; if '
                                            'unspecified the PRNG chooses '
                                            'its default seeding method',
                             'type': 'integer'},
                    'simulation': {'$ref': 'simulation.yml',
                                   'description': 'the simulation '
                                                  'configuration'},
                    'start_from_last_checkpoint': {'default': False,
                                                   'description': 'if '
                                                                  'true, '
                                                                  'resume '
                                                                  'training '
                                                                  'from a '
                                                                  'snapshot '
                                                                  'of the '
                                                                  'net '
                                                                  'that is '
                                                                  'saved '
                                                                  'each '
                                                                  'epoch',
                                                   'type': 'boolean'},
                    'transform_allel_min_major': {'default': False,
                                                  'type': 'boolean'},
                    'use_cuda': {'default': True,
                                 'description': 'use CUDA-capable GPU '
                                                'where available',
                                 'type': 'boolean'},
                    'weight_decay': {'description': 'the weight decay to '
                                                    'apply to the '
                                                    'training; if ommitted '
                                                    'or zero weight decay '
                                                    'is not applied',
                                     'minimum': 0,
                                     'type': 'number'}},
     'required': ['simulation', 'learned_params'],
     'type': 'object'}

On instance:
    Config({'model_root': '/home/jean/Documents/ML_genetics/dnadna_run/bactsel', 'simulation': {'data_root': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2', 'n_scenarios': 27, 'n_replicates': 10, 'model_name': 'bactsel', 'scenario_params_path': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2/BacterialDemoSelection_paramok', 'data_source': {'format': 'dnadna', 'filename_format': 'scenario_{scenario:05}/BacterialDemoSelection_{scenario:05}_{replicate:03}.npz'}, 'summary_statistics': {'filename_format': 'sumstats/scenario_{scenario}/{model_name}_{scenario}_{type}.csv', 'chromosome_size': 2000000.0, 'ld_options': {'circular': False, 'distance_bins': 19}, 'sfs_options': {'folded': False}, 'sel_options': {'window': 100}}, 'n_samples': 600, 'segment_length': 2000000.0, 'seed': 2}, 'network_name': 'CNN3', 'n_validation_scenarios': 7, 'n_epochs': 1, 'batch_size': 10, 'learning_rate': 0.001, 'evaluation_interval': 50, 'SNP_min': 400, 'run_name_format': 'run_{run_id}', 'use_cuda': True, 'cuda_device': None, 'loader_num_workers': 2, 'seed': 0, 'maf': 0, 'transform_allel_min_major': False, 'dataset_params': {'concat': True, 'ignore_missing': False}, 'start_from_last_checkpoint': False, 'net_params': {}})

Here --debug doesn't help, and oddly the output still includes the same error message as before (the one asking to run again with --debug).
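Here is a sketch of what a shorter report could look like in this first case. The helpers below check only the schema's top-level `required` list and reuse each property's `description` from the schema printed above, so the whole schema and instance no longer need to be dumped. `missing_required` and `format_report` are hypothetical names, not dnadna functions, and the schema excerpt is trimmed down to the two required properties.

```python
from __future__ import annotations


def missing_required(config: dict, schema: dict) -> list[str]:
    """Describe each top-level required property that is absent from config.

    Hypothetical helper: it only looks at the schema's top-level 'required'
    and 'properties' keywords and reuses each property's 'description'.
    """
    properties = schema.get("properties", {})
    messages = []
    for key in schema.get("required", []):
        if key not in config:
            msg = f"missing required property '{key}'"
            description = properties.get(key, {}).get("description")
            if description:
                msg += f" ({description})"
            messages.append(msg)
    return messages


def format_report(filename: str, problems: list[str]) -> str:
    """Build one short block of text instead of the full schema and instance."""
    lines = [f"error in configuration file {filename}:"]
    lines += [f"  - {problem}" for problem in problems]
    return "\n".join(lines)


# Trimmed-down excerpt of the training-config schema printed in the report.
schema = {
    "required": ["simulation", "learned_params"],
    "properties": {
        "simulation": {"description": "the simulation configuration"},
        "learned_params": {
            "description": "configuration of parameters to learn in training"
        },
    },
}
# A config like the one in the report, without 'learned_params'.
config = {"simulation": {"n_scenarios": 27}, "network_name": "CNN3"}

print(format_report("bactsel_training_config.yml", missing_required(config, schema)))
# error in configuration file bactsel_training_config.yml:
#   - missing required property 'learned_params' (configuration of parameters to learn in training)
```

Leaving out the "run again with --debug" hint is a deliberate choice in this sketch: for a configuration mistake the traceback adds nothing, as noted above.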

Wrong format

When running dnadna init -t default, the generated config contains the line learned_params: {}, so one might think the parameter's name should be added inside the braces, like this: learned_params: {selection}
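In YAML, `{selection}` is a flow mapping with a single key and no value, so a loader such as PyYAML produces `{'selection': None}` rather than a nested mapping of options. The plain-Python snippet below reproduces the check from the last frame of the traceback further down, `'default' in subschema and prop not in instance`, with `instance` bound to that `None`. The sub-schema and the property name are illustrative stand-ins.

```python
# The YAML line `learned_params: {selection}` loads as this dict:
# the value of 'selection' is None, not a mapping of options.
config = {"learned_params": {"selection": None}}

instance = config["learned_params"]["selection"]
subschema = {"default": "MSE"}  # illustrative sub-schema that declares a default
prop = "loss_func"              # illustrative property name

error = None
try:
    # Same test as the last frame of the traceback; `prop not in instance`
    # fails because membership tests are not supported on None.
    if "default" in subschema and prop not in instance:
        pass
except TypeError as exc:
    error = exc

print(repr(error))
# The report shows this as: TypeError: argument of type 'NoneType' is not iterable
```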

This gives:

$ dnadna preprocess bactsel/bactsel_training_config.yml        
an unexpected error occurred: argument of type 'NoneType' is not iterable; run again with --debug to view the full traceback

and with --debug:

command:

$ dnadna preprocess bactsel/bactsel_training_config.yml --debug

an unexpected error occurred: argument of type 'NoneType' is not iterable; run again with --debug to view the full traceback
an unexpected error occurred: argument of type 'NoneType' is not iterable; run again with --debug to view the full traceback
Traceback (most recent call last):
  File "/home/jean/anaconda3/envs/dnadna/bin/dnadna", line 11, in <module>
    load_entry_point('dnadna', 'console_scripts', 'dnadna')()
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
    raise exc
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 790, in main
    ret2 = cls.run_subcommand(args)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 759, in run_subcommand
    return command_cls.main(command[1:], namespace=args)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
    raise exc
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 782, in main
    ret = cls.run(args)
  File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 653, in run
    args.config)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1357, in from_config_file
    return cls(config=config, validate=validate)
  File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 95, in __init__
    super().__init__(config=config, validate=validate)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1345, in __init__
    config.validate(schema=self.config_schema)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1010, in validate
    validator.validate(self)
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 347, in validate
    for error in self.iter_errors(*args, **kwargs):
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
    for error in errors:
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1159, in validate_config_properties
    schema))
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/_validators.py", line 286, in properties
    schema_path=property,
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 339, in descend
    for error in self.iter_errors(instance, schema):
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
    for error in errors:
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/_validators.py", line 263, in ref
    for error in validator.descend(instance, resolved):
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 339, in descend
    for error in self.iter_errors(instance, schema):
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
    for error in errors:
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/_validators.py", line 49, in additionalProperties
    for error in validator.descend(instance[extra], aP, path=extra):
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 339, in descend
    for error in self.iter_errors(instance, schema):
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
    for error in errors:
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/_validators.py", line 337, in oneOf
    errs = list(validator.descend(instance, subschema, schema_path=index))
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 339, in descend
    for error in self.iter_errors(instance, schema):
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
    for error in errors:
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1171, in validate_config_properties
    if 'default' in subschema and prop not in instance:
TypeError: argument of type 'NoneType' is not iterable

This doesn't help either, and, as above, the previous error message is repeated. I don't know how this error should be handled, but a first step could be to change the default template.
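The failing line is in dnadna's `validate_config_properties` (dnadna/utils/config.py), which apparently assumes the value being validated is a mapping. One way to handle this, sketched below, is to check the type before the membership test, raise a dedicated error that names the offending key, and catch that error in a single top-level handler so it is printed exactly once. `ConfigError`, `fill_defaults` and `main` are hypothetical names, not dnadna's API, and the `loss_weight` default is only an illustrative property.

```python
from __future__ import annotations

import sys


class ConfigError(Exception):
    """Hypothetical exception for user-facing configuration problems."""


def fill_defaults(instance, properties: dict, path: str) -> None:
    """Insert schema defaults into a config mapping, refusing non-mappings."""
    if not isinstance(instance, dict):
        # Guard in front of `prop not in instance`: report the key path and
        # the YAML-level type instead of letting a TypeError escape.
        got = "null" if instance is None else type(instance).__name__
        raise ConfigError(f"{path}: expected a mapping of options, got {got}")
    for prop, subschema in properties.items():
        if "default" in subschema and prop not in instance:
            instance[prop] = subschema["default"]


def main(config: dict) -> int:
    """Outermost handler: the only place where a ConfigError is printed."""
    defaults = {"loss_weight": {"default": 1}}  # illustrative sub-schema
    try:
        for name, param in config["learned_params"].items():
            fill_defaults(param, defaults, f"learned_params.{name}")
    except ConfigError as exc:
        # One line, no traceback and no --debug hint for a user mistake.
        print(f"error in configuration: {exc}", file=sys.stderr)
        return 1
    return 0
```

With `{"learned_params": {"selection": None}}`, `main` prints the single line `error in configuration: learned_params.selection: expected a mapping of options, got null` to stderr and returns 1. Nested subcommands would let the exception propagate rather than printing it themselves, which avoids the duplicated message.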

Missing argument

Say now that we have:

learned_params: 
    selection:
        type: classification
        loss_func: Cross Entropy
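Because `type: classification` already says which variant of a learned parameter is intended, a validator could select the `oneOf` alternative by its `type` and report only that alternative's problems, instead of the generic "is not valid under any of the given schemas" shown below. In this sketch, `VARIANTS` and `explain_param` are hypothetical, trimmed-down stand-ins for the two alternatives in the schema. Because the schema's `required` list is cut off here, the sketch also assumes that `classes` is the key this entry is missing.

```python
from __future__ import annotations

# Hypothetical, trimmed-down variants of a learned parameter; the real schema
# has more detail and may differ in which keys are required.
VARIANTS = [
    {"type": "regression",
     "properties": {"type", "loss_func", "loss_weight",
                    "log_transform", "tied_to_position"},
     "required": []},
    {"type": "classification",
     "properties": {"type", "loss_func", "loss_weight", "classes", "n_classes"},
     "required": ["classes"]},  # assumption: 'classes' is required
]


def explain_param(name: str, param: dict) -> list[str]:
    """Validate one learned parameter against the variant chosen by 'type'."""
    kind = param.get("type")
    variant = next((v for v in VARIANTS if v["type"] == kind), None)
    if variant is None:
        known = ", ".join(repr(v["type"]) for v in VARIANTS)
        return [f"learned_params.{name}.type: {kind!r} is not one of {known}"]
    problems = []
    # Required keys of the selected variant only.
    for key in variant["required"]:
        if key not in param:
            problems.append(f"learned_params.{name}: missing '{key}' "
                            f"(required for {kind} parameters)")
    # Mirrors additionalProperties: False for the selected variant.
    for key in sorted(set(param) - variant["properties"]):
        problems.append(f"learned_params.{name}: unknown option '{key}' "
                        f"for {kind} parameters")
    return problems


param = {"type": "classification", "loss_func": "Cross Entropy", "loss_weight": 1}
print(explain_param("selection", param))
# ["learned_params.selection: missing 'classes' (required for classification parameters)"]
```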

command:

$ dnadna preprocess bactsel/bactsel_training_config.yml

an unexpected error occurred: Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}) is not valid under any of the given schemas

Failed validating 'oneOf' in schema['properties']['learned_params']['additionalProperties']:
    {'description': 'details about a single parameter to learn in a '
                    'training run',
     'oneOf': [{'additionalProperties': False,
                'properties': {'log_transform': {'default': False,
                                                 'description': 'whether '
                                                                'or not a '
                                                                'log '
                                                                'transform '
                                                                'should be '
                                                                'applied '
                                                                'to this '
                                                                "parameter's "
                                                                'known '
                                                                'values '
                                                                'during '
                                                                'pre-processing; '
                                                                'training '
                                                                'is then '
                                                                'performed '
                                                                'with the '
                                                                'log '
                                                                'values '
                                                                '(regression '
                                                                'parameters '
                                                                'only)',
                                                 'type': 'boolean'},
                               'loss_func': {'$ref': '#/definitions/loss_func',
                                             'default': 'MSE'},
                               'loss_weight': {'$ref': '#/definitions/loss_weight',
                                               'default': 1},
                               'tied_to_position': {'default': False,
                                                    'description': 'values '
                                                                   'of '
                                                                   'this '
                                                                   'parameter '
                                                                   'are '
                                                                   'SNP '
                                                                   'positions, '
                                                                   'so any '
                                                                   'transformations '
                                                                   'or '
                                                                   'normalizations '
                                                                   'of the '
                                                                   'position '
                                                                   'array '
                                                                   'must '
                                                                   'also '
                                                                   'be '
                                                                   'applied '
                                                                   'to '
                                                                   'this '
                                                                   'parameter '
                                                                   'during '
                                                                   'training',
                                                    'type': 'boolean'},
                               'type': {'const': 'regression'}}},
               {'additionalProperties': False,
                'properties': {'classes': {'description': 'classification '
                                                          'parameters '
                                                          'classes, either '
                                                          'an integer '
                                                          'giving the '
                                                          'number of '
                                                          'classes in the '
                                                          'parameter, or '
                                                          'an array to '
                                                          'give explicit '
                                                          'names to the '
                                                          'classes (one '
                                                          'item for each '
                                                          'class);  class '
                                                          'names can '
                                                          'themselves be '
                                                          'either strings, '
                                                          'or integers '
                                                          '(which are '
                                                          'converted '
                                                          'automatically '
                                                          'to strings, as '
                                                          'they are just '
                                                          'labels for the '
                                                          'classes)',
                                           'items': {'type': ['integer',
                                                              'string']},
                                           'minItems': 1,
                                           'minimum': 1,
                                           'type': ['integer', 'array']},
                               'loss_func': {'$ref': '#/definitions/loss_func',
                                             'default': 'Cross Entropy'},
                               'loss_weight': {'$ref': '#/definitions/loss_weight',
                                               'default': 1},
                               'n_classes': {'description': 'after '
                                                            'pre-processing, '
                                                            'this property '
                                                            'contains the '
                                                            'number of '
                                                            'classes in a '
                                                            'classification '
                                                            'parameter; if '
                                                            'the "classes" '
                                                            'property is '
                                                            'an integer '
                                                            'this is '
                                                            'identical; '
                                                            'otherwise it '
                                                            'is the length '
                                                            'of the '
                                                            '"classes" '
                                                            'array; '
                                                            'normally this '
                                                            'property '
                                                            'should not be '
                                                            'manually '
                                                            'specified',
                                             'minimum': 1,
                                             'type': 'integer'},
                               'type': {'const': 'classification'}},
                'required': ['classes']}],
     'properties': {'type': {'description': 'parameter type; either '
                                            '"regression" or '
                                            '"classification".  '
                                            'Classification parameters '
                                            'require the additional '
                                            '"classes" property',
                             'enum': ['regression', 'classification']}},
     'required': ['type'],
     'type': 'object'}

On instance['learned_params']['selection']:
    Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}); run again with --debug to view the full traceback

Here the short message doesn't tell us that the classes argument is missing, even though it is required for a classification task. We can find this out by reading the JSON schema, but it is not obvious at first sight, especially since the text wrapping makes the schema hard to read.
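Here is a rough sketch of what a friendlier message could look like. It assumes the jsonschema package (the one that appears in the traceback) and its Draft7Validator. When a oneOf fails, jsonschema keeps the errors of every alternative in error.context. We can drop the alternatives whose "type" const does not match the instance and report only what is left. PARAM_SCHEMA, EXAMPLE_SCHEMA, leaf_errors and format_errors are hypothetical names, and the schema is a trimmed-down excerpt of the one printed above, not dnadna's actual schema.

```python
from jsonschema import Draft7Validator

# Hypothetical, trimmed-down excerpt of the learned_params schema shown above.
PARAM_SCHEMA = {
    "type": "object",
    "required": ["type"],
    "properties": {"type": {"enum": ["regression", "classification"]}},
    "oneOf": [
        {"additionalProperties": False,
         "properties": {"type": {"const": "regression"},
                        "log_transform": {"type": "boolean"},
                        "loss_func": {}, "loss_weight": {}}},
        {"additionalProperties": False,
         "required": ["classes"],
         "properties": {"type": {"const": "classification"},
                        "classes": {"type": ["integer", "array"]},
                        "loss_func": {}, "loss_weight": {}}},
    ],
}
EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["learned_params"],
    "properties": {
        "learned_params": {"type": "object",
                           "additionalProperties": PARAM_SCHEMA},
    },
}


def _is_type_mismatch(error):
    """True if a sub-error only says this alternative has the wrong 'type'."""
    return error.validator == "const" and list(error.relative_path) == ["type"]


def leaf_errors(error):
    """Yield the most specific errors hidden under a oneOf/anyOf failure."""
    if not error.context:
        yield error
        return
    # For oneOf/anyOf, error.context holds the errors of every alternative;
    # relative_schema_path[0] is the index of the alternative they came from.
    branches = {}
    for sub in error.context:
        branches.setdefault(sub.relative_schema_path[0], []).append(sub)
    # Skip alternatives whose 'type' const does not match the instance: they
    # are not the kind of parameter the user was trying to write.  If every
    # alternative was skipped, fall back to reporting all of them.
    matching = [errs for errs in branches.values()
                if not any(_is_type_mismatch(e) for e in errs)]
    for errs in matching or list(branches.values()):
        for sub in errs:
            yield from leaf_errors(sub)


def format_errors(schema, instance):
    """Return one short 'location: message' line per validation problem."""
    lines = []
    for error in Draft7Validator(schema).iter_errors(instance):
        for leaf in leaf_errors(error):
            # absolute_path also includes the path of the parent oneOf error
            where = "/".join(str(p) for p in leaf.absolute_path) or "<root>"
            lines.append(f"{where}: {leaf.message}")
    return lines
```

With the configuration from this issue, format_errors would report learned_params/selection: 'classes' is a required property instead of dumping the whole schema. A misspelled key in a regression parameter would likewise be reported as a single "Additional properties are not allowed" line.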

With --debug it doesn't help either: the error message and the schema are printed several times, which makes the output even harder to read.
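The --debug traceback shows the error passing through two nested main() calls: the top-level dnadna command and the preprocess subcommand. Each of them seems to log the error before re-raising it, and then Python prints it once more. Below is a minimal sketch of reporting the error only once, assuming a single top-level wrapper where nested commands simply raise. For a jsonschema ValidationError, str(exc) includes the whole schema, while exc.message holds only the short text. The sketch therefore prints the stack frames without repeating str(exc). run_cli and format_failure are hypothetical names, not part of dnadna.

```python
import sys
import traceback


def format_failure(exc, debug=False):
    """Build the text to show for an uncaught error, exactly once."""
    # jsonschema's ValidationError keeps the short text in .message, while
    # str(exc) appends the whole schema; fall back to str() for other errors.
    short = getattr(exc, "message", None) or str(exc)
    text = f"an unexpected error occurred: {short}"
    if debug:
        # Only the stack frames: format_tb() does not repeat str(exc).
        stack = "".join(traceback.format_tb(exc.__traceback__))
        text = "Traceback (most recent call last):\n" + stack + text
    return text


def run_cli(run, argv, debug=False, stream=None):
    """Hypothetical top-level wrapper; nested commands just raise."""
    try:
        return run(argv)
    except Exception as exc:
        print(format_failure(exc, debug),
              file=stream if stream is not None else sys.stderr)
        return 1
```

Because only the outermost wrapper catches and prints, the message appears once whether or not --debug is given. With --debug, the stack frames are added in front of it.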

command:

$ dnadna preprocess bactsel/bactsel_training_config.yml --debug

an unexpected error occurred: Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}) is not valid under any of the given schemas

Failed validating 'oneOf' in schema['properties']['learned_params']['additionalProperties']:
    {'description': 'details about a single parameter to learn in a '
                    'training run',
     'oneOf': [{'additionalProperties': False,
                'properties': {'log_transform': {'default': False,
                                                 'description': 'whether '
                                                                'or not a '
                                                                'log '
                                                                'transform '
                                                                'should be '
                                                                'applied '
                                                                'to this '
                                                                "parameter's "
                                                                'known '
                                                                'values '
                                                                'during '
                                                                'pre-processing; '
                                                                'training '
                                                                'is then '
                                                                'performed '
                                                                'with the '
                                                                'log '
                                                                'values '
                                                                '(regression '
                                                                'parameters '
                                                                'only)',
                                                 'type': 'boolean'},
                               'loss_func': {'$ref': '#/definitions/loss_func',
                                             'default': 'MSE'},
                               'loss_weight': {'$ref': '#/definitions/loss_weight',
                                               'default': 1},
                               'tied_to_position': {'default': False,
                                                    'description': 'values '
                                                                   'of '
                                                                   'this '
                                                                   'parameter '
                                                                   'are '
                                                                   'SNP '
                                                                   'positions, '
                                                                   'so any '
                                                                   'transformations '
                                                                   'or '
                                                                   'normalizations '
                                                                   'of the '
                                                                   'position '
                                                                   'array '
                                                                   'must '
                                                                   'also '
                                                                   'be '
                                                                   'applied '
                                                                   'to '
                                                                   'this '
                                                                   'parameter '
                                                                   'during '
                                                                   'training',
                                                    'type': 'boolean'},
                               'type': {'const': 'regression'}}},
               {'additionalProperties': False,
                'properties': {'classes': {'description': 'classification '
                                                          'parameters '
                                                          'classes, either '
                                                          'an integer '
                                                          'giving the '
                                                          'number of '
                                                          'classes in the '
                                                          'parameter, or '
                                                          'an array to '
                                                          'give explicit '
                                                          'names to the '
                                                          'classes (one '
                                                          'item for each '
                                                          'class);  class '
                                                          'names can '
                                                          'themselves be '
                                                          'either strings, '
                                                          'or integers '
                                                          '(which are '
                                                          'converted '
                                                          'automatically '
                                                          'to strings, as '
                                                          'they are just '
                                                          'labels for the '
                                                          'classes)',
                                           'items': {'type': ['integer',
                                                              'string']},
                                           'minItems': 1,
                                           'minimum': 1,
                                           'type': ['integer', 'array']},
                               'loss_func': {'$ref': '#/definitions/loss_func',
                                             'default': 'Cross Entropy'},
                               'loss_weight': {'$ref': '#/definitions/loss_weight',
                                               'default': 1},
                               'n_classes': {'description': 'after '
                                                            'pre-processing, '
                                                            'this property '
                                                            'contains the '
                                                            'number of '
                                                            'classes in a '
                                                            'classification '
                                                            'parameter; if '
                                                            'the "classes" '
                                                            'property is '
                                                            'an integer '
                                                            'this is '
                                                            'identical; '
                                                            'otherwise it '
                                                            'is the length '
                                                            'of the '
                                                            '"classes" '
                                                            'array; '
                                                            'normally this '
                                                            'property '
                                                            'should not be '
                                                            'manually '
                                                            'specified',
                                             'minimum': 1,
                                             'type': 'integer'},
                               'type': {'const': 'classification'}},
                'required': ['classes']}],
     'properties': {'type': {'description': 'parameter type; either '
                                            '"regression" or '
                                            '"classification".  '
                                            'Classification parameters '
                                            'require the additional '
                                            '"classes" property',
                             'enum': ['regression', 'classification']}},
     'required': ['type'],
     'type': 'object'}

On instance['learned_params']['selection']:
    Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}); run again with --debug to view the full traceback
an unexpected error occurred: Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}) is not valid under any of the given schemas

Failed validating 'oneOf' in schema['properties']['learned_params']['additionalProperties']:
    {'description': 'details about a single parameter to learn in a '
                    'training run',
     'oneOf': [{'additionalProperties': False,
                'properties': {'log_transform': {'default': False,
                                                 'description': 'whether '
                                                                'or not a '
                                                                'log '
                                                                'transform '
                                                                'should be '
                                                                'applied '
                                                                'to this '
                                                                "parameter's "
                                                                'known '
                                                                'values '
                                                                'during '
                                                                'pre-processing; '
                                                                'training '
                                                                'is then '
                                                                'performed '
                                                                'with the '
                                                                'log '
                                                                'values '
                                                                '(regression '
                                                                'parameters '
                                                                'only)',
                                                 'type': 'boolean'},
                               'loss_func': {'$ref': '#/definitions/loss_func',
                                             'default': 'MSE'},
                               'loss_weight': {'$ref': '#/definitions/loss_weight',
                                               'default': 1},
                               'tied_to_position': {'default': False,
                                                    'description': 'values '
                                                                   'of '
                                                                   'this '
                                                                   'parameter '
                                                                   'are '
                                                                   'SNP '
                                                                   'positions, '
                                                                   'so any '
                                                                   'transformations '
                                                                   'or '
                                                                   'normalizations '
                                                                   'of the '
                                                                   'position '
                                                                   'array '
                                                                   'must '
                                                                   'also '
                                                                   'be '
                                                                   'applied '
                                                                   'to '
                                                                   'this '
                                                                   'parameter '
                                                                   'during '
                                                                   'training',
                                                    'type': 'boolean'},
                               'type': {'const': 'regression'}}},
               {'additionalProperties': False,
                'properties': {'classes': {'description': 'classification '
                                                          'parameters '
                                                          'classes, either '
                                                          'an integer '
                                                          'giving the '
                                                          'number of '
                                                          'classes in the '
                                                          'parameter, or '
                                                          'an array to '
                                                          'give explicit '
                                                          'names to the '
                                                          'classes (one '
                                                          'item for each '
                                                          'class);  class '
                                                          'names can '
                                                          'themselves be '
                                                          'either strings, '
                                                          'or integers '
                                                          '(which are '
                                                          'converted '
                                                          'automatically '
                                                          'to strings, as '
                                                          'they are just '
                                                          'labels for the '
                                                          'classes)',
                                           'items': {'type': ['integer',
                                                              'string']},
                                           'minItems': 1,
                                           'minimum': 1,
                                           'type': ['integer', 'array']},
                               'loss_func': {'$ref': '#/definitions/loss_func',
                                             'default': 'Cross Entropy'},
                               'loss_weight': {'$ref': '#/definitions/loss_weight',
                                               'default': 1},
                               'n_classes': {'description': 'after '
                                                            'pre-processing, '
                                                            'this property '
                                                            'contains the '
                                                            'number of '
                                                            'classes in a '
                                                            'classification '
                                                            'parameter; if '
                                                            'the "classes" '
                                                            'property is '
                                                            'an integer '
                                                            'this is '
                                                            'identical; '
                                                            'otherwise it '
                                                            'is the length '
                                                            'of the '
                                                            '"classes" '
                                                            'array; '
                                                            'normally this '
                                                            'property '
                                                            'should not be '
                                                            'manually '
                                                            'specified',
                                             'minimum': 1,
                                             'type': 'integer'},
                               'type': {'const': 'classification'}},
                'required': ['classes']}],
     'properties': {'type': {'description': 'parameter type; either '
                                            '"regression" or '
                                            '"classification".  '
                                            'Classification parameters '
                                            'require the additional '
                                            '"classes" property',
                             'enum': ['regression', 'classification']}},
     'required': ['type'],
     'type': 'object'}

On instance['learned_params']['selection']:
    Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}); run again with --debug to view the full traceback
Traceback (most recent call last):
  File "/home/jean/anaconda3/envs/dnadna/bin/dnadna", line 11, in <module>
    load_entry_point('dnadna', 'console_scripts', 'dnadna')()
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
    raise exc
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 790, in main
    ret2 = cls.run_subcommand(args)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 759, in run_subcommand
    return command_cls.main(command[1:], namespace=args)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
    raise exc
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 782, in main
    ret = cls.run(args)
  File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 653, in run
    args.config)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1357, in from_config_file
    return cls(config=config, validate=validate)
  File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 95, in __init__
    super().__init__(config=config, validate=validate)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1345, in __init__
    config.validate(schema=self.config_schema)
  File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1010, in validate
    validator.validate(self)
  File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 348, in validate
    raise error
jsonschema.exceptions.ValidationError: Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}) is not valid under any of the given schemas

Failed validating 'oneOf' in schema['properties']['learned_params']['additionalProperties']:
    {'description': 'details about a single parameter to learn in a '
                    'training run',
     'oneOf': [{'additionalProperties': False,
                'properties': {'log_transform': {'default': False,
                                                 'description': 'whether '
                                                                'or not a '
                                                                'log '
                                                                'transform '
                                                                'should be '
                                                                'applied '
                                                                'to this '
                                                                "parameter's "
                                                                'known '
                                                                'values '
                                                                'during '
                                                                'pre-processing; '
                                                                'training '
                                                                'is then '
                                                                'performed '
                                                                'with the '
                                                                'log '
                                                                'values '
                                                                '(regression '
                                                                'parameters '
                                                                'only)',
                                                 'type': 'boolean'},
                               'loss_func': {'$ref': '#/definitions/loss_func',
                                             'default': 'MSE'},
                               'loss_weight': {'$ref': '#/definitions/loss_weight',
                                               'default': 1},
                               'tied_to_position': {'default': False,
                                                    'description': 'values '
                                                                   'of '
                                                                   'this '
                                                                   'parameter '
                                                                   'are '
                                                                   'SNP '
                                                                   'positions, '
                                                                   'so any '
                                                                   'transformations '
                                                                   'or '
                                                                   'normalizations '
                                                                   'of the '
                                                                   'position '
                                                                   'array '
                                                                   'must '
                                                                   'also '
                                                                   'be '
                                                                   'applied '
                                                                   'to '
                                                                   'this '
                                                                   'parameter '
                                                                   'during '
                                                                   'training',
                                                    'type': 'boolean'},
                               'type': {'const': 'regression'}}},
               {'additionalProperties': False,
                'properties': {'classes': {'description': 'classification '
                                                          'parameters '
                                                          'classes, either '
                                                          'an integer '
                                                          'giving the '
                                                          'number of '
                                                          'classes in the '
                                                          'parameter, or '
                                                          'an array to '
                                                          'give explicit '
                                                          'names to the '
                                                          'classes (one '
                                                          'item for each '
                                                          'class);  class '
                                                          'names can '
                                                          'themselves be '
                                                          'either strings, '
                                                          'or integers '
                                                          '(which are '
                                                          'converted '
                                                          'automatically '
                                                          'to strings, as '
                                                          'they are just '
                                                          'labels for the '
                                                          'classes)',
                                           'items': {'type': ['integer',
                                                              'string']},
                                           'minItems': 1,
                                           'minimum': 1,
                                           'type': ['integer', 'array']},
                               'loss_func': {'$ref': '#/definitions/loss_func',
                                             'default': 'Cross Entropy'},
                               'loss_weight': {'$ref': '#/definitions/loss_weight',
                                               'default': 1},
                               'n_classes': {'description': 'after '
                                                            'pre-processing, '
                                                            'this property '
                                                            'contains the '
                                                            'number of '
                                                            'classes in a '
                                                            'classification '
                                                            'parameter; if '
                                                            'the "classes" '
                                                            'property is '
                                                            'an integer '
                                                            'this is '
                                                            'identical; '
                                                            'otherwise it '
                                                            'is the length '
                                                            'of the '
                                                            '"classes" '
                                                            'array; '
                                                            'normally this '
                                                            'property '
                                                            'should not be '
                                                            'manually '
                                                            'specified',
                                             'minimum': 1,
                                             'type': 'integer'},
                               'type': {'const': 'classification'}},
                'required': ['classes']}],
     'properties': {'type': {'description': 'parameter type; either '
                                            '"regression" or '
                                            '"classification".  '
                                            'Classification parameters '
                                            'require the additional '
                                            '"classes" property',
                             'enum': ['regression', 'classification']}},
     'required': ['type'],
     'type': 'object'}

On instance['learned_params']['selection']:
    Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1})
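
The actual problem is buried at the very end of this output. The `learned_params['selection']` entry declares `'type': 'classification'` but has no `classes` property, and the classification branch of the schema requires one. A one-line message built from the error location would surface that directly. The following is a minimal sketch. It assumes the location is available as a sequence of keys and indices, as in the `path` attribute of a `jsonschema` `ValidationError`. `format_error_path` and `describe_error` are hypothetical helper names, not part of dnadna:

```python
from collections import deque
from typing import Iterable, Union

PathItem = Union[str, int]


def format_error_path(path: Iterable[PathItem]) -> str:
    """Render a JSON path such as deque(['learned_params', 'selection'])
    as 'learned_params.selection'; list indices become '[i]'."""
    parts = []
    for item in path:
        if isinstance(item, int):
            # array indices attach directly to the previous component
            parts.append(f"[{item}]")
        else:
            # dotted separator between keys, none before the first one
            parts.append(("." if parts else "") + str(item))
    return "".join(parts) or "<top level>"


def describe_error(path: Iterable[PathItem], message: str) -> str:
    """Build a one-line, human-readable description of a config error."""
    return f"in config at {format_error_path(path)}: {message}"


# Hypothetical path and message, assuming the underlying failure is the
# missing 'classes' property of the selection parameter.
print(describe_error(deque(["learned_params", "selection"]),
                     "'classes' is a required property"))
```

With these inputs it prints `in config at learned_params.selection: 'classes' is a required property`, which points the user at the exact entry to fix without dumping the schema.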

Summary:

Better, clearer error messages where possible. Where that is not possible, invite the user to (re)read the JSON schema carefully.
Print a link to the JSON schema instead of printing the whole schema in the error (possibly print the full schema in the terminal only in debug mode).
Fix the repeated error message when running in debug mode.
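
A minimal sketch of a reporting policy covering the three points above. It assumes each validation error has already been reduced to a `(location, message, failing_schema)` tuple, for example from the `path`, `message` and `schema` attributes of a `jsonschema` `ValidationError`. `report_config_errors`, the `debug` switch and `SCHEMA_URL` are hypothetical, and the URL is a placeholder because no schema link is given in this issue. Reporting each distinct message only once is one possible guard against the repetition seen in debug mode, not a diagnosis of its actual cause:

```python
from typing import Iterable, List, Tuple

# Hypothetical placeholder: the real location of the schema docs is not known here.
SCHEMA_URL = "<link to the training config schema documentation>"


def report_config_errors(
    errors: Iterable[Tuple[str, str, dict]],
    schema_url: str = SCHEMA_URL,
    debug: bool = False,
) -> str:
    """Format config validation errors for the terminal.

    Each error is a (location, message, failing_schema) tuple. Identical
    (location, message) pairs are reported once. The failing schema is only
    dumped when debug is True; otherwise the user gets a link instead.
    """
    seen = set()
    lines: List[str] = []
    for location, message, schema in errors:
        key = (location, message)
        if key in seen:
            continue  # skip duplicates so each problem is printed once
        seen.add(key)
        lines.append(f"error in config at {location}: {message}")
        if debug:
            # full schema fragment only for debugging
            lines.append(f"  failing schema: {schema!r}")
    if not debug:
        lines.append(f"see the config schema for details: {schema_url}")
    return "\n".join(lines)
```

By default the user sees one short line per problem plus a pointer to the schema. Passing `debug=True` restores the schema fragments for whoever needs them.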
