Improve error messages for errors in the config file
JSON schemas are powerful but complex objects, and I don't think everybody will take the time to read the schema properly, so it would be nice to have error messages that are easy to understand.
Here are 3 types of scenarios:
Missing property:
-> In the training config file, the parameter learned_params is missing.
command:
$ dnadna preprocess bactsel/bactsel_training_config.yml
an unexpected error occurred: 'learned_params' is a required property
Failed validating 'required' in schema:
{'$schema': 'http://json-schema.org/draft-07/schema#',
'additionalProperties': True,
'properties': {'SNP_min': {'description': 'minimum number of SNPs '
'each sample should have',
'minimum': 1,
'type': 'integer'},
'batch_size': {'default': 1,
'description': 'sample batch size to '
'train on',
'minimum': 1,
'type': 'integer'},
'cuda_device': {'default': None,
'description': 'specifies the CUDA '
'device index to use',
'oneOf': [{'minimum': 0,
'type': 'integer'},
{'type': 'null'}]},
'dataset_params': {'additionalProperties': True,
'default': {'concat': True,
'ignore_missing': False},
'description': 'options specific to '
'the dataset used '
'for training; e.g. '
'to apply '
'augmentations to '
'the dataset',
'properties': {'concat': {'default': True,
'description': 'when '
'loading '
'SNPs '
'from '
'a '
'dataset, '
'concatenate '
'the '
'positions '
'array '
'to '
'the '
'SNP '
'matrix '
'instead '
'of '
'multiplying '
'by '
'it',
'type': 'boolean'},
'ignore_missing': {'default': False,
'description': 'ignore '
'missing '
'scenarios '
'or '
'replicates '
'when '
'loading '
'data '
'samples; '
'in '
'the '
'case '
'of '
'missing '
'samples '
'the '
'next '
'one '
'is '
'tried '
'until '
'one '
'is '
'found',
'type': 'boolean'},
'transforms': {'additionalProperties': True,
'description': 'dictionary '
'of '
'transforms '
'to '
'apply '
'to '
'the '
'dataset; '
'all '
'optional '
'transforms '
'are '
'disabled '
'by '
'default '
'unless '
'specified '
'here; '
'some '
'transforms '
'may '
'take '
'one '
'or '
'more '
'parameters '
'specified '
'in '
'the '
'value '
'associated '
'with '
'the '
'transform '
'name--if '
'the '
'transform '
'does '
'not '
'take '
'a '
'parameter '
'then '
'just '
'use '
'the '
'value '
'true '
'to '
'enable '
'it',
'properties': {'rotate': {'default': False,
'description': 'apply '
'a '
'random '
'rotation '
'along '
'the '
'SNP '
'axis '
'of '
'a '
'sequence',
'type': 'boolean'},
'subsample': {'description': 'take '
'random '
'subsamples '
'of '
'the '
'SNP '
'matrix; '
'the '
'argument '
'is '
'a '
'pair '
'(min, '
'max) '
'of '
'integers '
'giving '
'the '
'range '
'for '
'random '
'sizes '
'of '
'the '
'subsamples, '
'or '
'a '
'single '
'integer '
'giving '
'a '
'fixed '
'size '
'for '
'the '
'subsamples',
'oneOf': [{'items': {'minimum': 1,
'type': 'integer'},
'maxItems': 2,
'minItems': 2,
'type': 'array'},
{'minimum': 1,
'type': 'integer'}]}},
'type': 'object'}},
'type': 'object'},
'evaluation_interval': {'default': 1,
'description': 'interval in '
'the training '
'loop in which '
'to perform '
'model '
'validation',
'minimum': 1,
'type': 'integer'},
'learned_params': {'$ref': 'learned-params.yml',
'description': 'configuration of '
'parameters to learn '
'in training'},
'learning_rate': {'default': 0.001,
'description': 'the learning rate '
'for runs using this '
'configuration',
'exclusiveMinimum': 0,
'type': 'number'},
'loader_num_workers': {'default': 0,
'description': 'number of '
'subprocesses to '
'use for data '
'loading',
'minimum': 0,
'type': 'integer'},
'maf': {'default': 0,
'description': 'minor allele frequency; used '
'during pre-processing',
'minimum': 0,
'type': 'number'},
'model_root': {'default': '.',
'description': 'root directory for all '
'training runs of this '
'model / training '
'configuration',
'format': 'filename!',
'type': 'string'},
'n_epochs': {'default': 1,
'description': 'number of epochs over '
'which to repeat the '
'training process',
'minimum': 1,
'type': 'integer'},
'n_validation_scenarios': {'default': 1,
'description': 'number of '
'scenarios '
'out of the '
'set of '
'usable '
'scenarios '
'to use for '
'validation '
'as opposed '
'to training',
'minimum': 1,
'type': 'integer'},
'net_params': {'additionalProperties': True,
'default': {},
'description': 'options specific to the '
'neural net model being '
'trained; these are '
'passed as keyword '
"arguments to the net's "
'constructor (see '
'dnadna.net module)',
'type': 'object'},
'network_name': {'default': 'SPIDNA1',
'description': 'name of the neural '
'net model to train',
'minLength': 1,
'type': 'string'},
'run_name_format': {'default': 'run_{run_id}',
'description': 'format string for '
'the name given to '
'this run for a '
'sequence of runs '
'of the same model; '
'the outputs of '
'each run are '
'placed in '
'subdirectories of '
'<run_path>/<model_name> '
'with the name of '
'this run; the '
'format string can '
'use the template '
'variables '
'model_name and '
'run_id',
'minLength': 4,
'type': 'string'},
'scenario_params_path': {'description': 'path to the '
'scenario '
'parameters '
'file, either '
'absolute or '
'relative to '
'this file',
'format': 'filename',
'minLength': 1,
'type': 'string'},
'seed': {'description': 'seed for initializing the '
'PRNG prior to a training run '
'for reproducible results; if '
'unspecified the PRNG chooses '
'its default seeding method',
'type': 'integer'},
'simulation': {'$ref': 'simulation.yml',
'description': 'the simulation '
'configuration'},
'start_from_last_checkpoint': {'default': False,
'description': 'if '
'true, '
'resume '
'training '
'from a '
'snapshot '
'of the '
'net '
'that is '
'saved '
'each '
'epoch',
'type': 'boolean'},
'transform_allel_min_major': {'default': False,
'type': 'boolean'},
'use_cuda': {'default': True,
'description': 'use CUDA-capable GPU '
'where available',
'type': 'boolean'},
'weight_decay': {'description': 'the weight decay to '
'apply to the '
'training; if ommitted '
'or zero weight decay '
'is not applied',
'minimum': 0,
'type': 'number'}},
'required': ['simulation', 'learned_params'],
'type': 'object'}
On instance:
Config({'model_root': '/home/jean/Documents/ML_genetics/dnadna_run/bactsel', 'simulation': {'data_root': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2', 'n_scenarios': 27, 'n_replicates': 10, 'model_name': 'bactsel', 'scenario_params_path': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2/BacterialDemoSelection_paramok', 'data_source': {'format': 'dnadna', 'filename_format': 'scenario_{scenario:05}/BacterialDemoSelection_{scenario:05}_{replicate:03}.npz'}, 'summary_statistics': {'filename_format': 'sumstats/scenario_{scenario}/{model_name}_{scenario}_{type}.csv', 'chromosome_size': 2000000.0, 'ld_options': {'circular': False, 'distance_bins': 19}, 'sfs_options': {'folded': False}, 'sel_options': {'window': 100}}, 'n_samples': 600, 'segment_length': 2000000.0, 'seed': 2}, 'network_name': 'CNN3', 'n_validation_scenarios': 7, 'n_epochs': 1, 'batch_size': 10, 'learning_rate': 0.001, 'evaluation_interval': 50, 'SNP_min': 400, 'run_name_format': 'run_{run_id}', 'use_cuda': True, 'cuda_device': None, 'loader_num_workers': 2, 'seed': 0, 'maf': 0, 'transform_allel_min_major': False, 'dataset_params': {'concat': True, 'ignore_missing': False}, 'start_from_last_checkpoint': False, 'net_params': {}}); run again with --debug to view the full traceback
Since the first thing we see in the terminal is the bottom of the output, nothing there helps us unless we scroll all the way up to the first line, an unexpected error occurred: 'learned_params' is a required property, which on its own would have been enough.
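A minimal sketch of what a shorter report could look like, assuming the validation error exposes message, validator and path attributes the way jsonschema.exceptions.ValidationError does. The FakeValidationError class and the summarize_validation_error function are hypothetical stand-ins so that the example runs without dnadna or jsonschema installed; this is not how dnadna currently reports errors.

```python
from collections import deque


class FakeValidationError(Exception):
    """Hypothetical stand-in mirroring a few jsonschema.ValidationError attributes."""

    def __init__(self, message, validator, path=()):
        super().__init__(message)
        self.message = message      # short human-readable message
        self.validator = validator  # name of the schema keyword that failed
        self.path = deque(path)     # location of the offending value in the config


def summarize_validation_error(error, filename):
    """Return a one-line summary instead of the full schema and instance dump."""
    # An empty path means the problem is at the top level of the config.
    location = ".".join(str(p) for p in error.path) or "<top level>"
    return (f"error in config file {filename} at {location}: "
            f"{error.message} (failed '{error.validator}' check)")


err = FakeValidationError(
    message="'learned_params' is a required property",
    validator="required",
)
print(summarize_validation_error(err, "bactsel/bactsel_training_config.yml"))
```

The point of the design is to build the message from the short message attribute rather than from str(error), which is what pulls in the whole schema and instance.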
Adding --debug gives:
command:
$ dnadna preprocess bactsel/bactsel_training_config.yml --debug
an unexpected error occurred: 'learned_params' is a required property
Failed validating 'required' in schema:
{'$schema': 'http://json-schema.org/draft-07/schema#',
'additionalProperties': True,
'properties': {'SNP_min': {'description': 'minimum number of SNPs '
'each sample should have',
'minimum': 1,
'type': 'integer'},
'batch_size': {'default': 1,
'description': 'sample batch size to '
'train on',
'minimum': 1,
'type': 'integer'},
'cuda_device': {'default': None,
'description': 'specifies the CUDA '
'device index to use',
'oneOf': [{'minimum': 0,
'type': 'integer'},
{'type': 'null'}]},
'dataset_params': {'additionalProperties': True,
'default': {'concat': True,
'ignore_missing': False},
'description': 'options specific to '
'the dataset used '
'for training; e.g. '
'to apply '
'augmentations to '
'the dataset',
'properties': {'concat': {'default': True,
'description': 'when '
'loading '
'SNPs '
'from '
'a '
'dataset, '
'concatenate '
'the '
'positions '
'array '
'to '
'the '
'SNP '
'matrix '
'instead '
'of '
'multiplying '
'by '
'it',
'type': 'boolean'},
'ignore_missing': {'default': False,
'description': 'ignore '
'missing '
'scenarios '
'or '
'replicates '
'when '
'loading '
'data '
'samples; '
'in '
'the '
'case '
'of '
'missing '
'samples '
'the '
'next '
'one '
'is '
'tried '
'until '
'one '
'is '
'found',
'type': 'boolean'},
'transforms': {'additionalProperties': True,
'description': 'dictionary '
'of '
'transforms '
'to '
'apply '
'to '
'the '
'dataset; '
'all '
'optional '
'transforms '
'are '
'disabled '
'by '
'default '
'unless '
'specified '
'here; '
'some '
'transforms '
'may '
'take '
'one '
'or '
'more '
'parameters '
'specified '
'in '
'the '
'value '
'associated '
'with '
'the '
'transform '
'name--if '
'the '
'transform '
'does '
'not '
'take '
'a '
'parameter '
'then '
'just '
'use '
'the '
'value '
'true '
'to '
'enable '
'it',
'properties': {'rotate': {'default': False,
'description': 'apply '
'a '
'random '
'rotation '
'along '
'the '
'SNP '
'axis '
'of '
'a '
'sequence',
'type': 'boolean'},
'subsample': {'description': 'take '
'random '
'subsamples '
'of '
'the '
'SNP '
'matrix; '
'the '
'argument '
'is '
'a '
'pair '
'(min, '
'max) '
'of '
'integers '
'giving '
'the '
'range '
'for '
'random '
'sizes '
'of '
'the '
'subsamples, '
'or '
'a '
'single '
'integer '
'giving '
'a '
'fixed '
'size '
'for '
'the '
'subsamples',
'oneOf': [{'items': {'minimum': 1,
'type': 'integer'},
'maxItems': 2,
'minItems': 2,
'type': 'array'},
{'minimum': 1,
'type': 'integer'}]}},
'type': 'object'}},
'type': 'object'},
'evaluation_interval': {'default': 1,
'description': 'interval in '
'the training '
'loop in which '
'to perform '
'model '
'validation',
'minimum': 1,
'type': 'integer'},
'learned_params': {'$ref': 'learned-params.yml',
'description': 'configuration of '
'parameters to learn '
'in training'},
'learning_rate': {'default': 0.001,
'description': 'the learning rate '
'for runs using this '
'configuration',
'exclusiveMinimum': 0,
'type': 'number'},
'loader_num_workers': {'default': 0,
'description': 'number of '
'subprocesses to '
'use for data '
'loading',
'minimum': 0,
'type': 'integer'},
'maf': {'default': 0,
'description': 'minor allele frequency; used '
'during pre-processing',
'minimum': 0,
'type': 'number'},
'model_root': {'default': '.',
'description': 'root directory for all '
'training runs of this '
'model / training '
'configuration',
'format': 'filename!',
'type': 'string'},
'n_epochs': {'default': 1,
'description': 'number of epochs over '
'which to repeat the '
'training process',
'minimum': 1,
'type': 'integer'},
'n_validation_scenarios': {'default': 1,
'description': 'number of '
'scenarios '
'out of the '
'set of '
'usable '
'scenarios '
'to use for '
'validation '
'as opposed '
'to training',
'minimum': 1,
'type': 'integer'},
'net_params': {'additionalProperties': True,
'default': {},
'description': 'options specific to the '
'neural net model being '
'trained; these are '
'passed as keyword '
"arguments to the net's "
'constructor (see '
'dnadna.net module)',
'type': 'object'},
'network_name': {'default': 'SPIDNA1',
'description': 'name of the neural '
'net model to train',
'minLength': 1,
'type': 'string'},
'run_name_format': {'default': 'run_{run_id}',
'description': 'format string for '
'the name given to '
'this run for a '
'sequence of runs '
'of the same model; '
'the outputs of '
'each run are '
'placed in '
'subdirectories of '
'<run_path>/<model_name> '
'with the name of '
'this run; the '
'format string can '
'use the template '
'variables '
'model_name and '
'run_id',
'minLength': 4,
'type': 'string'},
'scenario_params_path': {'description': 'path to the '
'scenario '
'parameters '
'file, either '
'absolute or '
'relative to '
'this file',
'format': 'filename',
'minLength': 1,
'type': 'string'},
'seed': {'description': 'seed for initializing the '
'PRNG prior to a training run '
'for reproducible results; if '
'unspecified the PRNG chooses '
'its default seeding method',
'type': 'integer'},
'simulation': {'$ref': 'simulation.yml',
'description': 'the simulation '
'configuration'},
'start_from_last_checkpoint': {'default': False,
'description': 'if '
'true, '
'resume '
'training '
'from a '
'snapshot '
'of the '
'net '
'that is '
'saved '
'each '
'epoch',
'type': 'boolean'},
'transform_allel_min_major': {'default': False,
'type': 'boolean'},
'use_cuda': {'default': True,
'description': 'use CUDA-capable GPU '
'where available',
'type': 'boolean'},
'weight_decay': {'description': 'the weight decay to '
'apply to the '
'training; if ommitted '
'or zero weight decay '
'is not applied',
'minimum': 0,
'type': 'number'}},
'required': ['simulation', 'learned_params'],
'type': 'object'}
On instance:
Config({'model_root': '/home/jean/Documents/ML_genetics/dnadna_run/bactsel', 'simulation': {'data_root': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2', 'n_scenarios': 27, 'n_replicates': 10, 'model_name': 'bactsel', 'scenario_params_path': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2/BacterialDemoSelection_paramok', 'data_source': {'format': 'dnadna', 'filename_format': 'scenario_{scenario:05}/BacterialDemoSelection_{scenario:05}_{replicate:03}.npz'}, 'summary_statistics': {'filename_format': 'sumstats/scenario_{scenario}/{model_name}_{scenario}_{type}.csv', 'chromosome_size': 2000000.0, 'ld_options': {'circular': False, 'distance_bins': 19}, 'sfs_options': {'folded': False}, 'sel_options': {'window': 100}}, 'n_samples': 600, 'segment_length': 2000000.0, 'seed': 2}, 'network_name': 'CNN3', 'n_validation_scenarios': 7, 'n_epochs': 1, 'batch_size': 10, 'learning_rate': 0.001, 'evaluation_interval': 50, 'SNP_min': 400, 'run_name_format': 'run_{run_id}', 'use_cuda': True, 'cuda_device': None, 'loader_num_workers': 2, 'seed': 0, 'maf': 0, 'transform_allel_min_major': False, 'dataset_params': {'concat': True, 'ignore_missing': False}, 'start_from_last_checkpoint': False, 'net_params': {}}); run again with --debug to view the full traceback
an unexpected error occurred: 'learned_params' is a required property
Failed validating 'required' in schema:
{'$schema': 'http://json-schema.org/draft-07/schema#',
'additionalProperties': True,
'properties': {'SNP_min': {'description': 'minimum number of SNPs '
'each sample should have',
'minimum': 1,
'type': 'integer'},
'batch_size': {'default': 1,
'description': 'sample batch size to '
'train on',
'minimum': 1,
'type': 'integer'},
'cuda_device': {'default': None,
'description': 'specifies the CUDA '
'device index to use',
'oneOf': [{'minimum': 0,
'type': 'integer'},
{'type': 'null'}]},
'dataset_params': {'additionalProperties': True,
'default': {'concat': True,
'ignore_missing': False},
'description': 'options specific to '
'the dataset used '
'for training; e.g. '
'to apply '
'augmentations to '
'the dataset',
'properties': {'concat': {'default': True,
'description': 'when '
'loading '
'SNPs '
'from '
'a '
'dataset, '
'concatenate '
'the '
'positions '
'array '
'to '
'the '
'SNP '
'matrix '
'instead '
'of '
'multiplying '
'by '
'it',
'type': 'boolean'},
'ignore_missing': {'default': False,
'description': 'ignore '
'missing '
'scenarios '
'or '
'replicates '
'when '
'loading '
'data '
'samples; '
'in '
'the '
'case '
'of '
'missing '
'samples '
'the '
'next '
'one '
'is '
'tried '
'until '
'one '
'is '
'found',
'type': 'boolean'},
'transforms': {'additionalProperties': True,
'description': 'dictionary '
'of '
'transforms '
'to '
'apply '
'to '
'the '
'dataset; '
'all '
'optional '
'transforms '
'are '
'disabled '
'by '
'default '
'unless '
'specified '
'here; '
'some '
'transforms '
'may '
'take '
'one '
'or '
'more '
'parameters '
'specified '
'in '
'the '
'value '
'associated '
'with '
'the '
'transform '
'name--if '
'the '
'transform '
'does '
'not '
'take '
'a '
'parameter '
'then '
'just '
'use '
'the '
'value '
'true '
'to '
'enable '
'it',
'properties': {'rotate': {'default': False,
'description': 'apply '
'a '
'random '
'rotation '
'along '
'the '
'SNP '
'axis '
'of '
'a '
'sequence',
'type': 'boolean'},
'subsample': {'description': 'take '
'random '
'subsamples '
'of '
'the '
'SNP '
'matrix; '
'the '
'argument '
'is '
'a '
'pair '
'(min, '
'max) '
'of '
'integers '
'giving '
'the '
'range '
'for '
'random '
'sizes '
'of '
'the '
'subsamples, '
'or '
'a '
'single '
'integer '
'giving '
'a '
'fixed '
'size '
'for '
'the '
'subsamples',
'oneOf': [{'items': {'minimum': 1,
'type': 'integer'},
'maxItems': 2,
'minItems': 2,
'type': 'array'},
{'minimum': 1,
'type': 'integer'}]}},
'type': 'object'}},
'type': 'object'},
'evaluation_interval': {'default': 1,
'description': 'interval in '
'the training '
'loop in which '
'to perform '
'model '
'validation',
'minimum': 1,
'type': 'integer'},
'learned_params': {'$ref': 'learned-params.yml',
'description': 'configuration of '
'parameters to learn '
'in training'},
'learning_rate': {'default': 0.001,
'description': 'the learning rate '
'for runs using this '
'configuration',
'exclusiveMinimum': 0,
'type': 'number'},
'loader_num_workers': {'default': 0,
'description': 'number of '
'subprocesses to '
'use for data '
'loading',
'minimum': 0,
'type': 'integer'},
'maf': {'default': 0,
'description': 'minor allele frequency; used '
'during pre-processing',
'minimum': 0,
'type': 'number'},
'model_root': {'default': '.',
'description': 'root directory for all '
'training runs of this '
'model / training '
'configuration',
'format': 'filename!',
'type': 'string'},
'n_epochs': {'default': 1,
'description': 'number of epochs over '
'which to repeat the '
'training process',
'minimum': 1,
'type': 'integer'},
'n_validation_scenarios': {'default': 1,
'description': 'number of '
'scenarios '
'out of the '
'set of '
'usable '
'scenarios '
'to use for '
'validation '
'as opposed '
'to training',
'minimum': 1,
'type': 'integer'},
'net_params': {'additionalProperties': True,
'default': {},
'description': 'options specific to the '
'neural net model being '
'trained; these are '
'passed as keyword '
"arguments to the net's "
'constructor (see '
'dnadna.net module)',
'type': 'object'},
'network_name': {'default': 'SPIDNA1',
'description': 'name of the neural '
'net model to train',
'minLength': 1,
'type': 'string'},
'run_name_format': {'default': 'run_{run_id}',
'description': 'format string for '
'the name given to '
'this run for a '
'sequence of runs '
'of the same model; '
'the outputs of '
'each run are '
'placed in '
'subdirectories of '
'<run_path>/<model_name> '
'with the name of '
'this run; the '
'format string can '
'use the template '
'variables '
'model_name and '
'run_id',
'minLength': 4,
'type': 'string'},
'scenario_params_path': {'description': 'path to the '
'scenario '
'parameters '
'file, either '
'absolute or '
'relative to '
'this file',
'format': 'filename',
'minLength': 1,
'type': 'string'},
'seed': {'description': 'seed for initializing the '
'PRNG prior to a training run '
'for reproducible results; if '
'unspecified the PRNG chooses '
'its default seeding method',
'type': 'integer'},
'simulation': {'$ref': 'simulation.yml',
'description': 'the simulation '
'configuration'},
'start_from_last_checkpoint': {'default': False,
'description': 'if '
'true, '
'resume '
'training '
'from a '
'snapshot '
'of the '
'net '
'that is '
'saved '
'each '
'epoch',
'type': 'boolean'},
'transform_allel_min_major': {'default': False,
'type': 'boolean'},
'use_cuda': {'default': True,
'description': 'use CUDA-capable GPU '
'where available',
'type': 'boolean'},
'weight_decay': {'description': 'the weight decay to '
'apply to the '
'training; if ommitted '
'or zero weight decay '
'is not applied',
'minimum': 0,
'type': 'number'}},
'required': ['simulation', 'learned_params'],
'type': 'object'}
On instance:
Config({'model_root': '/home/jean/Documents/ML_genetics/dnadna_run/bactsel', 'simulation': {'data_root': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2', 'n_scenarios': 27, 'n_replicates': 10, 'model_name': 'bactsel', 'scenario_params_path': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2/BacterialDemoSelection_paramok', 'data_source': {'format': 'dnadna', 'filename_format': 'scenario_{scenario:05}/BacterialDemoSelection_{scenario:05}_{replicate:03}.npz'}, 'summary_statistics': {'filename_format': 'sumstats/scenario_{scenario}/{model_name}_{scenario}_{type}.csv', 'chromosome_size': 2000000.0, 'ld_options': {'circular': False, 'distance_bins': 19}, 'sfs_options': {'folded': False}, 'sel_options': {'window': 100}}, 'n_samples': 600, 'segment_length': 2000000.0, 'seed': 2}, 'network_name': 'CNN3', 'n_validation_scenarios': 7, 'n_epochs': 1, 'batch_size': 10, 'learning_rate': 0.001, 'evaluation_interval': 50, 'SNP_min': 400, 'run_name_format': 'run_{run_id}', 'use_cuda': True, 'cuda_device': None, 'loader_num_workers': 2, 'seed': 0, 'maf': 0, 'transform_allel_min_major': False, 'dataset_params': {'concat': True, 'ignore_missing': False}, 'start_from_last_checkpoint': False, 'net_params': {}}); run again with --debug to view the full traceback
Traceback (most recent call last):
File "/home/jean/anaconda3/envs/dnadna/bin/dnadna", line 11, in <module>
load_entry_point('dnadna', 'console_scripts', 'dnadna')()
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
raise exc
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 790, in main
ret2 = cls.run_subcommand(args)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 759, in run_subcommand
return command_cls.main(command[1:], namespace=args)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
raise exc
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 782, in main
ret = cls.run(args)
File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 653, in run
args.config)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1357, in from_config_file
return cls(config=config, validate=validate)
File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 95, in __init__
super().__init__(config=config, validate=validate)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1345, in __init__
config.validate(schema=self.config_schema)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1010, in validate
validator.validate(self)
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 348, in validate
raise error
jsonschema.exceptions.ValidationError: 'learned_params' is a required property
Failed validating 'required' in schema:
{'$schema': 'http://json-schema.org/draft-07/schema#',
'additionalProperties': True,
'properties': {'SNP_min': {'description': 'minimum number of SNPs '
'each sample should have',
'minimum': 1,
'type': 'integer'},
'batch_size': {'default': 1,
'description': 'sample batch size to '
'train on',
'minimum': 1,
'type': 'integer'},
'cuda_device': {'default': None,
'description': 'specifies the CUDA '
'device index to use',
'oneOf': [{'minimum': 0,
'type': 'integer'},
{'type': 'null'}]},
'dataset_params': {'additionalProperties': True,
'default': {'concat': True,
'ignore_missing': False},
'description': 'options specific to '
'the dataset used '
'for training; e.g. '
'to apply '
'augmentations to '
'the dataset',
'properties': {'concat': {'default': True,
'description': 'when '
'loading '
'SNPs '
'from '
'a '
'dataset, '
'concatenate '
'the '
'positions '
'array '
'to '
'the '
'SNP '
'matrix '
'instead '
'of '
'multiplying '
'by '
'it',
'type': 'boolean'},
'ignore_missing': {'default': False,
'description': 'ignore '
'missing '
'scenarios '
'or '
'replicates '
'when '
'loading '
'data '
'samples; '
'in '
'the '
'case '
'of '
'missing '
'samples '
'the '
'next '
'one '
'is '
'tried '
'until '
'one '
'is '
'found',
'type': 'boolean'},
'transforms': {'additionalProperties': True,
'description': 'dictionary '
'of '
'transforms '
'to '
'apply '
'to '
'the '
'dataset; '
'all '
'optional '
'transforms '
'are '
'disabled '
'by '
'default '
'unless '
'specified '
'here; '
'some '
'transforms '
'may '
'take '
'one '
'or '
'more '
'parameters '
'specified '
'in '
'the '
'value '
'associated '
'with '
'the '
'transform '
'name--if '
'the '
'transform '
'does '
'not '
'take '
'a '
'parameter '
'then '
'just '
'use '
'the '
'value '
'true '
'to '
'enable '
'it',
'properties': {'rotate': {'default': False,
'description': 'apply '
'a '
'random '
'rotation '
'along '
'the '
'SNP '
'axis '
'of '
'a '
'sequence',
'type': 'boolean'},
'subsample': {'description': 'take '
'random '
'subsamples '
'of '
'the '
'SNP '
'matrix; '
'the '
'argument '
'is '
'a '
'pair '
'(min, '
'max) '
'of '
'integers '
'giving '
'the '
'range '
'for '
'random '
'sizes '
'of '
'the '
'subsamples, '
'or '
'a '
'single '
'integer '
'giving '
'a '
'fixed '
'size '
'for '
'the '
'subsamples',
'oneOf': [{'items': {'minimum': 1,
'type': 'integer'},
'maxItems': 2,
'minItems': 2,
'type': 'array'},
{'minimum': 1,
'type': 'integer'}]}},
'type': 'object'}},
'type': 'object'},
'evaluation_interval': {'default': 1,
'description': 'interval in '
'the training '
'loop in which '
'to perform '
'model '
'validation',
'minimum': 1,
'type': 'integer'},
'learned_params': {'$ref': 'learned-params.yml',
'description': 'configuration of '
'parameters to learn '
'in training'},
'learning_rate': {'default': 0.001,
'description': 'the learning rate '
'for runs using this '
'configuration',
'exclusiveMinimum': 0,
'type': 'number'},
'loader_num_workers': {'default': 0,
'description': 'number of '
'subprocesses to '
'use for data '
'loading',
'minimum': 0,
'type': 'integer'},
'maf': {'default': 0,
'description': 'minor allele frequency; used '
'during pre-processing',
'minimum': 0,
'type': 'number'},
'model_root': {'default': '.',
'description': 'root directory for all '
'training runs of this '
'model / training '
'configuration',
'format': 'filename!',
'type': 'string'},
'n_epochs': {'default': 1,
'description': 'number of epochs over '
'which to repeat the '
'training process',
'minimum': 1,
'type': 'integer'},
'n_validation_scenarios': {'default': 1,
'description': 'number of '
'scenarios '
'out of the '
'set of '
'usable '
'scenarios '
'to use for '
'validation '
'as opposed '
'to training',
'minimum': 1,
'type': 'integer'},
'net_params': {'additionalProperties': True,
'default': {},
'description': 'options specific to the '
'neural net model being '
'trained; these are '
'passed as keyword '
"arguments to the net's "
'constructor (see '
'dnadna.net module)',
'type': 'object'},
'network_name': {'default': 'SPIDNA1',
'description': 'name of the neural '
'net model to train',
'minLength': 1,
'type': 'string'},
'run_name_format': {'default': 'run_{run_id}',
'description': 'format string for '
'the name given to '
'this run for a '
'sequence of runs '
'of the same model; '
'the outputs of '
'each run are '
'placed in '
'subdirectories of '
'<run_path>/<model_name> '
'with the name of '
'this run; the '
'format string can '
'use the template '
'variables '
'model_name and '
'run_id',
'minLength': 4,
'type': 'string'},
'scenario_params_path': {'description': 'path to the '
'scenario '
'parameters '
'file, either '
'absolute or '
'relative to '
'this file',
'format': 'filename',
'minLength': 1,
'type': 'string'},
'seed': {'description': 'seed for initializing the '
'PRNG prior to a training run '
'for reproducible results; if '
'unspecified the PRNG chooses '
'its default seeding method',
'type': 'integer'},
'simulation': {'$ref': 'simulation.yml',
'description': 'the simulation '
'configuration'},
'start_from_last_checkpoint': {'default': False,
'description': 'if '
'true, '
'resume '
'training '
'from a '
'snapshot '
'of the '
'net '
'that is '
'saved '
'each '
'epoch',
'type': 'boolean'},
'transform_allel_min_major': {'default': False,
'type': 'boolean'},
'use_cuda': {'default': True,
'description': 'use CUDA-capable GPU '
'where available',
'type': 'boolean'},
'weight_decay': {'description': 'the weight decay to '
'apply to the '
'training; if ommitted '
'or zero weight decay '
'is not applied',
'minimum': 0,
'type': 'number'}},
'required': ['simulation', 'learned_params'],
'type': 'object'}
On instance:
Config({'model_root': '/home/jean/Documents/ML_genetics/dnadna_run/bactsel', 'simulation': {'data_root': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2', 'n_scenarios': 27, 'n_replicates': 10, 'model_name': 'bactsel', 'scenario_params_path': '/home/jean/Documents/ML_genetics/dnadna_run/TestDATA2/BacterialDemoSelection_paramok', 'data_source': {'format': 'dnadna', 'filename_format': 'scenario_{scenario:05}/BacterialDemoSelection_{scenario:05}_{replicate:03}.npz'}, 'summary_statistics': {'filename_format': 'sumstats/scenario_{scenario}/{model_name}_{scenario}_{type}.csv', 'chromosome_size': 2000000.0, 'ld_options': {'circular': False, 'distance_bins': 19}, 'sfs_options': {'folded': False}, 'sel_options': {'window': 100}}, 'n_samples': 600, 'segment_length': 2000000.0, 'seed': 2}, 'network_name': 'CNN3', 'n_validation_scenarios': 7, 'n_epochs': 1, 'batch_size': 10, 'learning_rate': 0.001, 'evaluation_interval': 50, 'SNP_min': 400, 'run_name_format': 'run_{run_id}', 'use_cuda': True, 'cuda_device': None, 'loader_num_workers': 2, 'seed': 0, 'maf': 0, 'transform_allel_min_major': False, 'dataset_params': {'concat': True, 'ignore_missing': False}, 'start_from_last_checkpoint': False, 'net_params': {}})
Here --debug doesn't help, and it is odd that the output still contains the earlier error message, including its request to run again with --debug.
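A minimal sketch of how a top-level handler could choose what to print, assuming it knows whether --debug was given. The format_cli_error function is hypothetical and only illustrates the behaviour asked for here: suggest --debug only when it was not used, and print the summary once.

```python
import traceback


def format_cli_error(exc, debug=False):
    """Build the text shown to the user for an unexpected error."""
    summary = f"an unexpected error occurred: {exc}"
    if not debug:
        # Only suggest --debug when it was not already given.
        return summary + "; run again with --debug to view the full traceback"
    # With --debug, show the traceback first and the summary once, last,
    # so the message is the final thing visible in the terminal.
    tb = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return tb + summary


try:
    raise ValueError("'learned_params' is a required property")
except ValueError as exc:
    print(format_cli_error(exc, debug=False))
    print(format_cli_error(exc, debug=True))
```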
Wrong format:
When using dnadna init -t default, the generated file contains the line learned_params: {}, so one might think that the parameter's name should be added inside the braces, like this: learned_params: {selection}
This gives:
$ dnadna preprocess bactsel/bactsel_training_config.yml
an unexpected error occurred: argument of type 'NoneType' is not iterable; run again with --debug to view the full traceback
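In YAML, {selection} is a flow mapping whose key selection has a null value, so the loaded config most likely contains learned_params == {'selection': None}. That this None is what triggers the TypeError is an assumption based on the message. Under that assumption, here is a minimal sketch of a check that reports such entries in readable form. The find_null_entries helper is hypothetical, not a dnadna function, and it is applied only to the learned_params section because other keys, such as cuda_device, legitimately accept null according to the schema shown above.

```python
def find_null_entries(config, prefix=""):
    """Return dotted paths of all keys whose value is None in a nested dict."""
    paths = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if value is None:
            paths.append(path)
        elif isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            paths.extend(find_null_entries(value, path))
    return paths


# What a YAML loader produces for `learned_params: {selection}`.
config = {"learned_params": {"selection": None}, "n_epochs": 1}
for path in find_null_entries(config["learned_params"], "learned_params"):
    print(f"config error: '{path}' has no value (null)")
```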
and with --debug:
command:
dnadna preprocess bactsel/bactsel_training_config.yml --debug
an unexpected error occurred: argument of type 'NoneType' is not iterable; run again with --debug to view the full traceback
an unexpected error occurred: argument of type 'NoneType' is not iterable; run again with --debug to view the full traceback
Traceback (most recent call last):
File "/home/jean/anaconda3/envs/dnadna/bin/dnadna", line 11, in <module>
load_entry_point('dnadna', 'console_scripts', 'dnadna')()
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
raise exc
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 790, in main
ret2 = cls.run_subcommand(args)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 759, in run_subcommand
return command_cls.main(command[1:], namespace=args)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
raise exc
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 782, in main
ret = cls.run(args)
File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 653, in run
args.config)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1357, in from_config_file
return cls(config=config, validate=validate)
File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 95, in __init__
super().__init__(config=config, validate=validate)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1345, in __init__
config.validate(schema=self.config_schema)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1010, in validate
validator.validate(self)
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 347, in validate
for error in self.iter_errors(*args, **kwargs):
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
for error in errors:
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1159, in validate_config_properties
schema))
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/_validators.py", line 286, in properties
schema_path=property,
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 339, in descend
for error in self.iter_errors(instance, schema):
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
for error in errors:
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/_validators.py", line 263, in ref
for error in validator.descend(instance, resolved):
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 339, in descend
for error in self.iter_errors(instance, schema):
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
for error in errors:
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/_validators.py", line 49, in additionalProperties
for error in validator.descend(instance[extra], aP, path=extra):
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 339, in descend
for error in self.iter_errors(instance, schema):
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
for error in errors:
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/_validators.py", line 337, in oneOf
errs = list(validator.descend(instance, subschema, schema_path=index))
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 339, in descend
for error in self.iter_errors(instance, schema):
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 323, in iter_errors
for error in errors:
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1171, in validate_config_properties
if 'default' in subschema and prop not in instance:
TypeError: argument of type 'NoneType' is not iterable
This doesn't help either and, as above, the previous error message is repeated. I don't know how this error should be handled, but a first step could be to modify the default template so that it shows what an entry of learned_params should look like.
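For context on the TypeError: in YAML, `{selection}` is most likely parsed as a mapping whose key `selection` has a null value, so the validator receives `None` where it expects a mapping, and the `prop not in instance` test fails. Below is a minimal sketch of a guard. The helper `missing_defaults` is hypothetical: it only stands in for the check shown at the bottom of the traceback and is not the actual dnadna code.

```python
from collections.abc import Mapping


def missing_defaults(instance, properties):
    """Return the names of properties that have a default but are absent.

    Hypothetical stand-in for the `'default' in subschema and prop not in
    instance` check; non-mapping instances are skipped instead of raising.
    """
    if not isinstance(instance, Mapping):
        # e.g. None, the value `selection` gets in `learned_params: {selection}`;
        # a later 'type' check can then report the real problem
        return []
    return [prop for prop, subschema in properties.items()
            if 'default' in subschema and prop not in instance]


properties = {'loss_weight': {'default': 1}, 'type': {}}

# The unguarded membership test fails in the same way as the traceback above
try:
    'loss_weight' not in None
except TypeError as exc:
    unguarded_error = str(exc)

guarded = missing_defaults(None, properties)
filled = missing_defaults({'type': 'regression'}, properties)
```

With such a guard the validation could continue and fail with a normal schema error (for example "None is not of type 'object'") rather than an internal TypeError.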
Missing argument
Say now that we have:
learned_params:
    selection:
        type: classification
        loss_func: Cross Entropy
command:
dnadna_run dnadna preprocess bactsel/bactsel_training_config.yml
an unexpected error occurred: Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}) is not valid under any of the given schemas
Failed validating 'oneOf' in schema['properties']['learned_params']['additionalProperties']:
{'description': 'details about a single parameter to learn in a '
'training run',
'oneOf': [{'additionalProperties': False,
'properties': {'log_transform': {'default': False,
'description': 'whether '
'or not a '
'log '
'transform '
'should be '
'applied '
'to this '
"parameter's "
'known '
'values '
'during '
'pre-processing; '
'training '
'is then '
'performed '
'with the '
'log '
'values '
'(regression '
'parameters '
'only)',
'type': 'boolean'},
'loss_func': {'$ref': '#/definitions/loss_func',
'default': 'MSE'},
'loss_weight': {'$ref': '#/definitions/loss_weight',
'default': 1},
'tied_to_position': {'default': False,
'description': 'values '
'of '
'this '
'parameter '
'are '
'SNP '
'positions, '
'so any '
'transformations '
'or '
'normalizations '
'of the '
'position '
'array '
'must '
'also '
'be '
'applied '
'to '
'this '
'parameter '
'during '
'training',
'type': 'boolean'},
'type': {'const': 'regression'}}},
{'additionalProperties': False,
'properties': {'classes': {'description': 'classification '
'parameters '
'classes, either '
'an integer '
'giving the '
'number of '
'classes in the '
'parameter, or '
'an array to '
'give explicit '
'names to the '
'classes (one '
'item for each '
'class); class '
'names can '
'themselves be '
'either strings, '
'or integers '
'(which are '
'converted '
'automatically '
'to strings, as '
'they are just '
'labels for the '
'classes)',
'items': {'type': ['integer',
'string']},
'minItems': 1,
'minimum': 1,
'type': ['integer', 'array']},
'loss_func': {'$ref': '#/definitions/loss_func',
'default': 'Cross Entropy'},
'loss_weight': {'$ref': '#/definitions/loss_weight',
'default': 1},
'n_classes': {'description': 'after '
'pre-processing, '
'this property '
'contains the '
'number of '
'classes in a '
'classification '
'parameter; if '
'the "classes" '
'property is '
'an integer '
'this is '
'identical; '
'otherwise it '
'is the length '
'of the '
'"classes" '
'array; '
'normally this '
'property '
'should not be '
'manually '
'specified',
'minimum': 1,
'type': 'integer'},
'type': {'const': 'classification'}},
'required': ['classes']}],
'properties': {'type': {'description': 'parameter type; either '
'"regression" or '
'"classification". '
'Classification parameters '
'require the additional '
'"classes" property',
'enum': ['regression', 'classification']}},
'required': ['type'],
'type': 'object'}
On instance['learned_params']['selection']:
Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}); run again with --debug to view the full traceback
Here the short message doesn't tell us that the classes property is missing, although it is required for a classification parameter. We can work it out by reading the json-schema, but it is not obvious at first sight, especially since the text wrapping of the schema dump makes it hard to read.
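A hint about the missing property could in principle be derived from the schema itself. The sketch below is only an illustration under stated assumptions: `explain_one_of` is a hypothetical helper, it works on plain dicts, and it assumes each `oneOf` branch is identified by a `const` on its `type` property, as in the schema printed above. (With the jsonschema library, the sub-errors of a failed `oneOf` are also available on the error's `context` attribute, which may be a better starting point.)

```python
def explain_one_of(instance, schema):
    """Return readable hints for an instance that failed a 'oneOf' check.

    Only the branches whose 'type' const matches the instance are inspected;
    for those, missing required properties and unknown properties are listed.
    """
    hints = []
    for branch in schema.get('oneOf', []):
        props = branch.get('properties', {})
        const = props.get('type', {}).get('const')
        if const is not None and const != instance.get('type'):
            continue  # this branch describes another parameter type
        for name in branch.get('required', []):
            if name not in instance:
                hints.append(f"{name!r} is required when type is {const!r}")
        if branch.get('additionalProperties') is False:
            for name in instance:
                if name not in props:
                    hints.append(
                        f"unknown property {name!r} for type {const!r}")
    return hints


# Trimmed-down version of the schema printed above (descriptions removed)
schema = {
    'oneOf': [
        {'additionalProperties': False,
         'properties': {'type': {'const': 'regression'},
                        'loss_func': {}, 'loss_weight': {},
                        'log_transform': {}, 'tied_to_position': {}}},
        {'additionalProperties': False,
         'properties': {'type': {'const': 'classification'},
                        'classes': {}, 'n_classes': {},
                        'loss_func': {}, 'loss_weight': {}},
         'required': ['classes']},
    ],
}
instance = {'type': 'classification', 'loss_func': 'Cross Entropy',
            'loss_weight': 1}

for hint in explain_one_of(instance, schema):
    print(hint)  # 'classes' is required when type is 'classification'
```

A one-line hint like this, printed before (or instead of) the schema dump, would already point the user to the fix.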
With --debug it doesn't help either: the schema and the error message are simply printed several times.
command:
dnadna preprocess bactsel/bactsel_training_config.yml --debug
an unexpected error occurred: Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}) is not valid under any of the given schemas
Failed validating 'oneOf' in schema['properties']['learned_params']['additionalProperties']:
{'description': 'details about a single parameter to learn in a '
'training run',
'oneOf': [{'additionalProperties': False,
'properties': {'log_transform': {'default': False,
'description': 'whether '
'or not a '
'log '
'transform '
'should be '
'applied '
'to this '
"parameter's "
'known '
'values '
'during '
'pre-processing; '
'training '
'is then '
'performed '
'with the '
'log '
'values '
'(regression '
'parameters '
'only)',
'type': 'boolean'},
'loss_func': {'$ref': '#/definitions/loss_func',
'default': 'MSE'},
'loss_weight': {'$ref': '#/definitions/loss_weight',
'default': 1},
'tied_to_position': {'default': False,
'description': 'values '
'of '
'this '
'parameter '
'are '
'SNP '
'positions, '
'so any '
'transformations '
'or '
'normalizations '
'of the '
'position '
'array '
'must '
'also '
'be '
'applied '
'to '
'this '
'parameter '
'during '
'training',
'type': 'boolean'},
'type': {'const': 'regression'}}},
{'additionalProperties': False,
'properties': {'classes': {'description': 'classification '
'parameters '
'classes, either '
'an integer '
'giving the '
'number of '
'classes in the '
'parameter, or '
'an array to '
'give explicit '
'names to the '
'classes (one '
'item for each '
'class); class '
'names can '
'themselves be '
'either strings, '
'or integers '
'(which are '
'converted '
'automatically '
'to strings, as '
'they are just '
'labels for the '
'classes)',
'items': {'type': ['integer',
'string']},
'minItems': 1,
'minimum': 1,
'type': ['integer', 'array']},
'loss_func': {'$ref': '#/definitions/loss_func',
'default': 'Cross Entropy'},
'loss_weight': {'$ref': '#/definitions/loss_weight',
'default': 1},
'n_classes': {'description': 'after '
'pre-processing, '
'this property '
'contains the '
'number of '
'classes in a '
'classification '
'parameter; if '
'the "classes" '
'property is '
'an integer '
'this is '
'identical; '
'otherwise it '
'is the length '
'of the '
'"classes" '
'array; '
'normally this '
'property '
'should not be '
'manually '
'specified',
'minimum': 1,
'type': 'integer'},
'type': {'const': 'classification'}},
'required': ['classes']}],
'properties': {'type': {'description': 'parameter type; either '
'"regression" or '
'"classification". '
'Classification parameters '
'require the additional '
'"classes" property',
'enum': ['regression', 'classification']}},
'required': ['type'],
'type': 'object'}
On instance['learned_params']['selection']:
Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}); run again with --debug to view the full traceback
an unexpected error occurred: Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}) is not valid under any of the given schemas
Failed validating 'oneOf' in schema['properties']['learned_params']['additionalProperties']:
{'description': 'details about a single parameter to learn in a '
'training run',
'oneOf': [{'additionalProperties': False,
'properties': {'log_transform': {'default': False,
'description': 'whether '
'or not a '
'log '
'transform '
'should be '
'applied '
'to this '
"parameter's "
'known '
'values '
'during '
'pre-processing; '
'training '
'is then '
'performed '
'with the '
'log '
'values '
'(regression '
'parameters '
'only)',
'type': 'boolean'},
'loss_func': {'$ref': '#/definitions/loss_func',
'default': 'MSE'},
'loss_weight': {'$ref': '#/definitions/loss_weight',
'default': 1},
'tied_to_position': {'default': False,
'description': 'values '
'of '
'this '
'parameter '
'are '
'SNP '
'positions, '
'so any '
'transformations '
'or '
'normalizations '
'of the '
'position '
'array '
'must '
'also '
'be '
'applied '
'to '
'this '
'parameter '
'during '
'training',
'type': 'boolean'},
'type': {'const': 'regression'}}},
{'additionalProperties': False,
'properties': {'classes': {'description': 'classification '
'parameters '
'classes, either '
'an integer '
'giving the '
'number of '
'classes in the '
'parameter, or '
'an array to '
'give explicit '
'names to the '
'classes (one '
'item for each '
'class); class '
'names can '
'themselves be '
'either strings, '
'or integers '
'(which are '
'converted '
'automatically '
'to strings, as '
'they are just '
'labels for the '
'classes)',
'items': {'type': ['integer',
'string']},
'minItems': 1,
'minimum': 1,
'type': ['integer', 'array']},
'loss_func': {'$ref': '#/definitions/loss_func',
'default': 'Cross Entropy'},
'loss_weight': {'$ref': '#/definitions/loss_weight',
'default': 1},
'n_classes': {'description': 'after '
'pre-processing, '
'this property '
'contains the '
'number of '
'classes in a '
'classification '
'parameter; if '
'the "classes" '
'property is '
'an integer '
'this is '
'identical; '
'otherwise it '
'is the length '
'of the '
'"classes" '
'array; '
'normally this '
'property '
'should not be '
'manually '
'specified',
'minimum': 1,
'type': 'integer'},
'type': {'const': 'classification'}},
'required': ['classes']}],
'properties': {'type': {'description': 'parameter type; either '
'"regression" or '
'"classification". '
'Classification parameters '
'require the additional '
'"classes" property',
'enum': ['regression', 'classification']}},
'required': ['type'],
'type': 'object'}
On instance['learned_params']['selection']:
Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}); run again with --debug to view the full traceback
Traceback (most recent call last):
File "/home/jean/anaconda3/envs/dnadna/bin/dnadna", line 11, in <module>
load_entry_point('dnadna', 'console_scripts', 'dnadna')()
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
raise exc
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 790, in main
ret2 = cls.run_subcommand(args)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 759, in run_subcommand
return command_cls.main(command[1:], namespace=args)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 798, in main
raise exc
File "/home/jean/Documents/Git/dnadna/dnadna/utils/__init__.py", line 782, in main
ret = cls.run(args)
File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 653, in run
args.config)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1357, in from_config_file
return cls(config=config, validate=validate)
File "/home/jean/Documents/Git/dnadna/dnadna/data_preprocessing.py", line 95, in __init__
super().__init__(config=config, validate=validate)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1345, in __init__
config.validate(schema=self.config_schema)
File "/home/jean/Documents/Git/dnadna/dnadna/utils/config.py", line 1010, in validate
validator.validate(self)
File "/home/jean/anaconda3/envs/dnadna/lib/python3.7/site-packages/jsonschema/validators.py", line 348, in validate
raise error
jsonschema.exceptions.ValidationError: Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1}) is not valid under any of the given schemas
Failed validating 'oneOf' in schema['properties']['learned_params']['additionalProperties']:
{'description': 'details about a single parameter to learn in a '
'training run',
'oneOf': [{'additionalProperties': False,
'properties': {'log_transform': {'default': False,
'description': 'whether '
'or not a '
'log '
'transform '
'should be '
'applied '
'to this '
"parameter's "
'known '
'values '
'during '
'pre-processing; '
'training '
'is then '
'performed '
'with the '
'log '
'values '
'(regression '
'parameters '
'only)',
'type': 'boolean'},
'loss_func': {'$ref': '#/definitions/loss_func',
'default': 'MSE'},
'loss_weight': {'$ref': '#/definitions/loss_weight',
'default': 1},
'tied_to_position': {'default': False,
'description': 'values '
'of '
'this '
'parameter '
'are '
'SNP '
'positions, '
'so any '
'transformations '
'or '
'normalizations '
'of the '
'position '
'array '
'must '
'also '
'be '
'applied '
'to '
'this '
'parameter '
'during '
'training',
'type': 'boolean'},
'type': {'const': 'regression'}}},
{'additionalProperties': False,
'properties': {'classes': {'description': 'classification '
'parameters '
'classes, either '
'an integer '
'giving the '
'number of '
'classes in the '
'parameter, or '
'an array to '
'give explicit '
'names to the '
'classes (one '
'item for each '
'class); class '
'names can '
'themselves be '
'either strings, '
'or integers '
'(which are '
'converted '
'automatically '
'to strings, as '
'they are just '
'labels for the '
'classes)',
'items': {'type': ['integer',
'string']},
'minItems': 1,
'minimum': 1,
'type': ['integer', 'array']},
'loss_func': {'$ref': '#/definitions/loss_func',
'default': 'Cross Entropy'},
'loss_weight': {'$ref': '#/definitions/loss_weight',
'default': 1},
'n_classes': {'description': 'after '
'pre-processing, '
'this property '
'contains the '
'number of '
'classes in a '
'classification '
'parameter; if '
'the "classes" '
'property is '
'an integer '
'this is '
'identical; '
'otherwise it '
'is the length '
'of the '
'"classes" '
'array; '
'normally this '
'property '
'should not be '
'manually '
'specified',
'minimum': 1,
'type': 'integer'},
'type': {'const': 'classification'}},
'required': ['classes']}],
'properties': {'type': {'description': 'parameter type; either '
'"regression" or '
'"classification". '
'Classification parameters '
'require the additional '
'"classes" property',
'enum': ['regression', 'classification']}},
'required': ['type'],
'type': 'object'}
On instance['learned_params']['selection']:
Config({'type': 'classification', 'loss_func': 'Cross Entropy', 'loss_weight': 1})
Summary:
- Better or clearer error messages when possible. If not, we could invite the user to (re)read the json-schema carefully.
- Maybe output a link to the json-schema instead of showing it entirely here (possibly print it in the terminal only with --debug).
- Fix the repetition of the error message when using --debug.
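As an illustration of the last two points, here is a minimal sketch of how the top-level error handler could choose what to print. `format_error` is a hypothetical function, not the current dnadna implementation, and the wording of the messages is only an assumption.

```python
import traceback


def format_error(exc, debug=False):
    """Build the text to show for a failed command.

    Without debug: one short line plus the hint to use --debug.
    With debug: the full traceback only, which already ends with the message,
    so the short message is not printed a second time.
    """
    if debug:
        return ''.join(traceback.format_exception(
            type(exc), exc, exc.__traceback__))
    text = str(exc)
    # keep only the first line, so a multi-line schema dump is not shown
    summary = text.splitlines()[0] if text else type(exc).__name__
    return (f'error: {summary}; '
            'run again with --debug to view the full traceback')


try:
    raise ValueError("'learned_params' is a required property\n"
                     "Failed validating 'required' in schema: (schema dump)")
except ValueError as exc:
    short = format_error(exc)
    full = format_error(exc, debug=True)

print(short)
```

The design choice here is that the two modes are mutually exclusive: the short form never contains the schema, and the debug form never contains the "run again with --debug" hint.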