Work with another dataset: KeyError: 'n_replicates'

I am going to file a series of issues to keep a record of common log errors and their solutions for future documentation.

I am trying to use dnadna with another dataset. Here are the steps I followed:
1. Ran: dnadna simulation init cattle
2. Modified cattle/cattle_simulation_config.yml (see attached file).
3. Ran: dnadna init --simulation-config=cattle/cattle_simulation_config.yml
4. Modified cattle/cattle_training_config.yml (see attached file).
5. Ran: dnadna --debug preprocess cattle/cattle_training_config.yml

The last command fails with KeyError: 'n_replicates'. Full output:

$ dnadna --debug preprocess cattle/cattle_training_config.yml
18/05/2020 17:29:04; INFO;  Removing scenarios with:
18/05/2020 17:29:04; INFO;   - Missing replicates
18/05/2020 17:29:04; INFO;  ...
18/05/2020 17:29:04; INFO;  Using 1 CPU for checking scenarios
  0%|          | 0/2000 [00:00<?, ?scenario/s]
an unexpected error occurred: 'n_replicates'; run again with --debug to view the full traceback
an unexpected error occurred: 'n_replicates'; run again with --debug to view the full traceback
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4736, in get_value
    return libindex.get_value_box(s, key)
  File "pandas/_libs/index.pyx", line 51, in pandas._libs.index.get_value_box
  File "pandas/_libs/index.pyx", line 47, in pandas._libs.index.get_value_at
  File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.get_value_at
  File "pandas/_libs/util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/pjobic/Work/DNADNA/master/dnadna/data_preprocessing.py", line 219, in _check_scenario_wrapped
    return self.check_scenario(*scenario)
  File "/home/pjobic/Work/DNADNA/master/dnadna/data_preprocessing.py", line 144, in check_scenario
    n_replicates = int(scenario['n_replicates'])
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/site-packages/pandas/core/series.py", line 1068, in __getitem__
    result = self.index.get_value(self, key)
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4744, in get_value
    raise e1
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4730, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'n_replicates'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/pjobic/anaconda3/envs/dnadna/bin/dnadna", line 11, in <module>
    load_entry_point('dnadna', 'console_scripts', 'dnadna')()
  File "/home/pjobic/Work/DNADNA/master/dnadna/utils/__init__.py", line 821, in main
    raise exc
  File "/home/pjobic/Work/DNADNA/master/dnadna/utils/__init__.py", line 813, in main
    ret2 = cls.run_subcommand(args)
  File "/home/pjobic/Work/DNADNA/master/dnadna/utils/__init__.py", line 782, in run_subcommand
    return command_cls.main(command[1:], namespace=args)
  File "/home/pjobic/Work/DNADNA/master/dnadna/utils/__init__.py", line 821, in main
    raise exc
  File "/home/pjobic/Work/DNADNA/master/dnadna/utils/__init__.py", line 805, in main
    ret = cls.run(args)
  File "/home/pjobic/Work/DNADNA/master/dnadna/data_preprocessing.py", line 675, in run
    progress_bar=True)
  File "/home/pjobic/Work/DNADNA/master/dnadna/data_preprocessing.py", line 450, in prepare_training_run
    progress_bar=progress_bar)
  File "/home/pjobic/Work/DNADNA/master/dnadna/data_preprocessing.py", line 313, in preprocess_scenario_params
    for idx, result in enumerate(bar):
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/site-packages/tqdm/std.py", line 1097, in __iter__
    for obj in iterable:
  File "/home/pjobic/Work/DNADNA/master/dnadna/data_preprocessing.py", line 277, in check_scenarios
    for result in iter_results():
  File "/home/pjobic/Work/DNADNA/master/dnadna/data_preprocessing.py", line 269, in iter_results
    param_iter):
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/pjobic/Work/DNADNA/master/dnadna/data_preprocessing.py", line 219, in _check_scenario_wrapped
    return self.check_scenario(*scenario)
  File "/home/pjobic/Work/DNADNA/master/dnadna/data_preprocessing.py", line 144, in check_scenario
    n_replicates = int(scenario['n_replicates'])
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/site-packages/pandas/core/series.py", line 1068, in __getitem__
    result = self.index.get_value(self, key)
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4744, in get_value
    raise e1
  File "/home/pjobic/anaconda3/envs/dnadna/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4730, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'n_replicates'
  0%|          | 0/2000 [00:00<?, ?scenario/s]
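The traceback shows that check_scenario reads scenario['n_replicates'] from a pandas Series, so the KeyError suggests that the scenario parameter table used during preprocessing has no n_replicates column. The earlier TypeError appears to come from pandas attempting a positional lookup as a fallback before re-raising the KeyError, which is the actual problem. Below is a minimal sketch that reproduces the failure with pandas alone. The table params and the helper missing_columns are hypothetical illustrations, not part of dnadna.

```python
import pandas as pd

# Hypothetical stand-in for a scenario parameter table that lacks an
# 'n_replicates' column (the situation the traceback points to).
params = pd.DataFrame({
    "scenario_idx": [0, 1],
    "mutation_rate": [1e-8, 1e-8],
})


def missing_columns(table, required):
    """Return the required column names that are absent from `table`."""
    return [name for name in required if name not in table.columns]


# Per the traceback, check_scenario indexes a pandas Series by column label,
# so a missing column surfaces as a KeyError carrying the column name.
row = params.iloc[0]
try:
    int(row["n_replicates"])
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints: KeyError: 'n_replicates'

# Checking the columns up front gives a clearer message than the traceback.
print(missing_columns(params, ["scenario_idx", "n_replicates"]))  # ['n_replicates']
```

Running the same check on the real scenario table before calling dnadna preprocess would show whether n_replicates is actually missing from this dataset.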

Attached files:

cattle_simulation_config.yml

cattle_training_config.yml

Setup:

data_root: /home/pjobic/Work/DNADNA/notebooks/data/cattle
Format: data_root/scenario_XXX
dataset_path_on_titanic: titanic:/home/tau/thsanche/data/cattle/scenario_{0_to_99}
working directory: /home/pjobic/Work/DNADNA/master
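The setup places each scenario under data_root as a scenario_XXX directory. To compare what is on disk with the number of scenarios being checked (2000 in the progress bar), a sketch like the following lists the scenario directories. It assumes the names are "scenario_" followed by digits (the padding width of XXX is not stated), and list_scenario_indices is a hypothetical helper, not part of dnadna.

```python
import re
from pathlib import Path

# Assumption: scenario directories are named "scenario_<digits>"; any number
# of digits is accepted because the zero-padding width is not specified.
SCENARIO_DIR = re.compile(r"scenario_(\d+)")


def list_scenario_indices(data_root):
    """Return the sorted scenario indices of directories directly under data_root."""
    indices = []
    for path in Path(data_root).iterdir():
        match = SCENARIO_DIR.fullmatch(path.name)
        # Only count directories; stray files with a matching name are ignored.
        if match and path.is_dir():
            indices.append(int(match.group(1)))
    return sorted(indices)
```

For example, list_scenario_indices("/home/pjobic/Work/DNADNA/notebooks/data/cattle") returns the indices found locally, which can then be compared with the rows of the scenario parameter table.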

Edited May 18, 2020 by JOBIC Pierre
