- Jul 30, 2021
E Madison Bray authored
E Madison Bray authored
E Madison Bray authored
- Jul 29, 2021
E Madison Bray authored
Improved interrupt handling for `dnadna train`. See merge request !127
E Madison Bray authored
Add a default of `seed: null` for the simulator config in the schema. See merge request !126
E Madison Bray authored
schema fixes the issue raised at !123 (comment 551445)
E Madison Bray authored
The message displayed when pausing training is now on a separate line from the progress bar(s). Unfortunately, tqdm does not have a public method to get all active progress bars so that their displays can be cleared. It would be better if they could be hidden outright and then redrawn in the same place when training resumes, but this is currently difficult to do with tqdm. I have ideas for a workaround, but it's not worth spending a lot of time on. Also handle the case where the user hits Ctrl-D while suspended.
E Madison Bray authored
Previously, trying to interrupt `dnadna train` (e.g. with Ctrl-C) could often take several tries, and would produce a bunch of messy tracebacks (often overlapping each other, due to tracebacks from interrupted worker processes). Some interrupts are now handled more cleanly. In particular, pressing Ctrl-C does two things:
1) Rather than immediately interrupting training, it pauses it. Pressing Enter resumes training, and pressing Ctrl-C again cancels it.
2) When training is canceled, it attempts to shut down gracefully: during a validation pass it interrupts the validation, and during a training pass it interrupts the training loop and tries to exit cleanly.
A clean shutdown is not 100% guaranteed, since not all code is interrupt-safe, but non-clean interrupts should be rarer. In a follow-up, we could also save a checkpoint at the moment of interruption. Likewise, terminating the process will attempt a clean shutdown.
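A minimal sketch of the pause/cancel behavior described above, modeled as a small state machine. `State`, `InterruptController`, and `train_batches` are hypothetical names for illustration, not the actual dnadna implementation; the real code also has to deal with worker processes and tqdm output, which this ignores.

```python
import enum
import signal


class State(enum.Enum):
    RUNNING = "running"
    PAUSED = "paused"
    CANCELLED = "cancelled"


class InterruptController:
    """Hypothetical helper: first Ctrl-C pauses, Enter resumes, a second Ctrl-C cancels."""

    def __init__(self):
        self.state = State.RUNNING

    def on_sigint(self, signum=None, frame=None):
        # Accepts the (signum, frame) arguments that signal.signal() passes to handlers
        if self.state is State.RUNNING:
            self.state = State.PAUSED
        elif self.state is State.PAUSED:
            self.state = State.CANCELLED

    def on_sigterm(self, signum=None, frame=None):
        # Termination skips the pause and requests a clean shutdown directly
        self.state = State.CANCELLED

    def on_enter(self):
        if self.state is State.PAUSED:
            self.state = State.RUNNING

    @property
    def should_stop(self):
        return self.state is State.CANCELLED

    def install(self):
        signal.signal(signal.SIGINT, self.on_sigint)
        signal.signal(signal.SIGTERM, self.on_sigterm)


def train_batches(batches, controller):
    """Process batches, checking for cancellation between batches rather than mid-batch."""
    processed = []
    for batch in batches:
        if controller.should_stop:
            break  # leave the loop cleanly instead of raising KeyboardInterrupt
        processed.append(batch)
    return processed
```

In this sketch the paused state does not block; a real loop would wait for input while paused (which is also where Ctrl-D would need handling) before checking `should_stop` again.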
E Madison Bray authored
Improved Simulator documentation. See merge request !88
- Jul 28, 2021
E Madison Bray authored
commands' implementations. Instead, it prints/logs them. For testing purposes it's better to be able to catch and check the original exceptions, so a `raise_exceptions` flag is added to `Command.main`.
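A sketch of how a `raise_exceptions` flag on a `main` method might work. The `Command` class here is a stripped-down stand-in written for illustration, not dnadna's actual `Command` API; only the flag's name and purpose come from the commit message, and `FailingCommand` is hypothetical.

```python
import logging

log = logging.getLogger(__name__)


class Command:
    """Simplified stand-in for a CLI command base class (hypothetical)."""

    def run(self, args):
        raise NotImplementedError

    def main(self, args=None, raise_exceptions=False):
        try:
            self.run(args or [])
        except Exception as exc:
            if raise_exceptions:
                # Tests can catch and inspect the original exception
                raise
            # Normal CLI use: log the error instead of dumping a traceback
            log.error("%s: %s", type(exc).__name__, exc)
            return 1
        return 0


class FailingCommand(Command):
    def run(self, args):
        raise ValueError("invalid config")
```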
E Madison Bray authored
E Madison Bray authored
around this pandas/numpy bug: https://github.com/pandas-dev/pandas/issues/39520 If I understand correctly, the bug only occurs when initializing an "empty" DataFrame (even if the index is not empty). Instead, we construct a dict of the columns first, and then initialize the DataFrame from this dict. That should avoid triggering the bug.
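The workaround described above, sketched with illustrative names (`param_a`, `param_b`, and the `scenario_idx` index name are made up for this example, not taken from dnadna's code): build every column as a fully populated array in a dict, then pass the dict to the `DataFrame` constructor in one step, rather than creating an index-only frame and filling in columns afterwards.

```python
import numpy as np
import pandas as pd

n_scenarios = 3
index = pd.RangeIndex(n_scenarios, name="scenario_idx")

# Pattern being avoided: start from an "empty" frame and add columns later, e.g.
#     df = pd.DataFrame(index=index)
#     df["param_a"] = ...
# Instead, populate all columns up front and construct the frame once:
columns = {
    "param_a": np.linspace(0.1, 0.3, n_scenarios),
    "param_b": np.arange(1, n_scenarios + 1),
}
df = pd.DataFrame(columns, index=index)
```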
E Madison Bray authored
E Madison Bray authored
documentation, including the tutorial on writing a custom simulator. The code in this documentation has been hand-tested but is not automatically tested. That will be a task for the future.
E Madison Bray authored
documentation.
E Madison Bray authored
E Madison Bray authored
usable. Since !74 the note in its docstring about providing default templates is no longer valid either. Its existence is only likely to confuse users, since it cannot be run.
E Madison Bray authored
Start to address #84. See merge request !117
- Jul 27, 2021
E Madison Bray authored
Slightly improved error formatting. See merge request !122
E Madison Bray authored
Fix the test_random_seed test on CUDA again. See merge request !124
E Madison Bray authored
when using `--overwrite` instead of `--backup`. The fact that the same method is used for both `--backup` and `--overwrite` is a bit messy, but there's enough overlap in their functionality that I'm keeping it as is for now.
E Madison Bray authored
[documentation] remove note about random seed in training docs. See merge request !125
E Madison Bray authored
This warning is no longer applicable since I fixed it in !123 [skip ci]
E Madison Bray authored
E Madison Bray authored
Since merging !72, this test seems to fail randomly a lot for SPIDNA. It didn't fail on the MR for some reason, but given its non-deterministic nature (especially when the tests are run in parallel), some other minor influence could be causing it to fail more often now that it's merged.
E Madison Bray authored
[bug] get rid of all default seeds in different configuration sources. See merge request !123
- Jul 26, 2021
E Madison Bray authored
There are currently 3 "seed" options:
1) A "seed" for simulations
2) A "seed" for preprocessing (this mostly controls randomization of dataset splits)
3) A "seed" for training
All of these had default values in the default config files, I think partly as an artifact of when I ported some of Jean and Theophile's old config files into the code. In practice, users should expect stochasticity by default. The schemas have a default value of `null` for all these seeds, which is equivalent to random seeding of the PRNG. If users want to set a specific seed for reproducibility, they should do so manually. (Possible future enhancement: record the seed that was used, so the same run can be reproduced even if the seed was not set explicitly.)
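A sketch of `null`-means-random seeding combined with the suggested enhancement of recording the seed actually used. It uses Python's `random` module purely for illustration; `resolve_seed`, `make_rng`, and the `seed_used` key are hypothetical, and the real training code would presumably also need to seed NumPy/PyTorch.

```python
import random
import secrets


def resolve_seed(seed=None):
    """A seed of None (null in the config) means: pick a fresh random seed."""
    if seed is None:
        seed = secrets.randbits(32)
    return seed


def make_rng(config):
    """Create a PRNG from a config dict and record the seed actually used."""
    seed = resolve_seed(config.get("seed"))
    config["seed_used"] = seed  # hypothetical key, so the run can be reproduced later
    return random.Random(seed), seed
```

Recording the resolved seed keeps the default behavior stochastic while still letting a user replay an unseeded run exactly.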
E Madison Bray authored
is now a recommended `--backup` flag, which creates a backup of existing simulation data that could conflict with a previous simulation run. Otherwise, passing `--overwrite` just deletes previous conflicting data.

Here "conflicting" means the scenario params file is the same as the one in the simulation config file, or the scenario files have the same filename format as in the simulation config file. If they are in a different format (e.g. a simulation was run once, then the config file was changed to use a different filename format, and the simulation was run again in the same directory), then under the current scheme there is no way to know for sure which files belonged to a previous simulation. However, this shouldn't be a problem in this (unusual) case: the old simulation data won't conflict with the new simulation data, since it's in a different filename format. It would have to be cleaned up manually by the user, or better still, the new simulation should be run in a new directory.

The backup option moves all possibly conflicting files into a new `{data_root}/{dataset_name}-backup.{timestamp}` directory.
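A sketch of the backup step, assuming the conflicting files have already been identified and live under `{data_root}/{dataset_name}`. The function name `backup_conflicting`, the timestamp format, and preserving each file's relative path inside the backup directory are all assumptions; only the backup directory naming pattern comes from the commit message.

```python
import shutil
import time
from pathlib import Path


def backup_conflicting(data_root, dataset_name, conflicting, timestamp=None):
    """Move files into {data_root}/{dataset_name}-backup.{timestamp} (hypothetical helper)."""
    data_root = Path(data_root)
    dataset_dir = data_root / dataset_name
    if timestamp is None:
        timestamp = time.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    backup_dir = data_root / f"{dataset_name}-backup.{timestamp}"
    backup_dir.mkdir(parents=True)  # raises FileExistsError if this backup already exists
    moved = []
    for path in map(Path, conflicting):
        # Keep each file's layout relative to the dataset directory
        dest = backup_dir / path.relative_to(dataset_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest))
        moved.append(dest)
    return backup_dir, moved
```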
E Madison Bray authored
Instead, an error is raised indicating that the `--overwrite` flag should be used. If `--overwrite` is given, the existing file is overwritten. When using the CLI, an existing file will never be loaded, though loading it remains an option when using the API (albeit a bit clunkily).
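A minimal sketch of the refuse-unless-`--overwrite` behavior. `write_scenario_params` and its error message are hypothetical stand-ins; the real code writes a scenario params table rather than raw text.

```python
from pathlib import Path


def write_scenario_params(path, contents, overwrite=False):
    """Write the scenario params file, refusing to replace an existing one unless overwrite=True."""
    path = Path(path)
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; use --overwrite to replace it")
    path.write_text(contents)
    return path
```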
E Madison Bray authored
simulation run`. This was a feature to allow specifying a different path to a scenario params file than the one in the config file. In retrospect, this feature does not make much sense under the current design: to use it properly, the user would still have to modify their simulation config file to point to the correct scenario params file if they want to use it for training. This feature was not even covered by the tests. If the need for something like this arises in the future, we can reconsider how it should work.
E Madison Bray authored
simulation run`
E Madison Bray authored
and/or write a new one, rather than loading an existing one. For the CLI this is now the default: when re-running a simulation, it will overwrite the existing file. As a further improvement, maybe we should only overwrite (or at least notify the user) if the n_scenarios/n_replicates in the scenario params table is not consistent with what's in the config file.
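The suggested further improvement, overwriting only when the existing table disagrees with the config, might look roughly like this. The table is modeled as a list of row dicts with an `n_replicates` field; that representation and both function names are assumptions for illustration.

```python
def params_table_matches_config(rows, n_scenarios, n_replicates):
    """Return True if an existing scenario params table agrees with the config's counts."""
    if len(rows) != n_scenarios:
        return False
    return all(row.get("n_replicates") == n_replicates for row in rows)


def should_overwrite(rows, n_scenarios, n_replicates):
    # Only regenerate (or at least warn) when the existing table is inconsistent
    return not params_table_matches_config(rows, n_scenarios, n_replicates)
```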