Commits · flora/documentation/overview_network · Machine learning for population genetics / private / dnadna

Jul 30, 2021
- [documentation] change potentially confusing verbiage about "datasets" · b5bd9735
  E Madison Bray authored 3 years ago
  
- [documentation] misc minor nitpicks · 3d95d116
  E Madison Bray authored 3 years ago
  
- Update overview.rst - summarize the prediction part · 7466e46e
  Jérémy Guez authored 3 years ago and E Madison Bray committed 3 years ago
  
- Update prediction.rst : minor changes · f953e9b9
  Jérémy Guez authored 3 years ago and E Madison Bray committed 3 years ago
  
- Update prediction.rst · ba8ac4a2
  Jérémy Guez authored 3 years ago and E Madison Bray committed 3 years ago
  
- [documentation] minor updates, mostly for spelling/formatting · 15a59790
  E Madison Bray authored 3 years ago
  
- first version of completed overview · 502a4c39
  Flora Jay authored 3 years ago and E Madison Bray committed 3 years ago
  
- mention of filename_format property · c9c5dfbb
  Flora Jay authored 3 years ago and E Madison Bray committed 3 years ago
  
- start working on overview · 14774d9d
  Flora Jay authored 3 years ago and E Madison Bray committed 3 years ago
  
Jul 29, 2021

Merge branch 'embray/enhancement/improved-shutdown-handling' into 'master' · c0a45224
E Madison Bray authored 3 years ago
```
Improved interrupt handling for dnadna train

See merge request !127
```
Merge branch 'embray/missing-default-simulator-seed' into 'master' · c430d099
E Madison Bray authored 3 years ago
```
Add a default of `seed: null` for the simulator config in the schema

See merge request !126
```
[bug] add a default of `seed: null` for the simulator config in the · d973556e
E Madison Bray authored 3 years ago
```
schema

fixes the issue raised at !123 (comment 551445)
```
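To illustrate what a schema-level `seed: null` default buys: when a user's simulator config omits `seed`, the schema's `default` is filled in, and `null` (Python `None`) later means "seed randomly". The schema fragment and the `fill_defaults` helper below are hypothetical illustrations, not dnadna's actual schema or API.

```python
import copy

# Hypothetical schema fragment: "seed" may be an integer or null, defaulting to null.
SIMULATOR_SCHEMA = {
    "type": "object",
    "properties": {
        "seed": {"type": ["integer", "null"], "default": None},
        "n_scenarios": {"type": "integer", "default": 1},
    },
}


def fill_defaults(config, schema):
    """Return a copy of config with missing top-level keys taken from schema defaults."""
    filled = copy.deepcopy(config)
    for key, subschema in schema.get("properties", {}).items():
        if key not in filled and "default" in subschema:
            filled[key] = copy.deepcopy(subschema["default"])
    return filled


config = fill_defaults({"n_scenarios": 100}, SIMULATOR_SCHEMA)
print(config)  # {'n_scenarios': 100, 'seed': None}
```

Without the schema default, code reading the config would have to special-case a missing `seed` key; with it, "absent" and "explicitly null" behave the same.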

Slightly better output surrounding the progress bar. · 9c2da6e9

E Madison Bray authored 3 years ago

The message displayed when pausing training is now on a separate line
from the progress bar(s).

Unfortunately tqdm does not have a public method to get all active
progress bars so that their displays can be cleared.  It would be better
if they could be hidden outright and then redrawn in the same place when
training resumes.  Currently this is difficult to do with tqdm.  I have
ideas for a workaround but it's not worth spending a lot of time on.

Also handle when the user hits Ctrl-D while suspended.


[enhancement] improved interrupt handling for dnadna train · 17904150

E Madison Bray authored 3 years ago

Previously, trying to interrupt `dnadna train` (e.g. with Ctrl-C)
could often take several tries, and would result in a bunch of messy
tracebacks (often overlapping each other due to tracebacks from
interrupted worker processes).

This now handles some interrupts more cleanly.

In particular, pressing Ctrl-C does two things:

1) Rather than immediately interrupting the training, it simply pauses
   it.  Pressing Enter resumes the training, and pressing Ctrl-C again
   cancels it.

2) When canceling the training, it attempts to shut down gracefully:
   If in a validation pass it interrupts the validation, and if in a
   training pass it interrupts the training loop and tries to exit
   cleanly.  This is not a 100% guarantee, as not all code is
   interrupt-safe, but non-clean interrupts should now be rarer.

   In a follow-up, we could also choose to save a checkpoint right
   when interrupted.

Likewise, trying to terminate the process will attempt a clean shutdown.
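The two-stage Ctrl-C behaviour described above (first interrupt pauses, Enter resumes, a second Ctrl-C cancels) can be sketched with Python's `signal` module. This is a hypothetical illustration, not dnadna's implementation: `PauseController`, `TrainingCancelled` and the toy `train` loop are invented names, the prompt function is injectable only so the logic can be tested, treating Ctrl-D as "cancel" is an assumption, and SIGTERM handling is not shown.

```python
import signal


class TrainingCancelled(Exception):
    """Raised when the user confirms that training should stop."""


class PauseController:
    """Hypothetical two-stage Ctrl-C handler: first SIGINT pauses, second cancels."""

    def __init__(self, prompt=input):
        self._prompt = prompt  # injectable so the logic can be tested
        self.pause_requested = False

    def install(self):
        # Instead of raising KeyboardInterrupt immediately, just record the request.
        signal.signal(signal.SIGINT, self._on_sigint)

    def _on_sigint(self, signum, frame):
        self.pause_requested = True

    def checkpoint(self):
        """Call between training steps; blocks while training is paused."""
        if not self.pause_requested:
            return
        self.pause_requested = False
        # While paused, a second Ctrl-C should raise KeyboardInterrupt as usual.
        previous = signal.signal(signal.SIGINT, signal.default_int_handler)
        try:
            self._prompt("\nTraining paused: press Enter to resume or Ctrl-C to cancel. ")
        except (KeyboardInterrupt, EOFError):
            # Ctrl-D (EOFError) is treated like a second Ctrl-C here (an assumption).
            raise TrainingCancelled() from None
        finally:
            if previous is not None:
                signal.signal(signal.SIGINT, previous)


def train(n_steps, controller):
    """Toy training loop that checks for a pause request between steps."""
    completed = 0
    try:
        for _ in range(n_steps):
            controller.checkpoint()
            completed += 1  # stand-in for one training step
    except TrainingCancelled:
        print("Training cancelled; shutting down cleanly.")
    return completed
```

Checking the flag only between steps is what makes the shutdown "clean": the interrupt never lands in the middle of a step, at the cost of a short delay before the pause prompt appears.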


Merge branch 'embray/documentation-simulators' into 'master' · b034e3f9
E Madison Bray authored 3 years ago
```
Improved Simulator documentation

See merge request !88
```

Jul 28, 2021
- [testing] since !122, Command.main never raises exceptions raised by · 2960a0eb
  E Madison Bray authored 3 years ago
  
  commands' implementations. Instead it prints/logs them. For testing purposes it's better to be able to catch and check the original exceptions, so a raise_exceptions flag is added to Command.main.
- [documentation] add missing import and some typo fixes · a4f68ff6
  E Madison Bray authored 3 years ago
  
- [bug] rewrite this portion of the example code in such a way as to work · d5efa567
  E Madison Bray authored 3 years ago
  
  around this pandas/numpy bug: https://github.com/pandas-dev/pandas/issues/39520 If I understand correctly the bug only occurs when initializing an "empty" DataFrame (even if the index is not empty). Instead we construct a dict of the columns first, and then initialize the DataFrame from this dict. That should avoid triggering this bug.
- "population change event" -> "population size change event" · ec82b5f4
  E Madison Bray authored 3 years ago
  
- [documentation] Minor typos · 7ba739c9
  j.guez authored 3 years ago and E Madison Bray committed 3 years ago
  
- [documentation] finish writing the initial version of the simulation · af9077a0
  E Madison Bray authored 3 years ago
  
  documentation, including the tutorial on writing a custom simulator The code in this documentation has been hand-tested but is not automatically tested. That will be a task for the future.
- Fix some minor Simulator bugs encountered while working on updating the · a4473722
  E Madison Bray authored 3 years ago
  
  documentation.
- [documentation] WIP on the improved Simulator documentation · 7bdb385f
  E Madison Bray authored 3 years ago
  
- [refactoring] remove the confusing DefaultSimulator that is not actually · db3ca0fe
  E Madison Bray authored 3 years ago
  
  usable. Since !74 the note in its docstring about providing default templates is no longer valid either. Its existence is only likely to confuse users, since it cannot be run.
- Merge branch 'embray/simulator/scenario-params-overwrite' into 'master' · d7997b2d
  E Madison Bray authored 3 years ago
  
  Start to address #84 See merge request !117
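The `raise_exceptions` flag from commit 2960a0eb follows a common pattern for CLI entry points: by default, errors from a command's implementation are reported and turned into a non-zero exit status, while tests can opt in to receiving the original exception. Apart from the names `Command`, `main` and `raise_exceptions` mentioned in the commit message, everything below (the `run` method, the exit codes, the logging call) is a hypothetical stand-in, not dnadna's actual code.

```python
import logging

log = logging.getLogger(__name__)


class Command:
    """Minimal stand-in for a CLI command class."""

    def run(self, args):
        raise NotImplementedError

    def main(self, args=None, raise_exceptions=False):
        """Run the command and return a process exit status.

        By default, exceptions from run() are logged instead of propagated;
        passing raise_exceptions=True (e.g. from a test) re-raises them.
        """
        try:
            self.run(args or [])
        except Exception as exc:
            if raise_exceptions:
                raise
            log.error("%s: %s", type(exc).__name__, exc)
            return 1
        return 0


class FailingCommand(Command):
    def run(self, args):
        raise ValueError("bad config")
```

A test can then call `FailingCommand().main(raise_exceptions=True)` and assert on the `ValueError` itself rather than parsing logged output.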
Jul 27, 2021

Merge branch 'embray/error-formatting' into 'master' · 5a40893b
E Madison Bray authored 3 years ago
```
Slightly improved error formatting

See merge request !122
```
Merge branch 'embray/testing/fix-random-seed-test-again' into 'master' · dab6b0ae
E Madison Bray authored 3 years ago
```
Fix the test_random_seed test on CUDA again

See merge request !124
```

[bug] don't create backup directory or display misleading log messages · 45f43c62

E Madison Bray authored 3 years ago

when using --overwrite instead of --backup

The fact that the same method is being used both for --backup and
--overwrite is a bit messy, but there's enough overlap in their
functionality that I'm keeping it as is for now.


Merge branch 'embray/documentation/random-seed' into 'master' · d17c5ba2
E Madison Bray authored 3 years ago
```
[documentation] remove note about random seed in training docs

See merge request !125
```
[documentation] remove note about random seed in training docs · 0b4f1b75
E Madison Bray authored 3 years ago
```
This warning is no longer applicable since I fixed it in !123

[skip ci]
```
[testing] minor test fix and whitespace fix · 4edd6b43
E Madison Bray authored 3 years ago


[testing] fix the test_random_seed test on CUDA again · 24d32a38

E Madison Bray authored 3 years ago

Since merging !72 this seems to fail randomly a lot for SPIDNA.  It
didn't fail on the MR for some reason, but due to its non-deterministic
nature (especially when the tests are run in parallel) there could be
some other minor influence causing it to fail more often now that it's
merged.


Merge branch 'embray/issue-109' into 'master' · ef6304ce
E Madison Bray authored 3 years ago
```
[bug] get rid of all default seeds in different configuration sources

See merge request !123
```

Jul 26, 2021

[bug] get rid of all default seeds in different configuration sources · 23eb4340

E Madison Bray authored 3 years ago

There are currently 3 "seed" options:

1) A "seed" for simulations
2) A "seed" for preprocessing (this mostly controls randomization of
dataset splits)
3) A "seed" for training

All of these had default values in the default config files, partly (I
think) as an artifact of when I ported over some of Jean and Theophile's
old config files into the code.

In practice, users should expect stochasticity by default.  The schemas
have a default value of "null" for all these seeds, which is equivalent
to random seeding of the PRNG.  If users want to set a specific seed for
reproducibility they should do so manually.

(Possible future enhancement: Record the seed that was used so they can
reproduce the same run even if the seed was not set explicitly first.)
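The seed semantics above, together with the suggested future enhancement of recording the seed actually used, fit in a few lines. This is a sketch under stated assumptions: `resolve_seed` and `make_rng` are hypothetical helpers, Python's `random` module stands in for whatever PRNGs are really involved (a real training run would presumably also need to seed NumPy and PyTorch), and printing stands in for proper logging.

```python
import random
import secrets


def resolve_seed(seed=None):
    """Return the seed to use: the configured one, or a freshly drawn one.

    A seed of None ("null" in the config) means "seed randomly". Drawing an
    explicit value instead of letting the PRNG self-seed means the value can
    be recorded and the run reproduced later.
    """
    if seed is None:
        seed = secrets.randbits(32)
    return seed


def make_rng(config):
    seed = resolve_seed(config.get("seed"))
    print(f"using random seed {seed}")  # record it, e.g. in the run's log
    return random.Random(seed), seed
```

The design choice is that "random by default" and "reproducible" need not conflict: the run is still stochastic, but the drawn seed is never lost.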


Merge branch 'jcury/documentation/training' into 'master' · de5f7c6e
JAY Flora authored 3 years ago
```
[doc] start work on training doc

See merge request !95
```

[enhancement] in addition to `dnadna simulation run --overwrite` there · 254f0799

E Madison Bray authored 3 years ago

is now a recommended `--backup` flag which creates a backup of existing
simulation data that could conflict with a previous simulation run

Otherwise, passing --overwrite just deletes previous conflicting data.

Here "conflicting" means the scenario params file is the same as the
one in the simulation config file, or the scenario files have the same
filename format as in the simulation config file.  If they are a
different format (e.g. a simulation was run once, then the config file
was changed to use a different filename format, and the simulation run
again in the same directory) then there is no way to know for sure what
files belonged to a previous simulation, under the current scheme.

However, in this (unusual) case it shouldn't be a problem, as the old
simulation data won't conflict with the new simulation data, since it
uses a different filename format.  The old data would have to be cleaned
up manually by the user; better still, the new simulation should be run
in a new directory.

The backup option moves all possibly conflicting files into a new
`{data_root}/{dataset_name}-backup.{timestamp}` directory.
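The difference between `--backup` and `--overwrite` can be sketched as one helper that either moves possibly conflicting paths into a `{data_root}/{dataset_name}-backup.{timestamp}` directory or deletes them. Everything except that directory pattern is an assumption: the function name `backup_or_remove`, the caller-supplied list of conflicting paths, and the timestamp format are hypothetical. As in the later fix 45f43c62, no backup directory is created when there is nothing to back up or when deleting.

```python
import shutil
import time
from pathlib import Path


def backup_or_remove(data_root, dataset_name, conflicting, backup=True):
    """Move (backup=True) or delete (backup=False) files from a previous run.

    `conflicting` lists paths relative to data_root. Returns the backup
    directory, or None if nothing was backed up.
    """
    data_root = Path(data_root)
    existing = [data_root / p for p in conflicting if (data_root / p).exists()]
    if not existing:
        return None  # nothing to do: don't create an empty backup directory

    if not backup:
        # --overwrite: simply delete the previous conflicting data.
        for path in existing:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
        return None

    timestamp = time.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    backup_dir = data_root / f"{dataset_name}-backup.{timestamp}"
    for path in existing:
        target = backup_dir / path.relative_to(data_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
    return backup_dir
```

Keeping relative paths inside the backup directory means a backed-up run can be restored by moving its contents back into `data_root`.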


[bug] by default do *not* overwrite an existing simulation params file · 3c9e5704

E Madison Bray authored 3 years ago

Instead, an error is raised indicating that the --overwrite flag should
be used.  If giving --overwrite, then the existing file will be
overwritten.

When using the CLI, an existing file will never be loaded, though it
remains an option when using the API (albeit a bit clunkily).
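"Refuse to overwrite unless asked" maps neatly onto Python's exclusive-create file mode. The sketch below is hypothetical (`write_scenario_params` is not dnadna's API, and the content is simply written as-is); it shows the error pointing the user at `--overwrite`, as the commit describes.

```python
from pathlib import Path


def write_scenario_params(path, text, overwrite=False):
    """Write a scenario params file without silently replacing an existing one."""
    path = Path(path)
    # Mode "x" creates the file and fails if it already exists, so the check
    # and the write happen in one step; "w" truncates and overwrites.
    mode = "w" if overwrite else "x"
    try:
        with path.open(mode) as f:
            f.write(text)
    except FileExistsError:
        raise FileExistsError(
            f"{path} already exists; use --overwrite to replace it"
        ) from None
    return path
```

Using mode `"x"` rather than an `exists()` check followed by a write avoids a race where another process creates the file in between.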


[refactoring] remove the --scenario-params argument from `dnadna · bc601242

E Madison Bray authored 3 years ago

simulation run`

This was a feature to allow specifying a different path to a scenario
params file than the one in the config file.  In retrospect, this
feature does not make a lot of sense under the current design: in order
to use it properly, the user would still have to modify their simulation
config file to point to the correct scenario params file if they want to
use it for training.

This feature was not even covered by the tests.

If the need for something like this arises in the future, we can
reconsider how it should work.


[testing] fix README to account for new output formatting from `dnadna · 15e19ed2
E Madison Bray authored 3 years ago
simulation run`

[enhancement] add option to overwrite an existing scenario params file · 359a68f7

E Madison Bray authored 3 years ago

and/or write a new one, rather than loading an existing one

For the CLI this is now the default: when re-running a simulation, the
existing file is overwritten.

As a further improvement, maybe we should only overwrite (or at least
notify the user) if the n_scenarios/n_replicates in the scenario
params table is not consistent with what's in the config file.
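The suggested improvement (only overwrite, or warn, when the existing table disagrees with the config) amounts to a simple consistency check. A sketch, assuming the scenario params table is CSV with one row per scenario and an `n_replicates` column; both the layout and the function name `params_table_matches_config` are assumptions made for illustration.

```python
import csv
import io


def params_table_matches_config(csv_text, n_scenarios, n_replicates):
    """Check a scenario params table against the simulation config.

    Assumes (hypothetically) one row per scenario with an n_replicates column.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) != n_scenarios:
        return False
    return all(int(row["n_replicates"]) == n_replicates for row in rows)
```

A caller could reuse the existing file when this returns `True`, and regenerate it (or at least warn) otherwise.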
