Documentation sprint
One of the major remaining issues for the 1.0 release is to fill out the missing documentation. This is a meta-issue to track and divide up that work to make sure we aren't duplicating efforts.
Below is a list of sections I believe remain to be written for the documentation. I can write most of the technical parts about how to use the software, but help anywhere is appreciated, in particular because you may have picked up good tips for users while using the software yourself. Please add your username (or usernames, if you will work on a section together) to the list to claim one of the remaining documentation tasks. For example, I've claimed the section on implementing simulators.
- (@fjay) Theoretical -- we need at least one page in the documentation explaining how the different networks work (at least those that will be in the first release). This can be briefly summarized from any papers written or being worked on, and can have links to the relevant papers.
- (@fjay) Overview -- I started some Usage Overview documentation going into more detail about what DNADNA does and how to use it (more detail than the README, less detail than the following pages). It has some sections that need to be finished; they can be brief and link to the main pages for these topics to give more detail:
  - Preprocessing
  - Training
  - Prediction
- (@jcury, !93 (merged)) Preprocessing -- why do we need to run a preprocessing step and what exactly does it do? What are the outputs? How do you configure and run the preprocessing command? Here we also probably need to discuss configuration of the parameters to learn.
- (@jcury, !95 (merged)) Training -- brief overview of how model training works (most of the details are typical for training neural nets with PyTorch, so this can be glossed over). What are the inputs to the `dnadna train` command? What are the outputs of a training run? How do you write the config file?
- (@j.guez) Prediction -- how to use the prediction command on new data given a trained model.
- (@embray, !88 (merged)) Simulators -- the minor role the `dnadna simulation` command plays and how to implement new simulators, including using simulators to convert existing datasets (e.g. how to convert msprime tree sequences).
- Summary statistics? There is a stub for a section on running summary statistics, but since we've said that won't be a focus for the first release (if at all), we might just leave it undocumented for now and remove the section.
Finally, any thoughts on the overall organization of the documentation? Is there anything we should change?