DemoSEQ
Inferring demographic history from whole-genome sequence data using summary statistics
This repository contains Python 2.7 scripts for computing the summary statistics used in
Jay, F., Boitard, S., & Austerlitz, F. (2019). An ABC method for whole-genome sequence data: inferring paleolithic and neolithic human expansions. Molecular Biology and Evolution, 36(7), 1565-1579.
The following statistics are implemented:
- Classic statistics (expected heterozygosity, Tajima's D, proportion of segregating sites, average haplotypic heterozygosity in 50 kb windows)
- Site frequency spectrum (SFS)
- Linkage disequilibrium (LD)
- Identical-by-state (IBS) tract length
- Allele-frequency identical-by-state (AF-IBS; Theunert et al. 2012)
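To make a few of these statistics concrete, here is a minimal sketch that computes the SFS, the mean expected heterozygosity, Tajima's D and pairwise IBS tract lengths from a small haplotype matrix. It is not the API of summary_statistics.py: the function names are hypothetical, and it assumes rows are haplotypes, columns are SNPs, and alleles are coded 0 (ancestral) / 1 (derived). The IBS convention used here (distances between consecutive sites where two haplotypes differ, dropping partial tracts at the region edges) is an assumption and may differ from the repository's. The windowed haplotypic heterozygosity, the proportion of segregating sites and LD are not covered.

```python
import numpy as np


def sfs(haps, folded=False):
    """SFS from a 0/1 haplotype matrix (rows = haplotypes, cols = SNPs)."""
    haps = np.asarray(haps)
    n = haps.shape[0]
    counts = haps.sum(axis=0)  # derived-allele count per site
    if folded:
        minor = np.minimum(counts, n - counts)
        return np.bincount(minor, minlength=n // 2 + 1)[1:]  # classes 1..n//2
    # Unfolded: classes 1..n-1 (monomorphic sites are dropped)
    return np.bincount(counts, minlength=n + 1)[1:n]


def mean_expected_heterozygosity(haps):
    """Mean over all given sites of the unbiased n/(n-1) * 2p(1-p)."""
    haps = np.asarray(haps)
    n = haps.shape[0]
    p = haps.mean(axis=0)
    return float(np.mean(n / (n - 1.0) * 2.0 * p * (1.0 - p)))


def tajimas_d(haps):
    """Tajima's D for one region (NaN when there is no segregating site)."""
    haps = np.asarray(haps)
    n = haps.shape[0]
    counts = haps.sum(axis=0)
    seg = (counts > 0) & (counts < n)
    S = int(seg.sum())
    if S == 0:
        return float("nan")
    k = counts[seg]
    # pi = mean number of pairwise differences, summed over sites
    pi = np.sum(2.0 * k * (n - k) / (n * (n - 1.0)))
    i = np.arange(1, n)
    a1 = np.sum(1.0 / i)
    a2 = np.sum(1.0 / i ** 2)
    b1 = (n + 1.0) / (3.0 * (n - 1.0))
    b2 = 2.0 * (n * n + n + 3.0) / (9.0 * n * (n - 1.0))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2.0) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return float((pi - S / a1) / np.sqrt(e1 * S + e2 * S * (S - 1)))


def ibs_tract_lengths(hap_a, hap_b, positions):
    """Distances between consecutive sites where two haplotypes differ."""
    diff = np.flatnonzero(np.asarray(hap_a) != np.asarray(hap_b))
    return np.diff(np.asarray(positions)[diff])


haps = np.array([[0, 1, 1], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
print(sfs(haps))               # [2 1 0]
print(sfs(haps, folded=True))  # [2 1]
```

In the real pipeline these quantities are computed over many genomic segments and then summarized; the sketch only shows the per-region arithmetic.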
Install Python modules and run
For easier use, create a conda environment containing the required modules:
conda env create -f environment.yml
conda activate demoseq
To use this environment from Jupyter, install it as a kernel, which you can then select in Jupyter Notebook or JupyterLab:
python -m ipykernel install --user --name=demoseq
Then refer to the compute_sumstats_example notebook for an example of how to compute the summary statistics.
Comments
The summary_statistics.py script was built upon PopSizeABC (Boitard et al. 2016).
We implemented two versions of AF-IBS computation:
- (1) the 'original' implementation used in Jay et al. (MBE 2019), which stores the previous segment border for each encountered SNP configuration; it is fast but does not scale well to large sample sizes;
- (2) an algorithm based on the positional Burrows-Wheeler transform (PBWT); see the AFIBS_BWT notes.
An application of both algorithms to msprime-simulated data is shown in the afibs_minimal notebook.
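As a reference point for what both implementations compute, here is a brute-force sketch of AF-IBS as we read the definition of Theunert et al. (2012): for each focal SNP carried by k haplotypes, extend left and right until the first site where the k carriers are no longer all identical, record the distance between those two breakpoints, and average the lengths per derived-allele count k. It is neither of the two repository algorithms. The function name is hypothetical, and truncating tracts at the first/last SNP and skipping singletons and fixed sites are assumptions. Its cost grows roughly with (number of SNPs)² × sample size, which is why faster approaches such as the two above are needed.

```python
from collections import defaultdict

import numpy as np


def afibs_brute_force(haps, positions):
    """Mean AF-IBS tract length per derived-allele count k (brute force).

    haps: 0/1 matrix, rows = haplotypes, cols = SNPs (1 = derived allele).
    positions: SNP coordinates, sorted, one per column.
    """
    haps = np.asarray(haps)
    positions = np.asarray(positions)
    n, n_sites = haps.shape
    lengths = defaultdict(list)
    for j in range(n_sites):
        carriers = haps[:, j] == 1
        k = int(carriers.sum())
        if k < 2 or k == n:  # assumption: need >= 2 carriers, skip fixed sites
            continue
        sub = haps[carriers]
        # A site breaks the tract when the carriers do not all agree there
        breaks = sub.min(axis=0) != sub.max(axis=0)
        right = j + 1
        while right < n_sites and not breaks[right]:
            right += 1
        left = j - 1
        while left >= 0 and not breaks[left]:
            left -= 1
        # Assumption: tracts reaching the data edges are truncated there
        left_pos = positions[max(left, 0)]
        right_pos = positions[min(right, n_sites - 1)]
        lengths[k].append(right_pos - left_pos)
    return {k: float(np.mean(v)) for k, v in lengths.items()}


haps = [[1, 1, 0, 1, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 0, 0]]
print(afibs_brute_force(haps, [10, 20, 30, 40, 50]))  # {2: 30.0}
```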
I will describe this new algorithm for AF-IBS computation in a future note.
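Until then, the following sketch shows only the generic PBWT building block, the prefix and divergence arrays of Durbin (2014), and not how the repository's AF-IBS algorithm uses them (see the AFIBS_BWT notes for that). After processing site k, the prefix array lists haplotypes sorted by their reversed prefixes, so haplotypes sharing long matches ending at k are adjacent; the divergence array gives, for each haplotype, the site at which its match with the previous haplotype in that order starts. The function name is hypothetical.

```python
def build_prefix_and_divergence_arrays(X):
    """PBWT prefix array a and divergence array d after the last site.

    X: list of haplotypes, each a list of 0/1 alleles over the same sites.
    d[j] is the first site of the longest match (ending at the last site)
    between haplotypes a[j] and a[j-1]; d[0] equals the number of sites.
    """
    M = len(X)
    a = list(range(M))
    d = [0] * M
    for k in range(len(X[0])):
        p = q = k + 1
        a0, a1, d0, d1 = [], [], [], []
        for idx, start in zip(a, d):
            # p/q track the match start since the last 0/1 haplotype seen
            p = max(p, start)
            q = max(q, start)
            if X[idx][k] == 0:
                a0.append(idx)
                d0.append(p)
                p = 0
            else:
                a1.append(idx)
                d1.append(q)
                q = 0
        # Stable partition: 0-carriers first, then 1-carriers
        a = a0 + a1
        d = d0 + d1
    return a, d


X = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 1]]
a, d = build_prefix_and_divergence_arrays(X)
# a == [0, 3, 1, 2, 4], d == [3, 0, 1, 3, 1]
```

Each site is processed in a single pass over the current order, so building the arrays costs O(number of haplotypes × number of sites).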
We know that Python 2.7 is no longer up to date, and we hope to integrate these summary statistics into a new package currently being developed in the lab (hopefully in the coming months).
Please cite
Jay, F., Boitard, S., & Austerlitz, F. (2019). An ABC method for whole-genome sequence data: inferring paleolithic and neolithic human expansions. Molecular Biology and Evolution, 36(7), 1565-1579.
Acknowledgments
Bertrand Servin, who contributed to the code.
Bibliography
- Boitard S, Rodríguez W, Jay F, Mona S, Austerlitz F. 2016. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach. Beaumont MA, editor. PLoS Genet. 12:e1005877.
- Theunert C, Tang K, Lachmann M, Hu S, Stoneking M. 2012. Inferring the History of Population Size Change from Genome-Wide SNP Data. Mol. Biol. Evol. 29:3653–3667.