DemoSEQ

Inferring demographic history from whole-genome sequence data using summary statistics

This repository contains Python 2.7 scripts for computing the summary statistics used in:
Jay, F., Boitard, S., & Austerlitz, F. (2019). An ABC method for whole-genome sequence data: inferring paleolithic and neolithic human expansions. Molecular Biology and Evolution, 36(7), 1565-1579.

  • Classic (expected heterozygosity, Tajima's D, proportion of segregating sites, average haplotypic heterozygosity in 50 kb windows)
  • Site frequency spectrum (SFS)
  • Linkage disequilibrium (LD)
  • Identical-by-state (IBS) tract length
  • Allele-frequency identical-by-state (AF-IBS; Theunert et al. 2012)
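
To make a few of these statistics concrete, here is a minimal sketch in plain Python. It is not the code from summary_statistics.py: the function names and the toy haplotype matrix are hypothetical, and the exact definitions used in the repository (normalisation, windowing, distance bins for LD) may differ. It assumes haplotypes are given as lists of 0/1 alleles, with 1 marking the derived allele.

```python
def site_frequency_spectrum(haps):
    """Unfolded SFS: sfs[i] is the number of sites carrying i derived alleles."""
    n = len(haps)
    sfs = [0] * (n + 1)
    for j in range(len(haps[0])):
        sfs[sum(h[j] for h in haps)] += 1
    return sfs


def proportion_segregating(sfs):
    """Fraction of sites that are polymorphic in the sample."""
    n = len(sfs) - 1
    return sum(sfs[1:n]) / float(sum(sfs))


def expected_heterozygosity(sfs):
    """Per-site average of the unbiased estimator n/(n-1) * 2p(1-p)."""
    n = len(sfs) - 1
    total = 0.0
    for i, count in enumerate(sfs):
        p = i / float(n)
        total += count * 2.0 * p * (1.0 - p) * n / (n - 1.0)
    return total / sum(sfs)


def r_squared(haps, j, k):
    """LD between sites j and k as r^2; None if either site is monomorphic."""
    n = float(len(haps))
    pa = sum(h[j] for h in haps) / n
    pb = sum(h[k] for h in haps) / n
    pab = sum(1 for h in haps if h[j] == 1 and h[k] == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return None
    return (pab - pa * pb) ** 2 / denom


# Toy data (hypothetical): 4 haplotypes x 5 sites
haps = [
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
sfs = site_frequency_spectrum(haps)
```

On real data these quantities would typically be computed per genomic window and, for LD, averaged within bins of physical distance; the sketch leaves that out.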

Install python modules and run

For easier use, create a conda environment containing the required modules:

conda env create -f environment.yml  
conda activate demoseq  

If you want to use Jupyter, you can install a kernel for this environment, which is then available in both Jupyter Notebook and JupyterLab:

python -m ipykernel install --user --name=demoseq

Then refer to the compute_sumstats_example notebook.

Comments

The summary_statistics.py script was built upon PopSizeABC (Boitard et al. 2016).

We implemented two versions of AF-IBS computation:

  • (1) the 'original' implementation used in Jay et al. (2019), which stores the previous segment border for each SNP configuration encountered; it is fast but does not scale well to large sample sizes

  • (2) an algorithm based on the positional Burrows-Wheeler transform (PBWT);
    see the AFIBS_BWT notes

  • Both algorithms are applied to msprime-simulated data in the afibs_minimal notebook

I will describe this new algorithm for AF-IBS computation in a future note.
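
For orientation only, the sketch below shows the positional prefix ordering that PBWT methods are generally built on. It is a generic illustration, not the AF-IBS algorithm implemented in AFIBS_BWT, and how that code actually uses the transform is not documented here. The function name and toy data are hypothetical. At each site, haplotypes are stably partitioned by their allele, so that haplotypes sharing a long identical stretch ending at that site end up next to each other in the ordering.

```python
def positional_prefix_arrays(haps):
    """For each site k, return haplotype indices sorted by their
    reversed prefix ending at (and including) site k."""
    order = list(range(len(haps)))
    arrays = []
    for k in range(len(haps[0])):
        # Stable partition: carriers of allele 0 first, then carriers of allele 1,
        # each group keeping the relative order inherited from the previous site.
        zeros = [i for i in order if haps[i][k] == 0]
        ones = [i for i in order if haps[i][k] == 1]
        order = zeros + ones
        arrays.append(order)
    return arrays


# Toy data (hypothetical): 4 haplotypes x 5 sites
haps = [
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
arrays = positional_prefix_arrays(haps)
```

Each update is linear in the number of haplotypes, which is the property that makes PBWT-based approaches attractive for large samples.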

We know that Python 2.7 is outdated. We hope to integrate these summary statistics into a new package currently being developed in the lab (hopefully in the coming months).

Please cite

Jay, F., Boitard, S., & Austerlitz, F. (2019). An ABC method for whole-genome sequence data: inferring paleolithic and neolithic human expansions. Molecular Biology and Evolution, 36(7), 1565-1579.

Acknowledgments

Bertrand Servin, who contributed to the code.

Bibliography

  • Boitard S, Rodríguez W, Jay F, Mona S, Austerlitz F. 2016. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach. Beaumont MA, editor. PLoS Genet. 12:e1005877.
  • Theunert C, Tang K, Lachmann M, Hu S, Stoneking M. 2012. Inferring the History of Population Size Change from Genome-Wide SNP Data. Mol. Biol. Evol. 29:3653–3667.