Call ROH in low-coverage ancient human DNA data (1240K SNPs) using a modern reference panel

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

hapROH

Software to identify runs of homozygosity (ROH) in ancient and present-day DNA, using a panel of reference haplotypes.

This package contains functions and wrappers to call ROH and functions for downstream analysis of the results (visualization and analysis).

For backward compatibility, the package uses hapsburg as its module name. After installation, you can import functions via from hapsburg.XX import YY

Installation

You can install the package using the package manager pip:

python3 -m pip install hapROH

(python3 -m makes sure that pip runs under the Python installation you are invoking)
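After installing, a quick check can confirm that the package landed in the interpreter you expect. The sketch below relies only on the standard library and on the two names given in this description: hapROH as the distribution name and hapsburg as the module name. The helper names are hypothetical and not part of hapROH.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The distribution is called hapROH, the importable module hapsburg.
    print("hapsburg importable:", is_importable("hapsburg"))
    print("hapROH version:", installed_version("hapROH"))
```

If the first line prints False although pip reported success, pip most likely installed into a different Python installation.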

The package is distributed as source code. The setup.py contains the information needed to build and install the package automatically. For customized installations, find more info in the section below (c Extension).

Getting Started

To get started, see the vignette Jupyter notebooks: https://www.dropbox.com/sh/eq4drs62tu6wuob/AABM41qAErmI2S3iypAV-j2da?dl=0

These are a resource that shows example use cases, which you can use as templates for your own applications.

These notebooks walk you through examples of

- how to use the core functions to call ROH from eigenstrat files, and generate ROH tables from the results of multiple individuals ('callROH_vignette')
- how to use functions for visualizing ROH results ('plotting_vignette'; warning: some of these are experimental and require additional packages. You might want to consider creating your own plotting functions to visualize the results in the way that works best for you)
- how to call IBD on the X chromosome between two male X chromosomes ('callIBD_maleX_vignette'; warning: experimental)

Scope of the Method

Standard parameters are tuned for human 1240K capture data (ca. 1.2 million SNPs), using 1000 Genomes haplotypes as the reference. The software worked for a wide range of test cases, both 1240K data and whole-genome sequencing data downsampled to 1240K. Test cases included the 45,000-year-old Ust Ishim man and a wide range of American, Eurasian and Oceanian ancient DNA, showing that the method generally works when the split time between reference panel and target is up to a few tens of thousands of years (attention: Neanderthals and Denisovans do not fall into that range).

In the first version, hapROH works on eigenstrat files (either packed or unpacked; the mode can be set). A future release will add functionality to use diploid genotype calls or genotype likelihoods from a .vcf file.

If you have whole-genome data available, you should first downsample it and create eigenstrat files for biallelic 1240K SNPs.
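As an illustration of the site-restriction part of that step only, here is a minimal sketch that keeps the rows of an unpacked (text) eigenstrat dataset falling on a given set of target positions with single-base, biallelic alleles. It assumes the usual six-column .snp layout (ID, chromosome, genetic position, physical position, reference allele, alternative allele) and one .geno line per SNP. The function name and the target set are hypothetical; this is not hapROH functionality, and it does not cover genotype calling from sequencing reads.

```python
def restrict_to_sites(snp_lines, geno_lines, target_sites):
    """Keep SNP/genotype rows located at target sites with biallelic base alleles.

    snp_lines:    lines of an unpacked .snp file
    geno_lines:   lines of the matching unpacked .geno file (same order)
    target_sites: set of (chromosome, position) tuples, e.g. {("1", 752566)}
    """
    bases = {"A", "C", "G", "T"}
    kept_snp, kept_geno = [], []
    for snp, geno in zip(snp_lines, geno_lines):
        fields = snp.split()
        chrom, pos = fields[1], int(fields[3])
        ref, alt = fields[4], fields[5]
        is_biallelic_snp = ref in bases and alt in bases and ref != alt
        if (chrom, pos) in target_sites and is_biallelic_snp:
            kept_snp.append(snp)
            kept_geno.append(geno)
    return kept_snp, kept_geno
```

Matching on chromosome and position rather than on SNP ID avoids problems when two datasets name the same site differently, provided both use the same genome build.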

If you plan to apply the method to other kinds of SNP sets, bigger SNP sets, or other organisms, the method parameters have to be updated (the default parameters are optimized for human 1240K data). You can mirror our procedure for finding good parameters (described in the publication). If you contact me for assistance, I am happy to help with my own experience.

Get reference Data

hapROH currently uses global 1000 Genomes data (n=5008 haplotypes), filtered down to bi-allelic 1240K SNPs. The reference panel is stored in .hdf5 format, which includes a genetic map.

You can download the prepared reference data (including a necessary metadata .csv) from:
https://www.dropbox.com/s/0qhjgo1npeih0bw/1000g1240khdf5.tar.gz?dl=0

and unpack it into a directory of your choice using

tar -xvf FILE.tar.gz

You then have to link the paths in the hapROH run parameters (see the vignette notebook).
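If you prefer to unpack from Python, the following sketch extracts the archive with the standard library and lists the .hdf5 and .csv files it finds, so that you can copy their paths into the run parameters. The function name is hypothetical, and the layout inside the archive is not assumed: the files are simply searched for by extension.

```python
import tarfile
from pathlib import Path


def unpack_reference(archive_path, out_dir):
    """Extract a .tar.gz archive and return the sorted .hdf5 and .csv paths found."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(out_dir, **kwargs)
    hdf5_files = sorted(str(p) for p in out_dir.rglob("*.hdf5"))
    csv_files = sorted(str(p) for p in out_dir.rglob("*.csv"))
    return hdf5_files, csv_files
```

Calling unpack_reference("FILE.tar.gz", "./reference") returns two lists of paths, one for the reference panel files and one for the metadata.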

Example Use

Example notebooks that walk through a typical application to an eigenstrat file are available at https://www.dropbox.com/sh/eq4drs62tu6wuob/AABM41qAErmI2S3iypAV-j2da?dl=0

All you need is an eigenstrat file and the reference data (see link above), and you are ready to run your own ROH calling!
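Before starting a run, it can help to confirm that the eigenstrat dataset is complete and to list the individuals it contains. The sketch below assumes the common naming convention of a shared prefix with .geno, .snp and .ind suffixes, and a whitespace-delimited .ind file whose first column is the sample ID. All function names are hypothetical helpers, not hapROH functions.

```python
from pathlib import Path

EIGENSTRAT_SUFFIXES = ("geno", "snp", "ind")


def eigenstrat_paths(prefix):
    """Map each eigenstrat suffix to its expected file path for a given prefix."""
    return {ext: Path(f"{prefix}.{ext}") for ext in EIGENSTRAT_SUFFIXES}


def missing_files(prefix):
    """Return the suffixes whose files do not exist on disk."""
    return [ext for ext, path in eigenstrat_paths(prefix).items() if not path.exists()]


def read_individual_ids(ind_path):
    """Read sample IDs (first column) from an .ind file, skipping blank lines."""
    ids = []
    with open(ind_path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                ids.append(fields[0])
    return ids
```

The .ind file is plain text in both packed and unpacked datasets, so reading the sample IDs works the same way in either mode.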

Dependencies

The basic requirements are kept minimal and cover only the core ROH calling. The extended analysis and plotting functionality needs extra Python packages, which you have to install yourself (e.g. via pip):

- Plotting requires matplotlib.
- Plotting maps requires basemap (warning: installing can be tricky on some architectures).
- Fitting effective population size from ROH output requires statsmodels.
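To see which of these optional features are usable in your environment, you can probe for the packages without importing them fully. This is a minimal sketch using only the standard library. The feature labels and function names are hypothetical, and mpl_toolkits.basemap is the import path under which basemap is normally exposed.

```python
import importlib.util

# Optional feature -> module that must be importable for it.
OPTIONAL_MODULES = {
    "plotting": "matplotlib",
    "map plotting": "mpl_toolkits.basemap",
    "population size fitting": "statsmodels",
}


def has_module(name):
    """Return True if the module can be located; tolerate a missing parent package."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised for dotted names when the parent package itself is absent.
        return False


def optional_status():
    """Return a dict mapping each optional feature to True/False availability."""
    return {feature: has_module(module) for feature, module in OPTIONAL_MODULES.items()}


if __name__ == "__main__":
    for feature, available in optional_status().items():
        print(f"{feature}: {'available' if available else 'missing'}")
```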

c Extension

For performance reasons, the heavy lifting of the algorithm is implemented in C, in cfunc.c. This "extension" is built via cython from cfunc.pyx.

The PyPI package is distributed as source. This means that the C extension has to be built on installation. Ideally, this is done automatically via the package cython (CYTHON=True in setup.py by default).

You can also set CYTHON=False; the extension is then compiled from cfunc.c directly (experimental, not tested on all platforms).
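A switch of this kind follows a common setup.py pattern, sketched below. This is an illustration and not the package's actual setup.py: the source directory hapsburg/, the extension name hapsburg.cfunc and both helper names are assumptions. Only the file names cfunc.pyx and cfunc.c and the CYTHON flag come from the description above.

```python
CYTHON = True  # flag name taken from the description; everything below is a sketch


def extension_sources(use_cython):
    """Pick the .pyx source when building via cython, else the pre-generated .c file."""
    suffix = ".pyx" if use_cython else ".c"
    return ["hapsburg/cfunc" + suffix]  # assumed location of the sources


def make_extensions(use_cython=CYTHON):
    """Build the list of extension modules to hand to setuptools.setup(ext_modules=...)."""
    from setuptools import Extension  # imported lazily; needed only at build time

    extensions = [Extension("hapsburg.cfunc", extension_sources(use_cython))]
    if use_cython:
        # cythonize translates the .pyx file to C before compilation.
        from Cython.Build import cythonize

        extensions = cythonize(extensions)
    return extensions
```

With the flag off, cython is not needed at install time, but a C compiler still is, because cfunc.c has to be compiled either way.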

Citation

If you use the software for a scientific publication and want to cite it, you can use: https://www.biorxiv.org/content/10.1101/2020.05.31.126912v1

Contact

If you have any bug reports or comments, please always feel free to contact me: harald_ringbauer AT hms harvard edu (fill in blanks with dots)

Bug reports and user experiences will help me to improve this software - so please do not hold back!

Author: Harald Ringbauer, 2020

Release history

0.64

Apr 14, 2023

0.63

Jan 26, 2023

0.62

Nov 19, 2022

0.61

Nov 18, 2022

0.60

Aug 30, 2022

0.53

Jul 1, 2022

0.52

Jun 29, 2022

0.51a0 pre-release

Jun 8, 2022

0.5a0 pre-release

Mar 14, 2022

0.4a1 pre-release

Dec 16, 2021

0.3a4 pre-release

Jun 14, 2021

0.3a3 pre-release

Apr 26, 2021

0.3a2 pre-release

Mar 19, 2021

0.3a1 pre-release

Dec 14, 2020

0.2a3 pre-release

Nov 20, 2020

0.2a2 pre-release

Nov 19, 2020

0.2a1 pre-release

Oct 20, 2020

0.1a9 pre-release

Oct 5, 2020

0.1a8 pre-release

Sep 10, 2020

0.1a7 pre-release

Aug 24, 2020

0.1a6 pre-release

Jul 23, 2020

This version

0.1a5 pre-release

Jul 22, 2020

0.1a4 pre-release

Jun 28, 2020

0.1a3 pre-release

Jun 28, 2020

0.1a2 pre-release

Jun 1, 2020

0.1a1 pre-release

May 31, 2020

Download files

Source Distribution

hapROH-0.1a5.tar.gz (1.2 MB)

Uploaded Jul 22, 2020 Source

File details

Details for the file hapROH-0.1a5.tar.gz.

File metadata

Download URL: hapROH-0.1a5.tar.gz
Upload date: Jul 22, 2020
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.0

File hashes

Hashes for hapROH-0.1a5.tar.gz
Algorithm	Hash digest
SHA256	`680b307a99ae3634cecb606dc9ac84602b640d6fa14efff04193f5c5db1cad05`
MD5	`549cad85e5b6d45eb4eb8fa20b06cef0`
BLAKE2b-256	`c0d74e7090b9c596dd361b9c5d6ab8452c7a4f72e5c97dd032c703bde26b70ef`
