Skip to main content

Resource for working with microhaploptype data from the ALFRED database (https://alfred.med.yale.edu).

Project description

MicroHapDB

Daniel Standage, 2018
https://github.com/bioforensics/microhapdb

MicroHapDB is a package designed for scientists and researchers interested in microhaplotype analysis. This package is a distribution and convenience mechanism and does not implement any analytics itself. All microhap data was obtained from the Allele Frequency Database (ALFRED)[1] at the Yale University School of Medicine. We are not affiliated with the provisioners of this data, but we are enthusiatic about the utility of this data type in a variety of bioforensic applications!

Installation

To install:

pip3 install microhapdb

To make sure the package installed correctly:

pip3 install pytest
py.test --pyargs microhapdb

MicroHapDB requires Python version 3.

Usage

MicroHapDB provides several convenient methods to access microhaplotype data.

  • a command-line interface
  • a Python API
  • a collection of tab-delimited text files

Command-line interface

The following commands are available on the command line.

  • microhapdb locus
  • microhapdb variant
  • microhapdb allele
  • microhapdb population

Invoke microhapdb <cmd> -h to see usage instructions and examples for each command.

Python API

Programmatic access to microhap data within Python is as simple as invoking import microhapdb and querying the following tables.

  • microhapdb.allelefreqs
  • microhapdb.loci
  • microhapdb.populations
  • microhapdb.variants

Each is a Pandas[2] dataframe object, supporting convenient and efficient listing, subsetting, and query capabilities. The following example demonstrates how data across the different tables can be cross-referenced.

>>> import microhapdb
>>> microhapdb.loci.query('ID == "SI664558I"')
            ID        Name  Chrom      Start        End                        Variants
101  SI664558I  mh02KK-136      2  227227672  227227743  rs6714835,rs6756898,rs12617010
>>> microhapdb.populations.query('Name.str.contains("Amer")')
           ID                Name  NumChrom
41  SA000101C   African Americans       168
59  SA004047P   African Americans       122
83  SA004250L  European Americans       198
>>> microhapdb.allelefreqs.query('Locus == "SI664558I" and Population.isin(["SA000101C", "SA004047P", "SA004250L"])')
           Locus Population Allele  Frequency
38320  SI664558I  SA000101C  G,T,C      0.172
38321  SI664558I  SA000101C  G,T,A      0.103
38322  SI664558I  SA000101C  G,C,C      0.029
38323  SI664558I  SA000101C  G,C,A      0.000
38324  SI664558I  SA000101C  T,T,C      0.293
38325  SI664558I  SA000101C  T,T,A      0.063
38326  SI664558I  SA000101C  T,C,C      0.132
38327  SI664558I  SA000101C  T,C,A      0.207
38344  SI664558I  SA004047P  G,T,C      0.156
38345  SI664558I  SA004047P  G,T,A      0.148
38346  SI664558I  SA004047P  G,C,C      0.016
38347  SI664558I  SA004047P  G,C,A      0.000
38348  SI664558I  SA004047P  T,T,C      0.336
38349  SI664558I  SA004047P  T,T,A      0.049
38350  SI664558I  SA004047P  T,C,C      0.156
38351  SI664558I  SA004047P  T,C,A      0.139
38488  SI664558I  SA004250L  G,T,C      0.384
38489  SI664558I  SA004250L  G,T,A      0.202
38490  SI664558I  SA004250L  G,C,C      0.000
38491  SI664558I  SA004250L  G,C,A      0.000
38492  SI664558I  SA004250L  T,T,C      0.197
38493  SI664558I  SA004250L  T,T,A      0.000
38494  SI664558I  SA004250L  T,C,C      0.071
38495  SI664558I  SA004250L  T,C,A      0.146

See the Pandas documentation for more details on dataframe access and query methods.

Tab-delimited text files

The data behind MicroHapDB is contained in 4 tab-delimited text files. If you'd prefer not to use MicroHapDB's command-line interface or Python API, it should be trivial load these files directly into R, Julia, or the data science environment of your choice. Invoke microhapdb files on the command line to see the location of the installed .tsv files.

  • locus.tsv: microhaplotype loci
  • variant.tsv: variants associated with each microhap locus
  • allele.tsv: allele frequencies for 148 loci across 84 populations
  • population.tsv: summary of the populations studied

Citation

If you use this package, please cite our work.

Standage DS (2018) MicroHapDB: programmatic access to published microhaplotype data. GitHub repository, https://github.com/bioforensics/microhapdb.


[1]Rajeevan H, Soundararajan U, Kidd JR, Pakstis AJ, Kidd KK (2012) ALFRED: an allele frequency resource for research and teaching. Nucleic Acids Research, 40(D1): D1010-D1015. doi:10.1093/nar/gkr924.

[2]McKinney W (2010) Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 51-56.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

microhapdb-0.1.0.tar.gz (24.6 kB view details)

Uploaded Source

File details

Details for the file microhapdb-0.1.0.tar.gz.

File metadata

  • Download URL: microhapdb-0.1.0.tar.gz
  • Upload date:
  • Size: 24.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.5.1

File hashes

Hashes for microhapdb-0.1.0.tar.gz
Algorithm Hash digest
SHA256 aaf41f6fd9078d5d229bb9e5de920842d2e12432a261755a04fc55cc1dff023f
MD5 9efa5c76e0f715fd7084be1f97e76a1a
BLAKE2b-256 eeb990172ce0bd8dc37c5526c0674e88001a1c8dbc79067b860e4ac6b306ecf3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page