
Haplotype reconstruction from unphased diplotypes


HaploPy – Haplotype estimation and phasing in Python

This package contains tools for estimating haplotype (or allele list) frequencies in a population using measurements of unphased genotype data, that is, phenotypes.

Introduction

In layman's terms, a phenotype is an observation of two-allele sets over multiple gene loci:

    Aa ––––––– Bb ––––––– Cc
    |          |          |
  locus 1    locus 2    locus 3

Note that the observation above does not reveal which exact haplotype (allele sequence) pair lies behind the phenotype. Ignoring the order within each pair, the parent haplotype pairs that could produce this phenotype are exactly these four:

(ABC, abc), (aBC, Abc), (AbC, aBc), (abC, ABc)
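A small sketch of where these pairs come from: at every locus, choose which of the two observed alleles goes to the first haplotype; the other allele goes to the second. The helper compatible_pairs below is hypothetical (not part of HaploPy) and assumes a fully observed phenotype written as two-character strings.

```python
from itertools import product


def compatible_pairs(phenotype):
    """Enumerate unordered haplotype pairs consistent with a phenotype.

    Hypothetical helper for illustration; `phenotype` is a tuple of
    two-character genotypes such as ("Aa", "Bb", "Cc").
    """
    pairs = set()
    # For each locus, pick the index (0 or 1) of the allele that goes
    # to the first haplotype; the remaining allele goes to the second.
    for choice in product((0, 1), repeat=len(phenotype)):
        h1 = tuple(g[c] for g, c in zip(phenotype, choice))
        h2 = tuple(g[1 - c] for g, c in zip(phenotype, choice))
        # Sort within the pair so (h1, h2) and (h2, h1) are counted once
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


for h1, h2 in compatible_pairs(("Aa", "Bb", "Cc")):
    print("".join(h1), "".join(h2))
# ABC abc
# ABc abC
# AbC aBc
# Abc aBC
```

Each heterozygous locus doubles the number of ordered choices, and swapping the two haplotypes halves it again, which is why three heterozygous loci give four distinct pairs.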

In other words, a haplotype pair maps to a phenotype as in the example

(Abc, aBC) => (Aa, Bb, Cc)

Each item in the phenotype is a set of two alleles, so the order of the alleles within it does not matter.
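The mapping itself fits in a few lines of Python. to_phenotype is a hypothetical helper, not HaploPy's API; sorting the two alleles at each locus is simply one way to make their order irrelevant (uppercase letters sort before lowercase ones in ASCII).

```python
def to_phenotype(h1, h2):
    """Map a haplotype pair to its unphased phenotype (hypothetical helper)."""
    # Pair up the alleles locus by locus and sort each pair into a canonical string
    return tuple("".join(sorted(pair)) for pair in zip(h1, h2))


print(to_phenotype(("A", "b", "c"), ("a", "B", "C")))  # ('Aa', 'Bb', 'Cc')
# Swapping the two haplotypes gives the same phenotype
print(to_phenotype(("a", "B", "C"), ("A", "b", "c")))  # ('Aa', 'Bb', 'Cc')
```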

Problem: Suppose that we have a set of phenotype observations from a large population of N individuals. For each individual phenotype, we would like to estimate the most probable haplotype pair that produced it. The main ingredient of the solution is estimating the frequencies of the individual haplotypes in the population.
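A standard way to estimate haplotype frequencies from unphased data is the expectation-maximization (EM) algorithm. The sketch below is a plain-Python illustration of that technique, assuming Hardy-Weinberg equilibrium (each individual's two haplotypes are drawn independently) and fully observed phenotypes. It is not HaploPy's implementation, and all names in it are hypothetical.

```python
from collections import Counter
from itertools import product


def compatible_pairs(phenotype):
    """Unordered haplotype pairs consistent with a fully observed phenotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(phenotype)):
        h1 = tuple(g[c] for g, c in zip(phenotype, choice))
        h2 = tuple(g[1 - c] for g, c in zip(phenotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_frequencies(phenotypes, n_iter=100):
    """Estimate haplotype frequencies by EM (illustrative sketch).

    A diplotype (h1, h2) has prior weight p[h1] * p[h2], doubled when
    h1 != h2 because it can arise in two orders.
    """
    counts = Counter(phenotypes)
    n = sum(counts.values())
    candidates = {ph: compatible_pairs(ph) for ph in counts}
    haplotypes = {h for pairs in candidates.values() for pair in pairs for h in pair}
    # Start from uniform frequencies over every candidate haplotype
    p = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        expected = dict.fromkeys(haplotypes, 0.0)
        for ph, m in counts.items():
            # E-step: posterior weight of each diplotype given the phenotype
            weights = [(h1, h2, p[h1] * p[h2] * (2 if h1 != h2 else 1))
                       for h1, h2 in candidates[ph]]
            total = sum(w for _, _, w in weights)
            for h1, h2, w in weights:
                share = m * w / total
                expected[h1] += share
                expected[h2] += share
        # M-step: every individual carries two haplotype copies
        p = {h: c / (2 * n) for h, c in expected.items()}
    return p


phenotypes = 10 * [("AA", "BB")] + 10 * [("aa", "bb")] + [("Aa", "Bb")]
p = em_haplotype_frequencies(phenotypes)
print(round(p[("A", "B")], 3))  # 0.5
```

In this toy sample the single ambiguous ("Aa", "Bb") individual ends up attributed to (AB, ab), because those haplotypes are common among the unambiguous individuals.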

Installation

The package is available on PyPI:

pip install haplopy

Alternatively, install the development version manually (for example, inside a Conda environment):

git clone https://github.com/malmgrek/haplopy.git
cd haplopy
pip install -r requirements
pip install -e .

To check that the development version is installed correctly, run the tests with

pytest -v

Examples

Estimate haplotype frequencies

Simulate a dataset using prescribed haplotype probabilities and a multinomial distribution model.

import haplopy as hp


proba_haplotypes = {
    ("A", "B", "C"): 0.34,
    ("a", "B", "c"): 0.20,
    ("a", "B", "C"): 0.13,
    ("a", "b", "c"): 0.23,
    ("A", "b", "C"): 0.10
}

phenotypes = hp.multinomial.Model(proba_haplotypes).random(100)

fig = hp.plot.plot_haplotypes(proba_haplotypes)
fig = hp.plot.plot_phenotypes(phenotypes)

Original relative haplotype frequencies

Simulated phenotype observation set
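Conceptually, the simulation step can be sketched as follows, assuming the multinomial model draws each individual's two haplotypes independently according to proba_haplotypes. simulate_phenotypes and to_phenotype are hypothetical helpers built on the standard library only; HaploPy's own Model.random may represent its output differently.

```python
import random


def to_phenotype(h1, h2):
    # Sort the two alleles at each locus: their order within a genotype is irrelevant
    return tuple("".join(sorted(pair)) for pair in zip(h1, h2))


def simulate_phenotypes(proba_haplotypes, n, seed=None):
    """Draw n phenotypes, each from two independently sampled haplotypes."""
    rng = random.Random(seed)
    haplotypes = list(proba_haplotypes)
    weights = list(proba_haplotypes.values())
    phenotypes = []
    for _ in range(n):
        # Two draws with replacement from the haplotype distribution
        h1, h2 = rng.choices(haplotypes, weights=weights, k=2)
        phenotypes.append(to_phenotype(h1, h2))
    return phenotypes


proba_haplotypes = {
    ("A", "B", "C"): 0.34,
    ("a", "B", "c"): 0.20,
    ("a", "B", "C"): 0.13,
    ("a", "b", "c"): 0.23,
    ("A", "b", "C"): 0.10,
}
phenotypes = simulate_phenotypes(proba_haplotypes, 100, seed=0)
print(phenotypes[:3])
```

Only the phenotypes are kept; the phase information of each sampled pair is discarded, which is exactly what makes the estimation problem non-trivial.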

Now pretend that we don't know the underlying haplotype distribution, and try to estimate it from the simulated phenotypes.

model = hp.multinomial.Model().fit(phenotypes)
fig = hp.plot.plot_haplotypes(
    model.proba_haplotypes,
    thres=1.0e-6  # Hide probabilities smaller than this
)

Estimated relative haplotype frequencies
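To put a single number on how close the estimate is to the truth, one option is the total variation distance between the two distributions. The helper below is hypothetical; calling it as total_variation(proba_haplotypes, model.proba_haplotypes) assumes that model.proba_haplotypes is a dict keyed by haplotype tuples, like the input to hp.multinomial.Model.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts.

    Keys missing from one dict are treated as having probability zero.
    The result ranges from 0 (identical) to 1 (disjoint supports).
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


truth = {("A", "B"): 0.6, ("a", "b"): 0.4}
estimate = {("A", "B"): 0.5, ("a", "b"): 0.4, ("A", "b"): 0.1}
print(round(total_variation(truth, estimate), 3))  # 0.1
```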

Phenotype phasing

Use an existing model to calculate the probabilities of the different diplotype representations of a given phenotype, conditional on that phenotype.

import haplopy as hp


model = hp.multinomial.Model({
    ("A", "B"): 0.4,
    ("A", "b"): 0.3,
    ("a", "B"): 0.2,
    ("a", "b"): 0.1
})

# A complete phenotype observation
model.calculate_proba_diplotypes(("Aa", "Bb"))
# {(('A', 'B'), ('a', 'b')): 0.4, (('A', 'b'), ('a', 'B')): 0.6}

# A phenotype with some missing SNPs
model.calculate_proba_diplotypes(("A.", ".."))
# {(('A', 'B'), ('A', 'B')): 0.17582417582417584,
#  (('A', 'B'), ('A', 'b')): 0.2637362637362637,
#  (('A', 'B'), ('a', 'B')): 0.17582417582417584,
#  (('A', 'B'), ('a', 'b')): 0.08791208791208792,
#  (('A', 'b'), ('A', 'b')): 0.09890109890109888,
#  (('A', 'b'), ('a', 'B')): 0.13186813186813184,
#  (('A', 'b'), ('a', 'b')): 0.06593406593406592}
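The numbers above are consistent with a simple rule: under Hardy-Weinberg equilibrium a diplotype (h1, h2) has weight p(h1)·p(h2), doubled when h1 ≠ h2, and the weights are normalized over all diplotypes compatible with the phenotype. The plain-Python sketch below applies that rule, treating "." as a missing allele. It reproduces the outputs shown, but it is an illustration with hypothetical names, not HaploPy's source.

```python
from itertools import combinations_with_replacement


def matches(genotype, allele1, allele2):
    """Check an observed genotype such as "Aa", "A." or ".." against two alleles."""
    remaining = [allele1, allele2]
    # Each observed (non-missing) allele must be matched by a distinct allele
    for a in genotype:
        if a == ".":
            continue
        if a in remaining:
            remaining.remove(a)
        else:
            return False
    return True


def proba_diplotypes(proba_haplotypes, phenotype):
    """P(diplotype | phenotype) under Hardy-Weinberg equilibrium (sketch)."""
    weights = {}
    for h1, h2 in combinations_with_replacement(sorted(proba_haplotypes), 2):
        if all(matches(g, a, b) for g, a, b in zip(phenotype, h1, h2)):
            # A heterozygous diplotype can arise in two orders, hence the factor 2
            factor = 1 if h1 == h2 else 2
            weights[(h1, h2)] = factor * proba_haplotypes[h1] * proba_haplotypes[h2]
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


model = {("A", "B"): 0.4, ("A", "b"): 0.3, ("a", "B"): 0.2, ("a", "b"): 0.1}
result = proba_diplotypes(model, ("Aa", "Bb"))
print({d: round(p, 4) for d, p in result.items()})
# {(('A', 'B'), ('a', 'b')): 0.4, (('A', 'b'), ('a', 'B')): 0.6}
```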

In particular, phenotype phasing also makes it possible to compute the probabilities of the different admissible complete phenotypes, as well as to impute missing data:

model.calculate_proba_phenotypes(("A.", ".."))
# {('AA', 'BB'): 0.17582417582417584,
#  ('AA', 'Bb'): 0.2637362637362637,
#  ('Aa', 'BB'): 0.17582417582417584,
#  ('Aa', 'Bb'): 0.21978021978021978,
#  ('AA', 'bb'): 0.09890109890109888,
#  ('Aa', 'bb'): 0.06593406593406592}

# Imputes with the most probable one
model.impute(("A.", ".."))
# ("AA", "Bb")
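Aggregation amounts to summing the probabilities of the diplotypes that collapse to the same complete phenotype, and imputation then picks the phenotype with the largest sum. The sketch below takes the diplotype probabilities printed earlier for ("A.", "..") as input; to_phenotype, proba_phenotypes and impute here are hypothetical stand-ins written for illustration, not the library's methods.

```python
from collections import defaultdict


def to_phenotype(h1, h2):
    # Canonical genotype per locus: sort the two alleles
    return tuple("".join(sorted(pair)) for pair in zip(h1, h2))


def proba_phenotypes(proba_diplotypes):
    """Sum the probabilities of diplotypes that map to the same phenotype."""
    result = defaultdict(float)
    for (h1, h2), p in proba_diplotypes.items():
        result[to_phenotype(h1, h2)] += p
    return dict(result)


def impute(proba_diplotypes):
    """Return the most probable complete phenotype."""
    probs = proba_phenotypes(proba_diplotypes)
    return max(probs, key=probs.get)


# Diplotype probabilities for ("A.", "..") as printed in the phasing example
diplotypes = {
    (("A", "B"), ("A", "B")): 0.17582417582417584,
    (("A", "B"), ("A", "b")): 0.2637362637362637,
    (("A", "B"), ("a", "B")): 0.17582417582417584,
    (("A", "B"), ("a", "b")): 0.08791208791208792,
    (("A", "b"), ("A", "b")): 0.09890109890109888,
    (("A", "b"), ("a", "B")): 0.13186813186813184,
    (("A", "b"), ("a", "b")): 0.06593406593406592,
}
print(impute(diplotypes))  # ('AA', 'Bb')
```

Note that ("Aa", "Bb") is the only phenotype here with two diplotype representations, (AB, ab) and (Ab, aB), so its probability is the sum of both.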

Download files


Source Distribution

haplopy-0.1.3.tar.gz (11.2 kB)

Built Distribution

haplopy-0.1.3-py3-none-any.whl (10.5 kB)

File details

Details for the file haplopy-0.1.3.tar.gz.

File metadata

  • Download URL: haplopy-0.1.3.tar.gz
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for haplopy-0.1.3.tar.gz
Algorithm     Hash digest
SHA256        a5260770185ee8c1e01cfc24195ae67aaf494e87a03825f05742bc715898137a
MD5           6daeceb285d19c2dd2bb1bd52b83922f
BLAKE2b-256   802b271dff0a2e53d959e8a92b2ad7be411c2b3fbb51d814c1994b2f5dcea61c


File details

Details for the file haplopy-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: haplopy-0.1.3-py3-none-any.whl
  • Size: 10.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for haplopy-0.1.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        5744b52b84d5ffb8b9350964f0bff1e8a941ff165b67f42074005d18ed9c5657
MD5           dd10dd900e73a91248c1ebb2b61477f4
BLAKE2b-256   8a87ec76ce105e93b04e61be0623f569fcc2b3a8b19b627bb162fd0939c3ef8f

