
Geographic Ancestry Inference Algorithm - Python implementation


gaiapy: Geographic Ancestry Inference Algorithm (Python)

gaiapy is a Python port of the GAIA R package for inferring the geographic locations of genetic ancestors using tree sequences. It implements three approaches to ancestral location reconstruction:

  1. Discrete parsimony - for ancestors restricted to a finite set of locations
  2. Squared change parsimony - for ancestors in continuous space, minimizing squared distances
  3. Linear parsimony - for ancestors in continuous space, minimizing absolute distances
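
To make the first approach concrete, here is a minimal sketch of textbook Sankoff parsimony on a single toy tree. It is not gaiapy's implementation, which operates on whole tree sequences. The tree, the `sankoff_costs` helper, and the dictionary layout are all hypothetical and chosen only for illustration.

```python
import math

def sankoff_costs(children, leaf_state, cost, root):
    """Return {node: [minimum subtree cost if node is in state s, ...]}."""
    num_states = len(cost)
    table = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            # Sampled leaf: zero cost in its observed state, infinite otherwise
            table[node] = [0.0 if s == leaf_state[node] else math.inf
                           for s in range(num_states)]
            return
        for kid in kids:
            visit(kid)
        # For each candidate state s, add the cheapest transition into each child
        table[node] = [
            sum(min(cost[s][t] + table[kid][t] for t in range(num_states))
                for kid in kids)
            for s in range(num_states)
        ]

    visit(root)
    return table

# Hypothetical tree: node 3 is the parent of leaves 0 and 1; root 4 joins 3 and leaf 2
children = {4: [3, 2], 3: [0, 1]}
leaf_state = {0: 0, 1: 0, 2: 1}
cost = [[0, 1], [1, 0]]  # unit cost to move between the two states

table = sankoff_costs(children, leaf_state, cost, root=4)
print(table[3], table[4])
```

Each entry of `table[node]` is the cheapest total migration cost below that node given the node's state, so the smallest entry at the root is the parsimony score of the whole tree.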

The package calls the Python tskit API directly, so it needs no C wrappers, which keeps the implementation easier to read and maintain.

Installation

Install from source:

git clone <repository-url>
cd gaiapy
pip install .

For development:

pip install -e ".[dev]"

Quick Start

Working with Discrete Locations

import gaiapy
import tskit
import numpy as np

# Load your tree sequence
ts = tskit.load("path/to/treesequence.trees")

# Define sample locations - each sample must be assigned to a discrete state
# node_id: Tree sequence node IDs (0-based)
# state_id: Location state IDs (0-based in Python, unlike R version)
samples = np.array([
    [0, 0],  # node 0 -> state 0
    [1, 0],  # node 1 -> state 0
    [2, 1],  # node 2 -> state 1
    # ... more samples
])

# Create cost matrix for migrations between states
# Must be symmetric with non-negative values
# Entry [i,j] = cost of migrating from state i to state j
num_states = 2
costs = np.ones((num_states, num_states))  # Default cost of 1 between states
np.fill_diagonal(costs, 0)  # No cost to stay in same state

# Compute migration costs
mpr = gaiapy.discrete_mpr(ts, samples, costs)

# Get optimal state assignments for ancestors
states = gaiapy.discrete_mpr_minimize(mpr)

# Get detailed migration histories (optional)
history = gaiapy.discrete_mpr_edge_history(ts, mpr, costs)
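
The comments above state that the cost matrix must be symmetric with non-negative values. A small pre-flight check along these lines can catch mistakes before calling the package. `check_cost_matrix` is a hypothetical helper, not part of gaiapy, and whether gaiapy performs its own validation is not assumed here.

```python
import numpy as np

def check_cost_matrix(costs, num_states):
    """Hypothetical helper: sanity-check a migration cost matrix."""
    costs = np.asarray(costs, dtype=float)
    if costs.shape != (num_states, num_states):
        raise ValueError(
            f"expected shape {(num_states, num_states)}, got {costs.shape}"
        )
    if np.any(costs < 0):
        raise ValueError("costs must be non-negative")
    if not np.allclose(costs, costs.T):
        raise ValueError("costs must be symmetric")
    return costs

# Same construction as in the quick start, here with three states
num_states = 3
costs = np.ones((num_states, num_states))
np.fill_diagonal(costs, 0)

checked = check_cost_matrix(costs, num_states)
```

The helper returns the matrix as a float array, so the result can be passed on in place of the original.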

Working with Continuous Space

# For ancestors in continuous space, provide sample coordinates
# (reuses gaiapy, np, and ts from the discrete example above)
# Columns: node_id, x, y
samples = np.array([
    [0, 1.5, 2.0],  # node 0 at coordinates (1.5, 2.0)
    [1, 4.2, 3.1],  # node 1 at coordinates (4.2, 3.1)
    [2, 6.7, 5.5],  # node 2 at coordinates (6.7, 5.5)
    # ... more samples
])

# Using squared distance (minimizes sum of squared Euclidean distances)
mpr_quad = gaiapy.quadratic_mpr(ts, samples)
locations_quad = gaiapy.quadratic_mpr_minimize(mpr_quad)

# Using absolute distance (minimizes sum of Manhattan distances)
mpr_lin = gaiapy.linear_mpr(ts, samples)
locations_lin = gaiapy.linear_mpr_minimize(mpr_lin)
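
The practical difference between the two criteria can be seen in one dimension: for a single ancestor with fixed descendants, the sum of squared distances is minimized at the mean, while the sum of absolute distances is minimized at the median. The sketch below is a toy illustration that ignores tree structure and branch lengths. It does not call gaiapy, and the coordinates are made up.

```python
from statistics import mean, median

def squared_cost(x, points):
    """Sum of squared distances from x to each point."""
    return sum((x - p) ** 2 for p in points)

def absolute_cost(x, points):
    """Sum of absolute distances from x to each point."""
    return sum(abs(x - p) for p in points)

# Hypothetical 1-D coordinates of three descendants (one is an outlier)
descendants = [0.0, 1.0, 10.0]

best_squared = mean(descendants)     # minimizer of squared_cost
best_absolute = median(descendants)  # a minimizer of absolute_cost

print(best_squared, squared_cost(best_squared, descendants))
print(best_absolute, absolute_cost(best_absolute, descendants))
```

The outlier pulls the squared-distance estimate toward it, whereas the absolute-distance estimate stays at the middle point, which is one reason to compare both reconstructions on the same data.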

Key Functions

  • discrete_mpr() - Discrete state reconstruction
  • quadratic_mpr() - Continuous space reconstruction using squared distances
  • linear_mpr() - Continuous space reconstruction using absolute distances
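  • discrete_mpr_minimize() - Find optimal discrete state assignments
  • quadratic_mpr_minimize() - Find optimal ancestor locations under squared distances
  • linear_mpr_minimize() - Find optimal ancestor locations under absolute distances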
  • discrete_mpr_edge_history() - Detailed migration histories
  • discrete_mpr_ancestry() - Ancestry coefficients through time
  • discrete_mpr_ancestry_flux() - Migration flux between regions

Differences from R Version

  • Uses 0-based indexing throughout (consistent with Python/tskit conventions)
  • Returns NumPy arrays instead of R matrices/data frames
  • Leverages tskit Python API directly instead of C wrappers
  • More Pythonic API design and error handling
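
When porting a discrete sample table prepared for the R package, the indexing difference mainly affects the state column. The sketch below assumes the R-style table uses 1-based state IDs and 0-based tskit node IDs, as the quick-start comments suggest. The array contents are hypothetical.

```python
import numpy as np

# Hypothetical sample table in the R convention: columns are node_id, state_id,
# with state IDs starting at 1
samples_r = np.array([
    [0, 1],
    [1, 1],
    [2, 2],
])

# Shift only the state column; node IDs are assumed to be 0-based already
samples_py = samples_r.copy()
samples_py[:, 1] -= 1

print(samples_py)
```

Copying first leaves the original table untouched, which makes it easier to compare results against the R version.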

References

Grundler, M.C., Terhorst, J., and Bradburd, G.S. (2025) A geographic history of human genetic ancestry. Science 387(6741): 1391-1397. DOI: 10.1126/science.adp4642

License

MIT License (adapted from original CC-BY 4.0 International)
