Geographic Ancestry Inference Algorithm - Python implementation
gaiapy: Geographic Ancestry Inference Algorithm (Python)
gaiapy is a Python port of the GAIA R package for inferring the geographic locations of genetic ancestors using tree sequences. It implements three approaches to ancestral location reconstruction:
- Discrete parsimony - for ancestors restricted to a finite set of locations
- Squared change parsimony - for ancestors in continuous space, minimizing squared distances
- Linear parsimony - for ancestors in continuous space, minimizing absolute distances
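The difference between the two continuous criteria can be seen in a one-dimensional toy case. The sketch below is an illustration only, not gaiapy code, and its helper names are hypothetical. It assumes a single ancestor connected directly to three samples and ignores branch lengths: the squared-change cost is then minimized at the mean of the sample positions, and the absolute-change cost at the median.

```python
from statistics import mean, median

def squared_cost(x, points):
    """Sum of squared distances from candidate position x to each sample."""
    return sum((x - p) ** 2 for p in points)

def absolute_cost(x, points):
    """Sum of absolute distances from candidate position x to each sample."""
    return sum(abs(x - p) for p in points)

# Hypothetical 1-D sample positions; the outlier at 10.0 pulls the mean
# but not the median.
points = [1.0, 2.0, 10.0]

best_squared = mean(points)     # closed-form minimizer of squared_cost
best_absolute = median(points)  # closed-form minimizer of absolute_cost

# Brute-force check on a grid of candidate ancestor positions (step 0.01).
grid = [i / 100 for i in range(0, 1101)]
grid_squared = min(grid, key=lambda x: squared_cost(x, points))
grid_absolute = min(grid, key=lambda x: absolute_cost(x, points))

print(best_squared, grid_squared)
print(best_absolute, grid_absolute)
```

Both squared Euclidean and Manhattan distances are sums over coordinates, so the same reasoning applies to each coordinate separately in two dimensions. The absolute-change criterion is less sensitive to a single distant sample.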
gaiapy calls the tskit Python API directly, so it needs no C wrappers, which keeps the implementation easier to read and maintain.
Installation
Install from source:
git clone <repository-url>
cd gaiapy
pip install .
For development:
pip install -e ".[dev]"
Quick Start
Working with Discrete Locations
import gaiapy
import tskit
import numpy as np
# Load your tree sequence
ts = tskit.load("path/to/treesequence.trees")
# Define sample locations - each sample must be assigned to a discrete state
# node_id: Tree sequence node IDs (0-based)
# state_id: Location state IDs (0-based in Python, unlike R version)
samples = np.array([
    [0, 0],  # node 0 -> state 0
    [1, 0],  # node 1 -> state 0
    [2, 1],  # node 2 -> state 1
    # ... more samples
])
# Create cost matrix for migrations between states
# Must be symmetric with non-negative values
# Entry [i,j] = cost of migrating from state i to state j
num_states = 2
costs = np.ones((num_states, num_states)) # Default cost of 1 between states
np.fill_diagonal(costs, 0) # No cost to stay in same state
# Compute migration costs
mpr = gaiapy.discrete_mpr(ts, samples, costs)
# Get optimal state assignments for ancestors
states = gaiapy.discrete_mpr_minimize(mpr)
# Get detailed migration histories (optional)
history = gaiapy.discrete_mpr_edge_history(ts, mpr, costs)
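For intuition about what a discrete parsimony reconstruction computes, here is a minimal Sankoff-style sketch on a single hand-written tree. It is an illustration under stated assumptions, not gaiapy's implementation: the `children` dictionary, the node numbering, and the function name `node_costs` are hypothetical, and the sketch ignores branch lengths and the multiple local trees of a real tree sequence.

```python
INF = float("inf")

# Hypothetical toy tree: internal node -> list of children.
# Nodes 0-3 are samples; node 6 is the root.
children = {4: [0, 1], 5: [2, 3], 6: [4, 5]}
root = 6

# Observed state of each sample node (0-based state IDs).
sample_state = {0: 0, 1: 0, 2: 0, 3: 1}

# costs[i][j] = cost of migrating from state i to state j.
costs = [[0, 1],
         [1, 0]]
num_states = len(costs)

def node_costs(node):
    """Return a list where entry s is the minimum total migration cost
    of the subtree below `node`, given that `node` is in state s."""
    if node in sample_state:
        # A sample is fixed to its observed state; other states are impossible.
        return [0 if s == sample_state[node] else INF for s in range(num_states)]
    total = [0] * num_states
    for child in children[node]:
        child_cost = node_costs(child)
        for s in range(num_states):
            # Cheapest way to reach any child state t from parent state s.
            total[s] += min(costs[s][t] + child_cost[t] for t in range(num_states))
    return total

root_costs = node_costs(root)
best_root_state = min(range(num_states), key=lambda s: root_costs[s])
print(root_costs, best_root_state)
```

In this toy tree, placing the root in state 0 needs one migration (on the branch leading to sample 3), while state 1 needs two. Ties between states are possible in general, in which case each tied state is equally parsimonious.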
Working with Continuous Space
# For ancestors in continuous space, provide sample coordinates
# (reuses gaiapy, np, and ts from the discrete example above)
# Columns: node_id, x, y
samples = np.array([
    [0, 1.5, 2.0],  # node 0 at coordinates (1.5, 2.0)
    [1, 4.2, 3.1],  # node 1 at coordinates (4.2, 3.1)
    [2, 6.7, 5.5],  # node 2 at coordinates (6.7, 5.5)
    # ... more samples
])
# Using squared distance (minimizes sum of squared Euclidean distances)
mpr_quad = gaiapy.quadratic_mpr(ts, samples)
locations_quad = gaiapy.quadratic_mpr_minimize(mpr_quad)
# Using absolute distance (minimizes sum of Manhattan distances)
mpr_lin = gaiapy.linear_mpr(ts, samples)
locations_lin = gaiapy.linear_mpr_minimize(mpr_lin)
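Sample coordinates often start out keyed by node ID rather than as rows. A small helper like the following could assemble the `[node_id, x, y]` rows shown above and catch malformed entries early. The name `build_sample_rows` is hypothetical and not part of gaiapy; this is a sketch, assuming two-dimensional coordinates.

```python
def build_sample_rows(coords_by_node):
    """Turn {node_id: (x, y)} into [node_id, x, y] rows sorted by node ID."""
    rows = []
    for node_id, xy in sorted(coords_by_node.items()):
        if node_id < 0:
            raise ValueError(f"node ID must be non-negative, got {node_id}")
        if len(xy) != 2:
            raise ValueError(f"node {node_id}: expected (x, y), got {xy!r}")
        x, y = xy
        rows.append([node_id, float(x), float(y)])
    return rows

# Hypothetical input: coordinates keyed by tree sequence node ID.
rows = build_sample_rows({1: (4.2, 3.1), 0: (1.5, 2.0), 2: (6.7, 5.5)})
print(rows)
```

The result can be wrapped with `np.array(rows)` to match the array layout used in the example above.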
Key Functions
- discrete_mpr() - Discrete state reconstruction
- quadratic_mpr() - Continuous space reconstruction using squared distances
- linear_mpr() - Continuous space reconstruction using absolute distances
- discrete_mpr_minimize() - Find optimal discrete state assignments
- discrete_mpr_edge_history() - Detailed migration histories
- discrete_mpr_ancestry() - Ancestry coefficients through time
- discrete_mpr_ancestry_flux() - Migration flux between regions
Differences from R Version
- Uses 0-based indexing throughout (consistent with Python/tskit conventions)
- Returns NumPy arrays instead of R matrices/data frames
- Leverages tskit Python API directly instead of C wrappers
- More Pythonic API design and error handling
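When reusing inputs prepared for the R package, state IDs need shifting from 1-based to 0-based. A minimal sketch follows, assuming the R-side table stores `(node_id, state_id)` pairs with node IDs already 0-based and state IDs 1-based; check how your own data was prepared before applying it. The helper `shift_states_to_zero_based` is hypothetical and not part of gaiapy.

```python
def shift_states_to_zero_based(r_samples):
    """Convert (node_id, state_id) pairs with 1-based state IDs to 0-based."""
    shifted = []
    for node_id, state_id in r_samples:
        if state_id < 1:
            # A value below 1 suggests the data is already 0-based.
            raise ValueError(
                f"node {node_id}: state ID {state_id} is not a valid 1-based ID"
            )
        shifted.append([node_id, state_id - 1])
    return shifted

# Hypothetical sample table as it might be written for the R package.
r_samples = [[0, 1], [1, 1], [2, 2]]
py_samples = shift_states_to_zero_based(r_samples)
print(py_samples)
```

Raising an error on a state ID of 0 guards against applying the shift twice to data that has already been converted.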
References
Grundler, M.C., Terhorst, J., and Bradburd, G.S. (2025) A geographic history of human genetic ancestry. Science 387(6741): 1391-1397. DOI: 10.1126/science.adp4642
License
MIT License (adapted from original CC-BY 4.0 International)