Geographic Ancestry Inference Algorithm - Python implementation (continuous space parsimony methods)
Project description
gaiapy: Geographic Ancestry Inference Algorithm (Python)
gaiapy is a Python port of the GAIA R package for inferring the geographic locations of genetic ancestors using tree sequences. This package implements generalized parsimony methods for ancestral location reconstruction in continuous geographic space.
Note: This implementation is under active development. Use the R implementation for all scientific applications.
Note: This package is distributed on PyPI as geoancestry but the module name is gaiapy.
Current Implementation Status
✅ Implemented and Ready to Use:
- Quadratic parsimony - for ancestors in continuous space, minimizing sum of squared Euclidean distances
- Linear parsimony - for ancestors in continuous space, minimizing sum of absolute (Manhattan) distances
- Full metadata integration and tree sequence augmentation
- Comprehensive validation and utility functions
🚧 Not Yet Implemented:
- Discrete parsimony - for ancestors restricted to finite location sets (coming soon)
- Ancestry coefficients - temporal analysis of ancestry proportions (coming soon)
- Migration flux - migration flow analysis between regions (coming soon)
This package leverages the Python tskit API directly, avoiding the need for C wrappers and making the implementation more accessible for Python users and web applications.
Installation
Install from PyPI:
pip install geoancestry
Install from source for development:
git clone https://github.com/chris-a-talbot/gaiapy
cd gaiapy
pip install -e ".[dev]"
Quick Start
Basic Continuous Space Reconstruction
import gaiapy as gp
import tskit
import numpy as np
# Load your tree sequence
ts = tskit.load("path/to/treesequence.trees")
# Define sample locations as [node_id, x_coord, y_coord]
# node_id: Tree sequence node IDs (0-based)
# x_coord, y_coord: Geographic coordinates (any coordinate system)
samples = np.array([
[0, 1.5, 2.0], # node 0 at coordinates (1.5, 2.0)
[1, 4.2, 3.1], # node 1 at coordinates (4.2, 3.1)
[2, 6.7, 5.5], # node 2 at coordinates (6.7, 5.5)
# ... more samples
])
# Quadratic reconstruction (minimizes sum of squared Euclidean distances)
mpr_quad = gp.quadratic_mpr(ts, samples)
locations_quad = gp.quadratic_mpr_minimize(mpr_quad)
# Linear reconstruction (minimizes sum of Manhattan distances)
mpr_lin = gp.linear_mpr(ts, samples)
locations_lin = gp.linear_mpr_minimize(mpr_lin)
print(f"Quadratic reconstruction shape: {locations_quad.shape}")
print(f"Linear reconstruction shape: {locations_lin.shape}")
Working with Tree Sequence Metadata
# If your tree sequence has location metadata, you can extract it automatically
sample_locs = gp.extract_sample_locations_from_metadata(ts)
# Or augment a tree sequence with location data
ts_with_locs = gp.augment_tree_sequence_with_locations(ts, samples)
# Use metadata-aware reconstruction
mpr_meta = gp.quadratic_mpr_with_metadata(ts_with_locs)
locations_meta = gp.quadratic_mpr_minimize(mpr_meta)
Advanced Options
# Discrete coordinate system reconstruction (useful for grid-based coordinates)
locations_discrete = gp.quadratic_mpr_minimize_discrete(mpr_quad)
# Alternative linear parsimony with discrete output
locations_lin_discrete = gp.linear_mpr_minimize_discrete(mpr_lin)
# Export results for further analysis
gp.export_locations_to_file(locations_quad, "ancestral_locations.tsv")
Key Functions (Currently Implemented)
Continuous Space Functions
quadratic_mpr()- Continuous space reconstruction using squared distanceslinear_mpr()- Continuous space reconstruction using absolute distancesquadratic_mpr_minimize()- Find optimal continuous locations (quadratic)linear_mpr_minimize()- Find optimal continuous locations (linear)quadratic_mpr_minimize_discrete()- Discrete coordinate optimization (quadratic)linear_mpr_minimize_discrete()- Discrete coordinate optimization (linear)
Metadata Integration
quadratic_mpr_with_metadata()- Metadata-aware quadratic reconstructionlinear_mpr_with_metadata()- Metadata-aware linear reconstructionextract_sample_locations_from_metadata()- Extract locations from tree sequence metadataaugment_tree_sequence_with_locations()- Add location data to tree sequencesvalidate_location_metadata()- Validate location data formatexport_locations_to_file()/import_locations_from_file()- I/O utilities
Input Data Format
Sample locations should be provided as a NumPy array with shape (n_samples, 3):
samples = np.array([
[node_id, x_coordinate, y_coordinate],
[node_id, x_coordinate, y_coordinate],
# ...
])
- node_id: Tree sequence node ID (0-based, integer)
- x_coordinate, y_coordinate: Geographic coordinates (float, any coordinate system)
Output Format
Reconstructed locations are returned as NumPy arrays with shape (n_nodes, 2) where:
- Row index corresponds to tree sequence node ID
- Column 0: x-coordinate of reconstructed location
- Column 1: y-coordinate of reconstructed location
Coming Soon
The following features from the original GAIA R package are planned for future releases:
- Discrete parsimony -
discrete_mpr(),discrete_mpr_minimize(),discrete_mpr_edge_history() - Ancestry analysis -
discrete_mpr_ancestry(),discrete_mpr_ancestry_flux()
References
Grundler, M.C., Terhorst, J., and Bradburd, G.S. (2025) A geographic history of human genetic ancestry. Science 387(6741): 1391-1397. DOI: 10.1126/science.adp4642
License
MIT License (adapted from original CC-BY 4.0 International)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file geoancestry-0.1.2.tar.gz.
File metadata
- Download URL: geoancestry-0.1.2.tar.gz
- Upload date:
- Size: 37.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
33cbae7e69daa9bad3bb2266136365729fd1ed7b5f97da7e42b601bcf1663371
|
|
| MD5 |
f84ba6c7d0ba922b8c379fb2fcbccea9
|
|
| BLAKE2b-256 |
73856e39c96797873d50cc75cbc5fb3912cd36f9d790de1819583f25014beee4
|
File details
Details for the file geoancestry-0.1.2-py3-none-any.whl.
File metadata
- Download URL: geoancestry-0.1.2-py3-none-any.whl
- Upload date:
- Size: 31.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
86ddad96767bdd09dd8bd8baa5cce7609f4b6d018d8ab39f4646b954354ffcb6
|
|
| MD5 |
2759129e377b3dac1018c977ca4e4eac
|
|
| BLAKE2b-256 |
86ed53fbc0c5cfe097a114640c7bb244f0f7d38ee24a4b87fa27390bb7216233
|