
Polyploid Haplotype Analysis for Sequenced Eukaryotic References - A comprehensive toolkit for haplotype analysis in complex genomes


🧬 Haplophaser

Haplotype analysis toolkit for complex genomes with full polyploid support.

Haplophaser analyzes haplotype inheritance patterns in derived lines, such as recombinant inbred lines (RILs), relative to their founder or source populations. It was designed from the ground up for polyploid genomes and handles ploidy levels from diploid through hexaploid and beyond.

Features

  • Haplotype Proportion Estimation: Calculate what fraction of a sample's genome derives from each founder population
  • Chromosome Painting: Paint genomic regions by haplotype origin using Hidden Markov Models
  • Chimeric Contig Detection: Identify potential misassemblies through haplotype switches
  • Linkage-Informed Scaffolding: Order and orient scaffolds using haplotype phase information
  • Full Polyploid Support: First-class support for diploid, autopolyploid, and allopolyploid genomes
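The proportion, painting, and chimeric-contig features above can be illustrated with a toy model. The sketch below is not Haplophaser's implementation: it assumes one inbred derived line genotyped at biallelic sites, one hidden HMM state per founder, a fixed per-site error rate for allele mismatches, and a fixed per-site probability of switching founders, and it decodes the most likely founder at each site with the Viterbi algorithm. `paint_viterbi`, `proportions`, and `switch_points` are hypothetical names.

```python
import math
from collections import Counter


def paint_viterbi(derived, founders, error=0.01, switch=0.001):
    """Return the most likely founder for each site of a derived sample.

    derived:  alleles (e.g. 0/1) of the derived sample at each site
    founders: founder name -> alleles at the same sites
    error:    assumed chance the derived allele differs from its true founder's
    switch:   assumed per-site chance of moving to a different founder
    """
    names = list(founders)
    k, n = len(names), len(derived)
    if n == 0:
        return []
    log_stay = math.log(1.0 - switch)
    log_move = math.log(switch / (k - 1)) if k > 1 else -math.inf
    log_match, log_miss = math.log(1.0 - error), math.log(error)

    def emit(s, i):
        # Emission: high if the derived allele matches founder s at site i
        return log_match if founders[names[s]][i] == derived[i] else log_miss

    def trans(p, s):
        # Transition: staying on the same founder is cheap, switching is costly
        return log_stay if p == s else log_move

    # score[s]: best log-probability of any path ending in state s so far
    score = [math.log(1.0 / k) + emit(s, 0) for s in range(k)]
    backptrs = []
    for i in range(1, n):
        ptr = [max(range(k), key=lambda p: score[p] + trans(p, s)) for s in range(k)]
        score = [score[ptr[s]] + trans(ptr[s], s) + emit(s, i) for s in range(k)]
        backptrs.append(ptr)

    # Trace back from the best final state
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return [names[s] for s in reversed(path)]


def proportions(path):
    """Fraction of sites assigned to each founder."""
    counts = Counter(path)
    return {name: c / len(path) for name, c in counts.items()}


def switch_points(path):
    """Site indices where the painted founder changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# Toy data: the derived line follows founder A, then B, with one miscalled site (index 1)
founders = {"A": [0, 0, 0, 0, 1, 1, 1, 1], "B": [1, 1, 1, 1, 0, 0, 0, 0]}
derived = [0, 1, 0, 0, 0, 0, 0, 0]
path = paint_viterbi(derived, founders)
print(path)                  # ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
print(proportions(path))     # {'A': 0.5, 'B': 0.5}
print(switch_points(path))   # [4]
```

The switch penalty is what lets the model ignore the isolated mismatch at site 1 instead of painting a one-site segment. Proportions here are fractions of sites, not base pairs; weighting each site by the physical distance it represents would give a genome fraction. A state change inside a single assembled contig is the kind of haplotype switch a chimeric-contig check looks for, although in a derived line it can equally be a genuine crossover.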

Installation

From PyPI

pip install haplophaser

Development Installation

# Clone the repository
git clone https://github.com/aseetharam/haplophaser.git
cd haplophaser

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode with dev dependencies
pip install -e ".[dev]"

Dependencies

Core dependencies:

  • Python 3.10+
  • NumPy
  • Pydantic v2
  • cyvcf2
  • PyYAML
  • Typer

Quick Start

Basic Usage

# Estimate haplotype proportions
haplophaser proportion variants.vcf.gz -p populations.tsv -o results/

# Paint chromosomes by haplotype origin
haplophaser paint variants.vcf.gz -p populations.tsv -o painted/

# Order scaffolds using linkage
haplophaser scaffold scaffolds.vcf.gz -p populations.tsv -g genetic_map.tsv

# Run quality control checks
haplophaser qc variants.vcf.gz -p populations.tsv

Population File Format

Haplophaser uses TSV or YAML files to define population structure:

TSV format (populations.tsv):

sample	population	role	ploidy
B73	NAM_founders	founder	2
Mo17	NAM_founders	founder	2
W22	NAM_founders	founder	2
RIL_001	NAM_RILs	derived	2
RIL_002	NAM_RILs	derived	2
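The TSV has a header line and one row per sample, so it can be read with Python's standard `csv` module. The reader below is a hypothetical helper (`read_populations_tsv`) that shows the structure; it is not Haplophaser's own loader, and it assumes the four columns shown and integer ploidy values.

```python
import csv
import io
from collections import defaultdict


def read_populations_tsv(handle):
    """Group the rows of a populations TSV (with header) by population name."""
    populations = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        populations[row["population"]].append(
            {"sample": row["sample"], "role": row["role"], "ploidy": int(row["ploidy"])}
        )
    return dict(populations)


# The populations.tsv example above, held in memory for illustration
example = (
    "sample\tpopulation\trole\tploidy\n"
    "B73\tNAM_founders\tfounder\t2\n"
    "Mo17\tNAM_founders\tfounder\t2\n"
    "W22\tNAM_founders\tfounder\t2\n"
    "RIL_001\tNAM_RILs\tderived\t2\n"
    "RIL_002\tNAM_RILs\tderived\t2\n"
)
pops = read_populations_tsv(io.StringIO(example))
print({name: len(members) for name, members in pops.items()})
# {'NAM_founders': 3, 'NAM_RILs': 2}
```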

YAML format (populations.yaml):

populations:
  - name: NAM_founders
    role: founder
    ploidy: 2
    samples:
      - B73
      - Mo17
      - W22

  - name: NAM_RILs
    role: derived
    ploidy: 2
    samples:
      - RIL_001
      - RIL_002
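The YAML layout holds the same information as the TSV, grouped by population instead of listed per sample. Assuming the file has been parsed into plain dicts and lists (for instance with PyYAML's `yaml.safe_load`), a hypothetical `flatten_populations` helper can turn it back into one record per sample, matching the TSV columns:

```python
def flatten_populations(config):
    """Turn a parsed populations YAML document into one record per sample."""
    rows = []
    for pop in config["populations"]:
        for sample in pop["samples"]:
            rows.append(
                {"sample": sample, "population": pop["name"],
                 "role": pop["role"], "ploidy": pop["ploidy"]}
            )
    return rows


# The populations.yaml example above, as yaml.safe_load would return it
config = {
    "populations": [
        {"name": "NAM_founders", "role": "founder", "ploidy": 2,
         "samples": ["B73", "Mo17", "W22"]},
        {"name": "NAM_RILs", "role": "derived", "ploidy": 2,
         "samples": ["RIL_001", "RIL_002"]},
    ]
}
rows = flatten_populations(config)
print(len(rows))  # 5
print(rows[0])    # {'sample': 'B73', 'population': 'NAM_founders', 'role': 'founder', 'ploidy': 2}
```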

Polyploid Examples

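For polyploid species such as hexaploid bread wheat, define subgenomes in the YAML format. In this example the A, B, and D subgenomes are each diploid, for a total ploidy of 6: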

populations:
  - name: wheat_founders
    role: founder
    ploidy: 6
    subgenomes:
      - name: A
        ploidy: 2
      - name: B
        ploidy: 2
      - name: D
        ploidy: 2
    samples:
      - Chinese_Spring
      - Jagger
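When writing such entries by hand, it can help to confirm that the subgenome ploidies add up to the population's total ploidy and that subgenome names are unique. The hypothetical `check_subgenome_ploidy` below does this for one parsed population entry (a dict, as `yaml.safe_load` would return it); it is an illustrative check, not a documented Haplophaser validation rule.

```python
def check_subgenome_ploidy(pop):
    """Return a list of problems found in one population entry (empty if none)."""
    problems = []
    subgenomes = pop.get("subgenomes", [])
    if subgenomes:
        total = sum(sg["ploidy"] for sg in subgenomes)
        if total != pop["ploidy"]:
            problems.append(
                f"{pop['name']}: subgenome ploidies sum to {total}, expected {pop['ploidy']}"
            )
        names = [sg["name"] for sg in subgenomes]
        if len(set(names)) != len(names):
            problems.append(f"{pop['name']}: duplicate subgenome names {names}")
    return problems


# The wheat_founders entry above, as a parsed dict
wheat = {
    "name": "wheat_founders", "role": "founder", "ploidy": 6,
    "subgenomes": [{"name": "A", "ploidy": 2}, {"name": "B", "ploidy": 2},
                   {"name": "D", "ploidy": 2}],
    "samples": ["Chinese_Spring", "Jagger"],
}
print(check_subgenome_ploidy(wheat))  # []
```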

Configuration

Generate a configuration template:

haplophaser init-config -o haplophaser.yaml

Then customize and use:

haplophaser proportion variants.vcf.gz -p populations.tsv -c haplophaser.yaml

Python API

from haplophaser import Sample, Population, PopulationRole
from haplophaser.core.models import make_hexaploid_sample
from haplophaser.io import load_populations_yaml, VCFReader

# Create samples programmatically
b73 = Sample(name="B73", ploidy=2, population="founders")

# Create polyploid samples
wheat = make_hexaploid_sample("Chinese_Spring", ("A", "B", "D"), "founders")

# Load populations from file
populations = load_populations_yaml("populations.yaml")

# Read VCF files
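with VCFReader("variants.vcf.gz") as reader:
    # Region chr1:[0, 1_000_000), assuming fetch takes the 0-based,
    # half-open coordinates described under Coordinate System
    for variant in reader.fetch("chr1", 0, 1_000_000):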
        print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alt}")

Coordinate System

Haplophaser uses 0-based, half-open intervals (BED-style) internally:

  • Position 0 is the first base
  • Intervals are [start, end): start is included, end is excluded

Conversion to/from 1-based systems (VCF, GFF) happens automatically during I/O.
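The conversion is simple arithmetic: a 1-based VCF position `pos` becomes the half-open interval `[pos - 1, pos)`, and a 1-based closed GFF interval `[start, end]` becomes `[start - 1, end)`. The helpers below are hypothetical illustrations of that rule, not functions exported by Haplophaser.

```python
def vcf_pos_to_interval(pos):
    """1-based VCF POS -> 0-based half-open [start, end) covering one base."""
    if pos < 1:
        raise ValueError("VCF positions start at 1")
    return (pos - 1, pos)


def gff_to_interval(start, end):
    """1-based closed GFF [start, end] -> 0-based half-open [start - 1, end)."""
    if start < 1 or end < start:
        raise ValueError("invalid GFF interval")
    return (start - 1, end)


def interval_length(start, end):
    """Half-open intervals need no +1 correction."""
    return end - start


def overlaps(a, b):
    """Half-open intervals overlap iff each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


print(vcf_pos_to_interval(1))              # (0, 1): the first base
print(gff_to_interval(1, 100))             # (0, 100)
print(overlaps((0, 100), (100, 200)))      # False: adjacent, not overlapping
```

This is why BED-style coordinates are convenient internally: length is `end - start`, and adjacent intervals share a boundary without overlapping.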

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=haplophaser --cov-report=html

# Run specific test file
pytest tests/test_models.py

Code Quality

# Lint and format check
ruff check src tests

# Format code
ruff format src tests

# Type checking
mypy src

Project Structure

haplophaser/
├── pyproject.toml          # Package configuration
├── README.md
├── src/
│   └── haplophaser/
│       ├── __init__.py     # Package exports
│       ├── core/
│       │   ├── models.py   # Data models (Sample, Variant, etc.)
│       │   └── config.py   # Configuration system
│       ├── io/
│       │   ├── vcf.py      # VCF reading
│       │   └── populations.py  # Population file I/O
│       └── cli/
│           └── main.py     # CLI commands
├── tests/
│   ├── conftest.py         # Test fixtures
│   ├── test_models.py
│   ├── test_config.py
│   └── test_populations.py
└── docs/

Roadmap

  • Core data models with polyploid support
  • Configuration system
  • Population file I/O
  • CLI skeleton
  • VCF reading implementation
  • Window-based analysis
  • HMM-based haplotype inference
  • Chromosome painting
  • Proportion estimation
  • Scaffold ordering
  • Integration with chromoplot for visualization
  • Expression bias analysis
  • Subgenome dominance testing

Citation

If you use Haplophaser in your research, please cite:

Haplophaser: Haplotype analysis toolkit for complex genomes. (in preparation)

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.


Download files


Source Distribution

haplophaser-0.1.1.tar.gz (387.4 kB)

Uploaded Source

Built Distribution


haplophaser-0.1.1-py3-none-any.whl (300.1 kB)

Uploaded Python 3

File details

Details for the file haplophaser-0.1.1.tar.gz.

File metadata

  • Download URL: haplophaser-0.1.1.tar.gz
  • Upload date:
  • Size: 387.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for haplophaser-0.1.1.tar.gz
Algorithm Hash digest
SHA256 cfc6f99929b76893eb2fbe05c2af6c3163be5c9a0c4638051a0d92d6f40620bd
MD5 20b0678c1b1f628683b40b764d4b9dac
BLAKE2b-256 68ab79cc9f8f2578c5a084fd9dbb973eb933f69b00b7af7876051aa308a0891c
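To check a downloaded file against these digests, compute its SHA256 locally and compare. A minimal standard-library sketch; `sha256_of` and `matches_published` are hypothetical helper names, and the expected value is the SHA256 listed above for haplophaser-0.1.1.tar.gz.

```python
import hashlib

# SHA256 published above for haplophaser-0.1.1.tar.gz
EXPECTED_SHA256 = "cfc6f99929b76893eb2fbe05c2af6c3163be5c9a0c4638051a0d92d6f40620bd"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Usage, after downloading the source distribution:
# matches_published("haplophaser-0.1.1.tar.gz")
```

Reading in fixed-size chunks keeps memory use constant for large files and works on Python 3.10 (`hashlib.file_digest` requires 3.11). pip can enforce the same check at install time in hash-checking mode, via `--hash=sha256:...` entries in a requirements file.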


File details

Details for the file haplophaser-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: haplophaser-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 300.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for haplophaser-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 574dc6f50aa27f388feac1ff9efff42be5382c4b0485b2dc84ec75eaf98256b9
MD5 1f287eb1b0bb15acf50f803b1b87270b
BLAKE2b-256 64527881a32396d73653a067b3738adfdbaeeae5fef706c1ca795aab1a994534

