Polyploid Haplotype Analysis for Sequenced Eukaryotic References - A comprehensive toolkit for haplotype analysis in complex genomes

🔫 Phaser

Haplotype analysis toolkit for complex genomes with full polyploid support.

Phaser analyzes haplotype inheritance patterns in derived lines relative to founder (source) populations. It is designed from the ground up to handle any ploidy level, from diploids through hexaploids and beyond.

Features

  • Haplotype Proportion Estimation: Calculate what fraction of a sample's genome derives from each founder population
  • Chromosome Painting: Paint genomic regions by haplotype origin using Hidden Markov Models
  • Chimeric Contig Detection: Identify potential misassemblies through haplotype switches
  • Linkage-Informed Scaffolding: Order and orient scaffolds using haplotype phase information
  • Full Polyploid Support: First-class support for diploid, autopolyploid, and allopolyploid genomes
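
To make the first feature concrete, here is a minimal sketch of how founder proportions could be derived once a chromosome has been painted. The `founder_proportions` function and the `(start, end, founder)` segment layout are hypothetical and chosen for illustration; they are not Phaser's API or output format.

```python
from collections import defaultdict


def founder_proportions(segments):
    """Return the fraction of painted bases assigned to each founder.

    `segments` is an iterable of (start, end, founder) tuples in 0-based,
    half-open coordinates. This layout is an assumption for illustration.
    """
    painted = defaultdict(int)
    for start, end, founder in segments:
        if end < start:
            raise ValueError(f"invalid segment: [{start}, {end})")
        # Half-open intervals make the length a simple difference.
        painted[founder] += end - start

    total = sum(painted.values())
    if total == 0:
        return {}
    return {founder: length / total for founder, length in painted.items()}


# Hypothetical painting of a 2,000 bp region for one derived sample.
segments = [
    (0, 600, "B73"),
    (600, 1000, "Mo17"),
    (1000, 2000, "B73"),
]
proportions = founder_proportions(segments)
print(proportions)
```

Proportions here are computed over painted bases only, so unpainted gaps do not dilute the estimate. Whether Phaser normalizes the same way is not stated in this document.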

Installation

From PyPI (when released)

pip install haplophaser

Development Installation

# Clone the repository
git clone https://github.com/your-org/phaser.git
cd phaser

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode with dev dependencies
pip install -e ".[dev]"

Dependencies

Core dependencies:

  • Python 3.10+
  • NumPy
  • Pydantic v2
  • cyvcf2
  • PyYAML
  • Typer

Quick Start

Basic Usage

# Estimate haplotype proportions
haplophaser proportion variants.vcf.gz -p populations.tsv -o results/

# Paint chromosomes by haplotype origin
haplophaser paint variants.vcf.gz -p populations.tsv -o painted/

# Order scaffolds using linkage
haplophaser scaffold scaffolds.vcf.gz -p populations.tsv -g genetic_map.tsv

# Run quality control checks
haplophaser qc variants.vcf.gz -p populations.tsv

Population File Format

Phaser uses TSV or YAML files to define population structure:

TSV format (populations.tsv):

sample	population	role	ploidy
B73	NAM_founders	founder	2
Mo17	NAM_founders	founder	2
W22	NAM_founders	founder	2
RIL_001	NAM_RILs	derived	2
RIL_002	NAM_RILs	derived	2

YAML format (populations.yaml):

populations:
  - name: NAM_founders
    role: founder
    ploidy: 2
    samples:
      - B73
      - Mo17
      - W22

  - name: NAM_RILs
    role: derived
    ploidy: 2
    samples:
      - RIL_001
      - RIL_002
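
The two formats carry the same information: each TSV row names one sample, while the YAML groups samples under their population. The sketch below shows that correspondence using only the standard library. `group_populations` is a hypothetical helper, not part of Phaser, and it assumes every sample in a population shares one role and one ploidy.

```python
import csv
import io

# Same content as the populations.tsv example above.
TSV_TEXT = (
    "sample\tpopulation\trole\tploidy\n"
    "B73\tNAM_founders\tfounder\t2\n"
    "Mo17\tNAM_founders\tfounder\t2\n"
    "W22\tNAM_founders\tfounder\t2\n"
    "RIL_001\tNAM_RILs\tderived\t2\n"
    "RIL_002\tNAM_RILs\tderived\t2\n"
)


def group_populations(tsv_text):
    """Group per-sample TSV rows into the nested layout used by the YAML."""
    populations = {}  # dicts keep insertion order, so file order is preserved
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        ploidy = int(row["ploidy"])
        pop = populations.setdefault(
            row["population"],
            {
                "name": row["population"],
                "role": row["role"],
                "ploidy": ploidy,
                "samples": [],
            },
        )
        # Assumption: role and ploidy are properties of the whole population.
        if pop["role"] != row["role"] or pop["ploidy"] != ploidy:
            raise ValueError(
                f"sample {row['sample']} disagrees with population {pop['name']}"
            )
        pop["samples"].append(row["sample"])
    return {"populations": list(populations.values())}


grouped = group_populations(TSV_TEXT)
for pop in grouped["populations"]:
    print(pop["name"], pop["role"], pop["ploidy"], pop["samples"])
```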

Polyploid Examples

For polyploid species, define subgenomes in YAML:

populations:
  - name: wheat_founders
    role: founder
    ploidy: 6
    subgenomes:
      - name: A
        ploidy: 2
      - name: B
        ploidy: 2
      - name: D
        ploidy: 2
    samples:
      - Chinese_Spring
      - Jagger
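
In the wheat example, the subgenome ploidies (2 + 2 + 2) add up to the population ploidy of 6. A quick consistency check along those lines can catch typos before running an analysis. This is a sketch on plain dictionaries; `check_subgenome_ploidy` is a hypothetical helper, and whether Phaser enforces the same rule is an assumption.

```python
def check_subgenome_ploidy(population):
    """Raise ValueError if subgenome ploidies do not sum to the total ploidy.

    `population` mirrors one entry of the `populations` list in the YAML.
    Populations without a `subgenomes` key are accepted unchanged.
    """
    subgenomes = population.get("subgenomes", [])
    if not subgenomes:
        return
    total = sum(sg["ploidy"] for sg in subgenomes)
    if total != population["ploidy"]:
        names = ", ".join(sg["name"] for sg in subgenomes)
        raise ValueError(
            f"{population['name']}: subgenomes {names} sum to {total}, "
            f"expected {population['ploidy']}"
        )


# The wheat_founders entry from the YAML above, written as a dict.
wheat = {
    "name": "wheat_founders",
    "role": "founder",
    "ploidy": 6,
    "subgenomes": [
        {"name": "A", "ploidy": 2},
        {"name": "B", "ploidy": 2},
        {"name": "D", "ploidy": 2},
    ],
    "samples": ["Chinese_Spring", "Jagger"],
}

check_subgenome_ploidy(wheat)  # passes silently: 2 + 2 + 2 == 6
```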

Configuration

Generate a configuration template:

haplophaser init-config -o phaser.yaml

Then customize and use:

haplophaser proportion variants.vcf.gz -p populations.tsv -c phaser.yaml

Python API

from haplophaser import Sample, Population, PopulationRole
from haplophaser.core.models import make_hexaploid_sample
from haplophaser.io import load_populations_yaml, VCFReader

# Create samples programmatically
b73 = Sample(name="B73", ploidy=2, population="founders")

# Create polyploid samples
wheat = make_hexaploid_sample("Chinese_Spring", ("A", "B", "D"), "founders")

# Load populations from file
populations = load_populations_yaml("populations.yaml")

# Read VCF files
with VCFReader("variants.vcf.gz") as reader:
    for variant in reader.fetch("chr1", 0, 1_000_000):
        print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alt}")

Coordinate System

Phaser uses 0-based, half-open intervals (BED-style) internally:

  • Position 0 is the first base
  • Intervals are [start, end): start is included, end is excluded

Conversion to/from 1-based systems (VCF, GFF) happens automatically during I/O.
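
The arithmetic behind that conversion is small but easy to get off by one. The sketch below shows it for a VCF record and a GFF feature. `vcf_to_interval` and `interval_to_gff` are hypothetical names used for illustration, not functions exported by Phaser.

```python
def vcf_to_interval(pos, ref):
    """Convert a 1-based VCF POS and REF allele to a 0-based [start, end)."""
    if pos < 1:
        raise ValueError("VCF positions are 1-based and must be >= 1")
    start = pos - 1  # shift to 0-based
    return start, start + len(ref)  # end is exclusive


def interval_to_gff(start, end):
    """Convert a 0-based [start, end) interval to 1-based, closed GFF coords."""
    if not 0 <= start < end:
        raise ValueError(f"invalid interval: [{start}, {end})")
    # Only the start shifts; an exclusive 0-based end equals the inclusive
    # 1-based end.
    return start + 1, end


# A SNP at VCF position 1 covers the first base: [0, 1).
print(vcf_to_interval(1, "A"))
# A 3 bp REF allele at VCF position 100 covers [99, 102).
print(vcf_to_interval(100, "ACG"))
# Round trip back to 1-based closed coordinates: 100..102.
print(interval_to_gff(99, 102))
```

A useful property of the half-open form is that the interval length is always `end - start`, and adjacent intervals share a boundary value without overlapping.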

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=phaser --cov-report=html

# Run specific test file
pytest tests/test_models.py

Code Quality

# Lint
ruff check src tests

# Format code
ruff format src tests

# Type checking
mypy src

Project Structure

phaser/
├── pyproject.toml          # Package configuration
├── README.md
├── src/
│   └── phaser/
│       ├── __init__.py     # Package exports
│       ├── core/
│       │   ├── models.py   # Data models (Sample, Variant, etc.)
│       │   └── config.py   # Configuration system
│       ├── io/
│       │   ├── vcf.py      # VCF reading
│       │   └── populations.py  # Population file I/O
│       └── cli/
│           └── main.py     # CLI commands
├── tests/
│   ├── conftest.py         # Test fixtures
│   ├── test_models.py
│   ├── test_config.py
│   └── test_populations.py
└── docs/

Roadmap

  • Core data models with polyploid support
  • Configuration system
  • Population file I/O
  • CLI skeleton
  • VCF reading implementation
  • Window-based analysis
  • HMM-based haplotype inference
  • Chromosome painting
  • Proportion estimation
  • Scaffold ordering
  • Integration with chromoplot for visualization

Citation

If you use Phaser in your research, please cite:

Phaser: Haplotype analysis toolkit for complex genomes. (in preparation)

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
