Polyploid Haplotype Analysis for Sequenced Eukaryotic References - A comprehensive toolkit for haplotype analysis in complex genomes
🧬 Haplophaser
Haplotype analysis toolkit for complex genomes with full polyploid support.
Haplophaser analyzes haplotype inheritance patterns in derived lines relative to founder/source populations. It is designed from the ground up to handle multiple ploidy levels, from diploids through hexaploids and beyond.
Features
- Haplotype Proportion Estimation: Calculate what fraction of a sample's genome derives from each founder population
- Chromosome Painting: Paint genomic regions by haplotype origin using Hidden Markov Models
- Chimeric Contig Detection: Identify potential misassemblies through haplotype switches
- Linkage-Informed Scaffolding: Order and orient scaffolds using haplotype phase information
- Full Polyploid Support: First-class support for diploid, autopolyploid, and allopolyploid genomes
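To make the proportion and chimeric-contig features more concrete, here is a minimal sketch of how both quantities could be derived once regions have been painted. It is not Haplophaser's implementation: the `Segment` tuple, the function names, and the example coordinates are all hypothetical, and segments are assumed to be 0-based, half-open, and non-overlapping.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical painted segment, 0-based half-open [start, end)."""
    chrom: str
    start: int
    end: int
    founder: str


def founder_proportions(segments):
    """Return the fraction of painted bases assigned to each founder."""
    lengths = defaultdict(int)
    for seg in segments:
        lengths[seg.founder] += seg.end - seg.start
    total = sum(lengths.values())
    if total == 0:
        return {}
    return {founder: n / total for founder, n in lengths.items()}


def haplotype_switches(segments):
    """Return (chrom, position) pairs where the founder label changes."""
    ordered = sorted(segments, key=lambda s: (s.chrom, s.start))
    switches = []
    for prev, cur in zip(ordered, ordered[1:]):
        # Only compare neighbours on the same sequence.
        if prev.chrom == cur.chrom and prev.founder != cur.founder:
            switches.append((cur.chrom, cur.start))
    return switches


# Illustrative data only, not real results.
segments = [
    Segment("chr1", 0, 600, "B73"),
    Segment("chr1", 600, 1000, "Mo17"),
    Segment("chr2", 0, 1000, "B73"),
]
proportions = founder_proportions(segments)
switches = haplotype_switches(segments)
print(proportions)
print(switches)
```

In a real analysis a switch is only a candidate misassembly. Recombination in derived lines produces the same signal, so switch positions would need to be compared across samples before a contig is flagged as chimeric.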
Installation
From PyPI
pip install haplophaser
Development Installation
# Clone the repository
git clone https://github.com/aseetharam/haplophaser.git
cd haplophaser
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode with dev dependencies
pip install -e ".[dev]"
Dependencies
Core dependencies:
- Python 3.10+
- NumPy
- Pydantic v2
- cyvcf2
- PyYAML
- Typer
Quick Start
Basic Usage
# Estimate haplotype proportions
haplophaser proportion variants.vcf.gz -p populations.tsv -o results/
# Paint chromosomes by haplotype origin
haplophaser paint variants.vcf.gz -p populations.tsv -o painted/
# Order scaffolds using linkage
haplophaser scaffold scaffolds.vcf.gz -p populations.tsv -g genetic_map.tsv
# Run quality control checks
haplophaser qc variants.vcf.gz -p populations.tsv
Population File Format
Haplophaser uses TSV or YAML files to define population structure:
TSV format (populations.tsv):
sample population role ploidy
B73 NAM_founders founder 2
Mo17 NAM_founders founder 2
W22 NAM_founders founder 2
RIL_001 NAM_RILs derived 2
RIL_002 NAM_RILs derived 2
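For readers who want to inspect such a file outside Haplophaser, the following sketch parses the same four columns with the standard library. The `read_populations` helper is hypothetical and is not part of the Haplophaser API; it assumes tab-separated columns with the header shown above, and that all samples in a population share one role and ploidy.

```python
import csv
import io

# Example content mirroring the table above; columns are tab-separated.
POPULATIONS_TSV = (
    "sample\tpopulation\trole\tploidy\n"
    "B73\tNAM_founders\tfounder\t2\n"
    "Mo17\tNAM_founders\tfounder\t2\n"
    "W22\tNAM_founders\tfounder\t2\n"
    "RIL_001\tNAM_RILs\tderived\t2\n"
    "RIL_002\tNAM_RILs\tderived\t2\n"
)


def read_populations(handle):
    """Group rows by population name (hypothetical helper)."""
    populations = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Role and ploidy are taken from the first row seen for a population.
        pop = populations.setdefault(
            row["population"],
            {"role": row["role"], "ploidy": int(row["ploidy"]), "samples": []},
        )
        pop["samples"].append(row["sample"])
    return populations


populations = read_populations(io.StringIO(POPULATIONS_TSV))
for name, pop in populations.items():
    print(name, pop["role"], pop["ploidy"], pop["samples"])
```

The resulting dictionary has the same shape as one entry of the YAML format, which makes it easy to convert between the two.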
YAML format (populations.yaml):
populations:
  - name: NAM_founders
    role: founder
    ploidy: 2
    samples:
      - B73
      - Mo17
      - W22
  - name: NAM_RILs
    role: derived
    ploidy: 2
    samples:
      - RIL_001
      - RIL_002
Polyploid Examples
For polyploid species, define subgenomes in YAML:
populations:
  - name: wheat_founders
    role: founder
    ploidy: 6
    subgenomes:
      - name: A
        ploidy: 2
      - name: B
        ploidy: 2
      - name: D
        ploidy: 2
    samples:
      - Chinese_Spring
      - Jagger
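In the wheat example the three subgenome ploidies (2 + 2 + 2) add up to the population ploidy of 6. The sketch below checks that relationship on a plain dictionary shaped like one entry of the `populations` list. Whether Haplophaser itself enforces this rule is an assumption, and `check_subgenome_ploidy` is a hypothetical helper, not part of its API.

```python
def check_subgenome_ploidy(population):
    """Return the summed subgenome ploidy, or raise if it disagrees.

    Populations without a "subgenomes" key are returned unchanged
    (their own ploidy), since there is nothing to cross-check.
    """
    subgenomes = population.get("subgenomes", [])
    if not subgenomes:
        return population["ploidy"]
    total = sum(sub["ploidy"] for sub in subgenomes)
    if total != population["ploidy"]:
        raise ValueError(
            f"{population['name']}: subgenome ploidies sum to {total}, "
            f"expected {population['ploidy']}"
        )
    return total


# Same content as the YAML example, written as a Python dict.
wheat = {
    "name": "wheat_founders",
    "role": "founder",
    "ploidy": 6,
    "subgenomes": [
        {"name": "A", "ploidy": 2},
        {"name": "B", "ploidy": 2},
        {"name": "D", "ploidy": 2},
    ],
    "samples": ["Chinese_Spring", "Jagger"],
}
print(check_subgenome_ploidy(wheat))
```

A check like this is cheap to run before a long analysis and catches the most common editing mistake in hand-written population files.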
Configuration
Generate a configuration template:
haplophaser init-config -o haplophaser.yaml
Then customize and use:
haplophaser proportion variants.vcf.gz -p populations.tsv -c haplophaser.yaml
Python API
from haplophaser import Sample, Population, PopulationRole
from haplophaser.core.models import make_hexaploid_sample
from haplophaser.io import load_populations_yaml, VCFReader
# Create samples programmatically
b73 = Sample(name="B73", ploidy=2, population="founders")
# Create polyploid samples
wheat = make_hexaploid_sample("Chinese_Spring", ("A", "B", "D"), "founders")
# Load populations from file
populations = load_populations_yaml("populations.yaml")
# Read VCF files
with VCFReader("variants.vcf.gz") as reader:
    for variant in reader.fetch("chr1", 0, 1_000_000):
        print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alt}")
Coordinate System
Haplophaser uses 0-based, half-open intervals (BED-style) internally:
- Position 0 is the first base
- Intervals are [start, end): start is included, end is excluded
Conversion to/from 1-based systems (VCF, GFF) happens automatically during I/O.
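The arithmetic behind that conversion is small but easy to get wrong by one. The following sketch shows it under the conventions stated above; the function names are hypothetical and are not taken from Haplophaser's I/O layer.

```python
def vcf_to_interval(pos, ref):
    """Convert a 1-based VCF POS and REF allele to a 0-based [start, end)."""
    if pos < 1:
        raise ValueError("VCF positions are 1-based and must be >= 1")
    start = pos - 1
    return start, start + len(ref)


def interval_to_one_based(start, end):
    """Convert a 0-based half-open interval to 1-based closed (GFF-style)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    # Only the start shifts; the exclusive end equals the inclusive 1-based end.
    return start + 1, end


print(vcf_to_interval(1, "A"))         # first base of the sequence
print(vcf_to_interval(100, "ACG"))     # a 3 bp reference allele
print(interval_to_one_based(99, 102))  # back to 1-based closed
```

One convenient property of half-open intervals is that the length is simply `end - start`, with no `+ 1` correction.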
Development
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=haplophaser --cov-report=html
# Run specific test file
pytest tests/test_models.py
Code Quality
# Lint
ruff check src tests
# Format code
ruff format src tests
# Type checking
mypy src
Project Structure
haplophaser/
├── pyproject.toml              # Package configuration
├── README.md
├── src/
│   └── haplophaser/
│       ├── __init__.py         # Package exports
│       ├── core/
│       │   ├── models.py       # Data models (Sample, Variant, etc.)
│       │   └── config.py       # Configuration system
│       ├── io/
│       │   ├── vcf.py          # VCF reading
│       │   └── populations.py  # Population file I/O
│       └── cli/
│           └── main.py         # CLI commands
├── tests/
│   ├── conftest.py             # Test fixtures
│   ├── test_models.py
│   ├── test_config.py
│   └── test_populations.py
└── docs/
Roadmap
- Core data models with polyploid support
- Configuration system
- Population file I/O
- CLI skeleton
- VCF reading implementation
- Window-based analysis
- HMM-based haplotype inference
- Chromosome painting
- Proportion estimation
- Scaffold ordering
- Integration with chromoplot for visualization
- Expression bias analysis
- Subgenome dominance testing
Citation
If you use Haplophaser in your research, please cite:
Haplophaser: Haplotype analysis toolkit for complex genomes. (in preparation)
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.