Population-aware, MST-based haplotype graph analysis and visualization in Python.
HapNet
HapNet is a lightweight Python command-line package for constructing population-aware, minimum-spanning-tree-based haplotype graphs from aligned FASTA files.
HapNet is designed for reproducible single-locus workflows. It collapses identical aligned sequences into haplotypes, calculates pairwise Hamming distances among haplotypes, constructs a minimum spanning tree (MST), plots a population-aware haplotype graph, and writes machine-readable TSV summaries.
Important scope note: HapNet constructs an MST-based haplotype graph. It does not infer median-joining, statistical-parsimony, or reticulate haplotype networks.
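The three core steps described above (collapsing identical sequences, computing pairwise Hamming distances, building an MST) can be sketched in plain Python. This is an illustrative sketch, not HapNet's code. The function names and the H1, H2, ... labels are hypothetical, and Kruskal's algorithm stands in for whatever MST routine HapNet uses internally. Haplotype pairs often share the same distance, so several equally short trees can exist. Which tree an implementation returns depends on how it breaks ties.

```python
from itertools import combinations


def collapse_haplotypes(seqs):
    """Group identical aligned sequences: {sequence: [sequence IDs]}, first-seen order."""
    haps = {}
    for seq_id, seq in seqs.items():
        haps.setdefault(seq, []).append(seq_id)
    return haps


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(nodes, dist):
    """Kruskal's algorithm with union-find; returns [(u, v, weight), ...]."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk to the component root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # All pairwise edges sorted by weight (ties broken by node label).
    edges = sorted((dist(u, v), u, v) for u, v in combinations(nodes, 2))
    tree = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two separate components
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Toy alignment: four sequences, three distinct haplotypes.
seqs = {
    "Ind01_NK": "ACTGACTG",
    "Ind02_RI": "ACTGATTA",
    "Ind03_RI": "ACTGACTG",
    "Ind04_NK": "ACTGATTG",
}
haps = collapse_haplotypes(seqs)
labels = {f"H{i}": s for i, s in enumerate(haps, start=1)}
tree = minimum_spanning_tree(list(labels), lambda u, v: hamming(labels[u], labels[v]))
print(tree)  # [('H1', 'H3', 1), ('H2', 'H3', 1)]
```

An MST over n haplotypes always has n - 1 edges. Here, H1 and H2 (distance 2) are connected through H3 rather than directly.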
Installation
pip install hapnet
Standard input format
By default, population identity is parsed from the last underscore-delimited FASTA header token:
>Ind01_NK
ACTGACTG
>Ind02_RI
ACTGATTA
Run:
hapnet examples/basic/basic.fasta --out basic.svg --log-prefix basic
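The default header convention can be illustrated with a small Python sketch. The helper names are hypothetical and are not part of HapNet's API. The population is taken from the last token, so individual IDs may contain underscores of their own. Under this rule, a header such as Site3_Ind01_NK would be assigned to population NK.

```python
def parse_fasta(text):
    """Minimal FASTA reader: {header: sequence}, joining wrapped sequence lines."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def population_from_header(header):
    """Last underscore-delimited token, e.g. 'Ind01_NK' -> 'NK'.

    A header with no underscore is returned unchanged here; how HapNet itself
    treats such headers is not covered by this sketch.
    """
    return header.rsplit("_", 1)[-1]


example = """>Ind01_NK
ACTGACTG
>Ind02_RI
ACTGATTA
"""
records = parse_fasta(example)
populations = {h: population_from_header(h) for h in records}
print(populations)  # {'Ind01_NK': 'NK', 'Ind02_RI': 'RI'}
```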
Phased diploid input
HapNet v0.2.0 adds optional support for diploid sequences that have already been phased. HapNet does not infer phase; it preserves individual identity across allele copies that were phased by external software.
Header format in phased mode (individual ID, allele label, population):
>Ind01_a_NK
ACTGACTG
>Ind01_b_NK
ACTGATTA
Run:
hapnet examples/phased/phased_example.fasta --phased --out phased.svg --log-prefix phased
This writes an additional individual-level genotype table:
phased_individual_genotypes.tsv
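The phased header convention and the grouping of allele copies by individual can be sketched as follows. This sketch assumes that the allele label and population are always the last two underscore-delimited tokens. The haplotype labels are hypothetical. The column layout of HapNet's actual phased_individual_genotypes.tsv is not described here and may differ from the dictionary built below.

```python
from collections import defaultdict


def parse_phased_header(header):
    """Split 'Ind01_a_NK' into ('Ind01', 'a', 'NK').

    Assumes the last two underscore-delimited tokens are the allele label and
    the population, so the individual ID may itself contain underscores.
    """
    parts = header.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"expected individual_allele_population, got {header!r}")
    return tuple(parts)


def individual_genotypes(membership):
    """Group phased allele copies by individual.

    membership: header -> haplotype ID (hypothetical labels such as 'H1').
    Returns {individual: (population, haplotype IDs ordered by allele label)}.
    """
    alleles = defaultdict(dict)
    populations = {}
    for header, hap in membership.items():
        ind, allele, pop = parse_phased_header(header)
        if populations.setdefault(ind, pop) != pop:
            raise ValueError(f"{ind} is assigned to more than one population")
        if allele in alleles[ind]:
            raise ValueError(f"{ind} has allele {allele!r} more than once")
        alleles[ind][allele] = hap
    return {
        ind: (populations[ind], tuple(copies[a] for a in sorted(copies)))
        for ind, copies in alleles.items()
    }


membership = {"Ind01_a_NK": "H1", "Ind01_b_NK": "H2", "Ind02_a_RI": "H1", "Ind02_b_RI": "H1"}
genotypes = individual_genotypes(membership)
print(genotypes)
# {'Ind01': ('NK', ('H1', 'H2')), 'Ind02': ('RI', ('H1', 'H1'))}
```

In this toy example, Ind01 carries two different haplotypes and Ind02 carries two copies of the same haplotype.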
Metadata file option
Instead of encoding population and allele information in headers, users can provide a tab-delimited metadata file:
sequence_id individual_id allele population
Ind01_a Ind01 a NK
Ind01_b Ind01 b NK
Run:
hapnet input.fasta --metadata metadata.tsv --phased --out network.svg --log-prefix run1
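Here is a minimal Python sketch of reading such a file with the standard csv module. It assumes that the first row is the header shown above and that sequence_id matches the FASTA header text after '>'. The validation rules (required columns, unique IDs) are illustrative choices, not documented HapNet behavior.

```python
import csv
import io

REQUIRED_COLUMNS = ("sequence_id", "individual_id", "allele", "population")


def read_metadata(handle):
    """Read a tab-delimited metadata table into {sequence_id: row}."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"metadata is missing columns: {missing}")
    table = {}
    for row in reader:
        seq_id = row["sequence_id"]
        if seq_id in table:
            raise ValueError(f"duplicate sequence_id: {seq_id}")
        table[seq_id] = row
    return table


example = (
    "sequence_id\tindividual_id\tallele\tpopulation\n"
    "Ind01_a\tInd01\ta\tNK\n"
    "Ind01_b\tInd01\tb\tNK\n"
)
meta = read_metadata(io.StringIO(example))
print(meta["Ind01_b"]["population"])  # NK
```

For a file on disk, pass open("metadata.tsv", newline="") instead of the in-memory StringIO.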
Main output files
For --log-prefix run1, HapNet writes:
- run1_haplotypes.tsv: haplotype IDs, frequencies, populations, and sequences
- run1_membership.tsv: sequence-to-haplotype membership
- run1_shared_haplotypes.tsv: haplotypes shared among populations
- run1_haplotype_individuals.tsv: individuals represented in each haplotype
- run1_summary.tsv: summary statistics
- run1_run_metadata.tsv: run metadata for reproducibility
- run1_individual_genotypes.tsv: phased individual genotype table, written only with --phased
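The idea behind a shared-haplotype summary can be illustrated with a short sketch. It counts sequences per population within each haplotype and keeps the haplotypes seen in more than one population. The function names and inputs are hypothetical, and the exact columns HapNet writes to run1_shared_haplotypes.tsv may differ.

```python
from collections import Counter


def haplotype_population_counts(membership, populations):
    """Count sequences per population within each haplotype.

    membership: sequence_id -> haplotype ID; populations: sequence_id -> population.
    """
    counts = {}
    for seq_id, hap in membership.items():
        counts.setdefault(hap, Counter())[populations[seq_id]] += 1
    return counts


def shared_haplotypes(counts):
    """Haplotypes observed in two or more populations, with the sorted population list."""
    return {hap: sorted(pops) for hap, pops in counts.items() if len(pops) > 1}


membership = {"Ind01_NK": "H1", "Ind02_RI": "H2", "Ind03_RI": "H1"}
populations = {"Ind01_NK": "NK", "Ind02_RI": "RI", "Ind03_RI": "RI"}
counts = haplotype_population_counts(membership, populations)
shared = shared_haplotypes(counts)
print(shared)  # {'H1': ['NK', 'RI']}
```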
Ambiguous and missing data
By default, ambiguous characters and gaps are treated as literal character states during Hamming-distance calculation. To ignore positions containing N, ?, -, or . in either sequence during pairwise comparisons, use:
hapnet input.fasta --ignore-ambiguous --out network.svg --log-prefix run1
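The two distance modes described above can be sketched in Python. This sketch is not HapNet's implementation. It treats only the four characters listed above as missing and matches them case-sensitively, and both of these choices are assumptions. Skipping columns pair by pair is a form of pairwise deletion. As a result, two distances in the same matrix may be computed over different sets of alignment columns.

```python
# Characters listed for --ignore-ambiguous; matched case-sensitively here, and
# other IUPAC codes (R, Y, ...) stay literal states in this sketch.
MISSING = frozenset("N?-.")


def hamming_distance(a, b, ignore_ambiguous=False):
    """Count aligned positions at which a and b differ.

    By default every character, including N and gaps, is a literal state.
    With ignore_ambiguous=True, a column is skipped when either sequence has
    one of the MISSING characters there (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    for x, y in zip(a, b):
        if ignore_ambiguous and (x in MISSING or y in MISSING):
            continue
        if x != y:
            diffs += 1
    return diffs


print(hamming_distance("ACTGN-TG", "ACTGATTA"))                         # 3
print(hamming_distance("ACTGN-TG", "ACTGATTA", ignore_ambiguous=True))  # 1
```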
Scripted workflow example
HapNet can be integrated into a shell workflow that processes many aligned FASTA files:
mkdir -p networks results
for fasta in alignments/*.fasta
do
    [ -e "$fasta" ] || continue   # skip if the glob matched no files
    base=$(basename "$fasta" .fasta)
    hapnet "$fasta" \
        --out "networks/${base}.svg" \
        --log-prefix "results/${base}"
done
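The same batch loop can be driven from Python when a workflow is already scripted there. This is a sketch that uses only the hapnet flags documented above. The directory layout and the helper names hapnet_commands and run_all are hypothetical.

```python
import subprocess
from pathlib import Path


def hapnet_commands(alignment_dir, network_dir, results_dir):
    """Build one hapnet argument list per *.fasta file in alignment_dir."""
    cmds = []
    for fasta in sorted(Path(alignment_dir).glob("*.fasta")):
        base = fasta.stem  # same as: basename "$fasta" .fasta
        cmds.append([
            "hapnet", str(fasta),
            "--out", str(Path(network_dir) / f"{base}.svg"),
            "--log-prefix", str(Path(results_dir) / base),
        ])
    return cmds


def run_all(alignment_dir="alignments", network_dir="networks", results_dir="results"):
    """Create the output directories, then run hapnet once per alignment."""
    Path(network_dir).mkdir(parents=True, exist_ok=True)
    Path(results_dir).mkdir(parents=True, exist_ok=True)
    for cmd in hapnet_commands(alignment_dir, network_dir, results_dir):
        subprocess.run(cmd, check=True)  # raise on the first failing alignment
```

Calling run_all() from the directory that contains alignments/ should mirror the shell loop. Because check=True is set, a failing alignment stops the batch instead of being silently skipped.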
Citation
If you use HapNet, please cite the associated manuscript when available.
Download files
File details: hapnet-0.2.0.tar.gz (source distribution)
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bda3e0e97c14de08f6eaf8465e0b7183b6f5190a35426f4a136a472fb99c90ef |
| MD5 | 110c93f8453e0f110cee5083b5286c83 |
| BLAKE2b-256 | 6e8396f9350a29664a135d582d8d425ee6a76258572b4d03dcbdc7c1acd9596a |
File details: hapnet-0.2.0-py3-none-any.whl (built distribution)
- Size: 15.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e190db3ad95e56727b37b35a6ce06460b31c6ed6c1105afd2d4d7c61e526f79 |
| MD5 | 3ba9b9974d126c62808ba8fe7267b1ef |
| BLAKE2b-256 | 46f660c12cf5a3d0977d8056f7970519be95b36c74dcf752fce99cf64848212b |
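To check a downloaded file against the SHA256 digests listed above, Python's standard hashlib module is enough. The helper names below are hypothetical. pip can also enforce hashes itself through its --require-hashes mode.

```python
import hashlib
from pathlib import Path


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_sha256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return file_digest_hex(Path(path)) == expected_sha256.strip().lower()
```

For example, the following call should return True for an intact source archive: verify("hapnet-0.2.0.tar.gz", "bda3e0e97c14de08f6eaf8465e0b7183b6f5190a35426f4a136a472fb99c90ef").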