Population-aware, MST-based haplotype graph analysis and visualization in Python.
HapNet
HapNet is a lightweight Python command-line package for constructing population-aware, minimum-spanning-tree-based haplotype graphs from aligned FASTA files.
HapNet is designed for reproducible single-locus workflows. It collapses identical aligned sequences into haplotypes, calculates pairwise Hamming distances among haplotypes, constructs a minimum spanning tree (MST), plots a population-aware haplotype graph, and writes machine-readable TSV summaries describing haplotype composition, population membership, individual membership, shared/private haplotypes, and run metadata.
Important scope note: HapNet constructs an MST-based haplotype graph. It does not infer median-joining, statistical-parsimony, or reticulate haplotype networks.
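The steps described above (collapsing identical sequences, computing pairwise Hamming distances, and building an MST) can be sketched with the Python standard library. This is an illustrative sketch only, not HapNet's implementation: the function names are hypothetical, Kruskal's algorithm is used here because the description does not say which MST algorithm HapNet uses, and tie-breaking among equal-distance edges may differ from the real tool.

```python
from collections import defaultdict
from itertools import combinations


def collapse(records):
    """Group identical aligned sequences; returns {sequence: [ids]} in first-seen order."""
    groups = defaultdict(list)
    for seq_id, seq in records:
        groups[seq].append(seq_id)
    return dict(groups)


def hamming(a, b):
    """Count mismatching columns between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(seqs):
    """Kruskal's algorithm over the complete Hamming-distance graph (assumed choice)."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the component root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (hamming(seqs[i], seqs[j]), i, j)
        for i, j in combinations(range(len(seqs)), 2)
    )
    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so keep it
            parent[root_i] = root_j
            tree.append((i, j, dist))
    return tree


# Toy records taken from the standard input example in this README.
records = [
    ("Ind01_NK", "ACTGACTG"),
    ("Ind02_RI", "ACTGATTA"),
    ("Ind03_RI", "ACTGATTA"),
]
haplotypes = collapse(records)
seqs = list(haplotypes)
tree = minimum_spanning_tree(seqs)
```

With these three records, two haplotypes remain after collapsing, and the tree consists of a single edge of length 2 connecting them.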
Installation
pip install hapnet
To check the installed version:
hapnet --version
Basic usage
For a standard aligned FASTA file in which population identity is encoded in the sequence header, run:
hapnet input.fasta --out network.svg --log-prefix run1
This writes a network figure and a set of TSV output files using the prefix run1.
For a cleaner publication-style figure without haplotype labels inside nodes, use:
hapnet input.fasta --out network_unlabeled.svg --log-prefix run1 --hide-labels
Haplotype identities remain documented in the output tables even when labels are hidden from the figure.
Standard input format
By default, HapNet parses population identity from the final underscore-delimited token in each FASTA header.
Example:
>Ind01_NK
ACTGACTG
>Ind02_RI
ACTGATTA
>Ind03_RI
ACTGATTA
In this example, Ind01_NK is interpreted as individual or sequence Ind01 from population NK.
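The header rule can be illustrated with a short sketch. The function name `split_header` is hypothetical, and how HapNet itself handles headers without an underscore is not documented here, so the error below is an assumption.

```python
def split_header(header):
    """Split a FASTA header into (individual, population) at the final underscore."""
    name = header.lstrip(">").strip()
    individual, sep, population = name.rpartition("_")
    if not sep:
        # Assumed behavior: reject headers that carry no population token.
        raise ValueError(f"no underscore-delimited population token in {name!r}")
    return individual, population


print(split_header(">Ind01_NK"))      # ('Ind01', 'NK')
print(split_header(">Sample_07_RI"))  # ('Sample_07', 'RI')
```

Because only the final token is taken as the population, underscores earlier in the header remain part of the individual or sequence name.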
Run:
hapnet examples/basic/basic.fasta --out basic.svg --log-prefix basic
Phased diploid input
HapNet v0.2.0 adds optional support for already phased diploid sequences. HapNet does not infer phase. It preserves individual identity across phased allele copies generated by external software.
Header format in phased mode:
>Ind01_a_NK
ACTGACTG
>Ind01_b_NK
ACTGATTA
>Ind02_a_RI
ACTGACTG
>Ind02_b_RI
ACTGACTG
Run:
hapnet examples/phased/phased_example.fasta --phased --out phased.svg --log-prefix phased
This writes all standard HapNet output files plus an individual-level genotype table:
phased_individual_genotypes.tsv
This table summarizes the haplotypes carried by each phased individual and classifies each individual as homozygous or heterozygous with respect to the haplotypes identified by HapNet.
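The homozygous/heterozygous classification can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the header is assumed to split into individual, allele, and population at the last two underscores, and the sketch does not reproduce the column layout of phased_individual_genotypes.tsv, which is not described here.

```python
from collections import defaultdict


def parse_phased_header(header):
    """Split 'Ind01_a_NK' into (individual, allele, population)."""
    parts = header.lstrip(">").strip().rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"expected individual_allele_population, got {header!r}")
    individual, allele, population = parts
    return individual, allele, population


def classify_individuals(records):
    """Label each individual by whether its two phased allele copies are identical."""
    copies = defaultdict(list)
    for header, seq in records:
        individual, _allele, _population = parse_phased_header(header)
        copies[individual].append(seq)

    genotypes = {}
    for individual, seqs in copies.items():
        if len(seqs) != 2:
            raise ValueError(f"{individual} has {len(seqs)} allele copies, expected 2")
        genotypes[individual] = "homozygous" if seqs[0] == seqs[1] else "heterozygous"
    return genotypes


# Records from the phased header example above.
records = [
    (">Ind01_a_NK", "ACTGACTG"),
    (">Ind01_b_NK", "ACTGATTA"),
    (">Ind02_a_RI", "ACTGACTG"),
    (">Ind02_b_RI", "ACTGACTG"),
]
genotypes = classify_individuals(records)
```

For the example records, Ind01 carries two different sequences and is heterozygous, while Ind02 carries two identical copies and is homozygous.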
Metadata file option
Instead of encoding population and allele information directly in FASTA headers, users can provide a tab-delimited metadata file.
Example metadata file:
sequence_id individual_id allele population
Ind01_a Ind01 a NK
Ind01_b Ind01 b NK
Ind02_a Ind02 a RI
Ind02_b Ind02 b RI
Run:
hapnet input.fasta --metadata metadata.tsv --phased --out network.svg --log-prefix run1
Metadata values override header parsing. This option is useful when FASTA headers do not follow the default underscore-delimited format.
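A metadata file in this layout can be read with the standard `csv` module. The sketch below is illustrative: the function names are hypothetical, and the per-sequence fallback to header parsing when a sequence is absent from the metadata is an assumption, not documented HapNet behavior.

```python
import csv
import io

# Tab-delimited text matching the example metadata file above.
METADATA_TEXT = (
    "sequence_id\tindividual_id\tallele\tpopulation\n"
    "Ind01_a\tInd01\ta\tNK\n"
    "Ind01_b\tInd01\tb\tNK\n"
    "Ind02_a\tInd02\ta\tRI\n"
    "Ind02_b\tInd02\tb\tRI\n"
)


def load_metadata(handle):
    """Index metadata rows by sequence_id."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sequence_id"]: row for row in reader}


def population_for(sequence_id, metadata):
    """Prefer the metadata value; otherwise fall back to the final header token (assumed)."""
    row = metadata.get(sequence_id)
    if row is not None:
        return row["population"]
    return sequence_id.rpartition("_")[2]


metadata = load_metadata(io.StringIO(METADATA_TEXT))
print(population_for("Ind01_a", metadata))  # NK, taken from the metadata file
```

To read a real file instead of the inline text, pass `open("metadata.tsv", newline="")` to `load_metadata`.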
Ambiguous and missing data
By default, ambiguous characters and gaps are treated as literal character states during Hamming-distance calculation. To ignore positions containing N, ?, -, or . in either sequence during pairwise comparisons, use:
hapnet input.fasta --ignore-ambiguous --out network.svg --log-prefix run1
This option may be useful for alignments containing missing data, ambiguous base calls, or gaps.
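The difference between the two modes can be shown with a small sketch. The function name and the `ignore_ambiguous` parameter are hypothetical and only mirror the command-line flag; the set of skipped characters is the one listed above, matched exactly as written (uppercase N only).

```python
IGNORED = set("N?-.")


def hamming(a, b, ignore_ambiguous=False):
    """Hamming distance between aligned sequences, optionally skipping ambiguous columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    distance = 0
    for x, y in zip(a, b):
        if ignore_ambiguous and (x in IGNORED or y in IGNORED):
            continue  # skip the column if either sequence has N, ?, - or .
        if x != y:
            distance += 1
    return distance


print(hamming("ACTGNCTG", "ACTGACTA"))                         # 2: N counts as a mismatch
print(hamming("ACTGNCTG", "ACTGACTA", ignore_ambiguous=True))  # 1: the N column is skipped
```

In the default mode, two sequences that differ only by an N would be collapsed into separate haplotypes, so the choice of mode can change the number of nodes in the graph.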
Figure-label options
By default, HapNet displays haplotype labels such as H1, H2, and H3 inside network nodes.
To hide haplotype labels for a cleaner publication figure:
hapnet input.fasta --out network_unlabeled.svg --log-prefix run1 --hide-labels
To include haplotype counts inside node labels:
hapnet input.fasta --out network_counts.svg --log-prefix run1 --show-counts-in-label
The figure-label options affect only the network image. Haplotype identities, sequence membership, and population membership are still recorded in the TSV output files.
Main output files
For --log-prefix run1, HapNet writes:
- run1_haplotypes.tsv: haplotype IDs, frequencies, population composition, and sequences
- run1_membership.tsv: sequence-to-haplotype membership
- run1_shared_haplotypes.tsv: haplotypes shared among two or more populations
- run1_haplotype_individuals.tsv: individuals represented in each haplotype
- run1_summary.tsv: summary statistics, including number of sequences, haplotypes, shared haplotypes, private haplotypes, and largest haplotype size
- run1_run_metadata.tsv: run metadata for reproducibility, including HapNet version, input file, output file, algorithm, distance metric, and parameter settings
- run1_individual_genotypes.tsv: phased individual genotype table, written only when --phased is used
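In a scripted pipeline it can be handy to check that a run produced every expected table. The helper below is a hypothetical sketch that derives the file names from the naming pattern listed above; it is not part of HapNet.

```python
from pathlib import Path

# Suffixes taken from the output file list above.
STANDARD_SUFFIXES = [
    "haplotypes",
    "membership",
    "shared_haplotypes",
    "haplotype_individuals",
    "summary",
    "run_metadata",
]


def expected_outputs(log_prefix, phased=False):
    """Return the TSV paths a run with this --log-prefix is expected to write."""
    suffixes = list(STANDARD_SUFFIXES)
    if phased:
        suffixes.append("individual_genotypes")  # written only with --phased
    return [Path(f"{log_prefix}_{suffix}.tsv") for suffix in suffixes]


def missing_outputs(log_prefix, phased=False):
    """List the expected files that do not exist on disk."""
    return [path for path in expected_outputs(log_prefix, phased) if not path.exists()]


for path in expected_outputs("run1", phased=True):
    print(path)
```

Calling `missing_outputs("run1")` after a run returns an empty list when all six standard tables are present.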
Scripted workflow example
HapNet can be integrated into a shell workflow that processes many aligned FASTA files. For example, the following loop generates one network figure and one set of output tables for every FASTA file in an alignments/ directory:
mkdir -p networks results
for fasta in alignments/*.fasta
do
    base=$(basename "$fasta" .fasta)
    hapnet "$fasta" \
        --out "networks/${base}_network.svg" \
        --log-prefix "results/${base}" \
        --hide-labels
done
This workflow is useful for comparative barcoding, phylogeographic, taxonomic, or teaching datasets where many aligned single-locus FASTA files need to be processed consistently.
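The same batch run can be driven from Python with `subprocess`. This sketch uses only the command-line options shown in this README; the function names and the default directory names are illustrative, and it assumes the `hapnet` command is on the PATH.

```python
import subprocess
from pathlib import Path


def build_command(fasta, network_dir="networks", results_dir="results"):
    """Build the hapnet argument list for one aligned FASTA file."""
    fasta = Path(fasta)
    base = fasta.stem  # file name without the trailing .fasta
    return [
        "hapnet",
        str(fasta),
        "--out", str(Path(network_dir) / f"{base}_network.svg"),
        "--log-prefix", str(Path(results_dir) / base),
        "--hide-labels",
    ]


def run_all(alignment_dir="alignments", network_dir="networks", results_dir="results"):
    """Run hapnet once per FASTA file; stops at the first failing run."""
    Path(network_dir).mkdir(parents=True, exist_ok=True)
    Path(results_dir).mkdir(parents=True, exist_ok=True)
    for fasta in sorted(Path(alignment_dir).glob("*.fasta")):
        subprocess.run(build_command(fasta, network_dir, results_dir), check=True)


# Call run_all() to process every FASTA file in alignments/.
```

Passing the arguments as a list avoids shell quoting problems with unusual file names, and `check=True` raises an exception if any run exits with a non-zero status.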
Example: standard mitochondrial workflow
A typical mitochondrial COI workflow uses an aligned FASTA file with population identity encoded in the sequence headers.
hapnet examples/polydora_websteri/Pwebsteri_revised.fasta \
    --out examples/polydora_websteri/polydorawebsteri_network.svg \
    --log-prefix examples/polydora_websteri/polydorawebsteri
For a cleaner publication-style figure:
hapnet examples/polydora_websteri/Pwebsteri_revised.fasta \
    --out examples/polydora_websteri/polydorawebsteri_network_unlabeled.svg \
    --log-prefix examples/polydora_websteri/polydorawebsteri \
    --hide-labels
Command-line help
To see all available options:
hapnet --help
Citation
If you use HapNet, please cite the associated manuscript when available.