Population-aware, MST-based haplotype graph analysis and visualization in Python.

HapNet

HapNet is a lightweight Python command-line package for constructing population-aware, minimum-spanning-tree-based haplotype graphs from aligned FASTA files.

HapNet is designed for reproducible single-locus workflows. It collapses identical aligned sequences into haplotypes, calculates pairwise Hamming distances among haplotypes, constructs a minimum spanning tree (MST), plots a population-aware haplotype graph, and writes machine-readable TSV summaries describing haplotype composition, population membership, individual membership, shared/private haplotypes, and run metadata.
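The steps described above can be illustrated with a minimal sketch. This is not HapNet's source code: the function names (`collapse_haplotypes`, `hamming`, `minimum_spanning_tree`), the H1, H2, ... numbering by order of first appearance, and the choice of Kruskal's algorithm are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def collapse_haplotypes(records):
    """Group identical aligned sequences into haplotypes.

    Returns {haplotype_id: (sequence, [sequence names])}.
    Numbering by order of first appearance is an assumption.
    """
    groups = defaultdict(list)
    for name, seq in records:
        groups[seq].append(name)
    return {
        f"H{i}": (seq, names)
        for i, (seq, names) in enumerate(groups.items(), start=1)
    }


def hamming(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """Kruskal's algorithm on the complete graph of haplotype distances."""
    parent = {h: h for h in haplotypes}

    def find(h):
        # Walk up to the set representative, compressing the path as we go.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    edges = sorted(
        (hamming(haplotypes[a][0], haplotypes[b][0]), a, b)
        for a, b in combinations(haplotypes, 2)
    )
    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree


records = [
    ("Ind01_NK", "ACTGACTG"),
    ("Ind02_RI", "ACTGATTA"),
    ("Ind03_RI", "ACTGATTA"),
]
haps = collapse_haplotypes(records)
mst = minimum_spanning_tree(haps)
print(mst)
```

With these three records, the two identical RI sequences collapse into a single haplotype, so the tree has one edge joining two haplotypes that differ at two positions.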

Important scope note: HapNet constructs an MST-based haplotype graph. It does not infer median-joining, statistical-parsimony, or reticulate haplotype networks.

Installation

pip install hapnet

To check the installed version:

hapnet --version

Basic usage

For a standard aligned FASTA file in which population identity is encoded in the sequence header, run:

hapnet input.fasta --out network.svg --log-prefix run1

This writes a network figure and a set of TSV output files using the prefix run1.

For a cleaner publication-style figure without haplotype labels inside nodes, use:

hapnet input.fasta --out network_unlabeled.svg --log-prefix run1 --hide-labels

Haplotype identities remain documented in the output tables even when labels are hidden from the figure.

Standard input format

By default, HapNet parses population identity from the final underscore-delimited token in each FASTA header.

Example:

>Ind01_NK
ACTGACTG
>Ind02_RI
ACTGATTA
>Ind03_RI
ACTGATTA

In this example, Ind01_NK is interpreted as individual or sequence Ind01 from population NK.
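The default header rule can be sketched in a few lines of Python. The helper name `parse_header` and its error handling are hypothetical and are not part of HapNet's API. The sketch only shows how splitting on the final underscore separates the individual from the population.

```python
def parse_header(header):
    """Split a FASTA header into (individual, population).

    The population is taken from the final underscore-delimited token.
    """
    name = header.lstrip(">").strip()
    individual, sep, population = name.rpartition("_")
    if not sep or not individual or not population:
        raise ValueError(f"cannot parse population from header: {header!r}")
    return individual, population


print(parse_header(">Ind01_NK"))
print(parse_header(">Sample_A_RI"))
```

Because only the last underscore matters under this rule, earlier underscores stay in the individual name: `Sample_A_RI` yields individual `Sample_A` and population `RI`.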

Run:

hapnet examples/basic/basic.fasta --out basic.svg --log-prefix basic

Phased diploid input

HapNet v0.2.0 adds optional support for already phased diploid sequences. HapNet does not infer phase itself. It expects allele copies phased by external software and preserves individual identity across them.

Header format in phased mode (individual, allele label, and population, separated by underscores):

>Ind01_a_NK
ACTGACTG
>Ind01_b_NK
ACTGATTA
>Ind02_a_RI
ACTGACTG
>Ind02_b_RI
ACTGACTG

Run:

hapnet examples/phased/phased_example.fasta --phased --out phased.svg --log-prefix phased

This writes all standard HapNet output files plus an individual-level genotype table:

phased_individual_genotypes.tsv

This table lists the haplotypes carried by each phased individual and classifies each individual as homozygous (both allele copies assigned to the same haplotype) or heterozygous (allele copies assigned to different haplotypes).
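The homozygous/heterozygous classification can be sketched as follows. This is an illustrative assumption about the logic, not HapNet's implementation: the helpers `parse_phased_header` and `classify_individuals` are hypothetical, and the sketch compares the two allele sequences directly rather than going through haplotype IDs.

```python
from collections import defaultdict


def parse_phased_header(header):
    """Split a phased header such as 'Ind01_a_NK' into its three fields."""
    parts = header.lstrip(">").strip().rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected individual_allele_population: {header!r}")
    individual, allele, population = parts
    return individual, allele, population


def classify_individuals(records):
    """Return {individual: 'homozygous' | 'heterozygous'} for phased diploids."""
    copies = defaultdict(list)
    for header, seq in records:
        individual, _allele, _population = parse_phased_header(header)
        copies[individual].append(seq)

    genotypes = {}
    for individual, seqs in copies.items():
        if len(seqs) != 2:
            raise ValueError(f"{individual} has {len(seqs)} allele copies, expected 2")
        # Identical allele copies collapse to the same haplotype.
        genotypes[individual] = "homozygous" if seqs[0] == seqs[1] else "heterozygous"
    return genotypes


records = [
    (">Ind01_a_NK", "ACTGACTG"),
    (">Ind01_b_NK", "ACTGATTA"),
    (">Ind02_a_RI", "ACTGACTG"),
    (">Ind02_b_RI", "ACTGACTG"),
]
genotypes = classify_individuals(records)
print(genotypes)
```

For the example headers above, Ind01 carries two different sequences and Ind02 carries two identical ones.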

Metadata file option

Instead of encoding population and allele information directly in FASTA headers, users can provide a tab-delimited metadata file.

Example metadata file:

sequence_id	individual_id	allele	population
Ind01_a	Ind01	a	NK
Ind01_b	Ind01	b	NK
Ind02_a	Ind02	a	RI
Ind02_b	Ind02	b	RI

Run:

hapnet input.fasta --metadata metadata.tsv --phased --out network.svg --log-prefix run1

Metadata values override header parsing. This option is useful when FASTA headers do not follow the default underscore-delimited format.
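A metadata table in this layout can be checked before a run with Python's standard `csv` module. The sketch below is a hypothetical pre-flight check, not HapNet code. It assumes that all four columns shown in the example are required, which the text above does not state explicitly.

```python
import csv
import io

# Column names taken from the example metadata file; treating all four
# as mandatory is an assumption.
REQUIRED_COLUMNS = ("sequence_id", "individual_id", "allele", "population")


def load_metadata(handle):
    """Read a tab-delimited metadata table into {sequence_id: row}."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"metadata is missing columns: {missing}")
    return {row["sequence_id"]: row for row in reader}


example = (
    "sequence_id\tindividual_id\tallele\tpopulation\n"
    "Ind01_a\tInd01\ta\tNK\n"
    "Ind01_b\tInd01\tb\tNK\n"
    "Ind02_a\tInd02\ta\tRI\n"
    "Ind02_b\tInd02\tb\tRI\n"
)
metadata = load_metadata(io.StringIO(example))
print(metadata["Ind02_a"]["population"])
```

For a real file, open it with `open("metadata.tsv", newline="")` and pass the handle to `load_metadata`.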

Ambiguous and missing data

By default, ambiguous characters and gaps are treated as literal character states during Hamming-distance calculation. To ignore positions containing N, ?, -, or . in either sequence during pairwise comparisons, use:

hapnet input.fasta --ignore-ambiguous --out network.svg --log-prefix run1

This option may be useful for alignments containing missing data, ambiguous base calls, or gaps.
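The difference between the two modes can be shown with a small sketch. The function below is a hypothetical reimplementation of the behavior described here, not HapNet's own code. It matches only the uppercase N and the three symbols listed above, and how HapNet treats lowercase or other IUPAC codes is not covered.

```python
# Characters skipped in ignore mode, as listed in the text above.
IGNORED = frozenset("N?-.")


def hamming(a, b, ignore_ambiguous=False):
    """Hamming distance between two aligned sequences.

    With ignore_ambiguous=True, a position is skipped when either
    sequence has N, ?, - or . at that position.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    distance = 0
    for x, y in zip(a, b):
        if ignore_ambiguous and (x in IGNORED or y in IGNORED):
            continue
        if x != y:
            distance += 1
    return distance


print(hamming("ACNT", "ACGT"))                         # N counts as a literal state
print(hamming("ACNT", "ACGT", ignore_ambiguous=True))  # the N position is skipped
```

Under the default behavior, a sequence with a single missing base is a distinct haplotype from its otherwise identical neighbor. With ambiguous positions ignored, the pairwise distance between the two drops to zero.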

Figure-label options

By default, HapNet displays haplotype labels such as H1, H2, and H3 inside network nodes.

To hide haplotype labels for a cleaner publication figure:

hapnet input.fasta --out network_unlabeled.svg --log-prefix run1 --hide-labels

To include haplotype counts inside node labels:

hapnet input.fasta --out network_counts.svg --log-prefix run1 --show-counts-in-label

The figure-label options affect only the network image. Haplotype identities, sequence membership, and population membership are still recorded in the TSV output files.

Main output files

For --log-prefix run1, HapNet writes:

  • run1_haplotypes.tsv: haplotype IDs, frequencies, population composition, and sequences
  • run1_membership.tsv: sequence-to-haplotype membership
  • run1_shared_haplotypes.tsv: haplotypes shared among two or more populations
  • run1_haplotype_individuals.tsv: individuals represented in each haplotype
  • run1_summary.tsv: summary statistics, including number of sequences, haplotypes, shared haplotypes, private haplotypes, and largest haplotype size
  • run1_run_metadata.tsv: run metadata for reproducibility, including HapNet version, input file, output file, algorithm, distance metric, and parameter settings
  • run1_individual_genotypes.tsv: phased individual genotype table, written only when --phased is used
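The shared/private distinction reported in these tables can be sketched as below. Shared haplotypes follow the definition given above (present in two or more populations). Treating private haplotypes as those found in exactly one population is an assumption, as is the helper name `classify_haplotypes`.

```python
from collections import defaultdict


def classify_haplotypes(membership):
    """Split haplotypes into shared and private.

    membership: iterable of (haplotype_id, population) pairs, one per sequence.
    """
    populations = defaultdict(set)
    for haplotype, population in membership:
        populations[haplotype].add(population)
    shared = sorted(h for h, pops in populations.items() if len(pops) >= 2)
    private = sorted(h for h, pops in populations.items() if len(pops) == 1)
    return shared, private


membership = [("H1", "NK"), ("H1", "RI"), ("H2", "RI"), ("H2", "RI")]
shared, private = classify_haplotypes(membership)
print(shared, private)
```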

Scripted workflow example

HapNet can be integrated into a shell workflow that processes many aligned FASTA files. For example, the following loop generates one network figure and one set of output tables for every FASTA file in an alignments/ directory:

mkdir -p networks results

for fasta in alignments/*.fasta
do
    base=$(basename "$fasta" .fasta)

    hapnet "$fasta" \
      --out "networks/${base}_network.svg" \
      --log-prefix "results/${base}" \
      --hide-labels
done

This workflow is useful for comparative barcoding, phylogeographic, taxonomic, or teaching datasets where many aligned single-locus FASTA files need to be processed consistently.
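The same batch run can be driven from Python when a workflow is already scripted in that language. The sketch below is one possible equivalent of the shell loop. It uses only the command-line flags documented above, and the helper names `build_command` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(fasta, network_dir="networks", results_dir="results"):
    """Build the hapnet command line for one aligned FASTA file."""
    base = Path(fasta).stem  # file name without the .fasta extension
    return [
        "hapnet",
        str(fasta),
        "--out", str(Path(network_dir) / f"{base}_network.svg"),
        "--log-prefix", str(Path(results_dir) / base),
        "--hide-labels",
    ]


def run_all(alignment_dir="alignments"):
    """Run hapnet once per FASTA file, stopping at the first failure."""
    Path("networks").mkdir(exist_ok=True)
    Path("results").mkdir(exist_ok=True)
    for fasta in sorted(Path(alignment_dir).glob("*.fasta")):
        subprocess.run(build_command(fasta), check=True)
```

Calling `run_all("alignments")` processes every file in turn. Passing `check=True` makes a failed run raise an error instead of silently continuing to the next file.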

Example: standard mitochondrial workflow

A typical mitochondrial COI workflow uses an aligned FASTA file with population identity encoded in the sequence headers.

hapnet examples/polydora_websteri/Pwebsteri_revised.fasta \
  --out examples/polydora_websteri/polydorawebsteri_network.svg \
  --log-prefix examples/polydora_websteri/polydorawebsteri

For a cleaner publication-style figure:

hapnet examples/polydora_websteri/Pwebsteri_revised.fasta \
  --out examples/polydora_websteri/polydorawebsteri_network_unlabeled.svg \
  --log-prefix examples/polydora_websteri/polydorawebsteri \
  --hide-labels

Command-line help

To see all available options:

hapnet --help

Citation

If you use HapNet, please cite the associated manuscript when available.
