Hidden Markov Model for calling chromatin footprints from fiber-seq and DAF-seq data

These details have not been verified by PyPI

Project links

Project description

FiberHMM

Hidden Markov Model toolkit for calling chromatin footprints from Fiber-seq, DAF-seq, and other single-molecule footprinting data.

FiberHMM identifies protected regions (footprints) and accessible regions (methylase-sensitive patches, MSPs) from single-molecule DNA modification data, including m6A methylation (fiber-seq) and deamination marks (DAF-seq).

Key Features

No genome context files -- hexamer context computed directly from read sequences
Fibertools-compatible output -- tagged BAM with ns/nl/as/al tags, ready for downstream tools
Native HMM implementation -- no hmmlearn dependency; Numba JIT optional for ~10x speedup
Streaming pipeline -- scales linearly with cores; supports stdin/stdout piping for composable workflows
Multi-platform -- supports PacBio fiber-seq, Nanopore fiber-seq, and DAF-seq
DAF-seq preprocessing -- fiberhmm-daf-encode converts plain aligned BAMs to IUPAC R/Y encoded BAMs
Legacy model support -- loads old hmmlearn-trained pickle/NPZ models seamlessly

Installation

Using pip

pip install fiberhmm

From source

git clone https://github.com/fiberseq/FiberHMM.git
cd FiberHMM
pip install -e .

Optional dependencies

pip install numba        # ~10x faster HMM computation
pip install matplotlib   # --stats visualization
pip install h5py         # HDF5 posteriors export

For bigBed output, install bedToBigBed from UCSC tools.

Quick Start

Fiber-seq (PacBio or Nanopore)

If you have a BAM with m6A modification tags (MM/ML), you can call footprints directly with a pre-trained model:

fiberhmm-apply -i experiment.bam -m models/hia5_pacbio.json -o output/ -c 8

DAF-seq

DAF-seq BAMs from minimap2 contain deamination events as C→T / G→A mismatches but lack the IUPAC encoding and strand tags that FiberHMM expects. Use fiberhmm-daf-encode to preprocess, then call footprints:

# Two-step
fiberhmm-daf-encode -i aligned.bam -o encoded.bam
fiberhmm-apply --mode daf -i encoded.bam -m models/ddda_pacbio.json -o output/ -c 8

# Or as a streaming pipeline (no intermediate file)
fiberhmm-daf-encode -i aligned.bam -o - | \
    fiberhmm-apply --mode daf -i - -m models/ddda_pacbio.json -o output/

If your BAM is missing MD tags (minimap2 wasn't run with --MD), add them first:

samtools calmd -b aligned.bam ref.fa | \
    fiberhmm-daf-encode -i - -o - | \
    fiberhmm-apply --mode daf -i - -m models/ddda_pacbio.json -o output/

Or pass the reference FASTA directly (slower):

fiberhmm-daf-encode -i aligned.bam -o encoded.bam --reference ref.fa

Extract to BED12/bigBed

fiberhmm-extract -i output/experiment_footprints.bam

Pre-trained Models

FiberHMM ships with pre-trained models in models/ ready for immediate use:

Model	File	Enzyme	Platform	Mode
Hia5 PacBio	`hia5_pacbio.json`	Hia5 (m6A)	PacBio	`pacbio-fiber`
Hia5 Nanopore	`hia5_nanopore.json`	Hia5 (m6A)	Nanopore	`nanopore-fiber`
DddA PacBio	`ddda_pacbio.json`	DddA (deamination)	PacBio	`daf`
DddB Nanopore	`dddb_nanopore.json`	DddB (deamination)	Nanopore	`daf`

Analysis Modes

Mode	Flag	Description	Target bases
PacBio fiber-seq	`--mode pacbio-fiber`	Default. m6A at A and T (both strands)	A, T (with RC)
Nanopore fiber-seq	`--mode nanopore-fiber`	m6A at A only (single strand)	A only
DAF-seq	`--mode daf`	Deamination at C/G (strand-specific)	C or G

Note on mode selection: The pacbio-fiber vs nanopore-fiber distinction only matters for Hia5 (m6A), where PacBio detects modifications on both strands while Nanopore detects only one. For deaminase-based methods (DddA, DddB), --mode daf is always used regardless of sequencing platform -- the chemistry is inherently strand-specific.

CLI Tools Reference

fiberhmm-daf-encode

Preprocess plain aligned DAF-seq BAMs for FiberHMM. Identifies C→T and G→A deamination mismatches via the MD tag, encodes them as IUPAC Y/R in the query sequence, and adds st:Z strand tags.

fiberhmm-daf-encode -i aligned.bam -o encoded.bam

Flag	Default	Description
`-i/--input`	required	Input BAM, or `-` for stdin
`-o/--output`	required	Output BAM, or `-` for stdout
`--reference`	none	Reference FASTA (fallback if MD tag missing)
`-q/--min-mapq`	20	Min mapping quality
`--min-read-length`	1000	Min aligned read length (bp)
`--io-threads`	4	htslib I/O threads
`--strand`	auto	Force strand: `CT`, `GA`, or `auto` (per-read consensus)

The encoder determines conversion strand per read by counting which mismatch type (C→T vs G→A) dominates. Reads with no mismatches or equal counts are skipped. All C→T or G→A mismatches are encoded (the HMM handles noise via emission probabilities). Existing tags (including MM/ML for dual-labeling) are preserved.

fiberhmm-apply

Apply a trained HMM to call footprints. Uses a streaming pipeline that scales with -c cores and supports stdin/stdout piping.

# Multi-core (streaming is the default)
fiberhmm-apply -i experiment.bam -m model.json -o output/ -c 8

# Stdin/stdout piping
fiberhmm-apply -i - -m model.json -o output/

Flag	Default	Description
`-i/--input`	required	Input BAM, or `-` for stdin
`-m/--model`	required	Trained HMM model (.json, .npz, or .pickle)
`-o/--outdir`	required	Output directory, or `-` for stdout BAM
`--mode`	from model	Analysis mode (see table above)
`-c/--cores`	1	CPU cores (0 = auto-detect)
`--io-threads`	4	htslib I/O threads
`-q/--min-mapq`	20	Min mapping quality
`--min-read-length`	1000	Min aligned read length (bp)
`-e/--edge-trim`	10	Edge masking (bp)
`--scores`	false	Compute per-footprint confidence scores
`--skip-scaffolds`	false	Skip scaffold/contig chromosomes
`--chroms`	all	Process only these chromosomes
`--msp-min-size`	60	Minimum MSP region size (bp)
`--primary`	false	Only process primary alignments

Read Filtering

By default, reads are skipped (written unchanged without footprint tags) when:

Reason	Default	Override
MAPQ too low	< 20	`--min-mapq 0`
Aligned length too short	< 1000 bp	`--min-read-length 0`
No MM/ML modification tags	--	Cannot override (no data to process)
Unmapped	--	Cannot override
No footprints detected	HMM found nothing	Cannot override

fiberhmm-probs

Generate emission probability tables from accessible and inaccessible control BAMs.

fiberhmm-probs \
    -a accessible_control.bam \
    -u inaccessible_control.bam \
    -o probs/ \
    --mode pacbio-fiber \
    -k 3 4 5 6 \
    --stats

fiberhmm-train

Train the HMM on BAM data using precomputed emission probabilities.

fiberhmm-train \
    -i sample.bam \
    -p probs/tables/accessible_A_k3.tsv probs/tables/inaccessible_A_k3.tsv \
    -o models/ \
    -k 3 \
    --stats

Output:

models/
├── best-model.json      # Primary format (recommended)
├── best-model.npz       # NumPy format
├── all_models.json      # All training iterations
├── training-reads.tsv   # Read IDs used for training
├── config.json          # Training parameters
└── plots/               # (with --stats)

fiberhmm-extract

Extract footprint/MSP/m6A/m5C features from tagged BAMs to BED12/bigBed.

fiberhmm-extract -i output/sample_footprints.bam -o output/ -c 8

# Extract only footprints
fiberhmm-extract -i output/sample_footprints.bam --footprint

# Keep BED files alongside bigBed
fiberhmm-extract -i output/sample_footprints.bam --keep-bed

fiberhmm-posteriors

Export per-position HMM posterior probabilities for downstream analysis (e.g., CNN training, custom scoring). Runs the HMM forward-backward algorithm on each read to produce P(footprint) at every position.

Input is the same BAM you would pass to fiberhmm-apply (any BAM with MM/ML modification tags or IUPAC-encoded DAF-seq reads). This is a parallel pipeline, not a downstream step.

# Export to gzipped TSV (no extra deps)
fiberhmm-posteriors -i experiment.bam -m model.json -o posteriors.tsv.gz -c 4

# Export to HDF5 (requires: pip install h5py)
fiberhmm-posteriors -i experiment.bam -m model.json -o posteriors.h5 -c 4

fiberhmm-utils

Consolidated utility for model and probability management.

convert -- Convert legacy pickle/NPZ models to JSON:

fiberhmm-utils convert old_model.pickle new_model.json

inspect -- Print model metadata, parameters, and emission statistics:

fiberhmm-utils inspect model.json
fiberhmm-utils inspect model.json --full   # full emission table

transfer -- Transfer emission probabilities between modalities (e.g., fiber-seq to DAF-seq) using accessibility priors from a matched cell type:

fiberhmm-utils transfer \
    --target daf_sample.bam \
    --reference-bam fiberseq_footprints.bam \
    -o daf_probs/ \
    --mode daf \
    --stats

adjust -- Scale emission probabilities in a model (clamped to [0, 1]):

fiberhmm-utils adjust model.json --state accessible --scale 1.1 -o adjusted.json

Output

BAM Tags

fiberhmm-apply adds footprint tags compatible with the fibertools ecosystem:

Tag	Type	Description
`ns`	B,I	Nucleosome/footprint starts (0-based query coords)
`nl`	B,I	Nucleosome/footprint lengths
`as`	B,I	Accessible/MSP starts
`al`	B,I	Accessible/MSP lengths
`nq`	B,C	Footprint quality scores (0-255, with `--scores`)
`aq`	B,C	MSP quality scores (0-255, with `--scores`)

For DAF-seq, fiberhmm-daf-encode also adds:

Tag	Type	Description
`st`	Z	Conversion strand: `CT` (+ strand, C→T) or `GA` (- strand, G→A)

Streaming Pipelines

All FiberHMM tools support - for stdin/stdout, enabling Unix-style piping with no intermediate files:

# DAF-seq: align → encode → call footprints
minimap2 --MD -a ref.fa reads.fq | samtools view -b | \
    fiberhmm-daf-encode -i - -o - | \
    fiberhmm-apply --mode daf -i - -m models/ddda_pacbio.json -o output/

# Fiber-seq: call footprints from stdin
samtools view -b -h input.bam chr1 | \
    fiberhmm-apply -i - -m model.json -o output/

When writing to stdout (-o -), the output is unsorted BAM. Sort and index downstream if needed:

fiberhmm-apply -i input.bam -m model.json -o - | \
    samtools sort -o sorted.bam && samtools index sorted.bam

Pipe FiberHMM footprints directly to ft fire for FIRE element calling with no intermediate files:

fiberhmm-apply -i input.bam -m model.json -o - -c 8 | \
    ft fire - output_fire.bam

When writing to a file, FiberHMM automatically sorts and indexes the output BAM.

Fibertools Integration

FiberHMM produces BAM output with the same tag conventions used by fibertools:

ns/nl for nucleosome footprints
as/al for methylase-sensitive patches (MSPs)
nq/aq for quality scores

This means FiberHMM output can be used directly with any tool in the fibertools ecosystem, including ft extract, downstream analysis pipelines, and genome browsers that support fibertools-style BAM tags.

Performance Tips

Use multiple cores: -c 8 (or more) for parallel processing
Use --io-threads: for BAM compression/decompression (default: 4)
Skip scaffolds: --skip-scaffolds to avoid thousands of small contigs
Install numba: pip install numba for ~10x faster HMM computation
Pipe directly: use -o - to pipe to downstream tools (e.g., ft fire) with no intermediate files

Model Formats

Format	Extension	Description
JSON	`.json`	Primary format -- portable, human-readable
NPZ	`.npz`	NumPy archive -- supported for loading
Pickle	`.pickle`	Legacy format -- supported for loading

New models are always saved in JSON. Convert legacy models with:

fiberhmm-utils convert old_model.pickle new_model.json

License

MIT License. See LICENSE for details.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

2.10.5

Apr 24, 2026

2.10.4

Apr 24, 2026

2.10.3

Apr 23, 2026

2.10.2

Apr 23, 2026

2.10.1

Apr 23, 2026

2.10.0

Apr 22, 2026

2.9.7

Apr 22, 2026

2.9.0

Apr 17, 2026

2.8.2

Apr 17, 2026

2.8.1

Apr 17, 2026

2.8.0

Apr 16, 2026

2.7.1

Apr 16, 2026

2.7.0

Apr 16, 2026

2.6.9

Apr 16, 2026

2.6.8

Apr 16, 2026

2.6.7

Apr 16, 2026

2.6.6

Apr 16, 2026

2.6.5

Apr 14, 2026

2.6.4

Apr 14, 2026

2.6.3

Apr 13, 2026

2.6.2

Apr 13, 2026

2.6.1

Apr 13, 2026

2.6.0

Apr 13, 2026

This version

2.5.0

Apr 10, 2026

2.4.0

Apr 1, 2026

2.2.0

Apr 14, 2026

2.1.0

Mar 6, 2026

2.0.0

Mar 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fiberhmm-2.5.0.tar.gz (133.2 kB view details)

Uploaded Apr 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

fiberhmm-2.5.0-py3-none-any.whl (126.8 kB view details)

Uploaded Apr 10, 2026 Python 3

File details

Details for the file fiberhmm-2.5.0.tar.gz.

File metadata

Download URL: fiberhmm-2.5.0.tar.gz
Upload date: Apr 10, 2026
Size: 133.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for fiberhmm-2.5.0.tar.gz
Algorithm	Hash digest
SHA256	`4cc9a81974759988aaa2580280f499b9353edfd104daf09fe432a8fc0118c192`
MD5	`83ea66164a4262a40676d4c8e6a340c3`
BLAKE2b-256	`f0e0eefaf3a6c31bace57386c4cfcb54c27bc5476ca0f43dedf53425d23a8ac1`

See more details on using hashes here.

File details

Details for the file fiberhmm-2.5.0-py3-none-any.whl.

File metadata

Download URL: fiberhmm-2.5.0-py3-none-any.whl
Upload date: Apr 10, 2026
Size: 126.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for fiberhmm-2.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`918efc6ced470f817cb3999e3bb93e5eb146193ccfc9ef24b23b5242f2a710ed`
MD5	`c4a1a5fbef35e5f7803f234ace941975`
BLAKE2b-256	`768c90b7d298f49b2da491071ea7685d70f43279fa74d24b92ee98b783667f1f`

See more details on using hashes here.

fiberhmm 2.5.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

FiberHMM

Key Features

Installation

Using pip

From source

Optional dependencies

Quick Start

Fiber-seq (PacBio or Nanopore)

DAF-seq

Extract to BED12/bigBed

Pre-trained Models

Analysis Modes

CLI Tools Reference

fiberhmm-daf-encode

fiberhmm-apply

Read Filtering

fiberhmm-probs

fiberhmm-train

fiberhmm-extract

fiberhmm-posteriors

fiberhmm-utils

Output

BAM Tags

Streaming Pipelines

Fibertools Integration

Performance Tips

Model Formats

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes