🧬 ont-end-reason

Comprehensive CLI for Oxford Nanopore end_reason analysis. Discover, tag, filter, analyse, and visualise read-termination patterns.

🚀 → Interactive dashboard & tutorials

Companion to the end-reason paper.



Why this tool

Oxford Nanopore sequencers tag every read with an end_reason explaining why sequencing stopped. A read can have high base quality (Q>20) and still be truncated or rejected by adaptive sampling — filtering by Q-score alone is not enough for accurate downstream analysis.

ont-end-reason unifies the eight published analyses from the end-reason paper into a single PyPI-installable CLI, including the paper's novel posterior length model for adaptive-sampling-truncated reads.

Before this tool, the analyses lived in scattered scripts inside End_Reason_Manuscript/pipeline/bin/ (now archived). Every script was promoted to this repo with provenance headers crediting commit b47166a of the source. The package is the canonical implementation going forward.


Install

# Static figures only (matplotlib)
pip install ont-end-reason

# + Plotly for interactive HTML reports
pip install "ont-end-reason[interactive]"

# Development (from source)
git clone https://github.com/Single-Molecule-Sequencing/ont-end-reason.git
cd ont-end-reason
pip install -e ".[dev,interactive]"

Python 3.10+ required. Tested on Linux + macOS, Python 3.10 through 3.13.


Quickstart

Five commands cover the canonical pipeline:

# 1. Inventory what's in a sequencing-run directory
ont-end-reason discover /path/to/run --manifest run.json

# 2. Tag a BAM with end_reason from sequencing_summary.txt
ont-end-reason tag --summary sequencing_summary.txt \
                   --bam aligned.bam --out tagged.bam

# 3. Filter to complete reads only (signal_positive)
ont-end-reason filter --bam tagged.bam --keep SP --out complete.bam

# 4. Run the paper's central novel analysis
ont-end-reason analyze umc-posterior sequencing_summary.txt --plot umc.pdf

# 5. Build a self-contained 6-section HTML report
ont-end-reason report interactive sequencing_summary.txt --out report.html

→ Full walkthrough with live charts on the dashboard.


The headline result

On the synthetic 5000-read test fixture:

$ ont-end-reason analyze umc-posterior tests/fixtures/sequencing_summary_synthetic.txt
UMC reads:              600
Prior class:            signal_positive  (log μ=8.488, log σ=0.600)
Observed mean length:        926.2 bp
Posterior expected mean:    5868.1 bp
Posterior bonus mean:       4941.9 bp/read
Posterior bonus total:       2,965,111 bp     ← ~3 Mb of unobserved sequence

On this fixture, adaptive-sampling truncation hides about 5.3× as much sequence as the UMC reads actually show (4941.9 bp of posterior bonus against 926.2 bp observed per read). The total scales linearly with the number of UMC reads, so on a real PromethION run with millions of UMC reads the estimate grows accordingly. This is what the paper's central analysis is for, and the tool exposes it as one command on any sequencing_summary.txt.
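The arithmetic behind the printed figures can be checked by hand. The sketch below uses only the numbers from the output above; the variable names are illustrative and are not part of the package.

```python
# Values copied from the CLI output above (synthetic fixture).
n_umc = 600
observed_mean = 926.2      # bp, mean observed UMC read length
posterior_mean = 5868.1    # bp, posterior expected true length

# Per-read "bonus": expected sequence that was never observed.
bonus_mean = posterior_mean - observed_mean

# Aggregate over all UMC reads, and express relative to what was observed.
bonus_total = n_umc * bonus_mean
bonus_ratio = bonus_mean / observed_mean

print(f"bonus mean:  {bonus_mean:.1f} bp/read")
print(f"bonus total: {bonus_total:,.0f} bp")
print(f"bonus ratio: {bonus_ratio:.2f}x")
```

The total recomputed from the rounded means differs from the printed 2,965,111 bp by a few tens of bases, which is within what rounding the per-read means to one decimal place would explain.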

[Figure: UMC posterior]


CLI surface

Discovery + filter operations

| Command | Purpose |
| --- | --- |
| ont-end-reason discover <path> | Walk a directory, inventory POD5 / Fast5 / summary / BAM / FASTQ files |
| ont-end-reason tag | Add end_reason tag to BAM reads from sequencing_summary.txt |
| ont-end-reason filter | Keep / drop BAM reads by end_reason short code |
| ont-end-reason export-fastq | Convert filtered BAM → FASTQ for NanoPack tools |
| ont-end-reason stats | Streaming QC summary from sequencing_summary.txt |
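To make "streaming" concrete, here is a minimal sketch of counting reads per end_reason in a single pass over a tab-separated summary. It is not the package's implementation; the function name count_end_reasons is hypothetical, and the column names read_id, sequence_length_template and end_reason are assumptions about the summary file's header.

```python
import csv
import io
from collections import Counter


def count_end_reasons(handle, column="end_reason"):
    """Count reads per end_reason value in one pass over a TSV file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        # Rows with a missing or empty value are grouped under "unknown".
        counts[row.get(column) or "unknown"] += 1
    return counts


# Tiny in-memory stand-in for a sequencing_summary.txt (made-up rows).
demo = io.StringIO(
    "read_id\tsequence_length_template\tend_reason\n"
    "r1\t5200\tsignal_positive\n"
    "r2\t910\tunblock_mux_change\n"
    "r3\t4800\tsignal_positive\n"
)
counts = count_end_reasons(demo)
for reason, n in counts.most_common():
    print(f"{reason}\t{n}")
```

Because rows are consumed one at a time, memory use depends on the number of distinct end_reason values rather than on the number of reads.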

Analysis (9 subcommands)

| Command | What it does |
| --- | --- |
| ont-end-reason analyze distribution | Per-end_reason counts + OK/CHECK/FAIL quality gate |
| ont-end-reason analyze length | Length distributions per end_reason (N50, percentiles) |
| ont-end-reason analyze quality | Q-score distributions with Gaussian Mixture Model fit |
| ont-end-reason analyze temporal | End_reason rates over sequencing-run time |
| ont-end-reason analyze hypothesis | Mann-Whitney U / KS tests with Cliff's Δ effect size |
| ont-end-reason analyze umc-posterior | Bayesian posterior on truncated UMC length (paper's central analysis) |
| ont-end-reason analyze signal-trace | Raw POD5 current trace extraction for a single read |
| ont-end-reason analyze sma-metrics | Optional bridge to the smaseq-qc package |
| ont-end-reason analyze tables | Generate summary/per-class/quality tables (TSV/CSV/md/LaTeX) |
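Two of the statistics named in this table are easy to state in code. The sketch below gives textbook definitions of N50 and Cliff's delta in plain Python. The function names and the example lengths are made up for illustration, and the package's own implementation may differ.

```python
def n50(lengths):
    """Length of the read at which the running total, longest first,
    reaches half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs, in [-1, 1]."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Made-up read lengths for two end_reason classes.
sp_lengths = [5200, 4800, 6100, 3900]
umc_lengths = [910, 1200, 760]
print("SP N50:", n50(sp_lengths))
print("Cliff's delta (SP vs UMC):", cliffs_delta(sp_lengths, umc_lengths))
```

The pairwise loop in cliffs_delta is quadratic in the sample sizes, which is fine for illustration. For large samples the same quantity can be derived from the Mann-Whitney U statistic as 2U/(nm) − 1.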

Paper-figure reproducers + reports

| Command | Output |
| --- | --- |
| ont-end-reason figure fig3 <source> | Paper Figure 3: distribution bar chart |
| ont-end-reason figure fig5 <source> | Paper Figure 5: Q-score violins |
| ont-end-reason figure fig6 <source> | Paper Figure 6: UMC posterior diagram |
| ont-end-reason report interactive | 6-section self-contained HTML report with embedded Plotly |
| ont-end-reason report static | Paginated PDF report (v0.3.0 roadmap) |

Run ont-end-reason <cmd> --help for full flag documentation. Examples and screenshots: dashboard.


Python API

Every CLI subcommand has a public Python API equivalent. Functions return typed dataclasses so callers can compose, persist, or pipe results without re-parsing CLI output:

from ont_end_reason import discover, classify
from ont_end_reason.analyze.distribution import distribution
from ont_end_reason.analyze.umc_posterior import umc_posterior
from ont_end_reason.viz.static import plot_umc_posterior

# Discovery → Manifest
manifest = discover("/path/to/sequencing_run")
print(f"Found {manifest.total_files()} files")

# Analysis → typed result
result = umc_posterior("sequencing_summary.txt")
print(f"Posterior bonus total: {result.posterior_bonus_total:,.0f} bp")

# Visualisation → matplotlib Figure
fig = plot_umc_posterior(result)
fig.savefig("umc.pdf")

Each analysis result has a .to_dict() for JSON serialisation and roundtrip.
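The round-trip pattern can be illustrated without the package installed. PosteriorSummary below is a hypothetical stand-in dataclass, not one of the package's result types, and from_dict is likewise an assumption about how a result might be rebuilt.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PosteriorSummary:
    """Hypothetical stand-in for a typed analysis result."""

    n_reads: int
    observed_mean: float
    posterior_mean: float

    def to_dict(self):
        # asdict() converts the dataclass fields into a plain dict.
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        return cls(**data)


summary = PosteriorSummary(n_reads=600, observed_mean=926.2, posterior_mean=5868.1)

# Serialise to JSON text, then rebuild an equal object from it.
payload = json.dumps(summary.to_dict())
restored = PosteriorSummary.from_dict(json.loads(payload))
print(restored == summary)
```

Keeping results as dataclasses with a plain-dict view means the same object can be written to disk, compared in tests, or passed to a plotting function without re-parsing CLI text.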


End_reason taxonomy

The lab's canonical 7-class taxonomy. Print from the CLI any time with ont-end-reason codes:

| Code | Full name | Class | Action |
| --- | --- | --- | --- |
| SP | signal_positive | keep | Complete read; always keep |
| UMC | unblock_mux_change | truncated | Filter unless studying artifacts |
| MC | mux_change | truncated | Filter |
| DUMC | data_service_unblock_mux_change | truncated | Filter (software-triggered) |
| PART | partial | truncated | Filter |
| SN | signal_negative | failed | Always filter |
| UNK | unknown | unknown | Investigate distribution |

--keep SP is the canonical recommendation (Table 1 of end-reason-paper). Use --keep SP,UMC to retain truncated reads for artifact studies.
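As a rough illustration of how a --keep specification maps onto the taxonomy, the sketch below encodes the table as a dictionary and checks a read's end_reason against a set of short codes. The names TAXONOMY, parse_keep and keep_read are hypothetical and are not the package's API. Mapping unrecognised end_reason strings to UNK is an assumption.

```python
# Short code -> (full name, class), copied from the taxonomy table.
TAXONOMY = {
    "SP": ("signal_positive", "keep"),
    "UMC": ("unblock_mux_change", "truncated"),
    "MC": ("mux_change", "truncated"),
    "DUMC": ("data_service_unblock_mux_change", "truncated"),
    "PART": ("partial", "truncated"),
    "SN": ("signal_negative", "failed"),
    "UNK": ("unknown", "unknown"),
}
FULL_TO_CODE = {full: code for code, (full, _cls) in TAXONOMY.items()}


def parse_keep(spec):
    """Turn a comma-separated spec such as "SP,UMC" into a set of short codes."""
    codes = {part.strip().upper() for part in spec.split(",") if part.strip()}
    invalid = codes - set(TAXONOMY)
    if invalid:
        raise ValueError(f"unrecognised end_reason codes: {sorted(invalid)}")
    return codes


def keep_read(end_reason, keep_codes):
    """True if the read's full end_reason name maps to one of the kept codes."""
    # Assumption: anything outside the table is treated as UNK.
    return FULL_TO_CODE.get(end_reason, "UNK") in keep_codes


keep = parse_keep("SP,UMC")
print(keep_read("signal_positive", keep))   # kept
print(keep_read("signal_negative", keep))   # dropped
```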


How the UMC posterior works

The paper's novel analytic contribution, in one paragraph:

Given an observed UMC read of length o, the molecule's true length L is unknown but is at least o, because truncation can only shorten a read. Fitting a lognormal distribution L ~ Lognormal(μ, σ²) to signal_positive reads gives a prior on the length of completed reads; the posterior on a UMC read's true length is then that prior left-truncated at the observation:

P(L | L ≥ o)  ∝  Lognormal(L; μ, σ²) · 𝟙[L ≥ o]

The truncated mean has a closed form in terms of the standard normal CDF Φ:

E[L | L ≥ o]  =  exp(μ + σ²/2) · Φ(σ − z) / (1 − F(o))    where  z = (log o − μ)/σ  and  F(o) = Φ(z) is the prior's CDF at o

Implementation: scipy.stats.lognorm, vectorised over all UMC reads. O(n). Aggregated, this is the paper's headline "sequence lost to adaptive sampling" estimate — runnable on any sequencing_summary.txt with one command.
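The closed form can be sanity-checked in a few lines. The sketch below is not the package's code: it swaps scipy.stats.lognorm for the standard library's math.erfc so that it runs without dependencies, and the helper names upper_tail and truncated_lognormal_mean are made up. It evaluates the formula for one read and compares it with a Monte Carlo estimate drawn from the same prior, using the μ and σ from the headline example and an arbitrary observed length of 5000 bp.

```python
import math
import random


def upper_tail(x):
    """Standard normal survival function 1 - Phi(x), computed via erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def truncated_lognormal_mean(o, mu, sigma):
    """E[L | L >= o] for L ~ Lognormal(mu, sigma^2), with o > 0."""
    z = (math.log(o) - mu) / sigma
    prior_mean = math.exp(mu + 0.5 * sigma**2)
    # Phi(sigma - z) equals the upper tail at (z - sigma); 1 - F(o) is the
    # upper tail at z.
    return prior_mean * upper_tail(z - sigma) / upper_tail(z)


mu, sigma = 8.488, 0.600   # prior parameters from the headline example
observed = 5000.0          # arbitrary observed length for the check, in bp
closed_form = truncated_lognormal_mean(observed, mu, sigma)

# Monte Carlo cross-check: average the prior draws that are at least `observed`.
rng = random.Random(0)
draws = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]
kept = [x for x in draws if x >= observed]
simulated = sum(kept) / len(kept)

print(f"closed form: {closed_form:.0f} bp, simulated: {simulated:.0f} bp")
```

Writing both normal probabilities as upper tails keeps the ratio accurate when o lies well above the prior's bulk, where 1 − Φ(z) computed by subtraction would lose precision.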


Testing

pytest                       # 143 tests, ~10s
pytest --cov=ont_end_reason  # with coverage (currently 63%)
ruff check .                 # lint
mypy src/ont_end_reason      # type-check

Coverage gate is 60% in CI; target is 70% in v0.3.0 once filter/ is exercised with a real BAM fixture (issue #7).

Tests run against:

  • Synthetic fixture (5000 reads, deterministic distributions) for every analysis
  • Hypothesis property tests for the SP/UMC/MC taxonomy (round-trips, classification disjointness)
  • CliRunner integration tests for every subcommand's --help and dispatch

Lab infrastructure integration

ont-end-reason is part of the Single-Molecule-Sequencing org's analytic toolchain:

| Repo | How it integrates |
| --- | --- |
| end-reason-paper | Companion paper. Claim atoms (results.alignment_rate_filtered, results.snv_f1_filtered, etc.) pin to this tool for reproducibility. |
| ont-ecosystem | Lab Claude Code skills /end-reason and /end-reason-filter will become thin wrappers that pip install ont-end-reason (tracked in issue #6). |
| lab-onboarding | Bundled in the canonical lab-repo manifest. Cloned automatically by bash wsl/bootstrap.sh on every new lab device. |
| End_Reason_Manuscript | Archived. Each script in this repo carries a provenance header crediting commit b47166a of that source. |
| smaseq-qc | Optional dependency for analyze sma-metrics. The tool detects when it is missing and skips that analysis. |

Status / roadmap

Current: v0.2.0a1 (alpha)

  • ✅ 9 analysis subcommands fully implemented
  • ✅ Bayesian posterior model for UMC truncation (paper's central novel analysis)
  • ✅ Interactive HTML reports with embedded Plotly
  • ✅ 143 tests, CI matrix on Python 3.10–3.13 × Ubuntu/macOS
  • Interactive dashboard with live examples
  • 🚧 Reproducibility CI against end-reason-paper claim atoms (#4)
  • 🚧 Parallel sharded BAM filtering (#5)
  • 🚧 Lab-skill thin-wrap migration after PyPI release (#6)
  • ⏳ conda-forge feedstock (post-v0.1.0 PyPI)

See CHANGELOG.md for per-release detail and open issues for roadmap items.


Citing

If you use ont-end-reason in published work, please cite the companion paper:

Athey BD et al. (in preparation). End reason filtering for accurate analysis
of Oxford Nanopore sequencing data. Single-Molecule-Sequencing Lab,
University of Michigan.
https://github.com/Single-Molecule-Sequencing/end-reason-paper

Machine-readable citation metadata is in CITATION.cff.


License

MIT — see LICENSE.


Built by the Athey Lab at the University of Michigan.

Dashboard · Issues · CHANGELOG · Design spec
