Skip to main content

Calculate critical bandwidth for bimodal distributions in kernel density estimation

Project description

critband — Critical Bandwidth Analysis of Multimodal Distributions

PyPI version Python versions arXiv License

critband is the release name for v0.2.3 on GitHub and PyPI.

critband is a Python package for analyzing critical bandwidth and related multimodality structure in kernel density estimation (KDE).

It finds the smallest bandwidth where a KDE transitions from k-modal to fewer modes, and it also exposes related summaries such as bimodality_strength() and excess_mass().

Paper: Zhang, R., & Wang, Q. (2026). critband: Critical Bandwidth Analysis for Multimodal Distributions in Python. arXiv:2605.18686. https://arxiv.org/abs/2605.18686

Quick Start

pip install critband
import numpy as np
from critband import critical_bandwidth

# Generate bimodal data
x = np.concatenate([np.random.normal(-2, 0.5, 200),
                    np.random.normal( 2, 0.5, 200)])

h_crit, success = critical_bandwidth(x)
print(f"Critical bandwidth: {h_crit:.4f}")
print(f"Converged: {success}")
# Output: Critical bandwidth: ~0.97
#         Converged: True

Features

Core Algorithm

Function Description
silverman_bandwidth(x) Silverman's rule-of-thumb bandwidth: 1.06 · min(σ, IQR/1.34) · n^(-1/5)
gaussian_kde(x, grid, h, use_fft=None) KDE via direct O(n·g) or FFT O(g·log(g)) computation; auto-selects FFT for n > 5000
critical_bandwidth(x, k=2, return_ci=False) Critical bandwidth for mode-count detection; auto/binary/brent methods; optionally returns bootstrap CI
find_modes(x, h) Detect all KDE modes at bandwidth h; returns ModeResult with per-mode position, height, width, prominence
find_trough(x, h) Find valley between the two highest KDE peaks; returns x-coordinate or None
detect_components(x) Decompose bimodal distribution into two Gaussian components; returns BimodalDecomposition
bimodality_strength(x) Comprehensive bimodality assessment; returns interpretable strength label + score in BimodalityStrength
silverman_test(x) Silverman's bootstrap test for bimodality; returns SilvermanTestResult dataclass
excess_mass(x) Müller & Sawitzki excess mass test for multimodality; detects any number of modes, returns ExcessMassResult

The critical_bandwidth function uses automatic method selection: Brent's method for well-separated data (converges in fewer iterations), binary search for weak/small-n cases. All three methods (auto, binary, brent) produce consistent results. Hartigan's dip test is included as a complementary unimodality check, not as a core differentiator of the package.

Methodology

Critical Bandwidth

The critical bandwidth is the smallest bandwidth $h_{\text{crit}}$ at which the kernel density estimate $\hat{f}(x; h)$ transitions from being bimodal to unimodal. When $h < h_{\text{crit}}$ the estimate has two modes; for $h \ge h_{\text{crit}}$ it is unimodal. A large critical bandwidth relative to the data scale indicates strong bimodality — a wide smoothing window is needed to merge the two modes.

Silverman's Rule of Thumb

Silverman (1986) proposed a reference bandwidth for Gaussian KDE:

$$h = 1.06 \cdot \min(\sigma, \text{IQR} / 1.34) \cdot n^{-1/5}$$

This is used as the baseline for the critical bandwidth search: the solver searches for $h_{\text{crit}}$ in the interval $[h/20, 10h]$, which captures the transition for typical bimodal distributions.

Critical Bandwidth Solver

The critical bandwidth is found via binary search on KDE mode counts (for any k). The solver:

  1. Automatically computes bounds from Silverman's rule: $[h_{\text{silverman}}/20, 10 \cdot h_{\text{silverman}}]$
  2. Uses method="auto" (default): Brent's method for large, well-separated data (fewer iterations); binary search for weak/small-n cases
  3. For k > 2, critical_bandwidth(x, k=3) finds the bandwidth where trimodality disappears, etc.
  4. Optionally returns a bootstrap confidence interval via return_ci=True

The dip_ratio (via _trough_ratio) is available as a descriptive bimodality strength measure but is not used as the optimizer objective. Hartigan's dip test is included as a complementary unimodality check, not as a core differentiator of the package.

Excess Mass Test

The excess_mass function implements the Müller & Sawitzki (1991) test for multimodality. It measures the amount of probability mass above threshold levels in a KDE, and uses bootstrap calibration to estimate the number of modes — unlike the Silverman test, it can detect any number of modes in a single call.

Related Work

  • Silverman (1981) — The critical bandwidth test for multimodality in kernel density estimates
  • Hartigan & Hartigan (1985) — The Dip Test of Unimodality, a complementary non-parametric approach
  • Müller & Sawitzki (1991) — Excess mass estimates and tests for multimodality
  • Hall & York (2001) — On the calibration of Silverman's test for multimodality, refinements to the original method

References

  1. Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall.
  2. Hartigan, J.A. & Hartigan, P.M. (1985). The Dip Test of Unimodality. The Annals of Statistics, 13(1), 70-84.
  3. Müller, D.W. & Sawitzki, G. (1991). Excess Mass Estimates and Tests for Multimodality. JASA, 86(415), 738-746.
  4. Cheng, M.-Y. & Hall, P. (1998). Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society: Series B, 60(3), 579-589.

Installation

# From PyPI
pip install critband

# Or using uv
uv add critband

# Development
git clone https://github.com/ryZhangHason/Polarization-CBW.git
cd Polarization-CBW
uv sync
uv run python -m pytest tests/ -v

Requirements

  • Python >= 3.10
  • Runtime: numpy, scipy, openpyxl, xlrd, python-docx, pdfplumber
  • Test only (not needed at runtime): pytest, pytest-cov, xlwt, reportlab
  • Visualization (optional): matplotlib >= 3.5

Current validation status

WS1 calibration has been run on the fixed design grid and is kept as a validation artifact only. The repository now passes the strict full-suite acceptance run, but WS1 remains descriptive rather than threshold-setting.

Why critband?

  • Critical bandwidth analysis: One function call computes the critical bandwidth for mode-count structure, and you can use bimodality_strength() as a descriptive score or excess_mass() to estimate how many modes your data has.
  • Mode-count detection: critical_bandwidth(x, k=3) detects the bandwidth where trimodality disappears and generalizes the mode-count parameter.
  • Any-file input: 9 formats from a single API. CSV, Excel, PDF, Word, JSON, HTML, Markdown — just point read_data at the file and go.
  • One-command install: pip install critband installs everything. No system packages, no manual steps, no Tesseract OCR.
  • Pure Python dependencies: All 4 additional libraries (openpyxl, xlrd, python-docx, pdfplumber) are pure Python — no compiled extensions.
  • Built for production and research: Works in CLI scripts and Jupyter notebooks equally well.

Example: Bimodality Test

import numpy as np
from critband import critical_bandwidth, bimodality_strength, excess_mass

# Unimodal data — the function tells you it's already unimodal
unimodal = np.random.normal(0, 1, 500)
h_crit, success = critical_bandwidth(unimodal)
print(f"Unimodal: h_crit={h_crit:.4f}, converged={success}")
# → success may be False (already unimodal at minimum bandwidth)

# Bimodal data — full pipeline
bimodal = np.concatenate([
    np.random.normal(-3, 0.5, 300),
    np.random.normal( 3, 0.5, 300)
])

h_crit, success = critical_bandwidth(bimodal)
print(f"Bimodal: h_crit={h_crit:.4f}, converged={success}")

# Interpretable strength assessment
s = bimodality_strength(bimodal)
print(f"Strength: {s.strength} (score={s.strength_score:.2f})")

# Excess mass test — detects number of modes
e = excess_mass(bimodal, n_boot=199)
print(f"Estimated modes: {e.n_modes_estimated}")

# Trimodal data with mode-count detection
trimodal = np.concatenate([
    np.random.normal(-4, 0.3, 150),
    np.random.normal( 0, 0.3, 150),
    np.random.normal( 4, 0.3, 150),
])
h_k3, ok = critical_bandwidth(trimodal, k=3)
print(f"Trimodal (k=3): h_crit={h_k3:.4f}")

Visualization Examples

Generate 3 matplotlib figures that illustrate the core algorithms:

uv sync --group viz
uv run python examples/bandwidth_demo.py

The demo uses well-separated bimodal data from the benchmark suite and produces:

  • kde_bandwidth_sweep.png — 2×2 subplots showing KDE at 4 bandwidths (too small, Silverman, critical, too large), with peaks and troughs marked.
  • critical_bandwidth_transition.png — Three overlaid KDE curves: bimodal (h_crit × 0.85), critical (h_crit), and unimodal (h_crit × 1.15), with the trough position annotated.
  • component_decomposition.png — Histogram, KDE, detected Gaussian components (scaled by weight), and the trough-based separation point.

matplotlib is an optional dependency — the core package does not require it.

Jupyter Notebook Demo

An interactive Jupyter Notebook is available for hands-on exploration:

uv sync --group viz --group jupyter
uv run jupyter notebook examples/bandwidth_demo.ipynb

The notebook contains 10 cells that walk through the same three visualizations with explanatory markdown, and includes an extension exercise that compares dip ratios across all 7 benchmark cases.

Performance Benchmarking

Measure execution time of core functions across data sizes:

uv run python examples/benchmark_performance.py

Customize the number of runs and data sizes:

uv run python examples/benchmark_performance.py --runs 10 --sizes 100 500 2000
uv run python examples/benchmark_performance.py --output results.csv

Historical Validation: R Package Comparison

This section records validation against R's multimode package across 12 benchmark cases.

Feature Comparison

Feature critband multimode
Critical bandwidth ✅ (k-based mode-count test, k >= 2) ✅ (mod0-based mode-count test, mod0 >= 1)
Mode-count detection ✅ (threshold parameter k) ✅ (threshold parameter mod0)
Bimodality strength ✅ (interpretable)
Excess mass test
Silverman's bootstrap test
Component decomposition
9-format I/O
Dependencies Pure Python R + compiled

Quantitative Results ($h_{\text{crit}}$ Accuracy)

Case n critband h_crit R modetest p Agreement
Well-separated 400 1.8650 0.000 ✅ Both detect bimodality
Moderate separation 500 1.0964 0.000 ✅ Both detect bimodality
Barely separated 600 0.2791 0.251 ✅ critband flags weak bimodality, R agrees (n.s.)
Unequal variance 400 1.7849 0.000 ✅ Both detect bimodality
Unequal weights 500 1.2591 0.000 ✅ critband correct; R yields spurious 772 modes
Extreme separation 400 4.6987 0.000 ✅ Both detect bimodality
Trimodal 450 1.3824 0.000 ✅ critband finds 3 modes; R detects multimodality
Skewed bimodal 500 1.1417 0.000 ✅ Both detect bimodality
Wide-component bimodal 400 2.7109 0.000 ✅ Both detect bimodality
Near unimodal 600 0.4186 0.055 ✅ critband flags weak; R agrees (n.s.)
Small sample bimodal 60 1.8608 0.000 ✅ Both detect bimodality (small n)
Overlapping variances 500 0.4598 0.045 ✅ critband flags weak; R agrees (n.s.)

Across these benchmark cases, critband achieves <0.5% mean absolute relative error vs high-precision reference values.

Performance Comparison (median runtime per case, Apple M1 chip)

Operation critband R (multimode) Comparison
critical_bandwidth 0.04–0.79 s 0.82–1.58 s 3–10× difference
find_modes (mode counting) <0.01 s 0.03–0.05 s Comparable

The runtime difference reflects critband's pure-Python numerical stack (NumPy/SciPy) and adaptive grid sizing, compared to R's compiled package dispatch overhead with bootstrap calibration (modetest(B = 199)). These timing numbers were collected on Apple M1-class Apple Silicon hardware.

Multi-Format Data Loading

Read data from 9 file formats without learning separate tools:

from critband.io import read_data

# Auto-detect — just point at any supported file
x = read_data("measurements.csv")
x = read_data("survey.xlsx", sheet="Sheet1")
x = read_data("results.pdf", column="Score")
y = read_data("report.docx", column=0, return_all=False)
Format Extension Library
CSV .csv Built-in csv (zero dependencies)
TSV / TXT .tsv, .txt Built-in (zero dependencies)
JSON .json Built-in json (zero dependencies)
Markdown (tables) .md Built-in parser (zero dependencies)
HTML (tables) .html, .htm Built-in html.parser (zero dependencies)
Excel .xlsx openpyxl
Excel (legacy) .xls xlrd
Word (tables) .docx python-docx
PDF (tables) .pdf pdfplumber

All dependencies are declared in pyproject.tomlpip install critband installs everything automatically. No system-level tools required.

Column / Sheet Selection

from critband.io import read_data, read_buffer

# By sheet name (Excel, PDF, Markdown multi-section)
x = read_data("data.xlsx", sheet="Measurements")

# By index
x = read_data("data.xlsx", sheet=0)

# By column name
x = read_data("data.csv", column="Value")

# By column index
x = read_data("data.csv", column=1)

# Get all numerical columns as a dict
cols = read_data("data.csv", return_all=True)
# cols = {"x": array([...]), "y": array([...])}

Buffer API

Works with in-memory file buffers for pipeline and service integration:

from critband.io import read_buffer

# From a BytesIO object
buf = io.BytesIO(uploaded_file.read())
x = read_buffer(buf, filename="data.csv")

License

Apache-2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

critband-0.2.3.tar.gz (94.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

critband-0.2.3-py3-none-any.whl (76.3 kB view details)

Uploaded Python 3

File details

Details for the file critband-0.2.3.tar.gz.

File metadata

  • Download URL: critband-0.2.3.tar.gz
  • Upload date:
  • Size: 94.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.7

File hashes

Hashes for critband-0.2.3.tar.gz
Algorithm Hash digest
SHA256 cf6870d04108738b3578cbc2f4a43b7c8ea4da4a5ac4620b46dafb927cfe19a4
MD5 a837b92d8a9c27bcde5a1965d3c26bef
BLAKE2b-256 a639f2161360a933bf168e9ac8b9c2e45db849dfc430107f91224ff22b8ffeca

See more details on using hashes here.

File details

Details for the file critband-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: critband-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 76.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.7

File hashes

Hashes for critband-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 520ade8f80cecefaf58b1ef796d24b8ab08f14df56c6381669115f74c44ca24e
MD5 f7709354eeb055ddf48ecb2a56182ea2
BLAKE2b-256 5c26fa28acc337ad7e2f88e95213424df2cdef8f00ff7d4ea675078069fcf460

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page