Calculate critical bandwidth for bimodal distributions in kernel density estimation

These details have not been verified by PyPI

Project description

pola — Critical Bandwidth for Bimodal Distributions

pola is a Python package for detecting whether a distribution is meaningfully bimodal using the critical bandwidth method in kernel density estimation (KDE).

It finds the smallest bandwidth where a KDE transitions from bimodal to unimodal — a well-established statistical test for modality.

Quick Start

pip install pola

import numpy as np
from pola import critical_bandwidth

# Generate bimodal data
x = np.concatenate([np.random.normal(-2, 0.5, 200),
                    np.random.normal( 2, 0.5, 200)])

h_crit, success = critical_bandwidth(x)
print(f"Critical bandwidth: {h_crit:.4f}")
print(f"Converged: {success}")
# Output: Critical bandwidth: ~0.97
#         Converged: True

Features

Core Algorithm

Function	Description
`silverman_bandwidth(x)`	Silverman's rule-of-thumb bandwidth: 1.06 · min(σ, IQR/1.34) · n^(-1/5)
`gaussian_kde(x, grid, h, use_fft=None)`	KDE via direct O(n·g) or FFT O(g·log(g)) computation; auto-selects FFT for n > 5000
`critical_bandwidth(x, k=2, return_ci=False)`	Critical bandwidth for k-mode detection; auto/binary/brent methods; optionally returns bootstrap CI
`find_modes(x, h)`	Detect all KDE modes at bandwidth h; returns `ModeResult` with per-mode position, height, width, prominence
`find_trough(x, h)`	Find valley between the two highest KDE peaks; returns x-coordinate or `None`
`detect_components(x)`	Decompose bimodal distribution into two Gaussian components; returns `BimodalDecomposition`
`bimodality_strength(x)`	Comprehensive bimodality assessment; returns interpretable strength label + score in `BimodalityStrength`
`dip_test(x)`	Hartigan's dip test for unimodality; returns `DipTestResult` dataclass
`silverman_test(x)`	Silverman's bootstrap test for bimodality; returns `SilvermanTestResult` dataclass
`excess_mass(x)`	Müller & Sawitzki excess mass test for multimodality; detects any number of modes, returns `ExcessMassResult`

The critical_bandwidth function uses automatic method selection: Brent's method for well-separated data (2-3× faster), binary search for weak/small-n cases. All three methods (auto, binary, brent) produce consistent results.

Multi-Format Data Loading

Read data from 9 file formats without learning separate tools:

from pola.io import read_data

# Auto-detect — just point at any supported file
x = read_data("measurements.csv")
x = read_data("survey.xlsx", sheet="Sheet1")
x = read_data("results.pdf", column="Score")
y = read_data("report.docx", column=0, return_all=False)

Format	Extension	Library
CSV	`.csv`	Built-in `csv` (zero dependencies)
TSV / TXT	`.tsv`, `.txt`	Built-in (zero dependencies)
JSON	`.json`	Built-in `json` (zero dependencies)
Markdown (tables)	`.md`	Built-in parser (zero dependencies)
HTML (tables)	`.html`, `.htm`	Built-in `html.parser` (zero dependencies)
Excel	`.xlsx`	`openpyxl`
Excel (legacy)	`.xls`	`xlrd`
Word (tables)	`.docx`	`python-docx`
PDF (tables)	`.pdf`	`pdfplumber`

All dependencies are declared in pyproject.toml — pip install pola installs everything automatically. No system-level tools required.

Column / Sheet Selection

from pola.io import read_data, read_buffer

# By sheet name (Excel, PDF, Markdown multi-section)
x = read_data("data.xlsx", sheet="Measurements")

# By index
x = read_data("data.xlsx", sheet=0)

# By column name
x = read_data("data.csv", column="Value")

# By column index
x = read_data("data.csv", column=1)

# Get all numerical columns as a dict
cols = read_data("data.csv", return_all=True)
# cols = {"x": array([...]), "y": array([...])}

Buffer API

Works with in-memory file buffers for pipeline and service integration:

from pola.io import read_buffer

# From a BytesIO object
buf = io.BytesIO(uploaded_file.read())
x = read_buffer(buf, filename="data.csv")

Methodology

Critical Bandwidth

The critical bandwidth is the smallest bandwidth $h_{\text{crit}}$ at which the kernel density estimate $\hat{f}(x; h)$ transitions from being bimodal to unimodal. When $h < h_{\text{crit}}$ the estimate has two modes; for $h \ge h_{\text{crit}}$ it is unimodal. A large critical bandwidth relative to the data scale indicates strong bimodality — a wide smoothing window is needed to merge the two modes.

Silverman's Rule of Thumb

Silverman (1986) proposed a reference bandwidth for Gaussian KDE:

$$h = 1.06 \cdot \min(\sigma, \text{IQR} / 1.34) \cdot n^{-1/5}$$

This is used as the baseline for the critical bandwidth search: the solver searches for $h_{\text{crit}}$ in the interval $[h/20, 10h]$, which captures the transition for typical bimodal distributions.

Critical Bandwidth Solver

The critical bandwidth is found via binary search on KDE mode counts (for any k). The solver:

Automatically computes bounds from Silverman's rule: $[h_{\text{silverman}}/20, 10 \cdot h_{\text{silverman}}]$
Uses method="auto" (default): Brent's method for large, well-separated data (2-3× faster); binary search for weak/small-n cases
For k > 2, critical_bandwidth(x, k=3) finds the bandwidth where trimodality disappears, etc.
Optionally returns a bootstrap confidence interval via return_ci=True

The dip_ratio (via _trough_ratio) is available as a descriptive bimodality strength measure but is not used as the optimizer objective.

Excess Mass Test

The excess_mass function implements the Müller & Sawitzki (1991) test for multimodality. It measures the amount of probability mass above threshold levels in a KDE, and uses bootstrap calibration to estimate the number of modes — unlike the Silverman test, it can detect any number of modes in a single call.

Related Work

Silverman (1981) — The critical bandwidth test for multimodality in kernel density estimates
Hartigan & Hartigan (1985) — The Dip Test of Unimodality, a complementary non-parametric approach
Müller & Sawitzki (1991) — Excess mass estimates and tests for multimodality
Hall & York (2001) — On the calibration of Silverman's test for multimodality, refinements to the original method

References

Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall.
Hartigan, J.A. & Hartigan, P.M. (1985). The Dip Test of Unimodality. The Annals of Statistics, 13(1), 70-84.
Müller, D.W. & Sawitzki, G. (1991). Excess Mass Estimates and Tests for Multimodality. JASA, 86(415), 738-746.
Cheng, M.-Y. & Hall, P. (1998). Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society: Series B, 60(3), 579-589.

Installation

# From PyPI
pip install pola

# Or using uv
uv add pola

# Development
git clone https://github.com/ryZhangHason/Polarization-CBW.git
cd Polarization-CBW
uv sync
uv run python -m pytest tests/ -v

Requirements

Python >= 3.10
Runtime: numpy, scipy, openpyxl, xlrd, python-docx, pdfplumber
Test only (not needed at runtime): pytest, pytest-cov, xlwt, reportlab
Visualization (optional): matplotlib >= 3.5

Why pola?

Zero-config modality testing: One function call tells you whether your distribution is bimodal, with a well-defined threshold. Or use bimodality_strength() for an interpretable strength score, or excess_mass() to estimate how many modes your data has.
k-mode detection: critical_bandwidth(x, k=3) detects the bandwidth where trimodality disappears — generalize to any k.
Any-file input: 9 formats from a single API. CSV, Excel, PDF, Word, JSON, HTML, Markdown — just point read_data at the file and go.
One-command install: pip install pola installs everything. No system packages, no manual steps, no Tesseract OCR.
Pure Python dependencies: All 4 additional libraries (openpyxl, xlrd, python-docx, pdfplumber) are pure Python — no compiled extensions.
Built for production and research: Works in CLI scripts and Jupyter notebooks equally well.

Example: Bimodality Test

import numpy as np
from pola import critical_bandwidth, bimodality_strength, excess_mass

# Unimodal data — the function tells you it's already unimodal
unimodal = np.random.normal(0, 1, 500)
h_crit, success = critical_bandwidth(unimodal)
print(f"Unimodal: h_crit={h_crit:.4f}, converged={success}")
# → success may be False (already unimodal at minimum bandwidth)

# Bimodal data — full pipeline
bimodal = np.concatenate([
    np.random.normal(-3, 0.5, 300),
    np.random.normal( 3, 0.5, 300)
])

h_crit, success = critical_bandwidth(bimodal)
print(f"Bimodal: h_crit={h_crit:.4f}, converged={success}")

# Interpretable strength assessment
s = bimodality_strength(bimodal)
print(f"Strength: {s.strength} (score={s.strength_score:.2f})")

# Excess mass test — detects number of modes
e = excess_mass(bimodal, n_boot=199)
print(f"Estimated modes: {e.n_modes_estimated}")

# Trimodal data with k-mode detection
trimodal = np.concatenate([
    np.random.normal(-4, 0.3, 150),
    np.random.normal( 0, 0.3, 150),
    np.random.normal( 4, 0.3, 150),
])
h_k3, ok = critical_bandwidth(trimodal, k=3)
print(f"Trimodal (k=3): h_crit={h_k3:.4f}")

Visualization Examples

Generate 3 matplotlib figures that illustrate the core algorithms:

uv sync --group viz
uv run python examples/bandwidth_demo.py

The demo uses well-separated bimodal data from the benchmark suite and produces:

kde_bandwidth_sweep.png — 2×2 subplots showing KDE at 4 bandwidths (too small, Silverman, critical, too large), with peaks and troughs marked.
critical_bandwidth_transition.png — Three overlaid KDE curves: bimodal (h_crit × 0.85), critical (h_crit), and unimodal (h_crit × 1.15), with the trough position annotated.
component_decomposition.png — Histogram, KDE, detected Gaussian components (scaled by weight), and the trough-based separation point.

matplotlib is an optional dependency — the core package does not require it.

Jupyter Notebook Demo

An interactive Jupyter Notebook is available for hands-on exploration:

uv sync --group viz --group jupyter
uv run jupyter notebook examples/bandwidth_demo.ipynb

The notebook contains 10 cells that walk through the same three visualizations with explanatory markdown, and includes an extension exercise that compares dip ratios across all 7 benchmark cases.

Performance Benchmarking

Measure execution time of core functions across data sizes:

uv run python examples/benchmark_performance.py

Customize the number of runs and data sizes:

uv run python examples/benchmark_performance.py --runs 10 --sizes 100 500 2000
uv run python examples/benchmark_performance.py --output results.csv

Phase 3 — R Package Comparison

Compares pola against R's multimode, diptest, and ks packages across 12 benchmark cases (seed=42, unique per case).

Feature Comparison

Feature	pola	multimode	diptest	Advantage
Critical bandwidth	✅ ($k \ge 2$)	✅ ($k = 2$ only)	❌	pola
k-mode detection	✅ (any $k$)	❌	❌	pola
Bimodality strength	✅ (interpretable)	❌	❌	pola
Excess mass test	✅	✅	❌	Tie
Silverman's bootstrap test	✅	✅	❌	Tie
Hartigan's dip test	✅	❌	✅	Tie
Component decomposition	✅	❌	❌	pola
9-format I/O	✅	❌	❌	pola
Dependencies	Pure Python	R + compiled	R + compiled	pola

Quantitative Results ($h_{\text{crit}}$ Accuracy)

Case	$n$	pola $h_{\text{crit}}$	R `modetest` $p$	R `dip.test` $p$	Agreement
Well-separated	400	1.8650	0.000	0.000	✅ Both detect bimodality
Moderate separation	500	1.0964	0.000	0.000	✅ Both detect bimodality
Barely separated	600	0.2791	0.251	0.709	✅ pola flags weak bimodality, R agrees (n.s.)
Unequal variance	400	1.7849	0.000	0.000	✅ Both detect bimodality
Unequal weights	500	1.2591	0.000	0.000	✅ pola correct; R yields spurious 772 modes
Extreme separation	400	4.6987	0.000	0.000	✅ Both detect bimodality
Trimodal	450	1.3824	0.000	0.000	✅ pola finds $k=3$; R detects multimodality
Skewed bimodal	500	1.1417	0.000	0.000	✅ Both detect bimodality
Heavy-tailed bimodal	400	2.7109	0.000	0.000	✅ Both detect bimodality
Near unimodal	600	0.4186	0.055	0.285	✅ pola flags weak; R agrees (n.s.)
Small sample bimodal	60	1.8608	0.000	0.000	✅ Both detect bimodality (small $n$)
Overlapping variances	500	0.4598	0.045	0.322	✅ pola flags weak; R agrees (n.s.)

pola achieves <0.5% mean absolute relative error vs high-precision reference values across all 12 cases.

Performance Comparison (median runtime per case)

Operation	pola	R (`multimode`/`diptest`)	Speedup
`critical_bandwidth`	0.04–0.79 s	0.82–1.58 s	3–10× faster
`find_modes` (mode counting)	<0.01 s	0.03–0.05 s	Tie
`dip_test`	~0.002 s	~0.002 s	Tie

pola's performance advantage stems from a pure-Python numerical stack (NumPy/SciPy) without the overhead of compiled R package dispatch, and from adaptive grid sizing that avoids over-resolving the KDE for large samples.

License

Apache-2.0

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.1.2

May 17, 2026

This version

0.1.1

May 16, 2026

0.1.0

May 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pola-0.1.1.tar.gz (63.8 kB view details)

Uploaded May 16, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pola-0.1.1-py3-none-any.whl (41.2 kB view details)

Uploaded May 16, 2026 Python 3

File details

Details for the file pola-0.1.1.tar.gz.

File metadata

Download URL: pola-0.1.1.tar.gz
Upload date: May 16, 2026
Size: 63.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pola-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`48879c9081110c68039c4bb205ac42c25b0428ee3d6429fc0722fe9d02f1f5ae`
MD5	`ff27d578b4964c11c08dc1820a95e4e6`
BLAKE2b-256	`3c0affb8b17a20fa42346edad66a8c27e0c487e521b7eae2e0622beae72277b3`

See more details on using hashes here.

Provenance

The following attestation bundles were made for pola-0.1.1.tar.gz:

Publisher: release.yml on ryZhangHason/Polarization-CBW

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pola-0.1.1.tar.gz
- Subject digest: 48879c9081110c68039c4bb205ac42c25b0428ee3d6429fc0722fe9d02f1f5ae
- Sigstore transparency entry: 1553558576
- Sigstore integration time: May 16, 2026
Source repository:
- Permalink: ryZhangHason/Polarization-CBW@44b5dbb960b7c0fe39c2aaa4767452b411538d68
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ryZhangHason
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@44b5dbb960b7c0fe39c2aaa4767452b411538d68
- Trigger Event: push

File details

Details for the file pola-0.1.1-py3-none-any.whl.

File metadata

Download URL: pola-0.1.1-py3-none-any.whl
Upload date: May 16, 2026
Size: 41.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pola-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c192de1a47b20d517a53dd2f6cfbc775d6a5db029ab38ca4e7aaa88cb0bb3da6`
MD5	`2d7d44a487fce8b1df51d00ebdc6bfcc`
BLAKE2b-256	`f50fb5561b12b29a9203cd9a291d2344d0ed0e0633c8a6f0c41721ae0211a3cf`

See more details on using hashes here.

Provenance

The following attestation bundles were made for pola-0.1.1-py3-none-any.whl:

Publisher: release.yml on ryZhangHason/Polarization-CBW

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pola-0.1.1-py3-none-any.whl
- Subject digest: c192de1a47b20d517a53dd2f6cfbc775d6a5db029ab38ca4e7aaa88cb0bb3da6
- Sigstore transparency entry: 1553558582
- Sigstore integration time: May 16, 2026
Source repository:
- Permalink: ryZhangHason/Polarization-CBW@44b5dbb960b7c0fe39c2aaa4767452b411538d68
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ryZhangHason
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@44b5dbb960b7c0fe39c2aaa4767452b411538d68
- Trigger Event: push

pola 0.1.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

pola — Critical Bandwidth for Bimodal Distributions

Quick Start

Features

Core Algorithm

Multi-Format Data Loading

Column / Sheet Selection

Buffer API

Methodology

Critical Bandwidth

Silverman's Rule of Thumb

Critical Bandwidth Solver

Excess Mass Test

Related Work

References

Installation

Requirements

Why pola?

Example: Bimodality Test

Visualization Examples

Jupyter Notebook Demo

Performance Benchmarking

Phase 3 — R Package Comparison

Feature Comparison

Quantitative Results ($h_{\text{crit}}$ Accuracy)

Performance Comparison (median runtime per case)

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance