Beta Kernel Density Estimation with automatic bandwidth selection, compatible with Scikit-learn
Beta Kernel Density Estimation (beta-kde)
Fast, Boundary-Corrected Density Estimation for [0, 1] Data.
beta-kde is a Python library for Kernel Density Estimation (KDE) using the Beta kernel approach proposed by Chen (1999). It is designed to be a drop-in replacement for Scikit-learn's density estimators but optimized for bounded data (e.g., probabilities, percentages, rates) where standard Gaussian KDE suffers from boundary bias.
This package serves as the official implementation for the paper:
Johan Hallberg Szabadváry (2025). "A Fast, Closed-Form Bandwidth Selector for the Beta Kernel Density Estimator." Submitted to Journal of Computational and Graphical Statistics.
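To make the boundary-bias claim concrete, here is a minimal sketch that compares a hand-written Gaussian KDE with the Beta kernel estimator of Chen (1999), which averages Beta(x/b + 1, (1 - x)/b + 1) densities evaluated at the samples. It is an illustration only and does not use this package: the helper names `beta_kernel_density` and `gaussian_density`, the fixed bandwidth `b=0.05`, and the Silverman-style rule for `h` are all assumptions chosen for the example.

```python
import math
import numpy as np

def beta_kernel_density(x_grid, samples, b):
    """Chen (1999): mean of Beta(x/b + 1, (1 - x)/b + 1) pdfs evaluated at the samples."""
    log_s = np.log(samples)
    log_1ms = np.log1p(-samples)
    out = np.empty(len(x_grid))
    for k, x in enumerate(x_grid):
        p, q = x / b + 1.0, (1.0 - x) / b + 1.0
        # log of the Beta function B(p, q), the kernel's normalizing constant
        log_norm = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
        out[k] = np.mean(np.exp((p - 1.0) * log_s + (q - 1.0) * log_1ms - log_norm))
    return out

def gaussian_density(x_grid, samples, h):
    """Plain fixed-bandwidth Gaussian KDE, with no boundary correction."""
    z = (x_grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * math.sqrt(2.0 * math.pi))

rng = np.random.default_rng(0)
samples = rng.beta(1.0, 3.0, size=2000)  # true pdf is 3 * (1 - x)^2, so f(0) = 3

x_grid = np.array([0.0, 0.25, 0.5])
h = 1.06 * samples.std() * len(samples) ** (-1 / 5)  # Silverman-style rule of thumb
beta_est = beta_kernel_density(x_grid, samples, b=0.05)
gauss_est = gaussian_density(x_grid, samples, h)
true_pdf = 3.0 * (1.0 - x_grid) ** 2

for x, t, be, ge in zip(x_grid, true_pdf, beta_est, gauss_est):
    print(f"x={x:.2f}  true={t:.3f}  beta={be:.3f}  gaussian={ge:.3f}")
```

At `x = 0` the Gaussian estimate falls well short of the true density because roughly half of each kernel's mass lies outside the support, while the Beta kernel estimate stays much closer to it. Away from the boundary the two estimators behave similarly.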
Features
- Boundary Correction: Avoids the boundary bias of fixed-kernel estimators by construction, so no probability mass "leaks" outside the support and no artificial hard stops are needed.
- Scikit-learn API: Drop-in replacement for `KernelDensity`, fully compatible with pipelines and cross-validation.
- Custom Support: While optimized for $[0, 1]$ data (probabilities, rates), it supports any bounded interval $[a, b]$ (e.g., 0 to 100) via automatic scaling.
- Automated Bandwidth Selection:
  - MISE Rule (Proposed): Fast, $\mathcal{O}(1)$ rule-of-thumb from Szabadváry (2025).
  - LCV / LSCV: Robust cross-validation methods.
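For intuition about the cross-validation selectors listed above, the following is a minimal sketch of likelihood cross-validation (LCV) for the Chen (1999) estimator: for each candidate bandwidth it evaluates the leave-one-out density at every sample and keeps the bandwidth with the largest summed log-density. It is not this package's implementation. The helper name `loo_log_likelihood` and the candidate grid are assumptions made for the example.

```python
import math
import numpy as np

def loo_log_likelihood(samples, b):
    """Sum of log leave-one-out Beta-kernel densities at the sample points."""
    n = len(samples)
    p = samples / b + 1.0           # kernel shape parameters, one pair per evaluation point
    q = (1.0 - samples) / b + 1.0
    lgamma = np.vectorize(math.lgamma)
    log_norm = lgamma(p) + lgamma(q) - lgamma(p + q)
    # log_k[i, j] = log pdf of Beta(p_i, q_i) evaluated at sample j
    log_k = ((p - 1.0)[:, None] * np.log(samples)[None, :]
             + (q - 1.0)[:, None] * np.log1p(-samples)[None, :]
             - log_norm[:, None])
    k = np.exp(log_k)
    np.fill_diagonal(k, 0.0)        # leave sample i out of its own estimate
    loo_density = k.sum(axis=1) / (n - 1)
    return float(np.sum(np.log(loo_density)))

rng = np.random.default_rng(42)
samples = rng.beta(2.0, 5.0, size=200)

grid = np.geomspace(0.005, 0.5, 20)  # hypothetical candidate bandwidths
scores = [loo_log_likelihood(samples, b) for b in grid]
best_b = float(grid[int(np.argmax(scores))])
print(f"LCV-selected bandwidth: {best_b:.4f}")
```

Each candidate costs $\mathcal{O}(n^2)$ kernel evaluations here, which is the cost a closed-form rule of thumb is meant to avoid.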
Installation
pip install .
For development (editable install):
pip install -e .[dev]
Quickstart
import numpy as np
from beta_kernel import BetaKernelKDE
import matplotlib.pyplot as plt
# 1. Generate bounded data
np.random.seed(42)
data = np.random.beta(a=2, b=5, size=200)
# 2. Fit the estimator
# bandwidth="MISE_rule" is the default fast solver
kde = BetaKernelKDE(bandwidth="MISE_rule")
kde.fit(data)
print(f"Selected Bandwidth: {kde.bandwidth_:.4f}")
# 3. Score new samples (returns log-density)
# The standard score_samples returns un-normalized log-density (asymptotically consistent).
# To get a strictly normalized PDF (integrates to exactly 1.0), set normalized=True.
log_density = kde.score_samples(np.array([0.1, 0.5, 0.9]), normalized=True)
# 4. Plotting convenience method
kde.plot()
plt.show()
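The distinction between the un-normalized and normalized densities in step 3 can be illustrated without the package. The sketch below evaluates the Chen (1999) estimator on a grid, integrates it numerically with the trapezoid rule, and rescales it so that it integrates to 1 on that grid. How `normalized=True` is implemented internally is not stated here, so treat this purely as an assumption-laden illustration. The helpers `beta_kernel_density` and `trapezoid` and the bandwidth `b=0.05` are hypothetical.

```python
import math
import numpy as np

def beta_kernel_density(x_grid, samples, b):
    """Chen (1999): mean of Beta(x/b + 1, (1 - x)/b + 1) pdfs evaluated at the samples."""
    log_s = np.log(samples)
    log_1ms = np.log1p(-samples)
    out = np.empty(len(x_grid))
    for k, x in enumerate(x_grid):
        p, q = x / b + 1.0, (1.0 - x) / b + 1.0
        log_norm = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
        out[k] = np.mean(np.exp((p - 1.0) * log_s + (q - 1.0) * log_1ms - log_norm))
    return out

def trapezoid(y, x):
    """Composite trapezoid rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

rng = np.random.default_rng(42)
samples = rng.beta(2.0, 5.0, size=200)

grid = np.linspace(0.0, 1.0, 2001)
raw = beta_kernel_density(grid, samples, b=0.05)
raw_mass = trapezoid(raw, grid)      # generally close to, but not exactly, 1
normalized = raw / raw_mass          # integrates to 1 on this grid by construction

print(f"mass before normalization: {raw_mass:.4f}")
print(f"mass after normalization:  {trapezoid(normalized, grid):.4f}")
```

The raw estimator is not a proper density because the kernel shape changes with the evaluation point, so its integral over $[0, 1]$ only approaches 1 as the bandwidth shrinks.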
Running Tests
pytest tests/
Or use the included helper script if the package is not installed:
python run_tests.py
References
- Chen, S. X. (1999). Beta kernel estimators for density functions. Computational Statistics & Data Analysis, 31(2), 131-145.
- Szabadváry, J. H. (2025). A Fast, Closed-Form Bandwidth Selector for the Beta Kernel Density Estimator. Journal of Computational and Graphical Statistics (Submitted).
Citation
If you use this software in your research, please cite:
TODO: Add citation here!