
Beta Kernel Density Estimation with automatic bandwidth selection, compatible with Scikit-learn


Beta Kernel Density Estimation (beta-kde)

Fast, Boundary-Corrected Density Estimation for [0, 1] Data.

beta-kde is a Python library for Kernel Density Estimation (KDE) using the Beta kernel approach proposed by Chen (1999). It is designed to be a drop-in replacement for Scikit-learn's density estimators but optimized for bounded data (e.g., probabilities, percentages, rates) where standard Gaussian KDE suffers from boundary bias.
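To make the idea concrete, here is a minimal standard-library sketch of the basic Beta kernel estimator from Chen (1999), in which the density at x is the average of Beta(x/b + 1, (1 - x)/b + 1) densities evaluated at each sample. The function names `beta_log_pdf` and `beta_kde_pdf` are illustrative only and are not part of the beta-kde API; the sketch also assumes every sample lies strictly inside (0, 1).

```python
import math


def beta_log_pdf(t, a, b):
    """Log-density of a Beta(a, b) distribution at t, for t strictly in (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t) - log_norm


def beta_kde_pdf(x, samples, bandwidth):
    """Chen (1999) estimate at x: mean of Beta(x/b + 1, (1 - x)/b + 1) densities at the samples."""
    a = x / bandwidth + 1.0
    b = (1.0 - x) / bandwidth + 1.0
    return sum(math.exp(beta_log_pdf(s, a, b)) for s in samples) / len(samples)


# Illustrative data concentrated near the lower boundary.
samples = [0.05, 0.10, 0.20, 0.25, 0.40, 0.55, 0.70]

# The kernel shape changes with x, so the estimate is well defined at 0 and 1.
for x in (0.0, 0.1, 0.5, 1.0):
    print(f"f_hat({x:.1f}) = {beta_kde_pdf(x, samples, bandwidth=0.05):.4f}")
```

Because each kernel is itself a Beta density supported on [0, 1], no probability mass is assigned outside the interval, which is the property that removes the boundary bias described above.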

This package serves as the official implementation for the paper:

A Fast, Closed-Form Bandwidth Selector for the Beta Kernel Density Estimator. Johan Hallberg Szabadváry (2025). Submitted to Journal of Computational and Graphical Statistics.

Features

  • Boundary Correction: Eliminates boundary bias naturally, with no "leaking" probability mass or artificial hard stops.
  • Scikit-learn API: Drop-in replacement for KernelDensity, fully compatible with pipelines and cross-validation.
  • Custom Support: While optimized for $[0, 1]$ data (probabilities, rates), it supports any bounded interval $[a, b]$ (e.g., 0 to 100) via automatic scaling.
  • Automated Bandwidth Selection:
    • MISE Rule (Proposed): Fast, $\mathcal{O}(1)$ rule-of-thumb from Szabadváry (2025).
    • LCV / LSCV: Robust cross-validation methods.
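The scaling to a custom interval $[a, b]$ can be sketched as a change of variables: map each observation to $[0, 1]$, estimate the density there, then divide by the interval width $(b - a)$ (subtract its logarithm on the log scale). The helper names below are hypothetical and only illustrate the arithmetic; how beta-kde implements the scaling internally is not stated here.

```python
import math


def to_unit_interval(y, lower, upper):
    """Affine map from [lower, upper] to [0, 1]."""
    return (y - lower) / (upper - lower)


def rescale_log_density(log_density_unit, lower, upper):
    """Convert a log-density on [0, 1] to the original [lower, upper] scale.

    The change of variables u = (y - lower) / (upper - lower) gives
    f_Y(y) = f_U(u) / (upper - lower).
    """
    return log_density_unit - math.log(upper - lower)


# Example: data recorded as percentages on [0, 100].
lower, upper = 0.0, 100.0
u = to_unit_interval(25.0, lower, upper)

# A uniform density on [0, 1] has log-density 0; on [0, 100] it becomes log(1/100).
log_f_y = rescale_log_density(0.0, lower, upper)
print(u, math.exp(log_f_y))
```

Forgetting the width correction is a common mistake: without it, the density on the original scale would integrate to $(b - a)$ rather than 1.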

Installation

pip install .

For development (editable install):

pip install -e ".[dev]"

Quickstart

import numpy as np
from beta_kernel import BetaKernelKDE
import matplotlib.pyplot as plt

# 1. Generate bounded data
np.random.seed(42)
data = np.random.beta(a=2, b=5, size=200)

# 2. Fit the estimator
# bandwidth="MISE_rule" is the default fast solver
kde = BetaKernelKDE(bandwidth="MISE_rule")
kde.fit(data)

print(f"Selected Bandwidth: {kde.bandwidth_:.4f}")

# 3. Score new samples (returns log-density)
# By default, score_samples returns the un-normalized log-density (asymptotically consistent).
# To get a strictly normalized PDF (one that integrates to exactly 1.0), set normalized=True.
log_density = kde.score_samples(np.array([0.1, 0.5, 0.9]), normalized=True)

# 4. Plotting convenience method
kde.plot()
plt.show()
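The distinction between the un-normalized and normalized density can be illustrated with a small standard-library sketch. In finite samples the basic Beta kernel estimate does not integrate to exactly 1, so one option is to divide by its numerically computed mass. The functions `beta_kde_pdf` and `trapezoid_integral` are hypothetical helpers written for this illustration, and using the trapezoid rule is an assumption, not a description of what `normalized=True` does inside beta-kde.

```python
import math


def beta_kde_pdf(x, samples, bandwidth):
    """Basic Beta kernel estimate at x (samples must lie strictly in (0, 1))."""
    a = x / bandwidth + 1.0
    b = (1.0 - x) / bandwidth + 1.0
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    total = 0.0
    for s in samples:
        total += math.exp((a - 1.0) * math.log(s) + (b - 1.0) * math.log(1.0 - s) - log_norm)
    return total / len(samples)


def trapezoid_integral(func, n_points=2001):
    """Trapezoid-rule integral of func over [0, 1] on an evenly spaced grid."""
    values = [func(i / (n_points - 1)) for i in range(n_points)]
    h = 1.0 / (n_points - 1)
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


samples = [0.05, 0.10, 0.20, 0.25, 0.40, 0.55, 0.70]
bandwidth = 0.05


def raw(x):
    return beta_kde_pdf(x, samples, bandwidth)


# Total mass of the raw estimate; close to, but generally not equal to, 1.
mass = trapezoid_integral(raw)


def normalized(x):
    return raw(x) / mass


print(f"raw mass:        {mass:.4f}")
print(f"normalized mass: {trapezoid_integral(normalized):.4f}")
```

Normalization needs an extra integration pass over the interval, which is a plausible reason to keep it optional when only relative densities or asymptotic consistency are needed.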

Running Tests

pytest tests/

Or use the included helper script if the package is not installed:

python run_tests.py

References

  • Chen, S. X. (1999). Beta kernel estimators for density functions. Computational Statistics & Data Analysis, 31(2), 131-145.
  • Szabadváry, J. H. (2025). A Fast, Closed-Form Bandwidth Selector for the Beta Kernel Density Estimator. Journal of Computational and Graphical Statistics (Submitted).

Citation

If you use this software in your research, please cite:

TODO: Add citation here!
