Fast kernel bandwidth selection via analytic Hessian Newton optimization
hbw
Fast kernel bandwidth selection via analytic Hessian Newton optimization.
Installation
pip install hbw
Quick Start
import numpy as np
from hbw import kde_bandwidth, nw_bandwidth
# KDE bandwidth selection
x = np.random.randn(1000)
h = kde_bandwidth(x)
print(f"Optimal KDE bandwidth: {h:.4f}")
# Nadaraya-Watson regression bandwidth
x = np.linspace(-2, 2, 500)
y = np.sin(2 * x) + 0.3 * np.random.randn(len(x))
h = nw_bandwidth(x, y)
print(f"Optimal NW bandwidth: {h:.4f}")
# Large datasets: automatic subsampling
x_large = np.random.randn(100_000)
h = kde_bandwidth(x_large, max_n=5000, seed=42) # Uses 5000 random points
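The subsampling step above can be pictured with a small standalone sketch. It assumes that `max_n` means "draw at most `max_n` points uniformly without replacement" and that `seed` fixes the draw; the helper name `subsample` is hypothetical and is not part of hbw, whose internal sampling scheme is not described here.

```python
import random


def subsample(values, max_n=5000, seed=None):
    """Return at most max_n points drawn without replacement (hypothetical helper)."""
    values = list(values)
    # max_n=None disables subsampling; small inputs are returned unchanged.
    if max_n is None or len(values) <= max_n:
        return values
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(values, max_n)


data_rng = random.Random(0)
x_large = [data_rng.gauss(0.0, 1.0) for _ in range(100_000)]
x_sub = subsample(x_large, max_n=5000, seed=42)
print(len(x_sub))
```

Because each objective evaluation is quadratic in the sample size, capping the sample at 5000 points bounds the cost per evaluation regardless of how large the input is.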
API Reference
kde_bandwidth(x, kernel="gauss", h0=None, max_n=5000, seed=None)
Select optimal KDE bandwidth via LSCV minimization.
| Parameter | Type | Description |
|---|---|---|
| x | array-like | Sample data |
| kernel | str | "gauss" or "epan" (Epanechnikov) |
| h0 | float | Initial bandwidth (default: Silverman's rule) |
| max_n | int | Subsample size for large data (None to disable) |
| seed | int | Random seed for reproducible subsampling |
Returns: float - optimal bandwidth
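For intuition about the default starting point `h0`, here is a minimal sketch of Silverman's rule of thumb. Which variant hbw uses is not stated in this README; the sketch assumes the common form 0.9 · min(σ, IQR/1.34) · n^(-1/5), and the function name `silverman_bandwidth` is hypothetical.

```python
import random
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(values)
    sigma = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    spread = min(sigma, (q3 - q1) / 1.34)  # robust to heavy tails
    return 0.9 * spread * n ** (-1 / 5)


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
h_start = silverman_bandwidth(sample)
print(f"Silverman starting bandwidth: {h_start:.4f}")
```

A rule-of-thumb value like this is cheap to compute and usually lands close enough to the cross-validation optimum to serve as a starting point for an iterative optimizer.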
nw_bandwidth(x, y, kernel="gauss", h0=None, max_n=5000, seed=None)
Select optimal Nadaraya-Watson bandwidth via LOOCV-MSE minimization.
| Parameter | Type | Description |
|---|---|---|
| x | array-like | Predictor values |
| y | array-like | Response values |
| kernel | str | "gauss" or "epan" |
| h0 | float | Initial bandwidth (default: Silverman's rule) |
| max_n | int | Subsample size for large data |
| seed | int | Random seed |
Returns: float - optimal bandwidth
lscv(x, h, kernel="gauss")
Compute LSCV score, gradient, and Hessian for KDE.
Returns: tuple[float, float, float] - (score, gradient, hessian)
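To make the LSCV objective concrete, the sketch below evaluates the textbook Gaussian-kernel LSCV score and its analytic first derivative with a plain O(n²) double loop, then checks the derivative against a central finite difference. This is an illustration under assumptions, not hbw's implementation: the function name `lscv_gauss` is hypothetical, the normalization follows the standard definition (integral of the squared estimate minus twice the mean leave-one-out density), and the Hessian is omitted for brevity.

```python
import math
import random

SQRT_2PI = math.sqrt(2 * math.pi)
SQRT_4PI = math.sqrt(4 * math.pi)


def lscv_gauss(values, h):
    """Return (score, gradient) of the Gaussian-kernel LSCV objective at h."""
    n = len(values)
    conv_sum = conv_grad = 0.0  # integral of fhat^2: all pairs, including i == j
    loo_sum = loo_grad = 0.0    # leave-one-out term: pairs with i != j only
    for i, xi in enumerate(values):
        for j, xj in enumerate(values):
            u2 = ((xi - xj) / h) ** 2
            # The convolution of two standard normal kernels is a N(0, 2) density.
            k_conv = math.exp(-u2 / 4) / SQRT_4PI
            conv_sum += k_conv
            conv_grad += k_conv * (u2 / 2 - 1)
            if i != j:
                k = math.exp(-u2 / 2) / SQRT_2PI
                loo_sum += k
                loo_grad += k * (u2 - 1)
    score = conv_sum / (n * n * h) - 2 * loo_sum / (n * (n - 1) * h)
    grad = conv_grad / (n * n * h * h) - 2 * loo_grad / (n * (n - 1) * h * h)
    return score, grad


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
h = 0.4
score, grad = lscv_gauss(x, h)

# Central finite difference as an independent check on the analytic gradient.
eps = 1e-5
fd_grad = (lscv_gauss(x, h + eps)[0] - lscv_gauss(x, h - eps)[0]) / (2 * eps)
print(f"score={score:.5f}  analytic grad={grad:.6f}  finite-diff grad={fd_grad:.6f}")
```

The gradient costs almost nothing extra because it reuses the same pairwise kernel values as the score, which is what makes derivative-based optimization attractive here.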
loocv_mse(x, y, h, kernel="gauss")
Compute LOOCV-MSE, gradient, and Hessian for NW regression.
Returns: tuple[float, float, float] - (loss, gradient, hessian)
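The LOOCV-MSE objective can be sketched the same way. The version below computes only the loss (no gradient or Hessian) for a Gaussian kernel: each point is predicted from all other points with Nadaraya-Watson weights, and the squared errors are averaged. The function name `loocv_mse_gauss` is hypothetical, and hbw's own `loocv_mse` may differ in details such as normalization or edge-case handling.

```python
import math
import random


def loocv_mse_gauss(xs, ys, h):
    """Leave-one-out mean squared error of a Gaussian-kernel NW estimator."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue  # leave point i out of its own prediction
            # The kernel's normalizing constant cancels in the ratio num / den.
            w = math.exp(-0.5 * ((xs[i] - xs[j]) / h) ** 2)
            num += w * ys[j]
            den += w
        total += (ys[i] - num / den) ** 2
    return total / n


rng = random.Random(0)
n = 200
xs = [-2 + 4 * k / (n - 1) for k in range(n)]
ys = [math.sin(2 * xv) + 0.3 * rng.gauss(0.0, 1.0) for xv in xs]

loss_small_h = loocv_mse_gauss(xs, ys, 0.2)
loss_large_h = loocv_mse_gauss(xs, ys, 5.0)
print(f"h=0.2: {loss_small_h:.4f}   h=5.0: {loss_large_h:.4f}")
```

A very large bandwidth flattens the fit toward the global mean of `y`, so its leave-one-out error is much higher than that of a bandwidth matched to the curvature of the signal.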
How It Works
Problem: Cross-validation bandwidth selection costs O(n²) per objective evaluation, and a grid search needs 50-100 evaluations.
Solution: We derive closed-form gradients and Hessians for the LSCV (KDE) and LOOCV-MSE (NW) objectives. This enables Newton optimization that converges in 6-12 evaluations: the same optimum with 4-10x fewer evaluations.
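As a rough picture of how a one-dimensional Newton iteration with Armijo backtracking uses a `(value, gradient, hessian)` triple, consider the sketch below. It is not hbw's optimizer: the function name `newton_armijo`, the fallback to a gradient step when the Hessian is not positive, the positivity guard on `h`, and the toy objective `h + 1/h` (minimum at `h = 1`) are all assumptions made for illustration.

```python
def newton_armijo(fgh, h0, tol=1e-8, max_iter=50, c1=1e-4):
    """Minimize a 1-D objective over h > 0; fgh(h) returns (f, gradient, hessian)."""
    h = h0
    f, g, hess = fgh(h)
    for _ in range(max_iter):
        if abs(g) < tol:
            break
        # Newton step when curvature is positive, plain gradient step otherwise.
        step = -g / hess if hess > 0 else -g
        t = 1.0
        while t > 1e-12:
            h_try = h + t * step
            if h_try > 0:  # bandwidths must stay positive
                f_try, g_try, hess_try = fgh(h_try)
                # Armijo condition: require a sufficient decrease in f.
                if f_try <= f + c1 * t * g * step:
                    break
            t *= 0.5  # backtrack
        else:
            break  # line search failed; keep the current iterate
        h, f, g, hess = h_try, f_try, g_try, hess_try
    return h


def toy_objective(h):
    """f(h) = h + 1/h with its first and second derivatives."""
    return h + 1 / h, 1 - 1 / h**2, 2 / h**3


h_opt = newton_armijo(toy_objective, h0=3.0)
print(f"minimizer: {h_opt:.6f}")
```

The backtracking loop matters most in the first iterations, where a full Newton step can overshoot or even leave the positive half-line; close to the optimum the full step is accepted and convergence is fast.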
Supported kernels:
- Gaussian: K(u) = exp(-u²/2) / √(2π)
- Epanechnikov: K(u) = 0.75(1-u²) for |u| ≤ 1
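The two kernel formulas above translate directly into code. The sketch below uses hypothetical function names (`gauss_kernel`, `epan_kernel`) and adds a simple midpoint-rule integral purely as a sanity check that each kernel integrates to one.

```python
import math


def gauss_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def epan_kernel(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def integrate(kernel, lo, hi, steps=20_000):
    """Midpoint-rule integral, used here only as a sanity check."""
    width = (hi - lo) / steps
    return sum(kernel(lo + (k + 0.5) * width) for k in range(steps)) * width


gauss_area = integrate(gauss_kernel, -8.0, 8.0)
epan_area = integrate(epan_kernel, -1.0, 1.0)
print(f"Gaussian area: {gauss_area:.6f}, Epanechnikov area: {epan_area:.6f}")
```

The Epanechnikov kernel has compact support, so points farther than one bandwidth apart contribute exactly zero, whereas the Gaussian kernel gives every pair a positive weight.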
For full mathematical details, see the paper.
Results
Newton-Armijo with analytic Hessian achieves identical accuracy to grid search with 2-2.5× wall-clock speedup:
| Method | Evaluations | Wall-clock (n=500) | Optimum |
|---|---|---|---|
| Grid search | 50 | 71 ms | ✓ |
| Brent | 10-12 | 46 ms | ✓ |
| Analytic Newton | 6-12 | 38 ms | ✓ |
| Silverman's rule | 1 | 0.08 ms | approximate |
Bootstrap use case: For 200 bootstrap resamples at n=500, Newton saves 75 seconds (125s → 50s).
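The bootstrap figures can be unpacked with a few lines of arithmetic. The per-resample values below are simply the quoted totals divided by 200; they are derived here for illustration and are not separately reported measurements.

```python
resamples = 200
baseline_total_s = 125.0  # quoted total before switching to Newton
newton_total_s = 50.0     # quoted total with Newton

saved_s = baseline_total_s - newton_total_s
speedup = baseline_total_s / newton_total_s
per_resample_baseline_s = baseline_total_s / resamples
per_resample_newton_s = newton_total_s / resamples

print(f"saved: {saved_s:.0f} s, speedup: {speedup:.1f}x")
print(f"per resample: {per_resample_baseline_s:.3f} s -> {per_resample_newton_s:.3f} s")
```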
Tested across sample sizes (100-500), noise levels, four DGPs (bimodal, unimodal, skewed, heavy-tailed), and both Gaussian/Epanechnikov kernels. See ms/ for full details.
Citation
@misc{hbw2024,
author = {Sood, Gaurav},
title = {Analytic-Hessian Bandwidth Selection for Kernel Density Estimation and Nadaraya-Watson Regression},
year = {2024},
url = {https://github.com/finite-sample/hbw}
}
License
MIT