Fast kernel bandwidth selection via analytic Hessian Newton optimization
hbw
Fast kernel bandwidth selection via analytic Hessian Newton optimization.
Installation
```shell
pip install hbw
```
Quick Start
```python
import numpy as np
from hbw import kde_bandwidth, nw_bandwidth

# KDE bandwidth selection
x = np.random.randn(1000)
h = kde_bandwidth(x)
print(f"Optimal KDE bandwidth: {h:.4f}")

# Nadaraya-Watson regression bandwidth
x = np.linspace(-2, 2, 500)
y = np.sin(2 * x) + 0.3 * np.random.randn(len(x))
h = nw_bandwidth(x, y)
print(f"Optimal NW bandwidth: {h:.4f}")

# Large datasets: automatic subsampling
x_large = np.random.randn(100_000)
h = kde_bandwidth(x_large, max_n=5000, seed=42)  # Uses 5000 random points

# Multivariate KDE (2D example)
from hbw import kde_bandwidth_mv
X = np.random.randn(500, 2)
h = kde_bandwidth_mv(X)
print(f"Optimal 2D bandwidth: {h:.4f}")
```
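The `max_n` and `seed` arguments above suggest a seeded random subsample of the data. As a rough illustration of that idea only: the helper below is hypothetical, is not hbw's implementation, and assumes sampling without replacement.

```python
import random

def subsample(x, max_n=5000, seed=None):
    """Hypothetical helper: keep at most max_n points, chosen at random."""
    values = list(x)
    if max_n is None or len(values) <= max_n:
        return values  # small input or subsampling disabled: keep everything
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(values, max_n)  # assumption: without replacement

data_rng = random.Random(0)
data = [data_rng.gauss(0.0, 1.0) for _ in range(20_000)]
sub_a = subsample(data, max_n=5000, seed=42)
sub_b = subsample(data, max_n=5000, seed=42)
print(len(sub_a), sub_a == sub_b)
```

Passing the same seed twice yields the same subset, which is the property that makes a subsampled bandwidth reproducible.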
API Reference
kde_bandwidth(x, kernel="gauss", h0=None, max_n=5000, seed=None)
Select optimal KDE bandwidth via LSCV minimization.
| Parameter | Type | Description |
|---|---|---|
| x | array-like | Sample data |
| kernel | str | "gauss", "epan", "unif", "biweight", "triweight", or "cosine" |
| h0 | float | Initial bandwidth (default: Silverman's rule) |
| max_n | int | Subsample size for large data (None to disable) |
| seed | int | Random seed for reproducible subsampling |
Returns: float - optimal bandwidth
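The default `h0` is described as Silverman's rule. A minimal sketch of one common form of that rule, h = 0.9 · min(σ, IQR/1.34) · n^(-1/5), is given here for orientation. Which variant hbw actually uses is not stated in this README, so the constants and the function name `silverman_bandwidth` are assumptions.

```python
import random
import statistics

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(x)
    sigma = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4)  # sample quartiles
    spread = min(sigma, (q3 - q1) / 1.34)     # robust scale estimate
    return 0.9 * spread * n ** (-0.2)

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
h0 = silverman_bandwidth(sample)
print(f"Rule-of-thumb starting bandwidth: {h0:.4f}")
```

A starting value of this kind is scale-equivariant: multiplying the data by a constant multiplies the bandwidth by the same constant.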
nw_bandwidth(x, y, kernel="gauss", h0=None, max_n=5000, seed=None)
Select optimal Nadaraya-Watson bandwidth via LOOCV-MSE minimization.
| Parameter | Type | Description |
|---|---|---|
| x | array-like | Predictor values |
| y | array-like | Response values |
| kernel | str | "gauss", "epan", "unif", "biweight", "triweight", or "cosine" |
| h0 | float | Initial bandwidth (default: Silverman's rule) |
| max_n | int | Subsample size for large data |
| seed | int | Random seed |
Returns: float - optimal bandwidth
lscv(x, h, kernel="gauss")
Compute LSCV score, gradient, and Hessian for KDE.
Returns: tuple[float, float, float] - (score, gradient, hessian)
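For readers unfamiliar with the objective, here is a sketch of the textbook LSCV score for the Gaussian kernel only, written in plain Python. It is not hbw's code: it returns only the score (no gradient or Hessian), and `lscv_score` is a hypothetical name. It uses the fact that the convolution of two standard Gaussian kernels is the N(0, 2) density.

```python
import math
import random

def gauss(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def gauss_conv(u):
    # (K * K)(u) for the Gaussian kernel: the N(0, 2) density
    return math.exp(-0.25 * u * u) / math.sqrt(4.0 * math.pi)

def lscv_score(x, h):
    """Integral of fhat^2 minus twice the mean leave-one-out density."""
    n = len(x)
    conv_sum = 0.0  # all pairs (i, j), including i == j
    loo_sum = 0.0   # pairs with i != j only
    for i in range(n):
        for j in range(n):
            u = (x[i] - x[j]) / h
            conv_sum += gauss_conv(u)
            if i != j:
                loo_sum += gauss(u)
    return conv_sum / (n * n * h) - 2.0 * loo_sum / (n * (n - 1) * h)

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
scores = {h: lscv_score(sample, h) for h in (0.005, 0.4, 3.0)}
for h, s in scores.items():
    print(f"h={h}: LSCV={s:.4f}")
```

The double loop makes the O(n²) cost per evaluation explicit; a bandwidth that is far too small or far too large gives a higher score than a moderate one.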
loocv_mse(x, y, h, kernel="gauss")
Compute LOOCV-MSE, gradient, and Hessian for NW regression.
Returns: tuple[float, float, float] - (loss, gradient, hessian)
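As a companion sketch, the leave-one-out MSE of a Nadaraya-Watson estimator with a Gaussian kernel can be written directly from its definition. This is an illustration, not hbw's implementation: `loocv_mse_score` is a hypothetical name, only the loss is returned, and the zero-weight fallback is an assumption.

```python
import math
import random

def loocv_mse_score(x, y, h):
    """Leave-one-out MSE of the Gaussian-kernel NW estimator at bandwidth h."""
    n = len(x)
    total = 0.0
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(n):
            if j == i:
                continue  # observation i never predicts itself
            w = math.exp(-0.5 * ((x[i] - x[j]) / h) ** 2)
            num += w * y[j]
            den += w
        # Assumption: if every weight underflows to zero, predict 0.0
        pred = num / den if den > 0.0 else 0.0
        total += (y[i] - pred) ** 2
    return total / n

rng = random.Random(2)
n = 200
x = [-2.0 + 4.0 * k / (n - 1) for k in range(n)]
y = [math.sin(2.0 * xi) + 0.3 * rng.gauss(0.0, 1.0) for xi in x]
losses = {h: loocv_mse_score(x, y, h) for h in (0.01, 0.15, 2.0)}
for h, loss in losses.items():
    print(f"h={h}: LOOCV-MSE={loss:.4f}")
```

The kernel's normalizing constant cancels in the ratio, so it is omitted from the weights.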
kde_bandwidth_mv(data, kernel="gauss", h0=None, max_n=3000, seed=None, standardize=True)
Select optimal multivariate KDE bandwidth via LSCV minimization with product kernel.
| Parameter | Type | Description |
|---|---|---|
| data | array-like | Sample data, shape (n, d) |
| kernel | str | "gauss", "epan", "unif", "biweight", "triweight", or "cosine" |
| h0 | float | Initial bandwidth (default: Scott's rule) |
| max_n | int | Subsample size for large data |
| seed | int | Random seed |
| standardize | bool | Standardize each dimension to unit variance |
Returns: float - optimal isotropic bandwidth
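To make the roles of `standardize` and Scott's rule concrete, the sketch below scales each column to unit standard deviation and then applies the usual Scott factor n^(-1/(d+4)). Both helper names are hypothetical, and whether hbw also centers the data or uses a different constant is not stated here.

```python
import random
import statistics

def scale_to_unit_variance(rows):
    """Divide each column by its sample standard deviation."""
    columns = list(zip(*rows))
    sds = [statistics.stdev(col) for col in columns]
    scaled = [[v / s for v, s in zip(row, sds)] for row in rows]
    return scaled, sds

def scott_bandwidth(n, d):
    """Scott's factor for standardized d-dimensional data."""
    return n ** (-1.0 / (d + 4))

rng = random.Random(3)
# Two dimensions on very different scales
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 5.0)] for _ in range(500)]
Z, sds = scale_to_unit_variance(X)
h0 = scott_bandwidth(len(Z), len(Z[0]))
print(f"Column scales: {sds[0]:.2f}, {sds[1]:.2f}; starting bandwidth: {h0:.4f}")
```

After scaling, a single isotropic bandwidth treats both dimensions comparably, which is the motivation for standardizing before selecting one `h`.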
lscv_mv(data, h, kernel="gauss")
Compute LSCV score, gradient, and Hessian for multivariate KDE.
Returns: tuple[float, float, float] - (score, gradient, hessian)
How It Works
Problem: Cross-validation bandwidth selection costs O(n²) per objective evaluation, and a grid search needs 50-100 evaluations.
Solution: We derive closed-form gradients and Hessians for the LSCV (KDE) and LOOCV-MSE (NW) objectives. This enables Newton optimization that converges in 6-12 evaluations, reaching the same optimum with 4-10x fewer evaluations.
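The optimization loop can be sketched generically. The code below is a one-dimensional Newton iteration with Armijo backtracking that consumes a function returning `(value, gradient, hessian)`, the same shape that `lscv` and `loocv_mse` return. It is a sketch under stated assumptions, not hbw's optimizer: the function name, tolerances, and the fallback to a gradient step when the Hessian is not positive are all illustrative choices, and the objective is a toy function.

```python
def newton_armijo(fgh, h0, tol=1e-8, max_iter=50, c1=1e-4):
    """Minimize a 1-D function over h > 0; fgh(h) returns (f, g, H)."""
    h = h0
    f, g, H = fgh(h)
    evals = 1
    for _ in range(max_iter):
        if abs(g) < tol:
            break
        # Newton step when curvature is positive, else a plain gradient step
        step = -g / H if H > 0 else -g
        t = 1.0
        accepted = False
        while t >= 1e-12:
            h_new = h + t * step
            if h_new > 0:  # bandwidths must stay positive
                f_new, g_new, H_new = fgh(h_new)
                evals += 1
                if f_new <= f + c1 * t * g * step:  # Armijo condition
                    accepted = True
                    break
            t *= 0.5  # backtrack
        if not accepted:
            break
        h, f, g, H = h_new, f_new, g_new, H_new
    return h, evals

def toy(h):
    # f(h) = h + 1/h has its minimum at h = 1
    return h + 1.0 / h, 1.0 - 1.0 / h**2, 2.0 / h**3

h_star, n_evals = newton_armijo(toy, h0=0.3)
print(f"minimizer ~ {h_star:.6f} after {n_evals} evaluations")
```

Because each iteration uses exact curvature, the error shrinks quadratically near the optimum, which is why a handful of evaluations can replace a dense grid.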
Supported kernels:
- Gaussian: K(u) = exp(-u²/2) / √(2π)
- Epanechnikov: K(u) = 0.75(1-u²) for |u| ≤ 1
- Uniform: K(u) = 0.5 for |u| ≤ 1
- Biweight: K(u) = (15/16)(1-u²)² for |u| ≤ 1
- Triweight: K(u) = (35/32)(1-u²)³ for |u| ≤ 1
- Cosine: K(u) = (π/4)cos(πu/2) for |u| ≤ 1
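The kernel formulas listed above translate directly into code. The sketch below defines each one as a plain Python function, keyed by the `kernel` strings from the API tables, and checks numerically that each integrates to 1. The dictionary and the midpoint-rule integrator are illustrative helpers, not part of hbw.

```python
import math

KERNELS = {
    "gauss": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "epan": lambda u: 0.75 * (1.0 - u * u) if abs(u) <= 1 else 0.0,
    "unif": lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "biweight": lambda u: (15 / 16) * (1.0 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "triweight": lambda u: (35 / 32) * (1.0 - u * u) ** 3 if abs(u) <= 1 else 0.0,
    "cosine": lambda u: (math.pi / 4) * math.cos(math.pi * u / 2) if abs(u) <= 1 else 0.0,
}

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (k + 0.5) * width) for k in range(steps))

# Each kernel is a probability density, so its total mass should be close to 1
masses = {name: integrate(k, -8.0, 8.0) for name, k in KERNELS.items()}
for name, mass in masses.items():
    print(f"{name}: {mass:.4f}")
```

All kernels except the Gaussian have compact support on [-1, 1], so only pairs of points within one bandwidth of each other contribute to the objective.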
For full mathematical details, see the paper.
Results
Newton-Armijo with analytic Hessian achieves accuracy identical to grid search while running 1.3× to 1.7× faster for KDE and 2.5× to 6.9× faster for NW regression in the benchmarks below:
KDE (n=500):
| Kernel | Newton | Grid (50 pts) | Speedup |
|---|---|---|---|
| Gaussian | 54 ms | 70 ms | 1.3× |
| Epanechnikov | 124 ms | 213 ms | 1.7× |
| Biweight | 205 ms | 294 ms | 1.4× |
| Triweight | 354 ms | 497 ms | 1.4× |
| Cosine | 150 ms | 239 ms | 1.6× |
NW Regression (n=500):
| Kernel | Newton | Grid (50 pts) | Speedup |
|---|---|---|---|
| Gaussian | 16 ms | 39 ms | 2.5× |
| Epanechnikov | 21 ms | 52 ms | 2.5× |
| Biweight | 20 ms | 53 ms | 2.6× |
| Triweight | 21 ms | 82 ms | 3.9× |
| Cosine | 11 ms | 75 ms | 6.9× |
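Bootstrap use case: The savings compound when the bandwidth is reselected on every resample. Taking the Gaussian NW timings above as per-fit costs, 200 bootstrap resamples at n=500 would take about 3.2 s with Newton versus about 7.8 s with a 50-point grid.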
Tested across sample sizes (100-500), noise levels, four data-generating processes (bimodal, unimodal, skewed, heavy-tailed), and all six kernels. See ms/ for full details.
Citation
```bibtex
@misc{hbw2024,
  author = {Sood, Gaurav},
  title = {Analytic-Hessian Bandwidth Selection for Kernel Density Estimation and Nadaraya-Watson Regression},
  year = {2024},
  url = {https://github.com/finite-sample/hbw}
}
```
License
MIT