Gradient boosting with kernel regression base learners

These details have not been verified by PyPI

Project links

Repository

Project description

KernelBoost

Gradient boosting with kernel-based local constant estimators

Python NumPy GPU License Version

KernelBoost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It has:

Support for regression, classification and quantile regression tasks.
sklearn style API (fit, predict).
CPU (via C) and GPU (via CuPy/CUDA) backends.

Installation

# Basic installation
pip install kernelboost

# With GPU support (requires CUDA)
pip install cupy-cuda12x  # for CUDA 12

Dependencies: NumPy only. CuPy optional for GPU acceleration.

Quick Start

from kernelboost import KernelBooster, MulticlassBooster
from kernelboost.objectives import MSEObjective, EntropyObjective

# Regression
booster = KernelBooster(objective=MSEObjective()).fit(X_train, y_train)
predictions = booster.predict(X_test)

# Binary classification
booster = KernelBooster(objective=EntropyObjective()).fit(X_train, y_train)
logits = booster.predict(X_test)
probabilities = booster.predict_proba(X_test)

# Multiclass classification (fits one booster per class)
booster = MulticlassBooster().fit(X_train, y_train)
class_labels = booster.predict(X_test)

How it works

KernelBooster uses gradient boosting with kernel-based local constant estimators instead of decision trees. Each boosting round fits a KernelTree that partitions the data into regions, then applies Nadaraya-Watson kernel regression at each leaf to predict pseudo-residuals. Unlike tree-based boosters where splits implicitly select features, KernelBooster selects features explicitly at the boosting stage before tree construction.

What it delivers

With suitable preprocessing, KernelBooster can match popular gradient boosters like XGBoost and LightGBM on prediction accuracy while outperforming traditional kernel methods (KernelRidge, SVR, Gaussian Processes). Training time is comparable to other kernel methods. See Benchmarks for detailed comparisons.

Architecture

There are three main components to KernelBooster: KernelBooster class that does the boosting, KernelTree class that does the splitting and KernelEstimator class that implements the local constant estimation. As kernel methods are computationally expensive, the guiding principle has been computational efficiency.

After calling fit, KernelBooster starts a training loop which is mostly identical to the algorithm described in Friedman (2001). The main difference is that KernelTree does not choose features through its splits but is instead given them by the booster class. Default feature selection is random with increasing kernel sizes in terms of number of features. Random feature selection naturally creates randomness to training results, which can be mitigated with a lower learning rate and more rounds. Similarly to Friedman (2001), KernelBooster can fit several different objective functions, which are passed in as an Objective class.

KernelTree splits numerical data by density and categorical data by MSE. The idea here is that the kernel bandwidth should largely depend on how dense the data is. For numerical data, KernelTree splits until number of observations is below the 'max_sample' parameter. Besides finding regions which would be well served by the same bandwidth, this has the benefit of speeding up computation significantly in calculating the kernel matrices for the kernel estimator. For example, with ten splits we go from computing a (n, n) matrix to computing ten (n/10, n/10) matrices with n²/10 operations instead of n² (assuming equal splits). This saves a whopping 90% of compute.

The actual estimation is handled by KernelEstimator. It optimizes a scalar precision (inverse bandwidth) for the local constant estimator using leave-one-out cross validation and random search between given bounds. It has both Gaussian and (isotropic) Laplace kernels with default being the Laplace kernel. KernelEstimator also has uncertainty quantification methods for quantile and conditional variance prediction, but these are at this moment still experimental as they use a "naive" single kernel method whose precision is optimized for mean prediction.

Notable features

Beyond the core boosting algorithm, KernelBooster includes a few features worth highlighting:

Smart Feature Selection

While the default feature selection is random (RandomSelector), the package includes an mRMR style probabilistic algorithm (SmartSelector) based on correlations between features and pseudo-residuals and performance in previous boosting rounds.

from kernelboost.feature_selection import SmartSelector

selector = SmartSelector(
    redundancy_penalty=0.4,
    relevance_alpha=0.7,
    recency_penalty=0.3,
)

booster = KernelBooster(
    objective=MSEObjective(),
    feature_selector=selector,
)

Early Stopping

Training stops automatically if evaluation loss doesn't improve for consecutive rounds (controlled by early_stopping_rounds parameter).

booster.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=20)

RhoOptimizer

RhoOptimizer performs post-hoc optimization of step sizes, often improving predictions at minimal additional cost. It can also back out optimal regularization parameters (L1 penalty and learning rate) — useful when unsure what level of regularization to use.

from kernelboost.rho_optimizer import RhoOptimizer

opt = RhoOptimizer(booster, lambda_reg=1.0)
opt.fit(X_val, y_val)
opt.update_booster()

# Back out optimal hyperparameters
lambda1, learning_rate = opt.find_hyperparameters()

Uncertainty Quantification (Experimental)

KernelBooster has both prediction intervals and conditional variance prediction based on kernel estimation. These come for "free" on top of training and require no extra data. Still work in progress.

# Prediction intervals (90% by default)
lower, upper = booster.predict_intervals(X, alpha=0.1)

# Conditional variance estimates
variance = booster.predict_variance(X)

Both interval coverage and conditional variance have a tendency to be underestimated, but this depends on the data and how well boosting has converged. No special tuning required: settings that optimize MSE also give reasonable uncertainty estimates. See benchmarks for a comparison with Gaussian Processes.

Data Preprocessing

Scaling data is a good idea for kernel estimation methods. The package includes a simple RankTransformer that often works well.

from kernelboost.utilities import RankTransformer

scaler = RankTransformer(pct=True)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Like other kernel methods, KernelBooster works best with continuous, smooth features. For datasets with many categorical features, tree-based methods are often better suited—they handle splits on categories naturally.

API Reference

Class	Purpose
`KernelBooster`	Main booster for regression/binary classification
`MulticlassBooster`	One-vs-rest multiclass wrapper
`MSEObjective`	Mean squared error (regression)
`EntropyObjective`	Cross-entropy (binary classification)
`QuantileObjective`	Pinball loss (quantile regression)
`SmartSelector`	mRMR-style feature selection
`RandomSelector`	Random feature selection
`RhoOptimizer`	Post-hoc step size optimization
`RankTransformer`	Percentile normalization

KernelBooster Main Parameters

Parameter	Default	Description
`objective`	Required	Loss function: `MSEObjective()`, `EntropyObjective()`, `QuantileObjective()`
`rounds`	auto	Boosting iterations (auto = n_features * 10)
`max_features`	auto	Max features per estimator (auto = min(10, n_features))
`min_features`	1	Min features per estimator
`kernel_type`	'laplace'	Kernel function: 'laplace' or 'gaussian'
`learning_rate`	0.5	Step size shrinkage factor
`lambda1`	0.0	L1 regularization
`use_gpu`	False	Enable GPU acceleration

Benchmarks

Results have inherent randomness due to feature selection and subsampling. Scripts available in benchmarks/.

Regression (California Housing)

=================================================================
Model                       MSE        MAE         R²       Time
-----------------------------------------------------------------
KernelBooster            0.2053     0.2985     0.8452      11.0s
sklearn HGBR             0.2247     0.3146     0.8306       0.1s
XGBoost                  0.2155     0.3050     0.8376       0.1s
LightGBM                 0.2097     0.3047     0.8419       0.1s
=================================================================

Binary Classification (Breast Cancer)

=================================================================
Model                  Accuracy    AUC-ROC         F1       Time
-----------------------------------------------------------------
KernelBooster            0.9825     0.9984     0.9861       1.6s
sklearn HGBC             0.9649     0.9944     0.9722       0.1s
XGBoost                  0.9561     0.9938     0.9650       0.0s
LightGBM                 0.9649     0.9925     0.9722       0.0s
=================================================================

Comparison with Kernel Methods (California Housing)

=================================================================
Kernel Methods Benchmark (n_train=10000)
=================================================================
Model                       MSE        MAE         R²       Time
-----------------------------------------------------------------
KernelBooster            0.2091     0.3054     0.8430       6.5s
KernelRidge              0.4233     0.4835     0.6822       1.7s
SVR                      0.3136     0.3780     0.7646       3.5s
GP (n=5000)              0.3297     0.4061     0.7524      67.7s
=================================================================

Uncertainty Quantification (California Housing)

Prediction intervals and conditional variance estimates compared to Gaussian Process (sklearn) regression:

=================================================================
Uncertainty Quantification (90% intervals, alpha=0.1)
=================================================================
Model                  Coverage    Width    Var Corr   Var Ratio
-----------------------------------------------------------------
KernelBooster            88.1%    1.235      0.206       1.621
GP (n=5000)              90.9%    1.863      0.157       1.026
=================================================================

Var Corr is the correlation between predicted variance and squared errors. Var Ratio is the ratio between mean of squared_errors and predicted variance.

CPU/GPU training time comparison (California Housing)

=================================================================
GPU vs CPU Training Time (California Housing, n=10000)
=================================================================
Backend                                                  Time
-----------------------------------------------------------------
CPU (C/OpenMP)                                          38.6s
GPU (CuPy/CUDA)                                          4.6s
=================================================================
GPU speedup: 8.3x

All benchmarks run on Ubuntu 22.04 with Ryzen 7700 and RTX 3090.

References

Fan, J., & Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall.
Fan, J., & Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. Biometrika, 85(3), 645–660.
Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29(5), 1189-1232.
Hansen, B. E. (2004). Nonparametric Conditional Density Estimation. Working paper, University of Wisconsin.

About

KernelBoost is a hobby project exploring alternatives to tree-based gradient boosting. Currently v0.1.0. Pre-compiled binaries included for Linux and Windows. Contributions and feedback welcome.

License

MIT License

Project details

These details have not been verified by PyPI

Project links

Repository

Release history Release notifications | RSS feed

0.3.2

Apr 26, 2026

0.3.1

Mar 11, 2026

0.3.0

Mar 7, 2026

0.2.1

Mar 1, 2026

0.2.0

Feb 21, 2026

This version

0.1.0

Feb 10, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kernelboost-0.1.0.tar.gz (60.3 kB view details)

Uploaded Feb 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

kernelboost-0.1.0-py3-none-any.whl (68.2 kB view details)

Uploaded Feb 10, 2026 Python 3

File details

Details for the file kernelboost-0.1.0.tar.gz.

File metadata

Download URL: kernelboost-0.1.0.tar.gz
Upload date: Feb 10, 2026
Size: 60.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for kernelboost-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`871b10c8f0345d3ef53b136d13449f4ea22b0ab32a42c4dd9d0334ce89f1b4db`
MD5	`4c406f4f98f54bae7def42c4f9a6b538`
BLAKE2b-256	`954b8f1a26161844b574813c58f2fc6b809b54eccfd507dbc35a47280508931e`

See more details on using hashes here.

File details

Details for the file kernelboost-0.1.0-py3-none-any.whl.

File metadata

Download URL: kernelboost-0.1.0-py3-none-any.whl
Upload date: Feb 10, 2026
Size: 68.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for kernelboost-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d2a760bff81bb49cd9077b0978a76f4c2d2cc27343101380b89872b37ea19f68`
MD5	`0696b17f8897f18d9b93377b5467424a`
BLAKE2b-256	`96c59a453acdda88d7f267c0ac42d804265bc0a2594aa5d267d5f3434f29cef9`

See more details on using hashes here.

kernelboost 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

KernelBoost

Installation

Quick Start

How it works

What it delivers

Architecture

Notable features

Smart Feature Selection

Early Stopping

RhoOptimizer

Uncertainty Quantification (Experimental)

Data Preprocessing

API Reference

KernelBooster Main Parameters

Benchmarks

Regression (California Housing)

Binary Classification (Breast Cancer)

Comparison with Kernel Methods (California Housing)

Uncertainty Quantification (California Housing)

CPU/GPU training time comparison (California Housing)

References

About

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes