Gradient boosting with kernel regression base learners

KernelBoost

Gradient boosting with kernel-based local constant estimators

KernelBoost is a gradient boosting algorithm that uses Nadaraya-Watson (local constant) kernel estimators as base learners instead of decision trees. It provides:

  • Support for regression, classification, and quantile regression tasks.
  • An sklearn-style API (fit, predict).
  • CPU (via C) and GPU (via CuPy/CUDA) backends.

Installation

# Basic installation
pip install kernelboost

# With GPU support (requires CUDA)
pip install cupy-cuda12x  # for CUDA 12

Dependencies: NumPy only. CuPy optional for GPU acceleration.

Quick Start

from kernelboost import KernelBooster, MulticlassBooster
from kernelboost.objectives import MSEObjective, EntropyObjective

# Regression
booster = KernelBooster(objective=MSEObjective()).fit(X_train, y_train)
predictions = booster.predict(X_test)

# Binary classification
booster = KernelBooster(objective=EntropyObjective()).fit(X_train, y_train)
logits = booster.predict(X_test)
probabilities = booster.predict_proba(X_test)

# Multiclass classification (fits one booster per class)
booster = MulticlassBooster().fit(X_train, y_train)
class_labels = booster.predict(X_test)

How it works

KernelBooster uses gradient boosting with kernel-based local constant estimators instead of decision trees. Each boosting round fits a KernelTree that partitions the data into regions, then applies Nadaraya-Watson kernel regression at each leaf to predict pseudo-residuals. Unlike tree-based boosters where splits implicitly select features, KernelBooster selects features explicitly at the boosting stage before tree construction.
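As background, a Nadaraya-Watson (local constant) estimate is simply a kernel-weighted average of the training targets. A minimal NumPy sketch of the idea (illustrative only, not the package's implementation):

```python
import numpy as np

def nadaraya_watson(X_train, y_train, X_query, precision=1.0):
    """Local constant (Nadaraya-Watson) estimate with a Laplace kernel.

    precision is the inverse bandwidth: larger values localize the fit.
    """
    # Pairwise L1 distances between query and training points
    dists = np.abs(X_query[:, None, :] - X_train[None, :, :]).sum(axis=2)
    weights = np.exp(-precision * dists)           # Laplace kernel weights
    weights /= weights.sum(axis=1, keepdims=True)  # normalize per query point
    return weights @ y_train                       # kernel-weighted average

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(nadaraya_watson(X, y, np.array([[1.0]]), precision=5.0))
```

In boosting, this estimator is fit to pseudo-residuals at each leaf rather than to the raw targets.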

What it delivers

With suitable preprocessing, KernelBooster can match popular gradient boosters like XGBoost and LightGBM on prediction accuracy while outperforming traditional kernel methods (KernelRidge, SVR, Gaussian Processes). Training time is comparable to other kernel methods. See Benchmarks for detailed comparisons.

Architecture

KernelBooster has three main components: the KernelBooster class, which drives the boosting; the KernelTree class, which handles the splitting; and the KernelEstimator class, which implements the local constant estimation. Since kernel methods are computationally expensive, computational efficiency has been the guiding design principle.

After fit is called, KernelBooster runs a training loop that is mostly identical to the algorithm described in Friedman (2001). The main difference is that KernelTree does not choose features through its splits; instead, the booster class hands them to it. The default feature selection is random, with kernel sizes (in number of features) increasing over rounds. Random feature selection introduces randomness into the training results, which can be mitigated with a lower learning rate and more rounds. As in Friedman (2001), KernelBooster can fit several different objective functions, which are passed in as an Objective class.
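The Friedman (2001) loop reduces to: compute pseudo-residuals from the current prediction, fit a base learner to them, and take a shrunken step. A schematic sketch for squared-error loss with a pluggable base learner (names here are illustrative, not the package's internals):

```python
import numpy as np

def boost(X, y, fit_base_learner, rounds=100, learning_rate=0.5):
    """Schematic gradient boosting for squared-error loss.

    fit_base_learner(X, residuals) must return a callable predictor;
    in KernelBoost's case that role is played by a kernel estimator.
    """
    prediction = np.full(len(y), y.mean())  # initial constant fit
    learners = []
    for _ in range(rounds):
        residuals = y - prediction               # negative gradient of MSE
        learner = fit_base_learner(X, residuals)  # fit to pseudo-residuals
        prediction += learning_rate * learner(X)  # shrunken update step
        learners.append(learner)
    return prediction, learners
```

A lower learning rate with more rounds averages over more randomly selected feature subsets, which is why it dampens the run-to-run variance mentioned above.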

KernelTree splits numerical data by density and categorical data by MSE. The idea is that the kernel bandwidth should largely depend on how dense the data is. For numerical data, KernelTree keeps splitting until the number of observations in a region falls below the max_sample parameter. Besides finding regions that are well served by a single bandwidth, this significantly speeds up computation of the kernel matrices for the kernel estimator. For example, with ten equal splits we go from computing one (n, n) matrix to computing ten (n/10, n/10) matrices, i.e. n²/10 operations instead of n², saving 90% of the compute.
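The savings can be checked directly: with k equal splits, k blocks of size n/k require k·(n/k)² = n²/k kernel evaluations. A quick check of the figures above, assuming equal splits:

```python
n, k = 10_000, 10
full = n * n                # one (n, n) kernel matrix
split = k * (n // k) ** 2   # k smaller (n/k, n/k) matrices
print(split / full)         # fraction of the original work
```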

The actual estimation is handled by KernelEstimator. It optimizes a scalar precision (inverse bandwidth) for the local constant estimator using leave-one-out cross-validation and random search between given bounds. Both Gaussian and (isotropic) Laplace kernels are available, with the Laplace kernel as the default. KernelEstimator also provides uncertainty quantification methods for quantile and conditional variance prediction, but these are still experimental: they use a "naive" single-kernel approach whose precision is optimized for mean prediction.
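Leave-one-out CV has a convenient form for local constant estimators: zero out each point's own kernel weight and predict it from the rest, with no refitting. A minimal sketch of precision selection by random search between bounds (illustrative; the package's KernelEstimator may differ in details):

```python
import numpy as np

def loo_mse(X, y, precision):
    """Leave-one-out MSE of a Laplace-kernel local constant estimator."""
    dists = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    weights = np.exp(-precision * dists)
    np.fill_diagonal(weights, 0.0)  # drop each point's own weight
    preds = weights @ y / weights.sum(axis=1)
    return np.mean((y - preds) ** 2)

def select_precision(X, y, bounds=(0.1, 100.0), n_draws=30, seed=0):
    """Random search for the precision minimizing leave-one-out MSE."""
    rng = np.random.default_rng(seed)
    # Sample candidate precisions log-uniformly between the bounds
    lo, hi = np.log(bounds[0]), np.log(bounds[1])
    candidates = np.exp(rng.uniform(lo, hi, n_draws))
    return min(candidates, key=lambda p: loo_mse(X, y, p))
```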

Notable features

Beyond the core boosting algorithm, KernelBooster includes a few features worth highlighting:

Smart Feature Selection

While the default feature selection is random (RandomSelector), the package includes an mRMR style probabilistic algorithm (SmartSelector) based on correlations between features and pseudo-residuals and performance in previous boosting rounds.

from kernelboost.feature_selection import SmartSelector

selector = SmartSelector(
    redundancy_penalty=0.4,
    relevance_alpha=0.7,
    recency_penalty=0.3,
)

booster = KernelBooster(
    objective=MSEObjective(),
    feature_selector=selector,
)

Early Stopping

Training stops automatically if the evaluation loss doesn't improve for early_stopping_rounds consecutive rounds.

booster.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=20)

RhoOptimizer

RhoOptimizer performs post-hoc optimization of step sizes, often improving predictions at minimal additional cost. It can also back out optimal regularization parameters (L1 penalty and learning rate), which is useful when you are unsure what level of regularization to use.

from kernelboost.rho_optimizer import RhoOptimizer

opt = RhoOptimizer(booster, lambda_reg=1.0)
opt.fit(X_val, y_val)
opt.update_booster()

# Back out optimal hyperparameters
lambda1, learning_rate = opt.find_hyperparameters()

Uncertainty Quantification (Experimental)

KernelBooster provides both prediction intervals and conditional variance prediction based on kernel estimation. These come for "free" on top of training and require no extra data. Still a work in progress.

# Prediction intervals (90% by default)
lower, upper = booster.predict_intervals(X, alpha=0.1)

# Conditional variance estimates
variance = booster.predict_variance(X)

Both interval coverage and conditional variance tend to be underestimated, though this depends on the data and on how well boosting has converged. No special tuning is required: settings that optimize MSE also give reasonable uncertainty estimates. See Benchmarks for a comparison with Gaussian Processes.

Data Preprocessing

Scaling data is a good idea for kernel estimation methods. The package includes a simple RankTransformer that often works well.

from kernelboost.utilities import RankTransformer

scaler = RankTransformer(pct=True)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Like other kernel methods, KernelBooster works best with continuous, smooth features. For datasets with many categorical features, tree-based methods are often better suited, as they handle splits on categories naturally.
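A percentile rank transform maps each value to its position in the empirical CDF, which bounds every feature to (0, 1] and removes skew and outliers. A minimal NumPy version of the idea (illustrative; the package's RankTransformer may differ):

```python
import numpy as np

class PercentileRank:
    """Map each feature to the fraction of training values at or below it."""

    def fit(self, X):
        # Store per-feature sorted training values for rank lookups
        self.sorted_ = np.sort(X, axis=0)
        return self

    def transform(self, X):
        n = self.sorted_.shape[0]
        out = np.empty_like(X, dtype=float)
        for j in range(X.shape[1]):
            # searchsorted gives each value's rank among training values
            out[:, j] = np.searchsorted(
                self.sorted_[:, j], X[:, j], side="right"
            ) / n
        return out

    def fit_transform(self, X):
        return self.fit(X).transform(X)
```

Because kernel distances are computed on these ranks, a heavy-tailed feature no longer dominates the bandwidth selection.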

API Reference

Class               Purpose
-----------------------------------------------------------------
KernelBooster       Main booster for regression/binary classification
MulticlassBooster   One-vs-rest multiclass wrapper
MSEObjective        Mean squared error (regression)
EntropyObjective    Cross-entropy (binary classification)
QuantileObjective   Pinball loss (quantile regression)
SmartSelector       mRMR-style feature selection
RandomSelector      Random feature selection
RhoOptimizer        Post-hoc step size optimization
RankTransformer     Percentile normalization

KernelBooster Main Parameters

Parameter       Default     Description
-----------------------------------------------------------------
objective       Required    Loss function: MSEObjective(), EntropyObjective(), QuantileObjective()
rounds          auto        Boosting iterations (auto = n_features * 10)
max_features    auto        Max features per estimator (auto = min(10, n_features))
min_features    1           Min features per estimator
kernel_type     'laplace'   Kernel function: 'laplace' or 'gaussian'
learning_rate   0.5         Step size shrinkage factor
lambda1         0.0         L1 regularization
use_gpu         False       Enable GPU acceleration

Benchmarks

Results have inherent randomness due to feature selection and subsampling. Scripts available in benchmarks/.

Regression (California Housing)

=================================================================
Model                       MSE        MAE         R²       Time
-----------------------------------------------------------------
KernelBooster            0.2053     0.2985     0.8452      11.0s
sklearn HGBR             0.2247     0.3146     0.8306       0.1s
XGBoost                  0.2155     0.3050     0.8376       0.1s
LightGBM                 0.2097     0.3047     0.8419       0.1s
=================================================================

Binary Classification (Breast Cancer)

=================================================================
Model                  Accuracy    AUC-ROC         F1       Time
-----------------------------------------------------------------
KernelBooster            0.9825     0.9984     0.9861       1.6s
sklearn HGBC             0.9649     0.9944     0.9722       0.1s
XGBoost                  0.9561     0.9938     0.9650       0.0s
LightGBM                 0.9649     0.9925     0.9722       0.0s
=================================================================

Comparison with Kernel Methods (California Housing)

=================================================================
Kernel Methods Benchmark (n_train=10000)
=================================================================
Model                       MSE        MAE         R²       Time
-----------------------------------------------------------------
KernelBooster            0.2091     0.3054     0.8430       6.5s
KernelRidge              0.4233     0.4835     0.6822       1.7s
SVR                      0.3136     0.3780     0.7646       3.5s
GP (n=5000)              0.3297     0.4061     0.7524      67.7s
=================================================================

Uncertainty Quantification (California Housing)

Prediction intervals and conditional variance estimates compared to Gaussian Process (sklearn) regression:

=================================================================
Uncertainty Quantification (90% intervals, alpha=0.1)
=================================================================
Model                  Coverage    Width    Var Corr   Var Ratio
-----------------------------------------------------------------
KernelBooster            88.1%    1.235      0.206       1.621
GP (n=5000)              90.9%    1.863      0.157       1.026
=================================================================

Var Corr is the correlation between predicted variance and squared errors. Var Ratio is the ratio of mean squared error to mean predicted variance.

CPU/GPU training time comparison (California Housing)

=================================================================
GPU vs CPU Training Time (California Housing, n=10000)
=================================================================
Backend                                                  Time
-----------------------------------------------------------------
CPU (C/OpenMP)                                          38.6s
GPU (CuPy/CUDA)                                          4.6s
=================================================================
GPU speedup: 8.3x

All benchmarks run on Ubuntu 22.04 with Ryzen 7700 and RTX 3090.

References

  • Fan, J., & Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall.
  • Fan, J., & Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. Biometrika, 85(3), 645–660.
  • Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29(5), 1189–1232.
  • Hansen, B. E. (2004). Nonparametric Conditional Density Estimation. Working paper, University of Wisconsin.

About

KernelBoost is a hobby project exploring alternatives to tree-based gradient boosting. Currently v0.1.0. Pre-compiled binaries included for Linux and Windows. Contributions and feedback welcome.

License

MIT License
