HRBoost: Hierarchical Refined Boost - GBDT with Non-monotonic Bayesian Hierarchical Clustering
HRBoost (Hierarchical Refined Boost)
HRBoost is a fast, lightweight Gradient Boosting Decision Tree (GBDT) library built in C++ and Python. It introduces a Non-monotonic Bayesian Hierarchical Clustering (LNM-BHC, $k=3$) algorithm inside its core engine to find optimal splits for high-cardinality categorical variables with zero manual parameter tuning.
It is designed to be fully compatible with the scikit-learn estimator API and provides two estimators: HRBoostClassifier and HRBoostRegressor.
Key Features
- Optimal Categorical Splitting (LNM-BHC): Implements non-monotonic Bayesian Hierarchical Clustering to capture categorical structure under noise without sorting artifacts.
- Zero-Parameter Diet: Slimmed-down hyperparameter interface where BHC regularization uses a robust fixed sliding window size $k=3$ and falls back to reg_lambda.
- Scikit-Learn Compliant: Direct replacement for LGBMClassifier/Regressor or XGBClassifier/Regressor in Python pipelines.
- COHESION_REG Tuning: Keep control of dynamic regularization sensitivity via the COHESION_REG environment variable (default: 0.3).
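For context on the "sorting artifacts" mentioned above, here is a minimal sketch of the conventional sort-then-scan categorical split that many GBDT implementations use as a baseline. It is not HRBoost's LNM-BHC algorithm, whose internals are not described here. The function names and the toy gradient statistics are hypothetical, and the gain formula is the standard second-order one with an L2 term (called `reg_lambda` here to match the parameter named above).

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0):
    """Second-order split gain with L2 regularization (constant factors dropped)."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return score(g_left, h_left) + score(g_right, h_right) - parent


def best_sorted_categorical_split(stats, reg_lambda=1.0):
    """Baseline: sort categories by gradient/hessian ratio, then scan prefixes.

    `stats` maps category -> (sum of gradients, sum of hessians).
    Returns (set of categories sent left, gain), or (None, -inf) if no split exists.
    """
    ordered = sorted(stats, key=lambda c: stats[c][0] / (stats[c][1] + reg_lambda))
    g_total = sum(g for g, _ in stats.values())
    h_total = sum(h for _, h in stats.values())

    best_left, best_gain = None, float("-inf")
    g_left = h_left = 0.0
    # Only prefixes of the sorted order are considered; this is the
    # monotonic assumption that a non-monotonic method would relax.
    for i, cat in enumerate(ordered[:-1]):
        g, h = stats[cat]
        g_left += g
        h_left += h
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left, reg_lambda)
        if gain > best_gain:
            best_left, best_gain = set(ordered[: i + 1]), gain
    return best_left, best_gain


# Toy per-category statistics (hypothetical values).
stats = {
    "red": (-4.0, 5.0),
    "green": (-3.5, 4.0),
    "blue": (3.0, 4.0),
    "grey": (4.5, 6.0),
}
left, gain = best_sorted_categorical_split(stats)
print(sorted(left), round(gain, 4))
```

Because the scan only evaluates prefixes of one sorted order, noisy per-category statistics can push a category to the wrong side of every candidate split. That limitation is what clustering-based grouping aims to avoid.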
Installation
From PyPI
pip install hrboost
From Source
Ensure you have a C++ compiler supporting C++17.
git clone https://github.com/yourusername/hrboost.git
cd hrboost
sh build.sh
pip install -e .
Quick Start
1. Classification (HRBoostClassifier)
HRBoostClassifier supports binary and multiclass tasks natively.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from hrboost import HRBoostClassifier
# Load digits dataset (10 classes)
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)
# Initialize & fit
clf = HRBoostClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=4,
    random_state=42,
    objective="multiclass"
)
clf.fit(X_train, y_train)
# Predict probabilities and classes
probs = clf.predict_proba(X_test)
preds = clf.predict(X_test)
accuracy = np.mean(preds == y_test)
print(f"Accuracy: {accuracy:.4f}")
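The relationship between the probability matrix and the predicted labels above can be sketched in plain Python. This assumes the usual scikit-learn convention that `predict` returns the class with the highest `predict_proba` value. Whether HRBoost breaks ties the same way is not stated here, and both helper names and the toy data are hypothetical.

```python
def classes_from_proba(proba_rows, classes):
    """For each row, pick the class label with the highest probability."""
    return [
        classes[max(range(len(row)), key=row.__getitem__)]
        for row in proba_rows
    ]


def simple_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy three-class probabilities, one row per sample (hypothetical values).
proba = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.50, 0.30],
    [0.40, 0.35, 0.25],
]
classes = [0, 1, 2]
y_true = [0, 2, 1, 1]

y_pred = classes_from_proba(proba, classes)
acc = simple_accuracy(y_true, y_pred)
print(y_pred, acc)
```

This computes the same quantity as `np.mean(preds == y_test)` in the example above, without NumPy.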
2. Regression (HRBoostRegressor)
HRBoostRegressor models continuous target values using a Mean Squared Error (MSE) objective.
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from hrboost import HRBoostRegressor
# Load diabetes dataset
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=42
)
# Initialize & fit
reg = HRBoostRegressor(
    n_estimators=150,
    learning_rate=0.08,
    max_depth=4,
    random_state=42
)
reg.fit(X_train, y_train)
# Predict
preds = reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
print(f"MSE: {mse:.4f}")
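To make the MSE objective and the role of `learning_rate` concrete, here is a minimal sketch of generic gradient boosting on squared error. Each "tree" is reduced to a single leaf that outputs one constant. This illustrates the general GBDT mechanism, not HRBoost's internals. The helper names and toy targets are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def leaf_value(y_true, y_pred, reg_lambda=0.0):
    """Optimal constant update for the loss 0.5 * (y - f)^2.

    The gradient per sample is (f - y) and the hessian is 1, so the
    Newton step for one leaf is -G / (H + reg_lambda).
    """
    grad = sum(p - t for t, p in zip(y_true, y_pred))
    hess = float(len(y_true))
    return -grad / (hess + reg_lambda)


y = [3.0, 5.0, 7.0, 9.0]   # toy targets
pred = [0.0] * len(y)      # start from a zero model
learning_rate = 0.5

history = [mse(y, pred)]
for _ in range(3):
    step = leaf_value(y, pred)
    # Shrink each update by the learning rate before adding it to the model.
    pred = [p + learning_rate * step for p in pred]
    history.append(mse(y, pred))

print(history)
```

Each round fits the current residuals and moves the prediction only part of the way, so a smaller `learning_rate` generally needs a larger `n_estimators` to reach the same training error.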
3. Dynamic Regularization Sensitivity (COHESION_REG)
You can tune the cohesion penalty used by BHC's dynamic regularization through the COHESION_REG environment variable (default: 0.3):
export COHESION_REG=0.5
python your_script.py
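The same setting can be applied from inside a Python script. The sketch below uses a hypothetical helper, `read_cohesion_reg`, to show how such a variable is typically parsed with a default of 0.3. It is not HRBoost's own code. When the library reads the variable (at import or at fit time) is not stated here, so setting it before importing hrboost is the conservative choice.

```python
import os

DEFAULT_COHESION_REG = 0.3  # default value stated in the feature list


def read_cohesion_reg(environ=os.environ):
    """Hypothetical helper: parse COHESION_REG, using the default when unset."""
    raw = environ.get("COHESION_REG")
    if raw is None or raw.strip() == "":
        return DEFAULT_COHESION_REG
    return float(raw)  # raises ValueError for non-numeric input


# Equivalent of `export COHESION_REG=0.5`, but scoped to this process.
# Do this before `import hrboost` in case the value is read at import time.
os.environ["COHESION_REG"] = "0.5"

value = read_cohesion_reg()
print(value)
```

Environment variables set this way affect only the current process and its children, which keeps experiments with different values isolated from one another.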
License
This project is licensed under the MIT License.
File details
Details for the file hrboost-0.1.0.tar.gz.
File metadata
- Download URL: hrboost-0.1.0.tar.gz
- Upload date:
- Size: 47.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7745c4e659c04607c2ff52e935f82a8e10e649c349e963dd3b7e723b7161e34c |
| MD5 | f23a3d23fea7b49c70ff44fa2d99835d |
| BLAKE2b-256 | a715bd774c687f259c8bcb788201fd0e6d4b6c5cf7bffda7b243d7ab7b757f96 |
File details
Details for the file hrboost-0.1.0-py3-none-any.whl.
File metadata
- Download URL: hrboost-0.1.0-py3-none-any.whl
- Upload date:
- Size: 45.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 742e9ae0f697e4215a86c0852c5c8c5921f481510bf06f74a01e77551fe8d1a5 |
| MD5 | 2bf7bfe7f6021071ac6955295b62cb66 |
| BLAKE2b-256 | 9bb1c93cbdc68ec179a32333f5c3aaadc0443df028d1d9dc56852910332f32ff |