HRBoost: Hierarchical Refined Boost - GBDT with Non-monotonic Bayesian Hierarchical Clustering
HRBoost (Hierarchical Refined Boost)
HRBoost is a fast, lightweight Gradient Boosting Decision Tree (GBDT) library built in C++ and Python. It introduces a Non-monotonic Bayesian Hierarchical Clustering (LNM-BHC, $k=3$) algorithm inside its core engine to find optimal splits for high-cardinality categorical variables with zero manual parameter tuning.
HRBoost follows the scikit-learn estimator API and provides two estimators, HRBoostClassifier and HRBoostRegressor.
Installation
pip install hrboost
Hyperparameter Reference
HRBoostClassifier and HRBoostRegressor accept the following parameters in their constructors:
Core GBDT Parameters
- n_estimators (int, default=200): The number of boosting rounds (trees to build).
- learning_rate (float, default=0.1): Shrinkage rate applied to each tree's update to prevent overfitting.
- max_depth (int, default=4): Maximum depth of each decision tree.
- max_leaves (int, default=64): Maximum number of leaves allowed per tree.
- reg_lambda (float, default=1.0): L2 regularization term on weights. It also scales the baseline regularization for Bayesian Hierarchical Clustering.
- subsample (float, default=0.8): Fraction of training samples randomly chosen to train each tree.
- colsample_bytree (float, default=1.0): Fraction of features randomly selected for building each tree.
- n_bins (int, default=32): Maximum number of discrete bins to bucket continuous features.
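To make the roles of learning_rate and reg_lambda concrete, here is a minimal sketch of a single leaf update. It assumes the standard second-order GBDT formulation used by XGBoost-style libraries; HRBoost's internal computation is not documented here and may differ. The function names leaf_weight and boosted_update are hypothetical and are not part of the hrboost package.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0):
    """Leaf value under a second-order approximation with an L2 penalty.

    Assumption: w = -G / (H + lambda), the usual XGBoost-style formula.
    """
    return -grad_sum / (hess_sum + reg_lambda)


def boosted_update(prediction, grad_sum, hess_sum,
                   learning_rate=0.1, reg_lambda=1.0):
    """Add one tree's shrunken leaf value to the current prediction."""
    return prediction + learning_rate * leaf_weight(grad_sum, hess_sum, reg_lambda)


# Squared-error example: targets 3.0 and 5.0, current prediction 0.0.
# The gradient is (prediction - target) and the Hessian is 1 per sample.
targets = [3.0, 5.0]
grad_sum = sum(0.0 - y for y in targets)   # -8.0
hess_sum = float(len(targets))             # 2.0

unregularized = leaf_weight(grad_sum, hess_sum, reg_lambda=0.0)  # mean of the targets
regularized = leaf_weight(grad_sum, hess_sum, reg_lambda=1.0)    # pulled toward zero
step = boosted_update(0.0, grad_sum, hess_sum, learning_rate=0.1, reg_lambda=1.0)
```

Under this assumption, a larger reg_lambda pulls each leaf value toward zero, and a smaller learning_rate means more boosting rounds are needed to reach the same fit.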
Split Constraints
- min_child_weight (float, default=0.1): Minimum sum of instance Hessian needed in a child node.
- gamma (float, default=0.0): Minimum loss reduction required to make a split.
- max_delta_step (float, default=0.0): Maximum delta step allowed for each tree's leaf output (useful for highly unbalanced classes).
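The sketch below shows how these three constraints typically interact when a candidate split is evaluated. It assumes the XGBoost-style gain formula and the convention that max_delta_step=0.0 means "no clipping"; neither is confirmed for HRBoost's engine. The helper names (split_gain, is_valid_split, clip_leaf) are hypothetical.

```python
def _score(g, h, reg_lambda):
    """Structure score of a node: G^2 / (H + lambda)."""
    return g * g / (h + reg_lambda)


def split_gain(gl, hl, gr, hr, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a split, minus the gamma penalty (assumed formula)."""
    parent = _score(gl + gr, hl + hr, reg_lambda)
    children = _score(gl, hl, reg_lambda) + _score(gr, hr, reg_lambda)
    return 0.5 * (children - parent) - gamma


def is_valid_split(gl, hl, gr, hr, reg_lambda=1.0, gamma=0.0,
                   min_child_weight=0.1):
    """Reject splits with a too-light child or a non-positive gain."""
    if hl < min_child_weight or hr < min_child_weight:
        return False
    return split_gain(gl, hl, gr, hr, reg_lambda, gamma) > 0.0


def clip_leaf(value, max_delta_step=0.0):
    """Clamp a leaf output to [-max_delta_step, max_delta_step].

    Assumption: a value of 0.0 disables clipping.
    """
    if max_delta_step <= 0.0:
        return value
    return max(-max_delta_step, min(max_delta_step, value))


# Two children with opposite gradients and equal Hessian mass.
gain = split_gain(-4.0, 2.0, 4.0, 2.0, reg_lambda=1.0, gamma=0.0)
accepted = is_valid_split(-4.0, 2.0, 4.0, 2.0)
rejected_by_weight = is_valid_split(-4.0, 2.0, 4.0, 2.0, min_child_weight=3.0)
rejected_by_gamma = is_valid_split(-4.0, 2.0, 4.0, 2.0, gamma=6.0)
```

Raising gamma or min_child_weight makes the tree more conservative: fewer candidate splits pass the check, so trees end up shallower.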
System & Features
- cat_features (list of int, default=None): List of feature indices to be treated as categorical features.
- random_state (int, default=0): Seed for random number generators (subsampling, colsample).
- verbose (bool, default=True): Controls C++ engine logging during training.
Environment Variables for Advanced Tuning
HRBoost exposes internal engine dynamics through system environment variables to avoid hyperparameter inflation:
- COHESION_REG (float, default=0.3):
  - Controls the intensity of the Dynamic Cohesion Regularization during tree splitting.
  - A cohesion penalty factor is computed dynamically based on the difference in predicted leaf values between prospective children. If child leaf predictions diverge excessively, L2 regularization is dynamically increased.
  - Set export COHESION_REG=0.0 to disable this penalty. High-noise categorical settings benefit from higher values (e.g., 0.5 or 1.0).
- MIN_CAT_COUNT (float, default=automatically scaled):
  - The minimum count required for a categorical bin to participate in BHC clustering. It helps filter out extremely rare categorical values.
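When working from a notebook or script rather than a shell, the same variables can be set from Python. The sketch below uses only the standard library. The engine_env helper is hypothetical and not part of hrboost, and this page does not state when the engine reads these variables (at import time or at fit time), so setting them before importing hrboost is the safer assumption.

```python
import os
from contextlib import contextmanager


@contextmanager
def engine_env(**overrides):
    """Temporarily set environment variables, restoring old values on exit."""
    previous = {name: os.environ.get(name) for name in overrides}
    try:
        for name, value in overrides.items():
            os.environ[name] = str(value)  # environment values must be strings
        yield
    finally:
        for name, old in previous.items():
            if old is None:
                os.environ.pop(name, None)  # variable was unset before
            else:
                os.environ[name] = old


with engine_env(COHESION_REG=0.5):
    # Train inside this block, e.g. clf.fit(X_train, y_train).
    # Assumption: the engine picks up COHESION_REG while it is set here.
    current = os.environ["COHESION_REG"]
```

Scoping the override to a with block keeps one experiment's setting from leaking into later runs in the same process.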
Quick Start
1. Classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from hrboost import HRBoostClassifier
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)
clf = HRBoostClassifier(n_estimators=100, learning_rate=0.1, max_depth=4)
clf.fit(X_train, y_train)
print(f"Test Accuracy: {clf.score(X_test, y_test):.4f}")
2. Regression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from hrboost import HRBoostRegressor
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=42
)
reg = HRBoostRegressor(n_estimators=150, learning_rate=0.08, max_depth=4)
reg.fit(X_train, y_train)
print(f"Test R2 Score: {reg.score(X_test, y_test):.4f}")
File details
Details for the file hrboost-0.1.1.tar.gz.
File metadata
- Download URL: hrboost-0.1.1.tar.gz
- Upload date:
- Size: 49.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d84ef2d01871846ca936b262c9f9a69376ea0eb68eb7f93898d84df31564f6ff |
| MD5 | 3d9ffcd2a48ac49839ffce63acc18fc8 |
| BLAKE2b-256 | 4345e1708162aac7e16324a625f7cf658ff597e080602ad8efc7c4fe9eb98aa6 |
File details
Details for the file hrboost-0.1.1-py3-none-any.whl.
File metadata
- Download URL: hrboost-0.1.1-py3-none-any.whl
- Upload date:
- Size: 45.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01d03e51789feaf9a3f7f23c60e6f932c71fbaac8be6aef8f211d15af85dcefa |
| MD5 | 84d16fe185ed37ae5787cd8b83b38026 |
| BLAKE2b-256 | a1c14a96d72cf6ec306545fe11371aefe6a6578b258872a222b1e5048e0f5a8d |