HRBoost: Hierarchical Refined Boost - GBDT with Non-monotonic Bayesian Hierarchical Clustering

Project description

HRBoost (Hierarchical Refined Boost)

HRBoost is a fast, lightweight Gradient Boosting Decision Tree (GBDT) library built in C++ and Python. Its core engine uses a Non-monotonic Bayesian Hierarchical Clustering algorithm (LNM-BHC, $k=3$) to find optimal splits for high-cardinality categorical variables without manual parameter tuning.

HRBoost follows the scikit-learn estimator API and provides two estimators: HRBoostClassifier and HRBoostRegressor.

Installation

pip install hrboost

Hyperparameter Reference

HRBoostClassifier and HRBoostRegressor accept the following parameters in their constructors:

Core GBDT Parameters

n_estimators (int, default=200): The number of boosting rounds (trees to build).
learning_rate (float, default=0.1): Shrinkage factor applied to each tree's update to reduce overfitting.
max_depth (int, default=4): Maximum depth of each decision tree.
max_leaves (int, default=64): Maximum number of leaves allowed per tree.
reg_lambda (float, default=1.0): L2 regularization term on leaf weights. It also scales the baseline regularization for Bayesian Hierarchical Clustering.
subsample (float, default=0.8): Fraction of training samples randomly chosen to train each tree.
colsample_bytree (float, default=1.0): Fraction of features randomly selected for building each tree.
n_bins (int, default=32): Maximum number of discrete bins used to bucket each continuous feature.
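
To make n_bins concrete, the sketch below buckets one continuous feature using evenly spaced quantile cut points. It is an illustration only: the binning strategy HRBoost actually uses is not documented here, and quantile_bin_edges and to_bin_index are hypothetical helper names, not part of the hrboost API.

```python
from bisect import bisect_right


def quantile_bin_edges(values, n_bins=32):
    """Return at most n_bins - 1 interior cut points at evenly spaced quantiles.

    Hypothetical helper: quantile binning is an assumption, not HRBoost's
    documented behaviour.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    edges = []
    for i in range(1, n_bins):
        cut = ordered[min(n - 1, (i * n) // n_bins)]
        # Skip repeated cut points so heavily tied features use fewer bins.
        if not edges or cut > edges[-1]:
            edges.append(cut)
    return edges


def to_bin_index(value, edges):
    """Map a raw value to a bin index in the range [0, len(edges)]."""
    return bisect_right(edges, value)


values = [0.1, 0.4, 0.4, 0.9, 1.5, 2.2, 3.0, 7.5]
edges = quantile_bin_edges(values, n_bins=4)
bins = [to_bin_index(v, edges) for v in values]
print(edges)
print(bins)
```

Because duplicate cut points are dropped, a feature with few distinct values ends up with fewer than n_bins bins, which is why the parameter is described as a maximum.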

Split Constraints

min_child_weight (float, default=0.1): Minimum sum of instance Hessians required in a child node.
gamma (float, default=0.0): Minimum loss reduction required to make a split.
max_delta_step (float, default=0.0): Maximum delta step allowed for each tree's leaf output (useful for highly unbalanced classes).
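
The three constraints above can be read against the standard second-order (XGBoost-style) split gain. The following is a minimal sketch under that assumption, not HRBoost's engine code: leaf_weight and split_gain are hypothetical names, and treating max_delta_step=0.0 as "no cap" follows the XGBoost convention rather than anything confirmed for HRBoost.

```python
def leaf_weight(G, H, reg_lambda=1.0, max_delta_step=0.0):
    """Second-order leaf output -G / (H + lambda), optionally clipped.

    Assumption: max_delta_step == 0.0 means "no cap" (XGBoost convention).
    """
    w = -G / (H + reg_lambda)
    if max_delta_step > 0.0:
        w = max(-max_delta_step, min(max_delta_step, w))
    return w


def split_gain(G_L, H_L, G_R, H_R,
               reg_lambda=1.0, gamma=0.0, min_child_weight=0.1):
    """Return the loss reduction of a split, or None if it is rejected."""
    # min_child_weight: each child needs enough Hessian mass.
    if H_L < min_child_weight or H_R < min_child_weight:
        return None

    def score(G, H):
        return G * G / (H + reg_lambda)

    reduction = 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                       - score(G_L + G_R, H_L + H_R))
    # gamma: the split must reduce the loss by more than this threshold.
    return reduction if reduction > gamma else None


accepted = split_gain(-4.0, 2.0, 3.0, 2.0)
too_light = split_gain(-4.0, 0.05, 3.0, 2.0)
below_gamma = split_gain(-4.0, 2.0, 3.0, 2.0, gamma=5.0)
capped = leaf_weight(-4.0, 2.0, max_delta_step=1.0)
print(accepted, too_light, below_gamma, capped)
```

Raising gamma or min_child_weight makes the tree more conservative, while a positive max_delta_step bounds how far a single leaf can move the prediction.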

System & Features

cat_features (list of int, default=None): List of feature indices to be treated as categorical features.
random_state (int, default=0): Seed for the random number generators used in row subsampling and column sampling.
verbose (bool, default=True): Controls C++ engine logging during training.

Environment Variables for Advanced Tuning

HRBoost exposes some internal engine settings through environment variables rather than constructor parameters, which keeps the hyperparameter list short:

COHESION_REG (float, default=0.3):
- Controls the intensity of Dynamic Cohesion Regularization during tree splitting.
- Cohesion measures how similar the two prospective child nodes are in terms of their leaf weight estimates ($dL = G_L/H_L$, $dR = G_R/H_R$). When the children are similar (high cohesion, an uninformative split), L2 regularization is dynamically increased to penalize the split. When the children diverge (low cohesion, an informative split), regularization stays at the base reg_lambda.
- Set COHESION_REG=0.0 (for example, export COHESION_REG=0.0) to disable it and revert to the standard XGBoost-style gain. High-noise or high-cardinality categorical settings may benefit from higher values (e.g., 0.5 or 1.0).
MIN_CAT_COUNT (float, default=automatically scaled):
- The minimum count required for a categorical bin to participate in BHC clustering. It helps filter out extremely rare categorical values.
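
The description of COHESION_REG gives the behaviour of Dynamic Cohesion Regularization but not its formula. The sketch below is one possible reading, offered purely as an illustration: the exponential cohesion measure is an assumption, and read_cohesion_reg and effective_lambda are hypothetical names that do not exist in hrboost. It is also not stated when the engine reads the variable, so setting it before importing hrboost is the cautious choice.

```python
import math
import os


def read_cohesion_reg(default=0.3):
    """Read COHESION_REG from the environment, falling back to the default."""
    return float(os.environ.get("COHESION_REG", default))


def effective_lambda(G_L, H_L, G_R, H_R, reg_lambda=1.0, cohesion_reg=0.3):
    """Hypothetical cohesion-scaled L2 term (assumes H_L > 0 and H_R > 0)."""
    dL = G_L / H_L
    dR = G_R / H_R
    # Assumed cohesion measure: 1.0 when the child estimates coincide,
    # decaying towards 0.0 as they move apart.
    cohesion = math.exp(-abs(dL - dR))
    # High cohesion inflates the penalty; low cohesion leaves it near
    # the base reg_lambda. cohesion_reg = 0.0 disables the effect.
    return reg_lambda * (1.0 + cohesion_reg * cohesion)


strength = read_cohesion_reg()
similar = effective_lambda(-2.0, 2.0, -2.2, 2.0, cohesion_reg=strength)
divergent = effective_lambda(-4.0, 2.0, 3.0, 2.0, cohesion_reg=strength)
print(similar, divergent)
```

Under this reading, the inflated lambda would replace reg_lambda in the gain computation, so near-identical children yield a smaller gain and are less likely to be split.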

Quick Start

1. Classification

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from hrboost import HRBoostClassifier

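# Load the handwritten-digits dataset and hold out 20% for testing.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# Parameters not passed here keep the defaults listed above.
clf = HRBoostClassifier(n_estimators=100, learning_rate=0.1, max_depth=4)
clf.fit(X_train, y_train)

# score() reports accuracy on the held-out split.
print(f"Test Accuracy: {clf.score(X_test, y_test):.4f}")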

2. Regression

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from hrboost import HRBoostRegressor

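# Load the diabetes regression dataset and hold out 20% for testing.
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=42
)

# Parameters not passed here keep the defaults listed above.
reg = HRBoostRegressor(n_estimators=150, learning_rate=0.08, max_depth=4)
reg.fit(X_train, y_train)

# score() reports R2 on the held-out split.
print(f"Test R2 Score: {reg.score(X_test, y_test):.4f}")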

Project details

Release history

0.1.4 (this version): Jun 21, 2026
0.1.3: Jun 21, 2026
0.1.2: Jun 21, 2026
0.1.1: Jun 21, 2026
0.1.0: Jun 21, 2026

Download files

Source Distribution

hrboost-0.1.4.tar.gz (51.3 kB)

Uploaded Jun 21, 2026 Source

Built Distribution

hrboost-0.1.4-py3-none-any.whl (46.7 kB)

Uploaded Jun 21, 2026 Python 3

File details

Details for the file hrboost-0.1.4.tar.gz.

File metadata

Download URL: hrboost-0.1.4.tar.gz
Upload date: Jun 21, 2026
Size: 51.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for hrboost-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`10621a8e55410b0a7f733882c36e6bfd556bc8cee74f82c1b507719e26e7e12d`
MD5	`4b5577b0c17a85010434a10135e4705f`
BLAKE2b-256	`aefedc8b9530f9f0ccc3b09ff734c8efe3b25ad40b0164172a8a09c371f88056`

File details

Details for the file hrboost-0.1.4-py3-none-any.whl.

File metadata

Download URL: hrboost-0.1.4-py3-none-any.whl
Upload date: Jun 21, 2026
Size: 46.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for hrboost-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a02b3933163a6400c33cb84b3714d46ddaa5eeaabfae758afcf4a5a87ad5211b`
MD5	`d0fc97e00af19551595ca7c16fdbb619`
BLAKE2b-256	`0f52dc62b1af71e18dd59d8e094e53f9624455100576dd9da4ef717f941e0ff7`
