HRBoost: Hierarchical Refined Boost - GBDT with Non-monotonic Bayesian Hierarchical Clustering

HRBoost (Hierarchical Refined Boost)

HRBoost is a fast, lightweight Gradient Boosting Decision Tree (GBDT) library written in C++ and Python. Its core engine uses a Non-monotonic Bayesian Hierarchical Clustering algorithm (LNM-BHC, with a fixed sliding window size $k=3$) to find optimal splits for high-cardinality categorical variables without manual parameter tuning.

It is designed to follow the scikit-learn estimator API (fit, predict, predict_proba) and provides two estimators: HRBoostClassifier and HRBoostRegressor.


Key Features

  • Optimal Categorical Splitting (LNM-BHC): Implements non-monotonic Bayesian Hierarchical Clustering to capture categorical structure under noise without sorting artifacts.
  • Zero-Parameter Diet: Slimmed-down hyperparameter interface where BHC regularization uses a robust fixed sliding window size $k=3$ and falls back to reg_lambda.
  • Scikit-Learn Compliant: Drop-in replacement for LGBMClassifier/LGBMRegressor or XGBClassifier/XGBRegressor in Python pipelines.
  • COHESION_REG Tuning: Control the sensitivity of dynamic regularization via the COHESION_REG environment variable (default: 0.3).
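
For context on the "sorting artifacts" mentioned above, here is a minimal sketch of the common sort-then-scan heuristic for categorical splits: categories are ordered by a regularized gradient/hessian ratio, and only the prefix cuts of that ordering are evaluated. This is a baseline for comparison only, not HRBoost's LNM-BHC algorithm. The function name, its arguments, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def sorted_category_split(categories, gradients, hessians, reg_lambda=1.0):
    """Baseline sort-then-scan categorical split (NOT HRBoost's LNM-BHC)."""
    # Accumulate gradient and hessian sums per category.
    g_sum = defaultdict(float)
    h_sum = defaultdict(float)
    for c, g, h in zip(categories, gradients, hessians):
        g_sum[c] += g
        h_sum[c] += h

    # Order categories by their regularized gradient/hessian ratio.
    order = sorted(g_sum, key=lambda c: g_sum[c] / (h_sum[c] + reg_lambda))

    g_total = sum(g_sum.values())
    h_total = sum(h_sum.values())

    def score(g, h):
        # Standard second-order leaf score with L2 regularization.
        return g * g / (h + reg_lambda)

    best_gain, best_left = 0.0, set()
    g_left = h_left = 0.0
    # Only the len(order) - 1 prefix cuts of the sorted order are considered,
    # so a noisy ordering limits which groupings can ever be found.
    for i, c in enumerate(order[:-1]):
        g_left += g_sum[c]
        h_left += h_sum[c]
        gain = (score(g_left, h_left)
                + score(g_total - g_left, h_total - h_left)
                - score(g_total, h_total))
        if gain > best_gain:
            best_gain, best_left = gain, set(order[: i + 1])
    return best_left, best_gain


# Toy data: categories "a" and "b" have negative gradients, "c" and "d" positive.
cats = ["a", "b", "c", "d", "a", "b", "c", "d"]
grads = [-1.0, -0.9, 1.0, 1.1, -1.2, -0.8, 0.9, 1.0]
hess = [1.0] * 8
left, gain = sorted_category_split(cats, grads, hess)
print(sorted(left))
```

On this toy data the best prefix cut groups "a" and "b" on the left. With noisy per-category statistics the ordering itself can be wrong, which is the situation the LNM-BHC feature is described as addressing.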

Installation

From PyPI

pip install hrboost

From Source

Ensure you have a C++ compiler supporting C++17.

git clone https://github.com/yourusername/hrboost.git
cd hrboost
sh build.sh
pip install -e .

Quick Start

1. Classification (HRBoostClassifier)

HRBoostClassifier supports binary and multiclass tasks natively.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from hrboost import HRBoostClassifier

# Load digits dataset (10 classes)
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# Initialize & fit
clf = HRBoostClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=4,
    random_state=42,
    objective="multiclass"
)
clf.fit(X_train, y_train)

# Predict probabilities and classes
probs = clf.predict_proba(X_test)
preds = clf.predict(X_test)

accuracy = np.mean(preds == y_test)
print(f"Accuracy: {accuracy:.4f}")

2. Regression (HRBoostRegressor)

HRBoostRegressor models continuous target values with a Mean Squared Error (MSE) objective.

from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split  # needed when run standalone
from hrboost import HRBoostRegressor

# Load diabetes dataset
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=42
)

# Initialize & fit
reg = HRBoostRegressor(
    n_estimators=150,
    learning_rate=0.08,
    max_depth=4,
    random_state=42
)
reg.fit(X_train, y_train)

# Predict
preds = reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
print(f"MSE: {mse:.4f}")

3. Dynamic Regularization Sensitivity (COHESION_REG)

You can tune the BHC dynamic regularization (cohesion) penalty via the COHESION_REG environment variable (default: 0.3):

export COHESION_REG=0.5
python your_script.py
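
The same setting can be applied from inside a Python script. The sketch below assumes the variable has to be in place before hrboost is imported; the README does not say whether it is read at import time, at construction, or during fit, so setting it first is the conservative choice. The read_cohesion_reg helper is hypothetical and not part of HRBoost; it only shows how the value and its documented default of 0.3 could be checked.

```python
import os


def read_cohesion_reg(default=0.3):
    """Hypothetical helper (not part of HRBoost): parse COHESION_REG as a float."""
    raw = os.environ.get("COHESION_REG")
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        # Fall back to the documented default if the value is not numeric.
        return default


# Set the variable before importing hrboost, in case it is read at import time.
os.environ["COHESION_REG"] = "0.5"
# from hrboost import HRBoostClassifier  # import only after the variable is set

print(read_cohesion_reg())
```

Environment variables set this way apply only to the current process and its children, so they do not leak into other shells or scripts.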

License

This project is licensed under the MIT License.

Download files

Source Distribution

hrboost-0.1.0.tar.gz (47.2 kB)

Uploaded Source

Built Distribution

hrboost-0.1.0-py3-none-any.whl (45.4 kB)

Uploaded Python 3

File details

Details for the file hrboost-0.1.0.tar.gz.

File metadata

  • Download URL: hrboost-0.1.0.tar.gz
  • Upload date:
  • Size: 47.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for hrboost-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7745c4e659c04607c2ff52e935f82a8e10e649c349e963dd3b7e723b7161e34c
MD5 f23a3d23fea7b49c70ff44fa2d99835d
BLAKE2b-256 a715bd774c687f259c8bcb788201fd0e6d4b6c5cf7bffda7b243d7ab7b757f96
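
A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch: the function names are illustrative, and the local path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_digest("hrboost-0.1.0.tar.gz", "7745c4e659c04607c2ff52e935f82a8e10e649c349e963dd3b7e723b7161e34c") should return True for an intact download of the source distribution.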

File details

Details for the file hrboost-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: hrboost-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 45.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for hrboost-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 742e9ae0f697e4215a86c0852c5c8c5921f481510bf06f74a01e77551fe8d1a5
MD5 2bf7bfe7f6021071ac6955295b62cb66
BLAKE2b-256 9bb1c93cbdc68ec179a32333f5c3aaadc0443df028d1d9dc56852910332f32ff
