Rust-first gradient boosting for regression, classification, and ranking with time-aware validation and Python bindings
AlloyGBM
AlloyGBM is a Rust-first gradient boosting library with Python bindings, supporting regression, binary and multi-class classification, and learning-to-rank. It is built for fast native execution, deterministic training, and time-aware tabular workflows.
AlloyGBM is strongest on panel and finance-style problems where leakage-aware validation and practical iteration speed matter. It also performs competitively on general tabular benchmarks and includes native artifact prediction, TreeSHAP explanations, and purged time-series split helpers.
When To Use AlloyGBM
AlloyGBM is a good fit when you want:
- a native Rust-backed gradient boosting library with regression, classification, and ranking
- deterministic CPU training and inference
- sklearn-compatible estimators (`GBMRegressor`, `GBMClassifier`, `GBMRanker`)
- time-aware validation helpers for forecasting or panel-style workflows
- native prediction from serialized artifacts
- TreeSHAP explanations and global feature importances
- NaN/missing value support out of the box
- model persistence via pickle, save/load, or artifact export
Installation
PyPI:
pip install alloygbm
From source:
python -m pip install --upgrade maturin
maturin develop --manifest-path bindings/python/Cargo.toml --release
AlloyGBM targets Python 3.11+ and uses a native Rust extension module.
Wheel targets for 0.5.0:
- macOS `arm64`
- Linux `x86_64` (manylinux)
- source distribution for other platforms
Quick Examples
Regression
from alloygbm import GBMRegressor, rmse
model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
print(rmse(y_test, model.predict(X_test)))
Binary Classification
from alloygbm import GBMClassifier, accuracy, log_loss
model = GBMClassifier(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=500,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train)
labels = model.predict(X_test) # [0, 1, 1, 0, ...]
probas = model.predict_proba(X_test) # [[P(0), P(1)], ...]
print("accuracy:", accuracy(y_test, labels))
print("log_loss:", log_loss(y_test, probas[:, 1]))
Learning-to-Rank
from alloygbm import GBMRanker, ndcg
model = GBMRanker(
    ranking_objective="rank:ndcg",
    learning_rate=0.05,
    max_depth=6,
    n_estimators=300,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, group=query_ids_train)
scores = model.predict(X_test)
print("NDCG@10:", ndcg(y_test, scores, group=query_ids_test, k=10))
MorphBoost (Adaptive Split Criterion)
MorphBoost is an opt-in training mode that blends the standard gradient gain with a normalized information-theoretic term. Across rounds, the blend weight ramps in via a tanh(iter/20) warmup, an EMA over per-class gradient statistics shapes split selection, and leaf magnitudes are scaled by a depth penalty and per-iteration shrinkage. See the MorphBoost paper for the full formulation.
from alloygbm import GBMRegressor
# Constant LR (default) with morph adaptive split criterion
model = GBMRegressor(
    n_estimators=1200,
    max_depth=6,
    learning_rate=0.05,
    training_mode="morph",   # opt in
    morph_rate=0.1,          # per-round leaf shrinkage
    info_score_weight=0.3,   # blend weight for info-theoretic term
    depth_penalty_base=0.9,  # multiplier per depth level
    balance_penalty=True,    # penalize highly imbalanced splits
    seed=7,
)
model.fit(X_train, y_train)
# With warmup-cosine LR schedule (good fit for very-low-LR runs)
model = GBMRegressor(
    n_estimators=5000,
    learning_rate=0.01,
    training_mode="morph",
    lr_schedule="warmup_cosine",
    lr_warmup_frac=0.1,  # fraction of n_estimators spent in warmup
    seed=7,
)
training_mode="morph" works with GBMClassifier and GBMRanker too, with
identical parameter semantics.
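The warmup ramp and LR schedule described above can be sketched in isolation. This is an illustrative sketch, not AlloyGBM's internal code: the tanh(iter/20) warmup follows the description above, but the exact way the blend weight combines the two gain terms, and the precise warmup-cosine formula, are assumptions.

```python
import math

def morph_blend_weight(iteration: int, info_score_weight: float = 0.3) -> float:
    """tanh(iter/20) warmup, scaled by the info-term blend weight (assumed combination)."""
    return info_score_weight * math.tanh(iteration / 20.0)

def blended_gain(grad_gain: float, info_gain: float, iteration: int,
                 info_score_weight: float = 0.3) -> float:
    """Convex blend of the gradient gain and the normalized info-theoretic gain."""
    w = morph_blend_weight(iteration, info_score_weight)
    return (1.0 - w) * grad_gain + w * info_gain

def warmup_cosine_lr(t: int, n_estimators: int, base_lr: float,
                     lr_warmup_frac: float = 0.1) -> float:
    """Linear warmup to base_lr, then cosine decay toward zero (assumed shape)."""
    warmup = max(1, int(lr_warmup_frac * n_estimators))
    if t < warmup:
        return base_lr * (t + 1) / warmup
    progress = (t - warmup) / max(1, n_estimators - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The blend starts at zero (pure gradient gain) and saturates near
# info_score_weight after roughly 60 rounds.
print(morph_blend_weight(0))   # 0.0
print(round(morph_blend_weight(60), 3))
```

Under this shape, early rounds split on pure gradient gain while later rounds give the info-theoretic term its full `info_score_weight` share, which matches the "ramps in" behaviour the section describes.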
Piecewise-Linear Leaves
Set leaf_model="linear" on any estimator to replace scalar leaves with small
closed-form linear models (f_s(x) = b_s + Σ α_j x_j). Weights are solved via
ridge regression, α* = -(XᵀHX + λI)⁻¹Xᵀg, regularized by lambda_l2. This
typically converges in fewer rounds on data with linear within-node residual
structure (e.g. California Housing), at a 2-8× per-round training overhead.
from alloygbm import GBMRegressor
model = GBMRegressor(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.05,
    leaf_model="linear",
    lambda_l2=0.01,  # recommended >= 0.01 with linear leaves
    seed=7,
)
model.fit(X_train, y_train)
leaf_model="linear" works with GBMClassifier and GBMRanker, and composes
with training_mode="morph". SHAP currently requires leaf_model="constant".
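The closed-form solve above can be illustrated with numpy. This is a minimal sketch of the ridge system α* = -(XᵀHX + λI)⁻¹Xᵀg for a single leaf under the squared-error objective (per-row Hessians of 1), not the library's implementation; the data and helper name are made up for illustration.

```python
import numpy as np

def linear_leaf_weights(X, g, h, lam=0.01):
    """Solve alpha* = -(X^T H X + lam I)^{-1} X^T g for one leaf.

    X: (n, d) rows routed to the leaf, with a bias column for the intercept b_s
    g: (n,) per-row gradients; h: (n,) per-row Hessians; lam: lambda_l2
    """
    XtHX = X.T @ (h[:, None] * X)
    return -np.linalg.solve(XtHX + lam * np.eye(X.shape[1]), X.T @ g)

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + 0.5            # linear target within the node
pred = np.zeros(n)                 # current ensemble prediction
g = pred - y                       # squared-error gradient
h = np.ones(n)                     # squared-error Hessian
Xb = np.hstack([np.ones((n, 1)), x])  # bias column + feature
alpha = linear_leaf_weights(Xb, g, h)
print(np.round(alpha, 2))          # ≈ [0.5, 2.0]: intercept b_s and slope
```

With squared error the gradient is `pred - y` and the Hessian is 1, so the solve reduces to ordinary ridge regression on the residuals, which is why nodes with linear residual structure converge in fewer rounds.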
Time-Aware Validation
from alloygbm import GBMRegressor, purged_time_series_splits, rmse
splits = purged_time_series_splits(time_index, n_splits=5, purge_gap=1, embargo=0)
for train_idx, test_idx in splits:
    model = GBMRegressor(deterministic=True, seed=7)
    model.fit(
        [rows[i] for i in train_idx],
        [targets[i] for i in train_idx],
    )
    score = rmse(
        [targets[i] for i in test_idx],
        model.predict([rows[i] for i in test_idx]),
    )
For panel data, use purged_panel_splits(...).
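The purge/embargo semantics can be sketched in plain Python: training rows within `purge_gap` steps before a test fold, or within `embargo` steps after it, are excluded so that overlapping labels cannot leak. This is an illustrative reimplementation of the idea under the assumption of contiguous, time-ordered folds; the actual fold layout of `purged_time_series_splits` may differ.

```python
import numpy as np

def purged_splits_sketch(n_rows, n_splits=5, purge_gap=1, embargo=0):
    """Yield (train_idx, test_idx) pairs for time-ordered rows, dropping
    training rows inside the purge gap before each test fold and the
    embargo window after it."""
    edges = np.linspace(0, n_rows, n_splits + 1, dtype=int)
    for k in range(n_splits):
        start, stop = edges[k], edges[k + 1]
        test_idx = np.arange(start, stop)
        train_mask = np.ones(n_rows, dtype=bool)
        # exclude the fold itself plus the purge/embargo windows around it
        train_mask[max(0, start - purge_gap):min(n_rows, stop + embargo)] = False
        yield np.flatnonzero(train_mask), test_idx

for train_idx, test_idx in purged_splits_sketch(20, n_splits=4, purge_gap=2, embargo=1):
    print(f"test [{test_idx[0]}, {test_idx[-1]}], train size {len(train_idx)}")
```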
Model Persistence
import pickle
# Pickle round-trip
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
# Native save/load
model.save_model("model.agbm")
loaded = GBMRegressor.load_model("model.agbm")
# Artifact export for deployment
artifact_bytes = model.artifact_bytes
Feature Summary
Estimators
- `GBMRegressor` -- squared-error regression with dataset-aware `training_policy`
- `GBMClassifier` -- binary classification with log-loss objective, `predict_proba`, sklearn `ClassifierMixin`
- `GBMRanker` -- learning-to-rank with 5 objectives: `rank:pairwise`, `rank:ndcg`, `rank:xendcg`, `queryrmse`, `yetirank`
- All estimators are sklearn-compatible (`get_params`, `set_params`, `score`, pipeline integration)
Training Features
- NaN/missing value support with learned split direction
- Sample weights via `fit(..., sample_weight=...)`
- Monotone constraints via `monotone_constraints`
- Feature importance weighting via `feature_weights`
- Leaf-wise (best-first) tree growth via `tree_growth="leaf"`
- Warm-starting / incremental training via `warm_start=True`
- Up to 65,535 bins per feature (`continuous_binning_max_bins`)
- Multiple categorical column support via `categorical_feature_indices`
- Early stopping with `best_iteration_`, `best_score_`, `evals_result_`
- Objective-aware training metric tracking (RMSE, log-loss, accuracy, NDCG)
- Adaptive split criterion via `training_mode="morph"` (MorphBoost)
- Per-iteration learning-rate schedules: `lr_schedule="constant"` (default) or `"warmup_cosine"`
- Piecewise-linear leaves via `leaf_model="linear"` (closed-form ridge solve, faster convergence on linear-trend data)
Inference and Explanations
- Zero-copy numpy prediction from native artifacts
- TreeSHAP explanations via `shap_values(...)` (polynomial-time, no feature limit)
- Global feature importance via `feature_importances(...)`
- Artifact-backed prediction via `predict_from_artifact(...)`
Validation Helpers
- `purged_time_series_splits(...)` -- leakage-aware time-series cross-validation
- `purged_panel_splits(...)` -- panel-data cross-validation
Metrics
- Regression: `rmse`, `mae`, `r2_score`
- Classification: `accuracy`, `log_loss`
- Ranking: `ndcg`
- Finance: `pearson_correlation`, `rank_ic`, `hit_rate`, `icir`
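For reference, NDCG@k for a single query can be computed in a few lines. This sketch uses the common 2^rel - 1 gain with a log2(rank + 1) discount; AlloyGBM's `ndcg` helper may use a different gain convention and handles multiple query groups, so treat this as an illustration of the metric rather than its exact implementation.

```python
import math

def dcg_at_k(relevances, k):
    """DCG with gain 2^rel - 1 and discount log2(rank + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(y_true, scores, k=10):
    """NDCG@k for one query: DCG of the score-sorted order over the ideal DCG."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [y_true[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(y_true, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

print(ndcg_at_k([3, 2, 1, 0], [0.9, 0.7, 0.4, 0.1], k=4))  # 1.0
# < 1.0: swapping the top two documents costs discounted gain
print(ndcg_at_k([3, 2, 1, 0], [0.7, 0.9, 0.4, 0.1], k=4))
```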
Benchmark Snapshot
The benchmark suite compares AlloyGBM against XGBoost, LightGBM, and CatBoost across regression, classification, and ranking tasks.
Regression:
- AlloyGBM is strongest on `panel_time_series`
- AlloyGBM is strong on `dow_jones_financial`
- AlloyGBM is competitive on `dense_numeric`, trails on `california_housing` and `bike_sharing`
Classification:
- AlloyGBM is competitive with established libraries on `breast_cancer` and `synthetic_classification`
Ranking:
- AlloyGBM competes on `synthetic_ranking` using its native LambdaMART implementation
Benchmark tooling and methodology live in benchmarks/README.md.
Current Limitations
- CPU-only runtime (GPU backend is architecturally planned but not implemented)
- No interaction constraints
- No dart/goss boosting modes
- SHAP not yet supported with `leaf_model="linear"` (use `"constant"` for now)
Documentation
- Docs index: docs/README.md
- Benchmark guide: benchmarks/README.md
- Current roadmap: docs/roadmap/current.md
- Archive: docs/archive/README.md
License
MIT. See LICENSE.
File details
Details for the file alloygbm-0.5.0.tar.gz.
File metadata
- Download URL: alloygbm-0.5.0.tar.gz
- Size: 258.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9b7a6a7bc6218f30343c4d63f32bbacbe37bedf8726481daeaa26a587b088953` |
| MD5 | `3fdc8b468bed96f6656e924e58778ece` |
| BLAKE2b-256 | `1edc33d4027e89109ca5a9b73f0bb4aba9da03a03e827be45e0bf4bd8e85ddbc` |
Provenance
The following attestation bundles were made for alloygbm-0.5.0.tar.gz:
Publisher: publish.yml on LGA-Personal/AlloyGBM
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloygbm-0.5.0.tar.gz
- Subject digest: `9b7a6a7bc6218f30343c4d63f32bbacbe37bedf8726481daeaa26a587b088953`
- Sigstore transparency entry: 1474350523
- Permalink: LGA-Personal/AlloyGBM@870be397851ebb9748362dd3f0d830b16d411b69
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/LGA-Personal
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@870be397851ebb9748362dd3f0d830b16d411b69
- Trigger Event: release
File details
Details for the file alloygbm-0.5.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: alloygbm-0.5.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 1.0 MB
- Tags: CPython 3.11+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0d324614a71c2938c36edbc0dda87fa76c69a8730ca62bb0c1a2644ed901b8e7` |
| MD5 | `9b75621ff19b05d244da508511d411a6` |
| BLAKE2b-256 | `760771adf22bca5791c39056bc0c916b86faccf8198a2245817472ec1b4012ef` |
Provenance
The following attestation bundles were made for alloygbm-0.5.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: publish.yml on LGA-Personal/AlloyGBM
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloygbm-0.5.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: `0d324614a71c2938c36edbc0dda87fa76c69a8730ca62bb0c1a2644ed901b8e7`
- Sigstore transparency entry: 1474350622
- Permalink: LGA-Personal/AlloyGBM@870be397851ebb9748362dd3f0d830b16d411b69
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/LGA-Personal
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@870be397851ebb9748362dd3f0d830b16d411b69
- Trigger Event: release
File details
Details for the file alloygbm-0.5.0-cp311-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: alloygbm-0.5.0-cp311-abi3-macosx_11_0_arm64.whl
- Size: 901.2 kB
- Tags: CPython 3.11+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `887889560e7d31b501a98083072161d23eae21e0e6b6d45a14930373afa53416` |
| MD5 | `eccea5ab511489674a61cdca01c8345d` |
| BLAKE2b-256 | `5a433f868dc2faf21ce8a22508e9353a05271d08fb3257d28be16e8cda159f04` |
Provenance
The following attestation bundles were made for alloygbm-0.5.0-cp311-abi3-macosx_11_0_arm64.whl:
Publisher: publish.yml on LGA-Personal/AlloyGBM
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloygbm-0.5.0-cp311-abi3-macosx_11_0_arm64.whl
- Subject digest: `887889560e7d31b501a98083072161d23eae21e0e6b6d45a14930373afa53416`
- Sigstore transparency entry: 1474350572
- Permalink: LGA-Personal/AlloyGBM@870be397851ebb9748362dd3f0d830b16d411b69
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/LGA-Personal
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@870be397851ebb9748362dd3f0d830b16d411b69
- Trigger Event: release