Rust-first gradient boosting for regression, classification, and ranking with time-aware validation and Python bindings
AlloyGBM
AlloyGBM is a Rust-first gradient boosting library with Python bindings, supporting regression, binary classification, and learning-to-rank (multi-class classification is not yet implemented). It is built for fast native execution, deterministic training, and time-aware tabular workflows.
AlloyGBM is strongest on panel and finance-style problems where leakage-aware validation and practical iteration speed matter. It also performs competitively on general tabular benchmarks and includes native artifact prediction, TreeSHAP explanations, and purged time-series split helpers.
When To Use AlloyGBM
AlloyGBM is a good fit when you want:
- a native Rust-backed gradient boosting library with regression, classification, and ranking
- deterministic CPU training and inference
- sklearn-compatible estimators (GBMRegressor, GBMClassifier, GBMRanker)
- time-aware validation helpers for forecasting or panel-style workflows
- native prediction from serialized artifacts
- TreeSHAP explanations and global feature importances
- NaN/missing value support out of the box
- model persistence via pickle, save/load, or artifact export
Installation
PyPI:

```shell
pip install alloygbm
```

From source:

```shell
python -m pip install --upgrade maturin
maturin develop --manifest-path bindings/python/Cargo.toml --release
```
AlloyGBM targets Python 3.11+ and uses a native Rust extension module.
Wheel targets for 0.3.0:
- macOS arm64
- Linux x86_64 (manylinux)
- source distribution for other platforms
Quick Examples
Regression
```python
from alloygbm import GBMRegressor, rmse

model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
print(rmse(y_test, model.predict(X_test)))
```
Binary Classification
```python
from alloygbm import GBMClassifier, accuracy, log_loss

model = GBMClassifier(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=500,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train)
labels = model.predict(X_test)        # [0, 1, 1, 0, ...]
probas = model.predict_proba(X_test)  # [[P(0), P(1)], ...]
print("accuracy:", accuracy(y_test, labels))
print("log_loss:", log_loss(y_test, probas[:, 1]))
```
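Since log_loss consumes positive-class probabilities (probas[:, 1] above), it can help to see what the metric computes. Below is a minimal pure-Python sketch of binary cross-entropy; it illustrates the metric itself, not AlloyGBM's implementation, and the name log_loss_sketch and the clipping constant eps are choices made here:

```python
import math

def log_loss_sketch(y_true, p_pos, eps=1e-15):
    """Mean binary cross-entropy over positive-class probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Confident correct predictions give near-zero loss;
# a constant 0.5 prediction gives log(2) ~= 0.693.
print(log_loss_sketch([1, 0], [0.99, 0.01]))
print(log_loss_sketch([1, 0], [0.5, 0.5]))
```

Lower is better, and overconfident wrong predictions are penalized heavily because of the logarithm.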
Learning-to-Rank
```python
from alloygbm import GBMRanker, ndcg

model = GBMRanker(
    ranking_objective="rank:ndcg",
    learning_rate=0.05,
    max_depth=6,
    n_estimators=300,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, group=query_ids_train)
scores = model.predict(X_test)
print("NDCG@10:", ndcg(y_test, scores, group=query_ids_test, k=10))
```
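ndcg reports normalized discounted cumulative gain per query group. For reference, here is a minimal pure-Python sketch of NDCG@k for a single query; it illustrates the metric in its common log2-discount form, not AlloyGBM's implementation, and ndcg_at_k / dcg_at_k are names chosen here:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items, log2 discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(true_rels, scores, k=10):
    """NDCG@k for one query: DCG of the score-induced ordering over ideal DCG."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [true_rels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(true_rels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# A perfect ranking scores 1.0; any misordering scores below 1.0.
print(ndcg_at_k([3, 2, 1], [0.9, 0.5, 0.1], k=3))  # -> 1.0
print(ndcg_at_k([3, 2, 1], [0.1, 0.5, 0.9], k=3))  # reversed ranking, < 1.0
```

AlloyGBM's own gain/discount conventions for rank:ndcg may differ; this is only meant to show what a per-query NDCG@k score measures.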
Time-Aware Validation
```python
from alloygbm import GBMRegressor, purged_time_series_splits, rmse

splits = purged_time_series_splits(time_index, n_splits=5, purge_gap=1, embargo=0)
for train_idx, test_idx in splits:
    model = GBMRegressor(deterministic=True, seed=7)
    model.fit(
        [rows[i] for i in train_idx],
        [targets[i] for i in train_idx],
    )
    score = rmse(
        [targets[i] for i in test_idx],
        model.predict([rows[i] for i in test_idx]),
    )
    print(score)
```
For panel data, use purged_panel_splits(...).
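The idea behind purged splits is forward-chaining cross-validation with a gap between train and test, so labels near the boundary cannot leak into training. A minimal sketch of that scheme follows; it is illustrative only, not AlloyGBM's implementation, and the embargo option (which would additionally exclude rows just after each test block) is omitted for brevity:

```python
def purged_splits_sketch(n_samples, n_splits=5, purge_gap=1):
    """Forward-chaining splits with a purge gap: each fold tests on a later
    block and trains only on rows ending purge_gap rows before the test
    block, so labels adjacent to the boundary never appear in training."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        test_start = k * fold
        test_end = min(test_start + fold, n_samples)
        train_end = max(test_start - purge_gap, 0)
        yield list(range(train_end)), list(range(test_start, test_end))

for train_idx, test_idx in purged_splits_sketch(12, n_splits=3, purge_gap=1):
    # Training rows always stop purge_gap rows before the test block begins.
    assert max(train_idx) < min(test_idx) - 1
```

The key property, unlike shuffled K-fold, is that training data always precedes the test block in time.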
Model Persistence
```python
import pickle

# Pickle round-trip
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Native save/load
model.save_model("model.agbm")
loaded = GBMRegressor.load_model("model.agbm")

# Artifact export for deployment
artifact_bytes = model.artifact_bytes
```
Feature Summary
Estimators
- GBMRegressor -- squared-error regression with dataset-aware training_policy
- GBMClassifier -- binary classification with log-loss objective, predict_proba, sklearn ClassifierMixin
- GBMRanker -- learning-to-rank with 5 objectives: rank:pairwise, rank:ndcg, rank:xendcg, queryrmse, yetirank
- All estimators are sklearn-compatible (get_params, set_params, score, pipeline integration)
Training Features
- NaN/missing value support with learned split direction
- Sample weights via fit(..., sample_weight=...)
- Monotone constraints via monotone_constraints
- Feature importance weighting via feature_weights
- Leaf-wise (best-first) tree growth via tree_growth="leaf"
- Warm-starting / incremental training via warm_start=True
- Up to 65,535 bins per feature (continuous_binning_max_bins)
- Multiple categorical column support via categorical_feature_indices
- Early stopping with best_iteration_, best_score_, evals_result_
- Objective-aware training metric tracking (RMSE, log-loss, accuracy, NDCG)
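"Learned split direction" for missing values means each split node stores a default branch for NaN, chosen during training rather than fixed in advance or imputed. A tiny sketch of that routing rule at a single node (names hypothetical, purely illustrative):

```python
import math

def route(x, threshold, nan_goes_left):
    """Route one feature value at a split node. NaN follows the default
    branch learned during training instead of raising or being imputed."""
    if isinstance(x, float) and math.isnan(x):
        return "left" if nan_goes_left else "right"
    return "left" if x < threshold else "right"

# Same threshold, two possible learned defaults for missing values:
assert route(float("nan"), 0.5, nan_goes_left=True) == "left"
assert route(float("nan"), 0.5, nan_goes_left=False) == "right"
assert route(0.2, 0.5, nan_goes_left=False) == "left"
```

In boosting libraries that use this scheme, the default branch is typically the one that reduced the loss most on the training data, so missing values need no preprocessing at fit or predict time.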
Inference and Explanations
- Zero-copy numpy prediction from native artifacts
- TreeSHAP explanations via shap_values(...) (polynomial-time, no feature limit)
- Global feature importance via feature_importances(...)
- Artifact-backed prediction via predict_from_artifact(...)
Validation Helpers
- purged_time_series_splits(...) -- leakage-aware time-series cross-validation
- purged_panel_splits(...) -- panel-data cross-validation
Metrics
- Regression: rmse, mae, r2_score
- Classification: accuracy, log_loss
- Ranking: ndcg
- Finance: pearson_correlation, rank_ic, hit_rate, icir
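The finance metrics are rank-based; a rank information coefficient is conventionally a Spearman-style rank correlation between predictions and realized targets. A pure-Python sketch of that convention follows (illustrative only: tie handling is omitted, rank_ic_sketch is a name chosen here, and AlloyGBM's exact rank_ic definition may differ):

```python
def _ranks(values):
    """Assign ranks 1..n by sorted order (no tie handling, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks

def rank_ic_sketch(y_true, y_pred):
    """Spearman-style rank IC: Pearson correlation computed on ranks."""
    rt, rp = _ranks(y_true), _ranks(y_pred)
    n = len(rt)
    mt, mp = sum(rt) / n, sum(rp) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    std_t = sum((a - mt) ** 2 for a in rt) ** 0.5
    std_p = sum((b - mp) ** 2 for b in rp) ** 0.5
    return cov / (std_t * std_p)

# Perfectly monotone predictions give IC ~ +1, reversed give ~ -1.
assert abs(rank_ic_sketch([1, 2, 3, 4], [10, 20, 30, 40]) - 1.0) < 1e-9
assert abs(rank_ic_sketch([1, 2, 3, 4], [40, 30, 20, 10]) + 1.0) < 1e-9
```

Because it depends only on orderings, a rank IC is robust to outliers and monotone transformations of the predictions, which is why it is favored for cross-sectional return forecasting.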
Benchmark Snapshot
The benchmark suite compares AlloyGBM against XGBoost, LightGBM, and CatBoost across regression, classification, and ranking tasks.
Regression:
- AlloyGBM is strongest on panel_time_series
- AlloyGBM is strong on dow_jones_financial
- AlloyGBM is competitive on dense_numeric, trails on california_housing and bike_sharing

Classification:
- AlloyGBM is competitive with established libraries on breast_cancer and synthetic_classification

Ranking:
- AlloyGBM competes on synthetic_ranking using its native LambdaMART implementation
Benchmark tooling and methodology live in benchmarks/README.md.
Current Limitations
- Binary classification only (no multi-class yet)
- CPU-only runtime (GPU backend is architecturally planned but not implemented)
- No custom objective / custom metric callbacks from Python
- No interaction constraints
- No dart/goss boosting modes
Documentation
- Docs index: docs/README.md
- Benchmark guide: benchmarks/README.md
- Current roadmap: docs/roadmap/current.md
- Archive: docs/archive/README.md
License
MIT. See LICENSE.
File details
Details for the file alloygbm-0.3.0.tar.gz.
File metadata
- Download URL: alloygbm-0.3.0.tar.gz
- Upload date:
- Size: 203.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d3707ddb493dbeb26dfdbc90ae3fafa3ebef5641b6b77507edaf8cceaa3841e |
| MD5 | cd10ec62f7e208d287b5c75fad17c424 |
| BLAKE2b-256 | 3fdc485768ab6af310b560e4441fa98e61378e4728b6b99b2218137b99c8456a |
Provenance
The following attestation bundles were made for alloygbm-0.3.0.tar.gz:
Publisher: publish.yml on LGA-Personal/AlloyGBM
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloygbm-0.3.0.tar.gz
- Subject digest: 9d3707ddb493dbeb26dfdbc90ae3fafa3ebef5641b6b77507edaf8cceaa3841e
- Sigstore transparency entry: 1322681056
- Sigstore integration time:
- Permalink: LGA-Personal/AlloyGBM@9248326099eacc30925451d578b355202efa40a0
- Branch / Tag: refs/tags/V0.3.0
- Owner: https://github.com/LGA-Personal
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9248326099eacc30925451d578b355202efa40a0
- Trigger Event: release
File details
Details for the file alloygbm-0.3.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: alloygbm-0.3.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 938.9 kB
- Tags: CPython 3.11+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41cb19e640b18134a6e2c9614ccc232aeb2b6f1315db7d2dd8e548a5f92972fb |
| MD5 | 398bda5723dbd5e732da090b9d8104d9 |
| BLAKE2b-256 | 4e0e4ce95f5630f735d56aa2b9de1430e75b59cfe098715fa1bbf879e474638d |
Provenance
The following attestation bundles were made for alloygbm-0.3.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: publish.yml on LGA-Personal/AlloyGBM
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloygbm-0.3.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 41cb19e640b18134a6e2c9614ccc232aeb2b6f1315db7d2dd8e548a5f92972fb
- Sigstore transparency entry: 1322681263
- Sigstore integration time:
- Permalink: LGA-Personal/AlloyGBM@9248326099eacc30925451d578b355202efa40a0
- Branch / Tag: refs/tags/V0.3.0
- Owner: https://github.com/LGA-Personal
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9248326099eacc30925451d578b355202efa40a0
- Trigger Event: release
File details
Details for the file alloygbm-0.3.0-cp311-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: alloygbm-0.3.0-cp311-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 838.6 kB
- Tags: CPython 3.11+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9e061e3223fa4b9bd3a54647f44c0481807e8d447565aaebc58d599516504084 |
| MD5 | ae0554317c55912668a83b8fef110acb |
| BLAKE2b-256 | 98b8dfadd5e45513641b058a79a9d347c7e1b2c10382501fc612d3e1d9f5b6a7 |
Provenance
The following attestation bundles were made for alloygbm-0.3.0-cp311-abi3-macosx_11_0_arm64.whl:
Publisher: publish.yml on LGA-Personal/AlloyGBM
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloygbm-0.3.0-cp311-abi3-macosx_11_0_arm64.whl
- Subject digest: 9e061e3223fa4b9bd3a54647f44c0481807e8d447565aaebc58d599516504084
- Sigstore transparency entry: 1322681168
- Sigstore integration time:
- Permalink: LGA-Personal/AlloyGBM@9248326099eacc30925451d578b355202efa40a0
- Branch / Tag: refs/tags/V0.3.0
- Owner: https://github.com/LGA-Personal
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9248326099eacc30925451d578b355202efa40a0
- Trigger Event: release