Rust-first gradient boosting for regression, classification, and ranking with time-aware validation and Python bindings

Project description

AlloyGBM

AlloyGBM is a Rust-first gradient boosting library with Python bindings, supporting regression, binary and multi-class classification, and learning-to-rank. It is built for fast native execution, deterministic training, and time-aware tabular workflows.

AlloyGBM is strongest on panel and finance-style problems where leakage-aware validation and practical iteration speed matter. It also performs competitively on general tabular benchmarks and includes native artifact prediction, TreeSHAP explanations, and purged time-series split helpers.

When To Use AlloyGBM

AlloyGBM is a good fit when you want:

  • a native Rust-backed gradient boosting library with regression, classification, and ranking
  • deterministic CPU training and inference
  • sklearn-compatible estimators (GBMRegressor, GBMClassifier, GBMRanker)
  • time-aware validation helpers for forecasting or panel-style workflows
  • native prediction from serialized artifacts
  • TreeSHAP explanations and global feature importances
  • NaN/missing value support out of the box
  • model persistence via pickle, save/load, or artifact export

Installation

PyPI:

pip install alloygbm

From source:

python -m pip install --upgrade maturin
maturin develop --manifest-path bindings/python/Cargo.toml --release

AlloyGBM targets Python 3.11+ and uses a native Rust extension module.

Wheel targets for 0.6.0:

  • macOS arm64
  • Linux x86_64 (manylinux)
  • source distribution for other platforms

Quick Examples

Regression

from alloygbm import GBMRegressor, rmse

model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
print(rmse(y_test, model.predict(X_test)))

Binary Classification

from alloygbm import GBMClassifier, accuracy, log_loss

model = GBMClassifier(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=500,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train)

labels = model.predict(X_test)            # [0, 1, 1, 0, ...]
probas = model.predict_proba(X_test)      # [[P(0), P(1)], ...]

print("accuracy:", accuracy(y_test, labels))
print("log_loss:", log_loss(y_test, probas[:, 1]))

Learning-to-Rank

from alloygbm import GBMRanker, ndcg

model = GBMRanker(
    ranking_objective="rank:ndcg",
    learning_rate=0.05,
    max_depth=6,
    n_estimators=300,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, group=query_ids_train)

scores = model.predict(X_test)
print("NDCG@10:", ndcg(y_test, scores, group=query_ids_test, k=10))

MorphBoost (Adaptive Split Criterion)

MorphBoost is an opt-in training mode that blends the standard gradient gain with a normalized information-theoretic term. Across rounds, the blend ramps in via a tanh(iter/20) warmup; an EMA over per-class gradient statistics shapes split selection; and leaf magnitudes are scaled by a depth penalty and per-iteration shrinkage. See the MorphBoost paper for the formulation.

from alloygbm import GBMRegressor

# Constant LR (default) with morph adaptive split criterion
model = GBMRegressor(
    n_estimators=1200,
    max_depth=6,
    learning_rate=0.05,
    training_mode="morph",      # opt in
    morph_rate=0.1,             # per-round leaf shrinkage
    info_score_weight=0.3,      # blend weight for info-theoretic term
    depth_penalty_base=0.9,     # multiplier per depth level
    balance_penalty=True,       # penalize highly imbalanced splits
    seed=7,
)
model.fit(X_train, y_train)

# With warmup-cosine LR schedule (good fit for very-low-LR runs)
model = GBMRegressor(
    n_estimators=5000,
    learning_rate=0.01,
    training_mode="morph",
    lr_schedule="warmup_cosine",
    lr_warmup_frac=0.1,         # fraction of n_estimators spent in warmup
    seed=7,
)

training_mode="morph" works with GBMClassifier and GBMRanker too, with identical parameter semantics.
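The warmup and scaling described above can be sketched numerically. This is an illustrative reading of the formulation, not the library internals; grad_gain, info_gain, and the function names are placeholders:

```python
import math

def morph_gain(grad_gain, info_gain, iteration, info_score_weight=0.3):
    # Blend weight ramps in via tanh(iter/20): near 0 early, -> 1 after ~60 rounds.
    warmup = math.tanh(iteration / 20.0)
    return grad_gain + warmup * info_score_weight * info_gain

def morph_leaf_scale(depth, depth_penalty_base=0.9, morph_rate=0.1):
    # Leaf magnitudes shrink multiplicatively with depth, plus per-round shrinkage.
    return (depth_penalty_base ** depth) * (1.0 - morph_rate)

# Early rounds rely almost entirely on the standard gradient gain:
print(round(morph_gain(1.0, 1.0, iteration=1), 4))    # 1.015
print(round(morph_gain(1.0, 1.0, iteration=100), 4))  # 1.3
```

The tanh warmup means the information-theoretic term contributes almost nothing in the first few rounds, so early trees behave like standard gradient boosting.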

DRO Leaf Solver (Robust Scalar Leaves)

Set leaf_solver="dro" to use a fast Wasserstein-inspired robust Newton update for scalar leaves. The solver penalizes each candidate leaf by within-leaf gradient dispersion, reducing sensitivity to noisy or weak leaf signals while keeping prediction speed identical to standard constant leaves.

from alloygbm import GBMRegressor

model = GBMRegressor(
    n_estimators=600,
    max_depth=6,
    learning_rate=0.05,
    leaf_solver="dro",
    dro_radius=0.05,
    dro_metric="wasserstein",
    seed=7,
)
model.fit(X_train, y_train)

leaf_solver="dro" works with GBMRegressor, GBMClassifier, and GBMRanker, and composes with training_mode="morph". In v0.6.0 it requires leaf_model="constant"; piecewise-linear leaves still use the standard PL solver. dro_radius=0.0 preserves standard-leaf predictions while retaining DRO metadata in the artifact.
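The shape of a dispersion-penalized leaf update can be sketched as follows. This is a hedged illustration under the assumption that the penalty shrinks the usual Newton leaf value in proportion to the within-leaf gradient standard deviation scaled by dro_radius; the library's exact penalty may differ:

```python
import math

def dro_leaf_value(grads, hess, lam=1.0, dro_radius=0.05):
    # Standard Newton leaf: -sum(g) / (sum(h) + lambda).
    G, H = sum(grads), sum(hess)
    newton = -G / (H + lam)
    if dro_radius == 0.0:
        return newton  # radius 0 preserves the standard leaf, as documented
    # Illustrative robust penalty: shrink toward 0 by the within-leaf
    # gradient dispersion scaled by the radius.
    mean_g = G / len(grads)
    std_g = math.sqrt(sum((g - mean_g) ** 2 for g in grads) / len(grads))
    return newton * (1.0 / (1.0 + dro_radius * std_g))
```

A leaf whose gradients all agree is left untouched; a leaf with noisy, conflicting gradients gets a smaller-magnitude value, which is the stated goal of reducing sensitivity to weak leaf signals.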

Piecewise-Linear Leaves

Set leaf_model="linear" on any estimator to replace scalar leaves with small closed-form linear models (f_s(x) = b_s + Σ α_j x_j). Weights are solved via the ridge update α* = -(XᵀHX + λI)⁻¹ Xᵀg, regularized by lambda_l2. This typically converges in fewer rounds on data with linear within-node residual structure (e.g. California Housing), at a 2–8× per-round training overhead.

from alloygbm import GBMRegressor

model = GBMRegressor(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.05,
    leaf_model="linear",
    lambda_l2=0.01,    # recommended >= 0.01 with linear leaves
    seed=7,
)
model.fit(X_train, y_train)

leaf_model="linear" works with GBMClassifier and GBMRanker, and composes with training_mode="morph". SHAP currently requires leaf_model="constant".
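The closed-form solve above maps directly to a few lines of numpy. A minimal sketch, where X holds the within-leaf rows (a constant column can represent the intercept b_s), g and h the per-row gradients and hessians:

```python
import numpy as np

def linear_leaf_weights(X, g, h, lambda_l2=0.01):
    # alpha* = -(X^T H X + lambda I)^{-1} X^T g, with H = diag(h).
    XtHX = X.T @ (h[:, None] * X)
    reg = lambda_l2 * np.eye(X.shape[1])
    return -np.linalg.solve(XtHX + reg, X.T @ g)

# With unit hessians and lambda = 0 this reduces to a least-squares fit of -g on X.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
g = np.array([-1.0, -2.0, -3.0])
h = np.ones(3)
alpha = linear_leaf_weights(X, g, h, lambda_l2=0.0)  # alpha ~ [1., 2.]
```

Since -g here is exactly linear in X, the solve recovers the coefficients in one shot, which is why linear leaves converge in fewer rounds when residuals have within-node linear structure.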

Time-Aware Validation

from alloygbm import GBMRegressor, purged_time_series_splits, rmse

splits = purged_time_series_splits(time_index, n_splits=5, purge_gap=1, embargo=0)

for train_idx, test_idx in splits:
    model = GBMRegressor(deterministic=True, seed=7)
    model.fit(
        [rows[i] for i in train_idx],
        [targets[i] for i in train_idx],
    )
    score = rmse(
        [targets[i] for i in test_idx],
        model.predict([rows[i] for i in test_idx]),
    )

For panel data, use purged_panel_splits(...).
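The purging idea behind these helpers can be illustrated with a simplified sketch (not the library implementation, which may differ in details; assumes a numeric time index): training rows whose time falls within purge_gap before the test window, or within embargo after it, are dropped to prevent leakage.

```python
def purged_splits(time_index, n_splits=3, purge_gap=1, embargo=0):
    # Split contiguous time into folds; drop training rows whose time falls
    # within purge_gap before the test window or embargo after it.
    times = sorted(set(time_index))
    fold = len(times) // n_splits
    for k in range(n_splits):
        test_times = set(times[k * fold:(k + 1) * fold])
        lo, hi = min(test_times), max(test_times)
        test_idx = [i for i, t in enumerate(time_index) if t in test_times]
        train_idx = [
            i for i, t in enumerate(time_index)
            if t < lo - purge_gap or t > hi + embargo
        ]
        yield train_idx, test_idx
```

With purge_gap=1, the row immediately preceding each test window is excluded from training, so a model cannot exploit overlap between adjacent observations.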

Model Persistence

import pickle

# Pickle round-trip
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Native save/load
model.save_model("model.agbm")
loaded = GBMRegressor.load_model("model.agbm")

# Artifact export for deployment
artifact_bytes = model.artifact_bytes

Feature Summary

Estimators

  • GBMRegressor -- squared-error regression with dataset-aware training_policy
  • GBMClassifier -- binary classification with log-loss objective, predict_proba, sklearn ClassifierMixin
  • GBMRanker -- learning-to-rank with 5 objectives: rank:pairwise, rank:ndcg, rank:xendcg, queryrmse, yetirank
  • All estimators are sklearn-compatible (get_params, set_params, score, pipeline integration)

Training Features

  • NaN/missing value support with learned split direction
  • Sample weights via fit(..., sample_weight=...)
  • Monotone constraints via monotone_constraints
  • Feature importance weighting via feature_weights
  • Leaf-wise (best-first) tree growth via tree_growth="leaf"
  • Warm-starting / incremental training via warm_start=True
  • Up to 65,535 bins per feature (continuous_binning_max_bins)
  • Multiple categorical column support via categorical_feature_indices
  • Early stopping with best_iteration_, best_score_, evals_result_
  • Objective-aware training metric tracking (RMSE, log-loss, accuracy, NDCG)
  • Adaptive split criterion via training_mode="morph" (MorphBoost)
  • Per-iteration learning-rate schedules: lr_schedule="constant" (default) or "warmup_cosine"
  • DRO-style robust scalar leaves via leaf_solver="dro" (closed-form gradient-uncertainty penalty)
  • Piecewise-linear leaves via leaf_model="linear" (closed-form ridge solve, faster convergence on linear-trend data)
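The "warmup_cosine" schedule listed above presumably follows the usual linear-warmup-then-cosine-decay shape; a sketch under that assumption (the library's exact curve may differ):

```python
import math

def warmup_cosine_lr(t, n_estimators, base_lr, lr_warmup_frac=0.1):
    # Linear ramp for the first lr_warmup_frac of rounds, cosine decay after.
    warmup_rounds = max(1, int(n_estimators * lr_warmup_frac))
    if t < warmup_rounds:
        return base_lr * (t + 1) / warmup_rounds
    progress = (t - warmup_rounds) / max(1, n_estimators - warmup_rounds)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The ramp keeps early trees small, which is the stated motivation for pairing this schedule with very-low-LR morph runs.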

Inference and Explanations

  • Zero-copy numpy prediction from native artifacts
  • TreeSHAP explanations via shap_values(...) (polynomial-time, no feature limit)
  • Global feature importance via feature_importances(...)
  • Artifact-backed prediction via predict_from_artifact(...)

Validation Helpers

  • purged_time_series_splits(...) -- leakage-aware time-series cross-validation
  • purged_panel_splits(...) -- panel-data cross-validation

Metrics

  • Regression: rmse, mae, r2_score
  • Classification: accuracy, log_loss
  • Ranking: ndcg
  • Finance: pearson_correlation, rank_ic, hit_rate, icir
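For readers unfamiliar with the finance metrics: rank IC is commonly defined as the Spearman correlation between predictions and targets, and ICIR as mean IC divided by its standard deviation across periods. A self-contained sketch under those common definitions (the library's exact definitions and tie handling may differ):

```python
import statistics

def rank_ic(y_true, y_pred):
    # Spearman correlation = Pearson correlation of ranks (no tie handling).
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    a, b = ranks(y_true), ranks(y_pred)
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def icir(ics):
    # Information-coefficient information ratio: mean IC / std of per-period ICs.
    return statistics.mean(ics) / statistics.stdev(ics)
```

Rank IC rewards getting the ordering right rather than the magnitudes, which is why it is the standard signal-quality metric for cross-sectional return prediction.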

Benchmark Snapshot

The benchmark suite compares AlloyGBM against XGBoost, LightGBM, and CatBoost across regression, classification, and ranking tasks.

Regression:

  • AlloyGBM is strongest on panel_time_series
  • AlloyGBM is strong on dow_jones_financial
  • AlloyGBM is competitive on dense_numeric, trails on california_housing and bike_sharing

Classification:

  • AlloyGBM is competitive with established libraries on breast_cancer and synthetic_classification

Ranking:

  • AlloyGBM competes on synthetic_ranking using its native LambdaMART implementation

Benchmark tooling and methodology live in benchmarks/README.md.

Current Limitations

  • CPU-only runtime (GPU backend is architecturally planned but not implemented)
  • No interaction constraints
  • No dart/goss boosting modes
  • SHAP not yet supported with leaf_model="linear" (use "constant" for now)
  • leaf_solver="dro" is a robust scalar leaf update, not a full raw-distribution Wasserstein DRO guarantee

License

MIT. See LICENSE.
