
Rust-first gradient boosting for regression, classification, and ranking with time-aware validation and Python bindings


AlloyGBM

AlloyGBM is a Rust-first gradient boosting library with Python bindings, supporting regression, binary and multi-class classification, and learning-to-rank. It is built for fast native execution, deterministic training, and time-aware tabular workflows.

AlloyGBM is strongest on panel and finance-style problems where leakage-aware validation and practical iteration speed matter. It also performs competitively on general tabular benchmarks and includes native artifact prediction, TreeSHAP explanations, and purged time-series split helpers.

When To Use AlloyGBM

AlloyGBM is a good fit when you want:

  • a native Rust-backed gradient boosting library with regression, classification, and ranking
  • deterministic CPU training and inference
  • sklearn-compatible estimators (GBMRegressor, GBMClassifier, GBMRanker)
  • time-aware validation helpers for forecasting or panel-style workflows
  • native prediction from serialized artifacts
  • TreeSHAP explanations and global feature importances
  • NaN/missing value support out of the box
  • model persistence via pickle, save/load, or artifact export

Installation

PyPI:

pip install alloygbm

From source:

python -m pip install --upgrade maturin
maturin develop --manifest-path bindings/python/Cargo.toml --release

AlloyGBM targets Python 3.11+ and uses a native Rust extension module.

Wheel targets for 0.7.0:

  • macOS arm64
  • Linux x86_64 (manylinux)
  • source distribution for other platforms

Quick Examples

Regression

from alloygbm import GBMRegressor, rmse

model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
print(rmse(y_test, model.predict(X_test)))

Binary Classification

from alloygbm import GBMClassifier, accuracy, log_loss

model = GBMClassifier(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=500,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train)

labels = model.predict(X_test)            # [0, 1, 1, 0, ...]
probas = model.predict_proba(X_test)      # [[P(0), P(1)], ...]

print("accuracy:", accuracy(y_test, labels))
print("log_loss:", log_loss(y_test, probas[:, 1]))

Learning-to-Rank

from alloygbm import GBMRanker, ndcg

model = GBMRanker(
    ranking_objective="rank:ndcg",
    learning_rate=0.05,
    max_depth=6,
    n_estimators=300,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, group=query_ids_train)

scores = model.predict(X_test)
print("NDCG@10:", ndcg(y_test, scores, group=query_ids_test, k=10))

MorphBoost (Adaptive Split Criterion)

MorphBoost is an opt-in training mode that blends the standard gradient gain with a normalized information-theoretic term. Across rounds, the blend ramps in via a tanh(iter/20) warmup; an EMA over per-class gradient statistics shapes split selection; and leaf magnitudes are scaled by a depth penalty and per-iteration shrinkage. See the MorphBoost paper for the formulation.

from alloygbm import GBMRegressor

# Constant LR (default) with morph adaptive split criterion
model = GBMRegressor(
    n_estimators=1200,
    max_depth=6,
    learning_rate=0.05,
    training_mode="morph",      # opt in
    morph_rate=0.1,             # per-round leaf shrinkage
    info_score_weight=0.3,      # blend weight for info-theoretic term
    depth_penalty_base=0.9,     # multiplier per depth level
    balance_penalty=True,       # penalize highly imbalanced splits
    seed=7,
)
model.fit(X_train, y_train)

# With warmup-cosine LR schedule (good fit for very-low-LR runs)
model = GBMRegressor(
    n_estimators=5000,
    learning_rate=0.01,
    training_mode="morph",
    lr_schedule="warmup_cosine",
    lr_warmup_frac=0.1,         # fraction of n_estimators spent in warmup
    seed=7,
)

training_mode="morph" works with GBMClassifier and GBMRanker too, with identical parameter semantics.

DRO Leaf Solver (Robust Scalar Leaves)

Set leaf_solver="dro" to use a fast Wasserstein-inspired robust Newton update for scalar leaves. The solver penalizes each candidate leaf by within-leaf gradient dispersion, reducing sensitivity to noisy or weak leaf signals while keeping prediction speed identical to standard constant leaves.

from alloygbm import GBMRegressor

model = GBMRegressor(
    n_estimators=600,
    max_depth=6,
    learning_rate=0.05,
    leaf_solver="dro",
    dro_radius=0.05,
    dro_metric="wasserstein",
    seed=7,
)
model.fit(X_train, y_train)

leaf_solver="dro" works with GBMRegressor, GBMClassifier, and GBMRanker, and composes with training_mode="morph". In v0.7.0 it requires leaf_model="constant"; piecewise-linear leaves still use the standard PL solver. dro_radius=0.0 preserves standard-leaf predictions while retaining DRO metadata in the artifact.

Factor-Neutral Boosting

Use neutralization="per_round_gradient" with fit(..., factor_exposures=F) to project each boosting round's pseudo-residuals away from user-supplied nuisance factors. This is useful when common factors explain high-variance signal that you do not want the model to spend tree capacity learning.

This is a training-time regularization tool. It does not guarantee prediction-time zero exposure unless predictions are neutralized against evaluation-time factors outside the model.

Constructor parameters:

GBMRegressor(
    neutralization="none",                 # "none" | "pre_target" | "per_round_gradient" | "split_penalty"
    factor_neutralization_lambda=1e-6,      # finite, >= 0; ridge added to F^T W F
    factor_penalty=0.0,                     # finite, >= 0; only active for neutralization="split_penalty"
)

factor_exposures is dense, row-major, finite, and shaped (n_rows, n_factors). It is fit data, not constructor state, so sklearn cloning remains clean and large matrices are not embedded in estimator params.
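
A minimal end-to-end sketch combining the constructor and fit contracts above (X_train and y_train as in the earlier examples; the exposure matrix here is random and purely for shape illustration):

import numpy as np
from alloygbm import GBMRegressor

# Three nuisance factors per training row, shaped (n_rows, n_factors).
F = np.random.default_rng(0).normal(size=(len(X_train), 3))

model = GBMRegressor(
    neutralization="per_round_gradient",
    factor_neutralization_lambda=1e-6,
    seed=7,
)
model.fit(X_train, y_train, factor_exposures=F)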

Mode semantics:

neutralization="none" preserves current behavior and ignores factor_exposures unless a non-None matrix is provided with an inactive mode, in which case Python raises a clear validation error to prevent silent user mistakes.

neutralization="pre_target" residualizes the regression target once before training:

y_perp = y - F (F^T W F + lambda I)^-1 F^T W y

This mode is supported for GBMRegressor only. It is rejected for classification and ranking because target residualization is not well-defined for class labels or ranking relevance. eval_set is also rejected for pre_target in this release because the public API does not yet accept validation-set factor exposures to residualize validation targets consistently.

neutralization="per_round_gradient" projects objective gradients before each boosting round:

g_perp = g - F (F^T W F + lambda I)^-1 F^T W g

Hessians are unchanged. This mode is supported for regression, binary classification, multiclass, and ranking. For multiclass, each class-gradient column is projected independently against the same factor projector.
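
Both modes apply the same weighted ridge projector, once to the target (pre_target) or to each round's gradients (per_round_gradient). A minimal NumPy sketch of that algebra, for intuition only (the real projection runs inside the Rust core):

import numpy as np

def project_out(v, F, w, lam=1e-6):
    # v - F (F^T W F + lam I)^-1 F^T W v, with W = diag(w)
    FtW = F.T * w                                # (n_factors, n_rows)
    A = FtW @ F + lam * np.eye(F.shape[1])
    beta = np.linalg.solve(A, FtW @ v)
    return v - F @ beta

For multiclass, this projector would be applied to each class-gradient column independently, matching the semantics above.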

neutralization="split_penalty" includes per-round gradient projection and subtracts a factor-load penalty from split gain:

penalty = factor_penalty * || F_L^T update_L + F_R^T update_R ||^2 / max(row_count, 1)
gain_final = gain_after_existing_modes - penalty

For scalar leaves, update_L and update_R are the candidate scalar leaf values before any final MorphBoost depth/iteration leaf scaling. For DRO leaves, the scalar values use the DRO effective gradients. For MorphBoost, the order is: project gradients, compute standard/DRO gradient gain, blend MorphBoost information score, subtract factor penalty, then apply MorphBoost leaf scaling when storing leaves. split_penalty performs additional factor-exposure work during split search and should be treated as the slowest neutralization mode until production-scale benchmarks justify stronger claims.
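
For scalar leaves, F_L^T update_L reduces to the per-factor column sums of F_L scaled by the candidate leaf value. An illustrative rendering of the penalty arithmetic for a single candidate split (not library code):

import numpy as np

def factor_load_penalty(F_left, F_right, leaf_left, leaf_right,
                        factor_penalty, row_count):
    # || F_L^T u_L + F_R^T u_R ||^2 for scalar candidate leaves u_L, u_R
    load = F_left.sum(axis=0) * leaf_left + F_right.sum(axis=0) * leaf_right
    return factor_penalty * float(load @ load) / max(row_count, 1)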

Compatibility:

Feature                    pre_target                 per_round_gradient         split_penalty
GBMRegressor               supported                  supported                  supported
GBMClassifier              rejected                   supported                  supported
GBMRanker                  rejected                   supported                  supported
training_mode="morph"      supported                  supported                  supported
leaf_solver="dro"          supported                  supported                  supported
leaf_model="linear"        supported                  supported                  rejected
warm start                 rejected in this release   rejected in this release   rejected in this release

Exposure matrices are not persisted in the estimator or artifact. For this release, neutralized warm-start and init_model continuation are rejected because artifacts do not yet persist neutralization metadata needed to prove that the previous model and current estimator have matching neutralization contracts.

Piecewise-Linear Leaves

Set leaf_model="linear" on any estimator to replace scalar leaves with small closed-form linear models (f_s(x) = b_s + Σ α_j x_j). Weights are solved via ridge regression α* = -(XᵀHX + λI)⁻¹ Xᵀg regularised by lambda_l2. This typically converges in fewer rounds on data with linear within-node residual structure (e.g. California Housing), at a 2–8× per-round training overhead.

from alloygbm import GBMRegressor

model = GBMRegressor(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.05,
    leaf_model="linear",
    lambda_l2=0.01,    # recommended >= 0.01 with linear leaves
    seed=7,
)
model.fit(X_train, y_train)

leaf_model="linear" works with GBMClassifier and GBMRanker, and composes with training_mode="morph". SHAP currently requires leaf_model="constant".

Time-Aware Validation

from alloygbm import GBMRegressor, purged_time_series_splits, rmse

splits = purged_time_series_splits(time_index, n_splits=5, purge_gap=1, embargo=0)

for train_idx, test_idx in splits:
    model = GBMRegressor(deterministic=True, seed=7)
    model.fit(
        [rows[i] for i in train_idx],
        [targets[i] for i in train_idx],
    )
    score = rmse(
        [targets[i] for i in test_idx],
        model.predict([rows[i] for i in test_idx]),
    )

For panel data, use purged_panel_splits(...).

Model Persistence

import pickle

# Pickle round-trip
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Native save/load
model.save_model("model.agbm")
loaded = GBMRegressor.load_model("model.agbm")

# Artifact export for deployment
artifact_bytes = model.artifact_bytes
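
This summary does not spell out the serving call; as an assumption-labeled sketch, if predict_from_artifact (listed under Inference and Explanations below) accepts these bytes directly, usage would look like:

from alloygbm import predict_from_artifact  # assumed import location

preds = predict_from_artifact(artifact_bytes, X_test)  # hypothetical call shape; see API docs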

Feature Summary

Estimators

  • GBMRegressor -- squared-error regression with dataset-aware training_policy
  • GBMClassifier -- binary classification with log-loss objective, predict_proba, sklearn ClassifierMixin
  • GBMRanker -- learning-to-rank with 5 objectives: rank:pairwise, rank:ndcg, rank:xendcg, queryrmse, yetirank
  • All estimators are sklearn-compatible (get_params, set_params, score, pipeline integration)

Training Features

  • NaN/missing value support with learned split direction
  • Sample weights via fit(..., sample_weight=...)
  • Monotone constraints via monotone_constraints
  • Feature importance weighting via feature_weights
  • Leaf-wise (best-first) tree growth via tree_growth="leaf"
  • Warm-starting / incremental training via warm_start=True
  • Up to 65,535 bins per feature (continuous_binning_max_bins)
  • Multiple categorical column support via categorical_feature_indices
  • Early stopping with best_iteration_, best_score_, evals_result_ (see the sketch after this list)
  • Objective-aware training metric tracking (RMSE, log-loss, accuracy, NDCG)
  • Adaptive split criterion via training_mode="morph" (MorphBoost)
  • Per-iteration learning-rate schedules: lr_schedule="constant" (default) or "warmup_cosine"
  • DRO-style robust scalar leaves via leaf_solver="dro" (closed-form gradient-uncertainty penalty)
  • Piecewise-linear leaves via leaf_model="linear" (closed-form ridge solve, faster convergence on linear-trend data)
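
For example, the early-stopping attributes listed above can be read back after fitting with an eval_set (names as in the regression quick example):

model = GBMRegressor(n_estimators=2000, seed=7)
model.fit(X_train, y_train, eval_set=(X_valid, y_valid))

print(model.best_iteration_)   # iteration achieving the best validation score
print(model.best_score_)       # best validation metric value
print(model.evals_result_)     # per-iteration metric history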

Inference and Explanations

  • Zero-copy numpy prediction from native artifacts
  • TreeSHAP explanations via shap_values(...) (polynomial-time, no feature limit)
  • Global feature importance via feature_importances(...)
  • Artifact-backed prediction via predict_from_artifact(...)

Validation Helpers

  • purged_time_series_splits(...) -- leakage-aware time-series cross-validation
  • purged_panel_splits(...) -- panel-data cross-validation

Metrics

  • Regression: rmse, mae, r2_score
  • Classification: accuracy, log_loss
  • Ranking: ndcg
  • Finance: pearson_correlation, rank_ic, hit_rate, icir

Benchmark Snapshot

The benchmark suite compares AlloyGBM against XGBoost, LightGBM, and CatBoost across regression, classification, and ranking tasks.

Regression:

  • AlloyGBM is strongest on panel_time_series
  • AlloyGBM is strong on dow_jones_financial
  • AlloyGBM is competitive on dense_numeric; it trails on california_housing and bike_sharing

Classification:

  • AlloyGBM is competitive with established libraries on breast_cancer and synthetic_classification

Ranking:

  • AlloyGBM competes on synthetic_ranking using its native LambdaMART implementation

Benchmark tooling and methodology live in benchmarks/README.md.

Current Limitations

  • CPU-only runtime (GPU backend is architecturally planned but not implemented)
  • No interaction constraints
  • No dart/goss boosting modes
  • SHAP not yet supported with leaf_model="linear" (use "constant" for now)
  • leaf_solver="dro" is a robust scalar leaf update, not a full raw-distribution Wasserstein DRO guarantee

License

MIT. See LICENSE.
