Rust-first gradient boosting for regression, classification, and ranking with time-aware validation and Python bindings

Project description

AlloyGBM

AlloyGBM is a Rust-first gradient boosting library with Python bindings, supporting regression, binary classification, and learning-to-rank. It is built for fast native execution, deterministic training, and time-aware tabular workflows.

AlloyGBM is strongest on panel and finance-style problems where leakage-aware validation and practical iteration speed matter. It also performs competitively on general tabular benchmarks and includes native artifact prediction, TreeSHAP explanations, and purged time-series split helpers.

When To Use AlloyGBM

AlloyGBM is a good fit when you want:

  • a native Rust-backed gradient boosting library with regression, classification, and ranking
  • deterministic CPU training and inference
  • sklearn-compatible estimators (GBMRegressor, GBMClassifier, GBMRanker)
  • time-aware validation helpers for forecasting or panel-style workflows
  • native prediction from serialized artifacts
  • TreeSHAP explanations and global feature importances
  • NaN/missing value support out of the box
  • model persistence via pickle, save/load, or artifact export

Installation

PyPI:

pip install alloygbm

From source:

python -m pip install --upgrade maturin
maturin develop --manifest-path bindings/python/Cargo.toml --release

AlloyGBM targets Python 3.11+ and uses a native Rust extension module.

Wheel targets for 0.4.0:

  • macOS arm64
  • Linux x86_64 (manylinux)
  • source distribution for other platforms

Quick Examples
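
The snippets below assume in-memory arrays named X_train, y_train, and so on; those names are placeholders, not part of the API. A minimal synthetic setup such as the following makes them runnable end to end:

import numpy as np

rng = np.random.default_rng(7)

# Synthetic regression data; swap in your own arrays.
X = rng.normal(size=(2000, 20))
y = X[:, 0] * 2.0 - X[:, 1] + rng.normal(scale=0.1, size=2000)

X_train, y_train = X[:1200], y[:1200]
X_valid, y_valid = X[1200:1600], y[1200:1600]
X_test, y_test = X[1600:], y[1600:]

# The classification snippet expects binary labels (substitute these
# for y_train/y_test there), and the ranking snippet expects per-row
# query ids (20 rows per query here).
y_binary = (y > 0).astype(int)
query_ids_train = np.repeat(np.arange(60), 20)   # length 1200
query_ids_test = np.repeat(np.arange(20), 20)    # length 400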

Regression

from alloygbm import GBMRegressor, rmse

model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
print(rmse(y_test, model.predict(X_test)))

Binary Classification

from alloygbm import GBMClassifier, accuracy, log_loss

model = GBMClassifier(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=500,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train)

labels = model.predict(X_test)            # [0, 1, 1, 0, ...]
probas = model.predict_proba(X_test)      # [[P(0), P(1)], ...]

print("accuracy:", accuracy(y_test, labels))
print("log_loss:", log_loss(y_test, probas[:, 1]))

Learning-to-Rank

from alloygbm import GBMRanker, ndcg

model = GBMRanker(
    ranking_objective="rank:ndcg",
    learning_rate=0.05,
    max_depth=6,
    n_estimators=300,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, group=query_ids_train)

scores = model.predict(X_test)
print("NDCG@10:", ndcg(y_test, scores, group=query_ids_test, k=10))

MorphBoost (Adaptive Split Criterion)

MorphBoost is an opt-in training mode that blends the standard gradient gain with a normalized information-theoretic term. Across rounds, the blend ramps in via a tanh(iter/20) warmup; an EMA over per-class gradient statistics shapes split selection; and leaf magnitudes are scaled by a depth penalty and per-iteration shrinkage. See the MorphBoost paper for the full formulation.

from alloygbm import GBMRegressor

# Constant LR (default) with morph adaptive split criterion
model = GBMRegressor(
    n_estimators=1200,
    max_depth=6,
    learning_rate=0.05,
    training_mode="morph",      # opt in
    morph_rate=0.1,             # per-round leaf shrinkage
    info_score_weight=0.3,      # blend weight for info-theoretic term
    depth_penalty_base=0.9,     # multiplier per depth level
    balance_penalty=True,       # penalize highly imbalanced splits
    seed=7,
)
model.fit(X_train, y_train)

# With warmup-cosine LR schedule (good fit for very-low-LR runs)
model = GBMRegressor(
    n_estimators=5000,
    learning_rate=0.01,
    training_mode="morph",
    lr_schedule="warmup_cosine",
    lr_warmup_frac=0.1,         # fraction of n_estimators spent in warmup
    seed=7,
)
model.fit(X_train, y_train)

training_mode="morph" works with GBMClassifier and GBMRanker too, with identical parameter semantics.

Time-Aware Validation

from alloygbm import GBMRegressor, purged_time_series_splits, rmse

splits = purged_time_series_splits(time_index, n_splits=5, purge_gap=1, embargo=0)

for train_idx, test_idx in splits:
    model = GBMRegressor(deterministic=True, seed=7)
    model.fit(
        [rows[i] for i in train_idx],
        [targets[i] for i in train_idx],
    )
    score = rmse(
        [targets[i] for i in test_idx],
        model.predict([rows[i] for i in test_idx]),
    )
    print(score)

For panel data, use purged_panel_splits(...).
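
For intuition, purging drops the purge_gap observations immediately before each test window, and an embargo drops observations immediately after it. An illustrative toy reimplementation (not the library's actual code; the real fold layout may differ) looks roughly like:

import numpy as np

def toy_purged_splits(n, n_splits=5, purge_gap=1, embargo=0):
    # Contiguous, ordered test folds; train excludes a gap before and
    # an embargo after each test window to limit leakage.
    fold = n // n_splits
    for k in range(n_splits):
        test = np.arange(k * fold, (k + 1) * fold)
        before = np.arange(0, max(test[0] - purge_gap, 0))
        after = np.arange(min(test[-1] + 1 + embargo, n), n)
        yield np.concatenate([before, after]), test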

Model Persistence

import pickle

# Pickle round-trip
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Native save/load
model.save_model("model.agbm")
loaded = GBMRegressor.load_model("model.agbm")

# Artifact export for deployment
artifact_bytes = model.artifact_bytes
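
Artifact bytes can back native prediction without a fitted estimator via predict_from_artifact(...). The exact call shape is documented in the API reference; the argument order shown here is an assumption:

from alloygbm import predict_from_artifact

# Assumed signature: serialized artifact bytes plus a feature matrix.
preds = predict_from_artifact(artifact_bytes, X_test)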

Feature Summary

Estimators

  • GBMRegressor -- squared-error regression with dataset-aware training_policy
  • GBMClassifier -- binary classification with log-loss objective, predict_proba, sklearn ClassifierMixin
  • GBMRanker -- learning-to-rank with 5 objectives: rank:pairwise, rank:ndcg, rank:xendcg, queryrmse, yetirank
  • All estimators are sklearn-compatible (get_params, set_params, score, pipeline integration)

Training Features

  • NaN/missing value support with learned split direction
  • Sample weights via fit(..., sample_weight=...)
  • Monotone constraints via monotone_constraints (see the sketch after this list)
  • Feature importance weighting via feature_weights
  • Leaf-wise (best-first) tree growth via tree_growth="leaf"
  • Warm-starting / incremental training via warm_start=True
  • Up to 65,535 bins per feature (continuous_binning_max_bins)
  • Multiple categorical column support via categorical_feature_indices
  • Early stopping with best_iteration_, best_score_, evals_result_
  • Objective-aware training metric tracking (RMSE, log-loss, accuracy, NDCG)
  • Adaptive split criterion via training_mode="morph" (MorphBoost)
  • Per-iteration learning-rate schedules: lr_schedule="constant" (default) or "warmup_cosine"
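
A sketch combining monotone constraints, sample weights, and the early-stopping attributes from the list above. The +1/-1/0 per-feature convention for monotone_constraints is an assumption to verify against the API docs, and whether early stopping requires an extra parameter beyond eval_set is not shown here:

import numpy as np
from alloygbm import GBMRegressor

weights = np.ones(len(y_train))  # per-row sample weights

model = GBMRegressor(
    n_estimators=2000,
    # One entry per feature; assumed: +1 increasing, -1 decreasing, 0 free.
    monotone_constraints=[1, -1, 0] + [0] * 17,
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train, sample_weight=weights, eval_set=(X_valid, y_valid))
print(model.best_iteration_, model.best_score_)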

Inference and Explanations

  • Zero-copy numpy prediction from native artifacts
  • TreeSHAP explanations via shap_values(...) (polynomial-time, no feature limit; usage sketch after this list)
  • Global feature importance via feature_importances(...)
  • Artifact-backed prediction via predict_from_artifact(...)
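
A minimal usage sketch for the explanation APIs; the return shapes (a per-row, per-feature matrix for SHAP, a per-feature vector for importances) and the no-argument importance call are assumptions:

# Per-row attributions and global importances on a fitted model.
contribs = model.shap_values(X_test)        # assumed shape: (n_rows, n_features)
importances = model.feature_importances()   # assumed: one score per feature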

Validation Helpers

  • purged_time_series_splits(...) -- leakage-aware time-series cross-validation
  • purged_panel_splits(...) -- panel-data cross-validation

Metrics

  • Regression: rmse, mae, r2_score
  • Classification: accuracy, log_loss
  • Ranking: ndcg
  • Finance: pearson_correlation, rank_ic, hit_rate, icir
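
The finance metrics are assumed here to follow the same (y_true, y_pred) convention as rmse and accuracy above; treat the exact signatures, especially any per-period grouping for rank_ic and icir, as assumptions to verify against the API docs:

from alloygbm import pearson_correlation, rank_ic, hit_rate

preds = model.predict(X_test)
print("IC:", pearson_correlation(y_test, preds))
print("Rank IC:", rank_ic(y_test, preds))
print("Hit rate:", hit_rate(y_test, preds))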

Benchmark Snapshot

The benchmark suite compares AlloyGBM against XGBoost, LightGBM, and CatBoost across regression, classification, and ranking tasks.

Regression:

  • AlloyGBM is strongest on panel_time_series
  • AlloyGBM is strong on dow_jones_financial
  • AlloyGBM is competitive on dense_numeric, trails on california_housing and bike_sharing

Classification:

  • AlloyGBM is competitive with established libraries on breast_cancer and synthetic_classification

Ranking:

  • AlloyGBM competes on synthetic_ranking using its native LambdaMART implementation

Benchmark tooling and methodology live in benchmarks/README.md.

Current Limitations

  • Binary classification only (no multi-class yet)
  • CPU-only runtime (GPU backend is architecturally planned but not implemented)
  • No custom objective / custom metric callbacks from Python
  • No interaction constraints
  • No dart/goss boosting modes

License

MIT. See LICENSE.

Download files

Download the file for your platform.

Source Distribution

alloygbm-0.4.0.tar.gz (235.2 kB)

Uploaded Source

Built Distributions

alloygbm-0.4.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (962.6 kB)

Uploaded: CPython 3.11+, manylinux (glibc 2.17+), x86-64

alloygbm-0.4.0-cp311-abi3-macosx_11_0_arm64.whl (867.2 kB)

Uploaded: CPython 3.11+, macOS 11.0+, ARM64

File details

Details for the file alloygbm-0.4.0.tar.gz.

File metadata

  • Download URL: alloygbm-0.4.0.tar.gz
  • Upload date:
  • Size: 235.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for alloygbm-0.4.0.tar.gz
Algorithm Hash digest
SHA256 a42dacfb10af29ede210cbfc6b040ad6ef619f36bf87fe352188054906231432
MD5 f21b5042c5799586f766ac6d4bff15ec
BLAKE2b-256 6f20c1e43431e5bb2e7555de7a537db733823b2f26e66c0d37a3f15de22ddd0f

Provenance

The following attestation bundles were made for alloygbm-0.4.0.tar.gz:

Publisher: publish.yml on LGA-Personal/AlloyGBM

File details

Details for the file alloygbm-0.4.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for alloygbm-0.4.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 21b18603ba854e9b3232576a2f82069a8b4b991dbca4b5140d515be52b841326
MD5 a0a2284eaa15bd56ce9b3db62f7a7c67
BLAKE2b-256 82aad8f0c3801bb1b151b37c96af8a47525ab47a7119360ede9fc50f23e3e78f

Provenance

The following attestation bundles were made for alloygbm-0.4.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on LGA-Personal/AlloyGBM

File details

Details for the file alloygbm-0.4.0-cp311-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for alloygbm-0.4.0-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 81b18853bea58c558d7e811b5491b2467c1afbf1082cdc06cc41f843f9d0815a
MD5 234c09f5accff5ae367743e19cf334e8
BLAKE2b-256 ef441a3c80fbbce504c5693d7e016ab2b1aff1a7abf0b3cc6f59f858b03ef0c6

Provenance

The following attestation bundles were made for alloygbm-0.4.0-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: publish.yml on LGA-Personal/AlloyGBM
