
AlloyGBM

AlloyGBM is a Rust-first gradient boosting library for structured regression, with a Python API focused on fast native execution, deterministic training, and time-aware tabular workflows.

It is currently strongest on panel and finance-style regression problems where leakage-aware validation and practical iteration speed matter. It also includes native artifact prediction, SHAP explanations, and purged time-series split helpers in the Python package.

When To Use AlloyGBM

AlloyGBM is a good fit when you want:

  • a native-backed gradient boosting regressor with a small Python API surface
  • deterministic CPU training and inference
  • time-aware validation helpers for forecasting or panel-style workflows
  • native prediction from serialized artifacts
  • SHAP-based local explanations and global feature importances

If you need the broadest possible objective support, classification, ranking, multiple categorical columns, or the strongest out-of-the-box results on generic tabular benchmarks, you should still expect XGBoost, LightGBM, or CatBoost to be stronger today.

Installation

PyPI:

pip install alloygbm

From source:

python -m pip install --upgrade maturin
maturin develop --manifest-path bindings/python/Cargo.toml --release

AlloyGBM currently targets Python 3.11+ and uses a native Rust extension module.

Initial 0.1.x packaging policy:

  • tested directly on macOS Apple Silicon
  • planned wheel targets: macOS arm64 and Linux x86_64
  • Windows support is deferred until after 0.1.x
  • source distribution remains the fallback for unsupported environments

Minimal Example

from alloygbm import GBMRegressor, rmse

X_train = [
    [0.0, 1.0],
    [1.0, 0.0],
    [2.0, 1.0],
    [3.0, 0.0],
]
y_train = [0.2, 0.9, 1.8, 2.7]

X_test = [
    [1.5, 1.0],
    [2.5, 0.0],
]
y_test = [1.3, 2.3]

model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    training_policy="auto",
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions)
print(rmse(y_test, predictions))

Time-Aware Validation Example

from alloygbm import GBMRegressor, purged_time_series_splits, rmse

rows = [
    [0.1, 1.0],
    [0.2, 1.1],
    [0.4, 0.9],
    [0.6, 1.2],
    [0.8, 1.3],
    [1.0, 1.4],
]
targets = [0.0, 0.1, 0.2, 0.5, 0.8, 1.0]
time_index = [0, 0, 1, 1, 2, 2]

splits = purged_time_series_splits(
    time_index,
    n_splits=3,
    purge_gap=0,
    embargo=0,
)

fold_scores = []
for train_idx, test_idx in splits:
    model = GBMRegressor(
        learning_rate=0.05,
        max_depth=6,
        n_estimators=400,
        deterministic=True,
        seed=7,
    )
    X_train = [rows[i] for i in train_idx]
    y_train = [targets[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [targets[i] for i in test_idx]

    model.fit(X_train, y_train)
    fold_scores.append(rmse(y_test, model.predict(X_test)))

print(fold_scores)

For panel data, use purged_panel_splits(...).
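A minimal sketch follows. It assumes purged_panel_splits accepts per-row entity identifiers alongside the time index, with the same n_splits/purge_gap/embargo options as purged_time_series_splits; the exact signature is an assumption here, so check the package docs.

from alloygbm import purged_panel_splits

# Two entities observed at the same three time steps (long/panel layout).
time_index = [0, 0, 1, 1, 2, 2]
entity_ids = ["AAA", "BBB", "AAA", "BBB", "AAA", "BBB"]  # assumed argument

splits = purged_panel_splits(
    time_index,
    entity_ids,
    n_splits=3,
    purge_gap=0,
    embargo=0,
)
for train_idx, test_idx in splits:
    print(train_idx, test_idx)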

Validation And Early Stopping

from alloygbm import GBMRegressor

# Training data from the minimal example, plus a small held-out
# validation set so early stopping has something to monitor.
X_train = [
    [0.0, 1.0],
    [1.0, 0.0],
    [2.0, 1.0],
    [3.0, 0.0],
]
y_train = [0.2, 0.9, 1.8, 2.7]
X_valid = [[1.5, 1.0], [2.5, 0.0]]
y_valid = [1.3, 2.3]

model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    early_stopping_rounds=50,
    min_validation_improvement=1e-4,
    min_data_in_leaf=32,
    lambda_l2=1.0,
    deterministic=True,
    seed=7,
)

model.fit(
    X_train,
    y_train,
    eval_set=(X_valid, y_valid),
)

print(model.best_iteration_)
print(model.best_score_)
print(model.n_estimators_)
print(model.evals_result_)
print(model.fit_timing_)

early_stopping_rounds requires an explicit validation set: pass eval_set=(X_valid, y_valid) whenever you enable it.
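A common follow-up, continuing from the snippet above, is to refit on the combined train and validation rows at the iteration count early stopping found. This is standard gradient-boosting practice rather than anything AlloyGBM-specific, and it uses only the documented best_iteration_ attribute and constructor parameters:

# Refit on train + validation so the deployed model sees all available
# rows. Assumes best_iteration_ is directly usable as an estimator count;
# off-by-one conventions vary between libraries.
final_model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=model.best_iteration_,
    deterministic=True,
    seed=7,
)
final_model.fit(X_train + X_valid, y_train + y_valid)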

Feature Summary

  • Native Rust-backed training with zero-copy numpy prediction from Python
  • GBMRegressor with deterministic training controls and dataset-aware training_policy
  • Explicit validation support via fit(..., eval_set=..., eval_time_index=...)
  • Early stopping with fitted summaries: best_iteration_, best_score_, n_estimators_, evals_result_
  • Leaf and split controls: min_data_in_leaf, lambda_l1, lambda_l2, min_child_hessian
  • Continuous-feature binning strategies: linear, rank, quantile
  • Optional single-column categorical encoding path
  • Artifact-backed prediction via predict_from_artifact(...)
  • SHAP row explanations via shap_values(...)
  • SHAP global feature importance via feature_importances(...) (both sketched after this list)
  • Time-aware validation helpers:
    • purged_time_series_splits(...)
    • purged_panel_splits(...)
  • Metric helpers:
    • rmse, mae, r2_score
    • pearson_correlation, rank_ic, hit_rate, icir
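The explanation helpers above can be combined as in the sketch below. It assumes shap_values and feature_importances are methods on a fitted model that accept feature rows; the exact signatures and return shapes are assumptions, not confirmed API.

from alloygbm import GBMRegressor

X_train = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y_train = [0.2, 0.9, 1.8, 2.7]
X_test = [[1.5, 1.0], [2.5, 0.0]]

model = GBMRegressor(n_estimators=200, deterministic=True, seed=7)
model.fit(X_train, y_train)

# Local SHAP explanations: assumed to return one additive contribution
# per (row, feature) for the given rows.
print(model.shap_values(X_test))

# Global importances: assumed to aggregate per-feature SHAP magnitudes.
print(model.feature_importances(X_test))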

Benchmark Snapshot

The current public benchmark suite compares AlloyGBM against XGBoost, LightGBM, and CatBoost on synthetic and real regression datasets.

Current headline results from the expanded suite:

  • AlloyGBM is best on the panel_time_series benchmark across the tested profiles.
  • AlloyGBM is strong on dow_jones_financial, with its best showing under the deeper low-learning-rate profile.
  • AlloyGBM is competitive on dense_numeric, but still trails XGBoost and CatBoost on RMSE.
  • AlloyGBM currently lags all three libraries on california_housing and bike_sharing.
  • In the latest recorded public-suite refresh, AlloyGBM was also the fastest trainer on most scenario/profile rows.

The latest recorded benchmark refresh after moving dense continuous-feature preprocessing into Rust showed no RMSE regression for AlloyGBM, and fit time on the public suite improved materially over the previous stored comparison.

More recently, zero-copy numpy support in the native float-threshold prediction path removed the main prediction bottleneck, making predictions roughly 75-105x faster on large datasets.

The benchmark runner now also reports stage timings for:

  • Python input adaptation
  • native bridge preparation
  • native training
  • total fit time
  • predict time
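On the model side, the fit_timing_ attribute printed in the early-stopping example presumably carries a per-fit version of this breakdown. A small sketch, assuming fit_timing_ is dict-like with stage names as keys (both the structure and the key names are assumptions):

# Assumes fit_timing_ maps stage name -> seconds; keys are assumptions.
for stage, seconds in model.fit_timing_.items():
    print(f"{stage}: {seconds:.4f}s")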

The honest short version is:

  • strong on panel_time_series
  • strong on dow_jones_financial
  • weaker on california_housing and bike_sharing

Benchmark tooling and methodology live in benchmarks/README.md.

Current Limitations

  • Regression-only. Classification and ranking are not implemented yet.
  • CPU-only runtime today.
  • Single categorical feature support only.
  • Best performance is still concentrated in time-aware and finance-style structured regression, not broad tabular dominance.
  • The API is intentionally small and still evolving toward a more complete 0.x user-facing surface.

License

MIT. See LICENSE.

