
Rust-first gradient boosting for structured regression with time-aware validation utilities and Python bindings

Project description

AlloyGBM

AlloyGBM is a Rust-first gradient boosting library for structured regression, with a Python API focused on fast native execution, deterministic training, and time-aware tabular workflows.

It is currently strongest on panel and finance-style regression problems where leakage-aware validation and practical iteration speed matter. It also includes native artifact prediction, SHAP explanations, and purged time-series split helpers in the Python package.

When To Use AlloyGBM

AlloyGBM is a good fit when you want:

  • a native-backed gradient boosting regressor with a small Python API surface
  • deterministic CPU training and inference
  • time-aware validation helpers for forecasting or panel-style workflows
  • native prediction from serialized artifacts
  • SHAP-based local explanations and global feature importances

If you need broad objective support (classification, ranking), multiple categorical columns, or the strongest out-of-the-box results on generic tabular benchmarks, you should still expect XGBoost, LightGBM, or CatBoost to be stronger today.

Installation

PyPI:

pip install alloygbm

From source:

python -m pip install --upgrade maturin
maturin develop --manifest-path bindings/python/Cargo.toml --release

AlloyGBM currently targets Python 3.10+ and uses a native Rust extension module.

Initial 0.1.0 packaging policy:

  • tested directly on macOS Apple Silicon
  • planned wheel targets: macOS arm64 and Linux x86_64
  • Windows support is deferred until after 0.1.0
  • source distribution remains the fallback for unsupported environments

Minimal Example

from alloygbm import GBMRegressor, rmse

X_train = [
    [0.0, 1.0],
    [1.0, 0.0],
    [2.0, 1.0],
    [3.0, 0.0],
]
y_train = [0.2, 0.9, 1.8, 2.7]

X_test = [
    [1.5, 1.0],
    [2.5, 0.0],
]
y_test = [1.3, 2.3]

model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    training_policy="auto",
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions)
print(rmse(y_test, predictions))

Time-Aware Validation Example

from alloygbm import GBMRegressor, purged_time_series_splits, rmse

rows = [
    [0.1, 1.0],
    [0.2, 1.1],
    [0.4, 0.9],
    [0.6, 1.2],
    [0.8, 1.3],
    [1.0, 1.4],
]
targets = [0.0, 0.1, 0.2, 0.5, 0.8, 1.0]
time_index = [0, 0, 1, 1, 2, 2]

splits = purged_time_series_splits(
    time_index,
    n_splits=3,
    purge_gap=0,
    embargo=0,
)

fold_scores = []
for train_idx, test_idx in splits:
    model = GBMRegressor(
        learning_rate=0.05,
        max_depth=6,
        n_estimators=400,
        deterministic=True,
        seed=7,
    )
    X_train = [rows[i] for i in train_idx]
    y_train = [targets[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [targets[i] for i in test_idx]

    model.fit(X_train, y_train)
    fold_scores.append(rmse(y_test, model.predict(X_test)))

print(fold_scores)

For panel data, use purged_panel_splits(...).
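
The purge_gap and embargo parameters are easiest to reason about once you see what they exclude: training rows whose time index falls within purge_gap before the test window are dropped, and an embargo additionally drops rows just after it. Here is a minimal pure-Python sketch of that idea, an illustration of the semantics only, not AlloyGBM's actual split logic:

```python
def purged_split(time_index, test_times, purge_gap=0, embargo=0):
    """Illustrative purged split: train indices exclude the test window,
    a purge_gap before it, and an embargo after it."""
    lo, hi = min(test_times), max(test_times)
    train_idx = [
        i for i, t in enumerate(time_index)
        if t < lo - purge_gap or t > hi + embargo
    ]
    test_idx = [i for i, t in enumerate(time_index) if lo <= t <= hi]
    return train_idx, test_idx

time_index = [0, 0, 1, 1, 2, 2]
# With purge_gap=1, the rows at time 1 are too close to the test window
# at time 2 and are purged from training.
train_idx, test_idx = purged_split(time_index, test_times={2}, purge_gap=1)
print(train_idx, test_idx)  # [0, 1] [4, 5]
```

A real splitter also has to pick the test windows and honor n_splits; purged_time_series_splits handles that bookkeeping for you.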

Validation And Early Stopping

from alloygbm import GBMRegressor

model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    early_stopping_rounds=50,
    min_validation_improvement=1e-4,
    min_data_in_leaf=32,
    lambda_l2=1.0,
    deterministic=True,
    seed=7,
)

# X_train, y_train and X_valid, y_valid are your training and held-out
# validation splits (lists of rows and targets, as in the earlier examples).
model.fit(
    X_train,
    y_train,
    eval_set=(X_valid, y_valid),
)

print(model.best_iteration_)
print(model.best_score_)
print(model.n_estimators_)
print(model.evals_result_)
print(model.fit_timing_)

early_stopping_rounds is explicit-only: pass eval_set=(X_valid, y_valid) when you enable it.
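
The fitted summaries above come from a standard early-stopping pattern: track the best validation score, and stop once early_stopping_rounds iterations pass without an improvement of at least min_validation_improvement. A generic pure-Python sketch of that loop (the parameter names mirror AlloyGBM's, but this is an illustration, not the library's implementation):

```python
def early_stopping_loop(val_scores, early_stopping_rounds, min_improvement):
    """Generic early-stopping scan over per-iteration validation RMSEs.
    Returns (best_iteration, best_score). Lower is better."""
    best_score = float("inf")
    best_iteration = 0
    rounds_without_improvement = 0
    for i, score in enumerate(val_scores):
        if score < best_score - min_improvement:
            # Improvement large enough to count: reset the patience counter.
            best_score = score
            best_iteration = i
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= early_stopping_rounds:
                break
    return best_iteration, best_score

# Improvements below 1e-4 do not reset patience, so training stops early.
scores = [1.0, 0.8, 0.7, 0.69995, 0.6999, 0.6999]
print(early_stopping_loop(scores, early_stopping_rounds=2, min_improvement=1e-4))
# (2, 0.7)
```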

Feature Summary

  • Native Rust-backed training and prediction from Python
  • GBMRegressor with deterministic training controls and dataset-aware training_policy
  • Explicit validation support via fit(..., eval_set=..., eval_time_index=...)
  • Early stopping with fitted summaries: best_iteration_, best_score_, n_estimators_, evals_result_
  • Leaf and split controls: min_data_in_leaf, lambda_l1, lambda_l2, min_child_hessian
  • Continuous-feature binning strategies: linear, rank, quantile
  • Optional single-column categorical encoding path
  • Artifact-backed prediction via predict_from_artifact(...)
  • SHAP row explanations via shap_values(...)
  • SHAP global feature importance via feature_importances(...)
  • Time-aware validation helpers:
    • purged_time_series_splits(...)
    • purged_panel_splits(...)
  • Metric helpers:
    • rmse, mae, r2_score
    • pearson_correlation, rank_ic, hit_rate, icir
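
The finance-style metrics in the last group follow their standard definitions: rank_ic is a Spearman-style correlation computed on ranks, hit_rate is the fraction of sign agreements between predictions and targets, and icir is the mean information coefficient divided by its standard deviation across folds. A self-contained sketch of those textbook formulas (written from the standard definitions, not from AlloyGBM's source, and ignoring tie handling for brevity):

```python
import statistics

def _ranks(xs):
    # Rank values 0..n-1 by sort order (ties broken by position, for brevity).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def rank_ic(y_true, y_pred):
    # Spearman-style: Pearson correlation of the ranks.
    return pearson(_ranks(y_true), _ranks(y_pred))

def hit_rate(y_true, y_pred):
    # Fraction of predictions whose sign matches the target's sign.
    hits = sum((t > 0) == (p > 0) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def icir(per_fold_ics):
    # Information coefficient information ratio: mean IC over its std dev.
    return statistics.fmean(per_fold_ics) / statistics.stdev(per_fold_ics)

print(rank_ic([0.1, 0.5, 0.3], [1.0, 3.0, 2.0]))  # perfectly rank-aligned
print(hit_rate([0.2, -0.1, 0.4], [0.1, -0.3, -0.2]))
```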

Benchmark Snapshot

The current public benchmark suite compares AlloyGBM against XGBoost, LightGBM, and CatBoost on synthetic and real regression datasets.

Current headline results from the expanded suite:

  • AlloyGBM is best on the panel_time_series benchmark across the tested profiles.
  • AlloyGBM is strong on dow_jones_financial, with its best showing under the deeper low-learning-rate profile.
  • AlloyGBM is competitive on dense_numeric, but still trails XGBoost and CatBoost on RMSE.
  • AlloyGBM currently lags all three libraries on california_housing and bike_sharing.
  • In the latest recorded public-suite refresh, AlloyGBM was also the fastest trainer on most scenario/profile rows.

The latest recorded benchmark refresh, taken after dense continuous-feature preprocessing moved into Rust, showed no RMSE collapse for AlloyGBM, and fit time on the public suite improved materially versus the previous stored comparison. The benchmark runner now also reports stage timings for:

  • Python input adaptation
  • native bridge preparation
  • native training
  • total fit time
  • predict time

The honest short version is:

  • strong on panel_time_series
  • strong on dow_jones_financial
  • weaker on california_housing and bike_sharing

Benchmark tooling and methodology live in benchmarks/README.md.

Current Limitations

  • Regression-only. Classification and ranking are not implemented yet.
  • CPU-only runtime today.
  • Single categorical feature support only.
  • Best performance is still concentrated in time-aware and finance-style structured regression, not broad tabular dominance.
  • The API is intentionally small and still evolving toward a more complete 0.x user-facing surface.

License

MIT. See LICENSE.


Download files

Download the file for your platform.

Source Distribution

alloygbm-0.1.1.tar.gz (100.5 kB)

Uploaded: Source

Built Distributions


alloygbm-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (613.1 kB)

Uploaded: CPython 3.10+, manylinux (glibc 2.17+), x86-64

alloygbm-0.1.1-cp310-abi3-macosx_11_0_arm64.whl (539.6 kB)

Uploaded: CPython 3.10+, macOS 11.0+, ARM64

File details

Details for the file alloygbm-0.1.1.tar.gz.

File metadata

  • Download URL: alloygbm-0.1.1.tar.gz
  • Upload date:
  • Size: 100.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for alloygbm-0.1.1.tar.gz:

  • SHA256: 5ba21a7669865e9aa4577589d7a8835ee2b621e8e6aa17094d479c73e72389c6
  • MD5: 7e83c4f4c136ed07e889cc89c2f20948
  • BLAKE2b-256: 59eee9be43207a96b62d35d34665e1f66427cdd55eb34e4fc692c88b6b157ab5


Provenance

The following attestation bundles were made for alloygbm-0.1.1.tar.gz:

Publisher: publish.yml on LGA-Personal/AlloyGBM

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file alloygbm-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for alloygbm-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

  • SHA256: 37e2d800ae2a16c43604dad332a13fdba1821ffa9c9a934c60bef1be1f7da28b
  • MD5: 213295f4f014891cfb188eae3e5f2726
  • BLAKE2b-256: 0fde909b9e508518d48869c0a656199953421f9aa6461b8099ed5d9d1a013da0


Provenance

The following attestation bundles were made for alloygbm-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on LGA-Personal/AlloyGBM


File details

Details for the file alloygbm-0.1.1-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for alloygbm-0.1.1-cp310-abi3-macosx_11_0_arm64.whl:

  • SHA256: 62620036182a7a930dcb00191d9f05c23729b1544758989ce3c03b64d21d53ba
  • MD5: b21ce772e76d6089821fbcd9e8dcd468
  • BLAKE2b-256: 997982b80364908d7b2902f360ac3ae9a58e7374f53659c493af91642e4b79ce


Provenance

The following attestation bundles were made for alloygbm-0.1.1-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: publish.yml on LGA-Personal/AlloyGBM

