
AlloyGBM

AlloyGBM is a Rust-first gradient boosting library for structured regression, with a Python API focused on fast native execution, deterministic training, and time-aware tabular workflows.

It is currently strongest on panel and finance-style regression problems where leakage-aware validation and practical iteration speed matter. The Python package also ships native artifact prediction, SHAP explanations, and purged time-series split helpers.

When To Use AlloyGBM

AlloyGBM is a good fit when you want:

  • a native-backed gradient boosting regressor with a small Python API surface
  • deterministic CPU training and inference
  • time-aware validation helpers for forecasting or panel-style workflows
  • native prediction from serialized artifacts
  • SHAP-based local explanations and global feature importances

If you need broad objective support, classification, ranking, more than one categorical column, or the strongest out-of-the-box results on generic tabular benchmarks, you should still expect XGBoost, LightGBM, or CatBoost to be stronger today.

Installation

PyPI:

pip install alloygbm

From source (from a checkout of the repository):

python -m pip install --upgrade maturin
maturin develop --manifest-path bindings/python/Cargo.toml --release

AlloyGBM currently targets Python 3.10+ and uses a native Rust extension module.
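
A quick smoke test that the native extension loads, using only the documented GBMRegressor entry point:

python -c "import alloygbm; print(alloygbm.GBMRegressor)"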

Initial 0.1.0 packaging policy:

  • tested directly on macOS Apple Silicon
  • planned wheel targets: macOS arm64 and Linux x86_64
  • Windows support is deferred until after 0.1.0
  • source distribution remains the fallback for unsupported environments

Minimal Example

from alloygbm import GBMRegressor, rmse

X_train = [
    [0.0, 1.0],
    [1.0, 0.0],
    [2.0, 1.0],
    [3.0, 0.0],
]
y_train = [0.2, 0.9, 1.8, 2.7]

X_test = [
    [1.5, 1.0],
    [2.5, 0.0],
]
y_test = [1.3, 2.3]

model = GBMRegressor(
    learning_rate=0.05,
    max_depth=6,
    n_estimators=1200,
    training_policy="auto",
    deterministic=True,
    seed=7,
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions)
print(rmse(y_test, predictions))
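
Because the constructor sets deterministic=True with a fixed seed, two identical training runs should produce identical predictions. A minimal sanity check of that guarantee, reusing X_train, y_train, and X_test from above (converting to lists in case predict returns an array type):

def train_once():
    model = GBMRegressor(
        learning_rate=0.05,
        max_depth=6,
        n_estimators=1200,
        training_policy="auto",
        deterministic=True,
        seed=7,
    )
    model.fit(X_train, y_train)
    return list(model.predict(X_test))

# Deterministic CPU training: both runs must agree exactly.
assert train_once() == train_once()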

Time-Aware Validation Example

from alloygbm import GBMRegressor, purged_time_series_splits, rmse

rows = [
    [0.1, 1.0],
    [0.2, 1.1],
    [0.4, 0.9],
    [0.6, 1.2],
    [0.8, 1.3],
    [1.0, 1.4],
]
targets = [0.0, 0.1, 0.2, 0.5, 0.8, 1.0]
time_index = [0, 0, 1, 1, 2, 2]

splits = purged_time_series_splits(
    time_index,
    n_splits=3,
    purge_gap=0,
    embargo=0,
)

fold_scores = []
for train_idx, test_idx in splits:
    model = GBMRegressor(
        learning_rate=0.05,
        max_depth=6,
        n_estimators=400,
        deterministic=True,
        seed=7,
    )
    X_train = [rows[i] for i in train_idx]
    y_train = [targets[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [targets[i] for i in test_idx]

    model.fit(X_train, y_train)
    fold_scores.append(rmse(y_test, model.predict(X_test)))

print(fold_scores)

For panel data, use purged_panel_splits(...).
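
A hypothetical sketch of the panel variant, reusing time_index from above and adding an entity column. The argument names and order are assumptions modeled on the time-series helper, not a confirmed signature:

from alloygbm import purged_panel_splits

entity_ids = [0, 1, 0, 1, 0, 1]  # two entities observed at each time step

# Hypothetical signature: check the API reference for the exact parameters.
splits = purged_panel_splits(
    time_index,
    entity_ids,
    n_splits=3,
    purge_gap=0,
    embargo=0,
)
for train_idx, test_idx in splits:
    print(train_idx, test_idx)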

Feature Summary

  • Native Rust-backed training and prediction from Python
  • GBMRegressor with deterministic training controls and dataset-aware training_policy
  • Continuous-feature binning strategies: linear, rank, quantile
  • Optional single-column categorical encoding path
  • Artifact-backed prediction via predict_from_artifact(...) (see the sketch after this list)
  • SHAP row explanations via shap_values(...)
  • SHAP global feature importance via feature_importances(...)
  • Time-aware validation helpers:
    • purged_time_series_splits(...)
    • purged_panel_splits(...)
  • Metric helpers:
    • rmse, mae, r2_score
    • pearson_correlation, rank_ic, hit_rate, icir
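
A hedged sketch tying together the artifact, SHAP, and metric helpers above, reusing the toy data from the minimal example. The names predict_from_artifact, shap_values, feature_importances, rmse, and mae come from this README, but the signatures shown (and the save(...) method used to produce an artifact) are assumptions, not confirmed API:

from alloygbm import GBMRegressor, predict_from_artifact, rmse, mae

model = GBMRegressor(n_estimators=200, deterministic=True, seed=7)
model.fit(X_train, y_train)

# Assumed artifact workflow: persist the model, then predict natively from disk.
model.save("model.alloygbm")  # method name and format are assumptions
preds = predict_from_artifact("model.alloygbm", X_test)
print(rmse(y_test, preds), mae(y_test, preds))

# Assumed explanation calls: per-row SHAP contributions and a global ranking.
row_shap = model.shap_values(X_test)
importances = model.feature_importances()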

Benchmark Snapshot

The current public benchmark suite compares AlloyGBM against XGBoost, LightGBM, and CatBoost on synthetic and real regression datasets.

Current headline results from the expanded suite:

  • AlloyGBM is best on the panel_time_series benchmark across the tested profiles.
  • AlloyGBM is strong on dow_jones_financial, with its best showing under the deeper low-learning-rate profile.
  • AlloyGBM is competitive on dense_numeric, but still trails XGBoost and CatBoost on RMSE.
  • AlloyGBM currently lags all three libraries on california_housing and bike_sharing.
  • LightGBM is usually the fastest trainer in the comparison set.

The honest short version is:

  • strong on panel_time_series
  • strong on dow_jones_financial
  • weaker on california_housing and bike_sharing

Benchmark tooling and methodology live in benchmarks/README.md.

Current Limitations

  • Regression-only. Classification and ranking are not implemented yet.
  • CPU-only runtime today.
  • Single categorical feature support only.
  • Best performance is still concentrated in time-aware and finance-style structured regression, not broad tabular dominance.
  • The API is intentionally small and still evolving toward a more complete 0.x user-facing surface.

License

MIT. See LICENSE.
