
Signal-Adaptive Residual Boosting — two-phase gradient boosting with per-tree OOB step optimization

Project description

sarb — Signal-Adaptive Residual Boosting


Two-phase gradient boosting with per-tree out-of-bag step optimization and residual-anchored feature selection. A scikit-learn-compatible Python package that mirrors the R package from Yatawara (2026).

Installation

# Basic install
pip install sarb

# With optional boosting backends (quotes keep shells such as zsh from expanding the brackets)
pip install "sarb[boost]"

# Everything
pip install "sarb[all]"

Quick Start

from sarb import SARBRegressor
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split

# Generate data
X, y = make_friedman1(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit
model = SARBRegressor(n_trees=500, random_state=42)
model.fit(X_train, y_train)

# Predict
preds = model.predict(X_test)
print(f"R²: {model.score(X_test, y_test):.3f}")

# Feature importance (lambda-weighted)
print("Importances:", model.feature_importances_)

# Anchor frequency (unique to SARB)
print("Anchor freq:", model.anchor_frequency_)

sklearn-Compatible

Works with Pipeline, GridSearchCV, and cross_val_score:

from sarb import SARBRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("sarb", SARBRegressor(random_state=42)),
])

grid = GridSearchCV(pipe, {
    "sarb__warmup_frac": [0.1, 0.25, 0.5],
    "sarb__n_anchors": [1, 2, 3],
}, cv=5, scoring="r2")
grid.fit(X, y)

Unified Wrappers for Other Methods

Same syntax for every method — just change method=:

from sarb import boost_trees, forest_trees

# Boosting
m1 = boost_trees(X, y, method="sarb")
m2 = boost_trees(X, y, method="gbm")
m3 = boost_trees(X, y, method="xgboost")     # needs xgboost installed
m4 = boost_trees(X, y, method="lightgbm")    # needs lightgbm
m5 = boost_trees(X, y, method="catboost")    # needs catboost
m6 = boost_trees(X, y, method="histgbm")

# Forests
m7 = forest_trees(X, y, method="rf")
m8 = forest_trees(X, y, method="extratrees")

# Same .predict() interface for all
m1.predict(X_test)
m3.predict(X_test)

Benchmark Multiple Methods

One call runs cross-validation for all methods and Wilcoxon tests:

from sarb import benchmark

results = benchmark(X, y, methods=["sarb", "gbm", "xgboost", "rf"])
results.print_summary()

Output:

BENCHMARK RESULTS (5-fold CV, n=500, p=10)
==========================================================
  Method        RMSE      MAE        R²  Rank     Time
  ────────────────────────────────────────────────────
  sarb        1.9903   1.5234    0.842     1     2.1s ★
  xgboost     2.1542   1.6823    0.818     2     0.4s
  gbm         2.2529   1.7845    0.803     3     1.2s
  rf          2.6626   2.0923    0.726     4     0.8s

Statistical tests (vs sarb):
  vs xgboost    : p = 0.0234 *
  vs gbm        : p = 0.0043 **
  vs rf         : p = 0.0001 ***
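
The summary above does not say which quantities are paired in the Wilcoxon tests. As a rough illustration only, the sketch below computes an exact two-sided Wilcoxon signed-rank p-value on per-fold RMSE values, assuming the pairing is by fold. The function name `wilcoxon_exact` and the fold scores are hypothetical and are not taken from the package or the paper.

```python
from itertools import product

def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples.

    Sketch only: assumes no zero differences and no tied absolute differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    # Rank the absolute differences from smallest (rank 1) to largest.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely,
    # so enumerate all 2^n patterns and count those at least as extreme.
    count = 0
    for signs in product([0, 1], repeat=len(diffs)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            count += 1
    return count / 2 ** len(diffs)

# Hypothetical per-fold RMSE values for two methods (5-fold CV).
rmse_a = [1.95, 2.01, 1.98, 2.03, 1.99]
rmse_b = [2.10, 2.20, 2.11, 2.25, 2.16]
p_value = wilcoxon_exact(rmse_a, rmse_b)
print(f"p = {p_value:.4f}")
```

With only five paired folds, 0.0625 is the smallest two-sided p-value this exact test can produce, so smaller values would require a different pairing or test variant (for example repeated CV or per-sample errors); the source does not state which one the package uses.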

Reproduce the Paper

from sarb import reproduce_paper, friedman1

df = friedman1(n_samples=500)
X, y = df.iloc[:, :10].values, df["y"].values

# Same settings as the paper's 907-dataset benchmark
result = reproduce_paper(X, y)
result.print_summary()

Hyperparameter Tuning

from sarb import tune_sarb, sensitivity

# Grid search with CV
best = tune_sarb(X, y, param_grid={
    "warmup_frac": [0.1, 0.25, 0.5],
    "n_anchors":   [1, 2, 3],
    "colsample":   [0.6, 0.8, 1.0],
})
print(best["best_params"])

# How sensitive is SARB to warmup_frac?
sens = sensitivity(X, y, param="warmup_frac")

How SARB Works

Phase 1 (Warmup): Standard gradient boosting with all features and a fixed learning rate. Captures dominant main effects.
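
To make the warmup phase concrete, here is a minimal pure-Python sketch of standard gradient boosting for squared error with a fixed learning rate, using one-split stumps as base learners. The helper names (`fit_stump`, `stump_predict`, `warmup`) and the toy data are hypothetical; the package's own tree learner and defaults may differ.

```python
def fit_stump(X, r):
    """Return (feature, threshold, left_mean, right_mean) with the lowest SSE on residuals r."""
    best = None
    n = len(X)
    for j in range(len(X[0])):  # all features are candidates during warmup
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def warmup(X, y, n_trees=20, learning_rate=0.1):
    """Fit n_trees stumps to successive residuals with a fixed step size."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the plain residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, row) for p, row in zip(pred, X)]
    return base, stumps, pred

# Toy data: the target depends only on feature 0.
X_toy = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y_toy = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]
base, stumps, pred = warmup(X_toy, y_toy)
mse_start = sum((yi - base) ** 2 for yi in y_toy) / len(y_toy)
mse_end = sum((yi - pi) ** 2 for yi, pi in zip(y_toy, pred)) / len(y_toy)
print(f"MSE before: {mse_start:.3f}, after warmup: {mse_end:.3f}")
```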

Phase 2 (Explore): Each tree is fit on a feature subset anchored by the predictors most correlated with the current residuals. The per-tree step size is determined by an out-of-bag line search. Uninformative trees receive a step size of zero, which rejects them; typically 40-60% of Phase 2 trees are rejected.
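
The two Phase 2 ingredients can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the package's code: it assumes anchors are ranked by absolute Pearson correlation with the residuals, that the remaining features are sampled at random up to a `colsample` fraction, and that the line search uses the closed-form squared-error solution on out-of-bag rows. The names `anchored_subset`, `oob_step`, and `max_step` are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def anchored_subset(X, residuals, n_anchors=2, colsample=0.8, rng=random):
    """Pick the n_anchors features most correlated with the residuals,
    then fill the subset with randomly chosen remaining features."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], residuals)) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    anchors = ranked[:n_anchors]
    n_total = max(n_anchors, round(colsample * p))
    others = ranked[n_anchors:]
    extra = rng.sample(others, min(len(others), n_total - len(anchors)))
    return anchors, sorted(anchors + extra)

def oob_step(oob_residuals, oob_tree_preds, max_step=1.0):
    """Closed-form step for squared error on out-of-bag rows, clipped to [0, max_step].
    A step of 0.0 means the tree is rejected."""
    num = sum(r * h for r, h in zip(oob_residuals, oob_tree_preds))
    den = sum(h * h for h in oob_tree_preds)
    if den == 0 or num <= 0:
        return 0.0
    return min(num / den, max_step)

# Toy residuals that track features 0 and 2 most closely.
X_toy = [[1, 5, 0.1, 3], [2, 3, 0.9, 1], [3, 4, 2.1, 2], [4, 1, 2.9, 5], [5, 2, 4.2, 4]]
res_toy = [0.0, 1.0, 2.0, 3.0, 4.0]
anchors, subset = anchored_subset(X_toy, res_toy, n_anchors=2, colsample=0.75,
                                  rng=random.Random(0))

# A tree whose out-of-bag predictions agree with the residuals gets a positive step;
# one that points the wrong way gets step 0.0.
step_good = oob_step([1.0, 2.0, -1.0], [0.5, 1.0, -0.5], max_step=5.0)
step_bad = oob_step([1.0, 2.0, -1.0], [-0.5, -1.0, 0.5], max_step=5.0)
print(anchors, subset, step_good, step_bad)
```

Clipping the step at zero is what turns the line search into an accept/reject rule: a tree only contributes if it reduces error on rows it was not trained on.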

Citation

Yatawara, A. (2026). Signal-Adaptive Residual Boosting for Regression.
Computational Statistics & Data Analysis.

License

MIT
