Signal-Adaptive Residual Boosting — two-phase gradient boosting with per-tree OOB step optimization
sarb — Signal-Adaptive Residual Boosting
Two-phase gradient boosting with per-tree out-of-bag (OOB) step optimization and residual-anchored feature selection. A scikit-learn-compatible Python package that mirrors the R package from Yatawara (2026).
Installation
# Basic install
pip install sarb
# With optional boosting backends (quoted so shells such as zsh do not expand the brackets)
pip install "sarb[boost]"
# Everything
pip install "sarb[all]"
Quick Start
from sarb import SARBRegressor
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split
# Generate data
X, y = make_friedman1(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Fit
model = SARBRegressor(n_trees=500, random_state=42)
model.fit(X_train, y_train)
# Predict
preds = model.predict(X_test)
print(f"R²: {model.score(X_test, y_test):.3f}")
# Feature importance (lambda-weighted)
print("Importances:", model.feature_importances_)
# Anchor frequency (unique to SARB)
print("Anchor freq:", model.anchor_frequency_)
sklearn-Compatible
Works with Pipeline, GridSearchCV, and cross_val_score:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("sarb", SARBRegressor(random_state=42)),
])
grid = GridSearchCV(pipe, {
    "sarb__warmup_frac": [0.1, 0.25, 0.5],
    "sarb__n_anchors": [1, 2, 3],
}, cv=5, scoring="r2")
grid.fit(X, y)
Unified Wrappers for Other Methods
The same call works for every method; only the method= argument changes:
from sarb import boost_trees, forest_trees
# Boosting
m1 = boost_trees(X, y, method="sarb")
m2 = boost_trees(X, y, method="gbm")
m3 = boost_trees(X, y, method="xgboost")   # requires xgboost
m4 = boost_trees(X, y, method="lightgbm")  # requires lightgbm
m5 = boost_trees(X, y, method="catboost")  # requires catboost
m6 = boost_trees(X, y, method="histgbm")
# Forests
m7 = forest_trees(X, y, method="rf")
m8 = forest_trees(X, y, method="extratrees")
# Same .predict() interface for all
m1.predict(X_test)
m3.predict(X_test)
Benchmark Multiple Methods
A single call runs cross-validation for every method and reports Wilcoxon tests:
from sarb import benchmark
results = benchmark(X, y, methods=["sarb", "gbm", "xgboost", "rf"])
results.print_summary()
BENCHMARK RESULTS (5-fold CV, n=500, p=10)
==========================================================
Method     RMSE     MAE      R²      Rank   Time
────────────────────────────────────────────────────
sarb       1.9903   1.5234   0.842   1      2.1s ★
xgboost    2.1542   1.6823   0.818   2      0.4s
gbm        2.2529   1.7845   0.803   3      1.2s
rf         2.6626   2.0923   0.726   4      0.8s

Statistical tests (vs sarb):
vs xgboost : p = 0.0234 *
vs gbm     : p = 0.0043 **
vs rf      : p = 0.0001 ***
Reproduce the Paper
from sarb import reproduce_paper, friedman1
df = friedman1(n_samples=500)
X, y = df.iloc[:, :10].values, df["y"].values
# Same settings as the paper's 907-dataset benchmark
result = reproduce_paper(X, y)
result.print_summary()
Hyperparameter Tuning
from sarb import tune_sarb, sensitivity
# Grid search with CV
best = tune_sarb(X, y, param_grid={
    "warmup_frac": [0.1, 0.25, 0.5],
    "n_anchors": [1, 2, 3],
    "colsample": [0.6, 0.8, 1.0],
})
print(best["best_params"])
# How sensitive is SARB to warmup_frac?
sens = sensitivity(X, y, param="warmup_frac")
How SARB Works
Phase 1 (Warmup): Standard gradient boosting with all features and a fixed learning rate. Captures dominant main effects.
Phase 2 (Explore): Each tree is fit on a feature subset anchored by the predictors most correlated with current residuals. Per-tree step size is determined by out-of-bag line search. Uninformative trees receive step size zero and are rejected — typically 40-60% of Phase 2 trees are rejected.
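The two Phase 2 ingredients described above can be illustrated with a minimal, standard-library-only sketch. This is not the package's implementation: the helper names (pearson, select_anchors, oob_step) and the max_step cap are hypothetical, Pearson correlation is assumed as the residual-correlation measure, and the line search is assumed to minimise squared error on the out-of-bag rows, which gives a closed-form step. The package may rank features or search for the step differently.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_anchors(X, residuals, n_anchors):
    """Indices of the n_anchors columns with the largest |correlation| to the residuals."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], residuals)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:n_anchors]


def oob_step(residuals_oob, tree_preds_oob, max_step=1.0):
    """Step in [0, max_step] minimising sum((r - step * h)^2) over out-of-bag rows.

    The objective is a convex quadratic in the step, so clipping the
    unconstrained minimiser gives the constrained one. A result of 0.0
    means the tree does not reduce OOB error and would be rejected.
    """
    denom = sum(h * h for h in tree_preds_oob)
    if denom == 0.0:
        return 0.0
    step = sum(r * h for r, h in zip(residuals_oob, tree_preds_oob)) / denom
    return min(max(step, 0.0), max_step)


# Toy check: only column 2 drives the residuals, so it should be picked as the anchor.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
residuals = [2.0 * row[2] + 0.1 * rng.gauss(0.0, 1.0) for row in X]
print(select_anchors(X, residuals, n_anchors=1))      # [2]
print(oob_step([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))     # 0.5
print(oob_step([1.0, 2.0, 3.0], [-2.0, -4.0, -6.0]))  # 0.0 (tree rejected)
```

In this sketch a tree whose out-of-bag predictions point away from the residuals gets a step of zero, which is one way to read the "uninformative trees receive step size zero" rule above.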
Citation
Yatawara, A. (2026). Signal-Adaptive Residual Boosting for Regression.
Computational Statistics & Data Analysis.
License
MIT