Overfitting detection for Gradient Boosting models using λ-Guard methodology.
Overfitting detection for Gradient Boosting — no validation set required
Detect the moment when your model stops learning signal and starts memorizing structure.
❓ Why λ-Guard?
In Gradient Boosting, overfitting often appears before the validation error rises.
By the time the validation error starts to rise, the model is already:
- ✂️ Splitting features into extremely fine regions
- 🍃 Fitting leaves supported by very few observations
- 🌪 Becoming sensitive to tiny perturbations
At that stage the model is no longer improving its predictions; it is memorizing the training dataset.
λ-Guard detects that moment automatically.
🧠 Core Intuition
A boosting model learns two things simultaneously:
| Component | Role |
|---|---|
| Geometry | partitions the feature space |
| Predictor | assigns values to each region |
Overfitting occurs when:
"The geometry keeps growing, but the predictor stops extracting real information."
λ-Guard measures three key signals:
- 📦 Capacity → structural complexity
- 🎯 Alignment → extracted signal
- 🌊 Stability → fragility of predictions
🧩 Representation Matrix
Every tree divides the feature space into leaves.
We record where each observation falls:
Z[i,j] = 1 if sample i falls in leaf j
Z[i,j] = 0 otherwise
- Rows → observations
- Columns → leaves across all trees
Think of Z as the representation learned by the ensemble. It plays a role comparable to the hat matrix in linear regression:
- Linear regression → hat matrix H
- Boosting → representation Z
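The definition of Z above can be sketched in a few lines of plain Python. This is an illustration, not the package's implementation: the function name `leaf_indicator_matrix` and the toy `leaf_ids` data are assumptions. With scikit-learn, leaf indices of this shape can be obtained from `GradientBoostingRegressor.apply(X)`.

```python
from typing import List, Sequence


def leaf_indicator_matrix(leaf_ids: Sequence[Sequence[int]]) -> List[List[int]]:
    """Build Z from per-tree leaf indices.

    leaf_ids[i][t] is the leaf that sample i reaches in tree t.
    Z has one column per (tree, leaf) pair that occurs in the data.
    """
    n_trees = len(leaf_ids[0])
    # One column per (tree, leaf) pair, ordered by tree and then by leaf id.
    pairs = sorted({(t, row[t]) for row in leaf_ids for t in range(n_trees)})
    column = {pair: j for j, pair in enumerate(pairs)}

    Z = [[0] * len(pairs) for _ in leaf_ids]
    for i, row in enumerate(leaf_ids):
        for t, leaf in enumerate(row):
            Z[i][column[(t, leaf)]] = 1
    return Z


# Hypothetical leaf assignments: 4 samples, 2 trees.
leaf_ids = [[1, 3], [1, 4], [2, 3], [2, 4]]
Z = leaf_indicator_matrix(leaf_ids)
for row in Z:
    print(row)
```

Each row of Z contains exactly one 1 per tree, so every row sums to the number of trees.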
📦 Capacity — Structural Complexity
- 🔹 Low C → few effective regions
- 🔹 High C → model fragments space
Late-stage boosting increases C quickly, often without improving predictions.
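This section does not give the formula for C, so the snippet below is only a rough proxy and not λ-Guard's definition. It counts how many distinct joint leaf assignments the training points occupy, which is one way to read "effective regions". The function name `occupied_regions` and the toy data are assumptions.

```python
from typing import Sequence


def occupied_regions(leaf_ids: Sequence[Sequence[int]]) -> int:
    """Count distinct joint leaf assignments across all trees.

    Two samples share a region only if they land in the same leaf
    of every tree. This is a proxy, not the package's C.
    """
    return len({tuple(row) for row in leaf_ids})


# Hypothetical leaf assignments: 4 samples, 2 trees.
coarse = [[1, 3], [1, 3], [2, 4], [2, 4]]      # samples pair up in 2 regions
fragmented = [[1, 3], [1, 4], [2, 3], [2, 4]]  # every sample is alone

n_coarse = occupied_regions(coarse)
n_fragmented = occupied_regions(fragmented)
print(n_coarse, n_fragmented)
```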
🎯 Alignment — Useful Information
- 🔹 High A → trees add real predictive signal
- 🔹 Low A → trees mostly refine boundaries
"After some trees, alignment saturates."
Boosting continues growing structure even if prediction stops improving.
🌊 Stability — Sensitivity to Perturbations
- 🔹 Low S → smooth, robust model
- 🔹 High S → brittle, sensitive model
S is the first of the three signals to spike once overfitting sets in.
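The formula for S is not stated here, so the following is a generic sensitivity check rather than λ-Guard's own measure. It adds small Gaussian noise to each input and averages the absolute change in prediction. The names `perturbation_sensitivity`, `smooth_model` and `brittle_model`, and the noise scale, are all assumptions made for illustration.

```python
import random
from typing import Callable, Sequence


def perturbation_sensitivity(
    predict: Callable[[Sequence[float]], float],
    X: Sequence[Sequence[float]],
    scale: float = 0.01,
    n_repeats: int = 20,
    seed: int = 0,
) -> float:
    """Mean absolute prediction change under small Gaussian input noise."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for x in X:
        base = predict(x)
        for _ in range(n_repeats):
            noisy = [v + rng.gauss(0.0, scale) for v in x]
            total += abs(predict(noisy) - base)
            count += 1
    return total / count


def smooth_model(x: Sequence[float]) -> float:
    # Toy predictor that varies slowly with the input.
    return 0.5 * x[0]


def brittle_model(x: Sequence[float]) -> float:
    # Toy predictor that flips between 0 and 1 every 0.01 units of input.
    return float(int(x[0] * 100) % 2)


X = [[0.105], [0.255], [0.405]]
smooth_s = perturbation_sensitivity(smooth_model, X)
brittle_s = perturbation_sensitivity(brittle_model, X)
print(smooth_s, brittle_s)
```

The finely fragmented predictor reacts far more strongly to the same noise, which is the behavior the "High S" case describes.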
🔥 The Overfitting Index λ
| Situation | λ |
|---|---|
| Compact structure + stable predictions | low |
| Many regions + weak signal | high |
| Unstable predictions | very high |
Interpretation: λ measures how much structural complexity is wasted, that is, capacity that does not translate into predictive signal.
Normalized λ ∈ [0,1] can be used to compare models.
🧪 Structural Overfitting Test
Detect if a few training points dominate the model using approximate leverage:
H_ii ≈ Σ_trees (learning_rate / leaf_size)
T1 = mean(H_ii)              # global complexity
T2 = max(H_ii) / mean(H_ii)  # local memorization
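As a worked example of the leverage approximation and the two statistics, the sketch below takes `leaf_size` to be the number of training samples sharing a leaf with sample i in a given tree. That reading, the function names and the toy data are assumptions; the package's own computation may differ.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence, Tuple


def approximate_leverage(
    leaf_ids: Sequence[Sequence[int]], learning_rate: float
) -> List[float]:
    """H_ii ≈ sum over trees of learning_rate / size of the leaf holding i."""
    n_trees = len(leaf_ids[0])
    # leaf_size[t][leaf] = number of training samples in that leaf of tree t.
    leaf_size = [Counter(row[t] for row in leaf_ids) for t in range(n_trees)]
    return [
        sum(learning_rate / leaf_size[t][row[t]] for t in range(n_trees))
        for row in leaf_ids
    ]


def structural_stats(h: Sequence[float]) -> Tuple[float, float]:
    """Return (T1, T2) = (mean leverage, max leverage / mean leverage)."""
    t1 = mean(h)
    return t1, max(h) / t1


# Hypothetical leaf assignments: 4 samples, 2 trees.
# Sample 2 is alone in leaf 4 of tree 1; sample 3 is alone in leaf 2 of tree 0.
leaf_ids = [[1, 3], [1, 3], [1, 4], [2, 3]]
h = approximate_leverage(leaf_ids, learning_rate=0.1)
t1, t2 = structural_stats(h)
print(h, t1, t2)
```

Samples that sit alone in a leaf receive the full `learning_rate` from that tree, so isolated points push T2 up.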
Bootstrap procedure:
- Repeat B times: resample training data, recompute T1 & T2
- Compute p-values:
  - p1 = P(T1_boot ≥ T1_obs)
  - p2 = P(T2_boot ≥ T2_obs)
Reject structural stability if:
p1 < α OR p2 < α
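The bootstrap procedure and rejection rule above can be sketched as follows. For brevity this resamples per-sample leverage values directly, which is a simplifying assumption: the package may instead recompute leaf sizes on each resample. The function names, the toy leverage values and the default α = 0.05 are likewise assumptions.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple


def bootstrap_p_values(
    rows: Sequence[float],
    stats_fn: Callable[[Sequence[float]], Tuple[float, float]],
    n_boot: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Estimate p1 = P(T1_boot >= T1_obs) and p2 = P(T2_boot >= T2_obs)."""
    rng = random.Random(seed)
    t1_obs, t2_obs = stats_fn(rows)
    n = len(rows)
    hits1 = hits2 = 0
    for _ in range(n_boot):
        # Resample the training rows with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        t1_b, t2_b = stats_fn(sample)
        hits1 += int(t1_b >= t1_obs)
        hits2 += int(t2_b >= t2_obs)
    return hits1 / n_boot, hits2 / n_boot


def reject_structural_stability(p1: float, p2: float, alpha: float = 0.05) -> bool:
    """Apply the rule: reject if p1 < alpha OR p2 < alpha."""
    return p1 < alpha or p2 < alpha


def leverage_stats(h: Sequence[float]) -> Tuple[float, float]:
    # (T1, T2) as defined in this section.
    m = mean(h)
    return m, max(h) / m


# Hypothetical leverage values: 19 ordinary points and one dominant point.
h_values = [0.1] * 19 + [2.0]
p1, p2 = bootstrap_p_values(h_values, leverage_stats)
print(p1, p2, reject_structural_stability(p1, p2))
```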
📊 What λ-Guard Distinguishes
| Regime | Meaning |
|---|---|
| ✅ Stable | smooth generalization |
| 📈 Global overfitting | too many effective parameters |
| ⚠️ Local memorization | few points dominate |
| 💥 Extreme | interpolation behavior |
🧭 When to Use
- Monitor boosting during training
- Hyperparameter tuning
- Small datasets (no validation split)
- Diagnose late-stage performance collapse
⚙️ Installation
Install via GitHub:
pip install git+https://github.com/faberBI/lambdaguard.git
Example usage:
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from lambdaguard.ofi import generalization_index, instability_index, create_model
from lambdaguard.lambda_guard import lambda_guard_test, interpret
from lambdaguard.cusum import lambda_detect
import pandas as pd

# Synthetic training data so the example is self-contained
X_train, y_train = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Fit a model
model = GradientBoostingRegressor(n_estimators=50, max_depth=3)
model.fit(X_train, y_train)

# Generalization index
GI, A, C = generalization_index(model, X_train, y_train)
print("Generalization index:", GI)

# Lambda-guard test
lg_res = lambda_guard_test(model, X_train)
print(interpret(lg_res))

# CUSUM-based detection
df = pd.DataFrame([
    {"model": "GBR", "n_estimators": 50, "max_depth": 3, "A": 0.8, "OFI_norm": 0.2},
    {"model": "GBR", "n_estimators": 100, "max_depth": 5, "A": 0.85, "OFI_norm": 0.3},
])
model_name = "GBR"  # assumed to select rows by the "model" column
cusum_res = lambda_detect(
    df,
    model_name,
    complexity_metric="combined",
    lambda_col="OFI_norm",
    alignment_col="A",
    smooth_window=3,
    cusum_threshold_factor=1.5,
    baseline_points=10
)
📜 Citation
If you use λ-Guard in your research or projects, please cite the following:
Fabrizio Di Sciorio, PhD
Universidad de Almeria — Business and Economics Department
"λ-Guard: Structural Overfitting Detection for Gradient Boosting Models"