
Safe, delay-aware SARIMAX with rolling evaluation and AIC-based lag selection


🧭 dynamic-sarimax



Delay-aware SARIMAX wrapper that guards against common pitfalls when using statsmodels.SARIMAX with exogenous regressors: proper lag alignment for exogenous variables, train-only scaling, and safe rolling-origin evaluation, all built in.


✨ Why this exists

Plain SARIMAX requires you to hand-align exogenous regressors (e.g., lagged mobility or weather), which risks leakage and off-by-one bugs. dynamic-sarimax makes this safe by construction.

Key guarantees

  • ✅ For delay b, trains only on valid pairs (y_t, x_{t-b}) — never imputes missing lags.
  • ✅ Scalers are fit only on training windows during CV.
  • ✅ Forecasting refuses to run if required future exogenous rows are missing.
  • ✅ Rolling-origin evaluation and AIC-based delay selection included.
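The first two guarantees can be illustrated with a small, stdlib-only sketch. The helpers lagged_pairs and fit_scaler are hypothetical names for illustration, not the features.py API: the first keeps only the (y_t, x_{t-b}) pairs that actually exist, and the second computes its scaling statistics from training values alone.

```python
from statistics import mean, pstdev


def lagged_pairs(y, x, b):
    """Return (y_t, x_{t-b}) for every t where both values exist; nothing is imputed."""
    if b < 0:
        raise ValueError("delay b must be non-negative")
    if len(x) != len(y):
        raise ValueError("y and x must be aligned on the same time index")
    # The first b targets have no matching x_{t-b}, so they are dropped rather than filled.
    return [(y[t], x[t - b]) for t in range(b, len(y))]


def fit_scaler(train_values):
    """Standardize with mean/std computed from training values only."""
    mu = mean(train_values)
    sd = pstdev(train_values) or 1.0  # guard against a constant column
    return lambda v: (v - mu) / sd


# Delay b = 2 drops the first two targets instead of padding x.
pairs = lagged_pairs([10, 11, 12, 13, 14], [1, 2, 3, 4, 5], b=2)
# pairs == [(12, 1), (13, 2), (14, 3)]

# Fit on the training part only, then apply the same transform to later values.
scale = fit_scaler([1.0, 2.0, 3.0])
```

Because the scaler closes over training statistics, applying it to validation values never lets those values influence the mean or standard deviation.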

🚀 Quickstart

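```bash
# create venv and install deps
poetry install

# run example (uses example CSV under examples/)
poetry run python examples/ili_quickstart.py
```

```python
from dynamic_sarimax import (
    SarimaxConfig,
    select_delay_by_aic,
    rolling_evaluate,
)

# y: pandas Series (target); x: pandas DataFrame of exogenous regressors on the same index.
# Load your own data here.
n_train = int(0.8 * len(y))  # select the delay on the first 80%, matching train_frac=0.8 below
y_train, x_train = y.iloc[:n_train], x.iloc[:n_train]

cfg = SarimaxConfig(order=(5, 0, 2), seasonal_order=(1, 0, 0, 52))
best_b, best_aic = select_delay_by_aic(y_train, x_train, delays=[1, 2, 3], cfg=cfg)
print(f"Best lag = {best_b}  |  AIC = {best_aic:.2f}")

# With the default no-peek policy, only horizons h <= best_b are scored per origin.
res = rolling_evaluate(y, x, cfg, delay=best_b, horizons=24, train_frac=0.8)
print(res.head())
```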
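select_delay_by_aic compares candidate delays by AIC. One subtlety applies to any such comparison: a larger b drops more leading rows, and AIC values are only comparable when models are fitted on the same observations. The sketch below scores every delay on a common sample by starting each fit at max(delays). To stay runnable with the standard library it swaps SARIMAX for a one-regressor least-squares fit with Gaussian AIC; select_delay_common_sample and ols_aic are hypothetical names, and whether the package itself trims to a common sample is not stated here.

```python
import math


def ols_aic(pairs):
    """Gaussian AIC (up to an additive constant) of y = a + c*x fitted by least squares."""
    n = len(pairs)
    ys = [p[0] for p in pairs]
    xs = [p[1] for p in pairs]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    c = sxy / sxx if sxx else 0.0
    a = ybar - c * xbar
    rss = sum((yi - a - c * xi) ** 2 for xi, yi in zip(xs, ys))
    k = 2  # intercept and slope; identical for every delay, so it does not change the ranking
    return n * math.log(rss / n) + 2 * k


def select_delay_common_sample(y, x, delays):
    """Score every delay on the same targets y[max(delays):] and return (best_b, best_aic)."""
    start = max(delays)  # the largest delay decides how many leading rows every fit drops
    scores = {}
    for b in delays:
        pairs = [(y[t], x[t - b]) for t in range(start, len(y))]
        scores[b] = ols_aic(pairs)
    best_b = min(scores, key=scores.get)
    return best_b, scores[best_b]


# Synthetic data where y depends on x two steps earlier, plus a small deterministic wiggle.
x = [(7 * i) % 11 for i in range(40)]
y = [0.0, 0.0] + [3.0 * x[t - 2] + 0.1 * ((5 * t) % 3 - 1) for t in range(2, 40)]
best, aic = select_delay_common_sample(y, x, [1, 2, 3])
```

The cost of the common sample is that every candidate gives up max(delays) observations, even the ones with a shorter lag.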

📈 Example output

```text
Chosen delay b (on 80% train): 2 | Train AIC: 1234.56

Per-horizon scores (rolling validation on last 20%):
 h  n_origins     MSE  sMAPE
 1         52   0.103   8.12
 2         51   0.109   8.54
 ...

Average MSE   = 0.124
Average sMAPE = 8.77 %
```

⚙️ Installation

```bash
pip install dynamic-sarimax
# or
poetry add dynamic-sarimax
```

Python ≥ 3.10, tested on 3.10–3.12.


🧩 Components

| Module | Purpose |
|---|---|
| config.py | Parameter dataclasses for SARIMAX and lag spec |
| features.py | Safe lagging + scaling transformer |
| model.py | Wrapper around statsmodels.SARIMAX |
| selection.py | Delay (lag) selection via AIC |
| evaluation.py | Rolling-origin cross-validation (new in v1.2) |
| metrics.py | MSE & sMAPE helpers |
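metrics.py provides MSE and sMAPE helpers. sMAPE has several published variants and the exact formula used by the package is not stated here, so this sketch uses one common form, 100/n · Σ 2|F − A| / (|A| + |F|), treating 0/0 terms as zero. The function names are illustrative only.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE in percent: mean of 2|F - A| / (|A| + |F|)."""
    terms = []
    for a, f in zip(y_true, y_pred):
        denom = abs(a) + abs(f)
        # Actual and forecast both zero means a perfect forecast, so the term is 0.
        terms.append(0.0 if denom == 0 else 2 * abs(f - a) / denom)
    return 100 * sum(terms) / len(terms)
```

Unlike plain MAPE, this form stays finite when an actual value is zero but the forecast is not.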

🔁 Rolling validation — strategies & knobs

rolling_evaluate is the batteries-included, safe rolling-origin evaluator.

Signature

def rolling_evaluate(
    y, X, cfg,
    delay,                # int or None
    horizons,             # int > 0
    train_frac=0.8,
    min_train=30,
    *,
    # exogenous policy
    allow_future_exog=False,
    X_future_manual=None,
    # window strategy
    strategy="expanding",         # "expanding" | "sliding"
    window=None,                  # required if strategy="sliding"
    refit_every=1,                # >1 = refit every k origins
    return_details=False,         # if True returns (agg, details)
): ...

🧱 Strategies

| Strategy | Description |
|---|---|
| "expanding" | Default. For origin o, train on [0..o-1]; the training window grows over time. |
| "sliding" | Train on the last window observations, [o-window..o-1]. window must be ≥ min_train. |

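The two strategies differ only in where the training window starts. A minimal sketch of that index arithmetic, assuming origins are integer positions and the window is half-open [start, origin); train_range is a hypothetical helper, not part of the package.

```python
def train_range(origin, strategy="expanding", window=None, min_train=30):
    """Half-open index range [start, origin) used to fit the model at this origin."""
    if strategy == "expanding":
        start = 0  # always start from the first observation; the window grows
    elif strategy == "sliding":
        if window is None:
            raise ValueError("window must be provided when strategy='sliding'")
        if window < min_train:
            raise ValueError("window must be >= min_train")
        start = max(0, origin - window)  # keep only the most recent `window` points
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return range(start, origin)


expanding = train_range(100)                       # range(0, 100)
sliding = train_range(100, "sliding", window=80)   # range(20, 100)
```

A sliding window keeps the cost of each fit constant and lets the model forget old regimes, at the price of fewer observations per fit.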
🔁 Refitting cadence

| refit_every | Behavior |
|---|---|
| 1 (default) | Refit at every origin (fully independent fits). |
| k > 1 | Refit every k origins and reuse the fitted parameters in between (faster). |

Future v2 roadmap: optional state reconditioning for partial re-use without full re-fit.
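With refit_every=k, a full fit happens at every k-th origin and the origins in between reuse those parameters. The hypothetical refit_schedule below maps each origin to the origin whose fit it reuses, assuming the first evaluated origin always triggers a fit.

```python
def refit_schedule(origins, refit_every=1):
    """Map each origin to the origin whose fitted parameters it uses."""
    if refit_every < 1:
        raise ValueError("refit_every must be >= 1")
    schedule, last_fit = {}, None
    for i, o in enumerate(origins):
        if i % refit_every == 0:
            last_fit = o  # a full re-fit happens at this origin
        schedule[o] = last_fit  # intermediate origins reuse the last fitted parameters
    return schedule


sched = refit_schedule(range(100, 106), refit_every=4)
# {100: 100, 101: 100, 102: 100, 103: 100, 104: 104, 105: 104}
```

The number of expensive fits drops roughly by a factor of k, while every origin is still scored.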


⚖️ Exogenous policy (no-peek by default)

| Case | Behavior |
|---|---|
| delay=None | Univariate SARIMAX; forecasts all horizons. |
| delay=int, allow_future_exog=False | Evaluates at most steps_eff = min(horizons, delay) steps per origin. A forecast h steps past the last observed time t needs x_{t+h-delay}, which is already observed only when h ≤ delay, so this prevents future-X leakage. |
| delay=int, allow_future_exog=True | Requires X_future_manual with the same columns as X; allows full-horizon forecasting. |

If delay=0 and allow_future_exog=False, no valid horizon exists → raises RuntimeError (explicitly to prevent silent misuse).
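The policy table reduces to a few lines of logic. This sketch encodes it, including the delay=0 case; effective_steps is a hypothetical helper, and its error message paraphrases rather than reproduces the package's.

```python
def effective_steps(horizons, delay, allow_future_exog=False):
    """Number of horizons scored per origin under the exogenous policy."""
    if horizons <= 0:
        raise ValueError("horizons must be positive")
    if delay is None or allow_future_exog:
        return horizons  # univariate model, or future X supplied explicitly
    # Step h past the last observed time t needs x_{t+h-delay},
    # which is already observed only when h <= delay.
    steps = min(horizons, delay)
    if steps == 0:
        raise RuntimeError("No evaluations produced: delay=0 leaves no valid horizon")
    return steps


print(effective_steps(12, 2))  # only h = 1..2 are scored
```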


📤 Return values

| Mode | Description |
|---|---|
| Default | Returns an aggregate DataFrame (agg) with columns ["h", "n_origins", "MSE", "sMAPE"]. |
| return_details=True | Returns a tuple (agg, details), where details has columns ["origin", "h", "y_true", "y_hat"]. |

agg.attrs always contains:

```python
{
    "macro_MSE": float,
    "macro_sMAPE": float
}
```
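Because n_origins can differ by horizon (52 vs. 51 in the example output), it matters how per-horizon scores are averaged. The word macro suggests each horizon counts once regardless of how many origins scored it; that reading is an assumption. The sketch computes that unweighted mean alongside an origin-weighted alternative, with agg rows represented as plain dicts (as agg.to_dict("records") would produce).

```python
def macro_scores(agg_rows):
    """Unweighted mean of per-horizon scores: each horizon counts once."""
    n = len(agg_rows)
    return {
        "macro_MSE": sum(r["MSE"] for r in agg_rows) / n,
        "macro_sMAPE": sum(r["sMAPE"] for r in agg_rows) / n,
    }


def pooled_mse(agg_rows):
    """Origin-weighted alternative: horizons scored at more origins weigh more."""
    total = sum(r["n_origins"] for r in agg_rows)
    return sum(r["MSE"] * r["n_origins"] for r in agg_rows) / total


rows = [
    {"h": 1, "n_origins": 3, "MSE": 1.0, "sMAPE": 10.0},
    {"h": 2, "n_origins": 1, "MSE": 3.0, "sMAPE": 20.0},
]
```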

🧪 Usage patterns

1️⃣ Univariate (default expanding window)

```python
cfg = SarimaxConfig(order=(2, 0, 1), seasonal_order=(0, 0, 0, 0))
agg = rolling_evaluate(y, X=None, cfg=cfg, delay=None, horizons=12, train_frac=0.8)
```

2️⃣ With exogenous (no-peek, delay-limited)

```python
cfg = SarimaxConfig(order=(1, 0, 1), seasonal_order=(0, 0, 0, 0))
agg = rolling_evaluate(y, X, cfg, delay=2, horizons=12, allow_future_exog=False)
# => Evaluates only h=1..2 per origin
```

3️⃣ With exogenous (opt-in future X)

```python
import pandas as pd

# Future exogenous block: same columns as X, covering the forecast period
X_future_manual = pd.DataFrame({...})
agg = rolling_evaluate(
    y, X, cfg,
    delay=2, horizons=12,
    allow_future_exog=True,
    X_future_manual=X_future_manual,
)
```

4️⃣ Sliding window with refit cadence

```python
agg = rolling_evaluate(
    y, X, cfg,
    delay=1, horizons=6,
    strategy="sliding",
    window=96,
    refit_every=4,
)
```

5️⃣ Detailed results for plotting

```python
agg, details = rolling_evaluate(
    y, X=None, cfg=cfg,
    delay=None, horizons=8,
    return_details=True,
)
# details has origin, h, y_true, y_hat
```
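The details rows are enough to rebuild the per-horizon table, for example to plot error against origin or to add a custom metric. This stdlib sketch assumes each row is a dict with the documented keys; per_horizon_mse is a hypothetical helper.

```python
from collections import defaultdict


def per_horizon_mse(details):
    """Rebuild {h: (n_origins, MSE)} from rows with origin, h, y_true, y_hat."""
    groups = defaultdict(list)
    for row in details:
        groups[row["h"]].append((row["y_true"] - row["y_hat"]) ** 2)
    return {h: (len(errs), sum(errs) / len(errs)) for h, errs in sorted(groups.items())}


rows = [
    {"origin": 10, "h": 1, "y_true": 5.0, "y_hat": 4.0},
    {"origin": 10, "h": 2, "y_true": 6.0, "y_hat": 6.0},
    {"origin": 11, "h": 1, "y_true": 7.0, "y_hat": 4.0},
]
table = per_horizon_mse(rows)  # {1: (2, 5.0), 2: (1, 0.0)}
```

Assuming details is a pandas DataFrame, details.to_dict("records") yields rows in this shape, or details.groupby("h") performs the same grouping directly.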

⚠️ Common errors (by design)

| Error | Reason |
|---|---|
| ValueError("horizons must be positive") | Invalid horizons. |
| ValueError("window must be provided when strategy='sliding'") | Missing window for sliding mode. |
| ValueError("allow_future_exog=True but X_future_manual was not provided.") | Required future exogenous block missing. |
| ValueError("Exogenous columns mismatch...") | Column mismatch between X and X_future_manual. |
| RuntimeError("No evaluations produced...") | All origins skipped (e.g., delay=0 with no-peek). |

📊 Example: Comparing rolling strategies

```python
cfg = SarimaxConfig(order=(2, 0, 1), seasonal_order=(0, 0, 0, 0))

agg1 = rolling_evaluate(y, X, cfg, delay=1, horizons=6, strategy="expanding")
agg2 = rolling_evaluate(y, X, cfg, delay=1, horizons=6, strategy="sliding", window=80)
agg3 = rolling_evaluate(y, X, cfg, delay=1, horizons=6, strategy="expanding", refit_every=4)
```

Plot macro averages or per-horizon curves to compare trade-offs between accuracy and runtime.


🧯 Testing

```bash
poetry run pytest -q
```

The test suite covers:

  • expanding vs sliding windows
  • refit cadence (refit_every)
  • no-peek & future-exog modes
  • input validation and error cases
  • optional return-details branch

🗺️ Roadmap (v2)

  • State reconditioning between refits (partial parameter reuse).
  • Parallel rolling origins for large datasets.
  • Custom metric hooks and progress callbacks.


📜 License

Apache-2.0 © 2025 Nirupom Bose Roy. Contributions welcome!

Download files

Source Distribution

dynamic_sarimax-1.0.0.tar.gz (21.1 kB)

Uploaded Source

Built Distribution


dynamic_sarimax-1.0.0-py3-none-any.whl (22.0 kB)

Uploaded Python 3

File details

Details for the file dynamic_sarimax-1.0.0.tar.gz.

File metadata

  • Download URL: dynamic_sarimax-1.0.0.tar.gz
  • Upload date:
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.12.3 Linux/6.14.0-32-generic

File hashes

Hashes for dynamic_sarimax-1.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | f8b66ecf46756d3a778f76adff6f96e20ef6a3b220805c0e80021361b2ec99bb |
| MD5 | b8e769942a1647da5463f5bd1b9cddac |
| BLAKE2b-256 | 363692c31f6932081432bc0df88b8c7f6a2bf6109887b0ddb7723f8282ac7e67 |


File details

Details for the file dynamic_sarimax-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: dynamic_sarimax-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 22.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.12.3 Linux/6.14.0-32-generic

File hashes

Hashes for dynamic_sarimax-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 299b16dbe9be10e824206098e93ef8c59e929d9c5e847a7b16304bf889a598c2 |
| MD5 | e18138b49293ea1cb404b71cd94a7fff |
| BLAKE2b-256 | 9ef7cddd58fa3812bf81506b2fc0ba33a180d1fb472cc2722a10d60e6afa237d |

