A lightweight Python framework for rigorous and statistically grounded forecast evaluation, with baseline comparison, horizon-stratified analysis, and Diebold–Mariano testing.
forecastEval — A Lightweight Framework for Rigorous Forecast Evaluation
forecastEval is an open-source Python library that implements a lightweight, unified framework for rigorous forecast evaluation.
It is designed to reduce the practical barriers to adopting evaluation best practices by providing an accessible API, interpretative reporting, and statistically grounded comparison against appropriate baselines.
- Repository: https://github.com/gecad-group/BCADO_forecast-eval
- PyPI package: https://pypi.org/project/forecastEval/
Abstract
Forecast evaluation is frequently undermined by (i) insufficient baseline awareness, (ii) reliance on aggregated metrics that hide horizon-dependent failure modes, and (iii) absence of statistical validation for performance differences.
forecastEval operationalises a consolidated evaluation framework through a single interface, providing:
- Baseline-aware evaluation using MASE and skill scores, supporting persistence and seasonal naïve baselines;
- Horizon-stratified reporting to expose performance variation across lead times;
- Diebold–Mariano statistical testing with autocorrelation-adjusted variance estimation for principled significance assessment.
The library outputs both a detailed console report with interpretative guidance and an interactive HTML dashboard for transparent communication of results.
Scope
forecastEval is intended for:
- academic benchmarking and reproducible evaluation pipelines;
- practitioners validating model readiness for deployment;
- settings where point forecasts are produced for time series with potential trend, seasonality, and noise.
Current focus: point forecast evaluation (with planned extensions for probabilistic forecasting and regime-aware stratification).
Methodological Coverage
The framework is implemented via a unified ForecastEvaluator class.
Guideline 1 — Baseline-aware performance validation
Objective: ensure meaningful gains over defensible baselines.
- Automatic comparison against:
  - Persistence (naïve)
  - Seasonal naïve
- Primary error scaling and interpretability via MASE
- Skill scores to contextualise performance relative to baseline behaviour
- Explicit interpretative outcomes (e.g., PASS/FAIL recommendations)
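As a rough illustration of the quantities named above, the following sketch computes MASE and a skill score with NumPy alone. The function names (`mase`, `skill_score`, `seasonal_naive`) are hypothetical helpers written for this example and are not part of the forecastEval API; the scaling follows the usual textbook definition (in-sample MAE of the lag-`m` naive forecast), which is assumed rather than confirmed to match the library's internals.

```python
import numpy as np


def seasonal_naive(y_train, horizon, m=1):
    """Repeat the last m training values across the horizon.

    With m=1 this reduces to the persistence (naive) forecast.
    """
    last_cycle = np.asarray(y_train, dtype=float)[-m:]
    reps = int(np.ceil(horizon / m))
    return np.tile(last_cycle, reps)[:horizon]


def mase(y_true, y_pred, y_train, m=1):
    """Forecast MAE divided by the in-sample MAE of the lag-m naive forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y_true - y_pred)) / scale)


def skill_score(loss_model, loss_baseline):
    """1 - model loss / baseline loss; positive values mean the model beats the baseline."""
    return 1.0 - loss_model / loss_baseline
```

Under these definitions a MASE below 1 indicates that the forecast errs less, on average, than the naive benchmark did in-sample, and a skill score of 0 means the model is no better than the chosen baseline.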
Guideline 2 — Horizon-stratified reporting
Objective: avoid misleading conclusions from aggregated metrics.
- Horizon windows are user-definable (e.g., (0, 8), (8, 16), (16, 24))
- Produces horizon-specific metrics and summaries
- Architecture supports extension for:
  - Guideline 2b: regime-aware stratification (domain-specific)
  - Guideline 2c: uncertainty quantification / probabilistic evaluation
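To make the idea of horizon stratification concrete, here is a minimal sketch that computes MAE per window. It assumes each window is a half-open index range `[start, end)` over the forecast horizon, which is consistent with the example windows above but is an assumption about the convention; `mae_by_horizon` is a hypothetical helper, not a forecastEval function.

```python
import numpy as np


def mae_by_horizon(y_true, y_pred, windows):
    """Return {(start, end): MAE} for each half-open window of lead times."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    return {(s, e): float(abs_err[s:e].mean()) for s, e in windows}


# Toy case: the absolute error grows linearly with lead time (0, 1, ..., 23).
y_true = np.zeros(24)
y_pred = np.arange(24, dtype=float)
per_window = mae_by_horizon(y_true, y_pred, [(0, 8), (8, 16), (16, 24)])
overall = float(np.mean(np.abs(y_true - y_pred)))
```

In this toy case the aggregate MAE equals the middle window's MAE, while the first and last windows differ from it substantially, which is the kind of horizon-dependent behaviour a single aggregated metric hides.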
Guideline 3 — Statistical significance testing (Diebold–Mariano)
Objective: distinguish real performance differences from sampling noise.
- Diebold–Mariano test implemented for loss differentials
- Variance estimation adjusted for autocorrelation
- Enables principled acceptance/rejection of “model improves on baseline” claims
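The test can be sketched in a few lines. The version below is the classic Diebold–Mariano statistic with a truncated (rectangular-kernel) long-run variance using `h - 1` autocovariance lags and a two-sided normal p-value. The exact variance estimator used inside forecastEval is not stated here, so treat this as an illustrative assumption; `diebold_mariano` is a hypothetical function name.

```python
import math

import numpy as np


def diebold_mariano(loss_model, loss_baseline, h=1):
    """Return (DM statistic, two-sided p-value) for equal predictive accuracy.

    loss_model / loss_baseline: per-observation losses (e.g. squared errors).
    h: forecast horizon; h - 1 autocovariance lags enter the variance.
    A negative statistic means the model has the lower average loss.
    """
    d = np.asarray(loss_model, dtype=float) - np.asarray(loss_baseline, dtype=float)
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar

    # Long-run variance: gamma_0 + 2 * sum_{k=1}^{h-1} gamma_k
    lrv = np.dot(dc, dc) / n
    for k in range(1, h):
        lrv += 2.0 * np.dot(dc[k:], dc[:-k]) / n
    if lrv <= 0:
        # The truncated estimator is not guaranteed to be positive.
        raise ValueError("non-positive long-run variance estimate")

    stat = d_bar / math.sqrt(lrv / n)
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|stat|))
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return float(stat), p_value
```

Because the truncated estimator can turn out non-positive in small samples, the sketch raises an error in that case rather than returning a meaningless statistic; kernel-weighted estimators such as Bartlett weights are a common alternative.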
Outputs
forecastEval produces two complementary outputs:
- Console report
  - structured metrics, baseline comparison, DM test results
  - interpretative guidance and deployment-oriented conclusions
- Interactive HTML dashboard
  - collapsible sections, colour-coded status badges
  - baseline comparison tables and horizon-wise breakdowns
  - intended for communication and auditability
The HTML report is generated with generate_html_report(...).
Installation
pip install forecastEval
Dependencies
Minimal runtime dependencies:
- numpy
- scikit-learn
- pandas
Quick Start
from forecast_eval import ForecastEvaluator

# y_train, y_test, and y_pred are user-supplied series (see "Required inputs" below)
evaluator = ForecastEvaluator(seasonal_period=12)
results = evaluator.evaluate(
    y_true=y_test,
    y_pred=y_pred,
    y_train=y_train,
    seasonal=True,
    return_loss_series=True,
    stratify_by_horizon=True,
    horizon_indices=[(0, 8), (8, 16), (16, 24)],
)
print(evaluator.summary_report())
evaluator.generate_html_report("report.html")
Required inputs:
- y_train: training time series;
- y_true: test observations;
- y_pred: model predictions;
- seasonal_period: seasonal cycle length.
A complete end-to-end example script is provided in the repository: example.py. It demonstrates synthetic data generation, model forecasting, baseline comparison, statistical testing, and report generation.
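For readers who want inputs to experiment with before opening example.py, the sketch below builds a synthetic monthly-style series (trend, seasonality, noise) and a simple least-squares forecast using NumPy only. It is not a copy of the repository script: the series parameters, the `design` helper, and the stand-in model are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 12                      # seasonal period
n_train, horizon = 120, 24  # 24 steps matches the (0, 8), (8, 16), (16, 24) windows

# Synthetic series: linear trend + one seasonal harmonic + Gaussian noise
t = np.arange(n_train + horizon)
series = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / m) + rng.normal(scale=0.3, size=t.size)
y_train, y_test = series[:n_train], series[n_train:]


def design(time_index):
    """Regressors: intercept, linear trend, and a sine/cosine pair at period m."""
    return np.column_stack([
        np.ones_like(time_index, dtype=float),
        time_index,
        np.sin(2 * np.pi * time_index / m),
        np.cos(2 * np.pi * time_index / m),
    ])


# Stand-in "model": ordinary least squares fitted on the training window only
coef, *_ = np.linalg.lstsq(design(t[:n_train]), y_train, rcond=None)
y_pred = design(t[n_train:]) @ coef
```

The resulting `y_train`, `y_test`, and `y_pred` arrays have the shapes expected by the Quick Start call above, with `seasonal_period=12`.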
Development Status
The project is under active development.
Current coverage:
- point forecast evaluation;
- Guidelines 1, 2a (horizon-stratified reporting), and 3.
Planned extensions:
- probabilistic forecasting evaluation (Guideline 2c);
- automated and regime-aware stratification (Guideline 2b).
License
This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
- Redistribution and modification are permitted under GPL-3.0 terms.
- Derivative works must remain under the same license.
- The software is provided without warranty.
See the LICENSE file for full details.
Download files
File details
Details for the file forecasteval-0.2.3.tar.gz.
File metadata
- Download URL: forecasteval-0.2.3.tar.gz
- Upload date:
- Size: 26.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7fd341f9da010e54376fd5a742d876bf5e02104f6ea3abca0cefd9c4b9446ae |
| MD5 | a641e65b1d7bcd1e2a3329e6826f2655 |
| BLAKE2b-256 | b4ab539d04ca89817ee34285c972ece2735feda8358b24509217f71b57237e4a |
File details
Details for the file forecasteval-0.2.3-py3-none-any.whl.
File metadata
- Download URL: forecasteval-0.2.3-py3-none-any.whl
- Upload date:
- Size: 29.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 33022a344c3f790d1bb29370da29b5633ef31d86f29e612dab69e6aa336a573d |
| MD5 | 2d99abb45906b4af465da22ac6b08ced |
| BLAKE2b-256 | 9d1c8a491961e72f24f0acadd484a2c3fd7674a8561851765569fb6165df7c75 |