
A lightweight Python framework for rigorous and statistically grounded forecast evaluation, with baseline comparison, horizon-stratified analysis, and Diebold–Mariano testing.

Project description

forecastEval — A Lightweight Framework for Rigorous Forecast Evaluation


forecastEval is an open-source Python library that implements a lightweight, unified framework for rigorous forecast evaluation.
It is designed to reduce the practical barriers to adopting evaluation best practices by providing an accessible API, interpretative reporting, and statistically grounded comparison against appropriate baselines.


Abstract

Forecast evaluation is frequently undermined by (i) insufficient baseline awareness, (ii) reliance on aggregated metrics that hide horizon-dependent failure modes, and (iii) absence of statistical validation for performance differences.
forecastEval operationalises a consolidated evaluation framework through a single interface, providing:

  1. Baseline-aware evaluation using MASE and skill scores, supporting persistence and seasonal naïve baselines;
  2. Horizon-stratified reporting to expose performance variation across lead times;
  3. Diebold–Mariano statistical testing with autocorrelation-adjusted variance estimation for principled significance assessment.

The library outputs both a detailed console report with interpretative guidance and an interactive HTML dashboard for transparent communication of results.


Scope

forecastEval is intended for:

  • academic benchmarking and reproducible evaluation pipelines;
  • practitioners validating model readiness for deployment;
  • settings where point forecasts are produced for time series with potential trend, seasonality, and noise.

Current focus: point forecast evaluation (with planned extensions for probabilistic forecasting and regime-aware stratification).


Methodological Coverage

The framework is implemented via a unified ForecastEvaluator class.

Guideline 1 — Baseline-aware performance validation

Objective: ensure meaningful gains over defensible baselines.

  • Automatic comparison against:
    • Persistence (naïve)
    • Seasonal naïve
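  • MASE (mean absolute scaled error) as the primary error measure, which scales forecast errors by the in-sample error of a naïve forecast to give a unit-free, interpretable score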
  • Skill scores to contextualise performance relative to baseline behaviour
  • Explicit interpretative outcomes (e.g., PASS/FAIL recommendations)
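The baseline comparison above can be illustrated with a minimal NumPy sketch. It assumes the textbook definitions rather than forecastEval's internal code: persistence repeats the last training value, the seasonal naïve forecast repeats the last observed season of length m, MASE divides the forecast MAE by the in-sample MAE of the lag-m naïve method, and the skill score takes the common form 1 - E_model / E_baseline. All function names are hypothetical.

```python
import numpy as np


def persistence_forecast(y_train, horizon: int) -> np.ndarray:
    """Persistence (naive) baseline: repeat the last training value."""
    return np.full(horizon, float(y_train[-1]))


def seasonal_naive_forecast(y_train, horizon: int, m: int) -> np.ndarray:
    """Seasonal naive baseline: repeat the last observed season of length m."""
    last_season = np.asarray(y_train[-m:], dtype=float)
    reps = int(np.ceil(horizon / m))
    return np.tile(last_season, reps)[:horizon]


def mase(y_true, y_pred, y_train, m: int = 1) -> float:
    """Forecast MAE scaled by the in-sample MAE of the lag-m naive method."""
    y_true, y_pred, y_train = (
        np.asarray(a, dtype=float) for a in (y_true, y_pred, y_train)
    )
    # In-sample scale: mean absolute difference between y_t and y_{t-m}.
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y_true - y_pred)) / scale)


def skill_score(error_model: float, error_baseline: float) -> float:
    """1 - model error / baseline error: > 0 beats the baseline, < 0 is worse."""
    return 1.0 - error_model / error_baseline
```

For example, with y_train = [1, ..., 6], y_true = [7, 8] and y_pred = [7.5, 8.5], mase(...) gives 0.5 while the persistence baseline scores 1.5, so skill_score(0.5, 1.5) is about 0.67: the model removes roughly two thirds of the baseline error.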

Guideline 2 — Horizon-stratified reporting

Objective: avoid misleading conclusions from aggregated metrics.

  • Horizon windows are user-definable (e.g., (0, 8), (8, 16), (16, 24))
  • Produces horizon-specific metrics and summaries
  • Architecture supports extension for:
    • Guideline 2b: regime-aware stratification (domain-specific)
    • Guideline 2c: uncertainty quantification / probabilistic evaluation
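Horizon-stratified reporting amounts to computing the same error measures separately over each lead-time window. The sketch below is illustrative only: it assumes windows are half-open index ranges [start, end) over the forecast steps (consistent with the contiguous (0, 8), (8, 16), (16, 24) example), that lead time is the last array axis, and the function name and the choice of MAE and RMSE are hypothetical.

```python
import numpy as np


def horizon_stratified_errors(y_true, y_pred, windows):
    """Compute MAE and RMSE separately for each (start, end) lead-time window.

    Windows are treated as half-open ranges [start, end) over the last axis,
    so a 1-D forecast path and a 2-D (origins x lead times) array both work.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    results = {}
    for start, end in windows:
        err = y_true[..., start:end] - y_pred[..., start:end]
        results[(start, end)] = {
            "mae": float(np.mean(np.abs(err))),
            "rmse": float(np.sqrt(np.mean(err ** 2))),
        }
    return results
```

With errors that grow linearly with lead time (for example y_true = np.zeros(24) and y_pred = 0.1 * np.arange(24)), the three windows give MAEs of about 0.35, 1.15 and 1.95, whereas the aggregate MAE of 1.15 hides the deterioration at longer horizons.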

Guideline 3 — Statistical significance testing (Diebold–Mariano)

Objective: distinguish real performance differences from sampling noise.

  • Diebold–Mariano test applied to the loss differential between model and baseline forecasts
  • Variance estimation adjusted for autocorrelation
  • Enables principled acceptance/rejection of “model improves on baseline” claims
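The Diebold–Mariano procedure can be sketched in a few lines of NumPy. This follows the original formulation and is not necessarily forecastEval's estimator: the loss differential d_t is averaged, its long-run variance is estimated from autocovariances up to lag h - 1 (the usual choice for h-step-ahead forecasts, whose errors are serially correlated), and the statistic is compared with a standard normal distribution. The library may instead use, for example, Newey–West weighting or the Harvey–Leybourne–Newbold small-sample correction; the function name and arguments here are hypothetical.

```python
import math

import numpy as np


def diebold_mariano(e_model, e_baseline, h: int = 1, power: int = 2):
    """Diebold-Mariano test of equal predictive accuracy (sketch).

    d_t = |e_model_t|**power - |e_baseline_t|**power. The long-run variance of d
    uses autocovariances up to lag h - 1. Returns (statistic, two-sided p-value)
    under the asymptotic standard normal; a negative statistic means the model
    has lower average loss than the baseline.
    """
    d = (np.abs(np.asarray(e_model, dtype=float)) ** power
         - np.abs(np.asarray(e_baseline, dtype=float)) ** power)
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar
    # Sample autocovariances gamma_0 .. gamma_{h-1} of the loss differential.
    gamma = [np.dot(dc[k:], dc[:n - k]) / n for k in range(h)]
    long_run_var = gamma[0] + 2.0 * sum(gamma[1:])
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance estimate; try a smaller h")
    dm_stat = d_bar / math.sqrt(long_run_var / n)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(dm_stat) / math.sqrt(2.0))
    return float(dm_stat), float(p_value)
```

A negative statistic with a p-value below the chosen significance level supports the claim that the model improves on the baseline; a large p-value means the observed difference is consistent with sampling noise.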

Outputs

forecastEval produces two complementary outputs:

  1. Console report

    • structured metrics, baseline comparison, DM test results
    • interpretative guidance and deployment-oriented conclusions
  2. Interactive HTML dashboard

    • collapsible sections, colour-coded status badges
    • baseline comparison tables and horizon-wise breakdowns
    • intended for communication and auditability

The HTML report is generated with generate_html_report(...).
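Collapsible sections and colour-coded status badges of the kind described above can be built from standard HTML details/summary elements. The helper below is a hypothetical sketch of that idea using only Python's standard library; it is not the implementation behind generate_html_report, and the colours and status labels are assumptions.

```python
import html


def render_section(title: str, status: str, body_html: str) -> str:
    """Return a collapsible <details> block with a colour-coded status badge.

    Hypothetical helper: colours and the PASS/FAIL vocabulary are assumptions.
    """
    colours = {"PASS": "#2e7d32", "FAIL": "#c62828"}
    colour = colours.get(status.upper(), "#616161")  # grey for any other status
    badge = (
        f'<span style="background:{colour};color:#fff;padding:2px 6px;'
        f'border-radius:4px">{html.escape(status)}</span>'
    )
    # body_html is inserted as-is, so it must already be trusted HTML.
    return (
        f"<details open><summary>{html.escape(title)} {badge}</summary>"
        f"{body_html}</details>"
    )


page = (
    "<html><body>"
    + render_section("Baseline comparison", "PASS", "<p>metrics table here</p>")
    + "</body></html>"
)
```

Using native details/summary elements keeps the sections collapsible without any JavaScript, which suits a self-contained report file.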


Installation

pip install forecastEval

Dependencies

Minimal runtime dependencies:

  • numpy
  • scikit-learn
  • pandas

Quick Start

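import numpy as np
from forecast_eval import ForecastEvaluator

# Illustrative data only; replace with your own series and model forecasts.
rng = np.random.default_rng(0)
t = np.arange(144)
series = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)
y_train, y_test = series[:120], series[120:]  # 24 test points, matching the horizon windows below
y_pred = y_test + rng.normal(0, 0.3, y_test.size)  # placeholder for a real model's forecast

evaluator = ForecastEvaluator(seasonal_period=12)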

results = evaluator.evaluate(
    y_true=y_test,
    y_pred=y_pred,
    y_train=y_train,
    seasonal=True,
    return_loss_series=True,
    stratify_by_horizon=True,
    horizon_indices=[(0, 8), (8, 16), (16, 24)]
)

print(evaluator.summary_report())
evaluator.generate_html_report("report.html")

Required inputs:

  • y_train: training time series;
  • y_true: test observations;
  • y_pred: model predictions;
  • seasonal_period: seasonal cycle length, passed to the ForecastEvaluator constructor (e.g. 12 for monthly data with yearly seasonality).
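Before calling evaluate, it can help to check the inputs against the requirements listed above. The helper below is hypothetical and not part of forecastEval's documented API; it assumes MASE is scaled by the in-sample seasonal naïve error, which needs more training points than seasonal_period.

```python
import numpy as np


def validate_inputs(y_train, y_true, y_pred, seasonal_period: int):
    """Hypothetical pre-flight checks for the required evaluation inputs."""
    y_train, y_true, y_pred = (
        np.asarray(a, dtype=float) for a in (y_train, y_true, y_pred)
    )
    if y_true.shape != y_pred.shape:
        raise ValueError(f"y_true shape {y_true.shape} != y_pred shape {y_pred.shape}")
    if not isinstance(seasonal_period, int) or seasonal_period < 1:
        raise ValueError("seasonal_period must be a positive integer")
    # A lag-m naive scale needs at least m + 1 training observations.
    if y_train.size <= seasonal_period:
        raise ValueError("y_train must be longer than seasonal_period")
    for name, arr in (("y_train", y_train), ("y_true", y_true), ("y_pred", y_pred)):
        if not np.all(np.isfinite(arr)):
            raise ValueError(f"{name} contains NaN or infinite values")
    return y_train, y_true, y_pred
```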

A complete end-to-end example script is provided in the repository: example.py. It demonstrates synthetic data generation, model forecasting, baseline comparison, statistical testing, and report generation.

Development Status

The project is under active development.

Current coverage:

  • point forecast evaluation;
  • Guidelines 1, 2a (horizon-stratified reporting), and 3.

Planned extensions:

  • probabilistic forecasting evaluation (Guideline 2c);
  • automated and regime-aware stratification (Guideline 2b).

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).

  • Redistribution and modification are permitted under GPL-3.0 terms.
  • Derivative works must remain under the same license.
  • The software is provided without warranty.

See the LICENSE file for full details.



Download files


Source Distribution

forecasteval-0.2.3.tar.gz (26.9 kB)

Uploaded Source

Built Distribution


forecasteval-0.2.3-py3-none-any.whl (29.9 kB)

Uploaded Python 3

File details

Details for the file forecasteval-0.2.3.tar.gz.

File metadata

  • Download URL: forecasteval-0.2.3.tar.gz
  • Size: 26.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for forecasteval-0.2.3.tar.gz
Algorithm Hash digest
SHA256 a7fd341f9da010e54376fd5a742d876bf5e02104f6ea3abca0cefd9c4b9446ae
MD5 a641e65b1d7bcd1e2a3329e6826f2655
BLAKE2b-256 b4ab539d04ca89817ee34285c972ece2735feda8358b24509217f71b57237e4a


File details

Details for the file forecasteval-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: forecasteval-0.2.3-py3-none-any.whl
  • Size: 29.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for forecasteval-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 33022a344c3f790d1bb29370da29b5633ef31d86f29e612dab69e6aa336a573d
MD5 2d99abb45906b4af465da22ac6b08ced
BLAKE2b-256 9d1c8a491961e72f24f0acadd484a2c3fd7674a8561851765569fb6165df7c75

