A lightweight Python framework for rigorous and statistically grounded forecast evaluation, with baseline comparison, horizon-stratified analysis, and Diebold–Mariano testing.

forecastEval — A Lightweight Framework for Rigorous Forecast Evaluation

forecastEval is an open-source Python library that implements a lightweight, unified framework for rigorous forecast evaluation.
It is designed to reduce the practical barriers to adopting evaluation best practices by providing an accessible API, interpretative reporting, and statistically grounded comparison against appropriate baselines.

Repository: https://github.com/gecad-group/BCADO_forecast-eval
PyPI package: https://pypi.org/project/forecastEval/

Abstract

Forecast evaluation is frequently undermined by (i) insufficient baseline awareness, (ii) reliance on aggregated metrics that hide horizon-dependent failure modes, and (iii) absence of statistical validation for performance differences.
forecastEval operationalises a consolidated evaluation framework through a single interface, providing:

Baseline-aware evaluation using MASE and skill scores, supporting persistence and seasonal naïve baselines;
Horizon-stratified reporting to expose performance variation across lead times;
Diebold–Mariano statistical testing with autocorrelation-adjusted variance estimation for principled significance assessment.

The library outputs both a detailed console report with interpretative guidance and an interactive HTML dashboard for transparent communication of results.

Scope

forecastEval is intended for:

academic benchmarking and reproducible evaluation pipelines;
practitioners validating model readiness for deployment;
settings where point forecasts are produced for time series with potential trend, seasonality, and noise.

Current focus: point forecast evaluation (with planned extensions for probabilistic forecasting and regime-aware stratification).

Methodological Coverage

The framework is implemented via a unified ForecastEvaluator class.

Guideline 1 — Baseline-aware performance validation

Objective: ensure meaningful gains over defensible baselines.

Automatic comparison against:
- Persistence (naïve)
- Seasonal naïve
Primary error scaling and interpretability via MASE
Skill scores to contextualise performance relative to baseline behaviour
Explicit interpretative outcomes (e.g., PASS/FAIL recommendations)
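
The baseline-aware quantities above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not forecastEval's internal implementation: the function names `naive_forecast`, `mase`, and `skill_score` are hypothetical, and the skill score is assumed here to be MAE-based (1 minus the ratio of model MAE to baseline MAE), which may differ from the definition the library uses.

```python
import numpy as np


def naive_forecast(y_train, horizon, seasonal_period=1):
    """Repeat the last `seasonal_period` training values across the horizon.

    seasonal_period=1 yields the persistence (naive) forecast;
    larger values yield the seasonal naive forecast.
    """
    y_train = np.asarray(y_train, dtype=float)
    last_cycle = y_train[-seasonal_period:]
    reps = int(np.ceil(horizon / seasonal_period))
    return np.tile(last_cycle, reps)[:horizon]


def mase(y_true, y_pred, y_train, seasonal_period=1):
    """Forecast MAE scaled by the in-sample MAE of the (seasonal) naive method."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # In-sample naive errors: y_t - y_{t-m} over the training series.
    scale = np.mean(np.abs(y_train[seasonal_period:] - y_train[:-seasonal_period]))
    return np.mean(np.abs(y_true - y_pred)) / scale


def skill_score(y_true, y_pred, y_baseline):
    """Assumed MAE-based skill: 1 - MAE_model / MAE_baseline (positive = better than baseline)."""
    y_true = np.asarray(y_true, dtype=float)
    mae_model = np.mean(np.abs(y_true - np.asarray(y_pred, dtype=float)))
    mae_base = np.mean(np.abs(y_true - np.asarray(y_baseline, dtype=float)))
    return 1.0 - mae_model / mae_base
```

Under these definitions, a MASE below 1 means the forecast's error is smaller than the in-sample error of the naive method, and a positive skill score means it improves on the chosen baseline over the test window.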

Guideline 2 — Horizon-stratified reporting

Objective: avoid misleading conclusions from aggregated metrics.

Horizon windows are user-definable (e.g., (0, 8), (8, 16), (16, 24))
Produces horizon-specific metrics and summaries
Architecture supports extension for:
- Guideline 2b: regime-aware stratification (domain-specific)
- Guideline 2c: uncertainty quantification / probabilistic evaluation
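
To illustrate why stratification matters, the sketch below computes MAE overall and per horizon window. It is not the library's code: `mae_by_horizon` is a hypothetical helper, and it assumes each window `(start, stop)` is a half-open index range into a single forecast path, which may not match how forecastEval interprets `horizon_indices`.

```python
import numpy as np


def mae_by_horizon(y_true, y_pred, windows):
    """Return overall MAE plus MAE per (start, stop) window of lead-time indices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    report = {"overall": float(abs_err.mean())}
    for start, stop in windows:
        # Assumed convention: half-open slice [start, stop).
        report[(start, stop)] = float(abs_err[start:stop].mean())
    return report


# Errors that grow with lead time: 1 for steps 0-7, 2 for 8-15, 3 for 16-23.
y_true = np.zeros(24)
y_pred = np.repeat([1.0, 2.0, 3.0], 8)
report = mae_by_horizon(y_true, y_pred, [(0, 8), (8, 16), (16, 24)])
```

In this toy case the overall MAE is 2.0, while the per-window values are 1.0, 2.0, and 3.0: the aggregate hides the fact that the error triples between the first and last window.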

Guideline 3 — Statistical significance testing (Diebold–Mariano)

Objective: distinguish real performance differences from sampling noise.

Diebold–Mariano test implemented for loss differentials
Variance estimation adjusted for autocorrelation
Enables principled acceptance/rejection of “model improves on baseline” claims
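
A minimal sketch of a Diebold–Mariano test on per-step losses follows. It is an assumption-laden illustration rather than forecastEval's implementation: the function name `diebold_mariano` and the `max_lag` parameter are hypothetical, the long-run variance uses Bartlett (Newey–West) weights as one common autocorrelation adjustment, the p-value is two-sided under a standard normal approximation, and no small-sample correction is applied.

```python
import math

import numpy as np


def diebold_mariano(loss_model, loss_baseline, max_lag=0):
    """Return (statistic, two-sided p-value) for H0: equal expected loss.

    The loss differential is d_t = loss_baseline_t - loss_model_t, so a
    positive statistic indicates lower loss for the model.
    """
    d = np.asarray(loss_baseline, dtype=float) - np.asarray(loss_model, dtype=float)
    n = d.size
    d_bar = d.mean()
    d_c = d - d_bar

    # Long-run variance: lag-0 autocovariance plus Bartlett-weighted
    # autocovariances up to max_lag (max_lag=0 means no adjustment).
    lrv = np.dot(d_c, d_c) / n
    for k in range(1, max_lag + 1):
        weight = 1.0 - k / (max_lag + 1)
        lrv += 2.0 * weight * np.dot(d_c[k:], d_c[:-k]) / n

    if lrv <= 0.0:
        raise ValueError("Long-run variance is not positive; the test is undefined.")

    stat = d_bar / math.sqrt(lrv / n)
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|stat|)) = erfc(|stat| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

For h-step-ahead forecasts, a common choice is `max_lag = h - 1`, since the forecast errors are then expected to be autocorrelated up to that order.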

Outputs

forecastEval produces two complementary outputs:

Console report
- structured metrics, baseline comparison, DM test results
- interpretative guidance and deployment-oriented conclusions
Interactive HTML dashboard
- collapsible sections, colour-coded status badges
- baseline comparison tables and horizon-wise breakdowns
- intended for communication and auditability

The HTML report is generated with generate_html_report(...).

Installation

pip install forecastEval

Dependencies

Minimal runtime dependencies:

numpy
scikit-learn
pandas

Quick Start

from forecast_eval import ForecastEvaluator

evaluator = ForecastEvaluator(seasonal_period=12)

results = evaluator.evaluate(
    y_true=y_test,
    y_pred=y_pred,
    y_train=y_train,
    seasonal=True,
    return_loss_series=True,
    stratify_by_horizon=True,
    horizon_indices=[(0, 8), (8, 16), (16, 24)]
)

print(evaluator.summary_report())
evaluator.generate_html_report("report.html")

Required inputs:

y_train: training time series;
y_true: test observations;
y_pred: model predictions;
seasonal_period: seasonal cycle length.

A complete end-to-end example script is provided in the repository: example.py. It demonstrates synthetic data generation, model forecasting, baseline comparison, statistical testing, and report generation.
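
For readers who want inputs to try the Quick Start with, the sketch below builds synthetic `y_train`, `y_test`, and `y_pred` arrays using NumPy only. It is not the repository's example.py: the trend, seasonal amplitude, noise level, and the least-squares "model" are all arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
seasonal_period = 12
n_train, n_test = 120, 24

# Synthetic series: linear trend + seasonal cycle + Gaussian noise.
t = np.arange(n_train + n_test)
series = (
    0.05 * t
    + 2.0 * np.sin(2 * np.pi * t / seasonal_period)
    + rng.normal(0.0, 0.3, t.size)
)
y_train, y_test = series[:n_train], series[n_train:]


def design(time_index):
    """Regressors: intercept, linear trend, and one seasonal harmonic."""
    angle = 2 * np.pi * time_index / seasonal_period
    return np.column_stack(
        [np.ones(time_index.size), time_index, np.sin(angle), np.cos(angle)]
    )


# Stand-in model: ordinary least squares fitted on the training window only.
coef, *_ = np.linalg.lstsq(design(t[:n_train]), y_train, rcond=None)
y_pred = design(t[n_train:]) @ coef
```

The 24-step test window matches the horizon windows `(0, 8)`, `(8, 16)`, `(16, 24)` used in the Quick Start call.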

Development Status

The project is under active development.

Current coverage:

point forecast evaluation;
Guidelines 1, 2a (horizon-stratified reporting), and 3.

Planned extensions:

probabilistic forecasting evaluation (Guideline 2c);
automated and regime-aware stratification (Guideline 2b).

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).

Redistribution and modification are permitted under GPL-3.0 terms.
Derivative works must remain under the same license.
The software is provided without warranty.

See the LICENSE file for full details.

Release history

0.2.4 (Jan 13, 2026)
0.2.3 (Jan 13, 2026), this version
0.2.2 (Jan 13, 2026)
0.2.1 (Jan 12, 2026)
0.0.3 (Jan 12, 2026)
0.0.2 (Jan 8, 2026)
0.0.1 (Jan 8, 2026)

Download files

Source Distribution

forecasteval-0.2.3.tar.gz (26.9 kB)

Uploaded Jan 13, 2026 Source

Built Distribution

forecasteval-0.2.3-py3-none-any.whl (29.9 kB)

Uploaded Jan 13, 2026 Python 3

File details

Details for the file forecasteval-0.2.3.tar.gz.

File metadata

Download URL: forecasteval-0.2.3.tar.gz
Upload date: Jan 13, 2026
Size: 26.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for forecasteval-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`a7fd341f9da010e54376fd5a742d876bf5e02104f6ea3abca0cefd9c4b9446ae`
MD5	`a641e65b1d7bcd1e2a3329e6826f2655`
BLAKE2b-256	`b4ab539d04ca89817ee34285c972ece2735feda8358b24509217f71b57237e4a`

File details

Details for the file forecasteval-0.2.3-py3-none-any.whl.

File metadata

Download URL: forecasteval-0.2.3-py3-none-any.whl
Upload date: Jan 13, 2026
Size: 29.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for forecasteval-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`33022a344c3f790d1bb29370da29b5633ef31d86f29e612dab69e6aa336a573d`
MD5	`2d99abb45906b4af465da22ac6b08ced`
BLAKE2b-256	`9d1c8a491961e72f24f0acadd484a2c3fd7674a8561851765569fb6165df7c75`
