
fev: Forecast evaluation library


fev


fev (Forecast EValuation library) is a lightweight package that makes it easy to benchmark time series forecasting models.

  • Extensible: Easy to define your own forecasting tasks and benchmarks.
  • Reproducible: Ensures that the results obtained by different users are comparable.
  • Easy to use: Compatible with most popular forecasting libraries.
  • Minimal dependencies: Just a thin wrapper on top of 🤗 datasets.

How is fev different from other benchmarking tools?

Existing forecasting benchmarks usually fall into one of two categories:

  • Standalone datasets without any supporting infrastructure. These provide no guarantees that the results obtained by different users are comparable. For example, changing the start date or the length of the forecast horizon can completely change the meaning of the scores.
  • Bespoke end-to-end systems that combine models, datasets and forecasting tasks. Such packages usually come with lots of dependencies and assumptions, which makes extending or integrating these libraries into existing systems difficult.

fev aims for the middle ground: it provides the core benchmarking functionality without introducing unnecessary constraints or bloated dependencies. The library supports point and probabilistic forecasting, different types of covariates, and all popular forecasting metrics.

⚙️ Installation

pip install fev

🚀 Quickstart

Create a task from a dataset stored on the Hugging Face Hub:

import fev

task = fev.Task(
    dataset_path="autogluon/chronos_datasets",
    dataset_config="m4_hourly",
    horizon=24,
)

Iterate over the rolling evaluation windows:

for window in task.iter_windows():
    past_data, future_data = window.get_input_data()
  • past_data contains the past data before the forecast horizon (item ID, past timestamps, target, all covariates).
  • future_data contains the future data that is known at prediction time (item ID, future timestamps, and known covariates).
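Each window splits every series at a cutoff: everything before the cutoff ends up in past_data, and the next horizon values are what the model has to predict. The sketch below mimics that split for a single series in plain Python. It is not fev's implementation and the helper names are hypothetical; it assumes, consistent with the initial_cutoff, window_step_size and num_windows fields in the evaluation summary further down, that window k uses the cutoff initial_cutoff + k * window_step_size, with negative cutoffs counted from the end of the series.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating rolling-window evaluation for ONE series.
# Assumption: window k uses cutoff = initial_cutoff + k * window_step_size,
# where a negative cutoff is counted from the end of the series.


def window_cutoffs(initial_cutoff: int, window_step_size: int, num_windows: int) -> List[int]:
    """Return the cutoff of each evaluation window, earliest first."""
    return [initial_cutoff + k * window_step_size for k in range(num_windows)]


def split_series(y: List[float], cutoff: int, horizon: int) -> Tuple[List[float], List[float]]:
    """Split y into (past, future): past ends at the cutoff, future holds the next `horizon` values."""
    idx = cutoff if cutoff >= 0 else len(y) + cutoff
    return y[:idx], y[idx:idx + horizon]


# Example: 10 observations, 2 windows of horizon 2, stepping by 2.
y = [float(v) for v in range(10)]
for cutoff in window_cutoffs(initial_cutoff=-4, window_step_size=2, num_windows=2):
    past, future = split_series(y, cutoff, horizon=2)
    print(cutoff, len(past), future)
# -4 6 [6.0, 7.0]
# -2 8 [8.0, 9.0]
```

With the values reported in the evaluation summary below (horizon 24, num_windows 1, initial_cutoff -24, window_step_size 24), this sketch yields a single window whose future part is the last 24 observations of each series.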

Make predictions

def naive_forecast(y: list, horizon: int) -> dict[str, list[float]]:
    # Make predictions for a single time series
    return {"predictions": [y[-1] for _ in range(horizon)]}

predictions_per_window = []
for window in task.iter_windows():
    past_data, future_data = window.get_input_data()
    predictions = [
        naive_forecast(ts[task.target_column], task.horizon) for ts in past_data
    ]
    predictions_per_window.append(predictions)
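Any function with the same input and output format can take the place of naive_forecast. For instance, a seasonal naive forecast (the idea behind the seasonal_naive baseline in the leaderboard below) repeats the last full season of the series. The helper below is hypothetical and written only for illustration, not the baseline shipped with fev; season_length is a parameter you choose yourself, e.g. 24 for hourly data with a daily cycle.

```python
from typing import Dict, List


def seasonal_naive_forecast(y: List[float], horizon: int, season_length: int) -> Dict[str, List[float]]:
    """Hypothetical seasonal naive forecaster: repeat the last full season of y."""
    if season_length < 1:
        raise ValueError("season_length must be >= 1")
    if len(y) < season_length:
        # Too little history for a full season: fall back to the naive forecast.
        return {"predictions": [y[-1]] * horizon}
    last_season = y[-season_length:]
    # Step h of the horizon reuses the value observed one (or more) seasons earlier.
    return {"predictions": [last_season[h % season_length] for h in range(horizon)]}


print(seasonal_naive_forecast([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], horizon=5, season_length=3))
# {'predictions': [4.0, 5.0, 6.0, 4.0, 5.0]}
```

To try it, call seasonal_naive_forecast(ts[task.target_column], task.horizon, season_length=24) instead of naive_forecast inside the loop above. With season_length=1 it reduces to the naive forecast.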

Get an evaluation summary

task.evaluation_summary(predictions_per_window, model_name="naive")
# {'model_name': 'naive',
#  'dataset_path': 'autogluon/chronos_datasets',
#  'dataset_config': 'm4_hourly',
#  'horizon': 24,
#  'num_windows': 1,
#  'initial_cutoff': -24,
#  'window_step_size': 24,
#  'min_context_length': 1,
#  'max_context_length': None,
#  'seasonality': 1,
#  'eval_metric': 'MASE',
#  'extra_metrics': [],
#  'quantile_levels': None,
#  'id_column': 'id',
#  'timestamp_column': 'timestamp',
#  'target_column': 'target',
#  'generate_univariate_targets_from': None,
#  'past_dynamic_columns': [],
#  'excluded_columns': [],
#  'task_name': 'm4_hourly',
#  'test_error': 3.815112047601983,
#  'training_time_s': None,
#  'inference_time_s': None,
#  'dataset_fingerprint': '19e36bb78b718d8d',
#  'trained_on_this_dataset': False,
#  'fev_version': '0.6.0',
#  'MASE': 3.815112047601983}
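In this summary, test_error equals the MASE value, because MASE is the task's eval_metric, computed with seasonality 1. MASE divides the mean absolute error over the forecast horizon by the mean absolute error of a seasonal naive forecast on the past data, so values below 1 mean the forecast error is smaller than the in-sample error of that naive method. A minimal, fev-independent sketch of the metric follows; averaging per-series scores with equal weight is an assumption made here, and fev's internal aggregation may differ.

```python
from typing import List, Sequence, Tuple


def mase(past: Sequence[float], y_true: Sequence[float], y_pred: Sequence[float], seasonality: int = 1) -> float:
    """Mean absolute scaled error of one forecast for one series."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    if len(past) <= seasonality:
        raise ValueError("need more than `seasonality` past observations")
    # Mean absolute error over the forecast horizon.
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    # Scale: mean absolute error of the in-sample seasonal naive forecast on the past data.
    diffs = [abs(past[i] - past[i - seasonality]) for i in range(seasonality, len(past))]
    scale = sum(diffs) / len(diffs)
    if scale == 0:
        raise ValueError("MASE is undefined when the past series has no seasonal variation")
    return mae / scale


def average_mase(items: List[Tuple[Sequence[float], Sequence[float], Sequence[float]]], seasonality: int = 1) -> float:
    """Assumption: equal-weight mean of per-series MASE over (past, y_true, y_pred) triples."""
    return sum(mase(p, t, f, seasonality) for p, t, f in items) / len(items)


print(mase([1.0, 2.0, 3.0, 4.0], y_true=[5.0, 6.0], y_pred=[4.0, 4.0]))
# 1.5
```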

The evaluation summary contains all information necessary to uniquely identify the forecasting task.

Multiple evaluation summaries produced by different models on different tasks can be aggregated into a single table.

# Dataframes, dicts, JSON or CSV files supported
summaries = "https://raw.githubusercontent.com/autogluon/fev/refs/heads/main/benchmarks/example/results/results.csv"
fev.leaderboard(summaries)
# | model_name     |   skill_score |   win_rate | ... |
# |:---------------|--------------:|-----------:| ... |
# | auto_theta     |         0.126 |      0.667 | ... |
# | auto_arima     |         0.113 |      0.667 | ... |
# | auto_ets       |         0.049 |      0.444 | ... |
# | seasonal_naive |         0     |      0.222 | ... |
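The skill_score and win_rate columns compare models across tasks. Roughly, the skill score is 1 minus the geometric mean, over tasks, of a model's error relative to a baseline (presumably seasonal_naive, which scores exactly 0 in the table above), and the win rate is the fraction of (task, opponent) pairs in which a model has the lower error, with ties counted as half a win. The sketch below implements these two definitions from scratch. It is an illustration under those assumptions, not the code behind fev.leaderboard, which may add steps such as clipping extreme ratios or handling missing results.

```python
import math
from typing import Dict

# errors[model][task] = test_error of that model on that task (lower is better).
Errors = Dict[str, Dict[str, float]]


def skill_scores(errors: Errors, baseline: str) -> Dict[str, float]:
    """1 - geometric mean over tasks of (model error / baseline error)."""
    tasks = list(errors[baseline])
    scores = {}
    for model, per_task in errors.items():
        log_ratios = [math.log(per_task[t] / errors[baseline][t]) for t in tasks]
        scores[model] = 1.0 - math.exp(sum(log_ratios) / len(log_ratios))
    return scores


def win_rates(errors: Errors) -> Dict[str, float]:
    """Share of (task, opponent) pairs a model wins; a tie counts as half a win."""
    models = list(errors)
    tasks = list(errors[models[0]])
    rates = {}
    for model in models:
        wins, comparisons = 0.0, 0
        for other in models:
            if other == model:
                continue
            for t in tasks:
                mine, theirs = errors[model][t], errors[other][t]
                wins += 1.0 if mine < theirs else 0.5 if mine == theirs else 0.0
                comparisons += 1
        rates[model] = wins / comparisons
    return rates


# Made-up errors for two tasks, purely for illustration.
errors = {
    "seasonal_naive": {"task_a": 2.0, "task_b": 4.0},
    "model_x": {"task_a": 1.0, "task_b": 1.0},
}
print(skill_scores(errors, baseline="seasonal_naive"))
print(win_rates(errors))
# {'seasonal_naive': 0.0, 'model_x': 1.0}
```

Under this definition of win rate, the win rates of all models always sum to half the number of models. The table above is consistent with that: 0.667 + 0.667 + 0.444 + 0.222 = 2.0 for four models.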

📚 Documentation

Model wrappers and instructions for contributing models are available in the models/ directory of the repository.

🏅 Leaderboards

We host leaderboards obtained using fev at https://huggingface.co/spaces/autogluon/fev-bench. This leaderboard includes results for the benchmark introduced in the paper fev-bench: A Realistic Benchmark for Time Series Forecasting. Previous results for Chronos Benchmark II are available in benchmarks/chronos_zeroshot/.

📈 Datasets

Repositories with datasets in a format compatible with fev:

Citation

If you find this package useful for your research, please consider citing the associated paper(s):

@article{shchur2025fev,
  title={{fev-bench}: A Realistic Benchmark for Time Series Forecasting},
  author={Shchur, Oleksandr and Ansari, Abdul Fatir and Turkmen, Caner and Stella, Lorenzo and Erickson, Nick and Guerron, Pablo and Bohlke-Schneider, Michael and Wang, Yuyang},
  year={2025},
  eprint={2509.26468},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

Download files

Source Distribution

fev-0.8.0.tar.gz (90.8 kB)

Uploaded Source

Built Distribution

fev-0.8.0-py3-none-any.whl (45.6 kB)

Uploaded Python 3

File details

Details for the file fev-0.8.0.tar.gz.

File metadata

  • Download URL: fev-0.8.0.tar.gz
  • Size: 90.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fev-0.8.0.tar.gz
Algorithm Hash digest
SHA256 b072e57a67dddb5da5690bf6050f8cec99ff657bfe6763e694d006d3d7d3f878
MD5 311d84bbab4267bc4b94cc3a4e21efaa
BLAKE2b-256 33b7341cc4192eca61ba3ac6090d516ef50f7e698265cb9587e71a4ff53ffd61

Provenance

The following attestation bundles were made for fev-0.8.0.tar.gz:

Publisher: publish-to-pypi.yml on autogluon/fev

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fev-0.8.0-py3-none-any.whl.

File metadata

  • Download URL: fev-0.8.0-py3-none-any.whl
  • Size: 45.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fev-0.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d92bcdf7fa362c5b287b4c5f75c5ee8a27dd231361965d5c28980a99fbc52d44
MD5 6ec1f6b029a9725321828ee2368ca813
BLAKE2b-256 7149def3c0ce6830604b990c2e000a0ef42c67ca841f381ec75df8fef600a5c4

Provenance

The following attestation bundles were made for fev-0.8.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on autogluon/fev

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
