Time Series analysis and evaluation tools

ts-eval Time Series analysis and evaluation tools

Python 3. Code style: black. License: MIT. Contributions welcome.


A set of tools to make time series analysis easier.

🧩 Current features

  • N-step ahead rolling origin time series evaluation – using a Jupyter widget.
  • Friedman / Nemenyi rank test (post hoc) – to see which models perform significantly better.
  • Relative Metrics – rMSE, rMAE + Forecasted Value analogues.
  • Prediction Interval Metrics – MIS, rMIS, FVrMIS
  • Fixed Fourier series generation – fixed in time according to the pandas index
  • Naive/Seasonal models for baseline predictions (with prediction intervals)
  • Statsmodels n-step evaluation – helper functions to evaluate n-step ahead forecasts using Statsmodels models or naive/seasonal naive models.
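
To make the relative and prediction-interval metrics above concrete, here is a minimal, self-contained sketch in plain Python. It is not ts-eval's implementation: the helper names (seasonal_naive, mae, mean_interval_score, relative) and the toy numbers are made up for illustration. MIS is assumed to be the usual mean interval score (interval width plus a 2/alpha penalty for observations that fall outside the interval), and a relative metric is assumed to be the model's score divided by the reference model's score.

```python
from statistics import mean


def seasonal_naive(history, horizon, season):
    """Hypothetical baseline: repeat the last full season for `horizon` steps."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error of a point forecast."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mean_interval_score(actual, lower, upper, alpha=0.05):
    """Interval width plus a 2/alpha penalty for every observation outside it."""
    scores = []
    for y, lo, hi in zip(actual, lower, upper):
        score = hi - lo
        if y < lo:
            score += (2 / alpha) * (lo - y)
        elif y > hi:
            score += (2 / alpha) * (y - hi)
        scores.append(score)
    return mean(scores)


def relative(model_score, reference_score):
    """Assumed definition: model score divided by reference score."""
    return model_score / reference_score


# Toy data: two seasons of history (season length 4) and one season to predict.
history = [10, 20, 30, 40, 12, 22, 32, 42]
actual = [14, 24, 34, 44]

reference = seasonal_naive(history, horizon=4, season=4)
model = [13, 25, 34, 45]

# Symmetric toy intervals: wide for the model, narrow for the reference.
model_lower, model_upper = [p - 2 for p in model], [p + 2 for p in model]
ref_lower, ref_upper = [p - 1 for p in reference], [p + 1 for p in reference]

r_mae = relative(mae(actual, model), mae(actual, reference))
r_mis = relative(
    mean_interval_score(actual, model_lower, model_upper),
    mean_interval_score(actual, ref_lower, ref_upper),
)

print(f"rMAE = {r_mae:.3f}, rMIS = {r_mis:.3f}")
```

Under these assumptions, values below 1 mean the model beats the reference. In the toy data the reference intervals are narrow but miss every observation, so the miss penalty dominates its score.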

👩🏾‍🎨 Widget Preview

In:

# Wrapped in parentheses so the chained calls parse as a single expression.
(
    TSMetrics(target, sm_seas, default)
    .use_reference(snaive)
    .for_horizons(0, 1, 5, 23)
    .for_time_slices(time_slices.all, time_slices.weekend)
    .with_description()
    .with_prediction_rankings(mtx.FVrMSE, mtx.FVrMIS)
    .with_predictions_plot()
    .show()
)

Out: Demo Screenshot
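
The rankings requested with with_prediction_rankings build on the Friedman / Nemenyi test from the feature list. The sketch below shows the general idea in plain Python and is not the library's code. Models are ranked within each evaluation block (what counts as a block in ts-eval is not specified here), mean ranks are compared, and two models are treated as different when their mean ranks differ by more than the Nemenyi critical distance. The function names and error values are hypothetical; q_alpha = 2.343 is the tabulated value for three models at the 0.05 level.

```python
from math import sqrt


def rank_row(errors):
    """Rank one row of errors (1 = smallest); ties share the average rank."""
    order = sorted(range(len(errors)), key=lambda j: errors[j])
    ranks = [0.0] * len(errors)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and errors[order[j + 1]] == errors[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j share this 1-based rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def friedman_nemenyi(errors, q_alpha):
    """Return mean ranks, the Friedman statistic and the Nemenyi critical distance.

    `errors` has one row per evaluation block and one column per model.
    The statistic uses the textbook formula without a correction for ties.
    """
    n, k = len(errors), len(errors[0])
    rank_rows = [rank_row(row) for row in errors]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    statistic = (
        12 * n / (k * (k + 1)) * sum(r * r for r in mean_ranks)
        - 3 * n * (k + 1)
    )
    critical_distance = q_alpha * sqrt(k * (k + 1) / (6 * n))
    return mean_ranks, statistic, critical_distance


# Hypothetical errors: 6 evaluation blocks (rows) x 3 models (columns).
names = ["model_a", "model_b", "model_c"]
errors = [
    [1.0, 2.0, 3.0],
    [1.1, 2.5, 2.0],
    [0.9, 1.8, 2.7],
    [1.2, 2.2, 3.1],
    [1.0, 2.1, 2.9],
    [0.8, 1.9, 3.3],
]

mean_ranks, statistic, cd = friedman_nemenyi(errors, q_alpha=2.343)

# Pairs whose mean ranks differ by more than the critical distance.
different = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(mean_ranks[i] - mean_ranks[j]) > cd
]

print("mean ranks:", dict(zip(names, mean_ranks)))
print("Friedman statistic:", round(statistic, 3), "critical distance:", round(cd, 3))
print("significantly different pairs:", different)
```

With only six blocks the test has little power, so in this toy example only the best and the worst model are separated by more than the critical distance.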

👩🏾‍🚒 Demo

For a more elaborate example, please check out the Demo Notebook.

Alternatively, check out the interactive Binder demo.

🤦🏾‍ Motivation

While working on a long-term time series analysis project, I needed to summarize and store the performance metrics of different models and compare them. Doing this across dozens of notebooks is daunting, so I hacked together some code that does what I want in a few lines.

👩🏾‍🚀 Installation

  pip install ts-eval

📋 Release Planning:

  • Release 0.3
    • use pandas better for dataframe styles / viz https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html
    • api like (viz1 | viz2 | viz3 ) / viz4 (patchwork R package)
    • CRPS evaluation
    • dynamic insample forecast for statsmodels
    • PI coverage estimation is really needed (in %)
    • predictions based on loess decomposition https://github.com/jrmontag/STLDecompose
    • dataset description: first index - last index sample (dates or ids)
    • transform with callback
    • altair-like API where you can combine components with +
    • success rate of prediction intervals like here http://freerangestats.info/blog/2016/01/30/hybrid-forecasts
    • describe case when MIS rankings are better for one dataset, but its mean is worse (due to huge outliers)
    • wrapper around xarray datasets, which always returns non-NaN data and/or statistics that I need are computed inside of this wrapper. NaNs are always inside. Doable?
    • boxplots by timestep visualization (with boxplot, outliers for each step)
    • remove collection of deps in style [tests_and_bla_bla] to [tests,bla]
    • links to papers – AvgRelMAE (Davydenko and Fildes, 2013); link to Nemenyi paper / implementations
    • make graphs with PIs more narrow on 0,1,.. steps as there's too much space left (with an option to turn this off).
    • better API for the end user – minimize interaction with xarray
    • pep517 build / wheels / better setup.py as per Hynek
    • travis: add 3.8 default python when it's available
    • docs: supported metrics & API options
    • Maybe use api like Summary in statsmodels MLEModel class, it has extend methods and warn/info messages
    • pretty legend for plots like here https://studywolf.wordpress.com/2017/11/21/matplotlib-legends-for-mean-and-confidence-interval-plots/
    • Look for TODOs
    • changeable colors
    • turn off colored display option
    • a nicer API for raw metrics container
    • codacy badge
    • boxplots to compare models (as an alternative)
    • violin plots to compare predictions – areas can be colored, different metrics on left and right (like relative...)
  • Release 0.4
    • holiday/fourier features model
    • fix viz module to have less of important stuff
    • a gif with project visualization
    • check shapes of input arrays (target vs preds), now it doesn't raise an error
    • Baseline prediction using target dataset (without explicit calculation, but losing some time points)

💡 Ideas

  • components
    • Graph: Visualize outliers from confidence interval
    • Multi-comparison component: scikit_posthocs lib or homecooked?
    • inspect true confidence interval coverage via sampling (was done in postings around bayesian dropout sampling)
    • xarrays: check whether compared datasets are actually equal (offsets by dates, shapes, maybe even hashing)
    • bin together step performance, like steps 0-1, 2-5, 6-12, 13-24
    • highlight regions using a mask (holidays, etc.)
    • option to view interactively points using widget (plotly)?
    • diagnostics: bias to over / underestimate points
    • animated graphs for change in seasonality
  • features
    • example notebook for fourier?
    • tests for fourier
    • nint generation
  • utils:
    • model adaptor (for different models, generic) which generates a 3D prediction dataset. For statsmodels, using dynamic forecast or Kalman filter
    • feature importance calculator, but only if I can manipulate input features
    • feature selection using PACF / prewhiten?
  • project
  • sMAPE & MASE can be added for the jupyter evaluation tables
  • ? Residual stats: since I have residuals => Ljung-Box, Heteroscedasticity test, Jarque-Bera – like in statsmodels results, but probably these stats were inspected already by the user... and on which step should they be computed then?

See also

🤹🏼‍♂️ Development

Recommended development workflow:

pipenv install -e ".[dev]"
pipenv shell

The library doesn't use Flit/Poetry, so the suggested workflow is based on Pipenv (as per https://github.com/pypa/pipenv/issues/1911). Pipfile* files are ignored via .gitignore.

Changelog

0.2.3 (2021-06-03)

Fixes (unreleased changes from 2019, doh)

  • Fix results for 1 time series (noop)

0.2.2 (2019-10-22)

Fixes

  • Fix NaN values being propagated to the Friedman / Nemenyi test.
  • Critical distance is now returned alongside the Friedman / Nemenyi test results.

0.2.1 (2019-10-18)

Fixes

  • Outdated import in the wheel version of the package.

0.2.0 (2019-10-16)

Features

  • Multiple prediction ranking with Friedman Nemenyi posthoc.
  • Visualization of prediction intervals
  • Indication of prediction ranking in a colorful table
  • Rewrite of the internal computation machinery

0.1.0 (2019-10-04)

Features

  • N-step ahead evaluation widget for Jupyter
  • Absolute & relative metrics for point forecasts and prediction intervals (MSE, MAE, rMSE, rMAE, MIS, rMIS)
  • Naive/Seasonal models for baseline (with prediction intervals)
  • Helper functions to evaluate n-step ahead forecasts using Statsmodels models or naive/seasonal naive models.
  • Holiday features generation and model evaluation on holiday datetimes.

0.0.1 (2019-09-18)

Features

  • Fixed Fourier series generation (fixed in time according to the pandas index)
