ts-eval Time Series analysis and evaluation tools
A set of tools to make time series analysis easier.
🧩 Current features
- N-step ahead rolling origin time series evaluation – using a Jupyter widget.
- Friedman / Nemenyi rank test (posthoc) – to see which model statistically performs better.
- Relative Metrics – rMSE, rMAE + Forecasted Value analogues.
- Prediction Interval Metrics – MIS, rMIS, FVrMIS
- Fixed fourier series generation – fixed in time according to pandas index
- Naive/Seasonal models for baseline predictions (with prediction intervals)
- Statsmodels n-step evaluation – helper functions to evaluate n-step ahead forecasts using Statsmodels models or naive/seasonal naive models.
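To make the relative and prediction interval metrics above more concrete, here is a minimal NumPy sketch. It is not taken from the ts-eval source: the function names are hypothetical, MIS is written as the commonly used mean interval score, and "relative" is assumed to mean the ratio of a model's metric to the same metric of a reference model (for example, a seasonal naive baseline). The library's own definitions may differ in detail.

```python
import numpy as np


def mse(y, pred):
    """Mean squared error of point forecasts."""
    y, pred = np.asarray(y, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean((y - pred) ** 2))


def mean_interval_score(y, lower, upper, alpha=0.05):
    """Mean interval score for (1 - alpha) prediction intervals.

    Each point is charged the interval width plus a penalty of
    2 / alpha times the distance by which the interval misses y.
    """
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    width = upper - lower
    below = (2.0 / alpha) * (lower - y) * (y < lower)
    above = (2.0 / alpha) * (y - upper) * (y > upper)
    return float(np.mean(width + below + above))


def relative(metric_model, metric_reference):
    """Relative metric (assumed definition): values below 1 favour the model."""
    return metric_model / metric_reference


# Toy data, made up for illustration only.
y = [1.0, 2.0, 3.0]
model_pred = [1.0, 2.0, 4.0]
reference_pred = [2.0, 3.0, 5.0]

r_mse = relative(mse(y, model_pred), mse(y, reference_pred))
mis = mean_interval_score(y, lower=[0, 0, 0], upper=[2, 2, 2], alpha=0.5)
```

A relative value below 1 means the model beats the reference on that metric, which makes results easier to compare across horizons with different error scales.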
👩🏾🎨 Widget Preview
(
    TSMetrics(target, sm_seas, default)
    .use_reference(snaive)
    .for_horizons(0, 1, 5, 23)
    .for_time_slices(time_slices.all, time_slices.weekend)
    .with_description()
    .with_prediction_rankings(mtx.FVrMSE, mtx.FVrMIS)
    .with_predictions_plot()
    .show()
)
For a more elaborate example, please check out the Demo Notebook.
Alternatively, check out the interactive Binder demo.
While working on a long-term time series analysis project, I needed to summarize and store the performance metrics of different models and compare them. As this is daunting to do across dozens of notebooks, I huddled over some code that does what I wanted in a few lines.
pip install ts-eval
📋 Release Planning:
- Release 0.3
- use pandas better for dataframe styles / viz https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html
- api like (viz1 | viz2 | viz3 ) / viz4 (patchwork R package)
- CRPS evaluation
- dynamic insample forecast for statsmodels
- PI coverage estimation is really needed (in %)
- predictions based on loess decomposition https://github.com/jrmontag/STLDecompose
- dataset description: first index - last index sample (dates or ids)
- transform with callback
- altair-like API where you can combine components with +
- success rate of prediction intervals like here http://freerangestats.info/blog/2016/01/30/hybrid-forecasts
- describe case when MIS rankings are better for one dataset, but its mean is worse (due to huge outliers)
- wrapper around xarray datasets, which always returns non-NaN data and/or statistics that I need are computed inside of this wrapper. NaNs are always inside. Doable?
- boxplots by timestep visualization (with boxplot, outliers for each step)
- remove collection of deps in style [tests_and_bla_bla] to [tests,bla]
- links to papers – AvgRelMAE (Davydenko and Fildes, 2013); link to Nemenyi paper / implementations
- make graphs with PIs more narrow on 0,1,.. steps as there's too much space left (with an option to turn this off).
- better API for the end user – minimize interaction with xarray
- pep517 build / wheels / better setup.py as per Hynek
- travis: add 3.8 default python when it's available
- docs: supported metrics & API options
- Maybe use api like Summary in statsmodels MLEModel class, it has extend methods and warn/info messages
- pretty legend for plots like here https://studywolf.wordpress.com/2017/11/21/matplotlib-legends-for-mean-and-confidence-interval-plots/
- Look for TODOs
- changeable colors
- turn off colored display option
- a nicer API for raw metrics container
- codacy badge
- boxplots to compare models (as an alternative)
- violin plots to compare predictions – areas can be colored, different metrics on left and right (like relative...)
- Release 0.4
- holiday/fourier features model
- fix viz module to have less of important stuff
- a gif with project visualization
- check shapes of input arrays (target vs preds), now it doesn't raise an error
- Baseline prediction using target dataset (without explicit calculation, but losing some time points)
- Graph: Visualize outliers from confidence interval
- Multi-comparison component: scikit_posthocs lib or homecooked?
- inspect true confidence interval coverage via sampling (was done in postings around bayesian dropout sampling)
- xarrays: check whether compared datasets are actually equal (offsets by dates, shapes, maybe even hashing)
- bin together step performance, like steps 0-1, 2-5, 6-12, 13-24
- highlight regions using a mask (holidays, etc.)
- option to view interactively points using widget (plotly)?
- diagnostics: bias to over / underestimate points
- animated graphs for change in seasonality
- example notebook for fourier?
- tests for fourier
- nint generation
- model adaptor (for different models, generic) which generates a 3d prediction dataset. For statsmodels using dynamic forecast or Kalman filter
- feature importance calculator, but only if I can manipulate input features
- feature selection using PACF / prewhiten?
- more defensive style (add arg checks, so it's easier to understand what is going on)
- https://timothycrosley.github.io/portray/ for docs
- sMAPE & MASE can be added for the jupyter evaluation tables
- ? Residual stats: since I have residuals => Ljung-Box, Heteroscedasticity test, Jarque-Bera – like in statsmodels results, but probably these stats were inspected already by the user... and on which step should they be computed then?
- RMC as an analogue for Nemenyi https://github.com/config-i1/greybox/blob/master/R/rmc.R
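One of the planned items above is PI coverage estimation in percent. A rough sketch of what such a check could look like is below. This is an illustration only, not a planned or existing ts-eval API: the function name is hypothetical and the NaN handling (dropping any point where the target or a bound is missing) is an assumption.

```python
import numpy as np


def pi_coverage_percent(y, lower, upper):
    """Share of observations inside [lower, upper], in percent.

    Points where the target or either bound is NaN are ignored.
    """
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    valid = ~(np.isnan(y) | np.isnan(lower) | np.isnan(upper))
    inside = (y[valid] >= lower[valid]) & (y[valid] <= upper[valid])
    return 100.0 * float(inside.mean())


# Toy data, made up for illustration only: 3 of 4 valid points are covered.
coverage = pi_coverage_percent(
    y=[1.0, 2.0, 3.0, 4.0, np.nan],
    lower=[0.0, 0.0, 0.0, 0.0, 0.0],
    upper=[2.0, 2.0, 2.0, 5.0, 1.0],
)
```

For a nominal 95% interval, a coverage far below 95 suggests intervals that are too narrow, while a value close to 100 may point to intervals that are wider than necessary.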
Recommended development workflow:
pipenv install -e .[dev]
pipenv shell
The library doesn't use Flit/Poetry, so the suggested workflow is based on Pipenv (as per https://github.com/pypa/pipenv/issues/1911). Pipfile* are ignored in the .gitignore.
Fixes (unreleased changes from 2019, doh)
- Fix results for 1 time series (noop)
- Fix nan values propagated to Friedman Nemenyi test.
- Critical distance is returned alongside Friedman Nemenyi test.
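For readers unfamiliar with the Friedman test and the Nemenyi critical distance mentioned in the fixes above, here is a small sketch using SciPy. It does not reproduce ts-eval's implementation: the function name and the input layout (one row per sample, one column per model) are assumptions, and the critical value `q_alpha` must be looked up for your own number of models (2.343 is the commonly tabulated value for 3 models at alpha = 0.05).

```python
import numpy as np
from scipy import stats


def friedman_nemenyi_cd(errors, q_alpha):
    """Friedman p-value, mean ranks and Nemenyi critical distance.

    errors: array of shape (n_samples, n_models), lower is better.
    Rows containing NaN are dropped so they do not leak into the test.
    """
    errors = np.asarray(errors, dtype=float)
    errors = errors[~np.isnan(errors).any(axis=1)]
    n, k = errors.shape
    # Friedman test: one argument per model, samples aligned by row.
    _, p_value = stats.friedmanchisquare(*errors.T)
    # Rank models within each sample (1 = lowest error), then average.
    mean_ranks = np.apply_along_axis(stats.rankdata, 1, errors).mean(axis=0)
    critical_distance = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))
    return p_value, mean_ranks, critical_distance


# Toy errors, made up for illustration: model 0 is always best, model 2 worst.
steps = 0.01 * np.arange(11)
toy_errors = np.column_stack([1.0 + steps, 2.0 + steps, 3.0 + steps])
toy_errors[0, 1] = np.nan  # this row is dropped before testing

p_value, mean_ranks, cd = friedman_nemenyi_cd(toy_errors, q_alpha=2.343)
```

Two models whose mean ranks differ by more than the critical distance are usually considered significantly different at the chosen alpha.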
Outdated import in wheel version of the package.
- Multiple prediction ranking with Friedman Nemenyi posthoc.
- Visualization of prediction intervals
- Indication of prediction ranking in a colorful table
- Rewrite of the internal computation machinery
- N-step ahead evaluation widget for Jupyter
- Absolute & relative metrics for point forecasts and prediction intervals (MSE, MAE, rMSE, rMAE, MIS, rMIS)
- Naive/Seasonal models for baseline (with prediction intervals)
- Helper functions to evaluate n-step ahead forecasts using Statsmodels models or naive/seasonal naive models.
- Holiday features generation and model evaluation on holiday datetimes.
- Fixed fourier series generation (fixed in time according to pandas index)
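The idea behind Fourier terms that are "fixed in time according to pandas index" can be sketched roughly as follows. This is one possible reading, not the library's code: the function name, the column names and the choice of the Unix epoch as the time origin are assumptions. The point is that the phase depends on the timestamp itself rather than on the row position, so any slice of the data gets the same feature values for the same timestamps.

```python
import numpy as np
import pandas as pd


def fourier_terms(index, period_hours, order):
    """Sine/cosine features whose phase is tied to absolute time.

    index: tz-naive pandas DatetimeIndex.
    period_hours: season length in hours (e.g. 24 for a daily cycle).
    order: number of harmonics to generate.
    """
    # Hours elapsed since a fixed origin (assumed here: the Unix epoch).
    t = np.asarray((index - pd.Timestamp("1970-01-01")) / pd.Timedelta(hours=1), dtype=float)
    columns = {}
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period_hours
        columns[f"sin_{k}"] = np.sin(angle)
        columns[f"cos_{k}"] = np.cos(angle)
    return pd.DataFrame(columns, index=index)


# Hourly index for two days; the features repeat every 24 hours.
idx = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
full = fourier_terms(idx, period_hours=24, order=2)
part = fourier_terms(idx[10:], period_hours=24, order=2)
```

Because the origin is fixed, `part` matches the corresponding rows of `full`, which would not hold if the features were generated from row numbers starting at zero.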