Time series forecasting suite using statistical models

Statistical ⚡️ Forecast

Lightning fast forecasting with statistical and econometric models

CI Python PyPi conda-nixtla License: GPLv3 docs

StatsForecast offers a collection of widely used univariate time series forecasting models, including exponential smoothing and automatic ARIMA modeling, optimized for high performance using numba.

🔥 Features

  • Fastest and most accurate auto_arima in Python and R.
  • New!: Distributed computation in clusters with ray.
  • New!: Good Ol' sklearn syntax with AutoARIMA().fit(y).predict(h=7).
  • New!: Inclusion of exogenous variables and prediction intervals.
  • Out-of-the-box implementations of exponential smoothing, Croston, seasonal naive, random walk with drift and TSB.
  • 20x faster than pmdarima.
  • 1.5x faster than R.
  • 500x faster than Prophet.
  • Compiled to high performance machine code through numba.
  • 1,000,000 series in 30 min with ray.
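
To make the simpler baselines in the list above concrete, here is a minimal pure-Python sketch of seasonal naive, random walk with drift and simple exponential smoothing. The function names (with the `_sketch` suffix), their signatures and the toy series are assumptions made for illustration; this is not the library's numba implementation.

```python
from typing import List, Sequence


def seasonal_naive_sketch(y: Sequence[float], h: int, season_length: int) -> List[float]:
    """Repeat the last observed season for h steps ahead."""
    if len(y) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = list(y[-season_length:])
    return [last_season[i % season_length] for i in range(h)]


def random_walk_with_drift_sketch(y: Sequence[float], h: int) -> List[float]:
    """Extend the last value along the average historical step (the drift)."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + drift * (step + 1) for step in range(h)]


def ses_sketch(y: Sequence[float], h: int, alpha: float) -> List[float]:
    """Simple exponential smoothing: a flat forecast at the final smoothed level."""
    if not y:
        raise ValueError("need at least one observation")
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * h


# Toy series with a season of length 3 (illustrative values only)
y = [10, 20, 30, 14, 24, 34, 18, 28, 40]

sn = seasonal_naive_sketch(y, h=4, season_length=3)
rwd = random_walk_with_drift_sketch(y, h=2)
ses = ses_sketch(y, h=2, alpha=0.5)

print(sn)   # [18, 28, 40, 18]
print(rwd)  # [43.75, 47.5]
print(ses)
```

Each function returns a plain list of `h` forecasts, so the three baselines can be compared side by side on the same history.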

Missing something? Please open an issue or write to us on Slack.

📖 Why?

Current Python alternatives for statistical models are slow and inaccurate. So we created a library that can be used to forecast in production environments or as benchmarks. StatsForecast includes an extensive battery of models that can efficiently fit thousands of time series.

🔬 Accuracy

We compared accuracy and speed against: pmdarima, Rob Hyndman's forecast package and Facebook's Prophet. We used the Daily, Hourly and Weekly data from the M4 competition.

The following table summarizes the results. Our auto_arima achieves the lowest MASE on all three datasets and the lowest time on Daily and Hourly; on Weekly, the original implementation in R is faster (0.22 vs. 0.42) but less accurate.

dataset    metric  nixtla  pmdarima [1]  auto_arima_r  prophet
M4-Daily   MASE    3.26    3.35          4.46          14.26
M4-Daily   time    1.41    27.61         1.81          514.33
M4-Hourly  MASE    0.92    ---           1.02          1.78
M4-Hourly  time    12.92   ---           23.95         17.27
M4-Weekly  MASE    2.34    2.47          2.58          7.29
M4-Weekly  time    0.42    2.92          0.22          19.82

[1] The model auto_arima from pmdarima had problems with Hourly data. An issue was opened in their repo.
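
For readers unfamiliar with the metric, the sketch below shows one common way to compute MASE: the out-of-sample mean absolute error divided by the in-sample mean absolute error of a seasonal naive forecast. The function name `mase_sketch`, its arguments and the toy numbers are assumptions for illustration; the exact evaluation script behind the table is not shown here.

```python
from typing import Sequence


def mase_sketch(
    y_train: Sequence[float],
    y_test: Sequence[float],
    y_hat: Sequence[float],
    season_length: int = 1,
) -> float:
    """Mean absolute scaled error with in-sample seasonal naive scaling."""
    if len(y_test) != len(y_hat) or len(y_test) == 0:
        raise ValueError("y_test and y_hat must be non-empty and equally long")
    if len(y_train) <= season_length:
        raise ValueError("y_train must be longer than season_length")

    # In-sample error of the seasonal naive forecast y[t] ~ y[t - season_length]
    naive_errors = [
        abs(y_train[t] - y_train[t - season_length])
        for t in range(season_length, len(y_train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("scale is zero: the training series is seasonally constant")

    # Out-of-sample mean absolute error of the forecast being evaluated
    mae = sum(abs(actual - pred) for actual, pred in zip(y_test, y_hat)) / len(y_test)
    return mae / scale


# Toy example (illustrative values only)
score = mase_sketch(y_train=[1, 2, 3, 4, 5], y_test=[6, 7], y_hat=[6.5, 6.0])
print(score)  # 0.75
```

A value below 1 means the forecast beats the in-sample seasonal naive error on average; a value above 1 means it does worse.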

The following table summarizes the data details.

group   n_series  mean_length  std_length  min_length  max_length
Daily   4,227     2,371        1,756       107         9,933
Hourly  414       901          127         748         1,008
Weekly  359       1,035        707         93          2,610

⏲ Computational efficiency

We measured the computational time against the number of time series. The following graph shows the results. As we can see, the fastest model is our auto_arima.

[Figure: Nixtla vs Prophet]

You can reproduce the results here.
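
The time rows of the accuracy table above can also be read as speed ratios. The snippet below is a small arithmetic sketch that divides each implementation's time by the nixtla time on the same dataset; the dictionary layout and the helper name `speed_ratios` are assumptions for illustration, while the numbers are copied from the table.

```python
from typing import Dict, Optional

# Time rows copied from the accuracy table (units as reported there).
# None marks the missing pmdarima result on M4-Hourly.
times: Dict[str, Dict[str, Optional[float]]] = {
    "M4-Daily": {"nixtla": 1.41, "pmdarima": 27.61, "auto_arima_r": 1.81, "prophet": 514.33},
    "M4-Hourly": {"nixtla": 12.92, "pmdarima": None, "auto_arima_r": 23.95, "prophet": 17.27},
    "M4-Weekly": {"nixtla": 0.42, "pmdarima": 2.92, "auto_arima_r": 0.22, "prophet": 19.82},
}


def speed_ratios(
    table: Dict[str, Dict[str, Optional[float]]], baseline: str = "nixtla"
) -> Dict[str, Dict[str, float]]:
    """Return other_time / baseline_time per dataset, skipping missing entries."""
    ratios: Dict[str, Dict[str, float]] = {}
    for dataset, row in table.items():
        base = row[baseline]
        ratios[dataset] = {
            name: value / base
            for name, value in row.items()
            if name != baseline and value is not None
        }
    return ratios


ratios = speed_ratios(times)
for dataset, row in ratios.items():
    for name, ratio in row.items():
        print(f"{dataset:10s} {name:13s} {ratio:8.2f}x")
```

A ratio above 1 means the other implementation took longer than nixtla on that dataset; a ratio below 1, as for auto_arima_r on M4-Weekly, means it was faster.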

External regressors

Results with external regressors are qualitatively similar to those reported above. You can find the complete experiments here.

👾 Less code

[Figure: pmd to stats]

📖 Documentation

Here is a link to the documentation.

🧬 Getting Started

Example Jupyter Notebook

💻 Installation

PyPI

You can install the released version of StatsForecast from the Python package index with:

pip install statsforecast

(Installing inside a Python virtual environment or a conda environment is recommended.)

Conda

You can also install the released version of StatsForecast from conda with:

conda install -c conda-forge statsforecast

(Installing inside a Python virtual environment or a conda environment is recommended.)

Dev Mode

If you want to make some modifications to the code and see the effects in real time (without reinstalling), follow the steps below:

git clone https://github.com/Nixtla/statsforecast.git
cd statsforecast
pip install -e .

🧬 How to use

import numpy as np
import pandas as pd
from IPython.display import display, Markdown

import matplotlib.pyplot as plt
from statsforecast import StatsForecast
from statsforecast.models import seasonal_naive, auto_arima
from statsforecast.utils import AirPassengers

# Hold out the last 12 months of AirPassengers as a test set
horizon = 12
ap_train = AirPassengers[:-horizon]
ap_test = AirPassengers[-horizon:]

# Training frame: indexed by unique_id, with a timestamp column `ds`
# and a target column `y`
series_train = pd.DataFrame(
    {
        'ds': pd.date_range(start='1949-01-01', periods=ap_train.size, freq='M'),
        'y': ap_train
    },
    index=pd.Index([0] * ap_train.size, name='unique_id')
)

# Each model is passed as a (function, season_length) tuple
fcst = StatsForecast(
    series_train,
    models=[(auto_arima, 12), (seasonal_naive, 12)],
    freq='M',
    n_jobs=1
)

# Forecast 12 steps ahead with 80% and 95% prediction intervals
forecasts = fcst.forecast(12, level=(80, 95))
forecasts['y_test'] = ap_test
# Plot the history, the held-out values and both point forecasts
fig, ax = plt.subplots(1, 1, figsize = (20, 7))
df_plot = pd.concat([series_train, forecasts]).set_index('ds')
df_plot[['y', 'y_test', 'auto_arima_season_length-12_mean', 'seasonal_naive_season_length-12']].plot(ax=ax, linewidth=2)

# Shade the 80% and 95% prediction intervals of auto_arima
ax.fill_between(df_plot.index,
                df_plot['auto_arima_season_length-12_lo-80'],
                df_plot['auto_arima_season_length-12_hi-80'],
                alpha=.35,
                color='green',
                label='auto_arima_level_80')
ax.fill_between(df_plot.index,
                df_plot['auto_arima_season_length-12_lo-95'],
                df_plot['auto_arima_season_length-12_hi-95'],
                alpha=.2,
                color='green',
                label='auto_arima_level_95')

# Titles, axis labels and tick sizes
ax.set_title('AirPassengers Forecast', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()
for label in (ax.get_xticklabels() + ax.get_yticklabels()):
    label.set_fontsize(20)
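
Since the held-out values sit next to the interval columns in `forecasts`, one quick sanity check is the empirical coverage of a prediction interval: the share of actual values that fall between the lower and upper bounds. The helper below is a minimal sketch in plain Python; the name `interval_coverage` and the toy numbers are assumptions for illustration, not output of the example above.

```python
from typing import Sequence


def interval_coverage(
    actual: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> float:
    """Fraction of actual values lying inside [lower, upper], bounds included."""
    if not (len(actual) == len(lower) == len(upper)):
        raise ValueError("actual, lower and upper must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must be non-empty")
    inside = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return inside / len(actual)


# Toy values (illustrative only): the third actual falls below its lower bound
actual = [100, 110, 120, 130]
lower = [90, 105, 125, 120]
upper = [110, 115, 135, 140]

coverage = interval_coverage(actual, lower, upper)
print(coverage)  # 0.75
```

With the frame built above, the inputs would be `forecasts['y_test']` together with a pair of columns such as `'auto_arima_season_length-12_lo-80'` and `'auto_arima_season_length-12_hi-80'`. With only 12 test points the estimate is coarse, so treat it as a rough check rather than a calibration test.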

Adding external regressors

# Add a linear trend, an intercept and month dummies as external regressors
series_train['trend'] = np.arange(1, ap_train.size + 1)
series_train['intercept'] = np.ones(ap_train.size)
series_train['month'] = series_train['ds'].dt.month
# drop_first=True drops the January dummy, which the intercept already covers
series_train = pd.get_dummies(series_train, columns=['month'], drop_first=True)
display(series_train.head())

unique_id ds y trend intercept month_2 month_3 month_4 month_5 month_6 month_7 month_8 month_9 month_10 month_11 month_12
0 1949-01-31 00:00:00 112 1 1 0 0 0 0 0 0 0 0 0 0 0
0 1949-02-28 00:00:00 118 2 1 1 0 0 0 0 0 0 0 0 0 0
0 1949-03-31 00:00:00 132 3 1 0 1 0 0 0 0 0 0 0 0 0
0 1949-04-30 00:00:00 129 4 1 0 0 1 0 0 0 0 0 0 0 0
0 1949-05-31 00:00:00 121 5 1 0 0 0 1 0 0 0 0 0 0 0

# Future values of the regressors for the 12 forecast months
xreg_test = pd.DataFrame(
    {
        'ds': pd.date_range(start='1960-01-01', periods=ap_test.size, freq='M')
    },
    index=pd.Index([0] * ap_test.size, name='unique_id')
)
# The trend continues at 133 because the training set has 132 observations
xreg_test['trend'] = np.arange(133, ap_test.size + 133)
xreg_test['intercept'] = np.ones(ap_test.size)
xreg_test['month'] = xreg_test['ds'].dt.month
xreg_test = pd.get_dummies(xreg_test, columns=['month'], drop_first=True)

# Refit on the frame that now carries the regressor columns
fcst = StatsForecast(
    series_train,
    models=[(auto_arima, 12), (seasonal_naive, 12)],
    freq='M',
    n_jobs=1
)
# Pass the future regressors through xreg
forecasts = fcst.forecast(12, xreg=xreg_test, level=(80, 95))
forecasts['y_test'] = ap_test
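
The key constraint in the example above is that the future regressors line up with the training ones: the same columns, and a trend that continues where training stopped. The sketch below mirrors that construction in plain Python so the alignment can be checked without pandas; the helper name `make_xreg_rows` and its arguments are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def make_xreg_rows(first_trend: int, months: Iterable[int]) -> List[Dict[str, float]]:
    """Build trend, intercept and month dummies (January dropped) per time step."""
    rows: List[Dict[str, float]] = []
    for offset, month in enumerate(months):
        row: Dict[str, float] = {"trend": first_trend + offset, "intercept": 1.0}
        # One dummy per month from February to December; January is the baseline
        for m in range(2, 13):
            row[f"month_{m}"] = int(month == m)
        rows.append(row)
    return rows


# 132 training months starting in January, then the 12 months to forecast
train_months = [(i % 12) + 1 for i in range(132)]
train_rows = make_xreg_rows(first_trend=1, months=train_months)
test_rows = make_xreg_rows(first_trend=len(train_rows) + 1, months=range(1, 13))

# Same columns in the same order, and a trend that continues without a gap
same_columns = list(train_rows[0].keys()) == list(test_rows[0].keys())
print(same_columns)            # True
print(train_rows[-1]["trend"])  # 132
print(test_rows[0]["trend"])    # 133
```

Deriving the first test trend value from the training length, rather than hard-coding 133, keeps the two frames consistent if the split point changes.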

🔨 How to contribute

See CONTRIBUTING.md.

📃 References

  • The auto_arima model is based on (translated from) the R implementation included in the forecast package developed by Rob Hyndman.

Contributors ✨

Thanks go to these wonderful people (emoji key):


  • fede 💻
  • José Morales 💻 🚧
  • Sugato Ray 💻
  • Jeff Tackes 🐛
  • darinkist 🤔
  • Alec Helyar 💬
  • Dave Hirschfeld 💬
  • mergenthaler 💻
  • Kin 💻
  • Yasslight90 🤔
  • asinig 🤔

This project follows the all-contributors specification. Contributions of any kind welcome!
