Time series forecasting suite using statistical models
Nixtla
Statistical ⚡️ Forecast
Lightning fast forecasting with statistical and econometric models
StatsForecast offers a collection of widely used univariate time series forecasting models, including exponential smoothing and automatic ARIMA modeling, optimized for high performance using numba.
🔥 Features
- Fastest and most accurate auto_arima in Python and R (for the moment...).
- Out-of-the-box implementation of other classical models and benchmarks like exponential smoothing, croston, seasonal naive, random walk with drift and tbs.
- 20x faster than pmdarima.
- 1.5x faster than R.
- 500x faster than Prophet.
- Compiled to high-performance machine code through numba.
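To make two of the benchmark models listed above concrete, here is a minimal pure-Python sketch of seasonal naive and random walk with drift. It only illustrates their textbook definitions; it is not StatsForecast's numba implementation, and the function names are hypothetical.

```python
# Illustrative sketch of two classical benchmarks (hypothetical helpers,
# not StatsForecast's own implementation).

def seasonal_naive_forecast(y, h, season_length):
    """Repeat the last observed season forward for h steps."""
    if len(y) < season_length:
        raise ValueError("series is shorter than one season")
    last_season = y[-season_length:]
    return [last_season[i % season_length] for i in range(h)]


def random_walk_with_drift_forecast(y, h):
    """Extend the last value by the average change per step over the history."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + drift * (i + 1) for i in range(h)]


if __name__ == "__main__":
    history = [10, 12, 14, 11, 13, 15]
    print(seasonal_naive_forecast(history, 4, 3))
    print(random_walk_with_drift_forecast(history, 2))
```

Seasonal naive simply cycles through the last full season, while the drift model extrapolates the straight line joining the first and last observations.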
📖 Why?
Current Python alternatives for statistical models are slow and inaccurate, so we created a library that can be used to forecast in production environments or as a benchmark. StatsForecast includes an extensive battery of models that can efficiently fit thousands of time series.
🔬 Accuracy
We compared accuracy and speed against pmdarima, Rob Hyndman's forecast package and Facebook's Prophet, using the Daily, Hourly and Weekly data from the M4 competition.
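The following table summarizes the results. As can be seen, our auto_arima is the most accurate model (measured by the MASE loss) on all three datasets, and the fastest on the Daily and Hourly data, even compared with the original implementation in R. On the Weekly data, the R implementation is faster.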
dataset | metric | nixtla | pmdarima [1] | auto_arima_r | prophet |
---|---|---|---|---|---|
M4-Daily | MASE | 3.26 | 3.35 | 4.46 | 14.26 |
M4-Daily | time | 1.41 | 27.61 | 1.81 | 514.33 |
M4-Hourly | MASE | 0.92 | --- | 1.02 | 1.78 |
M4-Hourly | time | 12.92 | --- | 23.95 | 17.27 |
M4-Weekly | MASE | 2.34 | 2.47 | 2.58 | 7.29 |
M4-Weekly | time | 0.42 | 2.92 | 0.22 | 19.82 |
[1] The auto_arima model from pmdarima had problems with Hourly data. An issue was opened in their repo.
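For reference, MASE divides the mean absolute error of the forecast by the in-sample mean absolute error of a seasonal naive forecast (lag m) on the training data, so a value below 1 means the forecast errors are smaller on average than those in-sample naive errors. The sketch below follows the M4 competition definition; the function name is hypothetical, and the season length used for each dataset in the table above is not stated here, so it is left as a parameter.

```python
def mase(y_train, y_true, y_pred, season_length=1):
    """Mean Absolute Scaled Error (hypothetical helper).

    Forecast MAE divided by the in-sample MAE of a seasonal naive
    forecast with lag `season_length` on the training series.
    """
    n = len(y_train)
    if n <= season_length:
        raise ValueError("training series must be longer than season_length")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    # In-sample scale: mean |y_t - y_{t-m}| over the training data
    scale = sum(
        abs(y_train[t] - y_train[t - season_length]) for t in range(season_length, n)
    ) / (n - season_length)
    if scale == 0:
        raise ValueError("in-sample naive error is zero; MASE is undefined")
    forecast_mae = sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)
    return forecast_mae / scale


if __name__ == "__main__":
    print(mase([1, 2, 3, 4, 5], [6, 7], [6.5, 6]))
```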
The following table summarizes the data details.
group | n_series | mean_length | std_length | min_length | max_length |
---|---|---|---|---|---|
Daily | 4,227 | 2,371 | 1,756 | 107 | 9,933 |
Hourly | 414 | 901 | 127 | 748 | 1,008 |
Weekly | 359 | 1,035 | 707 | 93 | 2,610 |
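The columns above describe the distribution of series lengths within each group. Assuming the data is in long format (one row per observation, identified by its unique_id), the statistics can be derived roughly as in this sketch. The helper name is hypothetical, and whether the table uses the sample or the population standard deviation is not stated, so the sample version here is an assumption.

```python
import statistics
from collections import Counter


def length_summary(unique_ids):
    """Summarize series lengths from the unique_id column of a long-format dataset."""
    # Each observation contributes one entry, so counting ids gives series lengths
    lengths = list(Counter(unique_ids).values())
    return {
        "n_series": len(lengths),
        "mean_length": statistics.mean(lengths),
        # Assumption: sample standard deviation (use statistics.pstdev for population)
        "std_length": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "min_length": min(lengths),
        "max_length": max(lengths),
    }


if __name__ == "__main__":
    print(length_summary(["a"] * 3 + ["b"] * 5))
```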
⏲ Computational efficiency
We measured the computational time against the number of time series. The following graph shows the results. As we can see, the fastest model is our auto_arima.
[Figure: Nixtla vs Prophet]
You can reproduce the results here.
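The actual benchmark code is the one referenced above. As a hedged illustration of the methodology only, the sketch below times a stand-in model (a pure-Python seasonal naive, not auto_arima) on a growing number of synthetic series. The helper names and the Gaussian synthetic data are hypothetical.

```python
import random
import time


def seasonal_naive_forecast(y, h, season_length):
    """Stand-in model: repeat the last season forward (hypothetical helper)."""
    last_season = y[-season_length:]
    return [last_season[i % season_length] for i in range(h)]


def time_forecasts(n_series_grid, series_length=120, h=12, season_length=12, seed=0):
    """Return {n_series: elapsed seconds} for forecasting n_series synthetic series."""
    rng = random.Random(seed)
    results = {}
    for n in n_series_grid:
        # Synthetic panel: n independent Gaussian series (illustrative only)
        panel = [[rng.gauss(100, 10) for _ in range(series_length)] for _ in range(n)]
        start = time.perf_counter()
        for y in panel:
            seasonal_naive_forecast(y, h, season_length)
        results[n] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    for n, seconds in time_forecasts([10, 100, 1000]).items():
        print(f"{n} series: {seconds:.4f} s")
```

Only the forecasting loop is timed; data generation happens before the clock starts, so the measurement reflects model cost as the number of series grows.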
📖 Documentation
Here is a link to the documentation.
🧬 Getting Started
💻 Installation
PyPI
You can install the released version of StatsForecast from the Python package index with:
```shell
pip install statsforecast
```
(Installing inside a Python virtual environment or a conda environment is recommended.)
Conda
You can also install the released version of StatsForecast from conda with:
```shell
conda install -c nixtla statsforecast
```
Dev Mode
If you want to make some modifications to the code and see the effects in real time (without reinstalling), follow the steps below:
```shell
git clone https://github.com/Nixtla/statsforecast.git
cd statsforecast
pip install -e .
```
🧬 How to use
```python
import numpy as np
import pandas as pd
from IPython.display import display, Markdown
import matplotlib.pyplot as plt
from statsforecast import StatsForecast
from statsforecast.models import seasonal_naive, auto_arima
from statsforecast.utils import AirPassengers

# Hold out the last 12 months as a test set
horizon = 12
ap_train = AirPassengers[:-horizon]
ap_test = AirPassengers[-horizon:]

# Input used here: index named unique_id, time column ds, target column y
series_train = pd.DataFrame(
    {
        'ds': np.arange(1, ap_train.size + 1),
        'y': ap_train
    },
    index=pd.Index([0] * ap_train.size, name='unique_id')
)

def display_df(df):
    # Render a dataframe as a markdown table (to_markdown requires tabulate)
    display(Markdown(df.to_markdown()))

# Each model is passed as a (model, season_length) tuple
fcst = StatsForecast(
    series_train,
    models=[(auto_arima, 12), (seasonal_naive, 12)],
    freq='M',
    n_jobs=1
)
forecasts = fcst.forecast(horizon)
display_df(forecasts)
```
unique_id | ds | auto_arima_season_length-12 | seasonal_naive_season_length-12 |
---|---|---|---|
0 | 133 | 424.16 | 360 |
0 | 134 | 407.082 | 342 |
0 | 135 | 470.861 | 406 |
0 | 136 | 460.914 | 396 |
0 | 137 | 484.901 | 420 |
0 | 138 | 536.904 | 472 |
0 | 139 | 612.903 | 548 |
0 | 140 | 623.903 | 559 |
0 | 141 | 527.903 | 463 |
0 | 142 | 471.903 | 407 |
0 | 143 | 426.903 | 362 |
0 | 144 | 469.903 | 405 |
```python
# Attach the held-out values, then plot training data, forecasts and test data
forecasts['y_test'] = ap_test
fig, ax = plt.subplots(1, 1, figsize=(20, 7))
pd.concat([series_train, forecasts]).set_index('ds').plot(ax=ax, linewidth=2)
ax.set_title('AirPassengers Forecast', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()
for label in (ax.get_xticklabels() + ax.get_yticklabels()):
    label.set_fontsize(20)
plt.show()
```
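Once the held-out year is available, the forecasts can be scored directly. This sketch copies the forecast values from the output table above and, as an assumption, uses the 1960 values of the classic AirPassengers dataset for ap_test; it computes each model's mean absolute error in plain Python. The mae helper is hypothetical. Note that the seasonal_naive column simply repeats the last 12 training months, which is what a seasonal naive model with season length 12 does.

```python
# Forecast values copied from the output table above
auto_arima_fc = [424.16, 407.082, 470.861, 460.914, 484.901, 536.904,
                 612.903, 623.903, 527.903, 471.903, 426.903, 469.903]
seasonal_naive_fc = [360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405]

# Assumption: ap_test holds the final year (1960) of the AirPassengers series
ap_test = [417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432]


def mae(actual, forecast):
    """Mean absolute error (hypothetical helper)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


for name, fc in [("auto_arima", auto_arima_fc), ("seasonal_naive", seasonal_naive_fc)]:
    print(f"{name}: MAE = {mae(ap_test, fc):.2f}")
```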
🔨 How to contribute
See CONTRIBUTING.md.
📃 References
- The auto_arima model is based on (translated from) the R implementation included in the forecast package developed by Rob Hyndman.
Hashes for statsforecast-0.3.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 37a8184453a1f9bf44af5c954ed99d9e972d26cac192ccf744a99e45453dc22b |
MD5 | ca342183cfe64b8708678f8930c6e033 |
BLAKE2b-256 | b7e1b744a33fadc2fa2d292eb1bd98e9787bb677231bfeb6a9580b3dd6f7aacd |