🌄 Scalecast: The practitioner's time series forecasting library
About
Scalecast is a lightweight modeling procedure, wrapper, and results container for those who want the fastest possible way to apply, tune, and validate many different model classes for forecasting applications. In the data science industry, practitioners are often asked to deliver predictions and prediction ranges for many lines of business or data slices, sometimes hundreds or even thousands. In such situations, it is common to see a simple linear regression or some other quick procedure applied to every line because of the complexity of the task. This works well enough for people who need to deliver something, but more can be achieved.
The scalecast package was designed to address this situation by offering advanced machine learning models and experiments that can be applied, optimized, and validated quickly. Unlike many libraries, scalecast produces dynamic predictions by default: each step is forecast from the model's own earlier predictions, not reported as an average of one-step forecasts that lean on observed values. This avoids the situation where an estimator looks great on the test set but can't generalize to real data. What you see is what you get, with no attempt to oversell results. If you download a library that looks like it's able to predict the COVID pandemic in your test set, you probably have a one-step forecast happening under the hood. You can't predict the unpredictable, and you won't see such things with scalecast.
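The difference between one-step and dynamic forecasts can be sketched with a toy AR(1) model in NumPy. This is an illustration under stated assumptions, not scalecast's implementation: the helper names `fit_ar1`, `one_step_forecast`, and `dynamic_forecast` are hypothetical, and the series is a simulated random walk.

```python
import numpy as np

def fit_ar1(y):
    """Least-squares fit of y[t] = a + b * y[t-1]."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    a, b = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return a, b

def one_step_forecast(a, b, history, test):
    """Each prediction uses the true previous value, including test-set actuals."""
    prev = np.concatenate([history[-1:], test[:-1]])
    return a + b * prev

def dynamic_forecast(a, b, history, horizon):
    """Each prediction is fed by the previous prediction, never by test-set actuals."""
    preds = []
    prev = history[-1]
    for _ in range(horizon):
        prev = a + b * prev
        preds.append(prev)
    return np.array(preds)

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))  # simulated random walk (assumption for the demo)
train, test = y[:176], y[176:]

a, b = fit_ar1(train)
one_step = one_step_forecast(a, b, train, test)
dynamic = dynamic_forecast(a, b, train, len(test))

# Compare the two error figures; only the dynamic one reflects a true 24-step-ahead forecast.
mae_one_step = np.mean(np.abs(one_step - test))
mae_dynamic = np.mean(np.abs(dynamic - test))
print(f"one-step MAE: {mae_one_step:.3f}  dynamic MAE: {mae_dynamic:.3f}")
```

The two forecasts agree only at the first step, where both start from the last training observation. After that, the one-step version keeps peeking at test-set actuals, which is not possible when forecasting a genuinely unknown future.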
from scalecast.Forecaster import Forecaster
from scalecast.Pipeline import Pipeline, Transformer, Reverter
from scalecast.auxmodels import mlp_stack
from scalecast import GridGenerator
import matplotlib.pyplot as plt
import pandas_datareader as pdr
models = (
    'mlr',
    'elasticnet',
    'lightgbm',
    'knn',
)
GridGenerator.get_example_grids()
df = pdr.get_data_fred(
    'HOUSTNSA',
    start='1959-01-01',
    end='2022-08-01'
)
f = Forecaster(
    y=df['HOUSTNSA'],
    current_dates=df.index,
    future_dates=24,
)
f.set_test_length(.2)
f.set_validation_length(24)
def forecaster(f,models):
    """ add Xvars and forecast
    """
    f.add_covid19_regressor()
    f.auto_Xvar_select()
    f.tune_test_forecast(
        models,
        dynamic_testing=24, # test-set metrics will be an average of rolling 24-step forecasts
        cross_validate=True,
        k = 3,
    )
    mlp_stack(f,models)
transformer = Transformer(
    transformers = [
        ('DiffTransform',1),
        ('DiffTransform',12),
    ],
)
reverter = Reverter(
    # list reverters in reverse order
    reverters = [
        ('DiffRevert',12),
        ('DiffRevert',1),
    ],
    base_transformer = transformer,
)
pipeline = Pipeline(
    steps = [
        ('Transform',transformer),
        ('Forecast',forecaster),
        ('Revert',reverter),
    ],
)
f = pipeline.fit_predict(f,models=models)
f.reeval_cis() # expanding cis based on all model results
f.plot(ci=True,order_by='LevelTestSetMAPE')
plt.show()
results = f.export(
    ['model_summaries','lvl_fcsts']
)
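The listing above differences the series at lag 1 and then at lag 12, and lists the reverters in the opposite order. A minimal NumPy sketch of why the order matters follows. It is not scalecast's implementation: the helpers `diff` and `undiff` are hypothetical, and the input series is made up for the demonstration.

```python
import numpy as np

def diff(y, lag):
    """Return the lag-differenced series and the first `lag` values needed to undo it."""
    return y[lag:] - y[:-lag], y[:lag].copy()

def undiff(d, head, lag):
    """Rebuild a series from its lag-difference and its saved leading values."""
    out = np.empty(len(d) + lag)
    out[:lag] = head
    for t in range(lag, len(out)):
        out[t] = out[t - lag] + d[t - lag]
    return out

y = np.arange(40, dtype=float) ** 1.5  # arbitrary example series

# Transform: lag 1 first, then lag 12.
d1, head1 = diff(y, 1)
d12, head12 = diff(d1, 12)

# Revert: undo the most recent transformation first (lag 12), then lag 1.
restored = undiff(undiff(d12, head12, 12), head1, 1)
print(np.allclose(restored, y))
```

Each revert step needs the leading values of the series as it stood just before the matching transform, which is why the last transformation applied has to be the first one undone.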
The library provides the Forecaster (for one series) and MVForecaster (for multiple series) wrappers around the following estimators:
The Forecaster object can only use:
The MVForecaster object can only use:
Want more models? Open a feature request!
The library interfaces nicely with interactive notebook applications.
In addition, scalecast offers:
- Model Validation
- Probabilistic Forecasting
- Model input analysis
- Anomaly detection
- Changepoint detection
- Series transformation/revert functions
Installation
- Only the base package is needed to get started:
pip install --upgrade scalecast
- Optional add-ons:
pip install darts
pip install prophet
pip install greykite
pip install shap (SHAP feature importance)
pip install kats (for changepoint detection)
pip install pmdarima (auto arima)
pip install tqdm (progress bar with notebook)
pip install ipython (widgets with notebook)
pip install ipywidgets (widgets with notebook)
jupyter nbextension enable --py widgetsnbextension (widgets with notebook)
jupyter labextension install @jupyter-widgets/jupyterlab-manager (widgets with Lab)
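To see which of the optional add-ons are already present in an environment, a small standard-library check such as the following may help. It is a sketch, not part of scalecast: the helper `missing_addons` is hypothetical, and the import names are assumed to match the pip package names except for ipython, which imports as `IPython`.

```python
from importlib.util import find_spec

# Import names assumed for the optional add-ons listed above.
OPTIONAL_ADDONS = [
    "darts", "prophet", "greykite", "shap", "kats",
    "pmdarima", "tqdm", "IPython", "ipywidgets",
]

def missing_addons(names=OPTIONAL_ADDONS):
    """Return the top-level packages from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]

print("Not installed:", missing_addons())
```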
Links
Official Docs
Forecasting with Different Model Types
- Sklearn Univariate
- Sklearn Multivariate
- RNN
- ARIMA
- Theta
- VECM
- Other Notebooks
The importance of dynamic validation
Model Input Selection
- Variable Reduction Techniques for Time Series
- Auto Model Specification with ML Techniques for Time Series
- Notebook 1
- Notebook 2
Scaled Forecasting on Many Series
Anomaly Detection
See Contributing