Skip to main content

The practitioner's time series forecasting library

Reason this release was yanked:

Prints when it shouldn't

Project description

🌄 Scalecast: The practitioner's time series forecasting library

About

Scalecast is a light-weight modeling procedure and wrapper meant for those who are looking for the fastest way possible to apply, tune, and validate many different model classes for forecasting applications. In the Data Science industry, it is often asked of practitioners to deliver predictions and ranges of predictions for several lines of businesses or data slices, 100s or even 1000s. In such situations, it is common to see a simple linear regression or some other quick procedure applied to all lines due to the complexity of the task. This works well enough for people who need to deliver something, but more can be achieved.

The scalecast package was designed to address this situation and offer advanced machine learning models that can be applied, optimized, and validated quickly. Unlike many libraries, the predictions produced by scalecast are always dynamic by default, not averages of one-step forecasts, so you don't run into the situation where the estimator looks great on the test-set but can't generalize to real data. What you see is what you get, with no attempt to oversell results. If you download a library that looks like it's able to predict the COVID pandemic in your test-set, you probably have a one-step forecast happening under-the-hood. You can't predict the unpredictable, and you won't see such things with scalecast.

The library provides the Forecaster (for one series) and MVForecaster (for multiple series) wrappers around the following estimators:

A simple scalecast process to load data, add regressors, and create validated forecasts looks like this:

import pandas as pd
import pandas_datareader as pdr
import matplotlib.pyplot as plt
from scalecast import GridGenerator
from scalecast.Forecaster import Forecaster

# choose models:
models = (
    'mlr',
    'knn',
    'svr',
    'xgboost',
    'elasticnet',
    'mlp',
    'prophet',
)
# read data:
df = pdr.get_data_fred('HOUSTNSA',start='2009-01-01',end='2021-06-01')

# import validation grids to tune models:
GridGenerator.get_example_grids() # saves Grids.py with validation grids for each model that can be used to tune the forecasts

# create the Forecaster object
f = Forecaster(y=df.HOUSTNSA,current_dates=df.index) # to initialize, specify y and current_dates (must be arrays of the same length)

# prepare forecasts:
f.set_test_length(12) # specify a test length for your models - do this before eda
f.generate_future_dates(24) # this will create future dates that are on the same interval as the current dates and it will also set the forecast length
f.add_ar_terms(4) # add lagged y terms before differencing
f.add_AR_terms((2,12)) # seasonal lagged terms
f.integrate() # automatically decides if the y term and all ar terms should be differenced to make the series stationary
f.add_seasonal_regressors('month',raw=False,sincos=True) # uses pandas attributes: raw=True creates integers (default), sincos=True creates fourier transformations, dummy=True also available
f.add_seasonal_regressors('year') # to capture the yearly trend
f.add_covid19_regressor() # dates are flexible, default is from when disney world closed to when US CDC lifted mask recommendations, the dates most likely to have affected the economy
f.add_time_trend()
f.set_validation_length(6) # length, different than test_length, to tune the hyperparameters 

# call forecasts (tunes hyperparameters, tests, forecasts into future)
f.tune_test_forecast(models)

# plot results
f.plot_test_set(models='top_3',order_by='LevelTestSetMAPE',ci=True) # plots the differenced test set with confidence intervals
plt.show()
f.plot(order_by='LevelTestSetMAPE',level=True) # plots the level forecast
plt.show()

Installation

  • Base package only: pip install scalecast
  • Prophet installs separately due to how big it is: pip install fbprophet
  • Greykite also installs separately: pip install greykite
  • Notebook functions require:
    • pip install tqdm
    • pip install ipython
    • pip install ipywidgets
    • jupyter nbextension enable --py widgetsnbextension
    • jupyter labextension install @jupyter-widgets/jupyterlab-manager (if using Lab)

Links

Links
📚 Read the Docs Official scalecast docs
📋 Examples Official scalecast notebooks
📓 Binder Notebook Play with an example in your browser
🛠️ Change Log See what's changed

What can scalecast do?

Dynamic Univariate Forecasting ✔️
Dynamic Univariate Forecasting with Exogenous Regressors ✔️
Dynamic Multivariate Vector Forecasting ✔️
Dynamic Multivariate Vector Forecasting with Exogenous Regressors ✔️
Hyperparameter Tuning ✔️
Backcasting ✔️
Model Validation ✔️
Model Summary Generation ✔️
Future Period Forecasting ✔️
Plotting ✔️
Bootstrapped Confidence Intervals ✔️
Seasonality Capturing ✔️
Feature Importance ✔️
Feature Selection ✔️
Linear Models ✔️
Non-linear Models ✔️
Tree Models ✔️
Dense Neural Networks ✔️
Recurrent Neural Networks ✔️
ARIMA ✔️
Exponential Smoothing ✔️
Facebook Prophet ✔️
LinkedIn Silverkite ✔️
Ensemble Modeling ✔️
Any Scikit-learn Regressor or API ✔️
Differencing ✔️
Undifferencing ✔️
Level Results ✔️

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

SCALECAST-0.9.5.tar.gz (793.6 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page