
mlforecast

Scalable machine learning based time series forecasting.

Install

pip install mlforecast

Optional dependencies

If you want more functionality, you can instead use pip install mlforecast[extra1, extra2, ...]. The current extras are:

  • aws: adds the functionality to use S3 as the storage in the CLI.
  • cli: includes the validations necessary to use the CLI.
  • distributed: installs dask to perform distributed training. Note that you'll also need to install either lightgbm or xgboost.

For example, if you want to perform distributed training through the CLI using S3 as your storage you'll need all three extras, which you can get using: pip install mlforecast[aws, cli, distributed].

How to use

Programmatic API

Store your time series in a pandas dataframe with an index named unique_id that identifies each series, a column ds that contains the datestamps, and a column y with the values.

from mlforecast.utils import generate_daily_series

series = generate_daily_series(20)
display_df(series.head())
unique_id ds y
id_00 2000-01-01 00:00:00 0.264447
id_00 2000-01-02 00:00:00 1.28402
id_00 2000-01-03 00:00:00 2.4628
id_00 2000-01-04 00:00:00 3.03552
id_00 2000-01-05 00:00:00 4.04356
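generate_daily_series is just a helper for creating synthetic demo data. If your own data comes in a flat table with different column names, a minimal sketch of putting it in the expected format (using plain pandas; the raw column names here are made up) could look like this:

import pandas as pd

# hypothetical raw data: one row per (series, timestamp) pair
raw = pd.DataFrame({
    'store_id': ['store_1', 'store_1', 'store_2', 'store_2'],
    'date': pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-01', '2000-01-02']),
    'sales': [10.0, 12.0, 3.0, 4.0],
})

# rename to the expected columns and use the series identifier as the index
series = (
    raw.rename(columns={'store_id': 'unique_id', 'date': 'ds', 'sales': 'y'})
       .set_index('unique_id')
)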

Then you define your flow configuration. This includes the lags, transformations on the lags, and date features. The transformations are defined as numba-jitted functions that transform an array. If a transformation takes additional arguments, you supply a tuple (transform_func, arg1, arg2, ...).

from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean

flow_config = dict(
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [(rolling_mean, 7), (rolling_mean, 14)]
    },
    date_features=['dayofweek', 'month']
)
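If you need a transformation that isn't available in window_ops, you can write your own numba-jitted function. Here's a sketch of a hypothetical custom transform (a rolling range) that takes an extra argument, which you'd pass using the tuple form described above:

import numpy as np
from numba import njit

@njit
def rolling_range(x, window_size):
    # max minus min over the trailing window of `window_size` values
    out = np.full(x.shape, np.nan)
    for i in range(window_size - 1, x.size):
        window = x[i - window_size + 1 : i + 1]
        out[i] = window.max() - window.min()
    return out

# it would then go in lag_transforms with its extra argument in the tuple, e.g.
# lag_transforms={7: [(rolling_range, 7)]}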

Next, define a model. If you're on a single machine, this can be any regressor that follows the scikit-learn API. For distributed training there are LGBMForecast and XGBForecast.

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
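Any other regressor that implements the scikit-learn interface works the same way. For example, assuming you have lightgbm installed (it isn't pulled in by default), you could use its sklearn wrapper instead:

from lightgbm import LGBMRegressor

model = LGBMRegressor(n_estimators=100, learning_rate=0.05)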

Now instantiate your forecast object with the model and the flow configuration. There are two types of forecasters, Forecast and DistributedForecast. Since this is a single-machine example, we'll use the former.

from mlforecast.forecast import Forecast

fcst = Forecast(model, flow_config)

To compute the transformations and train the model on the data, call .fit on your Forecast object.

fcst.fit(series)
Forecast(model=RandomForestRegressor(), flow_config={'lags': [7, 14], 'lag_transforms': {1: [CPUDispatcher(<function expanding_mean at 0x7fac9f73f280>)], 7: [(CPUDispatcher(<function rolling_mean at 0x7fac9f7a8f70>), 7), (CPUDispatcher(<function rolling_mean at 0x7fac9f7a8f70>), 14)]}, 'date_features': ['dayofweek', 'month']})

To get the forecasts for the next 14 days, call .predict(14) on the forecaster.

predictions = fcst.predict(14)

display_df(predictions.head())
unique_id ds y_pred
id_00 2000-08-10 00:00:00 5.2325
id_00 2000-08-11 00:00:00 6.26395
id_00 2000-08-12 00:00:00 0.196386
id_00 2000-08-13 00:00:00 1.25263
id_00 2000-08-14 00:00:00 2.2988
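If you kept the last few days of each series as a holdout, you can score these predictions with plain pandas and scikit-learn. This is just a sketch, independent of any mlforecast-specific backtesting API; valid is assumed to be a dataframe with the same layout as series (unique_id index, ds and y columns) covering the forecast horizon:

from sklearn.metrics import mean_squared_error

# join predictions with the assumed holdout frame on series id and datestamp
merged = predictions.reset_index().merge(valid.reset_index(), on=['unique_id', 'ds'])
mse = mean_squared_error(merged['y'], merged['y_pred'])
print(f'MSE: {mse:.4f}')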

CLI

If you want to compute quick baselines, avoid some boilerplate, or just prefer using CLIs, you can use the mlforecast binary with a configuration file like the following:

!cat sample_configs/local.yaml
data:
  prefix: data
  input: train
  output: outputs
  format: parquet
features:
  freq: D
  lags: [7, 14]
  lag_transforms:
    1: 
    - expanding_mean
    7: 
    - rolling_mean:
        window_size: 7
    - rolling_min:
        window_size: 7
  date_features: ["dayofweek", "month", "year"]
  num_threads: 2
backtest:
  n_windows: 2
  window_size: 7
forecast:
  horizon: 7
local:
  model:
    name: sklearn.ensemble.RandomForestRegressor
    params:
      n_estimators: 10
      max_depth: 7

This will read the data from prefix/input and write the results to prefix/output (data/train and data/outputs in this example).

!mlforecast sample_configs/local.yaml
Split 1 MSE: 0.0240
Split 2 MSE: 0.0187

!ls data/outputs/
forecast.parquet  valid_0.parquet  valid_1.parquet
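The outputs are plain parquet files, so you can load them back with pandas to inspect or post-process them (the exact columns may vary between versions; this is just a sketch):

import pandas as pd

forecasts = pd.read_parquet('data/outputs/forecast.parquet')  # forecasts for the configured horizon
valid_0 = pd.read_parquet('data/outputs/valid_0.parquet')     # predictions for the first backtest window
print(forecasts.head())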
