A library that unifies the API for most commonly used libraries and modelling techniques for time-series forecasting in the Python ecosystem.

HCrystal Ball




HCrystal Ball consists of two main parts:

  • Wrappers, which bring different third-party libraries under a time-series-compatible sklearn API
  • Model Selection, which enables grid search over wrappers and general or custom-made transformers, and adds a convenience layer over the whole process (access to results, plots, storage, ...)

Documentation

See examples, tutorials, contribution guidelines, the API reference and more on the documentation site, or browse the example notebooks directly.

Core Installation

For a minimal installation, install from pip or from conda-forge:

pip install hcrystalball
conda install -c conda-forge hcrystalball

Typical Installation

Very often you will want to use more wrappers than just Sklearn, run examples in jupyterlab, or execute model selection in parallel. Getting such dependencies to play together nicely can be cumbersome, so starting from environment.yml might give you a faster start.

# get dependencies file, e.g. using curl
curl -O https://raw.githubusercontent.com/heidelbergcement/hcrystalball/master/environment.yml
# check comments in environment.yml, keep or remove dependencies as needed, then create the environment using
conda env create -f environment.yml
# activate the environment
conda activate hcrystalball
# to see the progress bar in jupyterlab, also execute
jupyter labextension install @jupyter-widgets/jupyterlab-manager
# install the library from pip
pip install hcrystalball
# or from conda
conda install -c conda-forge hcrystalball
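
After either installation route, one quick way to confirm that the package is visible to the active environment is to query the installed distribution's metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of hcrystalball.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("hcrystalball")
    if version is None:
        print("hcrystalball is not installed in this environment")
    else:
        print(f"hcrystalball {version} is installed")
```

Run it with the interpreter of the activated conda environment, otherwise it reports on a different environment.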

Development Installation

To have everything in place, including the docs build and test execution, run the following:

git clone https://github.com/heidelbergcement/hcrystalball
cd hcrystalball
conda env create -f environment.yml
conda activate hcrystalball
# ensures interactive progress bar will work in example notebooks
jupyter labextension install @jupyter-widgets/jupyterlab-manager
python setup.py develop

Example Usage

Wrappers

from hcrystalball.utils import generate_tsdata
from hcrystalball.wrappers import ProphetWrapper

X, y = generate_tsdata(n_dates=365*2)
X_train, y_train, X_test, y_test = X[:-10], y[:-10], X[-10:], y[-10:]

model = ProphetWrapper()
y_pred = model.fit(X_train, y_train).predict(X_test)
y_pred
            prophet
2018-12-22  6.066999
2018-12-23  6.050076
2018-12-24  6.105620
2018-12-25  6.141953
2018-12-26  6.150229
2018-12-27  6.163615
2018-12-28  6.147420
2018-12-29  6.048633
2018-12-30  6.031711
2018-12-31  6.087255
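
The wrapper example holds out the last 10 observations and forecasts them. To judge such a forecast, the predictions are usually compared with the held-out targets. The sketch below illustrates that idea with plain Python lists and a synthetic series instead of hcrystalball objects; `holdout_split`, `mean_absolute_error` and the naive baseline are hypothetical helpers chosen for illustration.

```python
from typing import List, Sequence, Tuple


def holdout_split(series: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series, keeping the last `horizon` points for testing."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return list(series[:-horizon]), list(series[-horizon:])


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic, purely illustrative series: 0.0, 1.0, ..., 19.0
y = [float(i) for i in range(20)]
y_train, y_test = holdout_split(y, horizon=10)

# Naive baseline: repeat the last training value over the whole horizon
y_pred = [y_train[-1]] * len(y_test)

print(mean_absolute_error(y_test, y_pred))  # 5.5
```

The split is by position rather than at random, because shuffling a time series would leak future values into the training data.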

Model Selection

from hcrystalball.utils import generate_multiple_tsdata
from hcrystalball.model_selection import ModelSelector

df = generate_multiple_tsdata(n_dates=200,
                              n_regions=1,
                              n_plants=1,
                              n_products=2,
                              )

ms = ModelSelector(horizon=10,
                   frequency="D",
                   country_code_column="Country",
                   )

ms.create_gridsearch(n_splits=2,
                     sklearn_models=True,
                     prophet_models=False,
                     exog_cols=["Raining"],
                     )

ms.select_model(df=df,
                target_col_name="Quantity",
                partition_columns=["Region", "Plant", "Product"],
                )

# Model Selector is updated with results
ms

ModelSelector
-------------
  frequency: D
  horizon: 10
  country_code_column: Country
  results: List of 2 ModelSelectorResults
  paritions: List of 2 partitions
     {'Plant': 'plant_0', 'Product': 'product_0', 'Region': 'region_0'}
     {'Plant': 'plant_0', 'Product': 'product_1', 'Region': 'region_0'}
-------------
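
The two partitions listed above correspond to the unique combinations of values in `partition_columns`. A minimal sketch of that grouping idea follows, using plain dictionaries instead of a DataFrame. The rows and the `partition_rows` helper are hypothetical, and this is not the library's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple

Row = Mapping[str, object]


def partition_rows(
    rows: Iterable[Row], partition_columns: Sequence[str]
) -> Dict[Tuple[object, ...], List[Row]]:
    """Group rows by the tuple of their values in `partition_columns`."""
    groups: Dict[Tuple[object, ...], List[Row]] = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_columns)
        groups[key].append(row)
    return dict(groups)


# Illustrative rows only; the column names follow the example above
rows = [
    {"Region": "region_0", "Plant": "plant_0", "Product": "product_0", "Quantity": 1.0},
    {"Region": "region_0", "Plant": "plant_0", "Product": "product_1", "Quantity": 2.0},
    {"Region": "region_0", "Plant": "plant_0", "Product": "product_0", "Quantity": 3.0},
]

partitions = partition_rows(rows, ["Region", "Plant", "Product"])
for key, members in partitions.items():
    print(key, len(members))
```

Each group can then be treated as an independent time series with its own model search.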

# Accessing result for 1 partition showcases rich representation
ms.results[0]

ModelSelectorResult
-------------------
  best_model_name: sklearn
  frequency: D
  horizon: 10

  country_code_column: None

  partition: {'Plant': 'plant_0', 'Product': 'product_0', 'Region': 'region_0'}
  partition_hash: 094a99e51ce41bad546788ddb8380ac1

  df_plot: DataFrame of shape (200, 6) suited for plotting cv results with .plot()
  X_train: DataFrame of shape (200, 2) with training feature values
  y_train: DataFrame of shape (200,) with training target values
  cv_results: DataFrame of shape (18, 16) with gridsearch cv info
  best_model_cv_results: Series with gridsearch cv info
  cv_data: DataFrame of shape (20, 20) with models predictions, split and true target values
  best_model_cv_data: DataFrame of shape (20, 3) with model predictions, split and true target values

  model_reprs: Dict of model_hash and model_reprs
  best_model_hash: cbc68abad45e02bec6b2de157bc8c396
  best_model: Pipeline(memory=None,
         steps=[('exog_passthrough',
                 TSColumnTransformer(n_jobs=None, remainder='drop',
                                     sparse_threshold=0.3,
                                     transformer_weights=None,
                                     transformers=[('raw_cols', 'passthrough',
                                                    ['Raining'])],
                                     verbose=False)),
                ('holiday', 'passthrough'),
                ('model',
                 Pipeline(memory=None,
                          steps=[('seasonality',
                                  SeasonalityTransformer(auto=True, freq='D',
                                                         monthly=None,
                                                         quar...
                                  SklearnWrapper(alpha=1.0,
                                                 clip_predictions_lower=None,
                                                 clip_predictions_upper=None,
                                                 copy_X=True,
                                                 fit_intercept=True,
                                                 fit_params=None, l1_ratio=0.5,
                                                 lags=14, max_iter=1000,
                                                 name='sklearn',
                                                 normalize=False,
                                                 optimize_for_horizon=False,
                                                 positive=False,
                                                 precompute=False,
                                                 random_state=None,
                                                 selection='cyclic', tol=0.0001,
                                                 warm_start=False))],
                          verbose=False))],
         verbose=False)
-------------------
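
With `n_splits=2` and `horizon=10`, the `cv_data` frame above has 20 rows, which is consistent with two validation windows of 10 observations each. The sketch below shows one common way to lay out such splits for a time series: an expanding training window followed by consecutive test windows at the end of the series. This layout is an assumption for illustration, not a description of hcrystalball's internal splitter, and `expanding_window_splits` is a hypothetical helper.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_samples: int, n_splits: int, horizon: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with test windows taken from the series end."""
    if n_splits < 1 or horizon < 1:
        raise ValueError("n_splits and horizon must be positive")
    if n_splits * horizon >= n_samples:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = n_samples - (n_splits - i) * horizon
        test_end = test_start + horizon
        # Train on everything before the test window, test on the next `horizon` points
        yield list(range(test_start)), list(range(test_start, test_end))


for train_idx, test_idx in expanding_window_splits(n_samples=200, n_splits=2, horizon=10):
    print(len(train_idx), test_idx[0], test_idx[-1])
# 180 180 189
# 190 190 199
```

Every test window lies strictly after its training window, so no model is scored on data it has already seen.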

