Scalable machine learning forecasting framework with PySpark

ForecastFlowML: Scalable Machine Learning Forecasting with PySpark

ForecastFlowML is a scalable machine learning forecasting framework built on PySpark. It trains scikit-learn-like models in parallel by distributing models rather than data.

With ForecastFlowML, you can build scikit-learn-like regressors as direct multi-step forecasters and train a separate model for each group in your dataset. The package uses PySpark to handle large datasets efficiently and to distribute model training for faster results.
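To make "direct multi-step forecasting with one model per group" concrete, here is a minimal standard-library sketch. It is not ForecastFlowML's API: the helpers `fit_line`, `fit_direct`, and `forecast`, the toy `groups` data, and the thread pool standing in for Spark workers are all assumptions made for illustration. The idea it shows is that each forecast step h gets its own model, and each group gets its own independent set of models.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return y_bar - slope * x_bar, slope


def fit_direct(series, horizon):
    """Fit one model per step ahead: model h maps y[t] to y[t + h]."""
    models = {}
    for h in range(1, horizon + 1):
        xs, ys = series[:-h], series[h:]
        models[h] = fit_line(xs, ys)
    return models


def forecast(models, last_value):
    """Predict every horizon with its own model from the last observed value."""
    return [a + b * last_value for _, (a, b) in sorted(models.items())]


# Hypothetical toy data: one short series per group.
groups = {
    "store_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "store_b": [10.0, 8.0, 6.0, 4.0, 2.0, 0.0],
}

# Each group's models are independent of the others, so the fits can be
# handed to separate workers (threads here, as a local stand-in for Spark).
with ThreadPoolExecutor() as pool:
    fitted = dict(
        zip(groups, pool.map(lambda s: fit_direct(s, horizon=2), groups.values()))
    )

forecasts = {name: forecast(fitted[name], groups[name][-1]) for name in groups}
print(forecasts)
```

Because no model depends on another group's data, the work splits cleanly by group, which is the sense in which models rather than data are distributed.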

Features

ForecastFlowML provides a range of features that make it a powerful and flexible tool for time-series forecasting, including:

  • Works with pandas and PySpark DataFrames.
  • Distributed model training per group in the DataFrame.
  • Direct multi-step forecasting.
  • Built-in time-based cross-validation.
  • Extensive time-series feature engineering (lag, rolling mean/std, stockout, history length).
  • Hyperparameter tuning for each group model with grid search.
  • Supports scikit-learn-like libraries such as LightGBM or XGBoost.
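As a rough illustration of two items in this list, the sketch below computes lag and rolling-mean features and builds time-based cross-validation splits in plain Python. The function names `lag`, `rolling_mean`, and `time_splits`, and the `sales` values, are hypothetical and do not reflect the package's own interface. The expanding-window split is one common scheme, assumed here for the example.

```python
def lag(values, k):
    """Value observed k steps earlier; None where history is too short."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def rolling_mean(values, window):
    """Mean of the `window` values before position i (current value excluded)."""
    out = []
    for i in range(len(values)):
        if i >= window:
            out.append(sum(values[i - window:i]) / window)
        else:
            out.append(None)
    return out


def time_splits(n_obs, n_folds, test_size):
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Every test block lies strictly after its training data, so no future
    observation leaks into training.
    """
    for fold in range(n_folds):
        test_end = n_obs - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        yield list(range(test_start)), list(range(test_start, test_end))


# Hypothetical daily sales for a single group.
sales = [3, 5, 4, 6, 8, 7, 9, 10]

lag_1 = lag(sales, 1)
roll_2 = rolling_mean(sales, 2)
splits = list(time_splits(len(sales), n_folds=2, test_size=2))

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Excluding the current value from the rolling window keeps the feature usable at prediction time, when the target for that row is not yet known.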

Whether you're new to time-series forecasting or an experienced data scientist, ForecastFlowML can help you build and deploy accurate forecasting models at scale.

Documentation

See our latest documentation here.

Get Started

What is ForecastFlowML?

Quick Start

User Guide

Feature Engineering

Time Series Cross Validation

Grid Search

Feature Importance

Save/Load ForecastFlowML

Benchmarks

Kaggle Walmart M5 Forecasting Competition

  • Ranks 18th as a late submission, achieved with minimal effort.

Installation

ForecastFlowML installation

You can install the package using the following command:

pip install forecastflowml

Check Java

Make sure Java 11 is installed. You can check which version, if any, is on your PATH with the following command:

java -version
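If you would rather run this check from Python, for example at the top of a setup script, a small helper along these lines could work. It is a sketch, not part of ForecastFlowML: `parse_java_major` and `installed_java_major` are hypothetical names, and the parsing assumes the usual `version "..."` banner that `java -version` prints.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_371").
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version`; return the major version, or None if Java is absent."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The version banner is normally written to stderr, so read both streams.
    return parse_java_major(result.stderr + result.stdout)
```

Calling `installed_java_major()` and comparing the result with 11 gives a quick pass/fail check. A result of None means Java was not found on the PATH or the banner could not be parsed.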

Set PYSPARK_PYTHON

In your Python script, set the PYSPARK_PYTHON environment variable to the path of your Python executable before creating the Spark session:

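import os
import sys

from pyspark.sql import SparkSession

# Point Spark workers at the same interpreter that runs this script.
os.environ["PYSPARK_PYTHON"] = sys.executable
# Start a local session that uses all available cores.
spark = SparkSession.builder.master("local[*]").getOrCreate()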
