Scalable machine learning forecasting framework with PySpark
ForecastFlowML: Scalable Machine Learning Forecasting with PySpark
ForecastFlowML is a scalable machine learning forecasting framework built on PySpark. It trains scikit-learn-like models in parallel by distributing models, rather than data, across the cluster.
With ForecastFlowML, you can build scikit-learn-like regressors as direct multi-step forecasters and train a separate model for each group in your dataset. The package uses PySpark to handle large datasets efficiently and to distribute model training across workers for faster results.
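The two ideas above, one model per group and direct multi-step forecasting, can be illustrated with a minimal sketch. It is not ForecastFlowML's API: the function names, the group names, and the use of a plain least-squares model on two lag features are all assumptions made for illustration. A direct forecaster fits one model per horizon step instead of feeding its own predictions back in, and the per-group loop at the bottom stands in for the work that would be distributed across workers.

```python
import numpy as np


def make_lag_matrix(y, n_lags, horizon):
    """Pair the last `n_lags` values at time t with the target y[t + horizon]."""
    rows, targets = [], []
    for t in range(n_lags - 1, len(y) - horizon):
        rows.append(y[t - n_lags + 1 : t + 1])
        targets.append(y[t + horizon])
    return np.array(rows), np.array(targets)


def fit_direct_models(y, n_lags, max_horizon):
    """Fit one least-squares model per horizon step (direct strategy).

    No intercept is used, to keep the sketch short.
    """
    models = {}
    for h in range(1, max_horizon + 1):
        X, target = make_lag_matrix(y, n_lags, h)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        models[h] = coef
    return models


def forecast(models, y, n_lags):
    """Predict every horizon from the same, most recent lag window."""
    last_window = y[-n_lags:]
    return np.array([last_window @ models[h] for h in sorted(models)])


# Hypothetical groups: each one gets its own independent set of models.
series_by_group = {
    "store_a": np.arange(20, dtype=float),     # 0, 1, ..., 19
    "store_b": 100.0 - 2.0 * np.arange(20),    # 100, 98, ..., 62
}

forecasts = {}
for group, y in series_by_group.items():
    models = fit_direct_models(y, n_lags=2, max_horizon=3)
    forecasts[group] = forecast(models, y, n_lags=2)
```

Because each group's models depend only on that group's series, the loop body has no shared state, which is what makes this kind of training straightforward to run in parallel.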
Features
ForecastFlowML provides a range of features that make it a powerful and flexible tool for time-series forecasting, including:
- Works with pandas and PySpark DataFrames.
- Distributed model training per group in the dataframe.
- Direct multi-step forecasting.
- Built-in time-based cross-validation.
- Extensive time-series feature engineering (lag, rolling mean/std, stockout, history length).
- Hyperparameter tuning for each group model with grid search.
- Supports scikit-learn-like libraries such as LightGBM or XGBoost.
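As an illustration of what time-based cross-validation means, here is a small sketch of an expanding-window splitter. It is not the package's built-in implementation; the function name, its parameters, and the expanding-window scheme are assumptions. The point it shows is that every test window lies strictly after its training window, so no future observations leak into training.

```python
def time_based_splits(n_samples, n_splits, horizon):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each test window holds `horizon` consecutive time steps, and the
    training window is everything before it (expanding window).
    """
    for i in range(n_splits, 0, -1):
        test_end = n_samples - (i - 1) * horizon
        test_start = test_end - horizon
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        yield list(range(0, test_start)), list(range(test_start, test_end))


splits = list(time_based_splits(n_samples=10, n_splits=2, horizon=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Setting `horizon` to the number of steps you plan to forecast keeps each validation fold comparable to the real forecasting task.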
Whether you're new to time-series forecasting or an experienced data scientist, ForecastFlowML can help you build and deploy accurate forecasting models at scale.
Documentation
See the project's latest documentation.
Get Started
User Guide
Benchmarks
Kaggle Walmart M5 Forecasting Competition
- Ranks 18th as a late submission, with minimal effort.
Installation
ForecastFlowML installation
You can install the package using the following command:
pip install forecastflowml
Check Java
Make sure Java 11 is installed. You can check which version, if any, is available with the following command:
java -version
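If you would rather run this check from Python, for example at the top of a setup script, the sketch below may help. The helper names are hypothetical and not part of ForecastFlowML. It relies on two facts about the `java` launcher: the version banner is written to stderr, and Java 8 and earlier report themselves as `1.x`.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier use the "1.x" scheme, for example "1.8.0_292".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Return the major version of the `java` on PATH, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    # The banner normally goes to stderr; fall back to stdout just in case.
    return parse_java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    print("Detected Java major version:", installed_java_major())
```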
Set PYSPARK_PYTHON
In your Python script, set the PYSPARK_PYTHON environment variable to the path of your Python executable before creating the Spark session:
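import os
import sys

from pyspark.sql import SparkSession

# Make Spark workers use the same Python interpreter as this script.
os.environ["PYSPARK_PYTHON"] = sys.executable

# Start a local Spark session that uses all available cores.
spark = SparkSession.builder.master("local[*]").getOrCreate()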