Time series modelling on a rolling/expanding window basis.
Project description
FOLD
(/fold/)
A Time Series Continuous Validation library that lets you build, deploy and update Composite Models easily. An order of magnitude speed-up, combined with flexibility and rigour.
- Composite Models with Continuous Validation - What does that mean?
- Distributed computing - Why is this important?
- Update deployed models (coming in May) - Why is this important?
Installation
- Prerequisites: python >= 3.7 and pip
- Install from git directly:
pip install https://github.com/dream-faster/fold/archive/main.zip
Quickstart
You can quickly train your chosen models and get predictions by running:
from sklearn.ensemble import RandomForestRegressor
from statsforecast.models import ARIMA

from fold import ExpandingWindowSplitter, train_evaluate
from fold.composites import Ensemble
from fold.transformations import OnlyPredictions
from fold.utils.dataset import get_preprocessed_dataset

# Load an example dataset with "temperature" as the target column.
X, y = get_preprocessed_dataset(
    "weather/historical_hourly_la", target_col="temperature", shorten=1000
)

# A pipeline that ensembles a tabular model with a univariate model.
pipeline = [
    Ensemble(
        [
            RandomForestRegressor(),
            ARIMA(order=(1, 1, 0)),
        ]
    ),
    OnlyPredictions(),
]

# Train and evaluate the pipeline on an expanding window.
splitter = ExpandingWindowSplitter(initial_train_window=0.2, step=0.2)
scorecard, prediction, trained_pipelines = train_evaluate(pipeline, X, y, splitter)
Thinking of using fold? We'd love to hear about your use case and help. Please book a free 30-min call with us!
(If you install krisi by running pip install krisi, you get an extended report back rather than a single metric.)
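To make the splitter arguments in the Quickstart more concrete, here is a minimal pure-Python sketch of how an expanding window could be laid out. It is not fold's implementation: the helper name `expanding_window_splits` is hypothetical, and it assumes that `initial_train_window` and `step` are fractions of the dataset length, which the Quickstart values suggest but do not confirm.

```python
from typing import Iterator, Tuple


def expanding_window_splits(
    n_samples: int, initial_train_window: float, step: float
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs for an expanding window.

    Hypothetical helper: both arguments are assumed to be fractions
    of the dataset length.
    """
    first_test_start = int(n_samples * initial_train_window)
    step_size = max(1, int(n_samples * step))
    for test_start in range(first_test_start, n_samples, step_size):
        test_end = min(test_start + step_size, n_samples)
        # The training window always starts at 0 and grows with every split.
        yield range(0, test_start), range(test_start, test_end)


splits = list(expanding_window_splits(1000, initial_train_window=0.2, step=0.2))
for train, test in splits:
    print(f"train [0, {train.stop}) -> test [{test.start}, {test.stop})")
```

With 1000 samples this yields four splits, each tested on the 200 samples that follow its training window, so every prediction is made only with data that precedes it.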
Fold is different
- Time Series Continuous Validation at lightning speed.
  → fold lets you simulate and evaluate your models as they would have performed in reality, when deployed, with clever use of parallelization and design.
- Create composite models: ensembles, hybrids, stacking pipelines, easily.
  → Underutilized, but the easiest, fastest way to increase the performance of your Time Series models.
- Built with Distributed Computing in mind.
  → Deploy your research and development pipelines to a cluster with ray, and use modin to handle out-of-memory datasets (full support for modin is coming in April).
- Bridging the gap between Online and Mini-Batch learning.
  → Mix and match xgboost with ARIMA in a single pipeline. Boost your models' accuracy by updating them on every timestamp, if desired.
- Update your deployed models, easily, as new data flows in.
  → The real world is not static. Let your models adapt, without the need to re-train from scratch.
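As an illustration of the simplest kind of composite model mentioned above, the sketch below averages the predictions of two members, timestamp by timestamp. The function `ensemble_mean` and the sample numbers are hypothetical and chosen for illustration; fold's own Ensemble may combine its members differently.

```python
from typing import List, Sequence


def ensemble_mean(member_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average the members' predictions for each timestamp.

    Hypothetical helper: plain averaging is an assumption, not
    necessarily what fold's Ensemble does.
    """
    # zip(*...) groups the predictions that belong to the same timestamp.
    return [sum(step) / len(step) for step in zip(*member_predictions)]


# Made-up predictions from two members over three timestamps.
tabular_model_preds = [1.0, 2.0, 3.0]
univariate_model_preds = [3.0, 4.0, 5.0]

combined = ensemble_mean([tabular_model_preds, univariate_model_preds])
print(combined)
```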
Examples, Walkthroughs and Blog Posts
Name | Type | Dataset Type | Docs Link | Colab |
---|---|---|---|---|
⚡️ Core Walkthrough | Walkthrough | Energy | Notebook | Colab |
🚄 Speed Comparison of Fold to other libraries | Walkthrough | Weather | Notebook | Colab |
📚 Example Collection | Example | Weather & Synthetic | Collection Link | - |
🖋️ Back to the Future with Time Series Forecasting | Blog | Public Release Blog Post | Blog post on Applied Exploration | - |
Core Features
- Supports both Regression and Classification tasks.
- Online and Mini-batch learning.
- Feature selection and other transformations on an expanding/rolling window basis.
- Use any scikit-learn/tabular model natively!
- Use any univariate or sequence models (wrappers provided in fold-wrappers).
- Use any Deep Learning Time Series models (wrappers provided in fold-wrappers).
- Super easy syntax!
- Probabilistic forecasts (currently for Classification only; full support coming in April).
- Hyperparameter optimization / Model selection (coming in early April!).
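To show what a transformation "on an expanding window basis" means in practice, here is a minimal sketch of a feature that only ever uses past observations. The helper `expanding_mean_feature` is hypothetical and is not part of fold's API; it only illustrates the idea of avoiding lookahead.

```python
from typing import List, Optional, Sequence


def expanding_mean_feature(values: Sequence[float]) -> List[Optional[float]]:
    """For each timestamp, return the mean of all strictly earlier values.

    Hypothetical helper. The first entry is None because no history
    exists yet.
    """
    features: List[Optional[float]] = []
    running_sum = 0.0
    for count, value in enumerate(values):
        # Only values seen before the current timestamp contribute.
        features.append(running_sum / count if count else None)
        running_sum += value
    return features


feature = expanding_mean_feature([2.0, 4.0, 6.0, 8.0])
print(feature)
```

Computing the mean over the full series instead would leak future information into earlier rows, which is what a window-based transformation is meant to prevent.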
What is Continuous Validation?
It's Time Series Cross-Validation, plus: inside a test window, and during deployment, fold provides a way for models to update their parameters or access the last value.
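The difference this makes can be sketched with a naive last-value forecaster, evaluated once with a value frozen at the end of training and once with access to the latest observation inside the test window. This is an illustration only: the function names and the toy series are hypothetical, and the sketch does not use fold itself.

```python
from typing import List, Sequence


def last_value_forecasts(
    y: Sequence[float], test_start: int, update_in_window: bool
) -> List[float]:
    """Predict each test timestamp with the last value the model has seen."""
    predictions = []
    last_seen = y[test_start - 1]  # last observation of the training window
    for t in range(test_start, len(y)):
        predictions.append(last_seen)
        if update_in_window:
            # The true value becomes available only after predicting it.
            last_seen = y[t]
    return predictions


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


y = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # toy series with a steady trend
test_start = 3

static_preds = last_value_forecasts(y, test_start, update_in_window=False)
updated_preds = last_value_forecasts(y, test_start, update_in_window=True)

static_mae = mean_absolute_error(y[test_start:], static_preds)
updated_mae = mean_absolute_error(y[test_start:], updated_preds)
print(static_mae, updated_mae)
```

On this toy series the frozen forecaster drifts further from the truth at every step, while the updated one stays one step behind, which mirrors how a deployed model with access to recent data would behave.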
Our Open-core Time Series Toolkit
If you want to try out the tools in the toolkit, we'd love to hear about your use case and help. Please book a free 30-min call with us!
Explore our Commercial License options here
Contribution
Submit an issue or reach out to us on info at dream-faster.ai for any inquiries.
Licence & Usage
We want to bring much-needed transparency, speed and rigour to the process of creating Time Series ML pipelines, while also building a sustainable business that can support the ecosystem in the long term.
Fold's licence is in between source-available and a traditional commercial software licence. It requires a paid licence for any commercial use after the initial 30-day trial period. Deployment is only possible with the additional purchase of fold-extended, a product currently in development.
We also want to contribute to open research by giving free access to non-commercial use of fold. If you are a researcher and would like to get access to all of Dream Faster's available tools, please contact us here.
Limitations
- No intermittent time series support, very limited support for missing values.
- No hierarchical time series support.
Hashes for fold_core-0.0.4-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2c5a9683e911cb882070c488c79db4ad75ae952b946220930137bc120a404cea |
MD5 | 9096eed74cad034ed65e928811b6e949 |
BLAKE2b-256 | fce2b238190e4874d0bc0d6a4712412e31131d9b4ac20db11fc7c0e70add8ea5 |