python-fold

Time-series cross-validation on steroids. Multi-purpose. Fully compatible with scikit-learn.

Project description

Table of Contents

About The Project
Installation
Getting Started
Contributing
License

About The Project

Introduction

fold is a time-series cross-validation library designed to work with scikit-learn. For financial forecasting, weather prediction, or any other domain built on time-series data, it offers cross-validation techniques that go beyond traditional methods. Integrating fold into your workflow helps guard your models against common pitfalls such as look-ahead bias and data leakage.

Key Features:

Versatile Splits: Support for expanding, rolling, interval, and custom function splits.
Scikit-learn Compatibility: Easily integrate with scikit-learn’s model selection framework.
Advanced Cross-Validation Techniques: Includes purged k-fold, purged walk-forward, and temporally safe random splits.
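
To make the difference between expanding and rolling splits concrete, here is a minimal sketch. It deliberately uses scikit-learn's own `TimeSeriesSplit` rather than fold's API, which is not shown in this description; the toy data and window sizes are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered observations (row 0 is the oldest).
X = np.arange(12).reshape(-1, 1)

# Expanding window: the training set grows with every split.
expanding = TimeSeriesSplit(n_splits=3, test_size=2)
expanding_splits = [
    (train.tolist(), test.tolist()) for train, test in expanding.split(X)
]

# Rolling window: max_train_size caps training at the 4 most recent rows.
rolling = TimeSeriesSplit(n_splits=3, test_size=2, max_train_size=4)
rolling_splits = [
    (train.tolist(), test.tolist()) for train, test in rolling.split(X)
]

for train, test in expanding_splits:
    print("expanding train:", train, "test:", test)
for train, test in rolling_splits:
    print("rolling   train:", train, "test:", test)
```

In both schemes every test block lies strictly after its training block; the only difference is whether older observations are kept or dropped as the window moves forward.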

Motivation

Traditional cross-validation methods often fall short on time-series data because observations are temporally dependent. Naive random splits can introduce look-ahead bias, where future data points inadvertently influence predictions about the past, leading to overly optimistic performance estimates. This is where fold steps in, providing a framework built for the specific challenges of time-series data.
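
The look-ahead problem described above can be checked directly. The sketch below uses only scikit-learn splitters (not fold's API), and `has_look_ahead` is a hypothetical helper introduced here for illustration: it flags a fold whenever some training row is later in time than the earliest test row.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Twenty time-ordered observations (row 0 is the oldest).
X = np.arange(20).reshape(-1, 1)


def has_look_ahead(train_idx, test_idx):
    """Return True when any training row is later than the earliest test row."""
    return bool(train_idx.max() > test_idx.min())


shuffled = KFold(n_splits=4, shuffle=True, random_state=0)
ordered = TimeSeriesSplit(n_splits=4)

# Count the folds in which the model would be trained on "future" rows.
leaky_folds = sum(has_look_ahead(tr, te) for tr, te in shuffled.split(X))
clean_folds = sum(has_look_ahead(tr, te) for tr, te in ordered.split(X))

print("shuffled k-fold folds with look-ahead:", leaky_folds)
print("time-ordered folds with look-ahead:", clean_folds)
```

With shuffled k-fold, at most one fold can consist solely of the latest rows, so nearly every fold trains on data from after its test period; the time-ordered splitter never does.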

Why We Built Fold?

Accuracy and Reliability: Ensuring that models are tested in a way that closely mimics real-world scenarios is crucial for accurate predictions. fold's advanced splitting techniques prevent data leakage and provide a more realistic evaluation of model performance.
Flexibility: Different time-series problems require different cross-validation strategies. fold offers a variety of splitting methods, from simple expanding and rolling splits to more complex strategies like purged k-fold and custom function splits, allowing users to tailor their validation process to their specific needs.
Ease of Use: Designed to be fully compatible with scikit-learn, fold integrates seamlessly into existing workflows, making it easy for users to switch to more advanced time-series cross-validation methods without a steep learning curve.
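
As a rough illustration of the purged k-fold idea and of how a custom splitter plugs into scikit-learn, consider the sketch below. `purged_kfold_indices` is a hypothetical, simplified helper written for this example and is not fold's implementation: it removes a fixed number of rows on each side of every test block from the training set. The synthetic data and the `purge=2` setting are assumptions as well.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def purged_kfold_indices(n_samples, n_splits, purge):
    """Yield (train, test) index arrays for contiguous test blocks.

    Rows within `purge` positions of the test block are excluded from
    training, so near-neighbours of the test period cannot leak into it.
    """
    indices = np.arange(n_samples)
    for test in np.array_split(indices, n_splits):
        lo = max(test[0] - purge, 0)
        hi = min(test[-1] + purge + 1, n_samples)
        train = np.concatenate([indices[:lo], indices[hi:]])
        yield train, test


# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

# scikit-learn accepts any iterable of (train, test) index pairs as `cv`.
splits = list(purged_kfold_indices(len(X), n_splits=5, purge=2))
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=splits)
print("number of folds scored:", len(scores))
```

Passing precomputed index pairs through the `cv` argument is the same mechanism that lets any scikit-learn-compatible splitter be reused with `cross_val_score` and related tools.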

python-fold 0.1.0

About The Project

Introduction

Key Features:

Motivation

Why We Built Fold?

Built With

Installation

Getting Started

Scikit-Learn Integration

Example Model

Computing Cross-Validated Metrics

Cross Validation Iterators

Scikit-Learn models

Bespoke Models

Expanding Number Split

Expanding Split

Rolling Number Split

Rolling Split

Rolling Optimized Split

Interval Split

Calendar Split

Period Split

Grouper Split

Custom Function Split

Random Split

Purged KFold

Purged Walk-Forward

Train Test Split

Create a 2-d dataset

Select model

Get a pandas DataFrame object

Last valid split number

Train and test-set

Apply transform

Contributing

License
