Time based cross validation

timebasedcv is a Python codebase that provides a cross validation strategy based on time.


Documentation: https://fbruzzesi.github.io/timebasedcv

Source Code: https://github.com/fbruzzesi/timebasedcv


Alpha Notice

This codebase is experimental and works for my use cases. It very likely does not cover every case, and it may break (badly) on those it misses. If you find one, please open an issue on the repo's issue page.

Description

The current scikit-learn TimeSeriesSplit implementation splits by number of samples, so it lacks the flexibility to handle multiple samples within the same time period/unit.

This codebase addresses this problem by providing a cross validation strategy based on time periods rather than on the number of samples. This is useful when the data is time dependent and the model should be trained on past data and tested on future data, regardless of how many observations fall within a given time period.
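To make the difference concrete, here is a minimal sketch using only the standard library. It does not use timebasedcv or scikit-learn; the dates, `n_train` and `cutoff` are made-up values for illustration. It shows that cutting by sample count can place observations from the same day on both sides of the split, while cutting by time cannot.

```python
from datetime import date

# Toy data (made up): several observations can share the same day.
days = [
    date(2023, 1, 1), date(2023, 1, 1), date(2023, 1, 1),
    date(2023, 1, 2),
    date(2023, 1, 3), date(2023, 1, 3),
]

# Count-based split: the first n_train samples go to train, the rest to test.
n_train = 2
count_train, count_test = days[:n_train], days[n_train:]
# Days that appear in both train and test (a leak across the split).
count_shared = set(count_train) & set(count_test)

# Time-based split: everything strictly before the cutoff day goes to train.
cutoff = date(2023, 1, 2)
time_train = [d for d in days if d < cutoff]
time_test = [d for d in days if d >= cutoff]
time_shared = set(time_train) & set(time_test)

print("count-based shared days:", sorted(count_shared))
print("time-based shared days:", sorted(time_shared))
```

With the count-based cut, 1 January ends up in both sets; with the time-based cut, each day belongs to exactly one set, however many observations it contains.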

We introduce two main classes:

  • TimeBasedSplit: a class that defines a time based split with a given frequency, train size, test size, gap, stride and window type. Its core method, split, requires a time series as input in order to create the boolean masks for train and test from the instance settings above. Therefore it is not compatible with scikit-learn CV Splitters.
  • TimeBasedCVSplitter: a class that conforms to the scikit-learn CV Splitter interface but requires the time series to be passed to the instance. That is because a CV Splitter needs to know the number of splits a priori, and its split method shouldn't take any arguments other than the arrays to split.
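The idea behind the parameters listed above can be sketched in plain Python. This is not the library's implementation or API: `time_based_masks` is a hypothetical helper, the frequency is assumed to be fixed to days, and the stopping rule (stop once the test window would start after the last observation) is an assumption made for the example.

```python
from datetime import date, timedelta
from typing import Iterator, List, Sequence, Tuple


def time_based_masks(
    times: Sequence[date],
    train_size: int,
    test_size: int,
    gap: int = 0,
    stride: int = 1,
    window: str = "rolling",
) -> Iterator[Tuple[List[bool], List[bool]]]:
    """Yield (train_mask, test_mask) pairs; all sizes are in days.

    Hypothetical helper for illustration, not part of timebasedcv.
    """
    if window not in ("rolling", "expanding"):
        raise ValueError("window must be 'rolling' or 'expanding'")
    if min(train_size, test_size, stride) < 1 or gap < 0:
        raise ValueError("sizes and stride must be >= 1, gap must be >= 0")

    one_day = timedelta(days=1)
    start, end = min(times), max(times)
    train_end = start + train_size * one_day  # exclusive upper bound

    # Assumed stopping rule: stop once the test window starts after the data ends.
    while train_end + gap * one_day <= end:
        test_start = train_end + gap * one_day
        test_end = test_start + test_size * one_day  # exclusive upper bound
        # Rolling keeps a fixed-length train window; expanding grows from the start.
        if window == "expanding":
            train_start = start
        else:
            train_start = train_end - train_size * one_day
        train_mask = [train_start <= t < train_end for t in times]
        test_mask = [test_start <= t < test_end for t in times]
        yield train_mask, test_mask
        train_end += stride * one_day


# Made-up series: five consecutive days with 3, 1, 2, 1 and 2 observations.
times = (
    [date(2023, 1, 1)] * 3
    + [date(2023, 1, 2)]
    + [date(2023, 1, 3)] * 2
    + [date(2023, 1, 4)]
    + [date(2023, 1, 5)] * 2
)

splits = list(time_based_masks(times, train_size=2, test_size=1))
for train_mask, test_mask in splits:
    print(sum(train_mask), "train samples,", sum(test_mask), "test samples")
```

Each split covers the same number of days, while the number of samples in train and test varies with how many observations fall in each day. Because the masks depend on the time series, the number of splits is only known once the series is available, which is why a scikit-learn style splitter needs the series at construction time.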

Installation

timebasedcv is not published as a Python package on PyPI, so it cannot be installed with pip directly.

However, it can be installed from source, either with pip and git or from a local clone:

source/git

python -m pip install git+https://github.com/FBruzzesi/timebasedcv.git

local clone

git clone https://github.com/FBruzzesi/timebasedcv.git
cd timebasedcv
python -m pip install .

Getting started

Please refer to the Getting Started section of the documentation site for a detailed guide on how to use the library.

Contributing

Please read the Contributing guidelines in the documentation site.

License

The project is released under the MIT License.

Download files

Source Distribution

timebasedcv-0.0.1.tar.gz (14.1 kB)

Uploaded Source

Built Distribution

timebasedcv-0.0.1-py2.py3-none-any.whl (14.0 kB)

Uploaded Python 2 Python 3

File details

Details for the file timebasedcv-0.0.1.tar.gz.

File metadata

  • Download URL: timebasedcv-0.0.1.tar.gz
  • Upload date:
  • Size: 14.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/37.3 requests/2.28.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/6.6.0 keyring/23.13.1 rfc3986/1.5.0 colorama/0.4.6 CPython/3.10.10

File hashes

Hashes for timebasedcv-0.0.1.tar.gz
Algorithm Hash digest
SHA256 b796a4ade7cf274a364c0be88fb97d1df6087e7c376478e724f50013fb03fd62
MD5 bd18bf494c8f3b90d861606865c91c8d
BLAKE2b-256 52ac974a665bf927acc2959c310e1c6a618263bc61dd04c6b72550dd665e4475

File details

Details for the file timebasedcv-0.0.1-py2.py3-none-any.whl.

File metadata

  • Download URL: timebasedcv-0.0.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 14.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/37.3 requests/2.28.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/6.6.0 keyring/23.13.1 rfc3986/1.5.0 colorama/0.4.6 CPython/3.10.10

File hashes

Hashes for timebasedcv-0.0.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 a95dd22b64f5114b9b523c12b306f50c01ed8dbf5c320d00a0f780c79c95e04c
MD5 f438f1bef5b4cde3354ca7f85bcd2471
BLAKE2b-256 d85e3dc492f7d3791a84168a99d4b14bd601eab144c0385e7baa909417722491
