Skip to main content

Time series cross-validation

Project description

TSCV: Time Series Cross-Validation

This repository is a scikit-learn extension for time series cross-validation. It introduces gaps between the training set and the test set, which mitigates the temporal dependence of time series and prevents information leakage.

Installation

pip install tscv

Usage

This extension defines 3 cross-validator classes and 1 function:

  • GapLeavePOut
  • GapKFold
  • GapRollForward
  • gap_train_test_split

The three classes can all be passed, as the cv argument, to scikit-learn functions such as cross-validate, cross_val_score, and cross_val_predict, just like the native cross-validator classes.

The one function is an alternative to the train_test_split function in scikit-learn.

Examples

The following example uses GapKFold instead of KFold as the cross-validator.

import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import cross_val_score
from tscv import GapKFold

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# use GapKFold as the cross-validator
cv = GapKFold(n_splits=5, gap_before=5, gap_after=5)
scores = cross_val_score(clf, iris.data, iris.target, cv=cv)

The following example uses gap_train_test_split to split the data set into the training set and the test set.

import numpy as np
from tscv import gap_train_test_split

X, y = np.arange(20).reshape((10, 2)), np.arange(10)
X_train, X_test, y_train, y_test = gap_train_test_split(X, y, test_size=2, gap_size=2)

Contributing

  • Report bugs in the issue tracker
  • Express your use cases in the issue tracker

Support

Acknowledgments

  • I would like to thank Jeffrey Racine and Christoph Bergmeir for the helpful discussion.

License

BSD-3-Clause

Citation

@article{zheng2019hv,
  title={$ hv $-Block Cross Validation is not a BIBD: a Note on the Paper by Jeff Racine (2000)},
  author={Zheng, Wenjie},
  journal={arXiv preprint arXiv:1910.08904},
  year={2019}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tscv-0.1.0.tar.gz (12.2 kB view details)

Uploaded Source

Built Distribution

tscv-0.1.0-py3-none-any.whl (18.2 kB view details)

Uploaded Python 3

File details

Details for the file tscv-0.1.0.tar.gz.

File metadata

  • Download URL: tscv-0.1.0.tar.gz
  • Upload date:
  • Size: 12.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0.post20200714 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for tscv-0.1.0.tar.gz
Algorithm Hash digest
SHA256 91e4b2e8972d2b1552908cefc60f9c358128433538a519beab92ff3a5b7eaa9a
MD5 80004a9be9892bdd8b5d41f74df76c0c
BLAKE2b-256 ef6fafaea3e954077420dcbc13ee7ac192d6072e7209820e1004d9b017d9c50f

See more details on using hashes here.

File details

Details for the file tscv-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tscv-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0.post20200714 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for tscv-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d837c3380e464f6b7f7fcc9884d5d1e6c0d76796b6ac46320b44d35b8dd34bc7
MD5 03901f39b609a5ac8cb5a716b84d6a54
BLAKE2b-256 dc29d7f642306d56a942498a08956071f88a0daa2ca8ca3b7911dec3dbb8adb1

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page