
A method for selecting samples by spreading the training data evenly.


Kennard Stone


What is this?

This package implements an algorithm for evenly partitioning data, wrapped in a scikit-learn-like interface. (See References for details of the algorithm.)

(Animation: a simulation of Kennard–Stone sample selection.)
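For intuition, the core of the Kennard–Stone procedure — seed with the two most distant samples, then repeatedly add the sample farthest from everything selected so far — can be sketched in a few lines of NumPy. This is an illustration of the idea, not this package's actual code:

```python
import numpy as np

def kennard_stone_select(X, n_select):
    """Illustrative sketch of Kennard-Stone selection: seed with the two
    mutually most distant samples, then repeatedly add the sample whose
    distance to its nearest selected sample is largest."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Seed with the two mutually most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = set(range(len(X))) - set(selected)
    while len(selected) < n_select:
        rem = np.array(sorted(remaining))  # sorted for deterministic ties
        # Distance from each remaining sample to its nearest selected one.
        d_min = dist[np.ix_(rem, selected)].min(axis=1)
        nxt = int(rem[np.argmax(d_min)])
        selected.append(nxt)
        remaining.remove(nxt)
    return selected
```

Note that nothing in this procedure is random: the selection depends only on the data, which is why the package needs no random_state.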

How to install

PyPI

pip install kennard-stone

Anaconda

conda install -c conda-forge kennard-stone

You need numpy, pandas and scikit-learn.

How to use

You can use these classes in the same way as their scikit-learn counterparts.

See example for details.

In the following, X denotes an arbitrary explanatory variable (features) and y an arbitrary objective variable (target). estimator denotes an arbitrary prediction model that conforms to the scikit-learn API.

train_test_split

kennard_stone

from kennard_stone import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scikit-learn

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=334)

KFold

kennard_stone

from kennard_stone import KFold

kf = KFold(n_splits=5)
for i_train, i_test in kf.split(X, y):
    X_train = X[i_train]
    y_train = y[i_train]
    X_test = X[i_test]
    y_test = y[i_test]

scikit-learn

from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=334)
for i_train, i_test in kf.split(X, y):
    X_train = X[i_train]
    y_train = y[i_train]
    X_test = X[i_test]
    y_test = y[i_test]
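There is no shuffle here because the folds follow deterministically from the data. One plausible scheme — purely an illustration, not necessarily how kennard_stone.KFold actually assigns folds — deals a precomputed Kennard–Stone ordering round-robin into folds, so every fold spans the whole ordering:

```python
def kfold_from_ordering(order, n_splits):
    """Yield (train, test) index lists, scikit-learn style, by dealing an
    ordered list of sample indices round-robin into n_splits folds.
    Hypothetical helper for illustration only."""
    folds = [list(order[k::n_splits]) for k in range(n_splits)]
    for k in range(n_splits):
        test = folds[k]
        # Train on every fold except the k-th.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

Dealing round-robin keeps each fold representative of the full range of the ordering, which is the point of using Kennard–Stone for cross-validation.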

Others

Wherever scikit-learn accepts a cv argument, you can pass a KFold object, so it works with many of scikit-learn's functions.

An example is cross_validate.

kennard_stone

from kennard_stone import KFold
from sklearn.model_selection import cross_validate

kf = KFold(n_splits=5)
print(cross_validate(estimator, X, y, cv=kf))

scikit-learn

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate

kf = KFold(n_splits=5, shuffle=True, random_state=334)
print(cross_validate(estimator, X, y, cv=kf))

OR

from sklearn.model_selection import cross_validate

print(cross_validate(estimator, X, y, cv=5))

Points to note

There is no notion of random_state or shuffle, because the partitioning is uniquely determined by the dataset. Passing these arguments does not raise an error; they simply have no effect on the result, so be careful.
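The note above can be made concrete with a tiny stand-in splitter: seed-related keyword arguments are accepted without error but change nothing, because the result is a pure function of the data (the name ks_like_split and its signature are hypothetical, for illustration only):

```python
def ks_like_split(indices, test_size=0.2, **ignored):
    """Stand-in for a deterministic, Kennard-Stone-style split.
    random_state / shuffle land in **ignored and have no effect,
    because the split is a pure function of the data."""
    n_test = round(len(indices) * test_size)
    return indices[n_test:], indices[:n_test]

idx = list(range(10))
# Identical result with or without seed-related arguments:
assert ks_like_split(idx) == ks_like_split(idx, random_state=334, shuffle=True)
```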

If you want to run the notebooks in the example directory, you will additionally need matplotlib, seaborn, tqdm, and jupyter, beyond the packages in requirements.txt.

LICENSE

MIT License

Copyright (c) 2021 yu9824

References

Papers

R. W. Kennard and L. A. Stone, "Computer Aided Design of Experiments," Technometrics, vol. 11, no. 1, pp. 137–148, 1969.

Sites
