A method for selecting samples by spreading the training data evenly.

Project description

Kennard Stone

What is this?

This package implements the Kennard-Stone algorithm for partitioning data evenly, exposed through a scikit-learn-like interface. (See References for details of the algorithm.)
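
To make the idea of "even" partitioning concrete, here is a minimal sketch of the classical Kennard-Stone selection order from the 1969 paper: start with the two points that are farthest apart, then repeatedly add the point whose nearest already-selected point is farthest away. The function name `kennard_stone_order` is hypothetical and is not part of this package; the package's own implementation may differ in details such as how the first points are chosen or how ties are broken. Only the standard library is used.

```python
import math


def kennard_stone_order(points):
    """Return indices of `points` in classical Kennard-Stone selection order.

    Illustrative sketch only; not the kennard_stone package's implementation.
    """
    n = len(points)
    if n < 2:
        return list(range(n))

    # Step 1: start with the pair of points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]

    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [
        min(math.dist(p, points[first]), math.dist(p, points[second]))
        for p in points
    ]

    # Step 2: repeatedly add the unselected point farthest from the selected set.
    while len(selected) < n:
        remaining = [i for i in range(n) if i not in selected]
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        # The newly selected point may now be the nearest one for other points.
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))

    return selected


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0), (10.0, 0.0)]
order = kennard_stone_order(points)
print(order)  # [0, 4, 3, 2, 1]
```

The two extremes (indices 0 and 4) come first, and each later pick fills the largest remaining gap, which is why points selected early cover the data range evenly.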

How to install

pip install kennard-stone

This package requires numpy, pandas, and scikit-learn.

How to use

The functions and classes are used in the same way as their scikit-learn counterparts.

See example for details.

In the following, X denotes an arbitrary explanatory variable and y an arbitrary objective variable. estimator denotes an arbitrary prediction model that conforms to the scikit-learn API.

train_test_split

kennard_stone

from kennard_stone import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scikit-learn

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=334)

KFold

kennard_stone

from kennard_stone import KFold

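kf = KFold(n_splits=5)
# Integer-array indexing below assumes X and y are NumPy arrays;
# for pandas objects, use X.iloc[i_train] and so on.
for i_train, i_test in kf.split(X, y):
    X_train = X[i_train]
    y_train = y[i_train]
    X_test = X[i_test]
    y_test = y[i_test]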

scikit-learn

from sklearn.model_selection import KFold

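kf = KFold(n_splits=5, shuffle=True, random_state=334)
# Integer-array indexing below assumes X and y are NumPy arrays;
# for pandas objects, use X.iloc[i_train] and so on.
for i_train, i_test in kf.split(X, y):
    X_train = X[i_train]
    y_train = y[i_train]
    X_test = X[i_test]
    y_test = y[i_test]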

Others

Wherever scikit-learn accepts a cv argument, you can pass a KFold object from this package, so it works with a variety of functions.

An example is cross_validate.

kennard_stone

from kennard_stone import KFold
from sklearn.model_selection import cross_validate

kf = KFold(n_splits=5)
print(cross_validate(estimator, X, y, cv=kf))

scikit-learn

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate

kf = KFold(n_splits=5, shuffle=True, random_state=334)
print(cross_validate(estimator, X, y, cv=kf))

or

from sklearn.model_selection import cross_validate

print(cross_validate(estimator, X, y, cv=5))

Points to note

There is no notion of random_state or shuffle, because the partition is uniquely determined by the dataset.
Passing them as arguments does not raise an error, but they have no effect, so be careful.
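
The following sketch illustrates why a distance-based split leaves nothing for random_state or shuffle to control. It uses a hypothetical helper, `maxmin_split`, written for this illustration only (it is not this package's API or code), and assumes distinct points with no ties in distance. The rows are shuffled with several seeds before splitting, yet the same points always end up in the training and test sets.

```python
import math
import random


def maxmin_split(points, n_train):
    """Hypothetical helper: choose n_train (>= 2) points by max-min distance.

    Returns (train, test). Assumes all points are distinct.
    """
    # Seed the training set with the pair of points that are farthest apart.
    a, b = max(
        ((p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        key=lambda pair: math.dist(*pair),
    )
    train = [a, b]
    rest = [p for p in points if p not in train]
    while len(train) < n_train:
        # Add the remaining point whose nearest training point is farthest away.
        nxt = max(rest, key=lambda p: min(math.dist(p, t) for t in train))
        train.append(nxt)
        rest.remove(nxt)
    return train, rest


data = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0), (10.0, 0.0)]

# Shuffle the rows with different seeds; the resulting split never changes.
results = set()
for seed in range(5):
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    train, test = maxmin_split(shuffled, n_train=3)
    results.add((frozenset(train), frozenset(test)))

print(len(results))  # 1
```

Because the selection depends only on the distances between points, row order and seeds do not change which samples are chosen, provided there are no ties.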

References

Papers

R. W. Kennard & L. A. Stone (1969) Computer Aided Design of Experiments, Technometrics, 11:1, 137-148, DOI: 10.1080/00401706.1969.10490666

Sites

https://datachemeng.com/trainingtestdivision/

Project details

Release history

2.2.1 (Dec 4, 2023)
2.2.0 (Dec 3, 2023)
2.1.6 (Oct 21, 2023)
2.1.5 (Sep 16, 2023)
2.1.3 (Sep 10, 2023)
2.1.2 (Aug 23, 2023)
2.1.1 (May 6, 2023)
2.1.0 (May 1, 2023)
2.0.1 (Apr 25, 2023)
2.0.0 (Apr 22, 2023)
1.1.2 (May 18, 2022)
1.1.1 (Apr 23, 2022)
1.1.0 (Aug 11, 2021)
1.0.4 (Jul 15, 2021), this version
1.0.3 (Jul 10, 2021)
1.0.2 (May 12, 2021)
1.0.1 (May 3, 2021)
1.0.0 (Apr 27, 2021)

Download files

Source Distribution

kennard_stone-1.0.4.tar.gz (7.9 kB)

Uploaded Jul 15, 2021 Source

Built Distribution

kennard_stone-1.0.4-py3-none-any.whl (6.5 kB)

Uploaded Jul 15, 2021 Python 3

Hashes for kennard_stone-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`e8f73ba217cc6b9c93ee1fdcbea5a162ca940bd42f86bc517ac91def2f466a8f`
MD5	`caba44ed14326aae76aac8e255c83c99`
BLAKE2b-256	`84bdaaa2a60cb2779e0078cb32cc92ed986a1102ab7012da1ea3a46a530a2599`

Hashes for kennard_stone-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4ba748476c79c534a2d1c457e2bf9d9aeaec58655f7c7336817c130241c5525b`
MD5	`10104b1882e1c73b07ac3e39e92c4246`
BLAKE2b-256	`78c45ea8c558bcfb78f1a7576047ecbcc028deae38b9349d02e3d21b2900a8c4`