Kennard Stone
A method for selecting samples by spreading the training data evenly.
What is this?
This is an algorithm for evenly partitioning data, exposed through a scikit-learn-like interface. (See References for details of the algorithm.)
How to install
pip install kennard-stone
You need numpy, pandas, and scikit-learn.
How to use
You can use it just like scikit-learn. See the example directory for details.
In the following, X denotes an arbitrary explanatory variable, y an arbitrary objective variable, and estimator an arbitrary prediction model that conforms to the scikit-learn API.
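As a concrete, hypothetical setup for the snippets below (any array-like X/y and any scikit-learn-compatible model would work equally well), you might use scikit-learn's diabetes dataset and a linear model:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Placeholder data and model for the examples that follow.
X, y = load_diabetes(return_X_y=True)  # X: (442, 10), y: (442,)
estimator = LinearRegression()
```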
train_test_split
kennard_stone
from kennard_stone import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scikit-learn
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=334)
KFold
kennard_stone
from kennard_stone import KFold
kf = KFold(n_splits=5)
for i_train, i_test in kf.split(X, y):
    X_train = X[i_train]
    y_train = y[i_train]
    X_test = X[i_test]
    y_test = y[i_test]
scikit-learn
from sklearn.model_selection import KFold
kf = KFold(n_splits=5, shuffle=True, random_state=334)
for i_train, i_test in kf.split(X, y):
    X_train = X[i_train]
    y_train = y[i_train]
    X_test = X[i_test]
    y_test = y[i_test]
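Either KFold can drive a manual cross-validation loop in this way. A minimal, self-contained sketch using scikit-learn's KFold (the toy data and model here are illustrative, not part of the library):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Toy data: y is an exact linear function of X.
X = np.random.RandomState(0).rand(20, 3)
y = X.sum(axis=1)

kf = KFold(n_splits=5, shuffle=True, random_state=334)
scores = []
for i_train, i_test in kf.split(X, y):
    # Fit on the training fold, score (R^2) on the held-out fold.
    model = LinearRegression().fit(X[i_train], y[i_train])
    scores.append(model.score(X[i_test], y[i_test]))

print(sum(scores) / len(scores))
```

The same loop body works unchanged with kennard_stone's KFold, since it yields index arrays in the same way.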
Others
Wherever scikit-learn accepts a cv argument, you can pass a KFold object, so it works with various functions. An example is cross_validate.
kennard_stone
from kennard_stone import KFold
from sklearn.model_selection import cross_validate
kf = KFold(n_splits=5)
print(cross_validate(estimator, X, y, cv=kf))
scikit-learn
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
kf = KFold(n_splits=5, shuffle=True, random_state=334)
print(cross_validate(estimator, X, y, cv=kf))
OR
from sklearn.model_selection import cross_validate
print(cross_validate(estimator, X, y, cv=5))
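As a complete, runnable variant of the scikit-learn form above (the diabetes dataset and Ridge estimator are placeholders standing in for your own X, y, and estimator):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate

X, y = load_diabetes(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=334)

# cross_validate returns a dict with per-fold fit/score times and test scores.
result = cross_validate(Ridge(), X, y, cv=kf)
print(result["test_score"])
```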
Points to note
There is no notion of random_state or shuffle, because the partitioning is determined uniquely by the dataset. Passing these arguments does not raise an error; they simply have no effect on the result, so be careful.
If you want to run the notebook in the example directory, you will additionally need matplotlib, seaborn, tqdm, and jupyter, on top of the packages in requirements.txt.
LICENSE
MIT License
Copyright (c) 2021 yu9824
References
Papers
- R. W. Kennard & L. A. Stone (1969) Computer Aided Design of Experiments, Technometrics, 11:1, 137-148, DOI: 10.1080/00401706.1969.10490666