Kolmogorov-Smirnov metric for machine learning
Project description
The Kolmogorov-Smirnov metric (KS metric) is derived from the K-S test, which measures the maximum vertical distance between two cumulative distribution functions (CDFs). To use it as a metric for a binary classification problem, we compare the CDFs of the model's predicted scores for the target and non-target classes and take the largest gap between them. The model that produces the greatest separation between the target and non-target distributions is considered the better model.
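The idea can be illustrated with a minimal NumPy sketch. It assumes binary labels coded 0 (non-target) and 1 (target), with higher scores meaning "more likely target". The helper name `ks_statistic` is hypothetical and is not part of ks_metric; the package's own `ks_score` may compute or scale the value differently.

```python
import numpy as np


# Hypothetical helper for illustration only; not part of the ks_metric package.
def ks_statistic(y_true, y_score):
    """Maximum vertical distance between the empirical CDFs of the scores
    of target (y == 1) and non-target (y == 0) observations."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = np.sort(y_score[y_true == 1])
    neg = np.sort(y_score[y_true == 0])
    # Both empirical CDFs are step functions that only jump at observed scores,
    # so evaluating them at every observed score is enough to find the maximum gap.
    thresholds = np.sort(y_score)
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


y = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
print(ks_statistic(y, scores))  # 1.0: the two classes are perfectly separated
```

The statistic ranges from 0 (identical score distributions) to 1 (no overlap at all).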
Installation
The package requires pandas and numpy. The usage examples below also use scikit-learn.
To install the package from PyPI, execute:
$ pip install ks_metric
or, from a source checkout:
$ python setup.py install
Usage
To get the KS score:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from ks_metric import ks_score
data = load_breast_cancer()
X, y = data['data'], data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
clf = LogisticRegression(random_state=0, max_iter=10000).fit(X_train, y_train)
# KS score from the predicted probability of the positive class (column 1)
ks_score(y_train, clf.predict_proba(X_train)[:, 1])
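As an independent sanity check, the same two-sample statistic can be computed with SciPy's `ks_2samp` by splitting the predicted probabilities by class. This sketch assumes SciPy is installed (it is not a dependency of ks_metric) and does not call `ks_score` itself; whether `ks_score` reports a fraction or a percentage is not stated here, so compare accordingly.

```python
# Assumes scipy and scikit-learn are installed; scipy is not a ks_metric dependency.
from scipy.stats import ks_2samp
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data["data"], data["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
clf = LogisticRegression(random_state=0, max_iter=10000).fit(X_train, y_train)

# Predicted probability of class 1 on the held-out test set.
proba = clf.predict_proba(X_test)[:, 1]
# Two-sample KS: maximum distance between the score CDFs of the two classes.
result = ks_2samp(proba[y_test == 1], proba[y_test == 0])
print(result.statistic)
```

Scoring on `X_test` rather than `X_train` gives a less optimistic estimate of separation, because the model was fit on the training data.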
To get the KS table:
from ks_metric import ks_table
ks_table(y_train, clf.predict_proba(X_train)[:, 1])
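The columns returned by `ks_table` are not described here. For orientation only, a common way to present KS is a decile table: sort observations by descending score, split them into equal-sized bins, and track the cumulative share of targets and non-targets captured. The helper `decile_ks_table` below is hypothetical and is not the package's implementation.

```python
import numpy as np
import pandas as pd


# Hypothetical decile KS table for illustration; not ks_metric's ks_table.
def decile_ks_table(y_true, y_score, n_bins=10):
    df = pd.DataFrame({"target": np.asarray(y_true),
                       "score": np.asarray(y_score, dtype=float)})
    # Rank by descending score so bin 1 holds the highest-scored observations.
    df = df.sort_values("score", ascending=False).reset_index(drop=True)
    df["bin"] = pd.qcut(np.arange(len(df)), n_bins, labels=False) + 1
    table = df.groupby("bin").agg(
        min_score=("score", "min"),
        max_score=("score", "max"),
        events=("target", "sum"),
        total=("target", "count"),
    )
    table["non_events"] = table["total"] - table["events"]
    # Cumulative share of all targets / non-targets captured down to each bin.
    table["cum_event_rate"] = table["events"].cumsum() / table["events"].sum()
    table["cum_non_event_rate"] = table["non_events"].cumsum() / table["non_events"].sum()
    table["ks"] = (table["cum_event_rate"] - table["cum_non_event_rate"]).abs()
    return table


y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
table = decile_ks_table(y_true, scores, n_bins=5)
print(table[["events", "non_events", "ks"]])
```

Because KS is only evaluated at bin boundaries here, the maximum of the `ks` column can be lower than the exact statistic: in this example it is 0.4, while the exact two-sample KS over all thresholds is 0.6.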
To use the KS scorer for hyperparameter search:
from sklearn.model_selection import GridSearchCV
from ks_metric import ks_scorer
clf = GridSearchCV(estimator=LogisticRegression(max_iter=10000), param_grid={'C': [0.01, 0.1, 1]}, scoring=ks_scorer)
clf.fit(X_train, y_train)
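To show what scikit-learn expects from a `scoring` argument, here is a sketch of a KS scorer written as a plain callable with the `(estimator, X, y)` signature that `GridSearchCV` accepts. It relies on the identity KS = max(TPR - FPR) over all ROC thresholds. The name `ks_callable_scorer` is hypothetical, and this is not necessarily how `ks_scorer` is built.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import GridSearchCV


# Hypothetical scorer for illustration; not ks_metric's ks_scorer.
def ks_callable_scorer(estimator, X, y):
    """KS as max(TPR - FPR); higher is better, as GridSearchCV expects."""
    proba = estimator.predict_proba(X)[:, 1]
    # Keep every threshold so no candidate for the maximum gap is dropped.
    fpr, tpr, _ = roc_curve(y, proba, drop_intermediate=False)
    return float(np.max(tpr - fpr))


X, y = make_classification(n_samples=500, n_features=10, random_state=0)
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=10000),
    param_grid={"C": [0.01, 0.1, 1]},
    scoring=ks_callable_scorer,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The TPR - FPR form assumes higher scores indicate the target class, so unlike the absolute-value definition it does not reward a model whose scores are inverted.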
See the example notebook for detailed usage.
Download files
Hashes for ks_metric-0.2.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 573373321eb11659b96bed5591457da3951692e3207340aa73a7bb9b8d251a30
MD5 | 37576fccb6e886e810ce794a6d343eb2
BLAKE2b-256 | 4cf96d6a291a10ecb9b3e9af83187782aa7622084fc78f91185e9acb98b9e7b4