
Machine learning metrics that are not easy to find

Project description

mletrics

from mletrics.stability import psi
from mletrics.classification import ks

Install

pip install mletrics

How to use

Calculating PSI (Population Stability Index) values

import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from scikitplot.metrics import plot_ks_statistic
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from pathlib import Path

p = Path('..')
df = pd.read_csv(p/'datasets/titanic.csv')
df.head()
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
cat_vars = ['Pclass', 'Sex', 'Embarked']
num_vars = ['Age', 'SibSp', 'Fare']
features = cat_vars + num_vars
target = 'Survived'

X = df[features].copy()
y = df[target].copy()
num_pipe = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value=-999))
])

cat_pipe = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        # scikit-learn >= 1.2 names this argument sparse_output (older releases used sparse)
        ('ohe', OneHotEncoder(sparse_output=False, handle_unknown='ignore'))
])

transformers = ColumnTransformer(transformers=[
                ('numeric', num_pipe, num_vars),
                ('categoric', cat_pipe, cat_vars)
])

model = Pipeline(steps=[
        ('transformers', transformers),
        ('model', RandomForestClassifier(random_state=42, max_depth=3))
])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)

y_proba_train = model.predict_proba(X_train)[:,1]
y_proba_test  = model.predict_proba(X_test)[:,1]

Calculate the PSI value for the model's predicted probabilities between the train and test sets:

psi(y_proba_train, y_proba_test)
0.06001324825109782
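To show what this number measures, here is a minimal sketch of the commonly used PSI formula, PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%). It assumes decile bins taken from the reference (train) scores and a small floor `eps` for empty bins. The name `psi_sketch` is hypothetical and the mletrics binning was not checked, so it may not reproduce 0.0600 exactly.

```python
import numpy as np


def psi_sketch(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` relative to `expected`.

    Assumption: bins are the deciles of `expected`; mletrics may bin differently.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior cut points from the quantiles of the reference distribution;
    # np.unique drops duplicate cuts that appear when many scores are tied.
    cuts = np.unique(np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1])

    # Assign every value to a bin; the outer bins are open-ended, so values
    # outside the reference range are still counted.
    n = len(cuts) + 1
    exp_pct = np.bincount(np.searchsorted(cuts, expected, side="right"), minlength=n) / len(expected)
    act_pct = np.bincount(np.searchsorted(cuts, actual, side="right"), minlength=n) / len(actual)

    # Floor empty bins so the ratio and the log stay finite.
    exp_pct = np.clip(exp_pct, eps, None)
    act_pct = np.clip(act_pct, eps, None)

    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))
```

With the variables from this example, `psi_sketch(y_proba_train, y_proba_test)` returns a value on the same scale as `psi`. Each term is non-negative, so identical distributions give 0 and larger shifts give larger values.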
  • PSI < 0.1: No significant change. You can continue using the existing model.
  • 0.1 <= PSI < 0.2: Slight change. Some adjustment to the model is required.
  • PSI >= 0.2: Significant change. Ideally, you should stop using this model.

Reference: https://www.listendata.com/2015/05/population-stability-index.html
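The thresholds above are rules of thumb from the cited article. A small helper can encode them. `interpret_psi` is a hypothetical name and is not part of mletrics.

```python
def interpret_psi(value: float) -> str:
    """Classify a PSI value using the rule-of-thumb bands listed above."""
    if value < 0.1:
        return "no change"
    if value < 0.2:
        return "slight change"
    return "significant change"


print(interpret_psi(0.06001324825109782))  # no change
```

The bands use half-open intervals, so a PSI of exactly 0.1 counts as a slight change and exactly 0.2 as significant, matching the ">=" in the list.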

Calculating the KS (Kolmogorov-Smirnov) statistic

ks(y_test, y_proba_test)
0.5886743886743887
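The KS statistic used for binary classifiers is usually the largest vertical gap between the cumulative score distributions of the negative and positive classes. The sketch below assumes `ks` follows that standard definition (its source was not checked). `ks_sketch` is a hypothetical name.

```python
import numpy as np


def ks_sketch(y_true, y_score):
    """Maximum gap between the score CDFs of the negative and positive classes.

    Assumption: standard two-sample KS definition; mletrics may differ in details.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = np.sort(y_score[y_true == 1])
    neg = np.sort(y_score[y_true == 0])

    # Evaluate both empirical CDFs at every distinct score (candidate threshold).
    thresholds = np.unique(y_score)
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / len(neg)

    return float(np.max(np.abs(cdf_neg - cdf_pos)))


print(ks_sketch([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0 (perfect separation)
```

A value of 1.0 means some threshold separates the classes perfectly, and 0 means the two score distributions are indistinguishable. Calling `ks_sketch(y_test, y_proba_test)` should land close to the value above if the assumption about the definition holds.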

Compare it with the KS statistic computed by scikit-plot, which expects a two-column array of class probabilities:

plot_ks_statistic(y_test, np.column_stack([1-y_proba_test, y_proba_test]))
[Figure: KS Statistic Plot (x-axis: Threshold, y-axis: Percentage below threshold)]
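Another cross-check that needs no plotting is `scipy.stats.ks_2samp`, applied to the scores of the positive and negative classes (scipy is already a dependency of scikit-learn). The wrapper name `ks_via_scipy` is hypothetical. If `ks` uses the standard two-sample definition, `ks_via_scipy(y_test, y_proba_test)` should agree with it.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_via_scipy(y_true, y_score):
    """Two-sample KS statistic between positive-class and negative-class scores."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    result = ks_2samp(y_score[y_true == 1], y_score[y_true == 0])
    return float(result.statistic)  # the p-value is in result.pvalue


print(ks_via_scipy([0, 1, 0, 1], [0.1, 0.2, 0.3, 0.4]))  # 0.5
```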

Download files


Source Distribution

mletrics-0.0.3.tar.gz (9.7 kB)

Uploaded Source

Built Distribution


mletrics-0.0.3-py3-none-any.whl (9.6 kB)

Uploaded Python 3

File details

Details for the file mletrics-0.0.3.tar.gz.

File metadata

  • Download URL: mletrics-0.0.3.tar.gz
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for mletrics-0.0.3.tar.gz
Algorithm Hash digest
SHA256 606c13665f81aa3afd2ca69ecdedf9a7e82334de4edd6ffb2c4d277e0a751f6a
MD5 ff37e0985f965d6a4f852058c335e82d
BLAKE2b-256 b7de3de54d239da71bd441fb70b3172b0d260a10b6de044d9a866f078e2ada1e


File details

Details for the file mletrics-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: mletrics-0.0.3-py3-none-any.whl
  • Size: 9.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for mletrics-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 11eb58d540479275cd5b2d16876597e2c4701d3955842b7ffc079865f772be29
MD5 473adfa606ce5fff79ac6aa6b3166c23
BLAKE2b-256 63ba73bd72692836f00ba03c4af6fe1539d6aaad171ec22f7368488820608827
