skflex provides a suite of flexible utility functions for use with the sklearn library

Development Status
- 5 - Production/Stable
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Utilities

Project description

skflex

FLEXIBLE FUNCTIONS ----- FAST PROCESSING AND EVALUATION

skflex provides a suite of utility functions for use with the sklearn library. The module focuses primarily on producing typical plots and metrics for evaluating machine learning models. It has been designed with flexibility and customisation in mind, to speed up workflows and enhance comparative evaluation.

Installation and Import

pip install skflex

import skflex.skflex as skf

Functions

The functions currently included are listed below, with descriptions and default parameter settings.

Refer to the GitHub repository for example images of the plots.

roc_auc_plot

Accepts fitted model(s) and test data. It will then:

Calculate ROC
Calculate AUC
Plot ROC curve with AUC provided in the legend

Parameters:

models - fitted model objects. NOTE: Only models with a 'predict_proba' or 'decision_function' method are supported.
X_test - test feature set
y_test - test labels set
title - title for ROC curve plot
width - plot width
height - plot height
legend_size - size of plot legend

Default:

models, X_test = None, y_test = None, width = 14, height = 12, legend_size = 14, title='ROC Curve'

Example:

from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

model_1 = GaussianNB()
model_2 = LogisticRegression()

model_1.fit(X_train, y_train)
model_2.fit(X_train, y_train)

skf.roc_auc_plot(model_1, model_2, X_test = X_test, y_test = y_test, 
                title = 'Example ROC plot')
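
As a rough illustration of the steps roc_auc_plot lists (calculate ROC, calculate AUC), the sketch below computes them directly with sklearn.metrics.roc_curve and sklearn.metrics.auc. It is not skflex's implementation: the helper name roc_auc_scores is hypothetical, binary labels are assumed, and synthetic data from make_classification stands in for X_test / y_test.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def roc_auc_scores(models, X_test, y_test):
    """Hypothetical helper (not part of skflex): ROC points and AUC per fitted binary model."""
    results = {}
    for model in models:
        # Use positive-class probabilities when available, otherwise decision scores.
        if hasattr(model, "predict_proba"):
            scores = model.predict_proba(X_test)[:, 1]
        elif hasattr(model, "decision_function"):
            scores = model.decision_function(X_test)
        else:
            raise TypeError(f"{type(model).__name__} has neither predict_proba nor decision_function")
        fpr, tpr, _ = roc_curve(y_test, scores)
        results[type(model).__name__] = (fpr, tpr, auc(fpr, tpr))
    return results


# Synthetic binary data stands in for a real train/test split.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model_1 = GaussianNB().fit(X_train, y_train)
model_2 = LogisticRegression().fit(X_train, y_train)

results = roc_auc_scores([model_1, model_2], X_test, y_test)
for name, (fpr, tpr, roc_auc) in results.items():
    print(f"{name}: AUC = {roc_auc:.3f}")
```

To draw the curves, each (fpr, tpr) pair can be passed to matplotlib's plt.plot with the AUC in the label, which matches the legend described above.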

classifier_train_report

Accepts classifier models, training data, and test data. It will then:

Fit the model(s) to training data
Make predictions using test data
Produce classification report for comparison
Produce confusion matrix for comparison
Provide an ordered summary (ranked best to worst score) using the given evaluation metric

Parameters:

models - model objects to be trained and evaluated
X_train - training feature set
y_train - training label set
X_test - test feature set
y_test - test label set
scoring - summary evaluation metric. Common classifier evaluation metrics, including accuracy, f1, precision, and recall, are supported. Refer to the sklearn scoring documentation for more information. Scoring methods should be passed as strings; for example, precision would be passed as scoring = 'precision'
title - title for output report

Default:

models, X_train = None, y_train = None, X_test = None, y_test = None, scoring = 'accuracy', title = 'Reports'

Example:

from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

model_1 = GaussianNB()
model_2 = LogisticRegression()

skf.classifier_train_report(model_1, model_2, X_train = X_train, y_train = y_train, 
                            X_test = X_test, 
                            y_test = y_test, 
                            scoring = 'accuracy', 
                            title = 'Example Report')
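
The fit, predict, report, and rank workflow described above can be sketched with plain sklearn calls. This is an illustrative assumption of how such a report might be assembled, not skflex's code: train_report is a hypothetical name, the scoring string is resolved with sklearn.metrics.get_scorer, and synthetic data replaces the real training and test sets.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, get_scorer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def train_report(models, X_train, y_train, X_test, y_test, scoring="accuracy"):
    """Hypothetical helper (not part of skflex): fit, report, and rank models."""
    scorer = get_scorer(scoring)  # scoring string, e.g. 'accuracy', 'f1', 'precision', 'recall'
    summary = []
    for model in models:
        name = type(model).__name__
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        print(f"--- {name} ---")
        print(classification_report(y_test, y_pred))
        print(confusion_matrix(y_test, y_pred))
        summary.append((name, scorer(model, X_test, y_test)))
    # Rank best to worst on the chosen metric.
    summary.sort(key=lambda item: item[1], reverse=True)
    return summary


X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
ranking = train_report([GaussianNB(), LogisticRegression()], X_train, y_train, X_test, y_test, scoring="f1")
for name, score in ranking:
    print(f"{name}: f1 = {score:.3f}")
```

Sorting in descending order works for any sklearn scorer because sklearn scorers follow a "greater is better" convention (error metrics are exposed as negated 'neg_...' scorers).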

validation_plot

Accepts a model, one of its hyperparameters, a list of values for that hyperparameter, training data, the number of cross-validation folds, a scoring method, and a plot title. It produces a validation curve for the training and cross-validated test scores, plotted as the mean scores and standard deviations obtained across the cross-validation folds.

Parameters:

model - model object
param - hyperparameter to be used to produce the validation curve
param_grid - hyperparameter values to be tested
X_train - training feature set
y_train - training label set
cv - number of cross-validation folds
scoring - scoring methodology used during cross-validation process
title - title for validation plot
width - plot width
height - plot height

Default:

model = None, param = None, param_grid = None, X_train = None, y_train = None, cv = 5, scoring = 'accuracy', width = 9, height = 9, title = 'Validation Curve'

Example:

from sklearn.tree import DecisionTreeClassifier

model_1 = DecisionTreeClassifier()
params = [5, 10, 15, 20, 30, 40, 50]

skf.validation_plot(model = model_1, param = 'max_depth', param_grid = params, 
                    X_train = X_train, 
                    y_train = y_train, 
                    title = 'Example Validation Curve')
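
The mean and standard-deviation curves described for validation_plot can be computed with sklearn.model_selection.validation_curve. The sketch below is an assumption about the underlying computation, not skflex's implementation; synthetic data from make_classification stands in for X_train / y_train, and the plotting step is left out.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for X_train / y_train.
X_train, y_train = make_classification(n_samples=300, random_state=0)
param_grid = [1, 2, 4, 8, 16]

# One row per hyperparameter value, one column per cross-validation fold.
train_scores, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=0),
    X_train,
    y_train,
    param_name="max_depth",
    param_range=param_grid,
    cv=5,
    scoring="accuracy",
)

# Mean and standard deviation across folds: the curve and its band.
train_mean, train_std = train_scores.mean(axis=1), train_scores.std(axis=1)
test_mean, test_std = test_scores.mean(axis=1), test_scores.std(axis=1)

for depth, tr, te, sd in zip(param_grid, train_mean, test_mean, test_std):
    print(f"max_depth={depth}: train={tr:.3f}, cv={te:.3f} (+/- {sd:.3f})")
```

A plot would typically draw train_mean and test_mean against param_grid and shade mean +/- std with matplotlib's fill_between.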

train_val_test

Accepts a Pandas dataframe and returns a training, validation, and test set. It works in a similar way to the sklearn train_test_split function: you define the fraction of the data allocated to the training and validation sets (for example, 0.6 = 60%), and the remainder is allocated to the test set.

Parameters:

data - dataframe to be split into a training, validation, and test set
class_labels - column in the dataframe containing class labels
train - fraction of the data to allocate to the training set (for example, 0.6)
val - fraction of the data to allocate to the validation set (for example, 0.2)
shuffle - if True, shuffles the rows of the dataframe before splitting
random_state - if shuffle is True, sets a random seed so that the row order produced by shuffling can be reproduced

Default:

data = None, class_labels = None, train = 0.6, val = 0.2, shuffle = True, random_state = None

Returns: X_train, y_train, X_val, y_val, X_test, y_test

Example:

X_train, y_train, X_val, y_val, X_test, y_test = skf.train_val_test(data = my_data, 
                                                                    class_labels = 'labels', 
                                                                    train = 0.6, 
                                                                    val = 0.2)
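
One plausible way to implement the three-way split described above with Pandas is sketched below. It is a hypothetical re-implementation for illustration only (train_val_test_sketch is not a skflex function); it assumes the split sizes are rounded to whole rows and that the label column is dropped from the feature sets.

```python
import pandas as pd


def train_val_test_sketch(data, class_labels, train=0.6, val=0.2, shuffle=True, random_state=None):
    """Hypothetical re-implementation (not skflex's code) of a three-way split."""
    if shuffle:
        # sample(frac=1) returns every row in random order; random_state makes it reproducible.
        data = data.sample(frac=1, random_state=random_state)
    n = len(data)
    n_train = round(n * train)
    n_val = round(n * val)
    parts = (
        data.iloc[:n_train],
        data.iloc[n_train:n_train + n_val],
        data.iloc[n_train + n_val:],  # the remainder becomes the test set
    )
    out = []
    for part in parts:
        out.append(part.drop(columns=class_labels))  # features
        out.append(part[class_labels])               # labels
    return tuple(out)


my_data = pd.DataFrame({"feature": range(10), "labels": [0, 1] * 5})
X_train, y_train, X_val, y_val, X_test, y_test = train_val_test_sketch(
    my_data, class_labels="labels", random_state=0
)
print(len(X_train), len(X_val), len(X_test))  # 6 2 2
```

With the defaults (train = 0.6, val = 0.2), a 1,000-row dataframe would be split into 600 training, 200 validation, and 200 test rows.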

pca_scree_plot

Accepts data (an array or dataframe) and the number of principal components to analyse. It produces a scree plot of the cumulative variance explained.

Parameters:

data - dataset to be analysed
n_components - number of principal components to be analysed
width - width of plot
height - height of plot
legend_size - size of plot legend
scale_data - if True, normalises the data before analysis and plotting. Set scale_data = True if the data being passed has not already been normalised
title - plot title

Default:

data = None, n_components = None, width = 16, height = 10, legend_size = 12, scale_data = False, title = 'PCA Scree Plot'

Example:

from sklearn.preprocessing import scale

n_data = scale(my_data)

skf.pca_scree_plot(data = n_data, n_components = 70, title = 'Example PCA Scree Plot')
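
The cumulative variance that a scree plot like this displays can be computed with sklearn's PCA. The sketch below is an assumption about the computation rather than skflex's code: it uses the bundled digits dataset (64 features, so n_components must stay at or below 64) and assumes scale_data means standardisation with StandardScaler.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_digits().data  # bundled dataset: 1797 samples x 64 features
scale_data = True          # mirrors the README parameter; assumed here to mean standardisation

if scale_data:
    # Zero mean and unit variance per feature, as sklearn.preprocessing.scale also does.
    data = StandardScaler().fit_transform(data)

pca = PCA(n_components=30).fit(data)
# Cumulative share of total variance explained by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

for k in (5, 10, 20, 30):
    print(f"first {k} components: {cumulative[k - 1]:.1%} of variance")
```

Plotting cumulative against range(1, 31) gives the scree curve; the point where it flattens suggests how many components are worth keeping.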

Requirements

Sklearn >= 0.24.1
Matplotlib >= 3.3.4
Pandas >= 1.2.4
Numpy >= 1.20.1

Release history

1.0.2: Oct 19, 2021
1.0.1: Oct 18, 2021
1.0.0 (this version): Oct 17, 2021
0.0.1: Oct 16, 2021

Download files

Source Distribution

skflex-1.0.0.tar.gz (5.5 kB)

Uploaded Oct 17, 2021 Source

Built Distribution

skflex-1.0.0-py3-none-any.whl (5.9 kB)

Uploaded Oct 17, 2021 Python 3

File details

Details for the file skflex-1.0.0.tar.gz.

File metadata

Download URL: skflex-1.0.0.tar.gz
Upload date: Oct 17, 2021
Size: 5.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8

File hashes

Hashes for skflex-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`4e525c2148f770147b95fff286e3435c195407f676453414996b012f949bf9ea`
MD5	`cc00bac6ebf9cd376cd82ff4c45d7710`
BLAKE2b-256	`f2dba0c0a43fdd37d32322d041e056dbe7189e080baa392389a8135088e7e879`

File details

Details for the file skflex-1.0.0-py3-none-any.whl.

File metadata

Download URL: skflex-1.0.0-py3-none-any.whl
Upload date: Oct 17, 2021
Size: 5.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8

File hashes

Hashes for skflex-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7e527725304ebc9c6e3cb8cc935a7dcc4256a5fa5e218689caed7d72514b041f`
MD5	`2c4653196432b38a8d10372933c193e6`
BLAKE2b-256	`1c48bb9c146ae422a66253ce5f04699cb52cd39c74910f6cdcc9a49cd27e3b5a`
