The missing scikit-learn addition to work with Weight-of-Evidence scoring.

pywoe [Beta]

The missing scikit-learn addition to work with Weight-of-Evidence (WoE) scoring, with a special focus on credit risk modelling. There is evidently no open-source, free-to-use, well-tested Python package for basic credit risk modelling tasks. Such a package should provide data validation, feature engineering and feature selection techniques that are easy to serialise, deploy and transfer. It should also be easy to use within JupyterLab.

The package is still very much a work in progress and can be extended in several useful ways. Contributions are welcome.

Table of Contents

  1. Installation
  2. Usage Examples
  3. Further Work

Installation

To install the latest version of the package, run:

pip install pywoe

Usage Examples

Introduction

To get started quickly, the package provides a ready-made sklearn pipeline. Load it and run it on example data as follows:

from pywoe.interface import get_raw_data_to_woe_values_pipeline
from sklearn.datasets import load_breast_cancer

pipeline = get_raw_data_to_woe_values_pipeline()
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
pipeline.fit(X, y)
woe_transformed = pipeline.transform(X)

The setup above automatically constructs bins for each feature and computes the WoE value of every bin. The output can be used to select features for a logistic regression model, or to preprocess features before feeding them into a model.
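To make the WoE step concrete, here is a minimal, dependency-free sketch of how a WoE value per bin can be computed. It is not pywoe's implementation: the function name woe_per_bin is hypothetical, and the sign convention (log of the non-event share over the event share) is an assumption, since some references use the inverse ratio.

```python
import math


def woe_per_bin(bin_labels, targets):
    """Return {bin: WoE} for a binary target where 1 marks an event.

    Assumed convention: WoE = ln(share of non-events / share of events).
    Assumes every bin holds at least one event and one non-event;
    otherwise the ratio is undefined and needs smoothing.
    """
    events, non_events = {}, {}
    for b, t in zip(bin_labels, targets):
        events[b] = events.get(b, 0) + (t == 1)
        non_events[b] = non_events.get(b, 0) + (t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    return {
        b: math.log((non_events[b] / total_non_events) / (events[b] / total_events))
        for b in events
    }


# Toy data: two bins with four observations each.
bins = ["low"] * 4 + ["high"] * 4
targets = [0, 0, 0, 1, 0, 1, 1, 1]
woe = woe_per_bin(bins, targets)
```

In this toy data the "low" bin holds three of the four non-events and one of the four events, so its WoE is ln(3), and the "high" bin mirrors it with -ln(3).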

Information Values (IV)

The pipeline fitted above also computes Information Values. To retrieve them alongside the bins chosen for a feature, here mean radius, run:

pipeline['woe_transformer'].woe_spec['mean radius'].bins

and you'll see the values printed out.
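For reference, the Information Value of a feature is commonly defined as the sum, over its bins, of the difference between the non-event and event shares multiplied by the bin's WoE. The sketch below illustrates that definition on toy counts; the function name information_value and the input layout are hypothetical and do not reflect pywoe's internals.

```python
import math


def information_value(counts):
    """Compute IV from {bin: (non_events, events)} counts.

    Assumes every bin has non-zero counts for both classes.
    """
    total_non_events = sum(n for n, _ in counts.values())
    total_events = sum(e for _, e in counts.values())
    iv = 0.0
    for n, e in counts.values():
        share_non_events = n / total_non_events
        share_events = e / total_events
        # Each bin contributes (share difference) * WoE of that bin.
        iv += (share_non_events - share_events) * math.log(share_non_events / share_events)
    return iv


iv = information_value({"low": (3, 1), "high": (1, 3)})
```

Every term in the sum is non-negative, so the IV does not depend on which sign convention is chosen for WoE.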

Inspecting Default Settings

from pywoe import constants

constants.NUMERIC_ACCURACY
constants.DEFAULT_DECISION_TREE_CLASSIFIER_FIT_KWARGS
constants.DEFAULT_DECISION_TREE_CLASSIFIER_INIT_KWARGS
constants.P_VALUE_THRESHOLD

Overriding Defaults

from sklearn.pipeline import Pipeline
from pywoe.feature_engineering.validator import FeatureValidator
from pywoe.feature_engineering.binning import DecisionTreeBinner
from pywoe.feature_engineering.woe import WoETransformer

feature_validator = FeatureValidator()
binner = DecisionTreeBinner(
    feature_validator=feature_validator,
    init_kwargs={
        "criterion": "entropy",
        "max_depth": 3,
        "min_samples_leaf": 0.2
    }
)
woe_transformer = WoETransformer(binner=binner)

# Keep in mind that `binner` is not an `sklearn` object. It is passed to
# `woe_transformer` as a parameter, so it is not a pipeline step itself.
pipeline = Pipeline([
    ('validator', feature_validator),
    ('woe_transformer', woe_transformer)
])
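To see what the init_kwargs above control, the following sketch shows the general idea behind decision-tree binning using plain scikit-learn: fit a tree on a single feature and read its split thresholds as bin edges. This is an assumption about the technique, not a copy of what DecisionTreeBinner does internally; the helper tree_bin_edges is hypothetical, and random_state is added only to keep the run reproducible.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier


def tree_bin_edges(values, target, **init_kwargs):
    """Fit a one-feature decision tree and return its sorted split thresholds."""
    tree = DecisionTreeClassifier(**init_kwargs)
    tree.fit(np.asarray(values).reshape(-1, 1), target)
    fitted = tree.tree_
    # Leaves carry a negative feature index; internal nodes hold real thresholds.
    return np.sort(fitted.threshold[fitted.feature >= 0])


X, y = load_breast_cancer(return_X_y=True)
mean_radius = X[:, 0]  # the first column of this dataset is "mean radius"
edges = tree_bin_edges(
    mean_radius, y,
    criterion="entropy", max_depth=3, min_samples_leaf=0.2, random_state=0,
)
# right=True mirrors the tree's rule that values <= threshold go to the left child.
bin_index = np.digitize(mean_radius, edges, right=True)
```

With min_samples_leaf=0.2 every leaf must hold at least 20% of the samples, so the tree can produce only a handful of bins no matter how large max_depth is.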

Further Work

Further work needed includes, but is not limited to:

  • (significantly) improving testing,
  • adding marginal-IV-based automated feature selection,
  • adding Jupyter-integrated plotting capabilities to inspect models,
  • adding residual monitoring (ReMo) capabilities,
  • ...

Built Distribution

pywoe-0.0.2-py2-none-any.whl (20.9 kB, Python 2)
