
A toolkit for Credit Scoring using Weight of Evidence (WoE) and Logistic Regression

Credit Scoring Toolkit

In finance, it is common practice to create risk scorecards to assess the creditworthiness of a given customer. Unfortunately, out-of-the-box credit scoring tools are quite expensive and scattered. That is why we created this toolkit: to empower all credit scoring practitioners and to spread the use of weight-of-evidence-based scoring techniques to alternative use cases (virtually any binary classification problem).



Table of Contents
  1. About The Project
    1. DiscreteNormalizer
    2. Discretizer
    3. WoeEncoder
    4. WoeBaseFeatureSelector
    5. WoeContinuousFeatureSelector
    6. WoeDiscreteFeatureSelector
    7. CreditScoring
    8. IVCalculator
    9. Built With
  2. Installation
  3. Usage
  4. Contributing
  5. License
  6. Contact
  7. Citing
  8. Acknowledgments

About The Project

The general process for creating Weight of Evidence based scorecards is illustrated in the figure below:

[Figure: general process for creating Weight of Evidence based scorecards]

To that end, we implemented the following classes to cover the necessary steps of the credit scoring transformation:

DiscreteNormalizer

Class for normalizing discrete data for a given relative frequency threshold
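One common reading of "normalizing for a given relative frequency threshold" is to pool categories whose share of rows falls below the threshold into a single bucket. The sketch below illustrates that idea with plain pandas; the function name `group_rare_categories` and the `"OTHER"` label are hypothetical and are not taken from the toolkit's API.

```python
import pandas as pd


def group_rare_categories(values: pd.Series, threshold: float = 0.05,
                          other_label: str = "OTHER") -> pd.Series:
    """Replace categories whose relative frequency is below `threshold`."""
    # Relative frequency of every category (shares sum to 1).
    freq = values.value_counts(normalize=True)
    rare = freq[freq < threshold].index
    # Keep frequent categories as they are and pool the rare ones.
    return values.where(~values.isin(rare), other_label)


colors = pd.Series(["red"] * 50 + ["blue"] * 46 + ["green"] * 3 + ["pink"] * 1)
normalized = group_rare_categories(colors, threshold=0.05)
print(normalized.value_counts().to_dict())
```

Pooling rare levels this way keeps the later WoE estimates from resting on a handful of observations.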

Discretizer

Class for discretizing continuous data into bins using several methods
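To make "discretizing into bins using several methods" concrete, here is a small sketch of two widely used unsupervised strategies, equal-frequency and equal-width binning, written with pandas. It only illustrates the concept; the methods the Discretizer class actually offers, and their names, are not shown here.

```python
import pandas as pd

ages = pd.Series([21, 25, 29, 33, 38, 42, 47, 51, 58, 64, 70, 79])

# Equal-frequency (quantile) bins: each bin holds roughly the same number of rows.
quantile_bins = pd.qcut(ages, q=4)

# Equal-width (uniform) bins: each bin spans the same range of values.
uniform_bins = pd.cut(ages, bins=4)

print(quantile_bins.value_counts().sort_index())
print(uniform_bins.value_counts().sort_index())
```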

WoeEncoder

Class for encoding discrete features into their Weight of Evidence (WoE) transformation
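As a reminder of what the transformation computes, the sketch below derives WoE for a toy discrete feature using the convention WoE = ln(share of non-events / share of events). Sign conventions differ between references, and the helper `woe_table` is hypothetical, so treat this as an illustration of the statistic rather than as the class's implementation.

```python
import math
from collections import Counter


def woe_table(categories, targets):
    """Return {category: WoE}, with WoE = ln(non-event share / event share)."""
    events = Counter(c for c, t in zip(categories, targets) if t == 1)
    non_events = Counter(c for c, t in zip(categories, targets) if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    table = {}
    for category in sorted(set(categories)):
        event_share = events[category] / total_events
        non_event_share = non_events[category] / total_non_events
        # Assumes every category has at least one event and one non-event;
        # otherwise the logarithm is undefined and smoothing would be needed.
        table[category] = math.log(non_event_share / event_share)
    return table


housing = ["own"] * 6 + ["rent"] * 6
default = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1]
woe = woe_table(housing, default)
encoded = [woe[c] for c in housing]  # the feature after WoE encoding
print(woe)
```

Under this convention, a positive WoE marks a category with proportionally fewer events (defaults) than the population, and a negative WoE marks one with more.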

WoeBaseFeatureSelector

Base class for selecting features based on their WoE transformation and Information Value statistic.

WoeContinuousFeatureSelector

Class for selecting continuous features based on their WoE transformation and Information Value statistic.

WoeDiscreteFeatureSelector

Class for selecting discrete features based on their WoE transformation and Information Value statistic.
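The Information Value statistic behind these selectors can be sketched as the sum, over bins, of (non-event share − event share) × WoE. The example below computes it for two made-up features and keeps those at or above a cutoff. The 0.05 cutoff mirrors the `iv_feature_threshold` value used in the Usage section, but how the selector classes apply it internally is an assumption, and `information_value` is a hypothetical helper.

```python
import math


def information_value(bin_counts):
    """bin_counts: list of (non_events, events) tuples, one per bin."""
    total_non_events = sum(n for n, _ in bin_counts)
    total_events = sum(e for _, e in bin_counts)
    iv = 0.0
    for non_events, events in bin_counts:
        non_event_share = non_events / total_non_events
        event_share = events / total_events
        # Each bin contributes (share difference) * WoE; assumes no empty cells.
        iv += (non_event_share - event_share) * math.log(non_event_share / event_share)
    return iv


# Illustrative (non_events, events) counts per bin; not real data.
candidates = {
    "C_income": [(50, 10), (30, 30)],
    "D_region": [(40, 20), (40, 20)],
}
iv_by_feature = {name: information_value(bins) for name, bins in candidates.items()}
selected = [name for name, iv in iv_by_feature.items() if iv >= 0.05]
print(iv_by_feature, selected)
```

A feature whose bins all share the population event rate, like `D_region` here, gets an IV of zero and is dropped.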

CreditScoring

Implements credit risk scorecards following the methodology proposed in Siddiqi, N. (2012). Credit risk scorecards: developing and implementing intelligent credit scoring (Vol. 3). John Wiley & Sons.
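Scorecards in this methodology are usually scaled so that a chosen score corresponds to chosen odds and a fixed number of points doubles the odds (PDO). The sketch below shows that standard scaling, score = offset + factor × ln(odds) with factor = PDO / ln 2. The base score of 600, base odds of 50:1, and PDO of 20 are illustrative values, and the function names are hypothetical; the defaults used by the CreditScoring class are not stated here.

```python
import math


def scaling_constants(base_score, base_odds, pdo):
    """Return (factor, offset) so that `base_odds` maps to `base_score`."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score_from_odds(odds, factor, offset):
    """Map good:bad odds to a score on the scaled scorecard."""
    return offset + factor * math.log(odds)


factor, offset = scaling_constants(base_score=600, base_odds=50, pdo=20)
for odds in (25, 50, 100):
    print(odds, round(score_from_odds(odds, factor, offset), 1))
```

With these values, halving or doubling the odds moves the score by exactly 20 points around the base score of 600.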

IVCalculator

A utility class to quickly calculate Information Value (IV) for both continuous and discrete features. This class provides a simple interface that abstracts away the manual steps of discretization and normalization, making it easy to assess feature predictive power.

Built With


Installation

You can install the module using pip:

pip install woe-credit-scoring


Usage

The new AutoCreditScoring class provides a streamlined way to train a credit scoring model, generate reports, and make predictions. Here's a quick example of how to use it:

Dependencies

import warnings

import pandas as pd
from CreditScoringToolkit import AutoCreditScoring

# Silence the UserWarnings raised by scikit-learn's discretization module
warnings.filterwarnings("ignore", category=UserWarning, module="sklearn.preprocessing._discretization")

Reading example data

# Read example data for train and validation (loan applications)
train = pd.read_csv('example_data/train.csv')
valid = pd.read_csv('example_data/valid.csv')   

Defining feature type

# Assign features lists by type
vard = [v for v in train.columns if v.startswith('D_')]
varc = [v for v in train.columns if v.startswith('C_')]
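The prefix convention above can be tried on a toy frame to see which columns end up in each list. The column names and values below are made up for illustration and are not the contents of `example_data`.

```python
import pandas as pd

# Hypothetical data following the D_/C_ naming convention.
toy = pd.DataFrame({
    "C_income": [1200.0, 3400.0],
    "C_age": [31, 45],
    "D_housing": ["rent", "own"],
    "TARGET": [0, 1],
})

discrete = [v for v in toy.columns if v.startswith("D_")]
continuous = [v for v in toy.columns if v.startswith("C_")]
# Columns matching neither prefix (here, the target) are left out of both lists.
unassigned = [v for v in toy.columns if v not in discrete + continuous]
print(discrete, continuous, unassigned)
```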

Automated Credit Scoring

The AutoCreditScoring class handles the entire workflow, from feature selection and WoE transformation to model training and scoring.

# AutoCreditScoring performs all the steps in a single call, with additional features
# such as outlier detection and treatment, feature selection, and reporting.
kwargs = {
    'iv_feature_threshold': 0.05,
    'max_discretization_bins': 6,
    'strictly_monotonic': True,
    'create_reporting': True,
    'discretization_method': 'dcc',
}
acs = AutoCreditScoring(train, 'TARGET', varc, vard)
acs.fit(**kwargs)

# You can also save the reports to a folder in PNG format
acs.save_reports('reports')

This will generate several reports, including:

  • Score distribution histograms and KDE plots
  • Event rate by score range plots
  • Feature importance based on Information Value
  • ROC curve for the model


Making Predictions

Once the model is trained, you can use the predict method to score new data.

predictions = acs.predict(valid)
predictions.head()

This will return a DataFrame with the individual point contributions for each feature (pts_* columns) and the final score.

IV Calculator

The IVCalculator class provides a quick and easy way to calculate Information Value (IV) for your features without going through the entire credit scoring workflow. This is useful for initial feature analysis and selection.

from CreditScoringToolkit import IVCalculator

# Initialize IVCalculator with your data
iv_calculator = IVCalculator(
    data=train,
    target='TARGET',
    continuous_features=varc,
    discrete_features=vard
)

# Calculate IV for all features
iv_report = iv_calculator.calculate_iv(
    max_discretization_bins=5,
    strictly_monotonic=False,
    discretization_method='quantile',
    discrete_normalization_threshold=0.05
)

# Display the report
print(iv_report)

The output will be a DataFrame with columns:

  • feature: Feature name
  • iv: Information Value
  • feature_type: 'continuous' or 'discrete'

This allows you to quickly identify which features have the most predictive power before building your full credit scoring model.
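One way to read such a report is to rank features by IV and label them with the rule-of-thumb bands commonly attributed to Siddiqi (below 0.02 unpredictive, 0.02 to 0.1 weak, 0.1 to 0.3 medium, above 0.3 strong). The sketch below does this on a hand-made DataFrame that only mimics the documented columns; the numbers are invented and the banding is not part of the IVCalculator output.

```python
import pandas as pd

# Hypothetical report with the documented columns: feature, iv, feature_type.
iv_report_example = pd.DataFrame({
    "feature": ["C_income", "D_housing", "C_age", "D_region"],
    "iv": [0.35, 0.12, 0.04, 0.01],
    "feature_type": ["continuous", "discrete", "continuous", "discrete"],
})

# Left-closed bands: [0.02, 0.1) is "weak", [0.1, 0.3) is "medium", and so on.
bands = pd.cut(
    iv_report_example["iv"],
    bins=[-float("inf"), 0.02, 0.1, 0.3, float("inf")],
    labels=["unpredictive", "weak", "medium", "strong"],
    right=False,
)
ranked = iv_report_example.assign(strength=bands).sort_values("iv", ascending=False)
print(ranked)
```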


Contributing

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".

Don't forget to give the project a star! Thanks again!

  1. Fork the Project

  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)

  3. Commit your Changes (git commit -m 'Add some AmazingFeature')

  4. Push to the Branch (git push origin feature/AmazingFeature)

  5. Open a Pull Request


License

Distributed under the GNU General Public License v3.0. See LICENSE for more information.


Contact

José G Fuentes - @jgusteacher - jose.gustavo.fuentes@comunidad.unam.mx

Project Link: https://github.com/JGFuentesC/woe_credit_scoring


Citing

If you use this software in scientific publications, we would appreciate citations to the following paper:

Combination of Unsupervised Discretization Methods for Credit Risk. José G. Fuentes Cabrera, Hugo A. Pérez Vicente, Sebastián Maldonado, Jonás Velasco.


Acknowledgments

