A toolkit for Credit Scoring using Weight of Evidence (WoE) and Logistic Regression
Credit Scoring Toolkit
In finance, it is common practice to create risk scorecards to assess the creditworthiness of a given customer. Unfortunately, out-of-the-box credit scoring tools are quite expensive and scarce, which is why we created this toolkit: to empower all credit scoring practitioners and to spread the use of Weight of Evidence based scoring techniques to alternative use cases (virtually any binary classification problem).
About The Project
The general process for creating Weight of Evidence based scorecards is illustrated in the figure below:
To support this process, we implemented the following classes, which cover the necessary steps of the credit scoring transformation:
DiscreteNormalizer
Class for normalizing discrete data for a given relative frequency threshold
Discretizer
Class for discretizing continuous data into bins using several methods
WoeEncoder
Class for encoding discrete features using the Weight of Evidence (WoE) transformation
WoeBaseFeatureSelector
Base class for selecting features based on their WoE transformation and Information Value statistic.
WoeContinuousFeatureSelector
Class for selecting continuous features based on their WoE transformation and Information Value statistic.
WoeDiscreteFeatureSelector
Class for selecting discrete features based on their WoE transformation and Information Value statistic.
CreditScoring
Implements credit risk scorecards following the methodology proposed in Siddiqi, N. (2012). Credit risk scorecards: developing and implementing intelligent credit scoring (Vol. 3). John Wiley & Sons.
IVCalculator
A utility class to quickly calculate Information Value (IV) for both continuous and discrete features. This class provides a simple interface that abstracts away the manual steps of discretization and normalization, making it easy to assess feature predictive power.
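The WoE and Information Value statistics these classes revolve around can be sketched in a few lines of pandas. This is a minimal illustration, not the toolkit's implementation: it assumes the Siddiqi convention WoE = ln(%good / %bad) with TARGET == 1 as the event ("bad"), and IV = sum over bins of (%good - %bad) * WoE. The helpers group_rare_categories and woe_iv are hypothetical names. Grouping rare levels (the idea behind DiscreteNormalizer) and quantile binning (one possible Discretizer method) both help avoid bins with zero events or non-events, which would make the WoE infinite.

```python
import numpy as np
import pandas as pd


def group_rare_categories(s: pd.Series, threshold: float = 0.05, other: str = "OTHER") -> pd.Series:
    """Replace levels whose relative frequency is below `threshold` with `other` (hypothetical helper)."""
    freq = s.value_counts(normalize=True)
    rare = freq[freq < threshold].index
    return s.where(~s.isin(rare), other)


def woe_iv(feature: pd.Series, target: pd.Series) -> tuple:
    """Return a per-level WoE table and the total IV (hypothetical helper).

    Assumes target == 1 marks the event ("bad") and uses WoE = ln(%good / %bad).
    Levels with zero goods or zero bads would yield an infinite WoE.
    """
    df = pd.DataFrame({"x": feature.values, "y": target.values})
    table = df.groupby("x")["y"].agg(bad="sum", total="count")
    table["good"] = table["total"] - table["bad"]
    dist_good = table["good"] / table["good"].sum()
    dist_bad = table["bad"] / table["bad"].sum()
    table["woe"] = np.log(dist_good / dist_bad)
    iv = float(((dist_good - dist_bad) * table["woe"]).sum())
    return table[["good", "bad", "woe"]], iv


# Synthetic example: lower income -> higher probability of default (TARGET == 1)
rng = np.random.default_rng(0)
income = pd.Series(rng.normal(50, 15, 1000))
p_default = 1 / (1 + np.exp((income - 40) / 10))
target = pd.Series(rng.binomial(1, p_default))

# Continuous feature: quantile bins, labelled as strings
income_bins = pd.qcut(income, q=5, duplicates="drop").astype(str)
table, iv = woe_iv(income_bins, target)
print(table)
print(f"IV(income) = {iv:.3f}")

# Discrete feature: group levels below 5% relative frequency into "OTHER"
region = pd.Series(
    rng.choice(["north", "south", "east", "west", "islands"], size=1000, p=[0.4, 0.3, 0.2, 0.08, 0.02])
)
region_grouped = group_rare_categories(region, threshold=0.05)
print(region_grouped.value_counts())
```

IV is never negative, because each term (%good - %bad) * ln(%good / %bad) is zero or positive.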
Built With
Installation
You can install the module using pip:
pip install woe-credit-scoring
Usage
The new AutoCreditScoring class provides a streamlined way to train a credit scoring model, generate reports, and make predictions. Here's a quick example of how to use it:
Dependencies
import pandas as pd
from CreditScoringToolkit import AutoCreditScoring
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="sklearn.preprocessing._discretization")
Reading example data
# Read example data for train and validation (loan applications)
train = pd.read_csv('example_data/train.csv')
valid = pd.read_csv('example_data/valid.csv')
Defining feature types
# Assign feature lists by type: discrete features are prefixed with 'D_', continuous ones with 'C_'
vard = [v for v in train.columns if v.startswith('D_')]
varc = [v for v in train.columns if v.startswith('C_')]
Automated Credit Scoring
The AutoCreditScoring class handles the entire workflow, from feature selection and WoE transformation to model training and scoring.
# Use the AutoCreditScoring class to perform all the steps in a single call, with additional features
# like outlier detection and treatment, feature selection, reporting and more.
from CreditScoringToolkit import AutoCreditScoring
kwargs = {'iv_feature_threshold':0.05,
'max_discretization_bins':6,
'strictly_monotonic':True,
'create_reporting':True,
'discretization_method':'dcc'}
acs = AutoCreditScoring(train,'TARGET',varc,vard)
acs.fit(**kwargs)
# You can also save the reports to a folder in PNG format
acs.save_reports('reports')
This will generate several reports, including:
- Score distribution histograms and KDE plots
- Event rate by score range plots
- Feature importance based on Information Value
- ROC curve for the model
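An event rate by score range report can be approximated by grouping scores into ranges and taking the mean of the binary target in each one. The sketch below is an assumption about how such a table could be computed, not the code behind the AutoCreditScoring plots; the function name event_rate_by_score_range and the synthetic scores are hypothetical, and quantile-based ranges are only one possible choice.

```python
import numpy as np
import pandas as pd


def event_rate_by_score_range(scores, target, n_ranges: int = 5) -> pd.DataFrame:
    """Group scores into quantile-based ranges and report the event rate per range (hypothetical helper)."""
    df = pd.DataFrame({"score": np.asarray(scores), "target": np.asarray(target)})
    df["score_range"] = pd.qcut(df["score"], q=n_ranges, duplicates="drop")
    return (
        df.groupby("score_range", observed=True)["target"]
        .agg(count="count", event_rate="mean")
        .reset_index()
    )


# Synthetic scores: a higher score means a lower probability of the event (TARGET == 1)
rng = np.random.default_rng(42)
scores = rng.normal(600, 50, 2000)
target = rng.binomial(1, 1 / (1 + np.exp((scores - 600) / 25)))

report = event_rate_by_score_range(scores, target)
print(report)
```

For a well-behaved scorecard, the event rate should fall steadily as the score range increases.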
Making Predictions
Once the model is trained, you can use the predict method to score new data.
predictions = acs.predict(valid)
predictions.head()
This will return a DataFrame with the individual point contributions for each feature (pts_* columns) and the final score.
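Because CreditScoring follows Siddiqi (2012), the per-feature points are presumably derived from the standard scaling in that book: the score is linear in the log-odds, every pdo ("points to double the odds") points double the good:bad odds, and the regression intercept and offset are spread evenly across the n features, giving points_i = -(beta_i * WoE_i + alpha / n) * factor + offset / n, with factor = pdo / ln(2) and offset = base_score - factor * ln(base_odds). The sketch below writes out that textbook formula. It assumes the model predicts the log-odds of the event (TARGET == 1) and WoE = ln(%good / %bad); the pdo, base score and base odds values are illustrative rather than the toolkit's defaults, and the coefficients, WoE values and helper names are made up.

```python
import math


def scaling_constants(pdo: float = 20.0, base_score: float = 600.0, base_odds: float = 50.0):
    """Factor and offset so that `base_score` points correspond to `base_odds` (good:bad)
    and every `pdo` extra points double the odds. Values are illustrative only."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def attribute_points(woe: float, beta: float, intercept: float, n_features: int,
                     factor: float, offset: float) -> float:
    """Siddiqi-style points for one feature value; the model predicts log-odds of the event."""
    return -(woe * beta + intercept / n_features) * factor + offset / n_features


factor, offset = scaling_constants()

# Hypothetical fitted model: intercept and one coefficient per WoE-encoded feature
intercept = -2.0
betas = {"C_income": -0.9, "D_region": -0.6}
applicant_woe = {"C_income": 0.35, "D_region": -0.20}

pts = {
    f"pts_{name}": attribute_points(applicant_woe[name], betas[name], intercept,
                                    len(betas), factor, offset)
    for name in betas
}
score = sum(pts.values())
print(pts, round(score, 1))
```

Summing the per-feature points recovers offset + factor * ln(good:bad odds), which is why the final score can be read as the total of the pts_* contributions under this scaling.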
IV Calculator
The IVCalculator class provides a quick and easy way to calculate Information Value (IV) for your features without going through the entire credit scoring workflow. This is useful for initial feature analysis and selection.
from CreditScoringToolkit import IVCalculator
# Initialize IVCalculator with your data
iv_calculator = IVCalculator(
data=train,
target='TARGET',
continuous_features=varc,
discrete_features=vard
)
# Calculate IV for all features
iv_report = iv_calculator.calculate_iv(
max_discretization_bins=5,
strictly_monotonic=False,
discretization_method='quantile',
discrete_normalization_threshold=0.05
)
# Display the report
print(iv_report)
The output will be a DataFrame with columns:
- feature: Feature name
- iv: Information Value
- feature_type: 'continuous' or 'discrete'
This allows you to quickly identify which features have the most predictive power before building your full credit scoring model.
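A common next step is to bucket the IV values using the rule of thumb usually attributed to Siddiqi: below 0.02 unpredictive, 0.02 to 0.1 weak, 0.1 to 0.3 medium, and 0.3 or more strong. The sketch below applies those bands to a DataFrame with the same columns as the report (feature, iv, feature_type) and keeps features with an IV of at least 0.05, the same value passed as iv_feature_threshold in the AutoCreditScoring example. The feature names and IV values are invented for illustration, and this is not how the toolkit itself applies its threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical report with the same columns as the IVCalculator output
iv_report = pd.DataFrame({
    "feature": ["C_income", "D_region", "C_age", "D_channel"],
    "iv": [0.42, 0.015, 0.12, 0.06],
    "feature_type": ["continuous", "discrete", "continuous", "discrete"],
})

# Rule-of-thumb bands: [0, 0.02), [0.02, 0.1), [0.1, 0.3), [0.3, inf)
bands = [0.0, 0.02, 0.1, 0.3, np.inf]
labels = ["unpredictive", "weak", "medium", "strong"]
iv_report["strength"] = pd.cut(iv_report["iv"], bins=bands, labels=labels, right=False)

# Keep features at or above an IV threshold, strongest first
selected = (
    iv_report[iv_report["iv"] >= 0.05]
    .sort_values("iv", ascending=False)
    .reset_index(drop=True)
)
print(selected)
```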
Contributing
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the GNU General Public License v3.0. See LICENSE for more information.
Contact
José G Fuentes - @jgusteacher - jose.gustavo.fuentes@comunidad.unam.mx
Project Link: https://github.com/JGFuentesC/woe_credit_scoring
Citing
If you use this software in scientific publications, we would appreciate citations to the following paper:
José G. Fuentes Cabrera, Hugo A. Pérez Vicente, Sebastián Maldonado, Jonás Velasco. Combination of Unsupervised Discretization Methods for Credit Risk.
Acknowledgments
- Siddiqi, N. (2012). Credit risk scorecards: developing and implementing intelligent credit scoring (Vol. 3). John Wiley & Sons. For his amazing textbook.
- @othneildrew. For his amazing README template.
- Demo data. For providing example data.
Download files
File details
Details for the file woe_credit_scoring-2.0.4.tar.gz.
File metadata
- Download URL: woe_credit_scoring-2.0.4.tar.gz
- Upload date:
- Size: 41.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 232e2fbb29951a392eb3dccf936554a5db6611c2755a4e26c32dc59d51e76d9f |
| MD5 | 0143a3ab66fc4c4ab59d4c9320499195 |
| BLAKE2b-256 | 871f21aedd1e28be0140325d905d92a2862064c7a316008906f5be941172921f |
File details
Details for the file woe_credit_scoring-2.0.4-py3-none-any.whl.
File metadata
- Download URL: woe_credit_scoring-2.0.4-py3-none-any.whl
- Upload date:
- Size: 43.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f64361b6f34093912953301fafdeecd23018ff2eb8310ada9c7d536b4da553d |
| MD5 | 7b4bd7977dd5eee0d8f9ec6f48a9b15d |
| BLAKE2b-256 | b876ba5184e0a6d04963235fe72be264d0e67cc073d9f0d21459dab63e5c39b9 |