A package for credit scoring analysis

Project description

CreditPy

A Credit Risk Scoring and Validation Package

CreditPy is a Python package developed as the successor to the CreditR package, drawing on that package's successful history. With a wide user base that includes audit and consultancy firms, technology and management consultancies, banks, financial institutions, and fintech companies, CreditPy aims to provide easy model set-up functionality in the field of credit risk scoring.

CreditPy enables the efficient implementation of credit risk scoring methodologies, supporting tasks such as variable analysis, variable selection, model development, model calibration, rating scale development, and model validation.

CreditPy continues the legacy of CreditR, which has seen successful adoption across industries worldwide, with users in Turkey, the Netherlands, the United States, India, the United Kingdom, Nigeria, South Africa, Australia, Brazil, Mexico, Belgium, Sweden, Dubai, Abu Dhabi, Saudi Arabia, Azerbaijan, Russia, and many other places. CreditPy aims to cover the needs of basic credit risk scoring applications.

Prerequisites

Before using CreditPy, make sure you have Python installed on your system. You can then install CreditPy and its dependencies using pip:

pip install creditpy

Getting Started

You can import CreditPy modules and use its functions as follows:

# In case the package is not importable from your environment, add its
# local path to sys.path manually.
import os
import sys

# Get the current directory of the script
current_dir = os.path.dirname(os.path.realpath(__file__))

# Construct the path to the creditpy package
creditpy_path = os.path.join(current_dir, 'creditpy')

# Add the creditpy package path to sys.path
sys.path.append(creditpy_path)

import numpy as np  # used below for the log-odds score
import pandas as pd
from creditpy import (
    calculate_gini, missing_ratio, train_test_split, woe, IV_calc_data,
    Gini_univariate_data, Gini_elimination, variable_clustering,
    variable_clustering_gini, correlation_cluster, max_gini_model,
    woe_glm_feature_importance, scaled_score, regression_calibration,
    master_scale, bayesian_calibration, vif_calc, k_fold_cross_validation_glm,
    Kolmogorov_Smirnov, PSI_calc_data, Herfindahl_Hirschman_Index,
    Anchor_point, chisquare_test, Binomial_test,
)

An Application of the Package

The example below walks through common steps of a credit risk scoring study using the functions provided by the package.

# This script illustrates how the creditpy package is used.
# Obtaining a highly accurate model is not within the scope of this example.

# Load sample data
germancredit = pd.read_csv('data/german_credit.csv')

# Prepare sample data
sample_data = germancredit[["duration.in.month", "credit.amount", "installment.rate.in.percentage.of.disposable.income",
                            "age.in.years", "creditability"]]

# Calculate missing ratios
missing_ratio_result = missing_ratio(sample_data)
print("Missing Ratio:", missing_ratio_result)

# Split data into train and test sets
train, test = train_test_split(sample_data, random_state=123, train_size=0.70)

# Apply WOE transformation
train_woe, test_woe = woe(train, test, target_column='creditability')
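
# For reference, the weight of evidence of a bin is
# WOE = ln(share of goods in the bin / share of bads in the bin).
# A minimal illustrative sketch for one manually binned raw variable, not
# creditpy's internal implementation; it assumes creditability is coded
# 1 = bad, 0 = good:
age_bins = pd.qcut(train["age.in.years"], q=5, duplicates="drop")
bin_stats = train.groupby(age_bins)["creditability"].agg(total="count", bad="sum")
bin_stats["good"] = bin_stats["total"] - bin_stats["bad"]
bin_stats["WOE"] = np.log((bin_stats["good"] / bin_stats["good"].sum())
                          / (bin_stats["bad"] / bin_stats["bad"].sum()))
print("Illustrative WOE per age bin:\n", bin_stats["WOE"])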

# Calculate IV and Gini for the whole dataset
IV_summary = IV_calc_data(train_woe, "creditability")
print("Information Value (IV) Summary:", IV_summary)
gini_summary = Gini_univariate_data(train_woe, "creditability")
print("Univariate Gini Summary:", gini_summary)

# Gini elimination: drop variables whose univariate Gini is below 0.1825
eliminated_data = Gini_elimination(train_woe, "creditability", 0.1825)
print("Data after Gini elimination:\n", eliminated_data)

# Variable clustering
clustering_data = variable_clustering(eliminated_data, "creditability", 2)
print("Variable Clustering Data:", clustering_data)
gini_values = variable_clustering_gini(eliminated_data, "creditability", 2)
print("Gini Values:", gini_values)
# Call the correlation_cluster function
correlation_cluster_result = correlation_cluster(eliminated_data, clustering_data, clusters='Group', target_column="creditability")
print("Correlation Cluster Result:", correlation_cluster_result)

# Logistic regression model
model = max_gini_model(eliminated_data, "creditability", 10)

# Calculate variable weights
variable_weights = woe_glm_feature_importance(eliminated_data, model, "creditability")
print("Variable Weights:", variable_weights)

# Get the columns used for training the model (excluding the target variable)
training_columns = eliminated_data.drop(columns=['creditability']).columns
# Fill missing values with 0 in the training data (for example purposes only)
eliminated_data.fillna(0, inplace=True)
# Generate PD values for train data using aligned columns
train_probs = model.predict_proba(eliminated_data[training_columns])[:, 1]
ms_train_data = pd.concat([eliminated_data[training_columns], pd.Series(train_probs, name="PD", index=eliminated_data.index)], axis=1)

# Align the columns of the test dataset with the training columns;
# .copy() ensures fillna below modifies a real DataFrame, not a slice of test_woe
test_data_aligned = test_woe[training_columns].copy()
# Fill missing values with 0 (for example purposes only)
test_data_aligned.fillna(0, inplace=True)
# Generate PD values for test data using aligned columns
test_probs = model.predict_proba(test_data_aligned)[:, 1]
ms_test_data = pd.concat([test_data_aligned, pd.Series(test_probs, name="PD", index=test_data_aligned.index)], axis=1)
ms_train_data['creditability'] = eliminated_data['creditability']
ms_test_data['creditability'] = test_woe['creditability']

# Bayesian calibration: the Score below is the log-odds of the model PD
ms_train_data["Score"] = np.log(ms_train_data["PD"] / (1 - ms_train_data["PD"]))
ms_test_data["Score"] = np.log(ms_test_data["PD"] / (1 - ms_test_data["PD"]))
master_scale_data = master_scale(ms_train_data, "creditability", "PD", 10)
bayesian_method = bayesian_calibration(master_scale_data, average_score="Score",
                                       calibration_data=ms_train_data,
                                       calibration_data_score="Score",
                                       total_observations="Total.Observations",
                                       PD="PD", central_tendency=0.05)
print("Calibration model:", bayesian_method["Calibration_model"].summary())
print("Calibration formula:", bayesian_method["Calibration_formula"])
print("Master scale data:", bayesian_method["Data"].head())
print("Calibration data:", bayesian_method["Calibration_data"].head())

# Scaled score
scaled_score_data = scaled_score(bayesian_method["Calibration_data"], "calibrated_pd", 3000, 15)
print("Scaled Score Data:", scaled_score_data)

# Calculate VIF
vif_values = vif_calc(eliminated_data)
print("VIF Values:", vif_values)

# Assuming you have predictions and actual values from your model
predictions = ms_test_data['PD']
actual_values = ms_test_data["creditability"]

# Calculate Gini coefficient for the model
gini_value = calculate_gini(predictions, actual_values)
print("Gini Value:", gini_value)

# 5-fold cross-validation
k_fold_result = k_fold_cross_validation_glm(ms_train_data, "creditability", 5, 1)
print("5-Fold Cross-Validation Result:", k_fold_result)

# KS test
ks_result_train = Kolmogorov_Smirnov(ms_train_data, "creditability", "PD")
print("KS Result (Train Data):", ks_result_train)
ks_result_test = Kolmogorov_Smirnov(ms_test_data, "creditability", "PD")
print("KS Result (Test Data):", ks_result_test)

# Variable stability measurement (PSI)
psi_result = PSI_calc_data(train_woe, test_woe, bins=10, default_flag="creditability")
print("PSI Result:", psi_result)

# HHI test
hhi_value = Herfindahl_Hirschman_Index(master_scale_data, "Total.Observations")
print("HHI Value:", hhi_value)

# Anchor point test
anchor_result = Anchor_point(master_scale_data, "PD", "Total.Observations", 0.30)
print("Anchor Point Result:", anchor_result)

# Chi-square test
chisquare_result = chisquare_test(master_scale_data, "PD", "Bad.Count", "Total.Observations", 0.90)
print("Chi-square Test Result:", chisquare_result)

# Binomial test
binomial_result = Binomial_test(master_scale_data, "Total.Observations", "PD", "Bad.Rate", 0.90, "one")
print("Binomial Test Result:", binomial_result)

Bug Fixes

Please report any errors you encounter while using the package via the e-mail address shared in the Author section.

Author

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Built With

