Weight Of Evidence Transformer and LogisticRegression model with scikit-learn API

WOE-Scoring

Monotone Weight Of Evidence (WOE) Transformer and LogisticRegression model with scikit-learn API. Optimized for performance and stability.

Features

WOE Transformation: Convert categorical and numerical features to Weight of Evidence encoding
Automated Feature Selection: Multiple algorithms for optimal feature selection
Automated Feature Generation: Automatically create and select high-quality ratio and interaction features
Binning Strategies: Smart binning with monotonicity constraints
Sklearn Compatibility: Follows scikit-learn's API standards
Performance Optimized: Parallel processing and vectorized operations
SQL Export: Generate SQL for model deployment
Scorecard Generation: Create credit scorecards with customizable scaling

Installation

pip install woe-scoring

Quickstart

Install the package:

pip install woe-scoring

Use WOETransformer:

import pandas as pd
from woe_scoring import WOETransformer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("titanic_data.csv")
train, test = train_test_split(
    df, test_size=0.3, random_state=42, stratify=df["Survived"]
)

special_cols = [
    "PassengerId",
    "Survived",
    "Name",
    "Ticket",
    "Cabin",
]

cat_cols = [
    "Pclass",
    "Sex",
    "SibSp",
    "Parch",
    "Embarked",
]

encoder = WOETransformer(
    max_bins=8,
    min_pct_group=0.1,
    diff_woe_threshold=0.1,
    cat_features=cat_cols,
    special_cols=special_cols,
    n_jobs=-1,
    merge_type="chi2",
    generate_features=True,  # Enable feature generation
    max_generated_features=10,
)

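# Fit bins and WOE values on the training set, then persist them
encoder.fit(train, train["Survived"])
encoder.save_to_file("train_dict.json")

# Optionally reload the saved bins and update their WOE values on new data
encoder.load_woe_iv_dict("train_dict.json")
encoder.refit(train, train["Survived"])

# Encode both splits with the fitted bins
enc_train = encoder.transform(train)
enc_test = encoder.transform(test)

# Fit a logistic regression on the encoded data and score the test set
model = LogisticRegression()
model.fit(enc_train, train["Survived"])
test_proba = model.predict_proba(enc_test)[:, 1]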

Use CreateModel:

import pandas as pd
from woe_scoring import CreateModel
from sklearn.model_selection import train_test_split

df = pd.read_csv("titanic_data.csv")
train, test = train_test_split(
    df, test_size=0.3, random_state=42, stratify=df["Survived"]
)

special_cols = [
    "PassengerId",
    "Survived",
    "Name",
    "Ticket",
    "Cabin",
]

model = CreateModel(
    max_vars=5,
    special_cols=special_cols,
    selection_method="sfs",
    model_type="sklearn",
    gini_threshold=5.0,
    n_jobs=-1,
    random_state=42,
    class_weight="balanced",
    cv=3,
)
model.fit(train, train["Survived"])
test_proba = model.predict_proba(test[model.feature_names_])

print(model.coef_, model.intercept_)
print(model.feature_names_)

Detailed Documentation

WOETransformer

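The WOETransformer converts categorical and numerical features into Weight of Evidence (WOE) values. For each bin of a feature, WOE is the logarithm of the ratio between the bin's share of non-events and its share of events (some references use the inverse ratio), so it expresses how strongly that bin separates the two classes.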
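As an illustration of the quantity being computed, here is a minimal pandas sketch of per-bin WOE and Information Value (IV). It does not use woe_scoring; the helper name woe_iv_table and the sign convention ln(non-event share / event share) are assumptions made for this example, and the library may use the opposite sign.

```python
import numpy as np
import pandas as pd


def woe_iv_table(bins: pd.Series, target: pd.Series) -> pd.DataFrame:
    """Per-bin WOE and IV contribution for a binary target (1 = event)."""
    # Rows are bins, columns are the target classes 0 and 1.
    counts = pd.crosstab(bins, target).reindex(columns=[0, 1], fill_value=0)
    non_event_share = counts[0] / counts[0].sum()
    event_share = counts[1] / counts[1].sum()
    # Assumed convention: positive WOE means the bin holds fewer events than average.
    woe = np.log(non_event_share / event_share)
    iv = (non_event_share - event_share) * woe
    return pd.DataFrame({"woe": woe, "iv": iv})


bins = pd.Series(["a", "a", "a", "a", "b", "b", "b", "b"])
target = pd.Series([0, 0, 0, 1, 0, 1, 1, 1])
table = woe_iv_table(bins, target)
print(table)
print("IV:", table["iv"].sum())
```

A bin with no events or no non-events gives an infinite WOE in this sketch, which is one reason binning tools enforce a minimum bin size.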

WOETransformer(
    max_bins=10,               # Maximum number of bins for each feature
    min_pct_group=0.05,        # Minimum share of observations in each bin (0.05 = 5%)
    n_jobs=1,                  # Number of parallel jobs
    prefix="WOE_",             # Prefix for transformed features
    merge_type="chi2",         # Bin merging strategy ('chi2', 'woe', 'monotonic')
    cat_features=None,         # List of categorical features
    special_cols=None,         # Columns to exclude from transformation
    cat_features_threshold=0,  # Threshold for auto-identifying categorical features
    diff_woe_threshold=0.05,   # Minimum WOE difference between bins
    safe_original_data=False,  # Whether to keep original features
    generate_features=False,   # Whether to generate new features
    max_generated_features=20  # Max number of generated features to select
)
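To make the role of diff_woe_threshold more concrete, the following sketch merges adjacent bins whose WOE values differ by less than a threshold. It is an illustration only, not the library's algorithm: the function names, the greedy "merge the closest pair first" rule, and the input format (ordered pairs of non-event and event counts) are assumptions.

```python
import math


def bin_woe(non_events, events, total_non_events, total_events):
    """WOE of one bin, assuming ln(non-event share / event share)."""
    return math.log((non_events / total_non_events) / (events / total_events))


def merge_similar_bins(bins, diff_woe_threshold=0.05):
    """Greedily merge adjacent bins until all neighbours differ enough in WOE.

    bins: ordered list of (non_events, events) counts, each count > 0.
    """
    bins = [tuple(b) for b in bins]
    total_non = sum(b[0] for b in bins)
    total_ev = sum(b[1] for b in bins)
    while len(bins) > 1:
        woes = [bin_woe(n, e, total_non, total_ev) for n, e in bins]
        diffs = [abs(woes[i + 1] - woes[i]) for i in range(len(woes) - 1)]
        i = min(range(len(diffs)), key=diffs.__getitem__)
        if diffs[i] >= diff_woe_threshold:
            break  # every neighbouring pair is already distinct enough
        merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
        bins[i:i + 2] = [merged]
    return bins


merged = merge_similar_bins(
    [(90, 10), (88, 12), (60, 40), (30, 70)], diff_woe_threshold=0.3
)
print(merged)
```

In this example the first two bins have nearly the same event rate, so they are combined, while the remaining bins stay separate.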

Key Methods

fit(data, target): Calculates optimal bins and WOE values
transform(data): Converts features to WOE values
save_to_file(path): Saves binning information to a JSON file
load_woe_iv_dict(path): Loads binning information from a JSON file
refit(data, target): Updates WOE values for existing bins with new data

CreateModel

The CreateModel class combines feature selection, model training, and model evaluation:

CreateModel(
    selection_method='rfe',    # Feature selection method ('rfe', 'sfs', 'iv')
    model_type='sklearn',      # Model implementation ('sklearn', 'statsmodel')
    max_vars=None,             # Maximum number of features to select
    special_cols=None,         # Columns to include as-is
    unused_cols=None,          # Columns to exclude
    n_jobs=1,                  # Number of parallel jobs
    gini_threshold=5.0,        # Minimum Gini score to keep a feature
    iv_threshold=0.05,         # Minimum IV threshold for feature selection
    corr_threshold=0.5,        # Correlation threshold for feature selection
    min_pct_group=0.05,        # Minimum share of observations in each group (0.05 = 5%)
    random_state=None,         # Random seed for reproducibility
    class_weight='balanced',   # Class weighting strategy
    direction='forward',       # Direction for sequential feature selection
    cv=3,                      # Cross-validation folds
    l1_exp_scale=4,            # Exponent scale for L1 regularization
    l1_grid_size=20,           # Grid size for L1 regularization search
    scoring='roc_auc'          # Performance metric
)

Key Methods

fit(data, target): Selects features and fits model
predict(data): Makes binary predictions
predict_proba(data): Returns probability predictions
save_reports(path): Saves model reports
generate_sql(encoder): Generates SQL for model deployment
save_scorecard(encoder, path, ...): Creates credit scorecard
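The gini_threshold parameter filters features by their Gini score. As a rough sketch of what such a filter might look like, the example below computes Gini as 100 * (2 * AUC - 1) and keeps features at or above the threshold. The percent scale is an assumption inferred from the default value of 5.0, and gini_percent and keep_by_gini are hypothetical helpers, not part of woe_scoring.

```python
import numpy as np


def gini_percent(y_true, scores) -> float:
    """Gini in percent, computed as 100 * (2 * AUC - 1)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # AUC = probability that a random event outranks a random non-event
    # (ties count as one half).
    diff = pos[:, None] - neg[None, :]
    auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
    return float(100.0 * (2.0 * auc - 1.0))


def keep_by_gini(features: dict, y_true, gini_threshold: float = 5.0) -> list:
    """Return the names of features whose absolute Gini reaches the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(gini_percent(y_true, values)) >= gini_threshold
    ]


y = [0, 0, 1, 1]
features = {
    "strong": [0.1, 0.4, 0.35, 0.8],  # separates the classes in 3 of 4 pairs
    "noise": [0.5, 0.5, 0.5, 0.5],    # constant, carries no signal
}
kept = keep_by_gini(features, y)
print(kept)
```

The pairwise comparison is quadratic in memory and is used here only for clarity; on real data a rank-based AUC would be the more practical choice.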

Advanced Usage

Automated Feature Generation

WOE-Scoring can automatically generate and select high-quality features from your data:

encoder = WOETransformer(
    generate_features=True,    # Enable feature generation
    max_generated_features=20, # Select top 20 new features
    n_jobs=-1
)
encoder.fit(X, y)

This process:

Creates ratio features from all pairs of numeric columns
Calculates statistical aggregations (mean) for numeric columns grouped by categorical columns
Calculates the Gini score for all new features
Selects the top max_generated_features
Adds them to the dataset and proceeds with WOE binning
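The steps above can be sketched in plain pandas as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the function names, the generated column names, the median fill for missing ratios, and ranking by absolute Gini are all choices made for this example.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def generate_candidates(df, numeric_cols, cat_cols):
    """Ratio features for numeric pairs plus group means per category."""
    out = {}
    for a, b in combinations(numeric_cols, 2):
        # Avoid division by zero by treating a zero denominator as missing.
        out[f"{a}_div_{b}"] = df[a] / df[b].replace(0, np.nan)
    for c in cat_cols:
        for n in numeric_cols:
            out[f"{n}_mean_by_{c}"] = df.groupby(c)[n].transform("mean")
    return pd.DataFrame(out, index=df.index)


def abs_gini(y, x):
    """Absolute Gini (|2 * AUC - 1|) of one candidate feature."""
    x = x.fillna(x.median()).to_numpy(dtype=float)
    y = np.asarray(y)
    diff = x[y == 1][:, None] - x[y == 0][None, :]
    auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
    return abs(2 * auc - 1)


def top_generated_features(df, y, numeric_cols, cat_cols, max_generated_features=20):
    """Score every candidate and keep the best max_generated_features."""
    candidates = generate_candidates(df, numeric_cols, cat_cols)
    ranking = candidates.apply(lambda col: abs_gini(y, col))
    ranking = ranking.sort_values(ascending=False)
    return candidates[ranking.index[:max_generated_features]]


df = pd.DataFrame(
    {
        "income": [10, 20, 30, 40, 50, 60],
        "debt": [9, 10, 12, 8, 5, 6],
        "region": list("aabbab"),
    }
)
y = pd.Series([1, 1, 1, 0, 0, 0])
selected = top_generated_features(
    df, y, ["income", "debt"], ["region"], max_generated_features=1
)
print(list(selected.columns))
```

In this toy data the income-to-debt ratio separates the two classes perfectly, so it ranks ahead of the group-mean features.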

Generating SQL for Deployment

# First fit the WOE transformer and model
encoder = WOETransformer()
encoder.fit(train, train["target"])
train_woe = encoder.transform(train)

model = CreateModel()
model.fit(train_woe, train["target"])

# Generate SQL query for scoring
sql_query = model.generate_sql(encoder)
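The exact SQL produced by generate_sql is not shown here. As a rough idea of what WOE scoring in SQL usually involves, the sketch below builds a CASE expression that maps numeric ranges to WOE values. The helper woe_case_sql, its arguments, and the output layout are hypothetical; only the WOE_ prefix follows the default documented for WOETransformer.

```python
from typing import Sequence


def woe_case_sql(column: str, edges: Sequence[float], woes: Sequence[float]) -> str:
    """Build a SQL CASE expression mapping numeric ranges to WOE values.

    edges holds the upper bound of every bin except the last one,
    so len(woes) must equal len(edges) + 1.
    """
    if len(woes) != len(edges) + 1:
        raise ValueError("expected one more WOE value than bin edges")
    lines = ["CASE"]
    for upper, woe in zip(edges, woes):
        lines.append(f"  WHEN {column} < {upper} THEN {woe}")
    # Values above the last edge (and NULLs, which fail every comparison)
    # fall through to the final bin in this simplified sketch.
    lines.append(f"  ELSE {woes[-1]}")
    lines.append(f"END AS WOE_{column}")
    return "\n".join(lines)


sql = woe_case_sql("age", [25, 40], [-0.5, 0.1, 0.6])
print(sql)
```

A production query would normally handle missing values with an explicit branch rather than letting them fall into the last bin.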

Creating a Scorecard

# Save a credit scorecard to Excel
model.save_scorecard(
    encoder=encoder,
    path="output_dir",
    base_scorecard_points=600,  # Base score
    odds=50,                    # Base odds
    points_to_double_odds=20    # Points to double the odds
)
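The three scaling arguments correspond to the usual scorecard scaling, in which a score is a linear function of log-odds. The sketch below shows that standard calculation; whether save_scorecard uses exactly this convention (for example, odds of good versus bad) is an assumption, and the function names are illustrative.

```python
import math


def scorecard_scaling(base_scorecard_points=600, odds=50, points_to_double_odds=20):
    """Return (factor, offset) so that score = offset + factor * ln(odds)."""
    factor = points_to_double_odds / math.log(2)
    offset = base_scorecard_points - factor * math.log(odds)
    return factor, offset


def score_from_odds(good_bad_odds, factor, offset):
    """Score for a given good:bad odds value."""
    return offset + factor * math.log(good_bad_odds)


factor, offset = scorecard_scaling()
score_at_base = score_from_odds(50, factor, offset)      # the base score
score_at_double = score_from_odds(100, factor, offset)   # base score + 20 points
print(round(score_at_base, 6), round(score_at_double, 6))
```

With these settings, odds of 50:1 map to 600 points and every doubling of the odds adds 20 points.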

Customizing Binning for Categorical Features

# Specify categorical features and their treatment
encoder = WOETransformer(
    cat_features=["education", "marital_status", "occupation"],
    max_bins=5,                 # Maximum number of bins per feature
    diff_woe_threshold=0.1,     # Merge bins with similar WOE values
    min_pct_group=0.05          # Minimum population percentage per bin
)

Performance Optimization

The library is optimized for performance with:

Vectorized operations for fast transformation
Parallel processing for binning and feature selection
Efficient memory usage for large datasets
Optimized algorithms for binning and feature selection
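As an example of what a vectorized transformation can look like, the sketch below maps a whole numeric column to WOE values with a single NumPy lookup instead of a Python loop over rows. It is not taken from the library; the function name and the left-closed bin convention are assumptions.

```python
import numpy as np


def woe_transform_vectorized(values, edges, woes):
    """Map each value to the WOE of its bin in one vectorized lookup.

    edges are the sorted upper bounds of all bins except the last;
    a value equal to an edge is assigned to the bin above it.
    """
    idx = np.searchsorted(np.asarray(edges), np.asarray(values), side="right")
    return np.asarray(woes)[idx]


ages = np.array([18, 25, 39, 70])
encoded = woe_transform_vectorized(ages, edges=[25, 40], woes=[-0.5, 0.1, 0.6])
print(encoded)
```

Because np.searchsorted uses binary search, the cost grows with the logarithm of the number of bins for each value.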

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

2.1.0 (this version): Feb 1, 2026
2.0.0: Feb 1, 2026
1.1.0: Feb 1, 2026
1.0.4: Feb 1, 2026
1.0.3: Feb 1, 2026
1.0.2: Jun 18, 2025
0.10.8: May 17, 2023
0.10.7: May 16, 2023
0.10.6: May 3, 2023
0.10.5: May 3, 2023
0.10.4: May 3, 2023
0.10.3: Apr 27, 2023
0.10.1: Apr 25, 2023
0.10.0: Apr 25, 2023
0.9.0: Apr 20, 2023
0.8.0: Apr 19, 2023
0.7.13: Oct 21, 2022
0.7.12: Oct 19, 2022
0.7.11: Oct 19, 2022
0.7.10: Oct 19, 2022
0.7.9: Oct 19, 2022
0.7.8: Oct 19, 2022
0.7.7: Sep 17, 2022
0.7.6: Apr 29, 2022
0.7.5: Apr 5, 2022
0.7.4: Apr 4, 2022
0.7.3: Apr 4, 2022
0.7.2: Apr 4, 2022
0.6.1: Mar 31, 2022
0.6.0: Mar 31, 2022
0.5.4: Mar 29, 2022
0.5.3: Mar 29, 2022
0.5.2: Mar 29, 2022
0.5.1: Mar 29, 2022
0.4.0: Mar 21, 2022
0.3.4: Oct 20, 2021
0.3.3: Oct 18, 2021
0.3.2: Oct 18, 2021
0.3.1: Sep 29, 2021
0.2.3: May 6, 2021
0.2.2: Apr 16, 2021
0.2.1: Apr 12, 2021
0.1.5: Apr 7, 2021

Download files

Source Distribution

woe_scoring-2.1.0.tar.gz (30.9 kB)

Upload date: Feb 1, 2026
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.5 CPython/3.10.19 Linux/6.11.0-1018-azure

Hashes for woe_scoring-2.1.0.tar.gz
Algorithm	Hash digest
SHA256	`17e9f88e73ff69f7c37907db88f5fcea61dd7a6e4f5402f7356557c29cc7809f`
MD5	`61cdafb4c3173a355478d2c0fdf0ed12`
BLAKE2b-256	`41f78648adc8d47688f1fe32b641e9f75a6dc466b33be6296e88476df6748ede`

Built Distribution

woe_scoring-2.1.0-py3-none-any.whl (33.0 kB)

Upload date: Feb 1, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.5 CPython/3.10.19 Linux/6.11.0-1018-azure

Hashes for woe_scoring-2.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ab457f8730f532e04eb9b27a8af4738e22ce791580d95d4349c903587b356173`
MD5	`2db85b1d748af9e037f7c26beed145a3`
BLAKE2b-256	`e6a47f083316c0fe9258f6b26004b1d4f48cde9874174f56a4debdfb5dc55fa0`
