Weight Of Evidence Transformer and LogisticRegression model with scikit-learn API
WOE-Scoring
Monotone Weight Of Evidence (WOE) Transformer and LogisticRegression model with scikit-learn API. Optimized for performance and stability.
Features
- WOE Transformation: Convert categorical and numerical features to Weight of Evidence encoding
- Automated Feature Selection: Several selection methods (RFE, sequential feature selection, IV-based filtering)
- Binning Strategies: Binning with monotonicity constraints and configurable bin-merging strategies (chi2, woe, monotonic)
- Sklearn Compatibility: Follows scikit-learn's API standards
- Performance Optimized: Parallel processing and vectorized operations
- SQL Export: Generate SQL for model deployment
- Scorecard Generation: Create credit scorecards with customizable scaling
Installation
pip install woe-scoring
Quickstart
- Install the package:
pip install woe-scoring
- Use WOETransformer:
import pandas as pd
from woe_scoring import WOETransformer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("titanic_data.csv")
train, test = train_test_split(
    df, test_size=0.3, random_state=42, stratify=df["Survived"]
)
special_cols = [
    "PassengerId",
    "Survived",
    "Name",
    "Ticket",
    "Cabin",
]
cat_cols = [
    "Pclass",
    "Sex",
    "SibSp",
    "Parch",
    "Embarked",
]
encoder = WOETransformer(
    max_bins=8,
    min_pct_group=0.1,
    diff_woe_threshold=0.1,
    cat_features=cat_cols,
    special_cols=special_cols,
    n_jobs=-1,
    merge_type="chi2",
)
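encoder.fit(train, train["Survived"])
# Persist the fitted bins and WOE values, then load them back
encoder.save_to_file("train_dict.json")
encoder.load_woe_iv_dict("train_dict.json")
# Recompute WOE values for the existing bins (here on the same training data)
encoder.refit(train, train["Survived"])
enc_train = encoder.transform(train)
enc_test = encoder.transform(test)
# special_cols (including the target) are excluded from transformation,
# so use only the WOE_-prefixed columns as model inputs
woe_cols = [col for col in enc_train.columns if col.startswith("WOE_")]
model = LogisticRegression()
model.fit(enc_train[woe_cols], train["Survived"])
test_proba = model.predict_proba(enc_test[woe_cols])[:, 1]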
- Use CreateModel:
import pandas as pd
from woe_scoring import CreateModel
from sklearn.model_selection import train_test_split
df = pd.read_csv("titanic_data.csv")
train, test = train_test_split(
    df, test_size=0.3, random_state=42, stratify=df["Survived"]
)
special_cols = [
    "PassengerId",
    "Survived",
    "Name",
    "Ticket",
    "Cabin",
]
model = CreateModel(
    max_vars=5,
    special_cols=special_cols,
    selection_method="sfs",
    model_type="sklearn",
    gini_threshold=5.0,
    n_jobs=-1,
    random_state=42,
    class_weight="balanced",
    cv=3,
)
model.fit(train, train["Survived"])
test_proba = model.predict_proba(test[model.feature_names_])
print(model.coef_, model.intercept_)
print(model.feature_names_)
Detailed Documentation
WOETransformer
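The WOETransformer converts categorical and numerical features into Weight of Evidence (WOE) values. For each bin of a feature, WOE is the natural logarithm of the ratio between the share of non-events and the share of events that fall into that bin (some references use the inverse ratio, which flips the sign). The Information Value (IV), the sum over bins of (share of non-events - share of events) multiplied by WOE, summarizes how well the binned feature separates events from non-events.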
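To make the definition concrete, here is a minimal pandas sketch that computes per-bin WOE and the total IV for an already-binned feature. It assumes the ln(non-events / events) convention and adds a smoothing constant (eps, an assumption of this sketch) so that bins without events or non-events do not produce infinite values; WOETransformer's own conventions may differ.

```python
import numpy as np
import pandas as pd


def woe_iv_table(bins: pd.Series, target: pd.Series, eps: float = 0.5) -> pd.DataFrame:
    """Per-bin WOE and IV contribution for a binary target (1 = event).

    eps is an additive smoothing constant (an assumption) that keeps the log
    finite when a bin contains no events or no non-events.
    """
    df = pd.DataFrame({"bin": bins, "y": target})
    grouped = df.groupby("bin")["y"].agg(events="sum", total="count")
    grouped["non_events"] = grouped["total"] - grouped["events"]
    k = len(grouped)
    # Share of all events / all non-events that falls into each bin
    dist_event = (grouped["events"] + eps) / (grouped["events"].sum() + eps * k)
    dist_non_event = (grouped["non_events"] + eps) / (grouped["non_events"].sum() + eps * k)
    grouped["woe"] = np.log(dist_non_event / dist_event)
    # Each IV term is non-negative: the difference and the log share a sign
    grouped["iv"] = (dist_non_event - dist_event) * grouped["woe"]
    return grouped


x = pd.Series(["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"])
y = pd.Series([1, 1, 0, 1, 0, 0, 0, 0, 0, 1])
table = woe_iv_table(x, y)
print(table[["events", "non_events", "woe"]])
print("IV:", table["iv"].sum())
```

Bins with a higher event rate get a lower (negative) WOE under this convention.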
WOETransformer(
    max_bins=10,  # Maximum number of bins for each feature
    min_pct_group=0.05,  # Minimum percentage of each bin
    n_jobs=1,  # Number of parallel jobs
    prefix="WOE_",  # Prefix for transformed features
    merge_type="chi2",  # Bin merging strategy ('chi2', 'woe', 'monotonic')
    cat_features=None,  # List of categorical features
    special_cols=None,  # Columns to exclude from transformation
    cat_features_threshold=0,  # Threshold for auto-identifying categorical features
    diff_woe_threshold=0.05,  # Minimum WOE difference between bins
    safe_original_data=False  # Whether to keep original features
)
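The min_pct_group and diff_woe_threshold parameters describe constraints on the final bins. The sketch below illustrates one way such constraints could be enforced: greedily merging adjacent bins until every bin holds enough rows and neighboring WOE values are far enough apart. The merge order, the smoothing constant and the helper names are assumptions for illustration; this is not WOETransformer's chi2, woe or monotonic merge algorithm.

```python
import numpy as np


def woe(events, non_events, eps=0.5):
    """Per-bin WOE = ln(share of non-events / share of events), additively smoothed."""
    events = np.asarray(events, dtype=float)
    non_events = np.asarray(non_events, dtype=float)
    k = len(events)
    dist_e = (events + eps) / (events.sum() + eps * k)
    dist_n = (non_events + eps) / (non_events.sum() + eps * k)
    return np.log(dist_n / dist_e)


def merge_bins(events, non_events, min_pct_group=0.05, diff_woe_threshold=0.05):
    """Greedily merge adjacent bins (hypothetical helper, not the library's algorithm)."""
    events, non_events = list(events), list(non_events)
    total = sum(events) + sum(non_events)
    while len(events) > 1:
        w = woe(events, non_events)
        diffs = np.abs(np.diff(w))  # diffs[i] = |WOE[i + 1] - WOE[i]|
        shares = (np.array(events) + np.array(non_events)) / total
        small = np.flatnonzero(shares < min_pct_group)
        if small.size:
            # Rule 1: an undersized bin joins the neighbor with the closer WOE
            i = int(small[0])
            if i == 0:
                j = 0
            elif i == len(events) - 1:
                j = i - 1
            else:
                j = i - 1 if diffs[i - 1] <= diffs[i] else i
        elif diffs.min() < diff_woe_threshold:
            # Rule 2: merge the adjacent pair whose WOE values are closest
            j = int(diffs.argmin())
        else:
            break  # both constraints hold
        # Merge bins j and j + 1 into a single bin
        events[j : j + 2] = [events[j] + events[j + 1]]
        non_events[j : j + 2] = [non_events[j] + non_events[j + 1]]
    return events, non_events


ev, ne = merge_bins([5, 1, 20, 22, 30], [50, 2, 40, 45, 20])
print(ev, ne)
```

Here the second bin holds only about 1% of the rows, so it is absorbed by its closer neighbor, and a second pass merges two bins with nearly identical WOE.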
Key Methods
- fit(data, target): Calculates optimal bins and WOE values
- transform(data): Converts features to WOE values
- save_to_file(path): Saves binning information to a JSON file
- load_woe_iv_dict(path): Loads binning information from a JSON file
- refit(data, target): Updates WOE values for existing bins with new data
CreateModel
The CreateModel class combines feature selection, model training, and model evaluation:
CreateModel(
    selection_method='rfe',  # Feature selection method ('rfe', 'sfs', 'iv')
    model_type='sklearn',  # Model implementation ('sklearn', 'statsmodel')
    max_vars=None,  # Maximum number of features to select
    special_cols=None,  # Columns to include as-is
    unused_cols=None,  # Columns to exclude
    n_jobs=1,  # Number of parallel jobs
    gini_threshold=5.0,  # Minimum Gini score to keep a feature
    iv_threshold=0.05,  # Minimum IV threshold for feature selection
    corr_threshold=0.5,  # Correlation threshold for feature selection
    min_pct_group=0.05,  # Minimum percentage for each group
    random_state=None,  # Random seed for reproducibility
    class_weight='balanced',  # Class weighting strategy
    direction='forward',  # Direction for sequential feature selection
    cv=3,  # Cross-validation folds
    l1_exp_scale=4,  # Exponent scale for L1 regularization
    l1_grid_size=20,  # Grid size for L1 regularization search
    scoring='roc_auc'  # Performance metric
)
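Two of the thresholds above can be illustrated with plain scikit-learn and pandas. The sketch below assumes that gini_threshold is expressed in percent (Gini = (2 * AUC - 1) * 100) and that features are first screened by absolute univariate Gini and then greedily de-duplicated with corr_threshold. Both assumptions, and the helper names, are hypothetical and may not match CreateModel's internal order of steps.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score


def univariate_gini(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Gini = (2 * AUC - 1) * 100 for each column, i.e. in percent (assumed scale)."""
    return X.apply(lambda col: (2 * roc_auc_score(y, col) - 1) * 100)


def filter_features(X, y, gini_threshold=5.0, corr_threshold=0.5):
    """Hypothetical Gini screen followed by a greedy correlation filter."""
    # The sign of Gini depends on the WOE sign convention, so compare absolute values
    gini = univariate_gini(X, y).abs()
    candidates = gini[gini >= gini_threshold].sort_values(ascending=False).index
    corr = X.corr().abs()
    selected = []
    for name in candidates:  # strongest feature first
        # Skip a feature that is too correlated with one already kept
        if all(corr.loc[name, kept] < corr_threshold for kept in selected):
            selected.append(name)
    return selected


rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
X = pd.DataFrame({
    "strong": signal,
    "strong_copy": signal + rng.normal(scale=0.1, size=n),
    "noise": rng.normal(size=n),
})
y = pd.Series((signal + rng.normal(size=n) > 0).astype(int))
selected = filter_features(X, y)
print(selected)
```

The near-duplicate column passes the Gini screen but is dropped by the correlation filter, because it is almost perfectly correlated with the feature kept first.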
Key Methods
- fit(data, target): Selects features and fits model
- predict(data): Makes binary predictions
- predict_proba(data): Returns probability predictions
- save_reports(path): Saves model reports
- generate_sql(encoder): Generates SQL for model deployment
- save_scorecard(encoder, path, ...): Creates credit scorecard
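Conceptually, selection_method='sfs' with direction, cv, scoring and max_vars corresponds to scikit-learn's SequentialFeatureSelector. The sketch below uses that class directly on synthetic WOE-like columns (the data and column names are made up) to show what forward selection does. It is an analogue for illustration, not CreateModel's implementation.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame(rng.normal(size=(n, 6)), columns=[f"WOE_x{i}" for i in range(6)])
# Only the first two columns carry signal about the target
logit = 1.5 * X["WOE_x0"] - 1.0 * X["WOE_x1"]
p = 1 / (1 + np.exp(-logit.to_numpy()))
y = (rng.random(n) < p).astype(int)

sfs = SequentialFeatureSelector(
    LogisticRegression(class_weight="balanced"),
    n_features_to_select=2,  # plays the role of max_vars
    direction="forward",  # add one feature at a time
    scoring="roc_auc",  # cross-validated metric used to rank candidates
    cv=3,
    n_jobs=1,
)
sfs.fit(X, y)
selected = list(X.columns[sfs.get_support()])

# Refit the final model on the selected columns only
model = LogisticRegression(class_weight="balanced").fit(X[selected], y)
print(selected, model.coef_, model.intercept_)
```

Forward selection starts from an empty set and, at each step, adds the column that most improves the cross-validated ROC AUC.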
Advanced Usage
Generating SQL for Deployment
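# Assumes `train` is a pandas DataFrame with a binary "target" column
from woe_scoring import CreateModel, WOETransformer

# First fit the WOE transformer (excluding the target) and the model
encoder = WOETransformer(special_cols=["target"])
encoder.fit(train, train["target"])
train_woe = encoder.transform(train)
woe_cols = [col for col in train_woe.columns if col.startswith("WOE_")]
model = CreateModel()
model.fit(train_woe[woe_cols], train["target"])
# Generate SQL query for scoring
sql_query = model.generate_sql(encoder)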
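The exact SQL emitted by generate_sql is not shown here. As a rough idea of what scoring SQL for a WOE logistic model can look like, the hypothetical helper below turns per-feature bin boundaries and WOE values into CASE WHEN expressions and wraps the linear score in the logistic function. The bins dictionary layout, the table name and the right-closed bin convention are all assumptions.

```python
import math


def scoring_sql(bins, coefs, intercept, table="input_table"):
    """Hypothetical SQL builder for a WOE logistic regression.

    bins maps feature -> list of (upper_bound, woe) pairs sorted by upper_bound;
    the last upper_bound should be math.inf so it becomes the ELSE branch
    (which also catches NULLs).
    """
    terms = []
    for feature, edges in bins.items():
        cases = []
        for upper, woe in edges:
            if math.isinf(upper):
                cases.append(f"ELSE {woe!r}")
            else:
                cases.append(f"WHEN {feature} <= {upper!r} THEN {woe!r}")
        case_sql = "CASE " + " ".join(cases) + " END"
        terms.append(f"{coefs[feature]!r} * ({case_sql})")
    linear = " + ".join([repr(intercept)] + terms)
    # Logistic function turns the linear score into a probability
    return f"SELECT 1 / (1 + EXP(-({linear}))) AS score FROM {table}"


bins = {"age": [(30.0, -0.4), (50.0, 0.1), (math.inf, 0.6)]}
sql = scoring_sql(bins, {"age": -0.9}, 0.2, table="applicants")
print(sql)
```

Keeping the WOE lookup inside the query means the database only needs the raw columns to score new rows.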
Creating a Scorecard
# Save a credit scorecard to Excel
model.save_scorecard(
    encoder=encoder,
    path="output_dir",
    base_scorecard_points=600,  # Base score
    odds=50,  # Base odds
    points_to_double_odds=20,  # Points to double the odds
)
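These three scaling parameters presumably follow the usual points-to-double-the-odds convention: factor = points_to_double_odds / ln(2) and offset = base_scorecard_points - factor * ln(odds). A score of base_scorecard_points then corresponds to the base odds, and every points_to_double_odds additional points doubles them. The sketch below assumes the odds are non-event:event odds and that the model predicts the event, and it shows how the score can be split into per-feature points. save_scorecard may round or allocate points differently.

```python
import math


def scaling(base_points=600.0, odds=50.0, pdo=20.0):
    """Return (factor, offset) such that score = offset + factor * ln(odds)."""
    factor = pdo / math.log(2)
    offset = base_points - factor * math.log(odds)
    return factor, offset


def score_from_log_odds(log_odds, factor, offset):
    return offset + factor * log_odds


def attribute_points(coefs, woes, intercept, factor, offset):
    """Split the score across features, sharing intercept and offset equally.

    Assumes the model predicts the event, so
    ln(non-event odds) = -(intercept + sum(coef * woe)).
    """
    n = len(coefs)
    return [-factor * (c * w + intercept / n) + offset / n for c, w in zip(coefs, woes)]


factor, offset = scaling(base_points=600, odds=50, pdo=20)
coefs, woes, intercept = [-0.8, -1.1], [0.3, -0.5], -0.4
points = attribute_points(coefs, woes, intercept, factor, offset)
log_odds_good = -(intercept + sum(c * w for c, w in zip(coefs, woes)))
total = score_from_log_odds(log_odds_good, factor, offset)
print(points, total)
```

Because the per-feature points add up to the full score, each row of a scorecard can be read independently.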
Customizing Binning for Categorical Features
from woe_scoring import WOETransformer

# Specify categorical features and their treatment
encoder = WOETransformer(
    cat_features=["education", "marital_status", "occupation"],
    max_bins=5,  # Maximum number of bins per feature
    diff_woe_threshold=0.1,  # Merge bins with similar WOE values
    min_pct_group=0.05,  # Minimum population percentage per bin
)
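How WOETransformer groups categories internally is not described here. One common approach, assumed for this sketch, is to pool categories rarer than min_pct_group into a single group before computing WOE. The helper names and the "OTHER" label are hypothetical, and the sketch does not enforce max_bins or diff_woe_threshold.

```python
import numpy as np
import pandas as pd


def group_rare_categories(values: pd.Series, min_pct_group=0.05, other="OTHER"):
    """Replace categories rarer than min_pct_group with a single pooled label."""
    shares = values.value_counts(normalize=True)
    rare = shares[shares < min_pct_group].index
    return values.where(~values.isin(rare), other)


def category_woe(values, target, eps=0.5):
    """WOE per category, ln(share of non-events / share of events), smoothed."""
    stats = pd.DataFrame({"cat": values, "y": target}).groupby("cat")["y"].agg(["sum", "count"])
    events = stats["sum"]
    non_events = stats["count"] - stats["sum"]
    k = len(stats)
    dist_e = (events + eps) / (events.sum() + eps * k)
    dist_n = (non_events + eps) / (non_events.sum() + eps * k)
    # Sorting by WOE orders categories from highest to lowest event rate
    return np.log(dist_n / dist_e).sort_values()


education = pd.Series(["school"] * 50 + ["college"] * 45 + ["phd"] * 3 + ["none"] * 2)
target = pd.Series([1] * 20 + [0] * 30 + [1] * 10 + [0] * 35 + [0] * 3 + [1] * 2)
grouped = group_rare_categories(education, min_pct_group=0.05)
woe = category_woe(grouped, target)
print(woe)
```

Pooling first keeps tiny categories from producing unstable WOE values.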
Performance Optimization
The library is optimized for performance with:
- Vectorized operations for fast transformation
- Parallel processing for binning and feature selection (controlled by the n_jobs parameter)
- Efficient memory usage for large datasets
- Optimized algorithms for binning and feature selection
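The first two points can be illustrated together. Values are mapped to bins with a single vectorized numpy.searchsorted call instead of a Python loop, and columns are processed in parallel with joblib. The bins layout, the right-closed (a, b] bin convention and the use of threads are assumptions of this sketch, not a description of the library's internals.

```python
import numpy as np
import pandas as pd
from joblib import Parallel, delayed


def transform_column(values: np.ndarray, edges: np.ndarray, woes: np.ndarray) -> np.ndarray:
    """Map each value to the WOE of its bin in one vectorized call.

    edges holds the inner cut points in ascending order (len(woes) - 1 of them);
    side="left" makes bins right-closed: (-inf, e0], (e0, e1], ..., (e_last, inf).
    """
    idx = np.searchsorted(edges, values, side="left")
    return woes[idx]


def transform_frame(df: pd.DataFrame, bins: dict, prefix="WOE_", n_jobs=-1) -> pd.DataFrame:
    """Transform every column listed in bins, one parallel job per column."""
    columns = list(bins)
    # Threads avoid pickling each column into separate worker processes
    results = Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(transform_column)(df[col].to_numpy(dtype=float), *bins[col]) for col in columns
    )
    return pd.DataFrame({prefix + col: res for col, res in zip(columns, results)}, index=df.index)


df = pd.DataFrame({"age": [22, 35, 58, 41], "fare": [7.25, 71.3, 8.05, 30.0]})
bins = {
    "age": (np.array([30.0, 50.0]), np.array([-0.4, 0.1, 0.6])),
    "fare": (np.array([10.0]), np.array([-0.7, 0.5])),
}
out = transform_frame(df, bins, n_jobs=2)
print(out)
```

Missing values are not handled specially in this sketch.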
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file woe_scoring-1.1.0.tar.gz.
File metadata
- Download URL: woe_scoring-1.1.0.tar.gz
- Upload date:
- Size: 29.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.13.0 Darwin/25.2.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11b39f9b5dde7a391faabcaecd780ac0e3960708da4f3c7c1f0c2aa4a8b22db7 |
| MD5 | 06004b7ee95a0ca2200fce3d86b6b30b |
| BLAKE2b-256 | e4842404282b8f381f2fc4d114de67d8be94365738fd013804517b200dfd539d |
File details
Details for the file woe_scoring-1.1.0-py3-none-any.whl.
File metadata
- Download URL: woe_scoring-1.1.0-py3-none-any.whl
- Upload date:
- Size: 31.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.13.0 Darwin/25.2.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1529f60b26ca39eaf42658a1630aea0cf224e80fda619f58961ad2c87920da6e |
| MD5 | bf1b4d702f9621c064219ae47684d9ec |
| BLAKE2b-256 | 8687fee43d4ef559a3dd285102a3e52165bfc669e41be4922b4caa709be28b7f |