Automatic Discretization of Features with Optimal Target Association
Project description
AutoCarver automates supervised feature discretization (binning). It chooses bins that maximize the statistical association with your target, measured by Tschuprow's T or Cramér's V, and validates the chosen bins against a held-out dev set. It supports binary classification, multiclass classification, and regression, and targets use cases such as credit scoring, fraud detection, and risk modeling.
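Both measures normalize Pearson's chi-squared statistic χ² computed on the contingency table of feature buckets against target classes. For an r × k table with n observations, V = sqrt((χ²/n) / min(r−1, k−1)) and T = sqrt((χ²/n) / sqrt((r−1)(k−1))). T is never larger than V and equals it for square tables, so T penalizes tables with many more buckets than target classes. The sketch below implements these standard textbook formulas in plain Python. It is an illustration, not AutoCarver's internal code, which may differ in details such as bias correction. The function name `association` is hypothetical.

```python
import math


def association(table):
    """Return (cramers_v, tschuprows_t) for a contingency table.

    `table` is a list of rows of observed counts: one row per feature bucket,
    one column per target class. Hypothetical helper, for illustration only.
    """
    r, k = len(table), len(table[0])
    if r < 2 or k < 2:
        raise ValueError("need at least 2 buckets and 2 target classes")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-squared: sum of (observed - expected)^2 / expected over all cells
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells belonging to empty rows or columns
                chi2 += (observed - expected) ** 2 / expected
    phi2 = chi2 / n
    cramers_v = math.sqrt(phi2 / min(r - 1, k - 1))
    tschuprows_t = math.sqrt(phi2 / math.sqrt((r - 1) * (k - 1)))
    return cramers_v, tschuprows_t


print(association([[10, 0], [0, 10]]))  # perfect association: (1.0, 1.0)
print(association([[30, 10], [20, 20], [10, 30]]))  # 3 buckets x 2 classes: T < V
```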
Install
pip install autocarver
Quick Start
Binary classification on the Titanic dataset:
import pandas as pd
from sklearn.model_selection import train_test_split
from AutoCarver import BinaryCarver, Features
# 1. Load data
url = "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
data = pd.read_csv(url)
target = "Survived"
# 2. Train / dev split, stratified on the target
train, dev = train_test_split(data, test_size=0.33, random_state=42, stratify=data[target])
# 3. Declare features by type
features = Features(
categoricals=["Sex"],
quantitatives=["Age", "Fare", "Siblings/Spouses Aboard", "Parents/Children Aboard"],
ordinals={"Pclass": ["1", "2", "3"]},
)
# 4. Fit the carver (dev set drives the robustness checks)
carver = BinaryCarver(features=features, min_freq=0.05, max_n_mod=5)
train_processed = carver.fit_transform(train, train[target], X_dev=dev, y_dev=dev[target])
dev_processed = carver.transform(dev)
# 5. Inspect the carved buckets, target rate, and association
print(carver.summary)
# 6. Persist for later use
carver.save("titanic_carver.json")
# carver = BinaryCarver.load("titanic_carver.json")
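For quantitative features such as Age or Fare, the carved buckets are value intervals. The sketch below shows, in plain Python, how values map to intervals given sorted upper bounds, assuming right-closed intervals. The bounds are made up for illustration rather than taken from a fitted carver, and the helper names and label format are hypothetical, not AutoCarver's own output.

```python
from bisect import bisect_left


def bucketize(values, upper_bounds):
    """Map each value to a bucket index, given sorted upper bounds.

    Bucket i holds values in (upper_bounds[i-1], upper_bounds[i]]; values
    above the last bound fall in the extra bucket len(upper_bounds).
    Hypothetical helper; missing values are not handled here.
    """
    return [bisect_left(upper_bounds, v) for v in values]


def bucket_labels(upper_bounds):
    """Human-readable label for each bucket produced by bucketize()."""
    labels = [f"x <= {upper_bounds[0]}"]
    for low, high in zip(upper_bounds, upper_bounds[1:]):
        labels.append(f"{low} < x <= {high}")
    labels.append(f"x > {upper_bounds[-1]}")
    return labels


bounds = [18, 35, 60]  # illustrative Age boundaries, not fitted values
labels = bucket_labels(bounds)
for age, idx in zip([4, 18, 22.5, 40, 71], bucketize([4, 18, 22.5, 40, 71], bounds)):
    print(age, "->", labels[idx])
```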
For multiclass classification use MulticlassCarver; for regression use ContinuousCarver — the API is identical. To pre-select features by target association and inter-feature redundancy, pipe the carved output through ClassificationSelector or RegressionSelector.
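The idea behind pre-selection by target association and redundancy can be sketched as a greedy loop. Rank features by Cramér's V with the target, then keep a feature only if its Cramér's V with every already-kept feature stays below a threshold. This is a simplified illustration on already-carved categorical columns, not how ClassificationSelector or RegressionSelector are implemented. The names `select_features`, `n_best`, and `max_corr` are hypothetical.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V between two equal-length sequences of categorical labels."""
    n = len(x)
    pair_counts = Counter(zip(x, y))
    x_counts, y_counts = Counter(x), Counter(y)
    r, k = len(x_counts), len(y_counts)
    if r < 2 or k < 2:
        return 0.0  # a constant column carries no association
    chi2 = 0.0
    for a, n_a in x_counts.items():
        for b, n_b in y_counts.items():
            expected = n_a * n_b / n
            chi2 += (pair_counts[(a, b)] - expected) ** 2 / expected
    return math.sqrt(chi2 / n / min(r - 1, k - 1))


def select_features(columns, target, n_best=2, max_corr=0.7):
    """Greedy pre-selection: strongest target association first, skip redundant ones."""
    ranked = sorted(columns, key=lambda name: cramers_v(columns[name], target), reverse=True)
    selected = []
    for name in ranked:
        if all(cramers_v(columns[name], columns[kept]) < max_corr for kept in selected):
            selected.append(name)
        if len(selected) == n_best:
            break
    return selected


target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "a": ["lo"] * 4 + ["hi"] * 4,          # perfectly informative
    "a_copy": ["x"] * 4 + ["y"] * 4,       # same information as "a", so redundant
    "b": ["p", "p", "p", "q", "q", "q", "q", "p"],  # partially informative
}
print(select_features(columns, target))  # ['a', 'b']
```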
Why AutoCarver?
- Optimal supervised binning: maximizes Tschuprow's T (default) or Cramér's V between each feature and the target instead of relying on hand-tuned quantiles.
- Robust to data drift: every candidate bin combination is validated on a dev set, rejecting any whose target rates flip or whose buckets fall below min_freq.
- Interpretable buckets: human-readable boundaries you can audit, document, and ship to a scorecard.
- Dimensionality reduction: groups under-represented modalities and caps the number of bins per feature (max_n_mod), which is especially useful before one-hot encoding.
- Feature pre-selection: ClassificationSelector and RegressionSelector rank features by target association and filter on inter-feature correlation.
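The dev-set validation described above can be sketched for a binary target as follows. This is a simplified illustration under stated assumptions: buckets are already assigned on both sets, a "flip" means two buckets swap their order of target rate between train and dev, and the function names are hypothetical. AutoCarver's actual checks may be stricter or differ in detail.

```python
from collections import defaultdict
from itertools import combinations


def bucket_stats(buckets, targets):
    """Map each bucket to (frequency, target rate) for a 0/1 target."""
    counts, positives = defaultdict(int), defaultdict(int)
    for bucket, y in zip(buckets, targets):
        counts[bucket] += 1
        positives[bucket] += y
    n = len(buckets)
    return {b: (counts[b] / n, positives[b] / counts[b]) for b in counts}


def is_robust(train_buckets, y_train, dev_buckets, y_dev, min_freq=0.05):
    """Hypothetical dev-set check for one candidate bucketing of a feature."""
    train = bucket_stats(train_buckets, y_train)
    dev = bucket_stats(dev_buckets, y_dev)
    # Every bucket must be observed in both sets...
    if set(train) != set(dev):
        return False
    # ...and hold at least min_freq of the observations in each set
    if any(freq < min_freq for stats in (train, dev) for freq, _ in stats.values()):
        return False
    # Reject if any pair of buckets swaps its target-rate order between sets
    for a, b in combinations(train, 2):
        if (train[a][1] - train[b][1]) * (dev[a][1] - dev[b][1]) < 0:
            return False
    return True


train_buckets = ["young"] * 4 + ["old"] * 4
y_train = [1, 1, 1, 0, 0, 0, 1, 0]
dev_buckets = ["young"] * 2 + ["old"] * 2
print(is_robust(train_buckets, y_train, dev_buckets, [1, 0, 0, 0]))  # True
print(is_robust(train_buckets, y_train, dev_buckets, [0, 0, 1, 0]))  # False: rates flip
```

Comparing pairwise orderings, rather than exact rates, tolerates the sampling noise expected between train and dev while still catching reversals.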
Documentation
Full reference documentation, tutorials, and end-to-end notebook examples are available on ReadTheDocs.
Download files
File details
Details for the file autocarver-7.1.10.tar.gz.
File metadata
- Download URL: autocarver-7.1.10.tar.gz
- Upload date:
- Size: 72.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 63ee5f676286f57f3db0ed1441d703b51c7696880b5e3985c035db7f713a7188 |
| MD5 | ce8642ba2dd490a37c6511d94f42224f |
| BLAKE2b-256 | ba7c7d5dd8fa9a70ecdea185d642d4553ec6534ea9d9356b6b31a498d3cc3517 |
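To check a downloaded file against the SHA256 digest listed for autocarver-7.1.10.tar.gz, hash it in chunks with Python's standard hashlib module and compare hex digests. The sketch assumes the sdist was saved under its original file name in the current directory. The helper name `sha256_of` is hypothetical.

```python
import hashlib
import os

# SHA256 digest published for autocarver-7.1.10.tar.gz
EXPECTED_SHA256 = "63ee5f676286f57f3db0ed1441d703b51c7696880b5e3985c035db7f713a7188"


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    path = "autocarver-7.1.10.tar.gz"  # assumed download location
    if os.path.exists(path):
        print("OK" if sha256_of(path) == EXPECTED_SHA256 else "MISMATCH")
    else:
        print(f"{path} not found")
```

pip can also enforce digests itself through its hash-checking mode, by adding `--hash=sha256:<digest>` to the requirement line in a requirements file.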
Provenance
The following attestation bundles were made for autocarver-7.1.10.tar.gz:
Publisher: release.yml on mdefrance/AutoCarver
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autocarver-7.1.10.tar.gz
- Subject digest: 63ee5f676286f57f3db0ed1441d703b51c7696880b5e3985c035db7f713a7188
- Sigstore transparency entry: 1569761364
- Sigstore integration time:
- Permalink: mdefrance/AutoCarver@530592650acb04bbad4ba6ee5612e79ef4ccce01
- Branch / Tag: refs/heads/main
- Owner: https://github.com/mdefrance
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@530592650acb04bbad4ba6ee5612e79ef4ccce01
- Trigger Event: pull_request
File details
Details for the file autocarver-7.1.10-py3-none-any.whl.
File metadata
- Download URL: autocarver-7.1.10-py3-none-any.whl
- Upload date:
- Size: 108.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8e27eacfebe199def715699c78304c9ba87ec179c9221133777f211f93b9543d |
| MD5 | b0e79481a8190ab6a811ad69c8d25dd2 |
| BLAKE2b-256 | e42aec51a49d7cf9e5481acbe0dc0fcb8dee34dd6930d79f1607836d9b8b3b09 |
Provenance
The following attestation bundles were made for autocarver-7.1.10-py3-none-any.whl:
Publisher: release.yml on mdefrance/AutoCarver
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autocarver-7.1.10-py3-none-any.whl
- Subject digest: 8e27eacfebe199def715699c78304c9ba87ec179c9221133777f211f93b9543d
- Sigstore transparency entry: 1569761753
- Sigstore integration time:
- Permalink: mdefrance/AutoCarver@530592650acb04bbad4ba6ee5612e79ef4ccce01
- Branch / Tag: refs/heads/main
- Owner: https://github.com/mdefrance
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@530592650acb04bbad4ba6ee5612e79ef4ccce01
- Trigger Event: pull_request