Automatic Discretization of Features with Optimal Target Association

AutoCarver automates supervised feature discretization (binning) to maximize statistical association with your target — using Tschuprow's T or Cramér's V — and validates the chosen bins against a held-out dev set. It supports binary classification, multiclass classification, and regression, and is widely used for credit scoring, fraud detection, and risk modeling.
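To make the two association measures concrete, here is a minimal standard-library sketch that computes both from the textbook definitions: with chi-square statistic χ², sample size n, and an r × c contingency table, Cramér's V is sqrt((χ²/n) / min(r−1, c−1)) and Tschuprow's T is sqrt((χ²/n) / sqrt((r−1)(c−1))). The function name `association` and the toy data are made up for illustration; this is not AutoCarver's own implementation, which may differ in details.

```python
from collections import Counter
from math import sqrt


def association(feature, target):
    """Return (Tschuprow's T, Cramér's V) for two equally long label sequences."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # sparse contingency table
    row_totals = Counter(feature)
    col_totals = Counter(target)

    # Pearson chi-square: sum over every cell of (observed - expected)^2 / expected
    chi2 = 0.0
    for row, row_count in row_totals.items():
        for col, col_count in col_totals.items():
            expected = row_count * col_count / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected

    r, c = len(row_totals), len(col_totals)
    if r < 2 or c < 2:
        return 0.0, 0.0  # a constant variable carries no association

    phi2 = chi2 / n
    tschuprow_t = sqrt(phi2 / sqrt((r - 1) * (c - 1)))
    cramers_v = sqrt(phi2 / min(r - 1, c - 1))
    return tschuprow_t, cramers_v


# Toy data (invented for illustration): three buckets against a binary target
buckets = ["young"] * 4 + ["adult"] * 4 + ["senior"] * 4
survived = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1]
t, v = association(buckets, survived)
print(f"Tschuprow's T = {t:.3f}, Cramér's V = {v:.3f}")
```

Both measures share the same numerator and differ only in how they normalize for the table's dimensions. Since sqrt((r−1)(c−1)) is never smaller than min(r−1, c−1), T is never larger than V and penalizes extra buckets more strongly.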

Install

pip install autocarver

Quick Start

Binary classification on the Titanic dataset:

import pandas as pd
from sklearn.model_selection import train_test_split
from AutoCarver import BinaryCarver, Features

# 1. Load data
url = "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
data = pd.read_csv(url)
target = "Survived"

# 2. Train / dev split, stratified on the target
train, dev = train_test_split(data, test_size=0.33, random_state=42, stratify=data[target])

# 3. Declare features by type
features = Features(
    categoricals=["Sex"],
    quantitatives=["Age", "Fare", "Siblings/Spouses Aboard", "Parents/Children Aboard"],
    ordinals={"Pclass": ["1", "2", "3"]},
)

# 4. Fit the carver (dev set drives the robustness checks)
carver = BinaryCarver(features=features, min_freq=0.05, max_n_mod=5)
train_processed = carver.fit_transform(train, train[target], X_dev=dev, y_dev=dev[target])
dev_processed = carver.transform(dev)

# 5. Inspect the carved buckets, target rate, and association
print(carver.summary)

# 6. Persist for later use
carver.save("titanic_carver.json")
# carver = BinaryCarver.load("titanic_carver.json")

For multiclass classification use MulticlassCarver; for regression use ContinuousCarver — the API is identical. To pre-select features by target association and inter-feature redundancy, pipe the carved output through ClassificationSelector or RegressionSelector.
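The idea of selecting on target association while filtering inter-feature redundancy can be sketched as a greedy pass. This is only an illustration under stated assumptions: the function `preselect`, its parameters `max_corr` and `n_best`, and all scores below are hypothetical and are not the ClassificationSelector or RegressionSelector API, nor values computed from the Titanic data.

```python
def preselect(target_assoc, pairwise_assoc, max_corr=0.7, n_best=None):
    """Keep features by decreasing target association, skipping any feature
    too strongly associated with one that was already kept."""
    kept = []
    for feature in sorted(target_assoc, key=target_assoc.get, reverse=True):
        redundant = any(
            pairwise_assoc.get(frozenset((feature, other)), 0.0) > max_corr
            for other in kept
        )
        if not redundant:
            kept.append(feature)
        if n_best is not None and len(kept) == n_best:
            break
    return kept


# Made-up scores, for illustration only
target_assoc = {"Fare": 0.30, "Pclass": 0.34, "Sex": 0.54, "Age": 0.12}
pairwise_assoc = {frozenset(("Fare", "Pclass")): 0.85}  # unlisted pairs count as 0.0

selected = preselect(target_assoc, pairwise_assoc)
print(selected)
```

With these invented scores, "Fare" is dropped because it is strongly associated with the higher-ranked "Pclass", while the other three features survive.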

Why AutoCarver?

  • Optimal supervised binning — maximizes Tschuprow's T (default) or Cramér's V between each feature and the target instead of relying on hand-tuned quantiles.
  • Robust to data drift — every candidate bin combination is validated on a dev set, rejecting any whose target rates flip or whose buckets fall below min_freq.
  • Interpretable buckets — human-readable boundaries you can audit, document, and ship to a scorecard.
  • Dimensionality reduction — groups under-represented modalities and caps bins per feature (max_n_mod), which is especially useful before one-hot encoding.
  • Feature pre-selection: ClassificationSelector / RegressionSelector rank features by target association and filter on inter-feature correlation.
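The dev-set validation described in the list above can be pictured with a small standard-library sketch. It assumes one plausible reading of the robustness rule: a bucketing is accepted only if every bucket reaches `min_freq` in both samples and the buckets rank in the same order by target rate on train and dev. The helper names `bucket_stats` and `is_robust` are hypothetical, and AutoCarver's actual checks may differ.

```python
from collections import defaultdict


def bucket_stats(buckets, target):
    """Map each bucket to (share of rows, mean target) for one sample."""
    counts, positives = defaultdict(int), defaultdict(int)
    for bucket, y in zip(buckets, target):
        counts[bucket] += 1
        positives[bucket] += y
    n = len(buckets)
    return {b: (counts[b] / n, positives[b] / counts[b]) for b in counts}


def is_robust(train_buckets, y_train, dev_buckets, y_dev, min_freq=0.05):
    """Hypothetical check: same buckets, all frequent enough, same rate order."""
    train = bucket_stats(train_buckets, y_train)
    dev = bucket_stats(dev_buckets, y_dev)
    if set(train) != set(dev):
        return False  # a bucket seen in only one sample cannot be validated
    for stats in (train, dev):
        if any(freq < min_freq for freq, _ in stats.values()):
            return False  # some bucket is under-represented

    def by_rate(stats):
        # Order buckets by target rate; ties are broken by bucket label
        return sorted(stats, key=lambda b: (stats[b][1], b))

    # A different ordering on dev means the target rates flipped
    return by_rate(train) == by_rate(dev)


# Toy data (invented): "low" has the lower target rate on train
train_buckets = ["low"] * 10 + ["high"] * 10
y_train = [0] * 8 + [1] * 2 + [0] * 3 + [1] * 7
dev_buckets = ["low"] * 5 + ["high"] * 5
y_dev_stable = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]   # same ordering as train
y_dev_flipped = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # ordering reversed

print(is_robust(train_buckets, y_train, dev_buckets, y_dev_stable))
print(is_robust(train_buckets, y_train, dev_buckets, y_dev_flipped))
```

Comparing rank order rather than raw rates tolerates ordinary sampling noise between train and dev while still catching buckets whose relationship to the target reverses.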

Documentation

The full reference, tutorials, and end-to-end notebook examples are available on ReadTheDocs.
