Intelligent automatic feature engineering for tabular ML.

AutoFeature

What is AutoFeature?

AutoFeature is a scikit-learn compatible library that automates the most impactful parts of tabular feature engineering:

Component                 What it does
AutoFeatureEngineer       Detects and generates useful interaction features (products, ratios, differences) using importance-guided search
TargetAwareSelector       Selects features by mutual information with the target, not just variance
CyclicalEncoder           Encodes periodic variables (hour, month, day) with sin/cos to preserve cyclical structure
SmartCategoricalEncoder   Automatically picks the right encoding per column: label / one-hot / target encoding
LeakageDetector           Warns about features that suspiciously correlate with the target
AutoFeaturePipeline       Runs everything end-to-end in one call

Installation

pip install sufyaan-autofeature

Requires Python ≥ 3.8, scikit-learn ≥ 1.0, pandas ≥ 1.3, numpy ≥ 1.21. The distribution is installed as sufyaan-autofeature but imported as autofeature.

Quickstart

Full Pipeline (recommended)

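import pandas as pd
from autofeature import AutoFeaturePipeline

# X_train and X_test are pandas DataFrames and y_train is the matching target,
# prepared beforehand (for example with sklearn.model_selection.train_test_split).
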
pipeline = AutoFeaturePipeline(
    cyclical_columns={"hour": 24, "month": 12},
    max_interaction_features=15,
    k=20,                  # keep top 20 features
    task="classification",
    verbose=True,
)

X_train_out = pipeline.fit_transform(X_train, y_train)
X_test_out  = pipeline.transform(X_test)

print(pipeline.get_summary())

Individual Components

from autofeature import (
    AutoFeatureEngineer,
    TargetAwareSelector,
    CyclicalEncoder,
    SmartCategoricalEncoder,
    LeakageDetector,
)

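# 1. Detect leakage (fit on train only, then drop the flagged columns from both splits)
ld = LeakageDetector()
ld.fit(X_train, y_train)
X_train = ld.remove_leaky(X_train)
X_test  = ld.remove_leaky(X_test)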

# 2. Encode categoricals automatically
enc = SmartCategoricalEncoder()
X_train = enc.fit_transform(X_train, y_train)
X_test  = enc.transform(X_test)

# 3. Encode cyclical columns
cyc = CyclicalEncoder(columns={"hour": 24, "day_of_week": 7})
X_train = cyc.fit_transform(X_train)
X_test  = cyc.transform(X_test)

# 4. Generate interaction features
afe = AutoFeatureEngineer(max_interaction_features=20)
X_train = afe.fit_transform(X_train, y_train)
X_test  = afe.transform(X_test)

# See what interactions were selected
print(afe.get_interaction_report())

# 5. Select top features by target mutual information
sel = TargetAwareSelector(k=15)
X_train = sel.fit_transform(X_train, y_train)
X_test  = sel.transform(X_test)

print(sel.get_feature_scores())

API Reference

AutoFeatureEngineer

AutoFeatureEngineer(
    max_interaction_features=20,   # max interactions to add
    interaction_types=["product", "ratio", "difference"],
    interaction_threshold=0.01,    # minimum importance gain
    n_estimators=50,               # trees in internal evaluator
    task="auto",                   # "classification" | "regression" | "auto"
    random_state=42,
    verbose=False,
)

Methods: fit(X, y), transform(X), fit_transform(X, y), get_interaction_report()
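
The importance-guided search described above can be sketched roughly as follows. This is an illustration built on plain pandas and scikit-learn, not the library's internals: the helper make_interactions, the column naming scheme, and the choice of a random forest as the evaluator are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def make_interactions(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build product, ratio and difference columns for every pair."""
    out = {}
    for a, b in combinations(X.columns, 2):
        out[f"{a}_x_{b}"] = X[a] * X[b]
        # Replace zero denominators with NaN so the ratio never divides by zero.
        out[f"{a}_div_{b}"] = X[a] / X[b].replace(0, np.nan)
        out[f"{a}_minus_{b}"] = X[a] - X[b]
    return pd.DataFrame(out, index=X.index).fillna(0.0)


# Synthetic data where the target depends on the product of the two inputs.
rng = np.random.default_rng(0)
X = pd.DataFrame({"a": rng.normal(size=500), "b": rng.normal(size=500)})
y = (X["a"] * X["b"] > 0.25).astype(int)

candidates = make_interactions(X)
full = pd.concat([X, candidates], axis=1)

# Rank the candidate interactions by tree-based importance.
forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(full, y)
importance = pd.Series(forest.feature_importances_, index=full.columns)
ranking = importance[candidates.columns].sort_values(ascending=False)
print(ranking)
```

A real search would then keep only candidates whose importance clears a threshold and cap the count, which is presumably the role of interaction_threshold and max_interaction_features.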

TargetAwareSelector

TargetAwareSelector(
    k=10,             # number of features to keep, or "all"
    task="auto",
    threshold=None,   # MI threshold (overrides k if set)
    random_state=42,
)

Methods: fit(X, y), transform(X), fit_transform(X, y), get_feature_scores()
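
For intuition, selecting the top k features by mutual information can be reproduced with scikit-learn's mutual_info_classif. The sketch below is an assumption about the general technique, not a description of how TargetAwareSelector is implemented; the synthetic columns are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Synthetic data: one column carries the class signal, two are pure noise.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    "signal": y + rng.normal(scale=0.3, size=n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})

# Estimate mutual information between each column and the target.
mi = pd.Series(mutual_info_classif(X, y, random_state=42), index=X.columns)
scores = mi.sort_values(ascending=False)

# Keep the k highest-scoring columns.
k = 1
top_k = scores.index[:k].tolist()
X_selected = X[top_k]
print(scores)
```

For a regression target, mutual_info_regression from the same module plays the equivalent role.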

CyclicalEncoder

CyclicalEncoder(
    columns={"hour": 24, "month": 12},  # column → period mapping
    drop_original=True,
)

Produces {col}_sin and {col}_cos columns.
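
The usual sin/cos transform maps a value x with period P to sin(2πx/P) and cos(2πx/P). The sketch below uses plain numpy and pandas and assumes CyclicalEncoder follows this standard formula; encode_cyclical is a hypothetical helper, not part of the library.

```python
import numpy as np
import pandas as pd


def encode_cyclical(df: pd.DataFrame, column: str, period: float) -> pd.DataFrame:
    """Hypothetical helper: replace `column` with its sin/cos projection on the unit circle."""
    angle = 2.0 * np.pi * df[column] / period
    out = df.drop(columns=[column])
    out[f"{column}_sin"] = np.sin(angle)
    out[f"{column}_cos"] = np.cos(angle)
    return out


hours = pd.DataFrame({"hour": [0, 6, 12, 23]})
encoded = encode_cyclical(hours, "hour", period=24)


def distance(i: int, j: int) -> float:
    """Euclidean distance between two encoded rows."""
    return float(np.hypot(
        encoded.loc[i, "hour_sin"] - encoded.loc[j, "hour_sin"],
        encoded.loc[i, "hour_cos"] - encoded.loc[j, "hour_cos"],
    ))


# Hour 23 (row 3) sits next to hour 0 (row 0); hour 12 (row 2) is opposite it.
dist_23_to_0 = distance(3, 0)
dist_12_to_0 = distance(2, 0)
print(encoded)
```

On the raw scale, 23 and 0 look maximally far apart; after encoding they are neighbours, which is the structure a model needs for wrap-around variables.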

SmartCategoricalEncoder

SmartCategoricalEncoder(
    max_onehot_cardinality=10,   # >10 unique values → target encoding
    smoothing=1.0,               # regularisation for target encoding
    task="auto",
    handle_unknown="mean",       # or "zero"
)
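
To show what a smoothing parameter typically does in target encoding, here is a minimal sketch of one common formulation: each category mean is pulled toward the global mean, with weight proportional to the category count. The exact formula used by SmartCategoricalEncoder is not stated in this document, so the formula, the helper smoothed_target_encode, and the global-mean fallback for unseen categories are assumptions.

```python
import pandas as pd


def smoothed_target_encode(categories: pd.Series, target: pd.Series, smoothing: float = 1.0):
    """Hypothetical helper: per-category target mean shrunk toward the global mean."""
    global_mean = target.mean()
    stats = target.groupby(categories).agg(["mean", "count"])
    encoding = (stats["count"] * stats["mean"] + smoothing * global_mean) / (
        stats["count"] + smoothing
    )
    return encoding, global_mean


# Fit on training data only.
city = pd.Series(["a", "a", "a", "b"], name="city")
y = pd.Series([1, 1, 0, 0], name="y")
encoding, global_mean = smoothed_target_encode(city, y, smoothing=1.0)

# Apply to new data; the unseen category "c" falls back to the global mean.
test_city = pd.Series(["a", "b", "c"])
encoded_test = test_city.map(encoding).fillna(global_mean)
print(encoded_test.tolist())  # [0.625, 0.25, 0.5]
```

Category "b" appears only once, so its raw mean of 0 is shrunk halfway toward the global mean of 0.5; category "a" has more support and moves less.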

LeakageDetector

LeakageDetector(
    correlation_threshold=0.95,
    name_patterns=["label", "target", "outcome"],
    verbose=True,
)

Methods: fit(X, y), remove_leaky(X), get_report()
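
The two checks suggested by the parameters (a correlation threshold and a list of name patterns) could look roughly like this. It is a sketch under assumptions: find_suspicious_columns is a hypothetical helper, Pearson correlation is a guess at the metric, and substring matching is a guess at how name_patterns is applied.

```python
import numpy as np
import pandas as pd


def find_suspicious_columns(
    X: pd.DataFrame,
    y: pd.Series,
    correlation_threshold: float = 0.95,
    name_patterns=("label", "target", "outcome"),
) -> list:
    """Hypothetical helper: flag columns by |correlation| with y or by suspicious name."""
    numeric = X.select_dtypes(include="number")
    corr = numeric.corrwith(y).abs()
    by_corr = set(corr[corr >= correlation_threshold].index)
    by_name = {c for c in X.columns if any(p in c.lower() for p in name_patterns)}
    # Preserve the original column order in the report.
    return [c for c in X.columns if c in by_corr or c in by_name]


rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=200), name="y")
X = pd.DataFrame({
    "age": rng.normal(size=200),
    "y_copy": y * 2.0 + 1.0,                # linear function of the target
    "outcome_code": rng.normal(size=200),   # flagged by name only
})

flagged = find_suspicious_columns(X, y)
X_clean = X.drop(columns=flagged)
print(flagged)  # ['y_copy', 'outcome_code']
```

Whatever is flagged on the training split should be dropped from the test split as well, so both keep the same columns.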

AutoFeaturePipeline

AutoFeaturePipeline(
    cyclical_columns=None,
    max_interaction_features=20,
    k=20,
    task="auto",
    detect_leakage=True,
    remove_leaky=False,
    random_state=42,
    verbose=False,
)

Methods: fit(X, y), transform(X), fit_transform(X, y), get_summary()

Why AutoFeature?

  • Target-aware: selections and interactions are evaluated against the actual prediction target, not generic statistics
  • Scikit-learn compatible: works with Pipeline, GridSearchCV, and any estimator
  • Production-safe: everything learned comes from fit on the training data; transform only applies it, so test data cannot leak into the features
  • Interpretable: every decision (which interaction, which encoding, which feature) is inspectable
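
The scikit-learn compatibility and fit/transform separation listed above follow the standard estimator contract. The sketch below illustrates that contract with a small stand-in transformer rather than AutoFeature itself: TopKCorrelationSelector is a hypothetical class written for this example, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline


class TopKCorrelationSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: keep the k columns most correlated with the target."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y):
        # Everything learned from the target is stored here, during fit only.
        target = pd.Series(np.asarray(y), index=X.index)
        scores = X.corrwith(target).abs()
        self.columns_ = scores.nlargest(self.k).index.tolist()
        return self

    def transform(self, X):
        # transform never sees y; it only applies what fit learned.
        return X[self.columns_]


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 4)), columns=["f0", "f1", "f2", "f3"])
y = (X["f0"] + 0.5 * X["f1"] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("select", TopKCorrelationSelector()), ("model", LogisticRegression())])
grid = GridSearchCV(pipe, {"select__k": [1, 2, 3]}, cv=3)
grid.fit(X_train, y_train)

test_accuracy = grid.score(X_test, y_test)
print(grid.best_params_, round(test_accuracy, 3))
```

Because the selector is refit inside each cross-validation fold, the held-out fold never influences which columns are kept.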

Contributing

Pull requests are welcome. For major changes, please open an issue first.

git clone https://github.com/sufyaannn/sufyaan-autofeature
cd sufyaan-autofeature
pip install -e ".[dev]"
pytest

License

MIT

Download files

Source Distribution

sufyaan_autofeature-0.1.1.tar.gz (12.0 kB)

Uploaded Source

Built Distribution

sufyaan_autofeature-0.1.1-py3-none-any.whl (12.3 kB)

Uploaded Python 3

File details

Details for the file sufyaan_autofeature-0.1.1.tar.gz.

File metadata

  • Download URL: sufyaan_autofeature-0.1.1.tar.gz
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sufyaan_autofeature-0.1.1.tar.gz
Algorithm Hash digest
SHA256 366b30f1d56338d110d6ee46abf4b7e986fa07c365bb8fecca76423a1746e71a
MD5 5f864353fc5b4e499232258151798d8f
BLAKE2b-256 e7dd8a13171270573912b8b4f9bc2265f15672023b4a3c9562895dd2e894d441

Provenance

The following attestation bundles were made for sufyaan_autofeature-0.1.1.tar.gz:

Publisher: publish.yml on sufyaannn/sufyaan-autofeature

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sufyaan_autofeature-0.1.1-py3-none-any.whl.

File hashes

Hashes for sufyaan_autofeature-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fe49ba0b215b29de623f8be42aeb1f2611c6180c056d5288b968c2527c8a5a1a
MD5 6b1d43bfd0581b24f78498de6776e790
BLAKE2b-256 39266768913a08eddf2fba3b60baa31ae55b03e63e83909ee4890a21aa8f2a2c

Provenance

The following attestation bundles were made for sufyaan_autofeature-0.1.1-py3-none-any.whl:

Publisher: publish.yml on sufyaannn/sufyaan-autofeature

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
