AutoFeature
Intelligent automatic feature engineering for tabular ML.
What is AutoFeature?
AutoFeature is a scikit-learn compatible library that automates the most impactful parts of tabular feature engineering:
| Component | What it does |
|---|---|
| `AutoFeatureEngineer` | Detects and generates useful interaction features (products, ratios, differences) using importance-guided search |
| `TargetAwareSelector` | Selects features by mutual information with the target, not just variance |
| `CyclicalEncoder` | Encodes periodic variables (hour, month, day) with sin/cos to preserve cyclical structure |
| `SmartCategoricalEncoder` | Automatically picks the right encoding per column: label / one-hot / target encoding |
| `LeakageDetector` | Warns about features that suspiciously correlate with the target |
| `AutoFeaturePipeline` | Runs everything end-to-end in one call |
Installation
pip install sufyaan-autofeature
Requires Python ≥ 3.8, scikit-learn ≥ 1.0, pandas ≥ 1.3, numpy ≥ 1.21.
Quickstart
Full Pipeline (recommended)
import pandas as pd
from autofeature import AutoFeaturePipeline

# X_train and X_test are assumed to be pandas DataFrames; y_train is the training target.
pipeline = AutoFeaturePipeline(
    cyclical_columns={"hour": 24, "month": 12},
    max_interaction_features=15,
    k=20,  # keep top 20 features
    task="classification",
    verbose=True,
)

# Fit on the training split only, then reuse the fitted pipeline on the test split.
X_train_out = pipeline.fit_transform(X_train, y_train)
X_test_out = pipeline.transform(X_test)
print(pipeline.get_summary())
Individual Components
from autofeature import (
    AutoFeatureEngineer,
    TargetAwareSelector,
    CyclicalEncoder,
    SmartCategoricalEncoder,
    LeakageDetector,
)
# 1. Detect leakage (fit on train, then drop the flagged columns from both splits)
ld = LeakageDetector()
ld.fit(X_train, y_train)
X_train = ld.remove_leaky(X_train)
X_test = ld.remove_leaky(X_test)
# 2. Encode categoricals automatically
enc = SmartCategoricalEncoder()
X_train = enc.fit_transform(X_train, y_train)
X_test = enc.transform(X_test)
# 3. Encode cyclical columns
cyc = CyclicalEncoder(columns={"hour": 24, "day_of_week": 7})
X_train = cyc.fit_transform(X_train)
X_test = cyc.transform(X_test)
# 4. Generate interaction features
afe = AutoFeatureEngineer(max_interaction_features=20)
X_train = afe.fit_transform(X_train, y_train)
X_test = afe.transform(X_test)
# See what interactions were selected
print(afe.get_interaction_report())
# 5. Select top features by target mutual information
sel = TargetAwareSelector(k=15)
X_train = sel.fit_transform(X_train, y_train)
X_test = sel.transform(X_test)
print(sel.get_feature_scores())
API Reference
AutoFeatureEngineer
AutoFeatureEngineer(
    max_interaction_features=20,  # max interactions to add
    interaction_types=["product", "ratio", "difference"],
    interaction_threshold=0.01,   # minimum importance gain
    n_estimators=50,              # trees in internal evaluator
    task="auto",                  # "classification" | "regression" | "auto"
    random_state=42,
    verbose=False,
)
Methods: fit(X, y), transform(X), fit_transform(X, y), get_interaction_report()
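To make the three interaction types concrete, here is a minimal, dependency-free sketch of how candidate columns could be generated for every pair of numeric columns. The helper name `candidate_interactions`, the `_x_` / `_div_` / `_minus_` naming scheme, and the NaN handling for zero denominators are assumptions made for illustration, not AutoFeature's internals. The importance-guided filtering controlled by `interaction_threshold` and `n_estimators` is deliberately left out.

```python
from itertools import combinations


def candidate_interactions(columns, types=("product", "ratio", "difference")):
    """Build candidate interaction columns for every pair of numeric columns.

    `columns` maps a column name to a list of floats of equal length.
    Hypothetical helper: it only illustrates the interaction types.
    """
    out = {}
    for a, b in combinations(sorted(columns), 2):
        xa, xb = columns[a], columns[b]
        if "product" in types:
            out[f"{a}_x_{b}"] = [u * v for u, v in zip(xa, xb)]
        if "ratio" in types:
            # A zero denominator yields NaN instead of raising ZeroDivisionError.
            out[f"{a}_div_{b}"] = [u / v if v != 0 else float("nan") for u, v in zip(xa, xb)]
        if "difference" in types:
            out[f"{a}_minus_{b}"] = [u - v for u, v in zip(xa, xb)]
    return out


features = candidate_interactions({"income": [50.0, 80.0], "debt": [10.0, 0.0]})
print(sorted(features))
```

With n numeric columns this produces up to 3 · n(n-1)/2 candidates, which is why a cap such as `max_interaction_features` matters in practice.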
TargetAwareSelector
TargetAwareSelector(
    k=10,            # number of features to keep, or "all"
    task="auto",
    threshold=None,  # MI threshold (overrides k if set)
    random_state=42,
)
Methods: fit(X, y), transform(X), fit_transform(X, y), get_feature_scores()
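The sketch below illustrates why ranking by mutual information differs from ranking by variance. It uses a simple plug-in estimate for discrete values; AutoFeature may well use a different estimator (for example one suited to continuous features), so treat `mutual_information` and `top_k_by_mi` as hypothetical helpers rather than the library's code.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def top_k_by_mi(features, y, k):
    """Return the k feature names with the highest MI against y, plus all scores."""
    scores = {name: mutual_information(col, y) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k], scores


y = [0, 0, 1, 1]
features = {
    "signal": [0, 0, 1, 1],          # low variance, fully determines y
    "wide_noise": [0, 100, 0, 100],  # high variance, independent of y
}
selected, scores = top_k_by_mi(features, y, k=1)
print(selected, scores)
```

A variance filter would prefer `wide_noise`; the target-aware ranking keeps `signal`, whose score equals the entropy of `y` (ln 2).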
CyclicalEncoder
CyclicalEncoder(
    columns={"hour": 24, "month": 12},  # column → period mapping
    drop_original=True,
)
Produces {col}_sin and {col}_cos columns.
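As an illustration of what the two output columns represent, the following sketch maps a periodic value onto the unit circle. It assumes the usual formulation, sin(2πx / period) and cos(2πx / period); the function names are hypothetical and the library's exact implementation is not shown in this README.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a, b, period):
    """Euclidean distance between two encoded values."""
    return math.dist(cyclical_encode(a, period), cyclical_encode(b, period))


# 23:00 and 00:00 are one hour apart. The raw values 23 and 0 hide that,
# while the encoded points sit next to each other on the circle.
near = circle_distance(23, 0, period=24)
far = circle_distance(12, 0, period=24)
print(near, far)
```

Both columns are needed: the sine alone cannot distinguish, say, 03:00 from 09:00, because they share the same sine value.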
SmartCategoricalEncoder
SmartCategoricalEncoder(
    max_onehot_cardinality=10,  # >10 unique values → target encoding
    smoothing=1.0,              # regularisation for target encoding
    task="auto",
    handle_unknown="mean",      # or "zero"
)
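For readers unfamiliar with smoothed target encoding, here is a minimal sketch of one common formulation: each category mean is shrunk toward the global mean, with `smoothing` acting as a pseudo-count. The exact formula AutoFeature uses is not documented here, so the formula and both function names are assumptions; only the `smoothing` and `handle_unknown` parameter names come from the signature above.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=1.0):
    """Learn a smoothed target mean per category from training data only."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    # Shrink each category mean toward the global mean; rare categories shrink most.
    mapping = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, global_mean


def apply_target_encoding(categories, mapping, global_mean, handle_unknown="mean"):
    """Encode new data; unseen categories fall back to the global mean or to zero."""
    fallback = global_mean if handle_unknown == "mean" else 0.0
    return [mapping.get(c, fallback) for c in categories]


mapping, global_mean = fit_target_encoding(["a", "a", "a", "b"], [1, 1, 0, 0])
encoded = apply_target_encoding(["a", "b", "zzz"], mapping, global_mean)
print(encoded)
```

Here `"b"` appears once in training, so its encoded value (0.25) is pulled halfway from its raw mean (0.0) toward the global mean (0.5), and the unseen `"zzz"` receives the global mean.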
LeakageDetector
LeakageDetector(
    correlation_threshold=0.95,
    name_patterns=["label", "target", "outcome"],
    verbose=True,
)
Methods: fit(X, y), remove_leaky(X), get_report()
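The two parameters suggest two kinds of check: a column name that matches a suspicious pattern, and a correlation with the target above a threshold. A rough sketch of that logic follows. It assumes Pearson correlation, substring matching on lower-cased names, and a `>=` comparison, none of which this README confirms; `flag_leaky` is a hypothetical helper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def flag_leaky(features, y, correlation_threshold=0.95,
               name_patterns=("label", "target", "outcome")):
    """Return {column: reason} for columns that look like target leakage."""
    report = {}
    for name, col in features.items():
        if any(p in name.lower() for p in name_patterns):
            report[name] = "name matches a leakage pattern"
        elif abs(pearson(col, y)) >= correlation_threshold:
            report[name] = "correlation with target at or above threshold"
    return report


y = [0, 1, 0, 1, 1]
features = {
    "age": [30, 25, 41, 38, 29],       # weakly related to y
    "target_copy": [1, 0, 1, 0, 1],    # flagged by name
    "y_times_two": [0, 2, 0, 2, 2],    # flagged by correlation
}
report = flag_leaky(features, y)
print(report)
```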
AutoFeaturePipeline
AutoFeaturePipeline(
    cyclical_columns=None,
    max_interaction_features=20,
    k=20,
    task="auto",
    detect_leakage=True,
    remove_leaky=False,
    random_state=42,
    verbose=False,
)
Methods: fit(X, y), transform(X), fit_transform(X, y), get_summary()
Why AutoFeature?
- Target-aware: selections and interactions are evaluated against the actual prediction target, not generic statistics
- Scikit-learn compatible: works with `Pipeline`, `GridSearchCV`, and any estimator
- Production-safe: fit on train, transform on test, so the transform step cannot leak target information
- Interpretable: every decision (which interaction, which encoding, which feature) is inspectable
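The "fit on train, transform on test" contract from the list above can be sketched with a toy chain of steps. `Chain` and `CenterColumn` are hypothetical stand-ins written for this example, not AutoFeature classes; the point is only that state is learned in `fit_transform` and reused, unchanged and without access to the target, in `transform`.

```python
class Chain:
    """Hypothetical stand-in for an end-to-end pipeline: applies steps in order."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, X, y):
        for step in self.steps:
            X = step.fit_transform(X, y)  # each step learns from training data only
        return X

    def transform(self, X):
        for step in self.steps:
            X = step.transform(X)  # reuses learned state; the target is never passed
        return X


class CenterColumn:
    """Toy step: subtracts the training mean of one column."""

    def __init__(self, column):
        self.column = column

    def fit_transform(self, X, y=None):
        values = [row[self.column] for row in X]
        self.mean_ = sum(values) / len(values)
        return self.transform(X)

    def transform(self, X):
        return [{**row, self.column: row[self.column] - self.mean_} for row in X]


chain = Chain([CenterColumn("hour")])
train_out = chain.fit_transform([{"hour": 2.0}, {"hour": 4.0}], y=[0, 1])
test_out = chain.transform([{"hour": 10.0}])
print(train_out, test_out)
```

The test row is centred with the training mean (3.0), not with a statistic computed from the test split.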
Contributing
Pull requests are welcome. For major changes, please open an issue first.
git clone https://github.com/sufyaannn/sufyaan-autofeature
cd sufyaan-autofeature
pip install -e ".[dev]"
pytest
License
MIT