AutoFE - Playground: Automatic Feature Engineering & Selection for Kaggle Playground Competitions

Maintainer: Thomas2009

🧪 AutoFE-PG

Automatic Feature Engineering & Selection for Kaggle Playground Competitions

Python 3.8+ | License: MIT

AutoFE-PG is a library that automatically generates, evaluates, and selects engineered features to improve tabular ML models, with zero target leakage.

Version 0.3.0 is a complete refactoring focused on general-purpose strategies that apply to any tabular competition. It features advanced binning, digit-based features, cyclical encoding, Weight of Evidence, and Genetic Programming interactions.

✨ Key Features

Feature	Description
Genetic Programming	Generates complex non-linear interactions using `gplearn`
Digit-Based Logic	Extracts integer and decimal positions; creates digit-cross-category interactions
Target Representation	OOF Target Aggregation (mean, std, skew), WoE, and Entropy features
Cyclical Encoding	Sine/Cosine transformations for periodic numerical features
Advanced Binning	Both Quantile (qcut) and Equal-width (cut) discretization
External Signal Injection	Inject historical Priors, WoE, and Entropy from original datasets
Zero Target Leakage	All target-dependent features use strict out-of-fold (OOF) strategies
Greedy Selection	Forward selection keeps only features that improve CV score
GPU Acceleration	Built-in support for XGBoost GPU engines

🚀 Quick Start

Installation

pip install autofepg
# Optional: for Genetic Programming features
pip install gplearn

Basic Usage

import pandas as pd
from autofepg import select_features

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

X_train = train.drop(columns=["id", "target"])
y_train = train["target"]
X_test = test.drop(columns=["id"])

result = select_features(
    X_train, y_train, X_test,
    task="classification",
    time_budget=3600  # 1 hour limit
)

X_train_new = result["X_train"]
X_test_new = result["X_test"]

print(f"Features added: {len(result['selected_features'])}")
print(f"CV Improvement: {result['base_score']:.6f} -> {result['best_score']:.6f}")

Injecting Historical Signals (Original Data)

If the competition data was synthesized from an original real-world dataset (common in Kaggle Playground competitions) and you have access to it, you can inject its signals without leakage:

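# original_df and original_target are assumed to be loaded beforehand:
# the original dataset and its target column.
result = select_features(
    X_train, y_train, X_test,
    original_df=original_df,
    original_target=original_target,
    task="classification"
)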

📖 Feature Strategies (v0.3.0)

1. Digits & Discretization

Digit Extraction: Integer positions (units, tens, etc.) and decimal positions.
Digit Interactions: Column-wise and cross-column interactions between digits.
Binning: Discretize continuous variables via Quantile (qcut) or Equal-width (cut) bins.
Rounding: Rounding to various decimal places or magnitudes to find structural splits.
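
The strategies in this group can be illustrated with plain pandas. The sketch below is not AutoFE-PG's implementation; the helper names `integer_digit` and `decimal_digit` and the `price` column are hypothetical, and the helpers assume there are no missing values.

```python
import numpy as np
import pandas as pd


def integer_digit(values: pd.Series, position: int) -> pd.Series:
    """Digit at 10**position of the integer part (0 = units, 1 = tens)."""
    whole = np.floor(values.abs()).astype("int64")
    return (whole // 10**position) % 10


def decimal_digit(values: pd.Series, position: int) -> pd.Series:
    """Digit at the given decimal place (1 = tenths, 2 = hundredths)."""
    # Round before flooring so float noise (e.g. 707.9999999) cannot shift a digit.
    scaled = (values.abs() * 10**position).round(6)
    return np.floor(scaled).astype("int64") % 10


# Hypothetical example column.
df = pd.DataFrame({"price": [123.45, 7.08, 99.90, 250.00]})

df["price_units"] = integer_digit(df["price"], 0)
df["price_tens"] = integer_digit(df["price"], 1)
df["price_tenths"] = decimal_digit(df["price"], 1)

# Quantile bins hold roughly equal row counts; equal-width bins split the value range.
df["price_qbin"] = pd.qcut(df["price"], q=2, labels=False)
df["price_wbin"] = pd.cut(df["price"], bins=2, labels=False)

# Rounding to a magnitude: a negative number of decimals rounds to tens, hundreds, ...
df["price_round10"] = df["price"].round(-1)
```

Digit features are treated as small categorical columns here, so they can feed the same interaction and encoding steps as any other category.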

2. Specialized Encoding

Cyclical Encoding: Sin/Cos transforms for periodic data.
Target Encoding (OOF): Out-of-fold mean target per category.
Weight of Evidence (WoE): OOF WoE scores for binary classification.
Entropy: OOF target entropy per value group.
OOF Aggregation: Mean, Std, and Skew of the target grouped by feature values.
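
To make the out-of-fold idea concrete, here is a minimal sketch of OOF mean target encoding and cyclical encoding using only NumPy and pandas. It is an illustration under stated assumptions, not the library's code: the function names `oof_target_mean` and `cyclical_encode` are hypothetical, `cat` and `y` are assumed to share the same index, and the fallback to the global mean for unseen categories is one possible choice.

```python
import numpy as np
import pandas as pd


def oof_target_mean(cat: pd.Series, y: pd.Series, n_folds: int = 5, seed: int = 0) -> pd.Series:
    """Out-of-fold mean target per category.

    Each row is encoded from the other folds only, so a row's own target
    never contributes to its encoded value.
    """
    rng = np.random.default_rng(seed)
    fold_of_row = rng.permutation(len(cat)) % n_folds  # balanced random folds
    encoded = pd.Series(np.nan, index=cat.index, dtype="float64")
    for fold in range(n_folds):
        holdout = fold_of_row == fold
        means = y[~holdout].groupby(cat[~holdout]).mean()
        # Assumption: categories unseen in the fitting folds get the fitting-fold global mean.
        values = cat[holdout].map(means).fillna(y[~holdout].mean())
        encoded.loc[holdout] = values.to_numpy()
    return encoded


def cyclical_encode(values: pd.Series, period: float) -> pd.DataFrame:
    """Map a periodic feature onto the unit circle so both ends of the cycle meet."""
    angle = 2 * np.pi * values / period
    return pd.DataFrame(
        {f"{values.name}_sin": np.sin(angle), f"{values.name}_cos": np.cos(angle)}
    )


# Hypothetical toy data.
cat = pd.Series(list("aabbaabbab"))
y = pd.Series([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
encoded = oof_target_mean(cat, y, n_folds=5)

hours = cyclical_encode(pd.Series([0, 6, 24], name="hour"), period=24)
```

The same fold loop extends to the other OOF statistics listed above (std, skew, WoE, entropy) by replacing `.mean()` with the corresponding per-group aggregate.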

3. Non-Linear Interactions

Genetic Programming: Evolves mathematical expressions using the base features (requires gplearn).
Pair Interactions: Label-encoding of the combined values of column pairs (bigrams).
Numerical Products: NaN-safe products of pairs of numerical features.
Digit × Category: Target encoding on the interaction of a column's digit and another category.
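
The pair-based strategies can be sketched as follows. This is an illustration, not the library's implementation: `pair_label_encode` and `nan_safe_product` are hypothetical names, unseen test combinations are assumed to map to `-1`, and "NaN-safe" is interpreted here as "missing inputs stay missing and overflowed products become NaN", which may differ from what AutoFE-PG does.

```python
import numpy as np
import pandas as pd


def pair_label_encode(train: pd.DataFrame, test: pd.DataFrame, col_a: str, col_b: str):
    """Label-encode the combination of two columns, fitted on the training rows."""
    joined_train = train[col_a].astype(str) + "|" + train[col_b].astype(str)
    joined_test = test[col_a].astype(str) + "|" + test[col_b].astype(str)
    codes_train, uniques = pd.factorize(joined_train)
    # Combinations that never occur in train receive the code -1 (an assumption).
    codes_test = pd.Categorical(joined_test, categories=uniques).codes.astype("int64")
    return codes_train, codes_test


def nan_safe_product(a: pd.Series, b: pd.Series) -> pd.Series:
    """Product of two numeric columns with NaN propagated and infinities removed."""
    with np.errstate(over="ignore", invalid="ignore"):
        product = a.to_numpy(dtype="float64") * b.to_numpy(dtype="float64")
    return pd.Series(product, index=a.index).replace([np.inf, -np.inf], np.nan)


# Hypothetical toy data.
train = pd.DataFrame({"shop": ["x", "x", "y"], "tier": [1, 2, 1]})
test = pd.DataFrame({"shop": ["x", "z"], "tier": [1, 1]})
train_codes, test_codes = pair_label_encode(train, test, "shop", "tier")

product = nan_safe_product(
    pd.Series([1.0, np.nan, 1e308]),
    pd.Series([2.0, 3.0, 1e308]),
)
```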

4. External Data Signals

Bayesian Priors: Historical P(target|value) from the original dataset.
External WoE: WoE scores pre-computed from the original dataset.
External Entropy: Group purity/impurity derived from the original dataset.
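
Because these statistics come from a separate dataset, they can be computed once on the original data and then mapped onto the competition rows. The sketch below assumes a binary 0/1 target; `external_signals`, the additive `smoothing` term, and the WoE sign convention (log of positive share over negative share) are illustrative assumptions rather than AutoFE-PG's actual definitions.

```python
import numpy as np
import pandas as pd


def external_signals(
    original_values: pd.Series, original_target: pd.Series, smoothing: float = 0.5
) -> pd.DataFrame:
    """Per-value prior P(target=1 | value), WoE and entropy from an external dataset."""
    stats = original_target.groupby(original_values).agg(pos="sum", n="count")
    stats["neg"] = stats["n"] - stats["pos"]
    total_pos, total_neg = stats["pos"].sum(), stats["neg"].sum()

    stats["prior"] = stats["pos"] / stats["n"]

    # Additive smoothing avoids log(0) for values seen with only one class.
    pos_share = (stats["pos"] + smoothing) / (total_pos + smoothing * len(stats))
    neg_share = (stats["neg"] + smoothing) / (total_neg + smoothing * len(stats))
    stats["woe"] = np.log(pos_share / neg_share)

    # Binary entropy in bits: 0 for a pure group, 1 for a 50/50 group.
    p = stats["prior"].clip(1e-12, 1 - 1e-12)
    stats["entropy"] = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return stats[["prior", "woe", "entropy"]]


# Hypothetical original dataset and competition rows.
original = pd.DataFrame({"city": ["a", "a", "b", "b"], "target": [1, 0, 1, 1]})
table = external_signals(original["city"], original["target"])

train = pd.DataFrame({"city": ["a", "b", "c"]})
train["city_prior"] = train["city"].map(table["prior"])  # "c" is unseen, so NaN
```

No fold splitting is needed in this sketch because the competition's own target is never read.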

⚙️ Configuration

Parameter	Default	Description
`task`	`"auto"`	`"classification"`, `"regression"`, or `"auto"`
`n_folds`	`5`	Number of CV folds for evaluation
`time_budget`	`None`	Max wall-clock seconds for the search
`improvement_threshold`	`1e-7`	Min score delta to keep a feature
`sample`	`None`	Rows to sample for evaluation (speeds up search)
`gp_generations`	`5`	Evolution steps for Genetic Programming
`gp_n_components`	`5`	Maximum number of GP features that may be kept
`original_df`	`None`	External dataset for Priors/WoE/Entropy
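
To show how `improvement_threshold`, `n_folds`, and `time_budget` might interact in a greedy forward search, here is a minimal sketch. It is not the library's search loop: `greedy_forward_select` and `make_cv_score` are hypothetical helpers, and a NumPy least-squares model stands in for whatever model AutoFE-PG evaluates with.

```python
import time
from typing import Callable, Dict, List, Optional

import numpy as np
import pandas as pd


def make_cv_score(y: np.ndarray, n_folds: int = 5) -> Callable[[pd.DataFrame], float]:
    """Stand-in scorer: negative MSE of a least-squares fit, averaged over folds."""
    folds = np.arange(len(y)) % n_folds

    def score(X: pd.DataFrame) -> float:
        A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype="float64")])
        errors = []
        for fold in range(n_folds):
            fit, val = folds != fold, folds == fold
            coef, *_ = np.linalg.lstsq(A[fit], y[fit], rcond=None)
            errors.append(np.mean((A[val] @ coef - y[val]) ** 2))
        return -float(np.mean(errors))  # higher is better

    return score


def greedy_forward_select(
    base: pd.DataFrame,
    candidates: Dict[str, pd.Series],
    cv_score: Callable[[pd.DataFrame], float],
    improvement_threshold: float = 1e-7,
    time_budget: Optional[float] = None,
) -> List[str]:
    """Keep a candidate only if it raises the CV score by more than the threshold."""
    start = time.monotonic()
    current, best, kept = base.copy(), cv_score(base), []
    for name, column in candidates.items():
        if time_budget is not None and time.monotonic() - start > time_budget:
            break  # budget exhausted: return what has been selected so far
        trial = current.assign(**{name: column})
        trial_score = cv_score(trial)
        if trial_score - best > improvement_threshold:
            current, best = trial, trial_score
            kept.append(name)
    return kept


# Hypothetical data: the target depends on an interaction the base columns cannot express.
rng = np.random.default_rng(0)
base = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
y = (base["x1"] * base["x2"]).to_numpy() + 0.1 * rng.normal(size=200)
candidates = {
    "x1_times_x2": base["x1"] * base["x2"],  # informative interaction
    "x1_copy": base["x1"],                   # duplicate column, adds nothing
}
selected = greedy_forward_select(base, candidates, make_cv_score(y))
```

In this toy run the interaction term is kept because it lowers the cross-validated error, while the duplicate column is rejected because its score gain does not exceed the threshold.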

📝 Changelog

v0.3.0 (Current)

Refactoring: Removed competition-specific features (Domain Alignment, Dataset Frequency, Rarity).
New Features: Cyclical Features, OOF/External WoE, OOF/External Entropy, Genetic Programming (gplearn).
Enhanced Digits: Added Decimal Digit extraction.
Enhanced Aggregation: Added Skewness support to OOF Target Aggregation.
Simplified API: Decoupled from specific dataset patterns; focused on universal engineering.

v0.2.0

Added original dataset support (Domain Alignment, Bayesian Priors).
Introduced Cross-Dataset Frequency and Rarity features.

📄 License

MIT License.

Release history

Version	Released
0.3.0 (this version)	Mar 7, 2026
0.2.0	Feb 19, 2026
0.1.3	Feb 16, 2026
0.1.2	Feb 16, 2026
0.1.1	Feb 16, 2026
0.1.0	Feb 16, 2026

Download files

Source Distribution

autofepg-0.3.0.tar.gz (26.7 kB)

Uploaded Mar 7, 2026 Source

Built Distribution

autofepg-0.3.0-py3-none-any.whl (23.7 kB)

Uploaded Mar 7, 2026 Python 3

File details

Details for the file autofepg-0.3.0.tar.gz.

File metadata

Download URL: autofepg-0.3.0.tar.gz
Upload date: Mar 7, 2026
Size: 26.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for autofepg-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`2cb2f0586e448f0c8c2b0c830cda2de51779d39626ab022ccb176dd1a62cf2eb`
MD5	`448db0ffcc06b9a637dc492f2ba5f319`
BLAKE2b-256	`98e71e0132da79baefd7e4e7bde01bbd8d0fa485142caad43d071b95c67416df`

Provenance

The following attestation bundles were made for autofepg-0.3.0.tar.gz:

Publisher: publish.yml on thomastschinkel/autofepg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autofepg-0.3.0.tar.gz
- Subject digest: 2cb2f0586e448f0c8c2b0c830cda2de51779d39626ab022ccb176dd1a62cf2eb
- Sigstore transparency entry: 1056206958
- Sigstore integration time: Mar 7, 2026
Source repository:
- Permalink: thomastschinkel/autofepg@11a59214aebca380c6953c3291253669b97b5d6e
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/thomastschinkel
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@11a59214aebca380c6953c3291253669b97b5d6e
- Trigger Event: release

File details

Details for the file autofepg-0.3.0-py3-none-any.whl.

File metadata

Download URL: autofepg-0.3.0-py3-none-any.whl
Upload date: Mar 7, 2026
Size: 23.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for autofepg-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`884a571752c0fd6a6016745137d10da9d3e048d760b8a69d042cd7969e47cf0b`
MD5	`73e9a56f69c5d0322858e39012cd8fe5`
BLAKE2b-256	`ebb9522e5d16fa7627c9aae1de8004ce21d7b1ffc59668c2fb90d66307466e3b`

Provenance

The following attestation bundles were made for autofepg-0.3.0-py3-none-any.whl:

Publisher: publish.yml on thomastschinkel/autofepg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autofepg-0.3.0-py3-none-any.whl
- Subject digest: 884a571752c0fd6a6016745137d10da9d3e048d760b8a69d042cd7969e47cf0b
- Sigstore transparency entry: 1056207059
- Sigstore integration time: Mar 7, 2026
Source repository:
- Permalink: thomastschinkel/autofepg@11a59214aebca380c6953c3291253669b97b5d6e
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/thomastschinkel
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@11a59214aebca380c6953c3291253669b97b5d6e
- Trigger Event: release
