AutoFE - Playground: Automatic Feature Engineering & Selection for Kaggle Playground Competitions

🧪 AutoFE-PG

Automatic Feature Engineering & Selection for Kaggle Playground Competitions

AutoFE-PG is a production-ready library that automatically generates, evaluates, and selects engineered features to boost your tabular ML models. Every target-dependent feature is computed out-of-fold, so a row's own target value never leaks into its features.

✨ Key Features

Feature	Description
Auto column detection	Automatically identifies categorical vs. numerical columns
20+ feature strategies	Target encoding, count encoding, digit extraction, arithmetic interactions, group statistics, and more
Zero target leakage	All target-dependent features use strict out-of-fold encoding
Greedy forward selection	Adds features one-by-one, keeping only those that improve CV score
Optional backward pruning	Removes redundant features after forward selection
GPU acceleration	Automatically uses XGBoost GPU if available
Time budget	Set a wall-clock limit; the search stops gracefully
Sampling support	Evaluate on a subsample for faster iteration
Custom XGBoost params	Pass your own hyperparameters
Score variance tracking	Reports mean ± std across folds
Classification & regression	Supports both tasks with auto-detection
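
Two of the target-free strategies named in the table, count encoding and digit extraction, can be sketched with plain pandas. This is a minimal illustration, not AutoFE-PG's implementation: the helper names `count_encode` and `extract_digit` are hypothetical, and fitting the counts on the training column only is an assumption.

```python
import numpy as np
import pandas as pd


def count_encode(train_col: pd.Series, test_col: pd.Series) -> tuple[pd.Series, pd.Series]:
    """Replace each category with how often it occurs in the training column."""
    counts = train_col.value_counts()
    # Categories never seen in training (and missing values) get a count of 0.
    train_enc = train_col.map(counts).fillna(0).astype(int)
    test_enc = test_col.map(counts).fillna(0).astype(int)
    return train_enc, test_enc


def extract_digit(col: pd.Series, position: int) -> pd.Series:
    """Digit of the integer part at 10**position (0 = ones, 1 = tens).

    Assumes the column has no missing values.
    """
    integer_part = np.floor(col.abs()).astype("int64")
    return (integer_part // 10**position) % 10


train = pd.DataFrame({"city": ["a", "b", "a", "c", "a"],
                      "price": [123.0, 47.5, 908.0, 5.0, 60.2]})
test = pd.DataFrame({"city": ["a", "d"], "price": [311.0, 19.9]})

train_ce, test_ce = count_encode(train["city"], test["city"])
ones = extract_digit(train["price"], 0)
tens = extract_digit(train["price"], 1)
```

Neither helper reads the target, which is why features of this kind need no out-of-fold handling.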

🚀 Quick Start

Installation

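Install the released package from PyPI:

pip install autofepg

Or, from a clone of the repository, install in editable mode:

pip install -e .

To install only the dependencies:

pip install -r requirements.txt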

Minimal Example

import pandas as pd
from autofepg import select_features

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

X_train = train.drop(columns=["id", "target"])
y_train = train["target"]
X_test = test.drop(columns=["id"])

result = select_features(
    X_train, y_train, X_test,
    task="classification",
    time_budget=3600,
)

X_train_new = result["X_train"]
X_test_new = result["X_test"]

print(f"Baseline AUC: {result['base_score']:.6f}")
print(f"Best AUC:     {result['best_score']:.6f}")
print(f"Features added: {len(result['selected_features'])}")

Using the Class API

from autofepg import AutoFE

autofe = AutoFE(
    task="classification",
    n_folds=5,
    time_budget=1800,
    improvement_threshold=0.0001,
    backward_selection=True,
    sample=10000,
    xgb_params={
        "n_estimators": 1000,
        "max_depth": 8,
        "learning_rate": 0.05,
    },
)

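# X_train, y_train, and X_test are prepared as in the minimal example above.
# aux_target_cols lists columns used as auxiliary targets for target encoding
# (priority 7 under "How It Works"); these two names are dataset-specific.
X_train_new, X_test_new = autofe.fit_select(
    X_train, y_train, X_test,
    aux_target_cols=["employment_status", "debt_to_income_ratio"],
)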

# Inspect results
print(autofe.get_selected_feature_names())
history_df = autofe.get_history()

📖 How It Works

1. Feature Generation

AutoFE-PG generates candidates from a hardcoded priority sequence ordered by expected impact:

Priority	Strategy	Leakage-free?
1	Target Encoding (single columns)	✅ OOF
2	Count Encoding (single columns)	✅ No target
3	Target Encoding on pairs	✅ OOF
4	Count Encoding on pairs	✅ No target
5	Frequency Encoding	✅ No target
6	Missing Indicators	✅ No target
7	TE with auxiliary targets	✅ OOF
8	Unary transforms (log, sqrt, etc.)	✅ No target
9	Arithmetic interactions	✅ No target
10	Polynomial features	✅ No target
11	Pairwise label-encoded interactions	✅ No target
12	TE/CE on digit features	✅ OOF / No target
13	Digit × Category TE	✅ OOF
14	Quantile binning	✅ No target
15	Raw digit extraction	✅ No target
16	Digit interactions	✅ No target
17	Rounding features	✅ No target
18	Num-to-Cat conversion	✅ No target
19	Group statistics & deviations	✅ No target
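
The "OOF" label in the table refers to out-of-fold encoding: each row is encoded with statistics computed on folds that do not contain it. The sketch below shows the idea for mean target encoding of a single column. It is an illustration under stated assumptions, not the library's code: the function name `oof_target_encode`, the random fold split, and the fallback to the fitting folds' mean for unseen categories are all hypothetical choices.

```python
import numpy as np
import pandas as pd


def oof_target_encode(cat: pd.Series, target: pd.Series,
                      n_folds: int = 5, seed: int = 42) -> pd.Series:
    """Out-of-fold mean target encoding for one categorical column."""
    cat = cat.reset_index(drop=True)
    target = target.reset_index(drop=True).astype(float)
    encoded = pd.Series(np.nan, index=cat.index, dtype=float)

    # Shuffle row positions and split them into n_folds roughly equal parts.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(cat)), n_folds)

    for holdout_idx in folds:
        fit_mask = np.ones(len(cat), dtype=bool)
        fit_mask[holdout_idx] = False
        # Category means and fallback prior come from the other folds only,
        # so a held-out row's own target never contributes to its encoding.
        means = target[fit_mask].groupby(cat[fit_mask]).mean()
        prior = target[fit_mask].mean()
        encoded.iloc[holdout_idx] = (
            cat.iloc[holdout_idx].map(means).fillna(prior).to_numpy()
        )
    return encoded


demo = pd.DataFrame({"city": list("aabbab"), "y": [1, 0, 1, 1, 0, 0]})
demo["city_te"] = oof_target_encode(demo["city"], demo["y"], n_folds=3)
```

For the test set, a common companion step is to encode with category means computed on the full training data, since test rows have no target to leak.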

2. Greedy Forward Selection

Each candidate is evaluated by adding it to the current feature set and running XGBoost K-fold CV. A feature is kept only if it improves the score beyond the configured threshold.
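
The loop described above can be sketched in a few lines. This is a simplified illustration rather than the library's code: `greedy_forward_select` is a hypothetical name, `score_fn` stands in for the XGBoost K-fold CV run, and the sketch assumes a higher score is better.

```python
import time
from typing import Callable, Optional, Sequence


def greedy_forward_select(
    base_features: Sequence[str],
    candidates: Sequence[str],
    score_fn: Callable[[list[str]], float],
    threshold: float = 1e-7,
    time_budget: Optional[float] = None,
) -> tuple[list[str], float]:
    """Keep a candidate only if it raises the score by more than `threshold`."""
    start = time.monotonic()
    selected = list(base_features)
    best = score_fn(selected)  # baseline score on the original features
    for name in candidates:  # candidates are tried in priority order
        if time_budget is not None and time.monotonic() - start > time_budget:
            break  # out of time: stop and keep what was selected so far
        trial = score_fn(selected + [name])
        if trial - best > threshold:
            selected.append(name)
            best = trial
    return selected, best


# Toy scorer: each feature has a fixed additive contribution to the score.
gains = {"age": 0.70, "te_city": 0.05, "noise": -0.01, "ce_city": 0.0}


def toy_score(features: list[str]) -> float:
    return sum(gains[f] for f in features)


kept, score = greedy_forward_select(
    ["age"], ["te_city", "noise", "ce_city"], toy_score
)
```

Because each candidate is judged against the current best set, the result depends on candidate order, which is why the priority sequence matters.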

3. Optional Backward Pruning

After forward selection, features are tested for removal. If removing a feature improves (or maintains) the score, it is permanently dropped.
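
A minimal sketch of this pruning pass follows. It assumes that only the engineered features are candidates for removal and that a higher score is better; `backward_prune` and `score_fn` are hypothetical names, with `score_fn` standing in for the CV run.

```python
from typing import Callable, Sequence


def backward_prune(
    base_features: Sequence[str],
    added_features: Sequence[str],
    score_fn: Callable[[list[str]], float],
) -> tuple[list[str], float]:
    """Drop an added feature when the score without it is at least as good."""
    kept = list(added_features)
    best = score_fn(list(base_features) + kept)
    for name in list(added_features):
        without = [f for f in kept if f != name]
        trial = score_fn(list(base_features) + without)
        if trial >= best:  # removal improves or maintains the score
            kept = without
            best = trial
    return kept, best


# Toy scorer: "te_city" and "freq_city" carry the same information, so
# having both is no better than having one of them.
def toy_score(features: list[str]) -> float:
    score = 0.70 if "age" in features else 0.0
    if "te_city" in features or "freq_city" in features:
        score += 0.05
    if "age_sq" in features:
        score += 0.01
    return score


kept, score = backward_prune(
    ["age"], ["te_city", "freq_city", "age_sq"], toy_score
)
```

In the toy run, the first of the two redundant features is dropped and the second then becomes necessary, which shows how pruning catches redundancy that forward selection cannot see.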

⚙️ Configuration

Parameter	Type	Default	Description
task	str	"auto"	"classification", "regression", or "auto"
n_folds	int	5	Number of CV folds
time_budget	float	None	Max seconds (wall clock)
improvement_threshold	float	1e-7	Min score delta to keep a feature
sample	int	None	Subsample rows for faster CV
backward_selection	bool	False	Run backward pruning after forward
max_pair_cols	int	20	Max columns for pairwise features
max_digit_positions	int	4	Max digit positions to extract
xgb_params	dict	None	Custom XGBoost hyperparameters
metric_fn	callable	None	Custom metric (y_true, y_pred) -> float
metric_direction	str	None	"maximize" or "minimize"
random_state	int	42	Random seed
verbose	bool	True	Print progress
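
As an illustration of the `metric_fn` signature in the table, the sketch below defines a root mean squared logarithmic error with the `(y_true, y_pred) -> float` shape. The function name `rmsle` and the clipping of negative predictions are choices made for this example. The table does not say whether `y_pred` holds probabilities or labels for classification, so the sketch sticks to regression.

```python
import numpy as np


def rmsle(y_true, y_pred) -> float:
    """Root mean squared logarithmic error; lower is better."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip negative predictions so log1p stays defined.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 0.0, None)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Intended use, based on the parameter table above (not run here):
# autofe = AutoFE(task="regression", metric_fn=rmsle, metric_direction="minimize")
```

Since a lower RMSLE is better, the metric would be paired with `metric_direction="minimize"`.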

📊 Output

The select_features() function returns a dictionary:

{
    "X_train": pd.DataFrame,          # Augmented training data
    "X_test": pd.DataFrame,           # Augmented test data (if provided)
    "autofe": AutoFE,                 # Fitted AutoFE object
    "history": pd.DataFrame,          # Full selection history
    "selected_features": List[str],   # Names of kept features
    "base_score": float,              # Baseline CV mean
    "base_score_std": float,          # Baseline CV std
    "best_score": float,              # Final CV mean
    "best_score_std": float,          # Final CV std
}
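
One way to read the score fields together is to compare the gain over the baseline with the fold-to-fold spread. The helper below is a hypothetical sketch that uses only the keys documented above. The comparison against the larger of the two standard deviations is a rough yardstick chosen for this example, not a statistical test, and the dictionary holds made-up stand-in values.

```python
def summarize(result: dict, higher_is_better: bool = True) -> str:
    """Compare the final CV score with the baseline using the documented keys."""
    gain = result["best_score"] - result["base_score"]
    if not higher_is_better:
        gain = -gain
    # Rough yardstick (an assumption): the larger fold-to-fold std.
    noise = max(result["base_score_std"], result["best_score_std"])
    verdict = "exceeds" if gain > noise else "is within"
    return (
        f"{len(result['selected_features'])} features kept, "
        f"gain {gain:+.6f} {verdict} fold std {noise:.6f}"
    )


# Stand-in dictionary with made-up numbers, shaped like the documented output.
fake_result = {
    "selected_features": ["te_city", "ce_city"],
    "base_score": 0.9100, "base_score_std": 0.0020,
    "best_score": 0.9150, "best_score_std": 0.0015,
}
print(summarize(fake_result))
```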

🧪 Running Tests

pytest tests/ -v

📁 Project Structure

autofepg/
├── autofepg/
│   ├── __init__.py          # Public API
│   ├── utils.py             # GPU detection, task inference, metrics
│   ├── generators.py        # All feature generator classes
│   ├── builder.py           # FeatureCandidateBuilder
│   ├── engine.py            # XGBoost CV engine
│   └── core.py              # AutoFE class + select_features()
├── tests/
│   ├── __init__.py
│   └── test_autofepg.py     # Unit and integration tests
├── examples/
│   ├── example_classification.py
│   └── example_regression.py
├── .github/
│   └── workflows/
│       └── ci.yml
├── .gitignore
├── LICENSE
├── README.md
├── CHANGELOG.md
├── CONTRIBUTING.md
├── Makefile
├── pyproject.toml
├── setup.py
└── requirements.txt

Release history

Version	Release date
0.3.0	Mar 7, 2026
0.2.0	Feb 19, 2026
0.1.3	Feb 16, 2026
0.1.2	Feb 16, 2026
0.1.1 (this version)	Feb 16, 2026
0.1.0	Feb 16, 2026

Download files

File	Type	Size	Uploaded
autofepg-0.1.1.tar.gz	Source distribution	26.3 kB	Feb 16, 2026
autofepg-0.1.1-py3-none-any.whl	Built distribution (Python 3)	23.2 kB	Feb 16, 2026

File details

Details for the file autofepg-0.1.1.tar.gz.

File metadata

Download URL: autofepg-0.1.1.tar.gz
Upload date: Feb 16, 2026
Size: 26.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for autofepg-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`ba6368705b7cc6dd48e8f339578cc90355c795355e0600d9e31cc34df34d0182`
MD5	`e90d745ea04cf5ac4229a90d5b7d9baa`
BLAKE2b-256	`829807bc529d4f69407a89e56a708a93fa7f2aec4adec5b794e59c440833369d`

File details

Details for the file autofepg-0.1.1-py3-none-any.whl.

File metadata

Download URL: autofepg-0.1.1-py3-none-any.whl
Upload date: Feb 16, 2026
Size: 23.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for autofepg-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`27600b26666e8e7a3ee7953d66cabfddb46e60309eedb55c838a9f0d882ea298`
MD5	`7bd7353a5ac9c06f9f1ea548359b58ae`
BLAKE2b-256	`241fbb81ace158edca03d4f752aad03277a94c2398fc815788157ae0123ad735`
