AutoFE - Playground: Automatic Feature Engineering & Selection for Kaggle Playground Competitions
Project description
🧪 AutoFE-PG
Automatic Feature Engineering & Selection for Kaggle Playground Competitions
AutoFE-PG is a production-ready library that automatically generates, evaluates, and selects engineered features to boost your tabular ML models, with zero target leakage.
✨ Key Features
| Feature | Description |
|---|---|
| Auto column detection | Automatically identifies categorical vs. numerical columns |
| 20+ feature strategies | Target encoding, count encoding, digit extraction, arithmetic interactions, group statistics, and more |
| Zero target leakage | All target-dependent features use strict out-of-fold encoding |
| Greedy forward selection | Adds features one-by-one, keeping only those that improve CV score |
| Optional backward pruning | Removes redundant features after forward selection |
| GPU acceleration | Automatically uses XGBoost GPU if available |
| Time budget | Set a wall-clock limit; the search stops gracefully |
| Sampling support | Evaluate on a subsample for faster iteration |
| Custom XGBoost params | Pass your own hyperparameters |
| Score variance tracking | Reports mean ± std across folds |
| Classification & regression | Supports both tasks with auto-detection |
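The "zero target leakage" row refers to out-of-fold (OOF) encoding: each training row is encoded with target statistics computed only from the other folds, so a row never sees its own label. The following is a minimal sketch of that general technique, not AutoFE-PG's internal code. The function name `oof_target_encode`, the random fold assignment, and the fallback to the mean target for unseen categories are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def oof_target_encode(cat: pd.Series, y: pd.Series, n_folds: int = 5, seed: int = 42) -> pd.Series:
    """Out-of-fold mean target encoding (illustrative, not the library's code)."""
    n = len(cat)
    rng = np.random.default_rng(seed)
    # Assign every row to one of n_folds folds at random.
    fold_ids = rng.permutation(n) % n_folds
    encoded = pd.Series(np.nan, index=cat.index, dtype=float)
    for fold in range(n_folds):
        held_out = fold_ids == fold
        # Category means come only from rows outside the held-out fold.
        means = y[~held_out].groupby(cat[~held_out]).mean()
        prior = y[~held_out].mean()
        # Categories unseen in the other folds fall back to their mean target.
        encoded[held_out] = cat[held_out].map(means).fillna(prior).to_numpy()
    return encoded


cat = pd.Series(["a", "a", "b", "b", "a", "b", "a", "b", "c", "a"])
y = pd.Series([1, 0, 1, 1, 0, 1, 1, 0, 1, 0])
te = oof_target_encode(cat, y, n_folds=5)
print(te.round(3).tolist())
```

Category `"c"` occurs once with target 1. A naive (in-fold) encoding would give it exactly 1.0, while the OOF version assigns it the mean target of the other folds.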
Quick Start
Installation
pip install autofepg
Or install dependencies directly:
pip install -r requirements.txt
Minimal Example
import pandas as pd
from autofepg import select_features

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

X_train = train.drop(columns=["id", "target"])
y_train = train["target"]
X_test = test.drop(columns=["id"])

result = select_features(
    X_train, y_train, X_test,
    task="classification",
    time_budget=3600,
)

X_train_new = result["X_train"]
X_test_new = result["X_test"]

print(f"Baseline AUC: {result['base_score']:.6f}")
print(f"Best AUC: {result['best_score']:.6f}")
print(f"Features added: {len(result['selected_features'])}")
Using the Class API
from autofepg import AutoFE

autofe = AutoFE(
    task="classification",
    n_folds=5,
    time_budget=1800,
    improvement_threshold=0.0001,
    backward_selection=True,
    sample=10000,
    xgb_params={
        "n_estimators": 1000,
        "max_depth": 8,
        "learning_rate": 0.05,
    },
)

X_train_new, X_test_new = autofe.fit_select(
    X_train, y_train, X_test,
    aux_target_cols=["employment_status", "debt_to_income_ratio"],
)

# Inspect results
print(autofe.get_selected_feature_names())
history_df = autofe.get_history()
How It Works
1. Feature Generation
AutoFE-PG generates candidates from a hardcoded priority sequence ordered by expected impact:
| Priority | Strategy | Leakage-free? |
|---|---|---|
| 1 | Target Encoding (single columns) | Yes (OOF) |
| 2 | Count Encoding (single columns) | Yes (no target) |
| 3 | Target Encoding on pairs | Yes (OOF) |
| 4 | Count Encoding on pairs | Yes (no target) |
| 5 | Frequency Encoding | Yes (no target) |
| 6 | Missing Indicators | Yes (no target) |
| 7 | TE with auxiliary targets | Yes (OOF) |
| 8 | Unary transforms (log, sqrt, etc.) | Yes (no target) |
| 9 | Arithmetic interactions | Yes (no target) |
| 10 | Polynomial features | Yes (no target) |
| 11 | Pairwise label-encoded interactions | Yes (no target) |
| 12 | TE/CE on digit features | Yes (OOF / no target) |
| 13 | Digit × Category TE | Yes (OOF) |
| 14 | Quantile binning | Yes (no target) |
| 15 | Raw digit extraction | Yes (no target) |
| 16 | Digit interactions | Yes (no target) |
| 17 | Rounding features | Yes (no target) |
| 18 | Num-to-Cat conversion | Yes (no target) |
| 19 | Group statistics & deviations | Yes (no target) |
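Several of the target-free strategies in the table are plain column transforms. The sketch below shows one possible pandas formulation of count encoding, frequency encoding, digit extraction, and a group statistic with its deviation. The column names (`city`, `price`) and the derived feature names are hypothetical, and AutoFE-PG's own generators may compute these differently.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "A", "B"],
    "price": [120.0, 305.0, 98.0, 451.0, 133.0, 287.0],
})

# Count encoding: how often each category occurs.
df["city_count"] = df["city"].map(df["city"].value_counts())

# Frequency encoding: the same counts as a share of all rows.
df["city_freq"] = df["city"].map(df["city"].value_counts(normalize=True))

# Digit extraction: the tens digit of the integer part of a numeric column.
df["price_digit_tens"] = (df["price"].abs().astype(int) // 10) % 10

# Group statistic and deviation: per-category mean and each row's offset from it.
df["price_mean_by_city"] = df.groupby("city")["price"].transform("mean")
df["price_dev_from_city"] = df["price"] - df["price_mean_by_city"]

print(df)
```

None of these columns read the target, which is why they need no out-of-fold handling. In a train/test setting the mappings would typically be fitted on the training data and then applied to the test data.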
2. Greedy Forward Selection
Each candidate is evaluated by adding it to the current feature set and running XGBoost K-fold CV. A feature is kept only if it improves the score beyond the configured threshold.
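The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `greedy_forward_select` and `toy_score` are hypothetical names, the score is assumed to be maximized, and a caller-supplied `score_fn` stands in for the XGBoost K-fold CV that AutoFE-PG runs.

```python
import time
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def greedy_forward_select(
    candidates: Iterable[str],
    score_fn: Callable[[Sequence[str]], float],
    base_features: Sequence[str] = (),
    threshold: float = 1e-7,
    time_budget: Optional[float] = None,
) -> Tuple[List[str], float]:
    """Keep a candidate only if it lifts the score by more than `threshold`."""
    start = time.monotonic()
    selected = list(base_features)
    best = score_fn(selected)  # baseline score before any candidate is added
    for name in candidates:
        if time_budget is not None and time.monotonic() - start > time_budget:
            break  # stop gracefully once the wall-clock budget is spent
        trial = score_fn(selected + [name])
        if trial - best > threshold:
            selected.append(name)
            best = trial
    return selected, best


# Toy scorer standing in for cross-validation: each feature has a fixed gain,
# and "dup" adds nothing once "te_city" is present.
GAINS = {"te_city": 0.020, "count_city": 0.004, "noise": -0.001, "dup": 0.020}


def toy_score(features: Sequence[str]) -> float:
    score = 0.70
    for f in features:
        if f == "dup" and "te_city" in features:
            continue
        score += GAINS[f]
    return score


selected, best = greedy_forward_select(
    ["te_city", "dup", "noise", "count_city"], toy_score
)
print(selected, round(best, 3))
```

Because candidates are tried in a fixed order, a redundant feature such as `dup` is rejected once an earlier feature has already captured its signal. The result depends on the candidate order, which is why the priority sequence matters.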
3. Optional Backward Pruning
After forward selection, features are tested for removal. If removing a feature improves (or maintains) the score, it is permanently dropped.
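A minimal sketch of this pruning pass follows, again assuming a maximized score and using a caller-supplied `score_fn` in place of the library's cross-validation. The names `backward_prune` and `toy_score` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def backward_prune(
    selected: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Drop each feature whose removal does not lower the score."""
    kept = list(selected)
    best = score_fn(kept)
    for name in list(kept):  # iterate over a copy because `kept` shrinks
        trial_set = [f for f in kept if f != name]
        trial = score_fn(trial_set)
        if trial >= best:  # removal improves or maintains the score
            kept = trial_set
            best = trial
    return kept, best


# Toy scorer: "a_copy" carries no information beyond "a".
def toy_score(features: Sequence[str]) -> float:
    score = 0.70
    if "a" in features or "a_copy" in features:
        score += 0.02
    if "b" in features:
        score += 0.01
    return score


kept, best = backward_prune(["a", "a_copy", "b"], toy_score)
print(kept, round(best, 2))
```

With two interchangeable features, the one tested first is dropped and the other is kept, since removing the second would now lower the score.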
⚙️ Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | "auto" | "classification", "regression", or "auto" |
| n_folds | int | 5 | Number of CV folds |
| time_budget | float | None | Max seconds (wall clock) |
| improvement_threshold | float | 1e-7 | Min score delta to keep a feature |
| sample | int | None | Subsample rows for faster CV |
| backward_selection | bool | False | Run backward pruning after forward |
| max_pair_cols | int | 20 | Max columns for pairwise features |
| max_digit_positions | int | 4 | Max digit positions to extract |
| xgb_params | dict | None | Custom XGBoost hyperparameters |
| metric_fn | callable | None | Custom metric (y_true, y_pred) -> float |
| metric_direction | str | None | "maximize" or "minimize" |
| random_state | int | 42 | Random seed |
| verbose | bool | True | Print progress |
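As an illustration of the `metric_fn` and `metric_direction` rows, the sketch below defines an RMSE metric with the `(y_true, y_pred) -> float` signature listed in the table. The `autofe_kwargs` dictionary is a hypothetical helper, and passing these settings to the `AutoFE` constructor is an assumption based on the Class API example rather than something confirmed here.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error; lower is better."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Settings named in the table above; assumed to be accepted as
# AutoFE(**autofe_kwargs), which is not executed here.
autofe_kwargs = {
    "task": "regression",
    "metric_fn": rmse,
    "metric_direction": "minimize",  # RMSE is an error, so smaller is better
    "n_folds": 5,
    "random_state": 42,
}

print(rmse([3.0, 5.0], [2.0, 7.0]))
```

Set `metric_direction` to match the metric: an error metric paired with `"maximize"` would make the selection keep features that hurt the model. Whether `y_pred` holds class labels or probabilities for classification tasks is not stated here, so check that before writing a classification metric.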
Output
The select_features() function returns a dictionary:
{
    "X_train": pd.DataFrame,          # Augmented training data
    "X_test": pd.DataFrame,           # Augmented test data (if provided)
    "autofe": AutoFE,                 # Fitted AutoFE object
    "history": pd.DataFrame,          # Full selection history
    "selected_features": List[str],   # Names of kept features
    "base_score": float,              # Baseline CV mean
    "base_score_std": float,          # Baseline CV std
    "best_score": float,              # Final CV mean
    "best_score_std": float,          # Final CV std
}
🧪 Running Tests
pytest tests/ -v
Project Structure
autofepg/
├── autofepg/
│   ├── __init__.py          # Public API
│   ├── utils.py             # GPU detection, task inference, metrics
│   ├── generators.py        # All feature generator classes
│   ├── builder.py           # FeatureCandidateBuilder
│   ├── engine.py            # XGBoost CV engine
│   └── core.py              # AutoFE class + select_features()
├── tests/
│   ├── __init__.py
│   └── test_autofepg.py     # Unit and integration tests
├── examples/
│   ├── example_classification.py
│   └── example_regression.py
├── .github/
│   └── workflows/
│       └── ci.yml
├── .gitignore
├── LICENSE
├── README.md
├── CHANGELOG.md
├── CONTRIBUTING.md
├── Makefile
├── pyproject.toml
├── setup.py
└── requirements.txt
File details
Details for the file autofepg-0.1.3.tar.gz.
File metadata
- Download URL: autofepg-0.1.3.tar.gz
- Upload date:
- Size: 28.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9e955aee95ad494376a021b3eaed632931bc4579a331d86d6f1ab38b810cbb06 |
| MD5 | 05cf2905d2538faaa2c5280511568bad |
| BLAKE2b-256 | e3cd1af6e71c1033ad5ef353bda00075719195b35a2d9936b099a85680012860 |
Provenance
The following attestation bundles were made for autofepg-0.1.3.tar.gz:
Publisher: publish.yml on thomastschinkel/autofepg
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autofepg-0.1.3.tar.gz
- Subject digest: 9e955aee95ad494376a021b3eaed632931bc4579a331d86d6f1ab38b810cbb06
- Sigstore transparency entry: 955549744
- Sigstore integration time:
- Permalink: thomastschinkel/autofepg@e80aad785e86f6213de40f9bb30cc6961d238454
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/thomastschinkel
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e80aad785e86f6213de40f9bb30cc6961d238454
- Trigger Event: release
File details
Details for the file autofepg-0.1.3-py3-none-any.whl.
File metadata
- Download URL: autofepg-0.1.3-py3-none-any.whl
- Upload date:
- Size: 24.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eff7d96e3c954ac792e8d4a9774c6a6ed5e15b15382c1dd65a3e7d75c5f01738 |
| MD5 | 301d19f55722f7c6d9c7a094a347d5ec |
| BLAKE2b-256 | 178fe48d06acef9430f943b45dfd386bbd453338d6f7fb746b6595896f2673e9 |
Provenance
The following attestation bundles were made for autofepg-0.1.3-py3-none-any.whl:
Publisher: publish.yml on thomastschinkel/autofepg
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autofepg-0.1.3-py3-none-any.whl
- Subject digest: eff7d96e3c954ac792e8d4a9774c6a6ed5e15b15382c1dd65a3e7d75c5f01738
- Sigstore transparency entry: 955549752
- Sigstore integration time:
- Permalink: thomastschinkel/autofepg@e80aad785e86f6213de40f9bb30cc6961d238454
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/thomastschinkel
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e80aad785e86f6213de40f9bb30cc6961d238454
- Trigger Event: release