Next-generation LLM-powered auto feature engineering framework with GitHub Copilot SDK
FeatCopilot 🚀
Next-Generation LLM-Powered Auto Feature Engineering Framework
FeatCopilot automatically generates, selects, and explains predictive features using semantic understanding. It analyzes column meanings, applies domain-aware transformations, and provides human-readable explanations, turning raw data into ML-ready features in under a second with the tabular engine or in roughly 30-60 seconds with LLM-enhanced features.
📊 Benchmark Highlights
Simple Models Benchmark (42 Datasets)
| Configuration | Datasets Improved | Avg Improvement | Best Improvement |
|---|---|---|---|
| Tabular Engine | 20 (48%) | +4.54% | +197% (delays_zurich) |
| Tabular + LLM | 23 (55%) | +6.12% | +420% (delays_zurich) |
Models: RandomForest (n_estimators=200, max_depth=20), LogisticRegression/Ridge
AutoML Benchmark (FLAML, 120s budget)
| Metric | Value |
|---|---|
| Datasets | 41 |
| Datasets Improved | 19 (46%) |
| Best Improvement | +8.55% (abalone) |
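The percentages in the "Improved" entries of both benchmark tables are consistent with simple ratios of dataset counts. The sketch below reproduces them. The `relative_improvement` helper is an assumption about how a per-dataset gain might be computed; the source does not state the formula it uses.

```python
def share_improved(improved: int, total: int) -> int:
    """Percentage of datasets that improved, rounded to the nearest integer."""
    return round(100 * improved / total)


def relative_improvement(baseline: float, engineered: float) -> float:
    """Relative gain in percent (assumed definition, not taken from the benchmark code)."""
    return 100 * (engineered - baseline) / abs(baseline)


print(share_improved(20, 42))  # Tabular Engine, simple models benchmark
print(share_improved(23, 42))  # Tabular + LLM, simple models benchmark
print(share_improved(19, 41))  # AutoML benchmark
```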
Key Results
- ✅ +197% improvement on delays_zurich (tabular only)
- 🧠 +420% improvement on delays_zurich with LLM-enhanced features
- 📈 +8.98% on abalone regression task
- 🚀 +5.68% on complex_classification
Key Features
- 🔧 Multi-Engine Architecture: Tabular, time series, relational, and text feature engines
- 🤖 LLM-Powered Intelligence: Semantic feature discovery, domain-aware generation, and code synthesis
- 📊 Intelligent Selection: Statistical testing, importance ranking, and redundancy elimination
- 🔌 Scikit-learn Compatible: Drop-in replacement for sklearn transformers
- 📝 Interpretable: Every feature comes with human-readable explanations
Installation
```bash
# Basic installation
pip install featcopilot
# With LLM capabilities (quotes keep shells such as zsh from expanding the brackets)
pip install "featcopilot[llm]"
# Full installation
pip install "featcopilot[full]"
```
Quick Start
Fast Mode (Tabular Only)
```python
from featcopilot import AutoFeatureEngineer

# X is a pandas DataFrame of raw features and y is the target; both are assumed to be loaded already.
# Sub-second feature engineering
engineer = AutoFeatureEngineer(
    engines=['tabular'],
    max_features=50
)
X_transformed = engineer.fit_transform(X, y)  # <1 second
print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")
```
LLM Mode (With LiteLLM)
```python
from featcopilot import AutoFeatureEngineer

# LLM-powered semantic features (+420% max improvement)
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    max_features=50
)
X_transformed = engineer.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Customer age in years',
        'income': 'Annual household income in USD',
        'tenure': 'Months as customer',
    },
    task_description="Predict customer churn"
)  # 30-60 seconds

# Get LLM-generated explanations
for feature, explanation in engineer.explain_features().items():
    print(f"{feature}: {explanation}")
```
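To illustrate why `column_descriptions` and `task_description` matter, here is a minimal sketch of how such metadata could be turned into a prompt for a language model. The function name `build_feature_prompt` and the prompt layout are hypothetical; the source does not show how FeatCopilot builds its prompts.

```python
def build_feature_prompt(column_descriptions: dict, task_description: str) -> str:
    """Assemble a plain-text prompt from column metadata (hypothetical format)."""
    lines = [f"Task: {task_description}", "Columns:"]
    for name, description in column_descriptions.items():
        lines.append(f"- {name}: {description}")
    lines.append("Suggest derived features as expressions over these columns.")
    return "\n".join(lines)


prompt = build_feature_prompt(
    {"age": "Customer age in years", "tenure": "Months as customer"},
    "Predict customer churn",
)
print(prompt)
```

The more precise the descriptions, the more context a model has for proposing domain-aware features such as ratios or rates.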
Engines
Tabular Engine
Generates polynomial features, interaction terms, and mathematical transformations.
```python
from featcopilot.engines import TabularEngine

engine = TabularEngine(
    polynomial_degree=2,
    interaction_only=False,
    include_transforms=['log', 'sqrt', 'square']
)
```
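The kinds of features described above can be sketched in plain Python. This is an illustration of degree-2 interactions and unary transforms only, not the `TabularEngine` implementation. The function name, the output naming scheme, and the use of `log1p` for the log transform are all assumptions.

```python
import math
from itertools import combinations


def tabular_features(row: dict, transforms=("log", "sqrt", "square")) -> dict:
    """Expand one numeric row with pairwise interactions and unary transforms."""
    out = dict(row)
    # Pairwise interaction terms: products of distinct columns.
    for a, b in combinations(sorted(row), 2):
        out[f"{a}*{b}"] = row[a] * row[b]
    for name, value in row.items():
        if "square" in transforms:
            out[f"{name}^2"] = value ** 2
        # sqrt and log are only applied to non-negative inputs;
        # log1p keeps a value of zero valid.
        if "sqrt" in transforms and value >= 0:
            out[f"sqrt({name})"] = math.sqrt(value)
        if "log" in transforms and value >= 0:
            out[f"log1p({name})"] = math.log1p(value)
    return out


features = tabular_features({"age": 4.0, "income": 9.0})
print(sorted(features))
```

Two input columns expand to nine here, which is why a cap such as `max_features` and a later selection step are useful.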
Time Series Engine
Extracts statistical, frequency, and temporal features from time series data.
```python
from featcopilot.engines import TimeSeriesEngine

engine = TimeSeriesEngine(
    features=['mean', 'std', 'skew', 'autocorr', 'fft_coefficients']
)
```
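For readers unfamiliar with the listed feature names, the sketch below computes each of them for a single series using only the standard library. It is an illustration, not the `TimeSeriesEngine` code. The population (divide-by-n) estimators, the lag of 1, and reporting FFT coefficients as magnitudes are assumptions.

```python
import cmath
import math


def series_features(values, n_fft=2, lag=1):
    """Compute mean, std, skew, lag autocorrelation, and DFT magnitudes."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    var = sum(c * c for c in centered) / n  # population variance
    std = math.sqrt(var)
    skew = (sum(c ** 3 for c in centered) / n) / std ** 3 if std > 0 else 0.0
    autocorr = (
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if var > 0 else 0.0
    )
    # Magnitudes of the first n_fft DFT coefficients (naive O(n^2) transform).
    fft_magnitude = [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values)))
        for k in range(n_fft)
    ]
    return {
        "mean": mean,
        "std": std,
        "skew": skew,
        "autocorr": autocorr,
        "fft_magnitude": fft_magnitude,
    }


feats = series_features([1.0, 2.0, 3.0, 4.0])
print(feats)
```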
LLM Engine
Uses GitHub Copilot SDK (default) or LiteLLM (100+ providers) for intelligent feature generation.
```python
from featcopilot.llm import SemanticEngine

# Default: GitHub Copilot SDK
engine = SemanticEngine(
    model='gpt-5.2',
    max_suggestions=20,
    validate_features=True
)

# Alternative: LiteLLM backend
engine = SemanticEngine(
    model='gpt-4o',
    backend='litellm',
    max_suggestions=20
)
```
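The source does not describe what `validate_features=True` checks. As one plausible illustration, the sketch below screens an LLM-suggested feature expression with Python's `ast` module before it is ever evaluated. The function name and the whitelist are hypothetical and should not be read as FeatCopilot's actual validation logic.

```python
import ast

# Node types permitted in a purely arithmetic expression.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub,
)


def is_safe_feature_expression(expression: str, columns: set) -> bool:
    """Return True if the expression only does arithmetic on known columns."""
    try:
        tree = ast.parse(expression, mode="eval")
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # rejects calls, attribute access, subscripts, etc.
        if isinstance(node, ast.Name) and node.id not in columns:
            return False  # rejects references to unknown columns
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False  # rejects string and other non-numeric literals
    return True


columns = {"age", "income", "tenure"}
print(is_safe_feature_expression("income / (tenure + 1)", columns))
print(is_safe_feature_expression("__import__('os').system('ls')", columns))
```

A whitelist of syntax nodes is deliberately conservative: anything not explicitly allowed is rejected, which suits code produced by a model.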
Feature Selection
```python
from featcopilot.selection import FeatureSelector

selector = FeatureSelector(
    methods=['mutual_info', 'importance', 'correlation'],
    max_features=30,
    correlation_threshold=0.95
)
X_selected = selector.fit_transform(X, y)
```
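The idea behind a correlation threshold can be shown with a small greedy filter: a column is kept only if its absolute Pearson correlation with every column already kept stays at or below the threshold. This is a sketch of the general technique under that assumption. The source does not say how `FeatureSelector` orders or compares features, and the function names here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def drop_redundant(columns: dict, threshold: float = 0.95) -> list:
    """Greedy filter: keep a column unless it is highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


data = {
    "age": [20, 30, 40, 50],
    "age_months": [240, 360, 480, 600],  # exact multiple of age, so fully redundant
    "income": [50, 20, 70, 40],
}
print(drop_redundant(data))
```

Because the filter is greedy, the result depends on column order; ranking columns by importance first would decide which member of a correlated pair survives.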
Comparison with Existing Libraries
| Feature | FeatCopilot | Featuretools | TSFresh | AutoFeat | OpenFE | CAAFE |
|---|---|---|---|---|---|---|
| Tabular Features | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Time Series | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| Relational | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| LLM-Powered | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Semantic Understanding | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Code Generation | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Sklearn Compatible | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Interpretable | ✅ | ⚠️ | ⚠️ | ⚠️ | ❌ | ✅ |
Documentation
📖 Full Documentation: https://thinkall.github.io/featcopilot/
Requirements
- Python 3.9+
- NumPy, Pandas, Scikit-learn
- GitHub Copilot SDK (default) or LiteLLM (for 100+ LLM providers)
License
MIT License