Next-generation LLM-powered auto feature engineering framework with GitHub Copilot SDK

FeatCopilot 🚀

Next-Generation LLM-Powered Auto Feature Engineering Framework

FeatCopilot automatically generates, selects, and explains predictive features using semantic understanding. It analyzes column meanings, applies domain-aware transformations, and provides human-readable explanations—turning raw data into ML-ready features in seconds.

📊 Benchmark Highlights

Simple Models Benchmark (42 Datasets)

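| Configuration  | Datasets Improved | Avg Improvement | Best Improvement       |
|----------------|-------------------|-----------------|------------------------|
| Tabular Engine | 20 (48%)          | +4.54%          | +197% (delays_zurich)  |
| Tabular + LLM  | 23 (55%)          | +6.12%          | +420% (delays_zurich)  |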

Models: RandomForest (n_estimators=200, max_depth=20), LogisticRegression/Ridge

AutoML Benchmark (FLAML, 120s budget)

| Metric           | Value            |
|------------------|------------------|
| Datasets         | 41               |
| Improved         | 19 (46%)         |
| Best Improvement | +8.55% (abalone) |

Key Results

  • +197% improvement on delays_zurich (tabular only)
  • 🧠 +420% improvement with LLM-enhanced features
  • 📈 +8.98% on abalone regression task
  • 🚀 +5.68% on complex_classification

Key Features

  • 🔧 Multi-Engine Architecture: Tabular, time series, relational, and text feature engines
  • 🤖 LLM-Powered Intelligence: Semantic feature discovery, domain-aware generation, and code synthesis
  • 📊 Intelligent Selection: Statistical testing, importance ranking, and redundancy elimination
  • 🔌 Scikit-learn Compatible: Drop-in replacement for sklearn transformers
  • 📝 Interpretable: Every feature comes with human-readable explanations
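
To make the scikit-learn compatibility claim concrete, the sketch below shows what the transformer protocol looks like inside a `Pipeline`. `SquaredFeatures` is a hypothetical stand-in written for this illustration and is not part of FeatCopilot; an object such as `AutoFeatureEngineer` would presumably occupy the same slot, assuming it implements `fit` and `transform` as stated above.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class SquaredFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a feature-engineering step (not FeatCopilot code)."""

    def fit(self, X, y=None):
        # Nothing to learn here; record the input width as sklearn estimators do.
        self.n_features_in_ = np.asarray(X).shape[1]
        return self

    def transform(self, X):
        # Append the square of every column to the original columns.
        X = np.asarray(X, dtype=float)
        return np.hstack([X, X ** 2])


# Synthetic data with a circular decision boundary, which a linear model
# cannot separate without the squared terms.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

pipe = Pipeline([
    ("features", SquaredFeatures()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
accuracy = pipe.score(X, y)
print(f"Training accuracy: {accuracy:.2f}")
```

Because the feature step is an ordinary pipeline stage, it is refit inside each cross-validation fold along with the model, which avoids leaking target information from held-out rows.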

Installation

# Basic installation
pip install featcopilot

# With LLM capabilities
pip install featcopilot[llm]

# Full installation
pip install featcopilot[full]

Quick Start

Fast Mode (Tabular Only)

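from sklearn.datasets import load_breast_cancer
from featcopilot import AutoFeatureEngineer

# Example data only: any feature DataFrame X and target y will do
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Sub-second feature engineering
engineer = AutoFeatureEngineer(
    engines=['tabular'],
    max_features=50
)

X_transformed = engineer.fit_transform(X, y)  # <1 second
print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")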

LLM Mode (With LiteLLM)

from featcopilot import AutoFeatureEngineer

# LLM-powered semantic features (best benchmark result: +420% on delays_zurich)
# X is assumed to contain the columns described below
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    max_features=50
)

X_transformed = engineer.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Customer age in years',
        'income': 'Annual household income in USD',
        'tenure': 'Months as customer',
    },
    task_description="Predict customer churn"
)  # 30-60 seconds

# Get LLM-generated explanations
for feature, explanation in engineer.explain_features().items():
    print(f"{feature}: {explanation}")

Engines

Tabular Engine

Generates polynomial features, interaction terms, and mathematical transformations.

from featcopilot.engines import TabularEngine

engine = TabularEngine(
    polynomial_degree=2,
    interaction_only=False,
    include_transforms=['log', 'sqrt', 'square']
)

Time Series Engine

Extracts statistical, frequency, and temporal features from time series data.

from featcopilot.engines import TimeSeriesEngine

engine = TimeSeriesEngine(
    features=['mean', 'std', 'skew', 'autocorr', 'fft_coefficients']
)
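
The feature names above map onto standard summary statistics. The sketch below computes comparable values for a single series with pandas and NumPy; `summarize_series` and its parameters are hypothetical names for illustration, and the exact definitions used by `TimeSeriesEngine` (for example, which FFT coefficients it keeps) are not confirmed by this document.

```python
import numpy as np
import pandas as pd


def summarize_series(values, n_fft=3, lag=1):
    """Return statistical, temporal, and frequency features for one series."""
    s = pd.Series(values, dtype="float64")
    # Magnitude spectrum of the mean-centered series (index 0 is the DC term).
    spectrum = np.abs(np.fft.rfft(s.to_numpy() - s.mean()))
    feats = {
        "mean": s.mean(),
        "std": s.std(),
        "skew": s.skew(),
        f"autocorr_lag{lag}": s.autocorr(lag=lag),
    }
    # Keep the first n_fft non-DC coefficient magnitudes.
    for k in range(1, n_fft + 1):
        feats[f"fft_abs_{k}"] = spectrum[k] if k < len(spectrum) else np.nan
    return feats


# A sine wave with exactly 4 cycles over 64 samples.
t = np.arange(64)
signal = np.sin(2 * np.pi * 4 * t / 64)
features = summarize_series(signal, n_fft=5)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

For this input, the energy concentrates in `fft_abs_4`, matching the four cycles in the signal, while the other FFT magnitudes are close to zero.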

LLM Engine

Uses the GitHub Copilot SDK (default) or LiteLLM (100+ providers) to generate features from column semantics.

from featcopilot.llm import SemanticEngine

# Default: GitHub Copilot SDK
engine = SemanticEngine(
    model='gpt-5.2',
    max_suggestions=20,
    validate_features=True
)

# Alternative: LiteLLM backend
engine = SemanticEngine(
    model='gpt-4o',
    backend='litellm',
    max_suggestions=20
)

Feature Selection

from featcopilot.selection import FeatureSelector

selector = FeatureSelector(
    methods=['mutual_info', 'importance', 'correlation'],
    max_features=30,
    correlation_threshold=0.95
)

X_selected = selector.fit_transform(X, y)
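
As a rough illustration of how relevance ranking and redundancy elimination can be combined, the sketch below ranks features by mutual information and then greedily drops any feature too strongly correlated with one already kept. This is a generic technique written with scikit-learn, not FeatCopilot's actual algorithm; `select_features` and the toy columns are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif


def select_features(X, y, max_features=30, correlation_threshold=0.95, random_state=0):
    """Greedy selection: rank by mutual information, skip near-duplicates."""
    mi = pd.Series(
        mutual_info_classif(X, y, random_state=random_state),
        index=X.columns,
    ).sort_values(ascending=False)
    corr = X.corr().abs()

    kept = []
    for col in mi.index:
        # Skip a feature that is nearly collinear with one already kept.
        if any(corr.loc[col, k] > correlation_threshold for k in kept):
            continue
        kept.append(col)
        if len(kept) == max_features:
            break
    return kept


rng = np.random.default_rng(0)
signal = rng.normal(size=300)
X = pd.DataFrame({
    "signal": signal,
    "signal_copy": signal * 2.0 + 1.0,  # perfectly correlated duplicate
    "noise": rng.normal(size=300),
})
y = (signal > 0).astype(int)

selected = select_features(X, y, max_features=2)
print(selected)
```

Only one of `signal` and `signal_copy` survives, because the second is filtered out by the correlation threshold before `noise` is considered.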

Comparison with Existing Libraries

Feature FeatCopilot Featuretools TSFresh AutoFeat OpenFE CAAFE
Tabular Features
Time Series
Relational
LLM-Powered
Semantic Understanding ⚠️
Code Generation ⚠️
Sklearn Compatible
Interpretable ⚠️ ⚠️ ⚠️

Documentation

📖 Full Documentation: https://thinkall.github.io/featcopilot/

Requirements

  • Python 3.9+
  • NumPy, Pandas, Scikit-learn
  • GitHub Copilot SDK (default) or LiteLLM (for 100+ LLM providers)

License

MIT License
