
Next-generation LLM-powered auto feature engineering framework with GitHub Copilot SDK


FeatCopilot 🚀

Next-Generation LLM-Powered Auto Feature Engineering Framework

FeatCopilot automatically generates, selects, and explains predictive features using semantic understanding. It analyzes column meanings, applies domain-aware transformations, and provides human-readable explanations, turning raw data into ML-ready features in under a second with the tabular engine or in 30-60 seconds with the LLM engine.

📊 Benchmark Highlights

Tabular Engine (Fast Mode - <1s)

Task Type           | Average Improvement | Best Case
Text Classification | +12.44%             | +49.02% (News Headlines)
Time Series         | +1.51%              | +12.12% (Retail Demand)
Classification      | +0.54%              | +4.35%
Regression          | +0.65%              | +5.57%

LLM Engine (With LiteLLM - 30-60s)

Task Type      | Average Improvement | Best Case
Regression     | +7.79%              | +19.66% (Retail Demand)
Classification | +2.38%              | +2.87%

  • 12/12 wins on text classification (tabular mode)
  • 🧠 +19.66% max improvement with LLM-powered features
  • <1 second (tabular) or 30-60s (with LLM) processing time
  • 📈 Largest gains with simple models (LogisticRegression, Ridge)

View Full Benchmark Results

Key Features

  • 🔧 Multi-Engine Architecture: Tabular, time series, relational, and text feature engines
  • 🤖 LLM-Powered Intelligence: Semantic feature discovery, domain-aware generation, and code synthesis
  • 📊 Intelligent Selection: Statistical testing, importance ranking, and redundancy elimination
  • 🔌 Scikit-learn Compatible: Drop-in replacement for sklearn transformers
  • 📝 Interpretable: Every feature comes with human-readable explanations

Installation

# Basic installation
pip install featcopilot

# With LLM capabilities (quotes keep shells such as zsh from expanding the brackets)
pip install "featcopilot[llm]"

# Full installation
pip install "featcopilot[full]"

Quick Start

Fast Mode (Tabular Only)

from featcopilot import AutoFeatureEngineer

# Sub-second feature engineering
engineer = AutoFeatureEngineer(
    engines=['tabular'],
    max_features=50
)

# X (feature table, e.g. a pandas DataFrame) and y (target) are assumed to be loaded already
X_transformed = engineer.fit_transform(X, y)  # <1 second
print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")

LLM Mode (With LiteLLM)

from featcopilot import AutoFeatureEngineer

# LLM-powered semantic features (+19.66% max improvement)
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    max_features=50
)

X_transformed = engineer.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Customer age in years',
        'income': 'Annual household income in USD',
        'tenure': 'Months as customer',
    },
    task_description="Predict customer churn"
)  # 30-60 seconds

# Get LLM-generated explanations
for feature, explanation in engineer.explain_features().items():
    print(f"{feature}: {explanation}")
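
To illustrate why column descriptions and a task description help an LLM propose features, here is a minimal sketch of how such metadata could be serialized into a prompt. The helper `build_feature_prompt` and its prompt wording are hypothetical and are not part of FeatCopilot's API; the library's actual prompt construction may differ.

```python
def build_feature_prompt(column_descriptions, task_description, max_suggestions=20):
    """Hypothetical helper: flatten column metadata into a text prompt."""
    lines = [f"Task: {task_description}", "Columns:"]
    for name, description in column_descriptions.items():
        # One line per column so the model sees each name with its meaning.
        lines.append(f"- {name}: {description}")
    lines.append(
        f"Suggest up to {max_suggestions} derived features, "
        "each with a one-sentence rationale."
    )
    return "\n".join(lines)


prompt = build_feature_prompt(
    {
        "age": "Customer age in years",
        "income": "Annual household income in USD",
        "tenure": "Months as customer",
    },
    "Predict customer churn",
)
print(prompt)
```

Without the descriptions, a model would see only the bare names `age`, `income`, and `tenure`; with them, it has the units and the business meaning needed to suggest ratios or interactions that make sense for churn.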

Engines

Tabular Engine

Generates polynomial features, interaction terms, and mathematical transformations.

from featcopilot.engines import TabularEngine

engine = TabularEngine(
    polynomial_degree=2,
    interaction_only=False,
    include_transforms=['log', 'sqrt', 'square']
)
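
As a rough illustration of what polynomial features, interaction terms, and mathematical transformations look like for a single row, here is a standard-library sketch. The function `expand_row`, the `TRANSFORMS` table, and the feature naming scheme are assumptions made for this example; they do not describe how `TabularEngine` is implemented.

```python
import math
from itertools import combinations_with_replacement

# Hypothetical transform table for this sketch; invalid inputs yield None.
TRANSFORMS = {
    "log": lambda v: math.log(v) if v > 0 else None,
    "sqrt": lambda v: math.sqrt(v) if v >= 0 else None,
    "square": lambda v: v * v,
}


def expand_row(row, degree=2, interaction_only=False, transforms=("log", "sqrt")):
    """Return the original columns plus polynomial and transformed features."""
    features = dict(row)
    names = sorted(row)
    # Polynomial terms of every degree from 2 up to `degree`.
    for d in range(2, degree + 1):
        for combo in combinations_with_replacement(names, d):
            if interaction_only and len(set(combo)) < len(combo):
                continue  # skip terms that repeat a column, such as age*age
            features["*".join(combo)] = math.prod(row[n] for n in combo)
    # Element-wise transformations, skipped where the input is out of domain.
    for name in names:
        for t in transforms:
            value = TRANSFORMS[t](row[name])
            if value is not None:
                features[f"{t}({name})"] = value
    return features


row = {"age": 4.0, "income": 9.0}
expanded = expand_row(row)
interactions = expand_row(row, interaction_only=True, transforms=())
```

Two input columns already expand to nine features here, which is why a cap such as `max_features` and a selection step matter once the column count grows.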

Time Series Engine

Extracts statistical, frequency, and temporal features from time series data.

from featcopilot.engines import TimeSeriesEngine

engine = TimeSeriesEngine(
    features=['mean', 'std', 'skew', 'autocorr', 'fft_coefficients']
)
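
The feature names above map to standard summary statistics. The following standard-library sketch shows one common definition of each (population standard deviation and skewness, lag-1 autocorrelation, and magnitudes of the first discrete Fourier transform coefficients). These definitions and the function names are assumptions for illustration; `TimeSeriesEngine` may compute them differently.

```python
import cmath
import math


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    """Population standard deviation."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def skew(xs):
    """Population skewness; 0.0 for a constant series."""
    m, s = mean(xs), std(xs)
    if s == 0:
        return 0.0
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def autocorr(xs, lag=1):
    """Autocorrelation at the given lag; 0.0 for a constant series."""
    m = mean(xs)
    denom = sum((x - m) ** 2 for x in xs)
    if denom == 0:
        return 0.0
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return num / denom


def dft_magnitudes(xs, k=2):
    """Magnitudes of the first k DFT coefficients (naive O(n*k) transform)."""
    n = len(xs)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(xs)))
        for f in range(k)
    ]


def summarize(xs):
    """Collect the statistics into one flat feature dictionary."""
    feats = {
        "mean": mean(xs),
        "std": std(xs),
        "skew": skew(xs),
        "autocorr_lag1": autocorr(xs),
    }
    for f, magnitude in enumerate(dft_magnitudes(xs)):
        feats[f"fft_{f}"] = magnitude
    return feats


series = [1.0, 2.0, 3.0, 4.0]
features = summarize(series)
```

Each series collapses to one fixed-length row of numbers, which is what lets a tabular model consume time series of different lengths.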

LLM Engine

Uses the GitHub Copilot SDK (default) or LiteLLM (100+ providers) to generate features with an LLM.

from featcopilot.llm import SemanticEngine

# Default: GitHub Copilot SDK
engine = SemanticEngine(
    model='gpt-5.2',
    max_suggestions=20,
    validate_features=True
)

# Alternative: LiteLLM backend
engine = SemanticEngine(
    model='gpt-4o',
    backend='litellm',
    max_suggestions=20
)

Feature Selection

from featcopilot.selection import FeatureSelector

selector = FeatureSelector(
    methods=['mutual_info', 'importance', 'correlation'],
    max_features=30,
    correlation_threshold=0.95
)

X_selected = selector.fit_transform(X, y)
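
To make the roles of `max_features` and `correlation_threshold` concrete, here is a standard-library sketch of greedy redundancy elimination: features are ranked by relevance, and a feature is dropped if it is too strongly correlated with one already kept. Ranking by absolute Pearson correlation with the target is a stand-in chosen for this sketch (the example above lists mutual information and importance as methods), and `select_features` is a hypothetical function, not `FeatureSelector`'s implementation.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_features(columns, target, max_features=30, correlation_threshold=0.95):
    """Greedily keep relevant features that are not redundant with kept ones."""
    # Most relevant first; relevance here is |correlation with the target|.
    ranked = sorted(
        columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True
    )
    kept = []
    for name in ranked:
        if len(kept) >= max_features:
            break
        redundant = any(
            abs(pearson(columns[name], columns[other])) > correlation_threshold
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


columns = {
    "tenure": [1.0, 2.0, 3.0, 4.0, 5.0],
    "tenure_x2": [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of tenure
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]
selected = select_features(columns, target)
```

Only one of `tenure` and `tenure_x2` survives because their correlation exceeds the 0.95 threshold, while `noise` is kept because it is not redundant with either.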

Comparison with Existing Libraries

[Comparison table: FeatCopilot vs. Featuretools, TSFresh, AutoFeat, OpenFE, and CAAFE across tabular features, time series, relational, LLM-powered, semantic understanding, code generation, sklearn compatibility, and interpretability. Per-cell support marks not recoverable.]

Documentation

📖 Full Documentation: https://thinkall.github.io/featcopilot/

Requirements

  • Python 3.9+
  • NumPy, Pandas, Scikit-learn
  • GitHub Copilot SDK (default) or LiteLLM (for 100+ LLM providers)

License

MIT License


