Next-generation LLM-powered auto feature engineering framework with GitHub Copilot SDK

FeatCopilot 🚀

Next-Generation LLM-Powered Auto Feature Engineering Framework

FeatCopilot automatically generates, selects, and explains predictive features using semantic understanding. It analyzes column meanings, applies domain-aware transformations, and provides human-readable explanations, turning raw data into ML-ready features in under a second with the tabular engine alone, or in roughly 30-60 seconds when the LLM engine is enabled.

📊 Benchmark Highlights

Simple Models Benchmark (63 Datasets)

| Configuration | Improved | Avg Improvement | Best Improvement |
|---|---|---|---|
| Tabular Engine | 31 (49%) | +7.52% | +144% (triple_interaction) |

Models: RandomForest (n_estimators=200, max_depth=20), LogisticRegression/Ridge
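
The "Improved" column reads as a count of datasets plus its share of the total. The short sketch below checks that arithmetic and shows one common way to define a relative improvement. The `relative_improvement` formula is an assumption for illustration; this README does not state the exact formula the benchmark uses, and the scores passed to it are made up.

```python
def improved_share(n_improved: int, n_total: int) -> float:
    """Share of datasets that improved, as a percentage."""
    return 100.0 * n_improved / n_total


def relative_improvement(baseline: float, engineered: float) -> float:
    """Percent change of a score relative to the baseline.

    Assumed definition for illustration only; the benchmark's
    exact formula is not given in this README.
    """
    return 100.0 * (engineered - baseline) / abs(baseline)


# 31 of the 63 datasets improved in the simple-models benchmark.
print(f"{improved_share(31, 63):.0f}%")  # 49%

# Hypothetical scores, not benchmark data.
print(f"{relative_improvement(0.50, 0.60):+.1f}%")  # +20.0%
```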

AutoML Benchmark (FLAML + AutoGluon, 120s budget)

| Framework | Datasets | Improved | Avg Improvement |
|---|---|---|---|
| FLAML | 10 | 9 (90%) | +1.85% |
| AutoGluon | 10 | 9 (90%) | +1.55% |

FE Tools Comparison (FeatCopilot vs autofeat vs featuretools)

| Metric | FeatCopilot | autofeat | featuretools |
|---|---|---|---|
| Win Rate | 80% 🏆 | 40% | 0% |
| Avg Improvement | +1.89% 🏆 | +1.46% | -2.71% |
| Coverage | 100% 🏆 | 50% | 100% |
| Composite Score | 0.606 🥇 | 0.351 🥉 | 0.397 🥈 |

Key Results

  • 🔥 +144% improvement on triple_interaction_regression (tabular only)
  • 📈 +104% on xor_regression, +70% on pairwise_product_regression
  • 🏆 #1 FE tool: highest win rate (80%) and composite score (0.606) against autofeat and featuretools across 10 datasets
  • 🚀 90% AutoML improvement rate across FLAML and AutoGluon

Key Features

  • 🔧 Multi-Engine Architecture: Tabular, time series, relational, and text feature engines
  • 🤖 LLM-Powered Intelligence: Semantic feature discovery, domain-aware generation, and code synthesis
  • 📊 Intelligent Selection: Statistical testing, importance ranking, and redundancy elimination
  • 🔌 Scikit-learn Compatible: Drop-in replacement for sklearn transformers
  • 📝 Interpretable: Every feature comes with human-readable explanations

Installation

# Basic installation
pip install featcopilot

# With LLM capabilities (quotes keep shells such as zsh from expanding the brackets)
pip install "featcopilot[llm]"

# Full installation
pip install "featcopilot[full]"

Quick Start

Fast Mode (Tabular Only)

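from sklearn.datasets import load_breast_cancer

from featcopilot import AutoFeatureEngineer

# Example data only: any feature DataFrame X with a target y is used the same way
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Sub-second feature engineering
engineer = AutoFeatureEngineer(
    engines=['tabular'],
    max_features=50
)

X_transformed = engineer.fit_transform(X, y)  # <1 second
print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")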

LLM Mode (With LiteLLM)

from featcopilot import AutoFeatureEngineer

# LLM-powered semantic features
# (X is assumed to contain the 'age', 'income', and 'tenure' columns described below)
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    max_features=50
)

X_transformed = engineer.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Customer age in years',
        'income': 'Annual household income in USD',
        'tenure': 'Months as customer',
    },
    task_description="Predict customer churn"
)  # 30-60 seconds

# Get LLM-generated explanations
for feature, explanation in engineer.explain_features().items():
    print(f"{feature}: {explanation}")

Engines

Tabular Engine

Generates polynomial features, interaction terms, and mathematical transformations.

from featcopilot.engines import TabularEngine

engine = TabularEngine(
    polynomial_degree=2,
    interaction_only=False,
    include_transforms=['log', 'sqrt', 'square']
)
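
To make "polynomial features, interaction terms, and mathematical transformations" concrete, here is a library-independent sketch of what a degree-2 expansion with `log`, `sqrt`, and `square` transforms can produce for a single row. It does not use or reproduce `TabularEngine`; the function name `tabular_features` and the feature naming scheme are hypothetical.

```python
import math
from itertools import combinations


def tabular_features(row: dict[str, float]) -> dict[str, float]:
    """Expand one row of numeric columns into degree-2 and transformed features.

    Illustrative only: not FeatCopilot's implementation.
    """
    out = dict(row)
    # Squares: the pure degree-2 polynomial terms.
    for name, value in row.items():
        out[f"{name}^2"] = value * value
    # Pairwise products: the degree-2 interaction terms.
    for (a, va), (b, vb) in combinations(row.items(), 2):
        out[f"{a}*{b}"] = va * vb
    # Transforms are emitted only where they are mathematically defined.
    for name, value in row.items():
        if value >= 0:
            out[f"sqrt({name})"] = math.sqrt(value)
        if value > 0:
            out[f"log({name})"] = math.log(value)
    return out


features = tabular_features({"age": 4.0, "income": 9.0})
print(sorted(features))
```

Two input columns already yield nine features here, which is why a cap such as `max_features` and a later selection step matter in practice.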

Time Series Engine

Extracts statistical, frequency, and temporal features from time series data.

from featcopilot.engines import TimeSeriesEngine

engine = TimeSeriesEngine(
    features=['mean', 'std', 'skew', 'autocorr', 'fft_coefficients']
)
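
As a rough picture of what statistical, temporal, and frequency features look like, the sketch below computes the mean, standard deviation, skewness, lag-1 autocorrelation, and the magnitudes of the first few discrete Fourier coefficients using only the standard library. It is not `TimeSeriesEngine`'s implementation; `series_features`, its `n_fft` parameter, and the output keys are hypothetical, and the population (biased) estimators are one choice among several.

```python
import cmath
import math
import statistics


def series_features(values: list[float], n_fft: int = 2) -> dict[str, float]:
    """Summarize one series with statistical, temporal, and frequency features.

    Illustrative only: not FeatCopilot's implementation.
    """
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    centered = [v - mean for v in values]
    var = sum(c * c for c in centered) / n

    # Skewness: third central moment divided by std cubed.
    skew = (sum(c ** 3 for c in centered) / n) / std ** 3 if std > 0 else 0.0
    # Lag-1 autocorrelation: covariance of neighbours over the variance.
    lag1 = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    autocorr = lag1 / (n * var) if var > 0 else 0.0

    feats = {"mean": mean, "std": std, "skew": skew, "autocorr_lag1": autocorr}
    # Magnitudes of the first n_fft non-constant DFT coefficients.
    for k in range(1, n_fft + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values)
        )
        feats[f"fft_abs_{k}"] = abs(coeff)
    return feats


print(series_features([1.0, 2.0, 3.0, 4.0]))
```

For real workloads a vectorized FFT (for example from NumPy) would replace the explicit sum, which costs O(n) per coefficient.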

LLM Engine

Uses the GitHub Copilot SDK (default) or LiteLLM (100+ providers) to generate features with an LLM.

from featcopilot.llm import SemanticEngine

# Default: GitHub Copilot SDK
engine = SemanticEngine(
    model='gpt-5.2',
    max_suggestions=20,
    validate_features=True
)

# Alternative: LiteLLM backend
engine = SemanticEngine(
    model='gpt-4o',
    backend='litellm',
    max_suggestions=20
)
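
This README does not describe what `validate_features=True` checks. As one hypothetical illustration of why validation matters for LLM-suggested features, the sketch below vets a suggested expression before evaluating it: only basic arithmetic over known column names and numeric constants is allowed. The names `is_safe_expression` and `evaluate` are made up for this example and are not part of FeatCopilot.

```python
import ast

# Node types permitted in a suggested feature expression.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)


def is_safe_expression(expr: str, columns: set[str]) -> bool:
    """Accept only arithmetic over known column names and numeric constants."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # calls, attributes, subscripts, etc. are rejected
        if isinstance(node, ast.Name) and node.id not in columns:
            return False  # reference to a column that does not exist
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False  # strings, bytes, None, ...
    return True


def evaluate(expr: str, row: dict[str, float]) -> float:
    """Evaluate a vetted expression against one row of data."""
    if not is_safe_expression(expr, set(row)):
        raise ValueError(f"rejected expression: {expr!r}")
    code = compile(ast.parse(expr, mode="eval"), "<feature>", "eval")
    # Builtins are emptied as a second line of defence after the AST check.
    return eval(code, {"__builtins__": {}}, dict(row))


row = {"income": 120.0, "tenure": 12.0}
print(evaluate("income / tenure", row))  # 10.0
print(is_safe_expression("__import__('os').system('ls')", set(row)))  # False
```

A whitelist like this rejects anything it does not recognize, which is a safer default than trying to blacklist dangerous constructs.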

Feature Selection

from featcopilot.selection import FeatureSelector

selector = FeatureSelector(
    methods=['mutual_info', 'importance', 'correlation'],
    max_features=30,
    correlation_threshold=0.95
)

X_selected = selector.fit_transform(X, y)

Comparison with Existing Libraries

| Feature | FeatCopilot | Featuretools | TSFresh | AutoFeat | OpenFE | CAAFE |
|---|---|---|---|---|---|---|
| Tabular Features | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Time Series | ✅ | ⚠️ | ✅ | ❌ | ❌ | ❌ |
| Relational | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| LLM-Powered | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Semantic Understanding | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Code Generation | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Sklearn Compatible | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Interpretable | ✅ | ✅ | ⚠️ | ⚠️ | ❌ | ✅ |

Documentation

📖 Full Documentation: https://thinkall.github.io/featcopilot/

Requirements

  • Python 3.10+
  • NumPy, Pandas, Scikit-learn
  • GitHub Copilot SDK (default) or LiteLLM (for 100+ LLM providers)

License

MIT License

