Next-generation LLM-powered auto feature engineering framework with GitHub Copilot SDK

FeatCopilot 🚀

Next-Generation LLM-Powered Auto Feature Engineering Framework

FeatCopilot automatically generates, selects, and explains predictive features using semantic understanding. It analyzes column meanings, applies domain-aware transformations, and provides human-readable explanations, turning raw data into ML-ready features in seconds.

📊 Benchmark Highlights

Simple Models Benchmark (63 Datasets)

| Configuration | Improved | Avg Improvement | Best Improvement |
|---|---|---|---|
| Tabular Engine | 31 (49%) | +7.52% | +144% (triple_interaction) |

Models: RandomForest (n_estimators=200, max_depth=20), LogisticRegression/Ridge

AutoML Benchmark (FLAML + AutoGluon, 120s budget)

| Framework | Datasets | Improved | Avg Improvement |
|---|---|---|---|
| FLAML | 10 | 9 (90%) | +1.85% |
| AutoGluon | 10 | 9 (90%) | +1.55% |

FE Tools Comparison (FeatCopilot vs autofeat vs featuretools)

| Metric | FeatCopilot | autofeat | featuretools |
|---|---|---|---|
| Win Rate | 80% 🏆 | 40% | 0% |
| Avg Improvement | +1.89% 🏆 | +1.46% | -2.71% |
| Coverage | 100% 🏆 | 50% | 100% |
| Composite Score | 0.606 🥇 | 0.351 🥉 | 0.397 🥈 |

Key Results

  • 🔥 +144% improvement on triple_interaction_regression (tabular only)
  • 📈 +104% on xor_regression, +70% on pairwise_product_regression
  • 🏆 #1 FE tool by composite score, ahead of autofeat and featuretools across 10 datasets
  • 🚀 90% AutoML improvement rate: 9 of 10 datasets improved with both FLAML and AutoGluon

View Full Benchmark Results

Key Features

  • 🔧 Multi-Engine Architecture: Tabular, time series, relational, and text feature engines
  • 🤖 LLM-Powered Intelligence: Semantic feature discovery, domain-aware generation, and code synthesis
  • 📊 Intelligent Selection: Statistical testing, importance ranking, and redundancy elimination
  • 🔌 Scikit-learn Compatible: Drop-in replacement for sklearn transformers
  • 📝 Interpretable: Every feature comes with human-readable explanations

Installation

# Basic installation
pip install featcopilot

# With LLM capabilities (quotes stop shells such as zsh from expanding the brackets)
pip install "featcopilot[llm]"

# Full installation
pip install "featcopilot[full]"

Quick Start

Fast Mode (Tabular Only)

from featcopilot import AutoFeatureEngineer

# Sub-second feature engineering
engineer = AutoFeatureEngineer(
    engines=['tabular'],
    max_features=50
)

X_transformed = engineer.fit_transform(X, y)  # <1 second
print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")

LLM Mode (With LiteLLM)

from featcopilot import AutoFeatureEngineer

# LLM-powered semantic features
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    max_features=50
)

X_transformed = engineer.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Customer age in years',
        'income': 'Annual household income in USD',
        'tenure': 'Months as customer',
    },
    task_description="Predict customer churn"
)  # 30-60 seconds

# Get LLM-generated explanations
for feature, explanation in engineer.explain_features().items():
    print(f"{feature}: {explanation}")

Command-Line Interface

FeatCopilot ships a featcopilot CLI for shell, scripting, and agentic (LLM tool-use) workflows, so no Python glue is required. All subcommands accept --json for machine-readable stdout; errors are written to stderr with a non-zero exit code so agents can parse failures deterministically.

# Discover capabilities (engines, selection methods, I/O formats)
featcopilot info --json

# Run feature engineering on a CSV / JSON file
featcopilot transform \
    --input data.csv --target label --output features.csv \
    --engines tabular --max-features 50 --json

# Inspect generated features (name, explanation, code) as JSON for an LLM
featcopilot explain --input data.csv --target label --json

# Equivalent module form
python -m featcopilot info --json
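For agentic use, the stdout/stderr contract described above can be wrapped in a small helper. This is a minimal sketch, not part of FeatCopilot: run_json_command and CliError are hypothetical names, and nothing is assumed about the JSON schema beyond it being valid JSON.

```python
import json
import subprocess
from typing import Any


class CliError(RuntimeError):
    """Raised when the wrapped command exits with a non-zero status."""


def run_json_command(argv: list[str]) -> Any:
    """Run a command, parse stdout as JSON, and surface stderr on failure."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Errors go to stderr with a non-zero exit code, so report both.
        raise CliError(f"exit code {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Usage, assuming the featcopilot CLI is installed and on PATH:
# info = run_json_command(["featcopilot", "info", "--json"])
```

Keeping the parsing in one place means an agent only has to handle two outcomes: a parsed JSON value or a CliError carrying the stderr text.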

Pass --config config.json to provide nested keys such as llm_config; explicit CLI flags override values from the config file.
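The precedence rule can be pictured as a dictionary overlay. The sketch below is illustrative only: how FeatCopilot merges settings internally is not documented here, merge_settings is a hypothetical helper, and apart from llm_config every key and value in the sample config is an assumption.

```python
import json


def merge_settings(file_config: dict, cli_flags: dict) -> dict:
    """Overlay CLI flags on config-file values; flags left unset (None) are ignored."""
    merged = dict(file_config)
    for key, value in cli_flags.items():
        if value is not None:
            merged[key] = value
    return merged


# Hypothetical config.json contents; only the llm_config key name comes from the text above.
file_config = json.loads('{"max_features": 30, "llm_config": {"model": "gpt-4o"}}')
# Hypothetical parsed flags: --max-features was given, --engines was not.
cli_flags = {"max_features": 50, "engines": None}

settings = merge_settings(file_config, cli_flags)
print(settings)
```

Here the flag value 50 replaces the file value 30, while the nested llm_config block passes through untouched.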

Parquet I/O. FeatCopilot's base install does not pin a parquet engine. To use --input file.parquet / --output file.parquet (or the parquet value in --input-format / --output-format), install one of pyarrow or fastparquet. featcopilot info --json reports "parquet_available": true only when an engine is importable in the current environment.
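To check from Python whether a parquet engine is importable before passing a parquet path, a check along these lines should work. It is a sketch of the idea, not necessarily how FeatCopilot computes parquet_available; parquet_engine_available is a hypothetical helper.

```python
from importlib.util import find_spec


def parquet_engine_available() -> bool:
    """Return True if pyarrow or fastparquet can be found in this environment."""
    return any(find_spec(name) is not None for name in ("pyarrow", "fastparquet"))


print(parquet_engine_available())
```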

Engines

Tabular Engine

Generates polynomial features, interaction terms, and mathematical transformations.

from featcopilot.engines import TabularEngine

engine = TabularEngine(
    polynomial_degree=2,
    interaction_only=False,
    include_transforms=['log', 'sqrt', 'square']
)
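As a rough illustration of the three feature families named above, a degree-2 expansion of a single row can be sketched in plain Python. This is not FeatCopilot's implementation: expand_row and the generated feature names are made up for the example, and log1p is used in place of a bare log so that zero values stay defined.

```python
import math
from itertools import combinations


def expand_row(row: dict[str, float]) -> dict[str, float]:
    """Add squares, pairwise products, and log1p/sqrt transforms to one row."""
    out = dict(row)
    for name, value in row.items():
        out[f"{name}^2"] = value ** 2
        # log1p and sqrt are only applied to non-negative inputs here.
        if value >= 0:
            out[f"log1p({name})"] = math.log1p(value)
            out[f"sqrt({name})"] = math.sqrt(value)
    # Interaction terms: one product per unordered pair of original columns.
    for a, b in combinations(row, 2):
        out[f"{a}*{b}"] = row[a] * row[b]
    return out


features = expand_row({"age": 40.0, "income": 3.0})
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

Two input columns grow to nine features, which is why a cap such as max_features and a selection step matter once the input is wider.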

Time Series Engine

Extracts statistical, frequency, and temporal features from time series data.

from featcopilot.engines import TimeSeriesEngine

engine = TimeSeriesEngine(
    features=['mean', 'std', 'skew', 'autocorr', 'fft_coefficients']
)
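To make the feature names above concrete, here is a standard-library sketch of what each statistic measures for one series. It is an illustration under stated assumptions, not the engine's code: summarize_series and the output keys are hypothetical, skewness is the population form, autocorrelation is taken at lag 1, and the Fourier coefficients come from a naive DFT.

```python
import cmath
import statistics


def summarize_series(values: list[float], n_fft: int = 3) -> dict[str, float]:
    """Compute mean, std, skew, lag-1 autocorrelation, and DFT magnitudes."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    centered = [v - mean for v in values]
    # Population skewness: third central moment divided by std cubed.
    skew = (sum(c ** 3 for c in centered) / n) / std ** 3 if std else 0.0
    # Lag-1 autocorrelation: covariance with the shifted series over variance.
    denom = sum(c * c for c in centered)
    autocorr = (
        sum(centered[i] * centered[i + 1] for i in range(n - 1)) / denom
        if denom
        else 0.0
    )
    feats = {"mean": mean, "std": std, "skew": skew, "autocorr_lag1": autocorr}
    # Magnitudes of the first n_fft DFT coefficients (naive O(n * n_fft) sum).
    for k in range(n_fft):
        coeff = sum(
            v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values)
        )
        feats[f"fft_abs_{k}"] = abs(coeff)
    return feats


feats = summarize_series([1.0, 2.0, 3.0, 4.0])
for name, value in feats.items():
    print(f"{name}: {value:.4f}")
```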

LLM Engine

Uses GitHub Copilot SDK (default) or LiteLLM (100+ providers) for intelligent feature generation.

from featcopilot.llm import SemanticEngine

# Default: GitHub Copilot SDK
engine = SemanticEngine(
    model='gpt-5.2',
    max_suggestions=20,
    validate_features=True
)

# Alternative: LiteLLM backend
engine = SemanticEngine(
    model='gpt-4o',
    backend='litellm',
    max_suggestions=20
)

Feature Selection

from featcopilot.selection import FeatureSelector

selector = FeatureSelector(
    methods=['mutual_info', 'importance', 'correlation'],
    max_features=30,
    correlation_threshold=0.95
)

X_selected = selector.fit_transform(X, y)

Comparison with Existing Libraries

| Feature | FeatCopilot | Featuretools | TSFresh | AutoFeat | OpenFE | CAAFE |
|---|---|---|---|---|---|---|
| Tabular Features | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Time Series | ✅ | ⚠️ | ✅ | ❌ | ❌ | ❌ |
| Relational | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| LLM-Powered | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Semantic Understanding | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Code Generation | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Sklearn Compatible | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Interpretable | ✅ | ✅ | ⚠️ | ⚠️ | ❌ | ✅ |

Documentation

📖 Full Documentation: https://thinkall.github.io/featcopilot/

Requirements

  • Python 3.10+
  • NumPy, Pandas, Scikit-learn
  • GitHub Copilot SDK (default) or LiteLLM (for 100+ LLM providers)

License

MIT License

Download files

Source Distribution

featcopilot-0.4.0.tar.gz (209.9 kB)

Uploaded Source

Built Distribution

featcopilot-0.4.0-py3-none-any.whl (132.1 kB)

Uploaded Python 3

File details

Details for the file featcopilot-0.4.0.tar.gz.

File metadata

  • Download URL: featcopilot-0.4.0.tar.gz
  • Size: 209.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for featcopilot-0.4.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | f587311355772f949a97995fb7bf747fb7cdbf036d816a0acc9b4af80ce4f2d9 |
| MD5 | ad288ceae2ac136ab0e1e8f70611006a |
| BLAKE2b-256 | d10965d880bd5224e74e6184c7d66ff77b3013bb28951da5d2b809524f28f384 |
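A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: sha256_of and matches_published_hash are hypothetical helper names, and the file path passed in is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for featcopilot-0.4.0.tar.gz.
EXPECTED = "f587311355772f949a97995fb7bf747fb7cdbf036d816a0acc9b4af80ce4f2d9"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED) -> bool:
    """Compare a file's SHA256 digest with the published value."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the sdist was downloaded to the current directory:
# print(matches_published_hash(Path("featcopilot-0.4.0.tar.gz")))
```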

Provenance

The following attestation bundles were made for featcopilot-0.4.0.tar.gz:

Publisher: publish.yml on thinkall/featcopilot

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file featcopilot-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: featcopilot-0.4.0-py3-none-any.whl
  • Size: 132.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for featcopilot-0.4.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 522fb544494ef8f522c4653547b8810754070c833a44e46b96cc5cf96ebd6ace |
| MD5 | 1d4c13aec0cbba36a1d12d25d043c6f6 |
| BLAKE2b-256 | 1a469039514ea4cee25ff3aadcdd73e9480c93234f58681ea8f5a4c83349ba45 |

Provenance

The following attestation bundles were made for featcopilot-0.4.0-py3-none-any.whl:

Publisher: publish.yml on thinkall/featcopilot

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
