
Next-generation LLM-powered auto feature engineering framework with GitHub Copilot SDK


FeatCopilot 🚀

Next-Generation LLM-Powered Auto Feature Engineering with GitHub Copilot SDK

FeatCopilot is a unified feature engineering framework that combines the best approaches from existing libraries (Featuretools, TSFresh, AutoFeat, OpenFE) with novel LLM-powered capabilities via GitHub Copilot SDK.

📊 Benchmark Highlights

Tabular Engine (Fast Mode - <1s)

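| Task Type           | Average Improvement | Best Case                |
|---------------------|---------------------|--------------------------|
| Text Classification | +12.44%             | +49.02% (News Headlines) |
| Time Series         | +1.51%              | +12.12% (Retail Demand)  |
| Classification      | +0.54%              | +4.35%                   |
| Regression          | +0.65%              | +5.57%                   |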

LLM Engine (With Copilot - 30-60s)

| Task Type      | Average Improvement | Best Case               |
|----------------|---------------------|-------------------------|
| Regression     | +7.79%              | +19.66% (Retail Demand) |
| Classification | +2.38%              | +2.87%                  |

  • ✅ 12/12 wins on text classification (tabular mode)
  • 🧠 +19.66% max improvement with LLM-powered features
  • ⚡ <1 second (tabular) or 30-60s (with LLM) processing time
  • 📈 Largest gains with simple models (LogisticRegression, Ridge)

View Full Benchmark Results

Key Features

  • 🔧 Multi-Engine Architecture: Tabular, time series, relational, and text feature engines
  • 🤖 LLM-Powered Intelligence: Semantic feature discovery, domain-aware generation, and code synthesis
  • 📊 Intelligent Selection: Statistical testing, importance ranking, and redundancy elimination
  • 🔌 Scikit-learn Compatible: Drop-in replacement for sklearn transformers
  • 📝 Interpretable: Every feature comes with human-readable explanations

Installation

# Basic installation
pip install featcopilot

# With LLM capabilities (requires GitHub Copilot)
# Quote the extras so shells such as zsh do not expand the brackets
pip install "featcopilot[llm]"

# Full installation
pip install "featcopilot[full]"

Quick Start

Fast Mode (Tabular Only)

from featcopilot import AutoFeatureEngineer

# X (feature table, e.g. a pandas DataFrame) and y (target) are assumed to be loaded already

# Sub-second feature engineering
engineer = AutoFeatureEngineer(
    engines=['tabular'],
    max_features=50
)

X_transformed = engineer.fit_transform(X, y)  # <1 second
print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")

LLM Mode (With Copilot)

from featcopilot import AutoFeatureEngineer

# LLM-powered semantic features (best case in the benchmarks above: +19.66%)
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    max_features=50
)

X_transformed = engineer.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Customer age in years',
        'income': 'Annual household income in USD',
        'tenure': 'Months as customer',
    },
    task_description="Predict customer churn"
)  # 30-60 seconds

# Get LLM-generated explanations
for feature, explanation in engineer.explain_features().items():
    print(f"{feature}: {explanation}")

Engines

Tabular Engine

Generates polynomial features, interaction terms, and mathematical transformations.

from featcopilot.engines import TabularEngine

engine = TabularEngine(
    polynomial_degree=2,
    interaction_only=False,
    include_transforms=['log', 'sqrt', 'square']
)
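To make the three feature families concrete, here is a minimal, dependency-free sketch of what a degree-2 expansion with `log`, `sqrt`, and `square` transforms could produce. It does not use FeatCopilot and is not its implementation: the helper `expand_tabular`, the feature naming scheme, and the use of `log(1 + |v|)` to keep zeros and negatives finite are all assumptions made for illustration.

```python
import math
from itertools import combinations


def expand_tabular(rows, transforms=("log", "sqrt", "square")):
    """Hypothetical helper: add simple transforms and pairwise products.

    rows: list of dicts mapping column name -> numeric value.
    Returns a new list of dicts containing the original and derived columns.
    """
    funcs = {
        # Assumption: log(1 + |v|) and sqrt(|v|) so zeros/negatives stay finite.
        "log": lambda v: math.log1p(abs(v)),
        "sqrt": lambda v: math.sqrt(abs(v)),
        "square": lambda v: v * v,
    }
    expanded = []
    for row in rows:
        out = dict(row)
        # Mathematical transformations of each original column.
        for name, value in row.items():
            for t in transforms:
                out[f"{t}({name})"] = funcs[t](value)
        # Pairwise interaction terms between distinct original columns.
        for a, b in combinations(row, 2):
            out[f"{a}*{b}"] = row[a] * row[b]
        expanded.append(out)
    return expanded


rows = [{"age": 30.0, "income": 4.0}, {"age": 50.0, "income": 9.0}]
features = expand_tabular(rows)
print(sorted(features[0]))
```

With two input columns this yields nine columns per row: the two originals, six transformed columns, and one interaction term. The count grows quadratically with the number of inputs, which is one reason a cap such as `max_features` is useful.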

Time Series Engine

Extracts statistical, frequency, and temporal features from time series data.

from featcopilot.engines import TimeSeriesEngine

engine = TimeSeriesEngine(
    features=['mean', 'std', 'skew', 'autocorr', 'fft_coefficients']
)
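As a rough illustration of the statistical and frequency features named above, the following standard-library sketch computes the mean, standard deviation, skewness, lag-1 autocorrelation, and the first few discrete Fourier transform magnitudes of a single series. It is not the `TimeSeriesEngine` implementation: the function names, the population (not sample) definitions, and the naive O(n²) DFT are assumptions chosen to keep the example self-contained.

```python
import cmath
import math
from statistics import fmean, pstdev


def skewness(series):
    """Population skewness: mean of cubed standardized values."""
    mean, sd = fmean(series), pstdev(series)
    if sd == 0:
        return 0.0
    return fmean([((v - mean) / sd) ** 3 for v in series])


def autocorr(series, lag=1):
    """Autocorrelation at the given lag, normalized by the total variance."""
    mean = fmean(series)
    denom = sum((v - mean) ** 2 for v in series)
    if denom == 0:
        return 0.0
    num = sum(
        (series[i] - mean) * (series[i + lag] - mean)
        for i in range(len(series) - lag)
    )
    return num / denom


def dft_magnitudes(series, n_coeffs=3):
    """Magnitudes of the first n_coeffs DFT coefficients (naive O(n^2) sum)."""
    n = len(series)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(series)))
        for k in range(n_coeffs)
    ]


def summarize(series):
    """Hypothetical helper collecting the features into one dict."""
    return {
        "mean": fmean(series),
        "std": pstdev(series),
        "skew": skewness(series),
        "autocorr_lag1": autocorr(series, lag=1),
        "fft_magnitudes": dft_magnitudes(series, n_coeffs=3),
    }


stats = summarize([1.0, 2.0, 3.0, 4.0])
print(stats)
```

In practice a library would apply such a summary per entity or per rolling window and use an FFT routine instead of the direct sum.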

LLM Engine

Uses GitHub Copilot SDK for intelligent feature generation.

from featcopilot.llm import SemanticEngine

engine = SemanticEngine(
    model='gpt-5',
    max_suggestions=20,
    validate_features=True
)

Feature Selection

from featcopilot.selection import FeatureSelector

selector = FeatureSelector(
    methods=['mutual_info', 'importance', 'correlation'],
    max_features=30,
    correlation_threshold=0.95
)

X_selected = selector.fit_transform(X, y)
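The idea behind combining a relevance ranking with a `correlation_threshold` can be sketched without any dependencies. The example below is an illustration only, not `FeatureSelector`'s algorithm: it ranks features by absolute Pearson correlation with the target (a simpler stand-in for mutual information or model importance), then greedily keeps a feature only if it is not too strongly correlated with one already kept. The names `pearson` and `select_features` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # A constant column carries no linear signal.
    return cov / math.sqrt(vx * vy)


def select_features(columns, target, max_features=30, correlation_threshold=0.95):
    """Hypothetical helper: rank by relevance, then drop redundant features.

    columns: dict mapping feature name -> list of values.
    """
    # Relevance ranking: strongest absolute correlation with the target first.
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if len(kept) >= max_features:
            break
        # Redundancy elimination: skip near-duplicates of kept features.
        redundant = any(
            abs(pearson(columns[name], columns[k])) > correlation_threshold
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a_scaled": [2.0, 4.0, 6.0, 8.0, 10.0],  # Perfectly correlated with "a".
    "b": [5.0, 1.0, 4.0, 2.0, 3.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]
kept = select_features(columns, target)
print(kept)
```

Here only one of `a` and `a_scaled` survives, because their mutual correlation exceeds the 0.95 threshold, while `b` is kept despite its weaker relevance since it is not redundant with the others.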

Comparison with Existing Libraries

| Feature                | FeatCopilot | Featuretools | TSFresh | AutoFeat | OpenFE | CAAFE |
|------------------------|-------------|--------------|---------|----------|--------|-------|
| Tabular Features       | ✅          | ❌           | ❌      | ✅       | ✅     | ✅    |
| Time Series            | ✅          | ❌           | ✅      | ❌       | ❌     | ❌    |
| Relational             | ✅          | ✅           | ❌      | ❌       | ❌     | ❌    |
| LLM-Powered            | ✅          | ❌           | ❌      | ❌       | ❌     | ✅    |
| Semantic Understanding | ✅          | ❌           | ❌      | ❌       | ❌     | ⚠️    |
| Code Generation        | ✅          | ❌           | ❌      | ❌       | ❌     | ⚠️    |
| Sklearn Compatible     | ✅          | ✅           | ✅      | ✅       | ✅     | ❌    |
| Interpretable          | ✅          | ⚠️           | ⚠️      | ⚠️       | ❌     | ✅    |

Documentation

📖 Full Documentation: https://thinkall.github.io/featcopilot/

Requirements

  • Python 3.9+
  • NumPy, Pandas, Scikit-learn
  • GitHub Copilot CLI (for LLM features)

License

MIT License
