Automated feature engineering using Large Language Models (LLMs) for tabular data

Project description

llm-feat

Automatically generate feature engineering code for pandas DataFrames using LLMs. Column descriptions and the target definition are passed to the model, so the generated features reflect your domain and your prediction task.

Installation

pip install llm-feat

Quick Start

import pandas as pd
import llm_feat

llm_feat.set_api_key("your-openai-api-key")  # or set OPENAI_API_KEY env var

# Your data
df = pd.DataFrame({
    'income': [50000, 60000, 70000],
    'expenses': [30000, 35000, 40000],
    'target': [1, 0, 1]
})

# Metadata describing your columns
metadata_df = pd.DataFrame({
    'column_name': ['income', 'expenses', 'target'],
    'description': ['Annual income', 'Annual expenses', 'Binary target'],
    'data_type': ['numeric', 'numeric', 'numeric'],
    'label_definition': [None, None, '1 if positive, 0 if negative']
})

# Generate features
code = llm_feat.generate_features(df, metadata_df, mode='code')
print(code)
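The metadata frame above has one row per column of df. For wider tables it can help to generate that skeleton first and fill in the descriptions by hand. The sketch below is not part of llm-feat: build_metadata_skeleton is a hypothetical helper, and the 'categorical' label for non-numeric columns is an assumption (only 'numeric' appears in the example above).

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def build_metadata_skeleton(df):
    """Hypothetical helper: return one metadata row per column of df."""
    rows = []
    for name in df.columns:
        rows.append({
            'column_name': name,
            'description': '',  # to be filled in by hand
            # 'categorical' is an assumed label for non-numeric columns
            'data_type': 'numeric' if is_numeric_dtype(df[name]) else 'categorical',
            'label_definition': None,
        })
    return pd.DataFrame(
        rows,
        columns=['column_name', 'description', 'data_type', 'label_definition'],
    )


df = pd.DataFrame({
    'income': [50000, 60000, 70000],
    'expenses': [30000, 35000, 40000],
    'target': [1, 0, 1]
})

metadata_df = build_metadata_skeleton(df)

# Sanity check: every DataFrame column should have a metadata row
missing = set(df.columns) - set(metadata_df['column_name'])
print(metadata_df)
print('columns without metadata:', missing)
```

The descriptions and the label definition still have to be written manually, since they carry the domain context the model relies on.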

Generated Code:

import numpy as np

df['income_to_expense_ratio'] = np.where(df['expenses'] != 0, df['income'] / df['expenses'], np.nan)
df['savings'] = df['income'] - df['expenses']
df['savings_to_income_ratio'] = np.where(df['income'] != 0, df['savings'] / df['income'], np.nan)
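In code mode the result is a string, so it still has to be run against your data. One possible way to do that outside a notebook is sketched below, assuming the generated code is plain Python that operates on a variable named df, as in the sample above. The code string here is a stand-in copied from that sample rather than a live model response, and generated code should be reviewed before it is executed.

```python
import pandas as pd

df = pd.DataFrame({
    'income': [50000, 60000, 70000],
    'expenses': [30000, 35000, 40000],
    'target': [1, 0, 1]
})

# Stand-in for the string returned by generate_features(..., mode='code')
code = """
import numpy as np

df['income_to_expense_ratio'] = np.where(df['expenses'] != 0, df['income'] / df['expenses'], np.nan)
df['savings'] = df['income'] - df['expenses']
df['savings_to_income_ratio'] = np.where(df['income'] != 0, df['savings'] / df['income'], np.nan)
"""

# Run the code against a copy so the original DataFrame stays untouched
namespace = {'df': df.copy()}
exec(code, namespace)  # only after reading the code: it runs with full privileges
featured = namespace['df']

print(featured.columns.tolist())
```

Working on a copy keeps the raw data intact if the generated code turns out to be wrong or unwanted.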

Feature Reports

Get detailed explanations of why each feature was generated:

code, report = llm_feat.generate_features(
    df, metadata_df, mode='code', return_report=True
)
print(report)

Example Report:

FEATURE REPORT
==============

1. DOMAIN UNDERSTANDING:
   - Problem: Predicting binary target based on income and expenses
   - Key relationships: Income-to-expense ratios indicate financial health

2. GENERATED FEATURES EXPLANATION:
   - Feature: income_to_expense_ratio
     Rationale: Higher ratios indicate better financial stability
     Domain Relevance: Directly related to predicting positive outcomes

Direct Mode

Skip the code step and get back a DataFrame with the generated features already added:

df_with_features = llm_feat.generate_features(
    df, metadata_df, mode='direct', model='gpt-4o-mini'
)
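Direct mode does not show the code it ran, so it can be useful to check which columns were added. The following is a small sketch of that check. added_columns is a hypothetical helper, not an llm-feat function, and df_with_features is built by hand here as a stand-in for the real return value.

```python
import pandas as pd


def added_columns(before, after):
    """Hypothetical helper: columns present in `after` but not in `before`."""
    return [col for col in after.columns if col not in before.columns]


df = pd.DataFrame({
    'income': [50000, 60000, 70000],
    'expenses': [30000, 35000, 40000],
    'target': [1, 0, 1]
})

# Stand-in for the DataFrame returned by generate_features(..., mode='direct')
df_with_features = df.assign(savings=df['income'] - df['expenses'])

new_cols = added_columns(df, df_with_features)
print(new_cols)
```

This comparison only works if the input DataFrame is not modified in place. Whether direct mode mutates its input is not stated here, so passing df.copy() is the cautious choice.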

Key Features

  • Context-aware: Uses column descriptions to generate relevant features
  • Target-aware: Generates features specific to your prediction task
  • Categorical support: Automatic encoding for categorical columns
  • Jupyter integration: Generated code is auto-injected into the next notebook cell
  • Feature reports: Understand the reasoning behind each feature

Development

git clone https://github.com/codeastra2/llm-feat.git
cd llm-feat
conda create -n llm_feat_310 python=3.10.19 -y
conda activate llm_feat_310
poetry install
poetry run pytest

License

MIT License - see LICENSE file for details.

Author

Srinivas Kumar - @codeastra2
