Automated feature engineering using Large Language Models (LLMs) for tabular data

llm-feat

Automatically generate feature engineering code for pandas DataFrames using LLMs. The generated features are context-aware and target-specific: they draw on your column descriptions and on the definition of your prediction target.

Installation

pip install llm-feat

Quick Start

import pandas as pd
import llm_feat

llm_feat.set_api_key("your-openai-api-key")  # or set OPENAI_API_KEY env var

# Your data
df = pd.DataFrame({
    'income': [50000, 60000, 70000],
    'expenses': [30000, 35000, 40000],
    'target': [1, 0, 1]
})

# Metadata describing your columns
metadata_df = pd.DataFrame({
    'column_name': ['income', 'expenses', 'target'],
    'description': ['Annual income', 'Annual expenses', 'Binary target'],
    'data_type': ['numeric', 'numeric', 'numeric'],
    'label_definition': [None, None, '1 if positive, 0 if negative']
})

# Generate features
code = llm_feat.generate_features(df, metadata_df, mode='code')
print(code)
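
Before sending anything to the LLM, it can help to confirm that the metadata actually describes the DataFrame. The check below is a minimal sketch and not part of llm-feat: `check_metadata` and `REQUIRED_METADATA_FIELDS` are hypothetical names, and treating `column_name` and `description` as the required fields is an assumption based on the example above.

```python
import pandas as pd

# Assumption: these two fields are the minimum the metadata needs.
# The library itself may require more (or fewer) fields.
REQUIRED_METADATA_FIELDS = {"column_name", "description"}


def check_metadata(df, metadata_df):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []

    missing_fields = REQUIRED_METADATA_FIELDS - set(metadata_df.columns)
    if missing_fields:
        problems.append(f"metadata is missing fields: {sorted(missing_fields)}")
        return problems

    described = set(metadata_df["column_name"])
    # Columns present in the data but never described
    undocumented = [c for c in df.columns if c not in described]
    # Columns described in the metadata but absent from the data
    unknown = [c for c in metadata_df["column_name"] if c not in df.columns]

    if undocumented:
        problems.append(f"columns without metadata: {undocumented}")
    if unknown:
        problems.append(f"metadata for unknown columns: {unknown}")
    return problems


df = pd.DataFrame({
    'income': [50000, 60000, 70000],
    'expenses': [30000, 35000, 40000],
    'target': [1, 0, 1]
})
metadata_df = pd.DataFrame({
    'column_name': ['income', 'expenses', 'target'],
    'description': ['Annual income', 'Annual expenses', 'Binary target'],
})

print(check_metadata(df, metadata_df))  # []
```

An empty list means every DataFrame column has a metadata row and no metadata row points at a column that does not exist.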

Code generated by the Quick Start call (LLM output, so it may vary between runs):

import numpy as np

df['income_to_expense_ratio'] = np.where(df['expenses'] != 0, df['income'] / df['expenses'], np.nan)
df['savings'] = df['income'] - df['expenses']
df['savings_to_income_ratio'] = np.where(df['income'] != 0, df['savings'] / df['income'], np.nan)
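
Because the code comes from an LLM, it is worth reviewing it and then running it against a copy of the data first. The following is a minimal sketch of that idea, not the library's own mechanism: `apply_generated_code` is a hypothetical helper, and the `generated_code` string stands in for the value returned with `mode='code'`.

```python
import numpy as np
import pandas as pd

# Stand-in for the string returned by generate_features(..., mode='code')
generated_code = """
import numpy as np
df['savings'] = df['income'] - df['expenses']
df['savings_to_income_ratio'] = np.where(df['income'] != 0, df['savings'] / df['income'], np.nan)
"""


def apply_generated_code(df, code):
    """Run feature code against a copy of df so the original stays untouched.

    Note: exec runs arbitrary Python. Only use it on code you have read.
    """
    working = df.copy()
    namespace = {"df": working, "np": np, "pd": pd}
    exec(code, namespace)
    return namespace["df"]


df = pd.DataFrame({
    'income': [50000, 60000, 70000],
    'expenses': [30000, 35000, 40000],
})

result = apply_generated_code(df, generated_code)
new_columns = [c for c in result.columns if c not in df.columns]
print(new_columns)  # ['savings', 'savings_to_income_ratio']
```

Working on a copy keeps the original `df` unchanged, so a bad feature can be discarded without reloading the data.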

Feature Reports

Get detailed explanations of why each feature was generated:

code, report = llm_feat.generate_features(
    df, metadata_df, mode='code', return_report=True
)
print(report)

Example Report:

FEATURE REPORT
==============

1. DOMAIN UNDERSTANDING:
   - Problem: Predicting binary target based on income and expenses
   - Key relationships: Income-to-expense ratios indicate financial health

2. GENERATED FEATURES EXPLANATION:
   - Feature: income_to_expense_ratio
     Rationale: Higher ratios indicate better financial stability
     Domain Relevance: Directly related to predicting positive outcomes

Direct Mode

Add features directly to your DataFrame:

df_with_features = llm_feat.generate_features(
    df, metadata_df, mode='direct', model='gpt-4o-mini'
)
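
In direct mode the generated code is not shown first, so a quick look at what was added can be useful. The helper below is a sketch under stated assumptions: `summarize_new_features` is a hypothetical function, and `augmented` is a hand-built stand-in for the DataFrame that direct mode returns.

```python
import pandas as pd


def summarize_new_features(original, augmented):
    """List columns present only in `augmented`, with missing and distinct value counts."""
    new_cols = [c for c in augmented.columns if c not in original.columns]
    rows = []
    for col in new_cols:
        values = augmented[col]
        rows.append({
            "feature": col,
            "n_missing": int(values.isna().sum()),        # NaNs introduced by guards such as np.where
            "n_unique": int(values.nunique(dropna=True)),  # 1 would indicate a constant feature
        })
    return pd.DataFrame(rows, columns=["feature", "n_missing", "n_unique"])


df = pd.DataFrame({
    'income': [50000, 60000, 70000],
    'expenses': [30000, 35000, 40000],
    'target': [1, 0, 1]
})

# Stand-in for the result of generate_features(..., mode='direct')
augmented = df.assign(savings=df['income'] - df['expenses'])

summary = summarize_new_features(df, augmented)
print(summary)
```

Features that are mostly missing or that take a single value are usually candidates to drop before modeling.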

Key Features

  • Context-aware: Uses column descriptions to generate relevant features
  • Target-aware: Generates features specific to your prediction task
  • Categorical support: Automatic encoding for categorical columns
  • Jupyter integration: Generated code is auto-injected into the next cell
  • Feature reports: Understand the reasoning behind each feature

Development

git clone https://github.com/codeastra2/llm-feat.git
cd llm-feat
conda create -n llm_feat_310 python=3.10 -y
conda activate llm_feat_310
poetry install
poetry run pytest

License

MIT License - see LICENSE file for details.

Author

Srinivas Kumar - @codeastra2
