Automated feature engineering for tabular data using Large Language Models (LLMs)
llm-feat
Automatically generate feature engineering code for pandas DataFrames using LLMs. The generated features are informed by your column descriptions and tailored to your prediction target.
Installation
pip install llm-feat
Quick Start
import pandas as pd
import llm_feat
llm_feat.set_api_key("your-openai-api-key") # or set OPENAI_API_KEY env var
# Your data
df = pd.DataFrame({
'income': [50000, 60000, 70000],
'expenses': [30000, 35000, 40000],
'target': [1, 0, 1]
})
# Metadata describing your columns
metadata_df = pd.DataFrame({
'column_name': ['income', 'expenses', 'target'],
'description': ['Annual income', 'Annual expenses', 'Binary target'],
'data_type': ['numeric', 'numeric', 'numeric'],
'label_definition': [None, None, '1 if positive, 0 if negative']
})
# Generate features
code = llm_feat.generate_features(df, metadata_df, mode='code')
print(code)
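Because the quality of the generated features depends on the metadata, it can help to check that `metadata_df` has one row per DataFrame column before calling the API. Below is a minimal sketch that assumes the four metadata fields shown above form the expected schema. `check_metadata` is a hypothetical helper and is not part of llm-feat.

```python
import pandas as pd

# Hypothetical helper (not part of llm-feat); assumes the metadata schema shown in the Quick Start.
REQUIRED_FIELDS = ["column_name", "description", "data_type", "label_definition"]


def check_metadata(df: pd.DataFrame, metadata_df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means df and metadata agree."""
    problems = []
    missing_fields = [f for f in REQUIRED_FIELDS if f not in metadata_df.columns]
    if missing_fields:
        problems.append(f"metadata is missing fields: {missing_fields}")
        return problems
    described = set(metadata_df["column_name"])
    # Columns in the data that the LLM would get no description for
    undescribed = [c for c in df.columns if c not in described]
    if undescribed:
        problems.append(f"columns without metadata: {undescribed}")
    # Metadata rows that do not match any column in the data
    unknown = sorted(described - set(df.columns))
    if unknown:
        problems.append(f"metadata for unknown columns: {unknown}")
    empty = metadata_df.loc[metadata_df["description"].isna(), "column_name"].tolist()
    if empty:
        problems.append(f"columns with empty descriptions: {empty}")
    return problems


df = pd.DataFrame({
    "income": [50000, 60000, 70000],
    "expenses": [30000, 35000, 40000],
    "target": [1, 0, 1],
})
metadata_df = pd.DataFrame({
    "column_name": ["income", "expenses", "target"],
    "description": ["Annual income", "Annual expenses", "Binary target"],
    "data_type": ["numeric", "numeric", "numeric"],
    "label_definition": [None, None, "1 if positive, 0 if negative"],
})
print(check_metadata(df, metadata_df))  # → []
```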
Example generated code (the exact output depends on the model and can vary between runs):
import numpy as np
df['income_to_expense_ratio'] = np.where(df['expenses'] != 0, df['income'] / df['expenses'], np.nan)
df['savings'] = df['income'] - df['expenses']
df['savings_to_income_ratio'] = np.where(df['income'] != 0, df['savings'] / df['income'], np.nan)
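The Quick Start prints `code`, which suggests that code mode returns the generated source as a plain string that operates on a variable named `df`. Below is a minimal sketch of applying such a string yourself, based on that assumption. The string is copied from the example output above rather than produced by a live API call, and `apply_feature_code` is a hypothetical helper. Because `exec` runs arbitrary code, read LLM-generated code before executing it.

```python
import numpy as np
import pandas as pd

# Copied from the example output above; a real call would return model-generated text.
code = """
import numpy as np
df['income_to_expense_ratio'] = np.where(df['expenses'] != 0, df['income'] / df['expenses'], np.nan)
df['savings'] = df['income'] - df['expenses']
df['savings_to_income_ratio'] = np.where(df['income'] != 0, df['savings'] / df['income'], np.nan)
"""


def apply_feature_code(df: pd.DataFrame, code: str) -> pd.DataFrame:
    """Hypothetical helper: run generated code against a copy so the input frame is untouched."""
    namespace = {"df": df.copy(), "np": np, "pd": pd}
    exec(code, namespace)  # only run code you have reviewed
    return namespace["df"]


df = pd.DataFrame({
    "income": [50000, 60000, 70000],
    "expenses": [30000, 35000, 40000],
    "target": [1, 0, 1],
})
df_features = apply_feature_code(df, code)
print(df_features[["savings", "savings_to_income_ratio"]])
```

Running the code against a copy leaves the original `df` unchanged, so you can compare the two frames or discard the result if the generated features look wrong.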
Feature Reports
Pass return_report=True to also get an explanation of why each feature was generated:
code, report = llm_feat.generate_features(
df, metadata_df, mode='code', return_report=True
)
print(report)
Example Report:
FEATURE REPORT
==============
1. DOMAIN UNDERSTANDING:
- Problem: Predicting binary target based on income and expenses
- Key relationships: Income-to-expense ratios indicate financial health
2. GENERATED FEATURES EXPLANATION:
- Feature: income_to_expense_ratio
Rationale: Higher ratios indicate better financial stability
Domain Relevance: Directly related to predicting positive outcomes
Direct Mode
Use mode='direct' to get back a DataFrame with the generated features already added:
df_with_features = llm_feat.generate_features(
df, metadata_df, mode='direct', model='gpt-4o-mini'
)
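The README does not say whether direct mode modifies `df` in place or returns a new frame. A quick way to see what changed is to compare the frames before and after the call. Below is a minimal sketch. `added_columns` is a hypothetical helper, and the `after` frame is built by hand to stand in for the DataFrame returned by direct mode.

```python
import pandas as pd


def added_columns(before: pd.DataFrame, after: pd.DataFrame) -> list:
    """Hypothetical helper: list columns present in `after` but not in `before`,
    raising if any original column was dropped or modified."""
    for col in before.columns:
        if col not in after.columns:
            raise ValueError(f"original column {col!r} was dropped")
        if not before[col].equals(after[col]):
            raise ValueError(f"original column {col!r} was modified")
    return [c for c in after.columns if c not in before.columns]


before = pd.DataFrame({"income": [50000, 60000], "expenses": [30000, 35000]})
# Stand-in for the frame returned by direct mode (built by hand, not by llm-feat).
after = before.assign(savings=before["income"] - before["expenses"])
print(added_columns(before, after))  # → ['savings']
```

If you take the snapshot with `df.copy()` before the call and this returns an empty list, the call likely added nothing, or it modified a different object than the one you compared.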
Key Features
- Context-aware: Uses column descriptions to generate relevant features
- Target-aware: Generates features specific to your prediction task
- Categorical support: Automatic encoding for categorical columns
- Jupyter integration: Generated code is automatically inserted into the next notebook cell
- Feature reports: Understand the reasoning behind each feature
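The README does not specify which encoding llm-feat applies to categorical columns. For illustration only, the example below uses one-hot encoding with pandas `get_dummies`, which is one common option. The `region` column is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "north"],  # hypothetical categorical column
    "income": [50000, 60000, 70000],
})

# One-hot encode `region`: one 0/1 indicator column per category, original column dropped.
encoded = pd.get_dummies(df, columns=["region"], prefix="region", dtype=int)
print(encoded.columns.tolist())  # → ['income', 'region_north', 'region_south']
```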
Documentation
- Read the Docs - Full documentation
- API Reference - Complete parameter documentation
- Examples - Jupyter notebook examples
Development
git clone https://github.com/codeastra2/llm-feat.git
cd llm-feat
conda create -n llm_feat_310 python=3.10.19 -y
conda activate llm_feat_310
poetry install
poetry run pytest
License
MIT License - see LICENSE file for details.
Author
Srinivas Kumar - @codeastra2
Download files
File details
Details for the file llm_feat-0.2.2.tar.gz.
File metadata
- Download URL: llm_feat-0.2.2.tar.gz
- Upload date:
- Size: 13.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7c69181382b27bff2a7bbe867995260469a3dab19ac9c57b3cf8f504c73b30e |
| MD5 | d51d1d712c322478d6467682afbb4ae1 |
| BLAKE2b-256 | cb64791095812bd1bf294703091cd05eda0b1208ca741a2141d5cb2ecd1930f0 |
Provenance
The following attestation bundles were made for llm_feat-0.2.2.tar.gz:
- Publisher: publish.yml on codeastra2/llm-feat
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_feat-0.2.2.tar.gz
- Subject digest: f7c69181382b27bff2a7bbe867995260469a3dab19ac9c57b3cf8f504c73b30e
- Sigstore transparency entry: 793983206
- Sigstore integration time:
- Permalink: codeastra2/llm-feat@4685aa3c9276828e6f9909925e48b89fd93da706
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/codeastra2
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4685aa3c9276828e6f9909925e48b89fd93da706
- Trigger Event: release
File details
Details for the file llm_feat-0.2.2-py3-none-any.whl.
File metadata
- Download URL: llm_feat-0.2.2-py3-none-any.whl
- Upload date:
- Size: 14.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 08445bc53f38b1e03e8a0d4cf46819cb6a61dd9a669ba2bcb7b9d128ab85347f |
| MD5 | 0f145af4d3b244c73819ac6fa7105f32 |
| BLAKE2b-256 | b1d93145eb648eb470e9a08fc675c416543f65685711b20593e6cd3e45384281 |
Provenance
The following attestation bundles were made for llm_feat-0.2.2-py3-none-any.whl:
- Publisher: publish.yml on codeastra2/llm-feat
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_feat-0.2.2-py3-none-any.whl
- Subject digest: 08445bc53f38b1e03e8a0d4cf46819cb6a61dd9a669ba2bcb7b9d128ab85347f
- Sigstore transparency entry: 793983274
- Sigstore integration time:
- Permalink: codeastra2/llm-feat@4685aa3c9276828e6f9909925e48b89fd93da706
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/codeastra2
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4685aa3c9276828e6f9909925e48b89fd93da706
- Trigger Event: release