Intelligent data enrichment agent for automated feature engineering
Data Enrichment Agent
The Data Enrichment Agent is a Python-based tool that automatically generates and applies feature engineering transformations to enhance your datasets. It analyzes the structure and content of your data, then uses a Large Language Model (LLM) to suggest and create meaningful new features for machine learning and analytics.
Features
- Automated Feature Engineering: Generate new features using LLM-powered suggestions
- Intelligent Data Profiling: Automatic analysis of input data characteristics
- Multiple Data Formats: Support for CSV, Excel, JSON, and Parquet files
- Configurable Parameters: Customize enrichment behavior and thresholds
- Production-Ready: Type hints, logging, and error handling throughout
- Extensible: Built on a modular framework that allows for easy extension
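The data profiling step listed above can be pictured with a small pandas sketch. This is an illustration only, not the agent's own profiler: the function name `profile_dataframe` and the three summary fields are assumptions chosen for the example.

```python
import pandas as pd


def profile_dataframe(df: pd.DataFrame) -> dict:
    """Summarise each column: dtype, share of missing values, distinct count.

    Hypothetical helper for illustration; the package's profiler may differ.
    """
    profile = {}
    for name in df.columns:
        column = df[name]
        profile[name] = {
            "dtype": str(column.dtype),
            "missing_fraction": float(column.isna().mean()),
            "n_unique": int(column.nunique(dropna=True)),
        }
    return profile


# Tiny made-up dataset with one missing value per column
sample = pd.DataFrame(
    {
        "age": [34, 51, None, 29],
        "city": ["Pune", "Oslo", "Oslo", None],
    }
)
summary = profile_dataframe(sample)
```

A summary like this is the kind of compact description that could be passed to an LLM in place of the raw rows when asking for feature suggestions.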
Feature Engineering Capabilities
The agent can generate various types of features:
- Time-Based Features
  - Date part extraction
  - Time differences
  - Business day calculations
- Numeric Transformations
  - Polynomial features
  - Binning and discretization
  - Mathematical transformations
- Categorical Encodings
  - One-hot encoding
  - Frequency encoding
  - Target encoding
- Interaction Features
  - Arithmetic combinations
  - Ratio features
  - Conditional features
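To make the four categories concrete, here is a hand-written pandas sketch with one or two features from each. The `orders` table, its column names, and the bin edges are invented for the example; the agent generates its own transformations from LLM suggestions, so its output may look quite different.

```python
import numpy as np
import pandas as pd

# Made-up input data, for illustration only
orders = pd.DataFrame(
    {
        "order_date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-02-10"]),
        "ship_date": pd.to_datetime(["2024-01-08", "2024-01-09", "2024-02-12"]),
        "amount": [120.0, 35.0, 560.0],
        "items": [3, 1, 7],
        "channel": ["web", "store", "web"],
    }
)

# Time-based: date part extraction and a time difference in whole days
orders["order_month"] = orders["order_date"].dt.month
orders["days_to_ship"] = (orders["ship_date"] - orders["order_date"]).dt.days

# Numeric: a log transform and binning on fixed (assumed) edges
orders["log_amount"] = np.log1p(orders["amount"])
orders["amount_band"] = pd.cut(
    orders["amount"], bins=[0, 50, 200, np.inf], labels=["low", "mid", "high"]
)

# Categorical: frequency encoding (share of rows per category)
channel_share = orders["channel"].value_counts(normalize=True)
orders["channel_freq"] = orders["channel"].map(channel_share)

# Interaction: a ratio of two numeric columns
orders["amount_per_item"] = orders["amount"] / orders["items"]
```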
Prerequisites
- uv – package & environment manager
  For quick setup on macOS/Linux:
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Git
Installation
- Clone the repository
  git clone https://github.com/stepfnAI/data_enrichment_agent.git
  cd data_enrichment_agent
  git switch dev
- Set up the virtual environment and install dependencies
  uv venv --python=3.10 venv
  source venv/bin/activate
  uv pip install -e ".[dev]"
- Clone and install the blueprint dependency
  cd ..
  git clone https://github.com/stepfnAI/sfn_blueprint.git
  cd sfn_blueprint
  git switch dev
  uv pip install -e .
  cd ../data_enrichment_agent
- Export the environment variables
  export OPENAI_API_KEY="your_openai_api_key"
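Because the agent depends on `OPENAI_API_KEY`, it can be useful to check for it before starting a run. The helper below is a minimal sketch using only the standard library; `require_api_key` is a hypothetical name and is not part of the package.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of the environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return value
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of surfacing an authentication error partway through a run.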
Architecture
The Data Enrichment Agent is built with a modular architecture:
data_enrichment_agent/
├── agent.py # Main agent implementation
├── models.py # Pydantic models for data structures
├── utils.py # Helper functions and utilities
├── config.py # Configuration management
├── constants.py # Constants and templates
└── cli.py # Command-line interface
Configuration
The agent can be configured using the EnrichmentConfig class. Here are the available configuration options:
from data_enrichment_agent.models import EnrichmentConfig

config = EnrichmentConfig(
    model_name="gpt-4.1-mini",  # LLM model to use
    model_temperature=0.1,  # Temperature for LLM responses
    model_max_tokens=2000,  # Maximum tokens for LLM responses
    ai_provider="openai",  # AI provider to use
    ai_task_type="feature_suggestions_generator",  # Task type for AI requests
)
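For readers who want to experiment with these options without installing the package, the sketch below mirrors the five fields in a plain dataclass. `SketchConfig` is a hypothetical stand-in, not the package's `EnrichmentConfig` (which lives in `models.py` alongside the Pydantic models), and the range checks are assumptions rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in; NOT the package's EnrichmentConfig."""

    model_name: str = "gpt-4.1-mini"
    model_temperature: float = 0.1
    model_max_tokens: int = 2000
    ai_provider: str = "openai"
    ai_task_type: str = "feature_suggestions_generator"

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real class may validate differently.
        if not 0.0 <= self.model_temperature <= 2.0:
            raise ValueError("model_temperature must be between 0.0 and 2.0")
        if self.model_max_tokens <= 0:
            raise ValueError("model_max_tokens must be positive")


# Override one option and keep the remaining defaults
config = SketchConfig(model_temperature=0.0)
```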
Basic Usage
python examples/basic_usage.py
Testing
Run the test suite using pytest:
# Run all tests
pytest tests/ -s
# Run specific test
pytest tests/test_agent.py -s
# Run with coverage report
pytest --cov=data_enrichment_agent tests/ -s
Contributing
Contributions are welcome! Please follow these steps:
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)