Intelligent data enrichment agent for automated feature engineering

Data Enrichment Agent

The Data Enrichment Agent is a Python-based tool that automatically generates and applies feature engineering transformations to enhance your datasets. It analyzes the structure and content of your data, then uses a Large Language Model (LLM) to suggest and create meaningful new features for machine learning and analytics.
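The analyze-then-suggest flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `summarize_columns`, `suggest_features`, and `fake_llm` are hypothetical names, the prompt wording and the JSON response shape are invented for the example, and the LLM is replaced by a canned stub. None of this is taken from the package's actual implementation.

```python
import json
from typing import Callable


def summarize_columns(rows: list[dict]) -> dict[str, str]:
    """Map each column name to the type name of its first non-missing value."""
    summary: dict[str, str] = {}
    for row in rows:
        for name, value in row.items():
            if name not in summary and value is not None:
                summary[name] = type(value).__name__
    return summary


def suggest_features(rows: list[dict], ask_llm: Callable[[str], str]) -> list[dict]:
    """Build a prompt from the column summary and parse the JSON reply."""
    prompt = (
        "Suggest new features as a JSON list of objects with "
        "'name' and 'expression' keys for these columns: "
        + json.dumps(summarize_columns(rows), sort_keys=True)
    )
    return json.loads(ask_llm(prompt))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns one canned suggestion."""
    return json.dumps([{"name": "price_per_unit", "expression": "price / quantity"}])


rows = [{"price": 10.0, "quantity": 4}, {"price": 9.0, "quantity": 3}]
suggestions = suggest_features(rows, fake_llm)
```

Passing the LLM call in as a plain callable keeps the sketch testable without network access or an API key.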

Features

  • Automated Feature Engineering: Generate new features using LLM-powered suggestions
  • Intelligent Data Profiling: Automatic analysis of input data characteristics
  • Multiple Data Formats: Support for CSV, Excel, JSON, and Parquet files
  • Configurable Parameters: Customize enrichment behavior and thresholds
  • Production-Ready: Type hints, logging, and error handling throughout
  • Extensible: Built on a modular framework that allows for easy extension
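To make the data profiling bullet more concrete, here is a minimal standard-library sketch of the kind of per-column summary a profiler might compute. The function names and the chosen statistics (count, missing, unique, dominant type) are assumptions made for illustration; the agent's real profiler may collect different information.

```python
from collections import Counter


def profile_column(values: list) -> dict:
    """Summarize one column: size, missing values, cardinality, dominant type."""
    present = [v for v in values if v is not None and v != ""]
    type_counts = Counter(type(v).__name__ for v in present)
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),  # assumes hashable cell values
        "dominant_type": type_counts.most_common(1)[0][0] if present else None,
    }


def profile_rows(rows: list[dict]) -> dict[str, dict]:
    """Profile every column that appears in a list of row dictionaries."""
    columns = sorted({name for row in rows for name in row})
    return {name: profile_column([row.get(name) for row in rows]) for name in columns}


rows = [
    {"city": "Paris", "amount": 12.5},
    {"city": "Paris", "amount": None},
    {"city": "Oslo", "amount": 7.0},
]
profile = profile_rows(rows)
```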

Feature Engineering Capabilities

The agent can generate various types of features:

  1. Time-Based Features

    • Date part extraction
    • Time differences
    • Business day calculations
  2. Numeric Transformations

    • Polynomial features
    • Binning and discretization
    • Mathematical transformations
  3. Categorical Encodings

    • One-hot encoding
    • Frequency encoding
    • Target encoding
  4. Interaction Features

    • Arithmetic combinations
    • Ratio features
    • Conditional features
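A few of the feature types listed above can be illustrated with plain Python. The helpers below are hypothetical examples written for this README, not the code the agent generates; they cover date part extraction, a holiday-unaware business day count, a log transformation, frequency encoding, and a guarded ratio feature.

```python
import math
from collections import Counter
from datetime import date, timedelta


def date_parts(d: date) -> dict:
    """Time-based: extract calendar parts from a date."""
    return {
        "year": d.year,
        "month": d.month,
        "weekday": d.weekday(),  # Monday is 0, Sunday is 6
        "is_weekend": d.weekday() >= 5,
    }


def business_days_between(start: date, end: date) -> int:
    """Time-based: count Monday-Friday dates in [start, end); ignores holidays."""
    span = (end - start).days
    return sum(1 for i in range(span) if (start + timedelta(days=i)).weekday() < 5)


def log_transform(x: float) -> float:
    """Numeric: log(1 + x), defined for x > -1."""
    return math.log1p(x)


def frequency_encode(values: list[str]) -> list[float]:
    """Categorical: replace each category with its relative frequency."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]


def safe_ratio(numerator: float, denominator: float):
    """Interaction: ratio of two columns, None when the denominator is zero."""
    return numerator / denominator if denominator else None
```

For example, `frequency_encode(["a", "b", "a", "a"])` gives `[0.75, 0.25, 0.75, 0.75]`, and `safe_ratio(1, 0)` gives `None` instead of raising.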

Prerequisites

  • uv – package & environment manager
    For quick setup on macOS/Linux:
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • Git

Installation

  1. Clone the repository

    git clone https://github.com/stepfnAI/data_enrichment_agent.git
    cd data_enrichment_agent
    git switch dev
    
  2. Set up the virtual environment and install dependencies

    uv venv --python=3.10 venv
    source venv/bin/activate
    uv pip install -e ".[dev]"
    
  3. Clone and install the blueprint dependency

    cd ..
    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    cd ../data_enrichment_agent
    
  4. Export the required environment variable

    export OPENAI_API_KEY="your_openai_api_key"
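A script that depends on this variable can fail early with a clear message if it is missing. The helper below is a hypothetical convenience, not part of the package:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of a script surfaces a missing key before any LLM request is attempted.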
    

Architecture

The Data Enrichment Agent is built with a modular architecture:

data_enrichment_agent/
├── agent.py           # Main agent implementation
├── models.py          # Pydantic models for data structures
├── utils.py           # Helper functions and utilities
├── config.py          # Configuration management
├── constants.py       # Constants and templates
└── cli.py             # Command-line interface

Configuration

The agent can be configured using the EnrichmentConfig class. Here are the available configuration options:

from data_enrichment_agent.models import EnrichmentConfig

config = EnrichmentConfig(
    model_name="gpt-4.1-mini",  # LLM model to use
    model_temperature=0.1,      # Temperature for LLM responses
    model_max_tokens=2000,      # Maximum tokens for LLM responses
    ai_provider="openai",       # AI provider to use
    ai_task_type="feature_suggestions_generator", # Task type for AI requests
)
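For readers who want to see how such a configuration object might validate its inputs, here is a self-contained stand-in built on a standard-library dataclass. It is explicitly not the package's `EnrichmentConfig` (which, per the architecture notes, lives in the Pydantic models module): the class name, the defaults copied from the example values, and the validation bounds are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichmentConfigSketch:
    """Hypothetical stand-in; NOT the package's EnrichmentConfig."""

    model_name: str = "gpt-4.1-mini"
    model_temperature: float = 0.1
    model_max_tokens: int = 2000
    ai_provider: str = "openai"
    ai_task_type: str = "feature_suggestions_generator"

    def __post_init__(self) -> None:
        # Assumed bounds for illustration; the real class may differ.
        if not 0.0 <= self.model_temperature <= 2.0:
            raise ValueError("model_temperature must be between 0.0 and 2.0")
        if self.model_max_tokens <= 0:
            raise ValueError("model_max_tokens must be positive")


# Override one field and keep the remaining defaults.
config = EnrichmentConfigSketch(model_temperature=0.0)
```

Validating at construction time means a bad setting is reported when the config is created rather than in the middle of an enrichment run.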
    

Basic Usage

python examples/basic_usage.py

Testing

Run the test suite using pytest:

# Run all tests
pytest tests/ -s

# Run specific test
pytest tests/test_agent.py -s

# Run with coverage report
pytest --cov=data_enrichment_agent tests/ -s

Contributing

Contributions are welcome! Please follow these steps:

  1. Create a feature branch (git checkout -b feature/amazing-feature)
  2. Commit your changes (git commit -m 'Add some amazing feature')
  3. Push to the branch (git push origin feature/amazing-feature)
