
LLM-driven agent for categorizing data tables based on domain schemas


Table Categorization Agent

An LLM-driven agent that analyzes data tables and categorizes them based on domain schemas. This agent identifies entities, attributes, and relationships from data and maps them to domain concepts.

Features

  • LLM-Driven Analysis: Uses a large language model (OpenAI or Anthropic, configurable) to interpret table descriptions and content.
  • Domain Schema Integration: Maps tables to domain entities using JSON schema definitions.
  • Metadata-based Categorization: Categorizes tables based on their descriptions and metadata.

Installation

Prerequisites

  • Git access to required repositories
  • uv – package & environment manager
    Please refer to the official installation guide for the most up-to-date instructions.
    For quick setup on macOS/Linux, you can currently use:
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • An OpenAI or Anthropic API key, matching the provider you configure (see Configuration)

Step-by-Step Installation

  1. Clone the main repository and set up its virtual environment:

    git clone https://github.com/stepfnAI/table_categorization_agent.git
    cd table_categorization_agent
    git switch dev
    uv sync --extra dev
    source .venv/bin/activate
    cd ../
    
  2. Install the sfn_blueprint dependency into the same environment:

    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    cd ../table_categorization_agent
    
  3. Configure environment:

    export OPENAI_API_KEY="your-api-key-here"
    

Configuration

You can configure the agent in two ways: using a .env file for project-specific settings or by exporting environment variables for more dynamic, shell-level control. Settings loaded via export will take precedence over those in a .env file.

Available Settings

The following table details the configuration options available:

| Environment Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY or ANTHROPIC_API_KEY | (Required) API key for the provider selected by TABLE_AI_PROVIDER. | None |
| TABLE_AI_PROVIDER | AI provider used for table categorization (e.g., openai, anthropic). | openai |
| TABLE_MODEL | Model name to use with the selected provider. | gpt-4o |
| TABLE_TEMPERATURE | Sampling temperature for the model (e.g., 0.0 to 2.0). | 0.3 |
| TABLE_MAX_TOKENS | Maximum number of tokens in the AI response. | 4000 |
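The table implies a simple resolution rule. Each variable is read from the environment, and any missing variable falls back to its default. The API key used is the one that matches the selected provider.

The sketch below illustrates that rule under stated assumptions. `load_settings`, `TableSettings`, and the provider-to-key mapping are hypothetical names, not the agent's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the configuration table above.
DEFAULTS = {
    "TABLE_AI_PROVIDER": "openai",
    "TABLE_MODEL": "gpt-4o",
    "TABLE_TEMPERATURE": "0.3",
    "TABLE_MAX_TOKENS": "4000",
}

# Hypothetical mapping from provider name to the API-key variable it needs.
API_KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


@dataclass(frozen=True)
class TableSettings:
    provider: str
    model: str
    temperature: float
    max_tokens: int
    api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> TableSettings:
    """Resolve settings from an environment mapping, falling back to defaults."""

    def get(name: str) -> str:
        return env.get(name, DEFAULTS[name])

    provider = get("TABLE_AI_PROVIDER").lower()
    key_var = API_KEY_VARS.get(provider)
    if key_var is None:
        raise ValueError(f"Unsupported TABLE_AI_PROVIDER: {provider}")

    temperature = float(get("TABLE_TEMPERATURE"))
    # Range check taken from the table's example range; illustrative only.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"TABLE_TEMPERATURE out of range: {temperature}")

    return TableSettings(
        provider=provider,
        model=get("TABLE_MODEL"),
        temperature=temperature,
        max_tokens=int(get("TABLE_MAX_TOKENS")),
        api_key=env.get(key_var),  # None if the matching key is not set
    )


print(load_settings({"TABLE_AI_PROVIDER": "anthropic", "ANTHROPIC_API_KEY": "sk-test"}))
```

In this sketch, changing `TABLE_AI_PROVIDER` by itself leaves `TABLE_MODEL` at its `gpt-4o` default. That is why the `.env` example below sets both variables together.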

Method 1: Using a .env File (Recommended)

For consistent configuration within your project, create a file named .env in the root directory and add your settings. This method is ideal for storing API keys and project-wide defaults.

  1. Create a file named .env in the root of your project.
  2. Add the key-value pairs for the settings you wish to override.

Example .env file:

# .env

# --- Required Settings ---
# Provide the API key for the provider you select below.
# For this example, we are using Anthropic.
ANTHROPIC_API_KEY="sk-your-anthropic-api-key-here"

# --- Optional Overrides for the Table Categorization Agent ---
# Switch the AI provider to Anthropic
TABLE_AI_PROVIDER="anthropic"

# Use a different model from the new provider
TABLE_MODEL="claude-3-haiku-20240307"

# Use a higher temperature for potentially more descriptive responses
TABLE_TEMPERATURE=0.7
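The following is a minimal, stdlib-only sketch of the precedence rule from the start of this section: values come from a `.env` file, but exported variables win. This is an assumption about the behavior, not the agent's loader.

- `parse_dotenv` and `merge_settings` are hypothetical helpers.
- The parser handles only simple `KEY=value` lines and does not support inline comments.
- python-dotenv's `load_dotenv()` follows the same precedence by default, because it does not override variables that are already set.

```python
import os
from typing import Dict, Mapping


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one layer of matching quotes, e.g. TABLE_MODEL="claude-3-haiku-20240307".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def merge_settings(dotenv_text: str, environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Combine .env values with the environment; exported variables take precedence."""
    return {**parse_dotenv(dotenv_text), **dict(environ)}


example = 'TABLE_AI_PROVIDER="anthropic"\nTABLE_TEMPERATURE=0.7\n'
# An exported TABLE_TEMPERATURE overrides the value from the .env text.
print(merge_settings(example, {"TABLE_TEMPERATURE": "0.2"}))
```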

Testing

The agent uses pytest for testing.

Running All Tests

# Run all tests
pytest

Running a Single Test File

To run a specific test file, provide the path to the file:

# Example: Run the main API test
pytest tests/test_agent_new_api.py
# Example: Run the new feature test
pytest tests/test_agent_new_feature.py

The tests are located in the tests/ directory. The main test files for the agent's API are tests/test_agent_new_api.py and tests/test_agent_new_feature.py.

Quick Start

To see a quick demonstration, run the provided example script from the root of the project directory. This will execute the agent with pre-defined metadata and print the result.

python example/basic_usage.py

Here's how to use the agent to categorize tables based on their descriptions:

from table_categorization_agent import TableCategorizationAgent

# 1. Initialize the agent
agent = TableCategorizationAgent()

# 2. Define descriptions for your tables
table_descriptions = {
    "table1": "This table stores borrower profile information, including personal details, contact information, identification numbers, and credit-related attributes. Each row represents a unique borrower record used for managing borrower data.",
    "table2": "This table records loan modification details, including references to borrower and loan IDs, modification attributes, updated terms, and approval information. Each row represents a specific modification event for a loan.",
    "table3": "This table captures loan payment transactions, including breakdowns of principal, interest, insurance, and tax components. Each row represents a transaction with associated loan reference, payment details, and status."
}

# 3. Define your task
# Replace the placeholder with the path to your domain schema JSON file (see Domain Schema Format below).
task_data = {
    "domain_schema": "path/to/your/domain_schema.json",
    "tables_metadata": table_descriptions
}

# 4. Execute the task
# This will make a call to an LLM. Ensure you have the necessary API keys configured.
output = agent.execute_task(task_data)

# 5. Print the result
print(output)
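The `task_data` dictionary in step 3 pairs a schema path with a mapping of table names to descriptions. A small pre-flight check can catch a wrong path or an empty description before any LLM call is made.

The `build_task_data` helper below is hypothetical and is not part of the package. It assembles the same two keys shown in the Quick Start and does not call the agent.

```python
from pathlib import Path
from typing import Dict, Union


def build_task_data(schema_path: Union[str, Path], table_descriptions: Dict[str, str]) -> Dict[str, object]:
    """Validate inputs and return a dict shaped like the Quick Start's task_data."""
    path = Path(schema_path)
    if not path.is_file():
        raise FileNotFoundError(f"Domain schema not found: {path}")
    if not table_descriptions:
        raise ValueError("table_descriptions must contain at least one table")
    for name, description in table_descriptions.items():
        # Every table needs a non-blank text description for the LLM to work with.
        if not isinstance(description, str) or not description.strip():
            raise ValueError(f"Table {name!r} has an empty description")
    return {"domain_schema": str(path), "tables_metadata": dict(table_descriptions)}
```

With this in place, `task_data = build_task_data("path/to/your/domain_schema.json", table_descriptions)` fails fast on a bad path instead of after an API round trip.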

Domain Schema Format

The agent works with domain schemas in JSON format that define entities, attributes, and relationships:

{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {
          "iri": ":BorrowerId",
          "label": "BorrowerId",
          "range": [":UUID"]
        }
      ]
    }
  ]
}
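Before passing a schema to the agent, it can help to confirm that it has the shape shown above. This sketch loads the example schema with the standard `json` module and maps each entity label to its attribute labels.

`summarize_schema` is a hypothetical helper. It checks only the keys that appear in the example (`iri`, `label`, `attributes`, `range`), because the full schema format, including how relationships are encoded, is not documented here.

```python
import json
from typing import Dict, List


def summarize_schema(schema: dict) -> Dict[str, List[str]]:
    """Map each entity label to its attribute labels, checking the keys used in the example."""
    summary: Dict[str, List[str]] = {}
    for entity in schema.get("entities", []):
        for key in ("iri", "label"):
            if key not in entity:
                raise ValueError(f"Entity missing {key!r}: {entity}")
        attributes = entity.get("attributes", [])
        for attr in attributes:
            if "iri" not in attr or "label" not in attr:
                raise ValueError(f"Attribute missing iri/label in entity {entity['label']!r}")
            if not isinstance(attr.get("range", []), list):
                raise ValueError(f"Attribute {attr['label']!r}: range must be a list")
        summary[entity["label"]] = [attr["label"] for attr in attributes]
    return summary


# The example schema from above; in practice, use json.load() on your schema file.
schema_text = """
{"entities": [{"iri": ":Borrower_Profile", "label": "Borrower Profile",
  "attributes": [{"iri": ":BorrowerId", "label": "BorrowerId", "range": [":UUID"]}]}]}
"""
print(summarize_schema(json.loads(schema_text)))  # {'Borrower Profile': ['BorrowerId']}
```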

Prompt Management

All prompts used by this agent are centralized in src/table_categorization_agent/constants.py for easy review and modification.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

For support and questions:

Download files


Source Distribution

table_categorization_agent-0.1.3.tar.gz (22.3 kB)

Built Distribution

table_categorization_agent-0.1.3-py3-none-any.whl (19.8 kB)

File details

Hashes for table_categorization_agent-0.1.3.tar.gz:

| Algorithm | Hash digest |
|---|---|
| SHA256 | b121078dc608330913ad7cf072e84e5063aaa512661a265b39c3ba6444a6b12b |
| MD5 | f2c1c0b5bbd38a02b203ab80e27d9472 |
| BLAKE2b-256 | 67642892b800ee361f74a3331a1bf2dd39c78e2535d0e42e054772bb7986d58f |

File details

Hashes for table_categorization_agent-0.1.3-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | f2e43db172aad41c881d932017e1202be43f797c7cbee9f734cbb596c6143de0 |
| MD5 | 951831a57499fe21eb150f529766c56f |
| BLAKE2b-256 | d485119d8214fbd56dfacda9cb6ff5c9113f606acf8ec5a8c9aa197af7b1e10b |
