LLM-driven agent for categorizing data tables based on domain schemas

Table Categorization Agent

An LLM-driven agent that analyzes data tables and categorizes them based on domain schemas. This agent identifies entities, attributes, and relationships from data and maps them to domain concepts.

Features

  • LLM-Driven Analysis: Uses a large language model (via the OpenAI API) to interpret table descriptions and content.
  • Domain Schema Integration: Maps tables to domain entities using JSON schema definitions.
  • Metadata-based Categorization: Categorizes tables based on their descriptions and metadata.

Installation

Prerequisites

  • Git access to required repositories
  • uv – package & environment manager
    Please refer to the official installation guide for the most up-to-date instructions.
    For quick setup on macOS/Linux, you can currently use:
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • OpenAI API key

Step-by-Step Installation

  1. Clone the main repository, install its dependencies, and activate the virtual environment:

    git clone https://github.com/stepfnAI/table_categorization_agent.git
    cd table_categorization_agent
    git switch dev
    uv sync
    source .venv/bin/activate
    cd ../
    
  2. Install sfn_blueprint into the same environment:

    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    
  3. Configure the environment by setting your OpenAI API key:

    export OPENAI_API_KEY="your-api-key-here"
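
A missing key usually only surfaces when the first LLM call fails, so it can help to check for it up front. The following is a minimal sketch, not part of the package: `require_api_key` is a hypothetical helper, and the only name taken from this README is the `OPENAI_API_KEY` variable.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank.

    Hypothetical helper for illustration; the agent itself may read the
    variable differently.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return key
```

Calling `require_api_key()` before constructing the agent turns a late API error into an immediate, readable one.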
    

Testing

The agent uses pytest for testing.

Running All Tests

# Run all tests
pytest

Running a Single Test File

To run a specific test file, provide the path to the file:

# Example: Run the main API test
pytest tests/test_agent_new_api.py
# Example: Run the new feature test
pytest tests/test_agent_new_feature.py

All tests live in the tests/ directory. The two files shown above, tests/test_agent_new_api.py and tests/test_agent_new_feature.py, are the main tests for the agent's API.

Quick Start

To see a quick demonstration, run the provided example script from the root of the project directory. This will execute the agent with pre-defined metadata and print the result.

python example/basic_usage.py

Here's how to use the agent to categorize tables based on their descriptions:

from table_categorization_agent import TableCategorizationAgent

# 1. Initialize the agent
agent = TableCategorizationAgent()

# 2. Define descriptions for your tables
table_descriptions = {
    "table1": "This table stores borrower profile information, including personal details, contact information, identification numbers, and credit-related attributes. Each row represents a unique borrower record used for managing borrower data.",
    "table2": "This table records loan modification details, including references to borrower and loan IDs, modification attributes, updated terms, and approval information. Each row represents a specific modification event for a loan.",
    "table3": "This table captures loan payment transactions, including breakdowns of principal, interest, insurance, and tax components. Each row represents a transaction with associated loan reference, payment details, and status."
}

# 3. Define your task
# "domain_schema" is the path to your domain schema JSON file (see Domain Schema Format)
task_data = {
    "domain_schema": "path/to/your/domain_schema.json",
    "tables_metadata": table_descriptions
}

# 4. Execute the task
# This makes a call to an LLM. Ensure OPENAI_API_KEY is set in your environment.
output = agent.execute_task(task_data)

# 5. Print the result
print(output)
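
Because `execute_task` calls an LLM, it can be worth validating the inputs before spending a request. The sketch below assumes only the two keys shown above (`domain_schema` and `tables_metadata`); `build_task_data` is a hypothetical helper and is not provided by the package.

```python
from pathlib import Path


def build_task_data(schema_path, table_descriptions):
    """Assemble the task dict from the Quick Start, failing early on bad input.

    Hypothetical helper: checks that the schema file exists and that every
    table has a non-empty description.
    """
    schema_path = Path(schema_path)
    if not schema_path.is_file():
        raise FileNotFoundError(f"Domain schema not found: {schema_path}")
    empty = [name for name, text in table_descriptions.items() if not text.strip()]
    if empty:
        raise ValueError(f"Tables with empty descriptions: {empty}")
    return {
        "domain_schema": str(schema_path),
        "tables_metadata": dict(table_descriptions),
    }
```

The returned dict can then be passed to `agent.execute_task(...)` exactly as in the example above.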

Domain Schema Format

The agent works with domain schemas in JSON format that define entities, attributes, and relationships:

{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {
          "iri": ":BorrowerId",
          "label": "BorrowerId",
          "range": [":UUID"]
        }
      ]
    }
  ]
}
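
To check a schema file before handing it to the agent, it can be loaded with the standard `json` module and summarized. This is a minimal sketch that assumes only the keys shown in the example above (`entities`, `label`, `attributes`); `summarize_schema` is a hypothetical helper, and real schemas may carry more fields.

```python
import json

# The example schema from this section, inlined so the sketch is self-contained.
EXAMPLE_SCHEMA = """
{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {"iri": ":BorrowerId", "label": "BorrowerId", "range": [":UUID"]}
      ]
    }
  ]
}
"""


def summarize_schema(schema):
    """Map each entity label to the list of its attribute labels."""
    return {
        entity["label"]: [attr["label"] for attr in entity.get("attributes", [])]
        for entity in schema.get("entities", [])
    }


schema = json.loads(EXAMPLE_SCHEMA)
print(summarize_schema(schema))  # {'Borrower Profile': ['BorrowerId']}
```

For a schema stored on disk, replace `json.loads(EXAMPLE_SCHEMA)` with `json.load(...)` on the opened file.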

Prompt Management

All prompts used by this agent are centralized in src/table_categorization_agent/constants.py for easy review and modification.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

For support and questions:
