
LLM-driven agent for categorizing data tables based on domain schemas


Table Categorization Agent

An LLM-driven agent that analyzes data tables and categorizes them based on domain schemas. This agent identifies entities, attributes, and relationships from data and maps them to domain concepts.

Features

  • LLM-Driven Analysis: Uses a large language model (accessed with an OpenAI API key) to interpret table descriptions and content.
  • Domain Schema Integration: Maps tables to domain entities defined in a JSON domain schema.
  • Metadata-based Categorization: Categorizes tables based on their descriptions and metadata.

Installation

Prerequisites

  • Git access to required repositories
  • uv – package & environment manager
    Please refer to the official installation guide for the most up-to-date instructions.
    For quick setup on macOS/Linux, you can currently use:
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • OpenAI API key

Step-by-Step Installation

  1. Clone the main repository:

    git clone https://github.com/stepfnAI/table_categorization_agent.git
    cd table_categorization_agent
    git switch dev
    uv sync
    source .venv/bin/activate
    cd ../
    
  2. Install sfn_blueprint into the same virtual environment (still activated from step 1):

    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    
  3. Configure environment:

    export OPENAI_API_KEY="your-api-key-here"
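Before the first run, a short check can confirm that uv is on PATH and that the API key is visible to Python. This is a minimal sketch using only the standard library; check_prerequisites is a hypothetical helper, not part of the package.

```python
import os
import shutil


def check_prerequisites() -> list[str]:
    """Hypothetical helper: return a list of problems; an empty list means all checks passed."""
    problems = []
    # uv must be on PATH for `uv sync` and `uv pip install` to work.
    if shutil.which("uv") is None:
        problems.append("uv was not found on PATH")
    # The agent calls an OpenAI model, so the key must be set in this process's environment.
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(f"- {issue}")
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```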
    

Testing

The agent uses pytest for testing. Run the commands below from the project root with the virtual environment activated.

Running All Tests

# Run all tests
pytest

Running a Single Test File

To run a specific test file, provide the path to the file:

# Example: Run the main API test
pytest tests/test_agent_new_api.py
# Example: Run the new feature test
pytest tests/test_agent_new_feature.py

All tests live in the tests/ directory; tests/test_agent_new_api.py and tests/test_agent_new_feature.py are the main files covering the agent's API.
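Tests that exercise the real agent make LLM calls. For your own code that merely calls the agent, a minimal sketch using the standard library's unittest.mock can check the payload without an API key. Here categorize is a hypothetical wrapper and "stub-result" an arbitrary placeholder; saved as a test_*.py file under tests/, the test function would be collected by pytest.

```python
from unittest.mock import MagicMock


def categorize(agent, schema_path: str, descriptions: dict) -> object:
    """Hypothetical wrapper around agent.execute_task, mirroring the Quick Start payload."""
    task_data = {
        "domain_schema": schema_path,
        "tables_metadata": descriptions,
    }
    return agent.execute_task(task_data)


def test_categorize_forwards_payload():
    # MagicMock stands in for TableCategorizationAgent, so no LLM call is made.
    fake_agent = MagicMock()
    # Arbitrary placeholder; the real agent's output format is not assumed here.
    fake_agent.execute_task.return_value = "stub-result"

    descriptions = {"table1": "Borrower profile information."}
    result = categorize(fake_agent, "domain_schema.json", descriptions)

    # Exactly one call, carrying the two keys used in Quick Start.
    fake_agent.execute_task.assert_called_once_with(
        {"domain_schema": "domain_schema.json", "tables_metadata": descriptions}
    )
    assert result == "stub-result"
```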

Quick Start

To see a quick demonstration, run the provided example script from the root of the project directory. This will execute the agent with pre-defined metadata and print the result.

python example/basic_usage.py

Here's how to use the agent to categorize tables based on their descriptions:

from table_categorization_agent import TableCategorizationAgent

# 1. Initialize the agent
agent = TableCategorizationAgent()

# 2. Define descriptions for your tables
table_descriptions = {
    "table1": "This table stores borrower profile information, including personal details, contact information, identification numbers, and credit-related attributes. Each row represents a unique borrower record used for managing borrower data.",
    "table2": "This table records loan modification details, including references to borrower and loan IDs, modification attributes, updated terms, and approval information. Each row represents a specific modification event for a loan.",
    "table3": "This table captures loan payment transactions, including breakdowns of principal, interest, insurance, and tax components. Each row represents a transaction with associated loan reference, payment details, and status."
}

# 3. Define your task
# A domain schema JSON file is required; see Domain Schema Format below.
task_data = {
    "domain_schema": "path/to/your/domain_schema.json",
    "tables_metadata": table_descriptions
}

# 4. Execute the task
# This will make a call to an LLM. Ensure you have the necessary API keys configured.
output = agent.execute_task(task_data)

# 5. Print the result
print(output)
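Because each execute_task call reaches the LLM, it can pay to validate the inputs first. The following is a minimal sketch that assumes only the two keys shown in the example above; build_task_data is a hypothetical helper, not part of the package, and the check for an "entities" list is inferred from the Domain Schema Format section.

```python
import json
from pathlib import Path


def build_task_data(schema_path: str, table_descriptions: dict[str, str]) -> dict:
    """Hypothetical pre-flight helper: validate inputs before spending an LLM call."""
    path = Path(schema_path)
    if not path.is_file():
        raise FileNotFoundError(f"domain schema not found: {path}")
    # Fail fast on malformed JSON or a schema without an "entities" list.
    with path.open(encoding="utf-8") as fh:
        schema = json.load(fh)
    if not isinstance(schema, dict) or not isinstance(schema.get("entities"), list):
        raise ValueError("domain schema must contain an 'entities' list")
    if not table_descriptions:
        raise ValueError("no tables to categorize")
    # Every table needs a non-empty text description.
    for name, description in table_descriptions.items():
        if not isinstance(description, str) or not description.strip():
            raise ValueError(f"table {name!r} has an empty description")
    # Same keys as the Quick Start example.
    return {"domain_schema": str(path), "tables_metadata": table_descriptions}
```

The returned dict can be passed straight to agent.execute_task.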

Domain Schema Format

The agent works with domain schemas in JSON format that define entities, attributes, and relationships. The minimal example below declares a single entity with one attribute:

{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {
          "iri": ":BorrowerId",
          "label": "BorrowerId",
          "range": [":UUID"]
        }
      ]
    }
  ]
}
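To see which entities and attributes a schema declares before passing it to the agent, a short sketch can walk the structure shown above. It relies only on the entities, attributes, and iri keys from this example; the encoding of relationships is not shown here, so the sketch ignores them. summarize_schema is a hypothetical name.

```python
import json


def summarize_schema(schema: dict) -> dict[str, list[str]]:
    """Map each entity's IRI to the IRIs of its attributes."""
    summary = {}
    for entity in schema.get("entities", []):
        # "attributes" is treated as optional here; that is an assumption.
        attrs = entity.get("attributes", [])
        summary[entity["iri"]] = [attr["iri"] for attr in attrs]
    return summary


example = json.loads("""
{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {"iri": ":BorrowerId", "label": "BorrowerId", "range": [":UUID"]}
      ]
    }
  ]
}
""")

print(summarize_schema(example))  # {':Borrower_Profile': [':BorrowerId']}
```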

Prompt Management

All prompts used by this agent are centralized in src/table_categorization_agent/constants.py for easy review and modification.
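For a quick inventory of those prompts, a sketch like the following could list string constants from a module. It assumes the prompts are stored as module-level UPPER_CASE strings, which this README does not confirm; list_prompt_constants is hypothetical. With the package installed, you would pass it the module imported as table_categorization_agent.constants (a name inferred from the src/ layout).

```python
import types


def list_prompt_constants(module: types.ModuleType) -> dict[str, str]:
    """Collect module-level UPPER_CASE string constants (assumed to be the prompts)."""
    return {
        name: value
        for name, value in vars(module).items()
        # Dunder names such as __name__ contain lowercase letters, so isupper() skips them.
        if name.isupper() and isinstance(value, str)
    }
```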

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

For support and questions:

Download files

Source Distribution

table_categorization_agent-0.1.0.tar.gz (20.9 kB)

Built Distribution

table_categorization_agent-0.1.0-py3-none-any.whl (19.1 kB, Python 3)

File details

Hashes for table_categorization_agent-0.1.0.tar.gz:

Algorithm    Hash digest
SHA256       7d731cf3ea195bb4ca9db57dc5faf62fa1b7dbf76eddecf67b0d16ddea66e98a
MD5          4dadc867ff1b411e5e47da6c2bc964a2
BLAKE2b-256  1e7a114d8d036803a8a8242f06a9ed590f54802602e44bbe5ddf28897a1e4d0b

Hashes for table_categorization_agent-0.1.0-py3-none-any.whl:

Algorithm    Hash digest
SHA256       a55824c2e365d9917931d8221f55d827f1916e96aa9ce66450472e4ce4fb059d
MD5          9b97298ba5b0a5aa94119ef0dbe74940
BLAKE2b-256  a2163e1a07d25928cd657c01208653dd7cf46149aa1a294ceafee65192284192
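The SHA256 digests listed for these files let you confirm that a download is intact. The following is a minimal sketch with Python's standard hashlib. sha256_of and verify are illustrative names, and the file is assumed to be in the current directory. pip can enforce the same kind of check at install time through its hash-checking mode (--require-hashes).

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so lowercase the expected value before comparing.
    return sha256_of(path) == expected_sha256.lower()


# SHA256 digest published for the source distribution (0.1.0).
SDIST_SHA256 = "7d731cf3ea195bb4ca9db57dc5faf62fa1b7dbf76eddecf67b0d16ddea66e98a"
```

For example, verify("table_categorization_agent-0.1.0.tar.gz", SDIST_SHA256) should return True for an untampered download of the source distribution.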

