LLM-driven agent for categorizing data tables based on domain schemas

Table Categorization Agent

An LLM-driven agent that analyzes data tables and categorizes them based on domain schemas. This agent identifies entities, attributes, and relationships from data and maps them to domain concepts.

Features

  • LLM-Driven Analysis: Uses a large language model (an OpenAI API key is required) to interpret table descriptions and content.
  • Domain Schema Integration: Maps tables to domain entities using JSON schema definitions.
  • Metadata-based Categorization: Categorizes tables based on their descriptions and metadata.

Installation

Prerequisites

  • Git access to required repositories
  • uv – package & environment manager
    Please refer to the official installation guide for the most up-to-date instructions.
    For quick setup on macOS/Linux, you can currently use:
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • OpenAI API key

Step-by-Step Installation

  1. Clone the main repository, switch to the dev branch, install dependencies, and activate the virtual environment:

    git clone https://github.com/stepfnAI/table_categorization_agent.git
    cd table_categorization_agent
    git switch dev
    uv sync
    source .venv/bin/activate
    cd ../
    
  2. Install sfn_blueprint in editable mode (into the virtual environment activated in step 1):

    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    
  3. Set your OpenAI API key as an environment variable:

    export OPENAI_API_KEY="your-api-key-here"
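
Before running anything that calls the LLM, it can help to confirm that the key is visible to the Python process. The following is a minimal sketch; the helper name `require_api_key` is hypothetical and not part of the agent or of sfn_blueprint.

```python
import os


def require_api_key(name="OPENAI_API_KEY", env=None):
    """Return the variable's value, or raise if it is missing or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return value
```

Calling `require_api_key()` at the top of a script fails early with a clear message instead of failing later inside the LLM call.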
    

Testing

The agent uses pytest for testing. Run the commands below from the root of the table_categorization_agent repository.

Running All Tests

# Run all tests
pytest

Running a Single Test File

To run a specific test file, provide the path to the file:

# Example: Run the main API test
pytest tests/test_agent_new_api.py
# Example: Run the new feature test
pytest tests/test_agent_new_feature.py

All tests live in the tests/ directory. The two files shown above, tests/test_agent_new_api.py and tests/test_agent_new_feature.py, are the main tests for the agent's API.

Quick Start

To see a quick demonstration, run the provided example script from the root of the project directory. This will execute the agent with pre-defined metadata and print the result.

python example/basic_usage.py

Here's how to use the agent to categorize tables based on their descriptions:

from table_categorization_agent import TableCategorizationAgent

# 1. Initialize the agent
agent = TableCategorizationAgent()

# 2. Define descriptions for your tables
table_descriptions = {
    "table1": "This table stores borrower profile information, including personal details, contact information, identification numbers, and credit-related attributes. Each row represents a unique borrower record used for managing borrower data.",
    "table2": "This table records loan modification details, including references to borrower and loan IDs, modification attributes, updated terms, and approval information. Each row represents a specific modification event for a loan.",
    "table3": "This table captures loan payment transactions, including breakdowns of principal, interest, insurance, and tax components. Each row represents a transaction with associated loan reference, payment details, and status."
}

# 3. Define your task
# "domain_schema" is the path to your domain schema JSON file (see Domain Schema Format).
task_data = {
    "domain_schema": "path/to/your/domain_schema.json",
    "tables_metadata": table_descriptions
}

# 4. Execute the task
# This calls an LLM, so make sure OPENAI_API_KEY is set.
output = agent.execute_task(task_data)

# 5. Print the result
print(output)
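
Because `execute_task` calls an LLM, it may be worth checking the input before spending an API call. The sketch below assumes only the two keys shown in the example above (`domain_schema` and `tables_metadata`). The function `check_task_data` is hypothetical and not part of the agent, which may perform its own validation.

```python
from pathlib import Path


def check_task_data(task_data):
    """Return a list of problems found in task_data (empty if none).

    Hypothetical pre-flight check; it does not call the agent or an LLM.
    """
    problems = []

    # The schema is passed as a file path, so the file should exist.
    schema_path = task_data.get("domain_schema")
    if not schema_path or not Path(schema_path).is_file():
        problems.append(f"domain schema file not found: {schema_path!r}")

    # Every table needs a non-empty text description.
    tables = task_data.get("tables_metadata") or {}
    if not tables:
        problems.append("tables_metadata is empty")
    for name, description in tables.items():
        if not isinstance(description, str) or not description.strip():
            problems.append(f"table {name!r} has no description")

    return problems
```

A caller could run `problems = check_task_data(task_data)` and only call `agent.execute_task(task_data)` when the list is empty.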

Domain Schema Format

The agent works with domain schemas in JSON format that define entities, attributes, and relationships:

{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {
          "iri": ":BorrowerId",
          "label": "BorrowerId",
          "range": [":UUID"]
        }
      ]
    }
  ]
}
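
To inspect a schema before handing it to the agent, the structure above can be read with the standard `json` module. This is a minimal sketch that relies only on the `entities`, `label`, and `attributes` keys shown in the example; the function `summarize_schema` is hypothetical and not part of the agent.

```python
import json


def summarize_schema(schema):
    """Map each entity label to the list of its attribute labels."""
    summary = {}
    for entity in schema.get("entities", []):
        attributes = entity.get("attributes", [])
        summary[entity["label"]] = [attr["label"] for attr in attributes]
    return summary


# The example schema from above, inlined so the snippet runs on its own.
schema = json.loads("""
{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {"iri": ":BorrowerId", "label": "BorrowerId", "range": [":UUID"]}
      ]
    }
  ]
}
""")

print(summarize_schema(schema))  # {'Borrower Profile': ['BorrowerId']}
```

For a schema stored on disk, replace the inline string with `json.load` on the opened file.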

Prompt Management

All prompts used by this agent are centralized in src/table_categorization_agent/constants.py for easy review and modification.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

For support and questions:
