LLM-driven agent for categorizing data tables based on domain schemas
Table Categorization Agent
An LLM-driven agent that analyzes data tables and categorizes them based on domain schemas. This agent identifies entities, attributes, and relationships from data and maps them to domain concepts.
Features
- LLM-Driven Analysis: Uses a large language model to interpret table descriptions and content.
- Domain Schema Integration: Maps tables to domain entities using JSON schema definitions.
- Metadata-based Categorization: Categorizes tables based on their descriptions and metadata.
Installation
Prerequisites
- Git access to required repositories
- uv – package & environment manager
Please refer to the official installation guide for the most up-to-date instructions.
For quick setup on macOS/Linux, you can currently use:
curl -LsSf https://astral.sh/uv/install.sh | sh
- OpenAI API key
Step-by-Step Installation
- Clone the main repository:
git clone https://github.com/stepfnAI/table_categorization_agent.git
cd table_categorization_agent
git switch dev
uv sync
source .venv/bin/activate
cd ../
- Install blueprint:
git clone https://github.com/stepfnAI/sfn_blueprint.git
cd sfn_blueprint
git switch dev
uv pip install -e .
- Configure environment:
export OPENAI_API_KEY="your-api-key-here"
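Because the agent calls an LLM, a missing key typically only surfaces once a request is made. The following is a minimal sketch of an early check, not part of the package: `require_api_key` is a hypothetical helper, and treating the placeholder value `your-api-key-here` as unset is an assumption made for illustration.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; it is not provided by the agent.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # Assumption: the README placeholder counts as "not configured".
    if not key or key == "your-api-key-here":
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of an error from deep inside an API client.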
Testing
The agent uses pytest for testing.
Running All Tests
# Run all tests
pytest
Running a Single Test File
To run a specific test file, provide the path to the file:
# Example: Run the main API test
pytest tests/test_agent_new_api.py
# Example: Run the new feature test
pytest tests/test_agent_new_feature.py
The tests are located in the tests/ directory. The main test files for the agent's API are tests/test_agent_new_api.py and tests/test_agent_new_feature.py.
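Tests that call `execute_task` on a real agent will reach the LLM. For code of your own that wraps the agent, one option is to substitute a stub so the test runs offline. The sketch below assumes a hypothetical wrapper `categorize_tables`; the return value given to the stub is invented for the test and says nothing about the agent's actual output format.

```python
from unittest.mock import Mock


def categorize_tables(agent, domain_schema_path, tables_metadata):
    """Build the task dictionary and hand it to the agent (hypothetical wrapper)."""
    task_data = {
        "domain_schema": domain_schema_path,
        "tables_metadata": tables_metadata,
    }
    return agent.execute_task(task_data)


def test_categorize_tables_passes_expected_task_data():
    # Stand-in for TableCategorizationAgent; no LLM call is made.
    fake_agent = Mock()
    # Assumed return shape, used only so the stub has something to return.
    fake_agent.execute_task.return_value = {"table1": "Borrower Profile"}

    result = categorize_tables(fake_agent, "schema.json", {"table1": "Borrower data."})

    fake_agent.execute_task.assert_called_once_with(
        {"domain_schema": "schema.json", "tables_metadata": {"table1": "Borrower data."}}
    )
    assert result == {"table1": "Borrower Profile"}
```

Saved in a file under tests/, this runs with the same `pytest` commands shown above and needs no API key.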
Quick Start
To see a quick demonstration, run the provided example script from the root of the project directory. This will execute the agent with pre-defined metadata and print the result.
python example/basic_usage.py
Here's how to use the agent to categorize tables based on their descriptions:
from table_categorization_agent import TableCategorizationAgent

# 1. Initialize the agent
agent = TableCategorizationAgent()

# 2. Define descriptions for your tables
table_descriptions = {
    "table1": "This table stores borrower profile information, including personal details, contact information, identification numbers, and credit-related attributes. Each row represents a unique borrower record used for managing borrower data.",
    "table2": "This table records loan modification details, including references to borrower and loan IDs, modification attributes, updated terms, and approval information. Each row represents a specific modification event for a loan.",
    "table3": "This table captures loan payment transactions, including breakdowns of principal, interest, insurance, and tax components. Each row represents a transaction with associated loan reference, payment details, and status."
}

# 3. Define your task
# You need a domain schema file
task_data = {
    "domain_schema": "path/to/your/domain_schema.json",
    "tables_metadata": table_descriptions
}

# 4. Execute the task
# This will make a call to an LLM. Ensure you have the necessary API keys configured.
output = agent.execute_task(task_data)

# 5. Print the result
print(output)
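Since executing the task triggers an LLM call, it can be worth validating the inputs first. The following is a minimal sketch under that assumption: `build_task_data` is a hypothetical helper, not part of the package, and it only assembles the same two keys used in the example above.

```python
from pathlib import Path


def build_task_data(domain_schema_path, table_descriptions):
    """Validate inputs and return the task dictionary (hypothetical helper)."""
    schema_path = Path(domain_schema_path)
    if not schema_path.is_file():
        raise FileNotFoundError(f"Domain schema not found: {schema_path}")
    if not table_descriptions:
        raise ValueError("Provide at least one table description.")
    # Collect table names whose description is empty or only whitespace.
    empty = [name for name, text in table_descriptions.items() if not str(text).strip()]
    if empty:
        raise ValueError(f"Empty descriptions for: {', '.join(empty)}")
    return {
        "domain_schema": str(schema_path),
        "tables_metadata": dict(table_descriptions),
    }
```

The returned dictionary can then be passed to `agent.execute_task(...)` as in step 4.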
Domain Schema Format
The agent works with domain schemas in JSON format that define entities, attributes, and relationships:
{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {
          "iri": ":BorrowerId",
          "label": "BorrowerId",
          "range": [":UUID"]
        }
      ]
    }
  ]
}
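To check a schema file before handing it to the agent, it can help to list the entities and attributes it defines. The sketch below relies only on the keys shown in the example above (`entities`, `label`, `attributes`, `range`); `summarize_schema` is a hypothetical helper, and the schema is inlined as a string so the snippet runs on its own.

```python
import json

SCHEMA_TEXT = """
{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {"iri": ":BorrowerId", "label": "BorrowerId", "range": [":UUID"]}
      ]
    }
  ]
}
"""


def summarize_schema(schema):
    """Map each entity label to a list of (attribute label, range) pairs."""
    summary = {}
    for entity in schema.get("entities", []):
        attributes = [
            (attr["label"], attr.get("range", []))
            for attr in entity.get("attributes", [])
        ]
        summary[entity["label"]] = attributes
    return summary


print(summarize_schema(json.loads(SCHEMA_TEXT)))
# {'Borrower Profile': [('BorrowerId', [':UUID'])]}
```

For a schema stored on disk, replace `json.loads(SCHEMA_TEXT)` with `json.load` on the opened file.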
Prompt Management
All prompts used by this agent are centralized in src/table_categorization_agent/constants.py for easy review and modification.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
License
MIT License - see LICENSE file for details.
Support
For support and questions:
File details
Details for the file table_categorization_agent-0.1.0.tar.gz.
File metadata
- Download URL: table_categorization_agent-0.1.0.tar.gz
- Upload date:
- Size: 20.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d731cf3ea195bb4ca9db57dc5faf62fa1b7dbf76eddecf67b0d16ddea66e98a |
| MD5 | 4dadc867ff1b411e5e47da6c2bc964a2 |
| BLAKE2b-256 | 1e7a114d8d036803a8a8242f06a9ed590f54802602e44bbe5ddf28897a1e4d0b |
File details
Details for the file table_categorization_agent-0.1.0-py3-none-any.whl.
File metadata
- Download URL: table_categorization_agent-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a55824c2e365d9917931d8221f55d827f1916e96aa9ce66450472e4ce4fb059d |
| MD5 | 9b97298ba5b0a5aa94119ef0dbe74940 |
| BLAKE2b-256 | a2163e1a07d25928cd657c01208653dd7cf46149aa1a294ceafee65192284192 |