LLM-driven agent for describing data tables based on domain schemas
Schema Description Agent
The Schema Description Agent is a Python-based tool that automatically generates descriptions for tables and their columns. It analyzes the structure and content of a data file, and then uses a Large Language Model (LLM) to produce accurate and concise documentation.
Features
- Statistical Analysis: Automatically calculates key statistics for the table and its columns, such as row count, column count, duplicate rows, missing cells, and more.
- AI-Powered Descriptions: Leverages LLMs to generate human-readable descriptions for tables and columns based on the statistical analysis.
- Configurable: Easily configure the AI provider, model, and other parameters.
- Extensible: Built on a modular framework (sfn_blueprint) that allows for easy extension and integration.
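To make the first two features concrete, here is a minimal sketch of the kind of statistics listed above and of how they could be folded into a prompt for an LLM. It uses only the standard library; `summarize_table`, `build_prompt`, and the sample data are hypothetical names for illustration and are not taken from the agent's code, which may compute and phrase things differently.

```python
import csv
import io
from collections import Counter


def summarize_table(csv_text: str) -> dict:
    """Compute table-level and per-column statistics for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []

    # A row counts as a duplicate if an identical row appeared earlier.
    row_counts = Counter(tuple(row[col] for col in columns) for row in rows)
    duplicate_rows = sum(count - 1 for count in row_counts.values())

    # Treat empty strings (and absent trailing values) as missing cells.
    missing_per_column = {
        col: sum(1 for row in rows if not row[col]) for col in columns
    }

    return {
        "row_count": len(rows),
        "column_count": len(columns),
        "duplicate_rows": duplicate_rows,
        "missing_cells": sum(missing_per_column.values()),
        "missing_per_column": missing_per_column,
    }


def build_prompt(table_name: str, stats: dict) -> str:
    """Turn the statistics into plain text that could be sent to an LLM."""
    lines = [
        f"Describe the table '{table_name}' and each of its columns.",
        f"Rows: {stats['row_count']}, columns: {stats['column_count']}",
        f"Duplicate rows: {stats['duplicate_rows']}",
        f"Missing cells: {stats['missing_cells']}",
    ]
    for col, missing in stats["missing_per_column"].items():
        lines.append(f"- {col}: {missing} missing")
    return "\n".join(lines)


# Made-up sample data, used only to exercise the two helpers.
SAMPLE_CSV = "id,name,city\n1,Ada,London\n2,Grace,\n1,Ada,London\n3,,Paris\n"

stats = summarize_table(SAMPLE_CSV)
print(build_prompt("customers", stats))
```

Keeping the statistics step separate from the prompt step means the numbers can be inspected or tested without calling any model.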
Installation
Prerequisites
- uv – package & environment manager
Please refer to the official installation guide for the most up-to-date instructions.
For quick setup on macOS/Linux, you can currently use:
    curl -LsSf https://astral.sh/uv/install.sh | sh
- Git
Steps
- Clone the repository:
      git clone https://github.com/stepfnAI/schema_description_agent.git
      cd schema_description_agent
      git switch dev
- Create a virtual environment and install dependencies:
      uv sync --extra dev
      source .venv/bin/activate
- Clone and install the blueprint dependency: The agent requires the sfn_blueprint library. Clone it into a sibling directory.
      cd ../
      git clone https://github.com/stepfnAI/sfn_blueprint.git
      cd sfn_blueprint
      git switch dev
      uv pip install -e .
- Return to the agent directory:
      cd ../schema_description_agent
- Set environment variables:
      export OPENAI_API_KEY='your_openai_api_key'
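If you want a script to fail early when the key from the last step is missing, a small check along these lines can help. This is a sketch only: `require_api_key` is a hypothetical helper, not part of schema_description_agent, and the agent may perform its own validation.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the agent."
        )
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing or empty variable before any request is made to the provider.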
Testing
To run the tests, use the following commands from the root of the schema_description_agent directory:
# Run all tests
pytest tests/ -s
# test agent
pytest tests/test_agent.py -s
# test agent with sample data
pytest tests/test_agent_with_data.py -s
Usage
To see the agent in action, run the example script:
python examples/basic_usage.py
Configuration
The agent can be configured via the SchemaDescriptionConfig class. You can modify the default configuration by passing a SchemaDescriptionConfig object to the SchemaDescriptionAgent constructor.
Default Configuration:
- ai_provider: "openai"
- model_name: "gpt-4o"
- temperature: 0.3
- max_tokens: 4000
Example of custom configuration:
from schema_description_agent import SchemaDescriptionAgent, SchemaDescriptionConfig

# Create a custom configuration
config = SchemaDescriptionConfig(
    ai_provider="anthropic",
    model_name="claude-3-opus-20240229",
    temperature=0.5
)

# Create an instance of the agent with the custom configuration
agent = SchemaDescriptionAgent(config=config)
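The custom configuration above sets three of the four parameters. Assuming the configuration class behaves like an ordinary Python class with default values (an assumption; the real class is not shown here), the unspecified `max_tokens` would keep its default of 4000. The sketch below illustrates that pattern with a stand-in dataclass; `ExampleConfig` is hypothetical and is not the real SchemaDescriptionConfig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleConfig:
    """Hypothetical stand-in mirroring the documented defaults."""

    ai_provider: str = "openai"
    model_name: str = "gpt-4o"
    temperature: float = 0.3
    max_tokens: int = 4000


# With no arguments, every field takes its documented default.
default_config = ExampleConfig()

# Overriding some fields leaves the others at their defaults.
custom_config = ExampleConfig(
    ai_provider="anthropic",
    model_name="claude-3-opus-20240229",
    temperature=0.5,
)

print(default_config)
print(custom_config.max_tokens)
```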