LLM-driven agent for describing data tables based on domain schemas

Schema Description Agent

The Schema Description Agent is a Python-based tool that automatically generates descriptions for tables and their columns. It analyzes the structure and content of a data file, and then uses a Large Language Model (LLM) to produce accurate and concise documentation.

Features

  • Statistical Analysis: Automatically calculates key statistics for the table and its columns, such as row count, column count, duplicate rows, missing cells, and more.
  • AI-Powered Descriptions: Leverages LLMs to generate human-readable descriptions for tables and columns based on the statistical analysis.
  • Configurable: Easily configure the AI provider, model, and other parameters.
  • Extensible: Built on a modular framework (sfn_blueprint) that allows for easy extension and integration.
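
To make the statistics named in the feature list concrete, here is a minimal standard-library sketch that computes row count, column count, duplicate rows, and missing cells for a small CSV. The function name `table_statistics` and the sample data are hypothetical and are not part of the agent's API; the agent's own implementation may differ.

```python
import csv
import io


def table_statistics(rows, columns):
    """Compute simple table-level statistics for a list of row dicts.

    Hypothetical helper for illustration; not the agent's implementation.
    """
    def is_missing(value):
        # Treat None and empty or whitespace-only strings as missing.
        return value is None or str(value).strip() == ""

    # A row counts as a duplicate if an identical row appeared earlier.
    seen = set()
    duplicate_rows = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicate_rows += 1
        else:
            seen.add(key)

    missing_per_column = {
        col: sum(1 for row in rows if is_missing(row.get(col)))
        for col in columns
    }
    return {
        "row_count": len(rows),
        "column_count": len(columns),
        "duplicate_rows": duplicate_rows,
        "missing_cells": sum(missing_per_column.values()),
        "missing_per_column": missing_per_column,
    }


sample_csv = """id,name,city
1,Ada,London
2,Grace,
1,Ada,London
3,,Paris
"""
reader = csv.DictReader(io.StringIO(sample_csv))
rows = list(reader)
stats = table_statistics(rows, reader.fieldnames)
print(stats["row_count"], stats["duplicate_rows"], stats["missing_cells"])  # 4 1 2
```

A summary like this is the kind of input an LLM prompt could be built from, which keeps the model's description grounded in measured properties of the data.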

Installation

Prerequisites

  • uv – package & environment manager
    Please refer to the official installation guide for the most up-to-date instructions.
    For quick setup on macOS/Linux, you can currently use:
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • Git

Steps

  1. Clone the repository:

    git clone https://github.com/stepfnAI/schema_description_agent.git
    cd schema_description_agent
    git switch review
    
  2. Create a virtual environment and install dependencies:

    uv sync --extra dev
    source .venv/bin/activate
    
  3. Clone and install the blueprint dependency: The agent requires the sfn_blueprint library. Clone it into a sibling directory.

    cd ../
    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    
  4. Return to the agent directory:

    cd ../schema_description_agent
    
  5. Set environment variables:

    export OPENAI_API_KEY='your_openai_api_key'
    

Configuration

You can configure the agent in two ways: using a .env file for project-specific settings or by exporting environment variables for more dynamic, shell-level control. Settings loaded via export will take precedence over those in a .env file.
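
The precedence rule can be illustrated with a short sketch. This is not the agent's loading code: `parse_dotenv` and `resolve_settings` are hypothetical helpers written with the standard library only, assuming a simple `KEY=VALUE` file format with `#` comments.

```python
import os


def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_settings(dotenv_text, environ=None):
    """Merge .env values with exported variables; exported values win."""
    environ = os.environ if environ is None else environ
    merged = parse_dotenv(dotenv_text)
    merged.update(environ)  # shell-level values override file values
    return merged


dotenv_text = """
# .env
MODEL_NAME_SCHEMA_DESCRIPTION="claude-3-haiku-20240307"
TEMPERATURE_SCHEMA_DESCRIPTION=0.7
"""
# Stand-in for variables set with `export` in the shell.
exported = {"TEMPERATURE_SCHEMA_DESCRIPTION": "0.2"}

settings = resolve_settings(dotenv_text, exported)
print(settings["MODEL_NAME_SCHEMA_DESCRIPTION"])   # claude-3-haiku-20240307
print(settings["TEMPERATURE_SCHEMA_DESCRIPTION"])  # 0.2
```

Here the model name comes from the file because nothing overrides it, while the exported temperature replaces the file's value.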

Available Settings

The following table details the configuration options available:

Environment Variable | Description | Default
OPENAI_API_KEY or ANTHROPIC_API_KEY | (Required) The API key for the selected provider. | None
ai_provider_schema_description | The AI provider to use for generating schema descriptions. | openai
model_name_schema_description | The specific AI model to use for schema descriptions. | gpt-4o
temperature_schema_description | AI model temperature (e.g., 0.0 to 2.0). | 0.3
max_tokens_schema_description | Maximum tokens for the AI response. | 4000
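
As a rough sketch of how these settings and defaults fit together, the snippet below resolves them from a plain mapping. `ResolvedSettings` and `load_settings` are hypothetical names, not the package's `SchemaDescriptionConfig`. The case-insensitive lookup is an assumption, made because the table lists lowercase names while the `.env` example uses uppercase ones.

```python
from dataclasses import dataclass


@dataclass
class ResolvedSettings:
    ai_provider: str
    model_name: str
    temperature: float
    max_tokens: int


def load_settings(env):
    """Build settings from a mapping, falling back to the documented defaults."""
    # Assumption: variable names are matched case-insensitively.
    lookup = {key.lower(): value for key, value in env.items()}
    provider = lookup.get("ai_provider_schema_description", "openai")

    # The required key depends on the provider: OPENAI_API_KEY or ANTHROPIC_API_KEY.
    key_name = f"{provider}_api_key"
    if not lookup.get(key_name):
        raise ValueError(f"Missing required variable {key_name.upper()}")

    return ResolvedSettings(
        ai_provider=provider,
        model_name=lookup.get("model_name_schema_description", "gpt-4o"),
        temperature=float(lookup.get("temperature_schema_description", 0.3)),
        max_tokens=int(lookup.get("max_tokens_schema_description", 4000)),
    )


defaults = load_settings({"OPENAI_API_KEY": "your_openai_api_key"})
print(defaults)
```

Environment values arrive as strings, so the sketch converts the temperature to `float` and the token limit to `int` before use.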

Method 1: Using a .env File (Recommended)

For consistent configuration within your project, create a file named .env in the root directory and add your settings. This method is ideal for storing API keys and project-wide defaults.

  1. Create a file named .env in the root of your project.
  2. Add the key-value pairs for the settings you wish to override.

Example .env file:

# .env

# --- Required Settings ---
# Provide the API key for the provider you select below.
# For this example, we are using Anthropic.
ANTHROPIC_API_KEY="sk-your-anthropic-api-key-here"

# --- Optional Overrides for the Schema Description Agent ---
# Switch the AI provider to Anthropic
AI_PROVIDER_SCHEMA_DESCRIPTION="anthropic"

# Use a different model from the new provider
MODEL_NAME_SCHEMA_DESCRIPTION="claude-3-haiku-20240307"

# Use a higher temperature for potentially more descriptive responses
TEMPERATURE_SCHEMA_DESCRIPTION=0.7

Testing

To run the tests, use the following commands from the root of the schema_description_agent directory:

# Run all tests
pytest tests/ -s

# Test the agent
pytest tests/test_agent.py -s

# Test the agent with sample data
pytest tests/test_agent_with_data.py -s

Usage

Here is a simple example of how to use the agent. Run the example script:

python examples/basic_usage.py

Or create the agent with a custom configuration in your own code:

from schema_description_agent import SchemaDescriptionAgent, SchemaDescriptionConfig

# Create a custom configuration
config = SchemaDescriptionConfig(
    ai_provider_schema_description="anthropic",
    model_name_schema_description="claude-3-opus-20240229",
    temperature_schema_description=0.5
)

# Create an instance of the agent with the custom configuration
agent = SchemaDescriptionAgent(config=config)
