
LLM-driven agent for describing data tables based on domain schemas


Schema Description Agent

The Schema Description Agent is a Python-based tool that automatically generates descriptions for tables and their columns. It analyzes the structure and content of a data file, and then uses a Large Language Model (LLM) to produce accurate and concise documentation.

Features

  • Statistical Analysis: Automatically calculates key statistics for the table and its columns, such as row count, column count, duplicate rows, and missing cells.
  • AI-Powered Descriptions: Leverages LLMs to generate human-readable descriptions for tables and columns based on the statistical analysis.
  • Configurable: Easily configure the AI provider, model, and other parameters.
  • Extensible: Built on a modular framework (sfn_blueprint) that allows for easy extension and integration.
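
As an illustration of the statistics named in the feature list, the following is a minimal sketch that computes row count, column count, duplicate rows, and missing cells using only the Python standard library. It is not the agent's implementation: the function name `table_statistics`, the CSV input, and the rule that an empty string counts as a missing cell are all assumptions made for this example.

```python
import csv
import io


def table_statistics(csv_text: str) -> dict:
    """Compute simple table- and column-level statistics from CSV text.

    Hypothetical helper for illustration; not part of schema_description_agent.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    rows = [tuple(row[col] for col in columns) for row in reader]

    # A row counts as a duplicate if an identical row appeared earlier.
    seen = set()
    duplicate_rows = 0
    for row in rows:
        if row in seen:
            duplicate_rows += 1
        seen.add(row)

    # Assumption: empty strings and absent trailing fields are "missing".
    missing_per_column = {
        col: sum(1 for row in rows if not row[i])
        for i, col in enumerate(columns)
    }

    return {
        "row_count": len(rows),
        "column_count": len(columns),
        "duplicate_rows": duplicate_rows,
        "missing_cells": sum(missing_per_column.values()),
        "missing_per_column": missing_per_column,
    }


sample = "id,city\n1,Paris\n2,\n1,Paris\n"
stats = table_statistics(sample)
print(stats)
```

A summary like this is the kind of input an LLM prompt could be built from; how the agent actually passes statistics to the model is not documented here.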

Installation

Prerequisites

  • uv – package & environment manager
    Please refer to the official installation guide for the most up-to-date instructions.
    For quick setup on macOS/Linux, you can currently use:
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • Git

Steps

  1. Clone the repository:

    git clone https://github.com/stepfnAI/schema_description_agent.git
    cd schema_description_agent
    git switch dev
    
  2. Create a virtual environment and install dependencies:

    uv sync --extra dev
    source .venv/bin/activate
    
  3. Clone and install the blueprint dependency: The agent requires the sfn_blueprint library. Clone it into a sibling directory.

    cd ../
    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    
  4. Return to the agent directory:

    cd ../schema_description_agent
    
  5. Set environment variables:

    export OPENAI_API_KEY='your_openai_api_key'
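
Because a missing key usually only surfaces when the first LLM call fails, it can help to check for it up front. The following is a minimal sketch of such a check; `require_env` is a hypothetical helper written for this example and is not provided by the agent or by sfn_blueprint.

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return value


# Demonstration with a stand-in mapping so the example runs anywhere.
# In real use, call require_env("OPENAI_API_KEY") to read the actual environment.
fake_env = {"OPENAI_API_KEY": "your_openai_api_key"}
key = require_env("OPENAI_API_KEY", fake_env)
```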
    

Testing

To run the tests, use the following commands from the root of the schema_description_agent directory:

# Run all tests
pytest tests/ -s

# Run the agent tests
pytest tests/test_agent.py -s

# Run the agent tests with sample data
pytest tests/test_agent_with_data.py -s

Usage

To try the agent, run the bundled example script from the root of the repository:

python examples/basic_usage.py

Configuration

The agent can be configured via the SchemaDescriptionConfig class. You can modify the default configuration by passing a SchemaDescriptionConfig object to the SchemaDescriptionAgent constructor.

Default Configuration:

  • ai_provider: "openai"
  • model_name: "gpt-4o"
  • temperature: 0.3
  • max_tokens: 4000

Example of custom configuration:

from schema_description_agent import SchemaDescriptionAgent, SchemaDescriptionConfig

# Create a custom configuration
config = SchemaDescriptionConfig(
    ai_provider="anthropic",
    model_name="claude-3-opus-20240229",
    temperature=0.5
)

# Create an instance of the agent with the custom configuration
agent = SchemaDescriptionAgent(config=config)
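
The custom configuration above sets three of the four documented parameters. The sketch below shows how the remaining one would fall back to its default, assuming SchemaDescriptionConfig behaves like an ordinary class with keyword defaults. `ExampleConfig` is a hypothetical stand-in that only mirrors the documented default values; it is not the package's class, and the real one may differ.

```python
from dataclasses import asdict, dataclass


@dataclass
class ExampleConfig:
    """Hypothetical stand-in; NOT the package's SchemaDescriptionConfig."""

    # Defaults copied from the "Default Configuration" list.
    ai_provider: str = "openai"
    model_name: str = "gpt-4o"
    temperature: float = 0.3
    max_tokens: int = 4000


# Override three fields; max_tokens is left at its default.
custom = ExampleConfig(
    ai_provider="anthropic",
    model_name="claude-3-opus-20240229",
    temperature=0.5,
)
settings = asdict(custom)
print(settings)
```

The installation steps only mention OPENAI_API_KEY, so switching to another provider presumably requires that provider's own credentials; check the provider and sfn_blueprint documentation for the exact variable name.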
