Skip to main content

SQL Agent Accuracy Evaluation Framework

Project description

SQL Agent Evaluation SDK

A Python SDK for evaluating SQL Agent accuracy with comprehensive test datasets and intelligent LLM-based scoring.

中文文档 | English

Features

  • 🎯 Simple Interface: Easy-to-use SQLAgent interface
  • 📊 44 Test Questions: Covering 6 major SQL optimization scenarios
  • 🤖 LLM Scoring: Intelligent evaluation with detailed explanations
  • 📈 Auto Reports: Automatic result summary and detailed reports
  • 🔧 Flexible Config: Custom database and LLM configuration

Quick Start

1. Install & Configure

pip install -r requirements.txt

Create config/.env:

DATABASE_HOST=localhost
DATABASE_PORT=3306
DATABASE_USER=root
DATABASE_PASSWORD=your_password
DATABASE_NAME=sql_eval_test

EVAL_LLM_API_KEY=your_api_key
EVAL_LLM_BASE_URL=https://api.openai.com/v1
EVAL_LLM_MODEL=gpt-4

2. Create & Evaluate Agent

import asyncio
from sqleval import SQLAgent, SQLEvaluator

class MyAgent(SQLAgent):
    def optimize(self, sql_query: str) -> str:
        # Your optimization logic here
        return "Your optimization suggestions"
    
    def get_name(self) -> str:
        return "My Agent"

async def main():
    evaluator = SQLEvaluator()
    agent = MyAgent()
    
    # Evaluate agent (automatically displays results)
    dataset_results = await evaluator.evaluate(agent, datasets=['example'])
    
    # Extract result and save report
    result = dataset_results['example'][0]
    result.save_report("reports/my_agent_report.md")

asyncio.run(main())

Evaluation Datasets

Dataset Questions Weight Description
index_invalidation 10 20% Index usage issues (functions, type conversion, wildcards)
rule_based_traps 9 20% Scenarios that trap rule-based agents
inefficient_join 10 15% JOIN and subquery optimization
execution_plan_issues 3 15% Execution plan problems (statistics, parameter sniffing)
access_path_issues 6 15% Data access optimization (SELECT *, filtering)
resource_config_issues 6 15% Memory, partitioning, and concurrency issues

Examples

  • examples/quick_start.py - Minimal example to get started
  • examples/custom_agent_example.py - How to create custom agents
  • examples/batch_evaluation_example.py - Compare multiple agents
  • examples/validate_config.py - Validate your configuration

API Reference

SQLAgent Interface

class MyAgent(SQLAgent):
    def optimize(self, sql_query: str) -> str:
        """Return optimization suggestions for the SQL query"""
        return "Your suggestions"
    
    def get_name(self) -> str:
        """Return agent name"""
        return "My Agent"

SQLEvaluator

evaluator = SQLEvaluator()

# Single agent evaluation
dataset_results = await evaluator.evaluate(agent, datasets=['example'])

# Batch evaluation
dataset_results = await evaluator.evaluate([agent1, agent2, agent3])

Note: The evaluate method automatically displays comparison results and saves combined reports (*_combined_report.md). For individual dataset reports, call result.save_report().

Project Structure

sqleval/
├── sqleval/                 # Core SDK
│   ├── core/
│   │   ├── agent.py        # SQLAgent base class
│   │   ├── evaluator.py    # SQLEvaluator main class
│   │   └── result.py       # EvaluationResult class
├── datasets/               # Evaluation datasets
│   ├── index_invalidation/ # Index usage scenarios
│   ├── rule_based_traps/   # Rule-based agent traps
│   ├── inefficient_join/   # JOIN optimization
│   ├── execution_plan_issues/ # Execution plan problems
│   ├── access_path_issues/ # Data access optimization
│   ├── resource_config_issues/ # Resource & config issues
│   └── meta.txt           # Dataset weights
├── examples/              # Usage examples
└── reports/               # Generated reports

License

MIT License - see LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

david_sqleval-0.1.0.tar.gz (16.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

david_sqleval-0.1.0-py3-none-any.whl (20.1 kB view details)

Uploaded Python 3

File details

Details for the file david_sqleval-0.1.0.tar.gz.

File metadata

  • Download URL: david_sqleval-0.1.0.tar.gz
  • Upload date:
  • Size: 16.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for david_sqleval-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5c4e6b15c5adbe2b061c3213e165069f903c50c14f7df4c817af909546ff8a8d
MD5 adfc53eef7f8a53ba18e54441adf1aae
BLAKE2b-256 7563e73baeddedb428054321deb80602a00478f8ae718a7838a83dd0ab2bcdae

See more details on using hashes here.

Provenance

The following attestation bundles were made for david_sqleval-0.1.0.tar.gz:

Publisher: publish.yml on davidzhangbj/sqleval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file david_sqleval-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: david_sqleval-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 20.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for david_sqleval-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 90e455372176a2d259777987f42722c6e913634206d86ab5526debd56322080c
MD5 d62cdf9f99caf89cbfa2d8d79536a02a
BLAKE2b-256 417626c68ce3ace2c61770e4a4f513242149c4c024eb9c6eb97d7c8b30734805

See more details on using hashes here.

Provenance

The following attestation bundles were made for david_sqleval-0.1.0-py3-none-any.whl:

Publisher: publish.yml on davidzhangbj/sqleval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page