A flexible, LLM-agnostic text-to-SQL agent that works with any database

Text-to-SQL Agent

Key Advantages

1. Performance Optimization

  • Schema pre-loading eliminates redundant lookups
  • Intelligent result truncation reduces LLM token usage
  • Cached table information speeds up complex queries

2. Enhanced User Experience

  • Timestamp context: "Show me orders from last week" just works
  • Conversation metadata tracking for better follow-ups
  • Domain context for specialized behavior

3. Agnostic Design

  • Works with any LangChain BaseChatModel
  • Works with any SQL database
  • Optional domain context (generic by default)
  • No provider or database lock-in

Quick Start

Installation

# Basic installation
pip install text2sql-agent

# With specific providers
pip install text2sql-agent[bedrock]    # AWS Bedrock
pip install text2sql-agent[openai]     # OpenAI
pip install text2sql-agent[anthropic]  # Anthropic Claude
pip install text2sql-agent[ollama]     # Ollama (local)

# With databases
pip install text2sql-agent[postgresql]
pip install text2sql-agent[mysql]

Example

from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_aws import ChatBedrock

# Setup (any LLM, any database)
llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0")
db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")

# Create agent
agent = SQLAgent(llm=llm, db=db)

# Ask questions
result = agent.query("What tables are in the database?")
print(result["answer"])
print(result["sql_query"])
print(result["results"])  # Results as a JSON string

LLM Provider Examples

All examples use the same SQLAgent interface; just swap the LLM.

AWS Bedrock (Claude, Titan, etc.)

from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_aws import ChatBedrock

llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name="us-east-1"
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)

Setup:

pip install langchain-aws boto3
export AWS_DEFAULT_REGION=us-east-1
# Use AWS credentials or IAM role

OpenAI (GPT-4, GPT-3.5)

from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_openai import ChatOpenAI
import os

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.1,
    api_key=os.getenv("OPENAI_API_KEY")
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)

Setup:

pip install langchain-openai
export OPENAI_API_KEY=sk-your-key-here

Anthropic API (Claude direct)

from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_anthropic import ChatAnthropic
import os

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.1,
    api_key=os.getenv("ANTHROPIC_API_KEY")
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)

Setup:

pip install langchain-anthropic
export ANTHROPIC_API_KEY=your-key-here

Ollama (Local)

from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="mistral:7b",
    temperature=0.1
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)

Setup:

pip install langchain-ollama
# Install Ollama: https://ollama.ai
ollama pull mistral:7b

Google Gemini

from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_google_genai import ChatGoogleGenerativeAI
import os

llm = ChatGoogleGenerativeAI(
    model="models/gemini-2.0-flash-exp",
    temperature=0.1,
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)

Setup:

pip install langchain-google-genai
export GOOGLE_API_KEY=your-key-here

Database Examples

All examples use the same SQLAgent interface; just swap the database.

PostgreSQL

from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
import os

# Connection string format
db = JSONSerializableSQLDatabase.from_uri(
    "postgresql://username:password@localhost:5432/database_name"
)

# With environment variable (recommended)
db = JSONSerializableSQLDatabase.from_uri(os.getenv("DATABASE_URL"))

# With SSL mode
db = JSONSerializableSQLDatabase.from_uri(
    "postgresql://user:pass@host:5432/db?sslmode=require"
)

agent = SQLAgent(llm=llm, db=db)

Setup:

pip install psycopg2-binary
# or
pip install text2sql-agent[postgresql]

Connection String Formats:

postgresql://user:password@host:port/database
postgresql://user:password@host:port/database?sslmode=require

MySQL

from text_to_sql import SQLAgent, JSONSerializableSQLDatabase

# MySQL connection
db = JSONSerializableSQLDatabase.from_uri(
    "mysql+pymysql://username:password@localhost:3306/database_name"
)

# With charset
db = JSONSerializableSQLDatabase.from_uri(
    "mysql+pymysql://user:pass@host:3306/db?charset=utf8mb4"
)

agent = SQLAgent(llm=llm, db=db)

Setup:

pip install pymysql
# or
pip install text2sql-agent[mysql]

Connection String Formats:

mysql+pymysql://user:password@host:port/database
mysql+pymysql://user:password@host:port/database?charset=utf8mb4

SQLite (Local Files)

from text_to_sql import SQLAgent, JSONSerializableSQLDatabase

# Relative path
db = JSONSerializableSQLDatabase.from_uri("sqlite:///database.db")

# Absolute path
db = JSONSerializableSQLDatabase.from_uri("sqlite:////absolute/path/to/database.db")

# In-memory (testing)
db = JSONSerializableSQLDatabase.from_uri("sqlite:///:memory:")

agent = SQLAgent(llm=llm, db=db)

Setup:

# No additional dependencies - SQLite is built-in to Python

Advanced Features

1. Schema Pre-loading (70-80% Performance Boost)

Pre-load important table schemas to eliminate redundant lookups:

# Explicit table list (recommended)
agent = SQLAgent(
    llm=llm,
    db=db,
    important_tables=["users", "orders", "products"]
)

# Auto-discovery based on domain
agent = SQLAgent(
    llm=llm,
    db=db,
    domain_context="e-commerce",
    important_tables="auto"  # Finds relevant tables
)

# Cache all tables (small schemas only)
agent = SQLAgent(
    llm=llm,
    db=db,
    important_tables="all"  # Max 10 tables recommended
)
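
Under the hood, pre-loading amounts to fetching each table's schema once at startup and serving later lookups from memory. A minimal sketch of the idea (the `SchemaCache` class and its methods are illustrative, not the library's internals):

```python
class SchemaCache:
    """Illustrative sketch: fetch each table's schema once, reuse afterwards."""

    def __init__(self, db, important_tables):
        self._db = db
        self._cache = {}
        for table in important_tables:
            # One lookup per table at startup instead of one per query
            self._cache[table] = db.get_table_info([table])

    def get(self, table):
        # Fall back to a live lookup for tables not pre-loaded
        if table not in self._cache:
            self._cache[table] = self._db.get_table_info([table])
        return self._cache[table]
```

This is why the performance gain scales with how often the agent would otherwise re-inspect the same tables.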

2. Async Support

Non-blocking query execution:

import asyncio

async def main():
    result = await agent.aquery("How many users registered today?")
    print(result["answer"])

# Concurrent queries
async def concurrent_queries():
    results = await asyncio.gather(
        agent.aquery("How many users?"),
        agent.aquery("How many orders?"),
        agent.aquery("What's the total revenue?")
    )
    return results

# Run async code
asyncio.run(main())

3. Enhanced Logging & Metrics

Track everything:

import logging
from text_to_sql import SQLAgent

# Configure logging
SQLAgent.configure_logging(
    level=logging.INFO,
    log_file="sql_queries.log"
)

agent = SQLAgent(llm=llm, db=db)
result = agent.query("Show me today's orders")

# Access metadata
print(result["metadata"])
# {
#   "sql_query": "SELECT * FROM orders WHERE ...",
#   "result_count": 42,
#   "tables_accessed": ["orders", "customers"],
#   "execution_time": 1.23,
#   "timestamp": "2025-12-30T..."
# }

Log Output:

QUERY_ID: 20251230_143022_12345
USER_QUERY: Show me today's orders
TIMESTAMP: 2025-12-30T14:30:22
DATABASE: postgresql
DOMAIN: e-commerce
GENERATED_SQL:
  SELECT o.*, c.name
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
  WHERE o.created_at >= CURRENT_DATE
EXECUTION_TIME: 1.23s
RESULT_COUNT: 42 rows
SQL_EXECUTION_STATUS: SUCCESS

4. Timestamp Context

Automatic temporal query support:

agent = SQLAgent(llm=llm, db=db, include_timestamp=True)

# These "just work" - no manual date handling needed
agent.query("Show me orders from last week")
agent.query("How many users registered yesterday?")
agent.query("What's revenue for the past 30 days?")
agent.query("Show me appointments scheduled for tomorrow")

The agent automatically injects current date context:

Current date: 2025-12-30
Today's date: 2025-12-30
Yesterday: 2025-12-29
Last week: 2025-12-23 to 2025-12-29
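
Context like the above can be produced with plain `datetime` arithmetic; a sketch of one way to build it (the function name is illustrative):

```python
from datetime import date, timedelta

def build_timestamp_context(today=None):
    """Build the date-context lines injected into the prompt (illustrative)."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    # "Last week": the seven days ending yesterday
    week_start = today - timedelta(days=7)
    return (
        f"Current date: {today.isoformat()}\n"
        f"Yesterday: {yesterday.isoformat()}\n"
        f"Last week: {week_start.isoformat()} to {yesterday.isoformat()}"
    )
```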

5. Domain Context

Specialized behavior for different domains:

# Medical
agent = SQLAgent(
    llm=llm,
    db=db,
    domain_context="medical patient records, diagnoses, medications"
)
result = agent.query("How many patients have diabetes?")

# E-commerce
agent = SQLAgent(
    llm=llm,
    db=db,
    domain_context="e-commerce products, orders, customers, transactions"
)
result = agent.query("What were the top 5 selling products this week?")

# Financial
agent = SQLAgent(
    llm=llm,
    db=db,
    domain_context="financial transactions, accounts, balances"
)
result = agent.query("Show me suspicious transactions over $10,000")

6. Conversation History

Better follow-up questions:

# First query
result1 = agent.query("Who are the employees in Engineering?")

# Follow-up with context
conversation_history = [
    {"role": "user", "content": "Who are the employees in Engineering?"},
    {"role": "assistant", "content": result1["answer"]}
]

result2 = agent.query(
    "What is their average salary?",  # "their" refers to Engineering employees
    conversation_history=conversation_history
)

Enhanced context tracking:

# Include metadata for better context
conversation_history = [
    {
        "role": "user",
        "content": query1,
        "metadata": result1.get("metadata", {})  # SQL query, tables accessed
    },
    {
        "role": "assistant",
        "content": result1["answer"]
    }
]

7. Error Recovery (8 Error Patterns)

User-friendly error messages with actionable suggestions:

result = agent.query("Show me data from nonexistent_table")

if "error" in result:
    print(result["error_type"])        # "table_not_found"
    print(result["answer"])             # User-friendly message
    print(result["error_suggestion"])   # Actionable steps

Handled Error Patterns:

  1. Iteration Limit - Query too complex
  2. Timeout - Database performance issues
  3. Table Not Found - Shows available tables
  4. Column Not Found - Suggests schema inspection
  5. Syntax Error - Simplification suggestions
  6. Permission Denied - Access issue guidance
  7. Connection Error - Database connectivity tips
  8. Type Mismatch - Data type guidance
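
Classifying raw database errors into patterns like these is often done by matching fragments of the exception text; a hedged sketch of the idea (the pattern table, fragments, and function are illustrative, not the library's actual classifier):

```python
# Illustrative mapping from error-message fragments to (type, suggestion)
ERROR_PATTERNS = [
    ("does not exist", "table_not_found",
     "Check available tables with get_table_names()."),
    ("no such column", "column_not_found",
     "Inspect the table schema before retrying."),
    ("syntax error", "syntax_error",
     "Try rephrasing the question more simply."),
    ("permission denied", "permission_denied",
     "Verify the database user's privileges."),
    ("timeout", "timeout",
     "The query may be too heavy; add filters or limits."),
]

def classify_error(message):
    """Return (error_type, suggestion) for a raw database error message."""
    lowered = message.lower()
    for fragment, error_type, suggestion in ERROR_PATTERNS:
        if fragment in lowered:
            return error_type, suggestion
    return "unknown", "Retry with a simpler question."
```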

8. Result Truncation

Automatically handle large result sets:

agent = SQLAgent(
    llm=llm,
    db=db,
    max_rows_for_llm=20,           # Send up to 20 rows to LLM
    large_result_threshold=100     # 100+ rows = "large"
)

# Query returns 1000 rows
result = agent.query("Show me all rows from large_table")

# LLM only sees first 20 rows (faster, cheaper)
print(result["answer"])  # Summary based on 20 rows

# Get full results if needed
full_data = agent.get_full_results()  # All 1000 rows
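
The truncation logic itself is straightforward: keep the full result set aside and hand the model only a head slice plus a note about the remainder. A sketch under assumed names (not the library's actual implementation):

```python
def truncate_for_llm(rows, max_rows_for_llm=20):
    """Split rows into an LLM-sized preview and an optional truncation note."""
    if len(rows) <= max_rows_for_llm:
        return rows, None  # small enough: send everything, no note needed
    preview = rows[:max_rows_for_llm]
    note = f"Showing {max_rows_for_llm} of {len(rows)} rows."
    return preview, note
```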

9. Singleton Pattern

Resource management for production:

# Create singleton instance
agent1 = SQLAgent(llm=llm, db=db, use_singleton=True)
agent2 = SQLAgent(llm=llm, db=db, use_singleton=True)

# agent1 and agent2 are THE SAME instance
assert agent1 is agent2

# Prevents:
# - Multiple database connections
# - Redundant schema caching
# - Memory leaks
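
In Python, this behavior is typically implemented by overriding `__new__`; a minimal sketch of the pattern (independent of the library's actual code):

```python
class SingletonAgent:
    """Illustrative __new__-based singleton: one shared instance per process."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, llm=None, db=None):
        # __init__ still runs on every call, so guard expensive setup
        if not getattr(self, "_initialized", False):
            self.llm = llm
            self.db = db
            self._initialized = True
```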

Complete Configuration

agent = SQLAgent(
    llm=llm,                           # Required: Any BaseChatModel
    db=db,                             # Required: JSONSerializableSQLDatabase
    domain_context="medical records",  # Optional: Domain specialization
    important_tables=["Patient", ...], # Optional: Schema pre-loading
    enable_schema_caching=True,        # Default: True
    max_rows_for_llm=10,               # Default: 10
    large_result_threshold=50,         # Default: 50
    verbose=True,                      # Default: False
    max_iterations=10,                 # Default: 10
    use_singleton=False,               # Default: False
    include_timestamp=True,            # Default: True
)

API Reference

SQLAgent Class

Constructor Parameters

  • llm (BaseChatModel, required): Any LangChain chat model
  • db (JSONSerializableSQLDatabase, required): Database instance
  • domain_context (str, default None): Optional domain description
  • important_tables (List[str] or str, default None): Tables to pre-load ("auto", "all", or a list)
  • enable_schema_caching (bool, default True): Enable schema pre-loading
  • max_rows_for_llm (int, default 10): Max rows to send to the LLM
  • large_result_threshold (int, default 50): Threshold for "large" results
  • verbose (bool, default False): Show agent thinking
  • max_iterations (int, default 10): Max agent iterations
  • use_singleton (bool, default False): Use the singleton pattern
  • include_timestamp (bool, default True): Inject timestamp context

Methods

query(question, conversation_history=None)

Execute a natural language query (synchronous).

Parameters:

  • question (str): Natural language question
  • conversation_history (List[Dict], optional): Previous conversation

Returns:

{
    "answer": "Natural language answer",
    "sql_query": "SELECT * FROM ...",
    "results": '[{"id": 1, ...}]',  # JSON string
    "intermediate_steps": [...],
    "metadata": {
        "sql_query": "...",
        "result_count": 42,
        "tables_accessed": ["orders"],
        "execution_time": 1.23,
        "timestamp": "2025-12-30T..."
    }
}

aquery(question, conversation_history=None) [async]

Execute a natural language query (asynchronous).

Parameters: Same as query()

Returns: Same as query()

Usage:

result = await agent.aquery("How many users?")

get_full_results()

Get complete untruncated results from last query.

Returns: JSON string with all rows

get_schema_info(table_names=None)

Get database schema information.

Parameters:

  • table_names (List[str], optional): Specific tables

Returns: Schema information string

get_table_names()

Get list of all table names.

Returns: List[str]

get_dialect()

Get SQL dialect of connected database.

Returns: str ("postgresql", "mysql", "sqlite", etc.)

configure_logging(level, log_file) [class method]

Configure logging for SQL queries.

Parameters:

  • level (int): logging.INFO, logging.DEBUG, etc.
  • log_file (str): Path to log file

Supported Providers & Databases

LLM Providers

  • AWS Bedrock (Claude, Titan, Llama, etc.)
  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude via API)
  • Ollama (Local: Mistral, Llama, Qwen, etc.)
  • Google (Gemini, PaLM)
  • Any custom LangChain BaseChatModel

Databases

  • PostgreSQL
  • MySQL
  • SQLite
  • Any SQLAlchemy-compatible database

How It Works

  1. Schema First: Agent checks table schemas before writing queries
  2. ReAct Pattern: Uses Thought-Action-Observation loop
  3. Standard SQL: Generates database-agnostic SQL
  4. Smart Truncation: Large results summarized for LLM
  5. Error Recovery: Auto-retries with corrections
  6. Caching: Pre-loaded schemas skip redundant lookups
  7. Metadata Tracking: Full observability of query execution
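
The ReAct pattern in step 2 can be pictured as a bounded loop of model calls and tool executions; a highly simplified sketch (all names here are illustrative):

```python
def react_loop(llm_step, execute_tool, max_iterations=10):
    """Simplified ReAct loop: Thought, Action, Observation, until a final answer."""
    observations = []
    for _ in range(max_iterations):
        thought, action = llm_step(observations)   # Thought + proposed Action
        if action is None:                         # model produced a final answer
            return thought
        observations.append(execute_tool(action))  # Observation fed back in
    return "Iteration limit reached"
```

The `max_iterations` bound is what triggers the "Iteration Limit" error pattern when a query is too complex to resolve.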

Limitations

  • Read-only: SELECT queries only (no INSERT, UPDATE, DELETE, DROP)
  • LangChain Required: Must use LangChain-compatible LLM
  • SQLAlchemy Required: Database must work with SQLAlchemy
  • JSON Serialization: Complex custom types may need manual handling
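
The read-only guarantee can be enforced with a simple statement check before execution; a sketch of the idea (not the library's actual validator):

```python
def assert_read_only(sql):
    """Reject anything other than a read-only statement (illustrative guard)."""
    first_word = sql.lstrip().split(None, 1)[0].upper()
    if first_word not in ("SELECT", "WITH"):  # WITH allows read-only CTEs
        raise ValueError(f"Only read-only queries are allowed, got: {first_word}")
```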

License

MIT License; see the LICENSE file for details.


Acknowledgments

Built on:
