Text-to-SQL Agent
A flexible, LLM-agnostic text-to-SQL agent that works with any database.
Key Advantages
1. Performance Optimization
- Schema pre-loading eliminates redundant lookups
- Intelligent result truncation reduces LLM token usage
- Cached table information speeds up complex queries
2. Enhanced User Experience
- Timestamp context: "Show me orders from last week" just works
- Conversation metadata tracking for better follow-ups
- Domain context for specialized behavior
3. Agnostic Design
- Works with any LangChain BaseChatModel
- Works with any SQL database
- Optional domain context (generic by default)
- No provider or database lock-in
Quick Start
Installation
# Basic installation
pip install text2sql-agent
# With specific providers
pip install text2sql-agent[bedrock] # AWS Bedrock
pip install text2sql-agent[openai] # OpenAI
pip install text2sql-agent[anthropic] # Anthropic Claude
pip install text2sql-agent[ollama] # Ollama (local)
# With databases
pip install text2sql-agent[postgresql]
pip install text2sql-agent[mysql]
Example
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_aws import ChatBedrock
# Setup (any LLM, any database)
llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0")
db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
# Create agent
agent = SQLAgent(llm=llm, db=db)
# Ask questions
result = agent.query("What tables are in the database?")
print(result["answer"])
print(result["sql_query"])
print(result["results"]) # Proper JSON
LLM Provider Examples
All examples use the same SQLAgent interface - just swap the LLM.
AWS Bedrock (Claude, Titan, etc.)
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_aws import ChatBedrock
llm = ChatBedrock(
model_id="anthropic.claude-3-sonnet-20240229-v1:0",
region_name="us-east-1"
)
db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
Setup:
pip install langchain-aws boto3
export AWS_DEFAULT_REGION=us-east-1
# Use AWS credentials or IAM role
OpenAI (GPT-4, GPT-3.5)
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_openai import ChatOpenAI
import os
llm = ChatOpenAI(
model="gpt-4o",
temperature=0.1,
api_key=os.getenv("OPENAI_API_KEY")
)
db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
Setup:
pip install langchain-openai
export OPENAI_API_KEY=sk-your-key-here
Anthropic API (Claude direct)
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_anthropic import ChatAnthropic
import os
llm = ChatAnthropic(
model="claude-3-5-sonnet-20241022",
temperature=0.1,
api_key=os.getenv("ANTHROPIC_API_KEY")
)
db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
Setup:
pip install langchain-anthropic
export ANTHROPIC_API_KEY=your-key-here
Ollama (Local)
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_ollama import ChatOllama
llm = ChatOllama(
model="mistral:7b",
temperature=0.1
)
db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
Setup:
pip install langchain-ollama
# Install Ollama: https://ollama.ai
ollama pull mistral:7b
Google Gemini
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain_google_genai import ChatGoogleGenerativeAI
import os
llm = ChatGoogleGenerativeAI(
model="models/gemini-2.0-flash-exp",
temperature=0.1,
google_api_key=os.getenv("GOOGLE_API_KEY")
)
db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)
Setup:
pip install langchain-google-genai
export GOOGLE_API_KEY=your-key-here
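Any other LangChain chat model can be swapped in the same way, since SQLAgent only requires a BaseChatModel. As a sketch, assuming a recent langchain release that provides the init_chat_model helper, a model can even be selected by provider name at runtime:
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
from langchain.chat_models import init_chat_model

# init_chat_model returns a BaseChatModel for the named provider/model
llm = init_chat_model("gpt-4o", model_provider="openai", temperature=0.1)
db = JSONSerializableSQLDatabase.from_uri("postgresql://user:pass@host/db")
agent = SQLAgent(llm=llm, db=db)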
Database Examples
All examples use the same SQLAgent interface - just swap the database.
PostgreSQL
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
import os
# Connection string format
db = JSONSerializableSQLDatabase.from_uri(
"postgresql://username:password@localhost:5432/database_name"
)
# With environment variable (recommended)
db = JSONSerializableSQLDatabase.from_uri(os.getenv("DATABASE_URL"))
# With SSL mode
db = JSONSerializableSQLDatabase.from_uri(
"postgresql://user:pass@host:5432/db?sslmode=require"
)
agent = SQLAgent(llm=llm, db=db)
Setup:
pip install psycopg2-binary
# or
pip install text2sql-agent[postgresql]
Connection String Formats:
postgresql://user:password@host:port/database
postgresql://user:password@host:port/database?sslmode=require
MySQL
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
# MySQL connection
db = JSONSerializableSQLDatabase.from_uri(
"mysql+pymysql://username:password@localhost:3306/database_name"
)
# With charset
db = JSONSerializableSQLDatabase.from_uri(
"mysql+pymysql://user:pass@host:3306/db?charset=utf8mb4"
)
agent = SQLAgent(llm=llm, db=db)
Setup:
pip install pymysql
# or
pip install text2sql-agent[mysql]
Connection String Formats:
mysql+pymysql://user:password@host:port/database
mysql+pymysql://user:password@host:port/database?charset=utf8mb4
SQLite (Local Files)
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase
# Relative path
db = JSONSerializableSQLDatabase.from_uri("sqlite:///database.db")
# Absolute path
db = JSONSerializableSQLDatabase.from_uri("sqlite:////absolute/path/to/database.db")
# In-memory (testing)
db = JSONSerializableSQLDatabase.from_uri("sqlite:///:memory:")
agent = SQLAgent(llm=llm, db=db)
Setup:
# No additional dependencies - SQLite is built into Python
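For a quick local experiment, a small SQLite file can be created first and then handed to the agent. A minimal sketch, assuming any of the LLMs from the provider examples is available as llm; the demo.db file and its schema are made up for illustration:
import sqlite3
from text_to_sql import SQLAgent, JSONSerializableSQLDatabase

# Create a tiny demo database (hypothetical schema)
conn = sqlite3.connect("demo.db")
conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT)")
conn.execute("INSERT INTO users (name, created_at) VALUES ('Alice', '2025-12-29')")
conn.commit()
conn.close()

db = JSONSerializableSQLDatabase.from_uri("sqlite:///demo.db")
agent = SQLAgent(llm=llm, db=db)  # llm: any chat model from the provider examples above
print(agent.query("How many users are there?")["answer"])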
Advanced Features
1. Schema Pre-loading (70-80% Performance Boost)
Pre-load important table schemas to eliminate redundant lookups:
# Explicit table list (recommended)
agent = SQLAgent(
llm=llm,
db=db,
important_tables=["users", "orders", "products"]
)
# Auto-discovery based on domain
agent = SQLAgent(
llm=llm,
db=db,
domain_context="e-commerce",
important_tables="auto" # Finds relevant tables
)
# Cache all tables (small schemas only)
agent = SQLAgent(
llm=llm,
db=db,
important_tables="all" # Max 10 tables recommended
)
2. Async Support
Non-blocking query execution:
import asyncio

async def main():
    result = await agent.aquery("How many users registered today?")
    print(result["answer"])

# Concurrent queries
async def concurrent_queries():
    results = await asyncio.gather(
        agent.aquery("How many users?"),
        agent.aquery("How many orders?"),
        agent.aquery("What's the total revenue?")
    )
    return results

# Run async code
asyncio.run(main())
3. Enhanced Logging & Metrics
Track everything:
import logging
from text_to_sql import SQLAgent
# Configure logging
SQLAgent.configure_logging(
level=logging.INFO,
log_file="sql_queries.log"
)
agent = SQLAgent(llm=llm, db=db)
result = agent.query("Show me today's orders")
# Access metadata
print(result["metadata"])
# {
# "sql_query": "SELECT * FROM orders WHERE ...",
# "result_count": 42,
# "tables_accessed": ["orders", "customers"],
# "execution_time": 1.23,
# "timestamp": "2025-12-30T..."
# }
Log Output:
QUERY_ID: 20251230_143022_12345
USER_QUERY: Show me today's orders
TIMESTAMP: 2025-12-30T14:30:22
DATABASE: postgresql
DOMAIN: e-commerce
GENERATED_SQL:
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at >= CURRENT_DATE
EXECUTION_TIME: 1.23s
RESULT_COUNT: 42 rows
SQL_EXECUTION_STATUS: SUCCESS
4. Timestamp Context
Automatic temporal query support:
agent = SQLAgent(llm=llm, db=db, include_timestamp=True)
# These "just work" - no manual date handling needed
agent.query("Show me orders from last week")
agent.query("How many users registered yesterday?")
agent.query("What's revenue for the past 30 days?")
agent.query("Show me appointments scheduled for tomorrow")
The agent automatically injects current date context:
Current date: 2025-12-30
Today's date: 2025-12-30
Yesterday: 2025-12-29
Last week: 2025-12-23 to 2025-12-29
5. Domain Context
Specialized behavior for different domains:
# Medical
agent = SQLAgent(
llm=llm,
db=db,
domain_context="medical patient records, diagnoses, medications"
)
result = agent.query("How many patients have diabetes?")
# E-commerce
agent = SQLAgent(
llm=llm,
db=db,
domain_context="e-commerce products, orders, customers, transactions"
)
result = agent.query("What were the top 5 selling products this week?")
# Financial
agent = SQLAgent(
llm=llm,
db=db,
domain_context="financial transactions, accounts, balances"
)
result = agent.query("Show me suspicious transactions over $10,000")
6. Conversation History
Better follow-up questions:
# First query
result1 = agent.query("Who are the employees in Engineering?")
# Follow-up with context
conversation_history = [
{"role": "user", "content": "Who are the employees in Engineering?"},
{"role": "assistant", "content": result1["answer"]}
]
result2 = agent.query(
"What is their average salary?", # "their" refers to Engineering employees
conversation_history=conversation_history
)
Enhanced context tracking:
# Include metadata for better context
conversation_history = [
{
"role": "user",
"content": query1,
"metadata": result1.get("metadata", {}) # SQL query, tables accessed
},
{
"role": "assistant",
"content": result1["answer"]
}
]
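One way to keep this bookkeeping out of application code is a small helper that appends each turn to the history automatically. A minimal sketch, assuming the query() return fields documented in the API reference; the ask() helper is illustrative, not part of the library:
conversation_history = []

def ask(agent, question):
    # Pass the accumulated history so follow-ups ("their", "those") resolve correctly
    result = agent.query(question, conversation_history=conversation_history)
    conversation_history.append({
        "role": "user",
        "content": question,
        "metadata": result.get("metadata", {}),
    })
    conversation_history.append({"role": "assistant", "content": result["answer"]})
    return result

ask(agent, "Who are the employees in Engineering?")
ask(agent, "What is their average salary?")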
7. Error Recovery (8 Error Patterns)
User-friendly error messages with actionable suggestions:
result = agent.query("Show me data from nonexistent_table")
if "error" in result:
print(result["error_type"]) # "table_not_found"
print(result["answer"]) # User-friendly message
print(result["error_suggestion"]) # Actionable steps
Handled Error Patterns:
- Iteration Limit - Query too complex
- Timeout - Database performance issues
- Table Not Found - Shows available tables
- Column Not Found - Suggests schema inspection
- Syntax Error - Simplification suggestions
- Permission Denied - Access issue guidance
- Connection Error - Database connectivity tips
- Type Mismatch - Data type guidance
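A caller can branch on error_type to decide how to respond to each of these patterns. A minimal sketch, assuming the fields shown above; apart from table_not_found, the error_type identifiers used here are illustrative and may differ from the library's actual values:
result = agent.query("Show me data from nonexistent_table")

if "error" in result:
    error_type = result["error_type"]
    if error_type in ("table_not_found", "column_not_found"):
        # Surface the actionable suggestion (e.g., available tables or schema hints)
        print(result["error_suggestion"])
    elif error_type in ("timeout", "iteration_limit"):
        # Ask the user for a simpler or more specific question
        print("Please try a simpler or more specific question.")
    else:
        print(result["answer"])  # user-friendly message for the remaining patterns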
8. Result Truncation
Automatically handle large result sets:
agent = SQLAgent(
llm=llm,
db=db,
max_rows_for_llm=20, # Send up to 20 rows to LLM
large_result_threshold=100 # 100+ rows = "large"
)
# Query returns 1000 rows
result = agent.query("SELECT * FROM large_table")
# LLM only sees first 20 rows (faster, cheaper)
print(result["answer"]) # Summary based on 20 rows
# Get full results if needed
full_data = agent.get_full_results() # All 1000 rows
9. Singleton Pattern
Resource management for production:
# Create singleton instance
agent1 = SQLAgent(llm=llm, db=db, use_singleton=True)
agent2 = SQLAgent(llm=llm, db=db, use_singleton=True)
# agent1 and agent2 are THE SAME instance
assert agent1 is agent2
# Prevents:
# - Multiple database connections
# - Redundant schema caching
# - Memory leaks
Complete Configuration
agent = SQLAgent(
llm=llm, # Required: Any BaseChatModel
db=db, # Required: JSONSerializableSQLDatabase
domain_context="medical records", # Optional: Domain specialization
important_tables=["Patient", ...], # Optional: Schema pre-loading
enable_schema_caching=True, # Default: True
max_rows_for_llm=10, # Default: 10
large_result_threshold=50, # Default: 50
verbose=True, # Default: False
max_iterations=10, # Default: 10
use_singleton=False, # Default: False
include_timestamp=True, # Default: True
)
API Reference
SQLAgent Class
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | BaseChatModel | Required | Any LangChain chat model |
| db | JSONSerializableSQLDatabase | Required | Database instance |
| domain_context | str | None | Optional domain description |
| important_tables | List[str] or str | None | Tables to pre-load ("auto", "all", or list) |
| enable_schema_caching | bool | True | Enable schema pre-loading |
| max_rows_for_llm | int | 10 | Max rows to send to LLM |
| large_result_threshold | int | 50 | Threshold for "large" results |
| verbose | bool | False | Show agent thinking |
| max_iterations | int | 10 | Max agent iterations |
| use_singleton | bool | False | Singleton pattern |
| include_timestamp | bool | True | Timestamp context |
Methods
query(question, conversation_history=None)
Execute a natural language query (synchronous).
Parameters:
- question (str): Natural language question
- conversation_history (List[Dict], optional): Previous conversation
Returns:
{
"answer": "Natural language answer",
"sql_query": "SELECT * FROM ...",
"results": '[{"id": 1, ...}]', # JSON string
"intermediate_steps": [...],
"metadata": {
"sql_query": "...",
"result_count": 42,
"tables_accessed": ["orders"],
"execution_time": 1.23,
"timestamp": "2025-12-30T..."
}
}
aquery(question, conversation_history=None) [async]
Execute a natural language query (asynchronous).
Parameters: Same as query()
Returns: Same as query()
Usage:
result = await agent.aquery("How many users?")
get_full_results()
Get complete untruncated results from last query.
Returns: JSON string with all rows
get_schema_info(table_names=None)
Get database schema information.
Parameters:
- table_names (List[str], optional): Specific tables
Returns: Schema information string
get_table_names()
Get list of all table names.
Returns: List[str]
get_dialect()
Get SQL dialect of connected database.
Returns: str ("postgresql", "mysql", "sqlite", etc.)
configure_logging(level, log_file) [class method]
Configure logging for SQL queries.
Parameters:
- level (int): logging.INFO, logging.DEBUG, etc.
- log_file (str): Path to log file
Supported Providers & Databases
LLM Providers
- AWS Bedrock (Claude, Titan, Llama, etc.)
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude via API)
- Ollama (Local: Mistral, Llama, Qwen, etc.)
- Google (Gemini, PaLM)
- Any custom LangChain BaseChatModel
Databases
- PostgreSQL
- MySQL
- SQLite
- Any SQLAlchemy-compatible database
How It Works
- Schema First: Agent checks table schemas before writing queries
- ReAct Pattern: Uses Thought-Action-Observation loop
- Standard SQL: Generates database-agnostic SQL
- Smart Truncation: Large results summarized for LLM
- Error Recovery: Auto-retries with corrections
- Caching: Pre-loaded schemas skip redundant lookups
- Metadata Tracking: Full observability of query execution
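The intermediate steps of this loop are exposed on every result, so the behavior can be inspected directly. A minimal sketch, assuming the verbose flag and intermediate_steps field documented above:
agent = SQLAgent(llm=llm, db=db, verbose=True)  # print the agent's thinking as it runs

result = agent.query("Which customer placed the most orders?")
for step in result["intermediate_steps"]:
    print(step)  # each Thought-Action-Observation step recorded during the run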
Limitations
- Read-only: SELECT queries only (no INSERT, UPDATE, DELETE, DROP)
- LangChain Required: Must use LangChain-compatible LLM
- SQLAlchemy Required: Database must work with SQLAlchemy
- JSON Serialization: Complex custom types may need manual handling
License
MIT License - see LICENSE file for details
Acknowledgments
Built on LangChain and SQLAlchemy.