Skip to main content

Text-to-SQL framework with tool-based schema retrieval. Built on LangChain's Deep Agents harness.

Project description

text2sql

Until recently, LLMs couldn't reliably chain more than a handful of tool calls before losing the thread. Though frontier models now make dozens, hundreds, or even thousands of iterative tool calls from a single prompt, reading each result and deciding what to do next. This unlocks a different shape of text-to-SQL system: instead of pre-computing which schema elements are relevant, you can hand the LLM one tool (execute_sql) and let it explore the schema, write queries, test them against real data, and self-correct before returning a final answer. This SDK requires no RAG, semantic layer, schema descriptions, etc. All that is needed is a connection string and a frontier model, as shown in the example below.

As models keep getting better at recursive tool use, the right move is to keep rearchitecting the harness so it constrains the LLM as little as possible (every guardrail you remove is capability you get back).

19/20 (95%) on Spider zero-shot across 80 tables and 20 databases. 20/20 after adding one scenario.

from text2sql import TextSQL

engine = TextSQL(
    "postgresql://user:pass@localhost/mydb",
    trace_file="traces.jsonl",
)
result = engine.ask("Which customers have spent more than $10K this year?")

print(result.sql)   # verified SQL
print(result.data)  # [{'name': 'Acme Corp', 'total': 14302.50}, ...]

How it works

Below is a real agent trace from the Spider benchmark. The agent is connected to a single database containing 80 tables from 20 different schemas — it has to find the right tables for each question on its own.

Question: "List the name of singers in ascending order of net worth."

┌─────────────────────────────────────────────────────────────┐
│ Tool:   execute_sql                                         │
│ Input:  SELECT name FROM sqlite_master                      │
│         WHERE type='table' AND name NOT LIKE 'sqlite_%'     │
│ Output: battle, ship, death, continents, countries,         │
│         car_makers, model_list, car_names, cars_data,       │
│         stadium, singer, concert, singer_in_concert,        │
│         course, teacher, … (80 tables)                      │
├─────────────────────────────────────────────────────────────┤
│ Tool:   execute_sql                                         │
│ Input:  PRAGMA table_info('singer')                         │
│ Output: Singer_ID INT, Name TEXT, Country TEXT,              │
│         Song_Name TEXT, Song_release_year TEXT,              │
│         Age INT, Is_male BOOL                               │
│         ← no Net_Worth column — wrong table                 │
├─────────────────────────────────────────────────────────────┤
│ Tool:   execute_sql                                         │
│ Input:  PRAGMA table_info('singer_solo')                    │
│ Output: Singer_ID INT, Name TEXT, Birth_Year REAL,          │
│         Net_Worth_Millions REAL, Citizenship TEXT            │
│         ← found it                                          │
├─────────────────────────────────────────────────────────────┤
│ Tool:   execute_sql                                         │
│ Input:  SELECT Name FROM singer_solo                        │
│         ORDER BY Net_Worth_Millions ASC                     │
│ Output: Abigail Johnson, Susanne Klatten,                   │
│         Gina Rinehart, Iris Fontbona, …  ✓                  │
└─────────────────────────────────────────────────────────────┘

The agent saw 80 tables, found two singer tables, inspected both, identified which one had the Net_Worth_Millions column, and wrote the correct query. Four tool calls, all autonomous.

Schema retrieval and SQL generation happen in the same loop, not as separate pipeline stages. If the agent picks the wrong table, it goes back and finds the right one. If a query errors, it reads the error message and fixes it. If the output doesn't look right, it rethinks its approach.

Benchmarks

Tested on the Spider benchmark — the most widely used text-to-SQL evaluation, with 10,000+ questions across 200 databases. We merged all 20 dev-set databases into a single 80-table database and ran 20 questions — one per database, randomly selected. The agent had to navigate 80 tables to find the right ones for each question.

19/20 (95%) zero-shot, no examples. The single failure was an ambiguous question — "What is maximum and minimum death toll caused each time?" — where the agent returned per-battle results instead of a global aggregate. After adding a one-line scenario clarifying that "each time" means overall, the agent used lookup_example to retrieve the guidance and got it right: 20/20.

Install

pip install text2sql-framework

# With Anthropic:
pip install "text2sql-framework[anthropic]"

# With OpenAI:
pip install "text2sql-framework[openai]"

Quick start

from text2sql import TextSQL

# Connect to any SQLAlchemy-supported database
engine = TextSQL("sqlite:///company.db")

# Ask a question
result = engine.ask("Top 5 products by total revenue")
print(result.sql)
print(result.data)

# Control how many rows come back
result = engine.ask("All customers in New York", max_rows=50)

LLM providers

# Anthropic (recommended)
engine = TextSQL("sqlite:///mydb.db", model="anthropic:claude-sonnet-4-6")

# OpenAI
engine = TextSQL("sqlite:///mydb.db", model="openai:gpt-4o")

Database support

Any database with a SQLAlchemy driver:

TextSQL("postgresql://user:pass@localhost/mydb")
TextSQL("mysql+pymysql://user:pass@localhost/mydb")
TextSQL("sqlite:///mydb.db")
TextSQL("mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server")
TextSQL("snowflake://user:pass@account/db/schema")

The agent automatically detects the SQL dialect and adjusts its schema exploration strategy — information_schema for PostgreSQL/MySQL/Snowflake, PRAGMA for SQLite, sys.tables for SQL Server.

Scenarios and the feedback loop

The agent works out of the box with just a connection string — but real databases have jargon, business logic, and naming conventions that no LLM can guess. That's where scenarios.md comes in.

A scenarios file is a markdown file where each ## heading contains domain knowledge the agent can't infer from the schema alone — business rules, column name translations, tricky join paths, corrective guidance:

## net revenue
Net revenue = gross revenue minus refunds.
Use INNER JOIN between orders and payments, not LEFT JOIN.
- `orders.amt_ttl` is the gross order total
- Refunds are in the `payments` table where `is_refund = 1`

    -- CORRECT
    SELECT SUM(o.amt_ttl) + SUM(p.amt) FROM orders o
    JOIN payments p ON o.order_id = p.order_id WHERE p.is_refund = 1;

At runtime, the agent doesn't get the entire file dumped into its context. It sees a list of scenario titles and gets a lookup_example tool. When it's about to write a query involving revenue, it calls lookup_example("net revenue") and retrieves the full guidance before writing SQL. The agent decides when it needs help, and only pulls in what's relevant.

engine = TextSQL(
    "postgresql://localhost/mydb",
    examples="scenarios.md",
    trace_file="traces.jsonl",
)

Building scenarios automatically with the MCP

You don't have to write scenarios by hand. The SDK saves full traces of every query — which tables the agent explored, what SQL it tried, what errors it hit, how it self-corrected. The MCP server reads these traces, identifies where the agent struggled, and writes corrective scenarios to scenarios.md automatically.

pip install text2sql-mcp

Add to your .mcp.json:

{
  "mcpServers": {
    "text2sql": {
      "command": "text2sql-mcp",
      "env": {
        "TEXT2SQL_DB": "sqlite:///mydb.db",
        "TEXT2SQL_TRACES": "traces.jsonl",
        "TEXT2SQL_EXAMPLES": "scenarios.md",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

The MCP server plugs into Claude Code, Cursor, or any MCP-compatible assistant and exposes two tools:

  • analyze_traces — reads unread traces, sends them to an LLM along with the database schema and current scenarios, and writes improvements to scenarios.md
  • get_summary — quick stats: total traces, success rate, unread count, scenario count

The loop: run queries → traces accumulate → call analyze_traces → scenarios.md gets better → future queries use the improved scenarios via lookup_example. This is how we went from 96% to 100% on Spider — the MCP identified a LEFT vs INNER JOIN pattern the agent kept getting wrong and wrote a corrective scenario that fixed it.

CLI

# Interactive mode
text2sql ask "sqlite:///mydb.db"

# Single question
text2sql query "sqlite:///mydb.db" "How many orders per month?"

# With options
text2sql ask "postgresql://localhost/mydb" --model anthropic:claude-sonnet-4-6

Use with LangChain agents

If you're already building an agent with LangChain's create_agent, you can plug text2sql in as middleware instead of using the standalone TextSQL class. The middleware adds an execute_sql tool, dialect-aware schema-exploration guidance to the system prompt, and (optionally) a lookup_example tool wired to your scenarios file.

pip install "text2sql-framework[langchain,anthropic]"
from langchain.agents import create_agent
from text2sql import Text2SqlMiddleware

t2s = Text2SqlMiddleware(
    db_url="postgresql://user:pass@localhost/mydb",
    examples="scenarios.md",          # optional
    instructions="Revenue is net of refunds.",  # optional
)

agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=t2s.tools,
    middleware=[t2s],
)

result = agent.invoke({"messages": [{"role": "user", "content": "Top 5 customers by revenue?"}]})
print(result["messages"][-1].content)

The middleware requires langchain>=1.0. A runnable example lives at examples/with_langchain.py.

Built on Deep Agents

The agent loop is powered by Deep Agents (langchain-ai/deepagents). We use a minimal middleware stack — just automatic context compaction (summarizes older tool calls if the agent is working on a task with many steps) and Anthropic prompt caching (reduces API costs). All other default middleware (filesystem tools, sub-agents, todo lists) is disabled so the agent only sees the text2sql tools it needs.

Architecture

text2sql/
├── core.py          # TextSQL — public API
├── generate.py      # SQLGenerator — builds the agent, parses results
├── connection.py    # Database — SQLAlchemy wrapper
├── tools.py         # execute_sql + lookup_example (LangChain tools)
├── dialects.py      # Per-dialect schema exploration guides
├── examples.py      # ExampleStore — loads scenario markdown
├── tracing.py       # Tracer — captures full agentic loop
├── analyze.py       # AnalysisEngine — deterministic trace analysis
├── models.py        # Pydantic models for analysis reports
└── cli.py           # Click CLI

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

text2sql_framework-0.3.0.tar.gz (41.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

text2sql_framework-0.3.0-py3-none-any.whl (34.8 kB view details)

Uploaded Python 3

File details

Details for the file text2sql_framework-0.3.0.tar.gz.

File metadata

  • Download URL: text2sql_framework-0.3.0.tar.gz
  • Upload date:
  • Size: 41.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for text2sql_framework-0.3.0.tar.gz
Algorithm Hash digest
SHA256 807a96827068b1354971e48a6502087c45fdd9de202514d96394a9d67212e038
MD5 1a2311b3d8ccf9b1632625eca49a9dd0
BLAKE2b-256 4297e432da8dcf39d0355bacef5b01a775b2d9341f2d0fa704c75586bcd3ddff

See more details on using hashes here.

File details

Details for the file text2sql_framework-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for text2sql_framework-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0e5753931b254bdbe2957776bcc2ab1dd10ccd218868b305bffd33f9233af706
MD5 b8a7438d15855ac2a4238c4449b7dc08
BLAKE2b-256 db0218e1da55a520009928f4661f5ddb7547487e38c6459b2f70db7b4a504451

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page