Skip to main content

Schema introspection and resolution toolkit for SQL agents

Project description

schemadex

Schema introspection and resolution toolkit for SQL agents.

schemadex is a Rust core with Python bindings that turns a live database into something an LLM agent can actually consume: a token-budgeted, sample-aware, fuzzy-resolvable view of every table and column, cached on disk so the agent doesn't re-pay the introspection cost on every turn.

Install

pip install schemadex                            # core + all backends
pip install "schemadex[langchain,langgraph]"     # with framework adapters

Rust (git, not crates.io):

schemadex-core = { git = "https://github.com/AmiyaMandal1/schemadex", features = ["postgres", "sqlite", "duckdb_backend"] }

Documentation

Full docs: https://amiyamandal1.github.io/schemadex/

10-line example

from schemadex import SchemaCache

cache = SchemaCache.from_url("postgres://localhost/mydb")

for name in cache.list_tables():
    print(name)

result = cache.resolve("public.orders", "customer_idd")
print(result.matched, result.confidence)   # 'customer_id', 0.98

prompt, tokens = cache.describe_for_agent(max_tokens=1500, hint="orders by region")

Benchmarks

Synthetic adversarial corpus (benches/agent-success/): 38 column-name hallucinations modeled after real LLM SQL agent failures — case flips, missing/added underscores, plural/singular drift, semantic near-misses.

Two harnesses, same corpus:

Harness Baseline Treatment Δ
Literal lookup (no LLM) 0.0% 94.7% +94.7 pp
qwen2.5-coder:3b via Ollama (real LLM) 97.4% 100.0% +2.6 pp

Same model in both halves of the LLM row — the difference is describe_for_agent ranking + resolve_column post-correction. The single miss in the LLM baseline is "What is the orderid on shipments?" — the model picked id instead of order_id; resolve_column fixed it.

Median per-record latency: literal harness ~7 µs / 3 µs (schemadex overhead is in the noise); LLM harness ~285 ms (LLM-bound on an M2 Max).

Reproduce:

# Literal (no LLM)
python benches/agent-success/synthetic_corpus.py
python benches/agent-success/run_synthetic.py

# Real LLM via Ollama
ollama serve &
ollama pull qwen2.5-coder:3b
python benches/agent-success/run_ollama.py --model qwen2.5-coder:3b --n 0

This is a micro-benchmark of the resolution path, not an end-to-end LLM agent comparison on BIRD/Spider — those harnesses are scaffolded in baseline.py / treatment.py but require an API key + corpus download. See docs/benchmark.md for methodology.

Why

LLM SQL agents fail in the same three ways over and over:

  1. They hallucinate column names because they got the schema in a thousand-token blob and forgot half of it.
  2. They retry the same broken query because they don't know 'No Delay' is the sentinel value covering 80% of delay_code.
  3. They re-introspect on every step because there's nowhere obvious to cache the schema.

schemadex fixes all three:

  • Resolution: resolve_column(table, candidate) returns a confidence + alternatives instead of letting the agent guess.
  • Sampling: sample_values=True collects top-K, percentiles, and flags any value over 40% frequency as a sentinel.
  • Cache: introspect once, persist to ~/.cache/schemadex/<db>/, refresh on DDL change. On a local 50-table SQLite, warm reads are ~47× cold (cargo bench --bench cache_refresh); on remote Postgres the ratio grows since cold is network-bound.

Backends

Backend Feature flag Status Credentials
Postgres postgres URL only
SQLite sqlite URL only
DuckDB duckdb_backend URL only
MySQL mysql URL only
BigQuery bigquery GOOGLE_APPLICATION_CREDENTIALS or ADC (gcloud auth application-default login)
Snowflake snowflake SNOWFLAKE_USER, SNOWFLAKE_ACCOUNT, SNOWFLAKE_PRIVATE_KEY_PATH (key-pair JWT); optional SNOWFLAKE_WAREHOUSE, SNOWFLAKE_ROLE
MSSQL mssql URL (mssql://user:pass@host:port/database; append ?encrypt=true for TLS)

MCP server

schemadex ships an MCP server. Install with pip install "schemadex[mcp]" and wire it into Claude Code by adding to ~/.claude/mcp.json:

{
  "mcpServers": {
    "schemadex": {
      "command": "schemadex-mcp",
      "args": ["--url", "sqlite:///path/to/db.sqlite"]
    }
  }
}

The agent then has list_tables, describe_for_agent, resolve_column, and run_sql tools without any extra wiring.

Layout

schemadex/
├── crates/
│   ├── schemadex-core/    pure-Rust introspection + cache + resolve
│   └── schemadex-py/      PyO3 bindings (built as `schemadex._native`)
├── python/schemadex/      pure-Python public surface
├── examples/              langchain, langgraph, quickstart
├── benches/agent-success/ benchmark harness (see docs/benchmark.md)
└── docs/                  architecture + benchmark methodology

Project status

Pre-1.0. API is still in motion. See road_map.md for milestone tracking.

License

Licensed under either of MIT or Apache-2.0 at your option.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

schemadex-1.0.0.tar.gz (160.4 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

schemadex-1.0.0-cp39-abi3-win_amd64.whl (18.7 MB view details)

Uploaded CPython 3.9+Windows x86-64

schemadex-1.0.0-cp39-abi3-musllinux_1_2_x86_64.whl (34.6 MB view details)

Uploaded CPython 3.9+musllinux: musl 1.2+ x86-64

schemadex-1.0.0-cp39-abi3-manylinux_2_28_x86_64.whl (28.1 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.28+ x86-64

schemadex-1.0.0-cp39-abi3-macosx_11_0_arm64.whl (21.0 MB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

schemadex-1.0.0-cp39-abi3-macosx_10_12_x86_64.whl (22.6 MB view details)

Uploaded CPython 3.9+macOS 10.12+ x86-64

File details

Details for the file schemadex-1.0.0.tar.gz.

File metadata

  • Download URL: schemadex-1.0.0.tar.gz
  • Upload date:
  • Size: 160.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for schemadex-1.0.0.tar.gz
Algorithm Hash digest
SHA256 8bf1740912da03f94b9036c27ee4981b7fd59de3a84903ee504bcd09ed23847b
MD5 940e61be9dcdc918bc24e38deab20d9a
BLAKE2b-256 f3933a70913e3773e0335d0a148c53327b2700d3debc98a1c6b3640434353e96

See more details on using hashes here.

File details

Details for the file schemadex-1.0.0-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: schemadex-1.0.0-cp39-abi3-win_amd64.whl
  • Upload date:
  • Size: 18.7 MB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for schemadex-1.0.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 a077d3ed3e4d24cecb4a6d7f3ae3ac7d217220aad47ef546c4c264b37c6e159b
MD5 a54626746547aa3f16b8c833e84c0428
BLAKE2b-256 61ca435d5e11d1525de5f81a6551ce5bffc2b48c4ad26da364bd5c00d05294da

See more details on using hashes here.

File details

Details for the file schemadex-1.0.0-cp39-abi3-musllinux_1_2_x86_64.whl.

File metadata

  • Download URL: schemadex-1.0.0-cp39-abi3-musllinux_1_2_x86_64.whl
  • Upload date:
  • Size: 34.6 MB
  • Tags: CPython 3.9+, musllinux: musl 1.2+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for schemadex-1.0.0-cp39-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 c110093f1111a420d85bef974afe97594cb60957e2e6a113ef46b5f4064dd14e
MD5 b65b3ad904548cb1bbd02e1100c89446
BLAKE2b-256 9ec14b57d84d76aa997496e8b5d72ef97276134de40ab7c2793922a58f280404

See more details on using hashes here.

File details

Details for the file schemadex-1.0.0-cp39-abi3-manylinux_2_28_x86_64.whl.

File metadata

  • Download URL: schemadex-1.0.0-cp39-abi3-manylinux_2_28_x86_64.whl
  • Upload date:
  • Size: 28.1 MB
  • Tags: CPython 3.9+, manylinux: glibc 2.28+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for schemadex-1.0.0-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e046da09bab2d49e80a04fa121dbda2d80e3f77413b223b9ff9a04ca6d229536
MD5 cf4e3cede13f64485f32b4db268a2355
BLAKE2b-256 4c2af34121abbd6a5a5266ccf1ecc18460a5da84b278952ac1c98a0941cad2a1

See more details on using hashes here.

File details

Details for the file schemadex-1.0.0-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: schemadex-1.0.0-cp39-abi3-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 21.0 MB
  • Tags: CPython 3.9+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for schemadex-1.0.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c5d3df6927f1ba559fc0399ba6d12abe42c7f6a896cd57e932512c1721fc5a5c
MD5 d17211d405757b3960ed2d1b7b1570eb
BLAKE2b-256 73f1864a92e733d33e6dd8d59b3a49b8ce5b9912b5e38bf405adf20e1fa25839

See more details on using hashes here.

File details

Details for the file schemadex-1.0.0-cp39-abi3-macosx_10_12_x86_64.whl.

File metadata

  • Download URL: schemadex-1.0.0-cp39-abi3-macosx_10_12_x86_64.whl
  • Upload date:
  • Size: 22.6 MB
  • Tags: CPython 3.9+, macOS 10.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for schemadex-1.0.0-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 38df7e7f570437de0bf1f335a995ec3c0a15efcd1a5fa765e75645ee3e21c727
MD5 e577950a6e7e9d8e97e3984a04641c53
BLAKE2b-256 40a496f23c848eed407f118c2d5d95c89a75e91be2b4a1d36cdddcfdc7154205

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page