An MCP server that gives AI agents deep semantic understanding of any production database.
Project description
Schema Brain
An MCP server that gives AI agents deep semantic understanding of any production database.
Status: Pre-alpha. Repo is private; preview launch in prep. Feature-complete for Postgres + SQLite.
The problem
AI agents fail when querying real production databases:
- Schemas don't fit in context — a 300-table schema is 50k+ tokens of
CREATE TABLEalone. - Column names are cryptic —
acct_dim_v3,pmt_fct_h,cust_id_v2_legacy. - Joins aren't obvious — which FK is the "right" one when there are three?
- Data has shapes —
statuscould be 5 enum values, 50, or a free-text mess.
Schema Brain fixes all four and serves the result through a stable MCP tool surface that any agent (Claude Desktop, Anthropic SDK, custom) can call.
What it does
- Indexes your database schema, profiles each column, and generates a one-paragraph LLM description per column (Claude Haiku 4.5 by default; Sonnet 4.6 for cryptic abbreviations).
- Embeds the descriptions locally with
BAAI/bge-small-en-v1.5viafastembed— no second API vendor. - Stores everything in a single SQLite file. No Qdrant, no Redis, no ops.
- Serves four MCP tools:
find_relevant_tables,describe_table,describe_column,suggest_joins. Every response includes a token estimate so agents can budget context.
What it looks like in practice
Real Claude Desktop session, validated 2026-05-11 against the bundled e-commerce fixture (6 tables, 24 columns, indexed for $0.0074):
You: Using Schema Brain, write me a SQL query to compute each customer's total spend by product category.
Claude: (calls
describe_tableon users, products, orders, order_items, product_categories — in parallel)Here's the query. Schema is fully verified via Schema Brain.
SELECT u.id AS customer_id, u.full_name, c.name AS category_name, SUM(oi.quantity * oi.unit_price_cents) / 100.0 AS total_spend FROM public.users u JOIN public.orders o ON o.user_id = u.id JOIN public.order_items oi ON oi.order_id = o.id JOIN public.products p ON p.id = oi.product_id JOIN public.product_categories pc ON pc.product_id = p.id JOIN public.categories c ON c.id = pc.category_id GROUP BY u.id, u.full_name, c.name;Things you should know:
- Facts (from the schema): Prices live on
order_items.unit_price_cents, notorders.product_categoriesis a many-to-many composite-PK join table.- Assumptions: No status filter applied —
statusis free TEXT with default'pending'.categories.parent_idexists (nested categories) but I'm grouping by leaf — say if you want a recursive CTE for roll-up.- Caveat: A product in N categories has its spend counted N times (per category). Per-customer cross-category sums will exceed actual spend if products are multi-category in your data.
That's senior-engineer-grade output. None of the caveats — M:N double-counting, recursive-CTE awareness, free-text-status flag — is hardcoded; it falls out of letting Claude reason over the indexed descriptions. To verify Claude's SQL is mechanically correct (and that flagged caveats are the actual data behavior), see Validating SQL Claude generates.
And when the answer doesn't exist
You: Using Schema Brain, where do I find payment processor transaction IDs?
Claude: Short answer: they're not in this database — at least not in anything Schema Brain has indexed.
Most likely: payments live in a different system (Stripe/Adyen dashboard), or there's a payments table that wasn't indexed, or payment IDs are stored externally and only
orders.statusreflects the result.What to do next:
SELECT table_schema, table_name FROM information_schema.tables WHERE table_name ILIKE '%payment%'to confirm.
The qualifier — "at least not in anything Schema Brain has indexed" — matters. Most LLM-over-database tools confidently invent a payments table or shoehorn the answer into orders.total_cents. Schema Brain doesn't.
Quickstart
Five minutes from git clone to a working Claude Desktop integration. Three caveats up front — they tripped real users:
| Gotcha | Fix |
|---|---|
psql is not on macOS by default |
We use docker exec -i sb-pg psql ... instead — runs psql inside the postgres container, no host install needed |
uv sync --extra dev and schemabrain index are each silent for ~30–60s on first run |
Don't kill them. uv sync downloads ~75 wheels; first index downloads the ONNX embedding model (~67 MB) and makes 24 LLM calls. Progress bars land in v0. |
ANTHROPIC_API_KEY propagation |
Run export ANTHROPIC_API_KEY=sk-ant-... in the same terminal you'll run index from |
1. Install
git clone git@github.com:Arun-kc/schemabrain.git
cd schemabrain && uv sync --extra dev
PyPI publish is on the launch checklist — until then, install from source.
2. Boot Postgres + apply the bundled fixture (or point at your own DB)
docker run --rm -d -p 5432:5432 -e POSTGRES_PASSWORD=local --name sb-pg postgres:16-alpine
docker exec -i sb-pg psql -U postgres -d postgres \
< $(python -c "import schemabrain.eval, pathlib; print(pathlib.Path(schemabrain.eval.__file__).parent / 'fixtures/ecommerce.sql')")
For your own database, skip docker and use your real postgresql+psycopg:// URL.
3. Index it
export ANTHROPIC_API_KEY=sk-ant-...
schemabrain index "postgresql+psycopg://postgres:local@localhost:5432/postgres" \
--store-path ./schemabrain.db
Expect ~30–60 seconds of silence on the first run, then:
Indexed 6 table(s): 6 changed, 0 unchanged, 0 removed.
Columns: +24/~0/-0. LLM: 24 descriptions ($0.0074). Embeddings: 24
4. Wire into Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
"mcpServers": {
"schemabrain": {
"command": "/ABSOLUTE/PATH/TO/.venv/bin/schemabrain",
"args": [
"serve",
"--source",
"postgresql+psycopg://postgres:local@localhost:5432/postgres",
"--store-path",
"/ABSOLUTE/PATH/TO/schemabrain.db"
]
}
}
}
Both paths must be absolute. Quit Claude Desktop fully (Cmd+Q) and relaunch. The 🔌 tools panel should now show "schemabrain" with 4 tools.
For the headless Anthropic-SDK path, see examples/anthropic_demo.py and docs/setup.md.
Roadmap
Next:
- Query log mining via
pg_stat_statements— the differentiator vs schema-only competitors - 5th MCP tool:
get_example_queries— returns real SQL from your query log matching agent intent - BIRD Mini-Dev automated eval harness with reproducible CI
- Drift CLI:
schemabrain reindex --diff - Polished CLI (typer + progress bars)
Later (v1):
- Semantic layer: entities, metrics, canonical joins as first-class persisted definitions
- LLM-suggested entity definitions from existing column descriptions + FK graph
- Snowflake / BigQuery / MySQL connectors
- Hosted SaaS UI
- Multi-tenant access controls
Documentation
docs/architecture.md— pipeline, retrieval contract, cache logic, cost model, eval, what's validateddocs/mcp-tools.md— full reference for the 4 MCP tools with example responsesdocs/setup.md— Claude Desktop wiring + Anthropic SDK demo, with troubleshootingCONTRIBUTING.md— dev setup, TDD expectations, conventional commits, architecture invariantsexamples/— copy-paste-ready Claude Desktop config + headless agent loop using the officialmcpPython SDK
Contributing
PRs welcome. The bar is high — see CONTRIBUTING.md for the test-first / 99%-coverage / conventional-commits / architecture-invariants checklist. CI enforces all of it.
Bugs and feature requests use the structured templates in .github/ISSUE_TEMPLATE/. Issues without a reproduction (bugs) or a clear underlying problem (features) get closed with a request to re-open with the right info.
License
MIT.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file schemabrain-0.1.0a1.tar.gz.
File metadata
- Download URL: schemabrain-0.1.0a1.tar.gz
- Upload date:
- Size: 285.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b1fab3534fd28e9c91bcd05d0c333a5eacfb0e01276db8363c0ab20dc1269af0
|
|
| MD5 |
c54df1295d91ce0121fa62b928fd9215
|
|
| BLAKE2b-256 |
abcd20203f0dcba9e9fb7abb823499e18208b34833a6eaa4d9ce2a15e3247cb6
|
Provenance
The following attestation bundles were made for schemabrain-0.1.0a1.tar.gz:
Publisher:
publish.yml on Arun-kc/schemabrain
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
schemabrain-0.1.0a1.tar.gz -
Subject digest:
b1fab3534fd28e9c91bcd05d0c333a5eacfb0e01276db8363c0ab20dc1269af0 - Sigstore transparency entry: 1508767944
- Sigstore integration time:
-
Permalink:
Arun-kc/schemabrain@2310e8ed467f4bb4ed25e1bcdaf7fa72c9eae5d3 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/Arun-kc
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@2310e8ed467f4bb4ed25e1bcdaf7fa72c9eae5d3 -
Trigger Event:
workflow_dispatch
-
Statement type:
File details
Details for the file schemabrain-0.1.0a1-py3-none-any.whl.
File metadata
- Download URL: schemabrain-0.1.0a1-py3-none-any.whl
- Upload date:
- Size: 77.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f62800b91e4fc8b7a96de056e9f81947c79e32122550b887854ba043839a7de2
|
|
| MD5 |
8d35e7a46bbd6d717a213a58012b6141
|
|
| BLAKE2b-256 |
eb2b2859cc95fae364fe71f35b0d49ce02a8253f5aa4ba4387fc3d878b391e21
|
Provenance
The following attestation bundles were made for schemabrain-0.1.0a1-py3-none-any.whl:
Publisher:
publish.yml on Arun-kc/schemabrain
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
schemabrain-0.1.0a1-py3-none-any.whl -
Subject digest:
f62800b91e4fc8b7a96de056e9f81947c79e32122550b887854ba043839a7de2 - Sigstore transparency entry: 1508768148
- Sigstore integration time:
-
Permalink:
Arun-kc/schemabrain@2310e8ed467f4bb4ed25e1bcdaf7fa72c9eae5d3 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/Arun-kc
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@2310e8ed467f4bb4ed25e1bcdaf7fa72c9eae5d3 -
Trigger Event:
workflow_dispatch
-
Statement type: