
Schema Brain

An MCP server that gives AI agents deep semantic understanding of any production database.

CI · Python 3.11 | 3.12 · License: MIT

Status: Pre-alpha. Repo is private; preview launch in prep. Feature-complete for Postgres + SQLite.


The problem

AI agents fail when querying real production databases:

  1. Schemas don't fit in context — a 300-table schema is 50k+ tokens of CREATE TABLE alone.
  2. Column names are cryptic: `acct_dim_v3`, `pmt_fct_h`, `cust_id_v2_legacy`.
  3. Joins aren't obvious — which FK is the "right" one when there are three?
  4. Data has shapes: `status` could be 5 enum values, 50, or a free-text mess.

Schema Brain fixes all four and serves the result through a stable MCP tool surface that any agent (Claude Desktop, Anthropic SDK, custom) can call.

What it does

  • Indexes your database schema, profiles each column, and generates a one-paragraph LLM description per column (Claude Haiku 4.5 by default; Sonnet 4.6 for cryptic abbreviations).
  • Embeds the descriptions locally with BAAI/bge-small-en-v1.5 via fastembed — no second API vendor.
  • Stores everything in a single SQLite file. No Qdrant, no Redis, no ops.
  • Serves four MCP tools: find_relevant_tables, describe_table, describe_column, suggest_joins. Every response includes a token estimate so agents can budget context.
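The retrieval contract behind find_relevant_tables can be sketched in a few lines: rank tables by embedding similarity to the query, then greedily keep the best matches that fit the caller's token budget. Everything here (the toy 3-dim vectors, the `INDEX` shape, the function signature) is illustrative, not Schema Brain's actual internals:

```python
import math

# Hypothetical index: table name -> (embedding, token_estimate).
# The real index embeds with BAAI/bge-small-en-v1.5 (384 dims);
# these 3-dim vectors are toy stand-ins.
INDEX = {
    "orders":      ([0.9, 0.1, 0.0], 220),
    "order_items": ([0.8, 0.3, 0.1], 310),
    "users":       ([0.1, 0.9, 0.2], 180),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def find_relevant_tables(query_vec, token_budget):
    """Rank tables by similarity, then greedily keep those that fit the budget."""
    ranked = sorted(INDEX.items(),
                    key=lambda kv: cosine(query_vec, kv[1][0]),
                    reverse=True)
    picked, used = [], 0
    for name, (_vec, tokens) in ranked:
        if used + tokens <= token_budget:
            picked.append(name)
            used += tokens
    return picked, used
```

The budgeting loop is the part the per-response token estimates enable: an agent can ask for as much schema context as fits, instead of guessing.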

What it looks like in practice

Real Claude Desktop session, validated 2026-05-11 against the bundled e-commerce fixture (6 tables, 24 columns, indexed for $0.0074):

You: Using Schema Brain, write me a SQL query to compute each customer's total spend by product category.

Claude: (calls describe_table on users, products, orders, order_items, product_categories — in parallel)

Here's the query. Schema is fully verified via Schema Brain.

```sql
SELECT u.id AS customer_id, u.full_name, c.name AS category_name,
       SUM(oi.quantity * oi.unit_price_cents) / 100.0 AS total_spend
FROM   public.users u
JOIN   public.orders o              ON o.user_id     = u.id
JOIN   public.order_items oi        ON oi.order_id   = o.id
JOIN   public.products p            ON p.id          = oi.product_id
JOIN   public.product_categories pc ON pc.product_id = p.id
JOIN   public.categories c          ON c.id          = pc.category_id
GROUP BY u.id, u.full_name, c.name;
```

Things you should know:

  • Facts (from the schema): Prices live on order_items.unit_price_cents, not orders. product_categories is a many-to-many composite-PK join table.
  • Assumptions: No status filter applied — status is free TEXT with default 'pending'. categories.parent_id exists (nested categories) but I'm grouping by leaf — say if you want a recursive CTE for roll-up.
  • Caveat: A product in N categories has its spend counted N times (per category). Per-customer cross-category sums will exceed actual spend if products are multi-category in your data.

That's senior-engineer-grade output. None of the caveats — M:N double-counting, recursive-CTE awareness, free-text-status flag — is hardcoded; each falls out of letting Claude reason over the indexed descriptions. To verify Claude's SQL is mechanically correct (and that flagged caveats are the actual data behavior), see Validating SQL Claude generates.
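The double-counting caveat is easy to verify mechanically. A minimal sqlite3 reproduction, using toy tables that loosely mirror the fixture (not the fixture itself):

```python
import sqlite3

# One product in two categories, one $5.00 order line.
# Joining through the M:N table should double the spend.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_categories (product_id INT, category_id INT,
                                 PRIMARY KEY (product_id, category_id));
CREATE TABLE order_items (product_id INT, quantity INT, unit_price_cents INT);
INSERT INTO products VALUES (1);
INSERT INTO categories VALUES (10, 'books'), (20, 'gifts');
INSERT INTO product_categories VALUES (1, 10), (1, 20);  -- product in 2 categories
INSERT INTO order_items VALUES (1, 1, 500);              -- 500 cents actually spent
""")

actual = con.execute(
    "SELECT SUM(quantity * unit_price_cents) FROM order_items").fetchone()[0]
per_category_total = con.execute("""
    SELECT SUM(oi.quantity * oi.unit_price_cents)
    FROM order_items oi
    JOIN product_categories pc ON pc.product_id = oi.product_id
""").fetchone()[0]

print(actual, per_category_total)  # 500 vs 1000: the join doubles the spend
```

Exactly the behavior Claude flagged: per-category sums are correct per category, but summing them across categories exceeds actual spend.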

And when the answer doesn't exist

You: Using Schema Brain, where do I find payment processor transaction IDs?

Claude: Short answer: they're not in this database — at least not in anything Schema Brain has indexed.

Most likely: payments live in a different system (Stripe/Adyen dashboard), or there's a payments table that wasn't indexed, or payment IDs are stored externally and only orders.status reflects the result.

What to do next: SELECT table_schema, table_name FROM information_schema.tables WHERE table_name ILIKE '%payment%' to confirm.

The qualifier — "at least not in anything Schema Brain has indexed" — matters. Most LLM-over-database tools confidently invent a payments table or shoehorn the answer into orders.total_cents. Schema Brain doesn't.


Quickstart

Five minutes from git clone to a working Claude Desktop integration. Three caveats up front — they tripped real users:

| Gotcha | Fix |
| --- | --- |
| `psql` is not on macOS by default | Use `docker exec -i sb-pg psql ...` instead — it runs psql inside the Postgres container, so no host install is needed |
| `uv sync --extra dev` and `schemabrain index` are each silent for ~30–60 s on first run | Don't kill them. `uv sync` downloads ~75 wheels; the first index downloads the ONNX embedding model (~67 MB) and makes 24 LLM calls. Progress bars land in v0. |
| `ANTHROPIC_API_KEY` propagation | Run `export ANTHROPIC_API_KEY=sk-ant-...` in the same terminal you'll run `index` from |

1. Install

```shell
git clone git@github.com:Arun-kc/schemabrain.git
cd schemabrain && uv sync --extra dev
```

PyPI publish is on the launch checklist — until then, install from source.

2. Boot Postgres + apply the bundled fixture (or point at your own DB)

```shell
docker run --rm -d -p 5432:5432 -e POSTGRES_PASSWORD=local --name sb-pg postgres:16-alpine

docker exec -i sb-pg psql -U postgres -d postgres \
  < $(python -c "import schemabrain.eval, pathlib; print(pathlib.Path(schemabrain.eval.__file__).parent / 'fixtures/ecommerce.sql')")
```

For your own database, skip docker and use your real postgresql+psycopg:// URL.
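Before pointing the indexer at a real database, it can be worth sanity-checking the URL shape. A hypothetical stdlib-only helper (`check_db_url` is not part of Schema Brain):

```python
from urllib.parse import urlsplit

def check_db_url(url: str) -> dict:
    """Sanity-check a SQLAlchemy-style database URL before indexing it."""
    parts = urlsplit(url)
    # Scheme is "dialect+driver", e.g. "postgresql+psycopg".
    dialect, _, driver = parts.scheme.partition("+")
    if dialect not in {"postgresql", "sqlite"}:
        raise ValueError(f"unsupported dialect: {dialect!r} (Postgres + SQLite only)")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

info = check_db_url("postgresql+psycopg://postgres:local@localhost:5432/postgres")
```

Catching a malformed URL here is cheaper than discovering it mid-index, after the embedding model has already downloaded.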

3. Index it

```shell
export ANTHROPIC_API_KEY=sk-ant-...

schemabrain index "postgresql+psycopg://postgres:local@localhost:5432/postgres" \
  --store-path ./schemabrain.db
```

Expect ~30–60 seconds of silence on the first run, then:

```
Indexed 6 table(s): 6 changed, 0 unchanged, 0 removed.
Columns: +24/~0/-0. LLM: 24 descriptions ($0.0074). Embeddings: 24
```

4. Wire into Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

```json
{
  "mcpServers": {
    "schemabrain": {
      "command": "/ABSOLUTE/PATH/TO/.venv/bin/schemabrain",
      "args": [
        "serve",
        "--source",
        "postgresql+psycopg://postgres:local@localhost:5432/postgres",
        "--store-path",
        "/ABSOLUTE/PATH/TO/schemabrain.db"
      ]
    }
  }
}
```

Both paths must be absolute. Quit Claude Desktop fully (Cmd+Q) and relaunch. The 🔌 tools panel should now show "schemabrain" with 4 tools.
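A quick way to catch the absolute-path gotcha before relaunching Claude Desktop is to lint the config entry. A hypothetical pre-flight check (`check_config` is not shipped with Schema Brain):

```python
import json
import os

def check_config(raw: str) -> list[str]:
    """Report relative paths in a claude_desktop_config.json schemabrain entry."""
    cfg = json.loads(raw)["mcpServers"]["schemabrain"]
    problems = []
    if not os.path.isabs(cfg["command"]):
        problems.append("command is not an absolute path")
    args = cfg["args"]
    store = args[args.index("--store-path") + 1]  # value follows the flag
    if not os.path.isabs(store):
        problems.append("--store-path is not an absolute path")
    return problems

sample = """{"mcpServers": {"schemabrain": {
  "command": ".venv/bin/schemabrain",
  "args": ["serve", "--store-path", "./schemabrain.db"]}}}"""
print(check_config(sample))  # both paths relative, so two problems reported
```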

For the headless Anthropic-SDK path, see examples/anthropic_demo.py and docs/setup.md.


Roadmap

Next:

  • Query log mining via pg_stat_statements — the differentiator vs schema-only competitors
  • 5th MCP tool: get_example_queries — returns real SQL from your query log matching agent intent
  • BIRD Mini-Dev automated eval harness with reproducible CI
  • Drift CLI: schemabrain reindex --diff
  • Polished CLI (typer + progress bars)
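One plausible shape for `reindex --diff` is content-hash drift detection: fingerprint each table's DDL and compare against the stored index. A sketch under that assumption (the real implementation may differ):

```python
import hashlib

def fingerprint(ddl: str) -> str:
    """Stable content hash of a table's DDL."""
    return hashlib.sha256(ddl.encode()).hexdigest()

def diff(old: dict, new_ddl: dict) -> dict:
    """Bucket tables by how their DDL changed since the last index."""
    new = {t: fingerprint(d) for t, d in new_ddl.items()}
    return {
        "changed":   sorted(t for t in new if t in old and new[t] != old[t]),
        "unchanged": sorted(t for t in new if t in old and new[t] == old[t]),
        "added":     sorted(new.keys() - old.keys()),
        "removed":   sorted(old.keys() - new.keys()),
    }

old = {"users":  fingerprint("CREATE TABLE users (id INT)"),
       "orders": fingerprint("CREATE TABLE orders (id INT)")}
new_ddl = {"users":  "CREATE TABLE users (id INT, email TEXT)",  # column added
           "orders": "CREATE TABLE orders (id INT)"}             # untouched
result = diff(old, new_ddl)
```

Only tables in the changed or added buckets would need fresh LLM descriptions, which is what keeps a re-index cheap; the changed/unchanged/removed summary that `schemabrain index` already prints suggests similar bookkeeping.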

Later (v1):

  • Semantic layer: entities, metrics, canonical joins as first-class persisted definitions
  • LLM-suggested entity definitions from existing column descriptions + FK graph
  • Snowflake / BigQuery / MySQL connectors
  • Hosted SaaS UI
  • Multi-tenant access controls

Documentation

  • docs/architecture.md — pipeline, retrieval contract, cache logic, cost model, eval, what's validated
  • docs/mcp-tools.md — full reference for the 4 MCP tools with example responses
  • docs/setup.md — Claude Desktop wiring + Anthropic SDK demo, with troubleshooting
  • CONTRIBUTING.md — dev setup, TDD expectations, conventional commits, architecture invariants
  • examples/ — copy-paste-ready Claude Desktop config + headless agent loop using the official mcp Python SDK

Contributing

PRs welcome. The bar is high — see CONTRIBUTING.md for the test-first / 99%-coverage / conventional-commits / architecture-invariants checklist. CI enforces all of it.

Bugs and feature requests use the structured templates in .github/ISSUE_TEMPLATE/. Issues without a reproduction (bugs) or a clear underlying problem (features) get closed with a request to re-open with the right info.

License

MIT.

Download files

Source distribution: schemabrain-0.1.0a1.tar.gz (285.6 kB)

  • SHA256: b1fab3534fd28e9c91bcd05d0c333a5eacfb0e01276db8363c0ab20dc1269af0
  • MD5: c54df1295d91ce0121fa62b928fd9215
  • BLAKE2b-256: abcd20203f0dcba9e9fb7abb823499e18208b34833a6eaa4d9ce2a15e3247cb6

Built distribution: schemabrain-0.1.0a1-py3-none-any.whl (77.7 kB, Python 3)

  • SHA256: f62800b91e4fc8b7a96de056e9f81947c79e32122550b887854ba043839a7de2
  • MD5: 8d35e7a46bbd6d717a213a58012b6141
  • BLAKE2b-256: eb2b2859cc95fae364fe71f35b0d49ce02a8253f5aa4ba4387fc3d878b391e21

Both files were uploaded via Trusted Publishing (twine/6.1.0, CPython/3.13.12). Attestation bundles were published by publish.yml on Arun-kc/schemabrain; attestation values reflect the state when the release was signed and may no longer be current.
