SQL codebase indexer with column-level lineage, impact analysis, and MCP server support
Project description
SQLPrism
An MCP server that indexes SQL codebases into a queryable knowledge graph backed by DuckDB. Instead of grepping through files, ask structural questions: what touches this table, where is this column transformed, what's the blast radius of this PR.
Built for SQL-heavy data projects — works with raw SQL, SQLMesh, and dbt.
Why Not Just Grep?
Grep finds strings. This tool understands SQL structure.
| Capability | Grep | SQLPrism |
|---|---|---|
| Find table references | Yes | Yes |
| CTE-to-CTE data flow | No — manual file reading | Yes — edges tracked in graph |
| Column lineage with transforms (CAST, COALESCE, SUM) | No | Yes — parsed from AST |
| Usage type (WHERE vs SELECT vs JOIN vs GROUP BY) | Fragile regex | Precise — parsed from AST |
| Multi-hop impact analysis | Manual tracing | Automatic graph traversal |
| PR blast radius | DIY with git diff | One call |
| Cross-CTE column tracing | Basically impossible | Built-in |
On a 200-model SQLMesh project, a column impact query returns 75 structured results in ~5,000 tokens. The grep equivalent would need 40-60 files opened, ~100,000+ tokens, and still wouldn't tell you whether a column appears in a WHERE filter or a SELECT.
Setup
1. Install
git clone https://github.com/darkcofy/sqlprism.git && cd sqlprism
uv sync
2. Configure
uv run sqlprism init # creates sqlprism.yml in the current directory
# edit sqlprism.yml to add your repos (see Configuration below)
uv run sqlprism reindex # index plain SQL repos
For dbt and SQLMesh projects, use reindex-dbt and reindex-sqlmesh respectively. See the CLI guide for full options.
3. Connect your MCP client
Claude Code:
claude mcp add sqlprism -- uv run --directory /path/to/sqlprism sqlprism serve
Claude Desktop / Cursor / Continue.dev (.mcp.json):
{
"mcpServers": {
"sqlprism": {
"command": "uv",
"args": ["run", "--directory", "/path/to/sqlprism", "sqlprism", "serve"]
}
}
}
Replace /path/to/sqlprism with the absolute path to your clone.
4. Reindex on Save
The graph stays fresh automatically when you set up on-save hooks. There are two modes depending on your editor.
Claude Code
Add a PostToolUse hook so the index updates whenever Claude writes or edits a file. Save this as .claude/settings.json in your project root:
{
"hooks": {
"PostToolUse": [
{
"matcher": "Write|Edit",
"hooks": [
{
"type": "command",
"command": "FILE=$(cat | jq -r '.tool_input.file_path // empty'); [ -n \"$FILE\" ] && [[ \"$FILE\" =~ \\.sql$ ]] && sqlprism reindex-file \"$FILE\" || true"
}
]
}
]
}
}
This extracts the file path from the hook's stdin JSON, checks it's a .sql file, and calls the CLI to reindex it. Run /hooks in Claude Code to verify the hook is active.
Other MCP clients (Cursor, Continue.dev)
The reindex_files MCP tool accepts absolute file paths and reindexes only the affected models. Plain SQL reindexes in ~50ms; dbt/SQLMesh models compile + reindex in ~2-5s. Calls are debounced per repo (500ms for SQL, 2s for rendered models) so rapid saves batch into a single operation.
Configure your client to call reindex_files with the saved file's path on save.
Editors without MCP (Vim, Neovim, Emacs, VS Code tasks)
The reindex-file CLI command works standalone — no running server needed:
sqlprism reindex-file /path/to/model.sql
Vim / Neovim:
autocmd BufWritePost *.sql silent !sqlprism reindex-file %:p
Emacs:
(add-hook 'after-save-hook
(lambda ()
(when (string-match-p "\\.sql\\'" buffer-file-name)
(start-process "sqlprism" nil "sqlprism" "reindex-file" buffer-file-name))))
VS Code (using the Run on Save extension, .vscode/settings.json):
{
"emeraldwalk.runonsave": {
"commands": [
{
"match": "\\.sql$",
"cmd": "sqlprism reindex-file ${file}"
}
]
}
}
Configuration
sqlprism init creates a default config at sqlprism.yml in the working directory. YAML is the default format; JSON is also supported (--format json). Existing sqlprism.json files are auto-discovered for backwards compatibility. Override the config path with --config PATH on any command.
db_path: ~/.sqlprism/graph.duckdb
sql_dialect: null
repos:
my-queries: /path/to/sql/repo
multi-dialect-repo:
path: /path/to/repo
dialect: starrocks
dialect_overrides:
athena/: athena
postgres/: postgres
sqlmesh_repos:
my-sqlmesh-project:
project_path: /path/to/sqlmesh/folder
env_file: /path/to/.env
dialect: athena
variables:
GRACE_PERIOD: 7
dbt_repos:
my-dbt-project:
project_path: /path/to/dbt/project
env_file: /path/to/.env
target: dev
dialect: starrocks
dbt_command: uv run dbt
| Field | Description |
|---|---|
db_path |
Path to the DuckDB database file. Defaults to ~/.sqlprism/graph.duckdb. |
sql_dialect |
Global default SQL dialect. null for auto-detect. |
repos |
Plain SQL repos. Value is a path string or an object with path, dialect, dialect_overrides. |
dialect |
Per-repo dialect override (e.g. "starrocks", "athena", "bigquery"). |
dialect_overrides |
Per-directory overrides using prefix matching or glob patterns. |
sqlmesh_repos |
SQLMesh projects. Renders models before parsing. |
dbt_repos |
dbt projects. Compiles models before parsing. |
SQL Dialect Support
Powered by sqlglot, the indexer supports 33 SQL dialects out of the box:
Athena, BigQuery, ClickHouse, Databricks, Doris, Dremio, Drill, Druid, DuckDB, Dune, Exasol, Fabric, Hive, Materialize, MySQL, Oracle, Postgres, Presto, PRQL, Redshift, RisingWave, SingleStore, Snowflake, Spark, Spark2, SQLite, StarRocks, Tableau, Teradata, Trino, TSQL.
Pass the dialect name as a lowercase string (e.g. "starrocks", "bigquery", "athena"). Dialect-specific quoting and identifier case normalization are handled automatically.
CLI Commands
Full reference: CLI guide
| Command | Description |
|---|---|
sqlprism init |
Create default config file. |
sqlprism reindex |
Incremental reindex of plain SQL repos. |
sqlprism reindex-file |
Fast on-save reindex of specific files. |
sqlprism reindex-dbt |
Compile and index a dbt project. |
sqlprism reindex-sqlmesh |
Render and index a SQLMesh project. |
sqlprism serve |
Start the MCP server (stdio or HTTP). |
sqlprism conventions init |
Generate sqlprism.conventions.yml from inferred conventions. |
sqlprism conventions refresh |
Re-run convention inference after reindex. |
sqlprism conventions diff |
Show what changed since last --init. |
sqlprism status |
Show index status. |
sqlprism query search |
Find entities by name pattern. |
sqlprism query references |
Find inbound/outbound dependencies. |
sqlprism query column-usage |
Find column usage across models. |
sqlprism query trace |
Multi-hop dependency tracing. |
sqlprism query lineage |
End-to-end column lineage chains. |
MCP Tools
Full reference: MCP tools guide
When running as an MCP server (sqlprism serve), 24 tools are exposed:
| Tool | Description |
|---|---|
search |
Find entities by name pattern with pagination. |
find_references |
Inbound/outbound dependencies with snippets. |
find_column_usage |
Column usage — type, transforms, aliases. |
trace_dependencies |
Multi-hop upstream/downstream chains. |
trace_column_lineage |
End-to-end column lineage through CTEs. |
get_schema |
Table/view schema with columns, types, and dependencies. |
get_context |
One-call comprehensive context dump for a model. |
find_path |
Shortest path between two models (DuckPGQ). |
find_critical_models |
Rank models by PageRank importance (DuckPGQ). |
detect_cycles |
Find circular dependencies in the graph. |
find_subgraphs |
Identify disconnected clusters and orphaned models (DuckPGQ). |
find_bottlenecks |
High fan-out models with risk classification. |
check_impact |
Column-level impact analysis before making changes. |
pr_impact |
Structural diff + blast radius since a base commit. |
reindex |
Background incremental reindex of SQL repos. |
reindex_files |
Fast on-save reindex with per-repo debounce. |
reindex_dbt |
Background dbt compile + index. |
reindex_sqlmesh |
Background SQLMesh render + index. |
get_conventions |
Inferred project conventions — naming, references, columns. |
find_similar_models |
Find existing models similar to what you're building. |
suggest_placement |
Recommend where to place a new model based on references. |
search_by_tag |
Find models by semantic tag (business domain concept). |
list_tags |
List all semantic tags with model counts and confidence. |
index_status |
Index stats, cross-repo edges, and name collisions. |
Architecture
src/sqlprism/
types.py <- ParseResult, NodeResult, EdgeResult, ColumnUsageResult, parse_repo_config
languages/
__init__.py <- SQL_EXTENSIONS, is_sql_file()
sql.py <- sqlglot: tables, views, CTEs, column lineage, transforms
sqlmesh.py <- SQLMesh renderer (full project + selective render_models)
dbt.py <- dbt renderer (full project + selective render_models via --select)
utils.py <- Shared helpers (find_venv_dir, parse_dotenv, build_env, enrich_nodes)
core/
graph.py <- DuckDB storage layer (MVCC), queries, snippets, repo_type tracking
indexer.py <- Orchestrator: scan -> checksum -> parse -> store; file-level reindex with repo-type dispatch
mcp_tools.py <- FastMCP tool definitions (24 tools, non-blocking reindex, per-repo debounce)
conventions.py <- Convention inference engine: layers, naming, references, tags, overrides
cli.py <- Click CLI: serve, reindex, reindex-file, reindex-sqlmesh, reindex-dbt, conventions, status, init
The SQL parser extracts:
- Nodes: tables, views, CTEs, queries (with schema metadata and dialect-aware case normalization)
- Edges: table references, CTE references, JOINs (with context like "FROM clause", "JOIN clause")
- Column usage: per-column tracking with usage type (select, where, join_on, group_by, order_by, having, partition_by, window_order), transforms (CAST, COALESCE, SUM, etc.), output aliases, and WHERE filter expressions
- Column lineage: end-to-end tracing through CTEs and subqueries back to source tables, with SELECT * expansion when schema catalog is available
Full architecture docs: Architecture overview | DuckDB schema
DuckPGQ Graph Analytics
SQLPrism optionally integrates with DuckPGQ for advanced graph analytics. When installed, these tools become available: find_path, find_critical_models, find_subgraphs, find_bottlenecks (clustering enrichment). DuckPGQ is installed automatically on first use — no manual setup needed.
Development
uv sync
uv run pytest # run tests (510+ tests)
uv run pytest --cov=sqlprism # run with coverage report
uv run pytest --cov=sqlprism --cov-report=html:coverage_html # HTML report
Code Coverage
License
Apache License 2.0 — see LICENSE.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqlprism-1.2.0.tar.gz.
File metadata
- Download URL: sqlprism-1.2.0.tar.gz
- Upload date:
- Size: 296.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bc0ec0f412f21b4e5f940bbd29b1a08739db6c6ad8343f2f178b9db7360a74c0
|
|
| MD5 |
b03a3a9462633e0665360cd8cc92a2bf
|
|
| BLAKE2b-256 |
d27f974975363c8980f0639b8f17e9e34beb65fd1530a31fcc3239f05a4a4652
|
File details
Details for the file sqlprism-1.2.0-py3-none-any.whl.
File metadata
- Download URL: sqlprism-1.2.0-py3-none-any.whl
- Upload date:
- Size: 107.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0dbaa7f4f05477a0801676a537558063e3024e9ff99970c9906d1b539503d6f9
|
|
| MD5 |
f31152aefabb0a60cc6be52ee59081be
|
|
| BLAKE2b-256 |
4ee0db2feb7202a44f92cf695f79f89461d4bc40fcd8169aabbbbbb2b31943bc
|