SQL code graph analyzer and lineage tracer
Project description
sql-code-graph
SQL lineage and dependency analysis as an MCP server for Claude Code.
Indexes a directory of .sql files into a graph database and exposes lineage
queries as MCP tools — so Claude can answer questions like "what tables does
this view depend on?" or "where is orders.customer_id derived from?"
without reading every file.
Quick start
Choose one:
Permanent install (recommended):
uv tool install sql-code-graph # Fast, managed, no isolation needed
sqlcg install # Register MCP server in Claude Code
One-shot try (cold cache warning):
uvx sql-code-graph # First run is slow (downloads deps)
# Subsequent runs use cache, ~1s startup
Restart Claude Code, then inside your project ask:
Index my SQL files at ./sql --dialect snowflake
That's it. The MCP tools are now available to Claude in every conversation for that project.
Workflow (3 steps)
- Initialize:
sqlcg db init - Index:
sqlcg index ./sql --dialect snowflake - Keep fresh:
sqlcg git install-hooks(optional)
Full setup (recommended)
# 1. Install
pip install sql-code-graph
# 2. Register with Claude Code (~/.claude/settings.json)
sqlcg install
# 3. Restart Claude Code
# 4. Index your SQL repo
# Only git-tracked files are indexed — build artefacts, node_modules,
# and .venv are ignored automatically.
sqlcg db init
sqlcg index ./sql --dialect snowflake # or: bigquery, postgres, ansi
# 5. (Optional) Keep the graph fresh on branch switches
cd /your/sql/repo
sqlcg git install-hooks
Step 5 installs a post-checkout git hook that re-indexes automatically
whenever you switch branches. Without it the graph may be stale after a
git checkout until you re-run sqlcg index manually.
Dialect config
To avoid passing --dialect every time, create .sqlcg.toml in your repo root:
[sqlcg]
dialect = "snowflake" # snowflake | bigquery | postgres | ansi
The git hook and sqlcg index --dialect auto both read this file.
Add to your project CLAUDE.md (recommended)
Adding a short note to your project's CLAUDE.md helps Claude know the tools
are available and when to use them:
## SQL lineage
This project uses sql-code-graph. MCP tools are available:
- `db_info` — check graph health and parse quality before running lineage queries
- `index_repo` — index or re-index a directory of SQL files
- `find_table_usages` — find all queries that read a table
- `trace_column_lineage` — trace where a column's value comes from
- `get_upstream_dependencies` / `get_downstream_dependencies` — dependency chains
- `search_sql_pattern` — full-text search across all indexed SQL
- `execute_cypher` — raw graph query for advanced analysis
The MCP server works without this — Claude can discover the tools on its own — but the CLAUDE.md snippet ensures they get used proactively.
Parse quality
After indexing, sqlcg gain shows a parse quality breakdown that tells you how
much column-level lineage was extracted:
| Quality | Meaning | Tools affected |
|---|---|---|
FULL |
Column-level lineage extracted | All tools work |
TABLE_ONLY |
Table edges only — no column lineage | trace_column_lineage, get_*_dependencies return empty |
SCRIPTING_FALLBACK |
sqlglot fell back to raw command node | Partial table edges; column lineage unavailable |
FAILED |
File failed to parse entirely | File invisible to all queries |
Quality is shown per-file after sqlcg index and in sqlcg gain Section F.
list_dialects_and_repos() warns when scripting fallback exceeds 20% of queries.
What causes TABLE_ONLY? Mostly SELECT * — sqlglot can't trace column names through
a wildcard. Alias those selects to get FULL coverage.
What causes SCRIPTING_FALLBACK? Snowflake $$ procedure bodies or BEGIN…END scripting
blocks. sqlglot parses the block as a raw Command node and extracts DML via tokenizer
fallback. Table edges are usually correct; column edges are not.
Check sqlcg db info for the parsing mode distribution across all indexed queries.
MCP tools reference
| Tool | Description |
|---|---|
index_repo(repo_path, dialect) |
Index a directory of SQL files |
trace_column_lineage(table_col) |
Trace column lineage upstream |
find_table_usages(table_name) |
Find all queries that read a table |
get_upstream_dependencies(table_col) |
Full upstream dependency chain |
get_downstream_dependencies(table_col) |
Full downstream dependency chain |
search_sql_pattern(query) |
Full-text search across indexed SQL |
list_dialects_and_repos() |
List indexed repos and dialects (catalogue) |
db_info() |
Graph health, node counts, parse quality breakdown, warnings |
execute_cypher(query) |
Raw Cypher query against the graph |
LLM agent tip: call
db_info()before lineage queries to check thatSqlColumn > 0andwarningsis empty. Ifparse_quality["scripting_block"]is high, column lineage will be limited for those files — use table-level tools (find_table_usages,get_*_dependencies) instead.
CLI reference
Full option reference: docs/cli.md
sqlcg install # register MCP server in Claude Code
sqlcg db init # initialise graph database
sqlcg index <path> --dialect <d> # index SQL files
sqlcg index <path> --dialect auto # read dialect from .sqlcg.toml
sqlcg watch <path> # watch for file changes
sqlcg git install-hooks # install post-checkout hook
sqlcg gain # show usage metrics
sqlcg report # generate FP/error report
sqlcg mcp start # start MCP server manually
sqlcg version # show installed version
Supported dialects
snowflake · bigquery · postgres · ansi · tsql · dbt (via optional extra)
Development
git clone https://github.com/Warhorze/sql-code-graph
cd sql-code-graph
uv sync --all-extras
uv run pytest tests/unit
Issues
Bug reports and feature requests: github.com/Warhorze/sql-code-graph/issues
Questions and discussion: github.com/Warhorze/sql-code-graph/discussions
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sql_code_graph-0.3.0.tar.gz.
File metadata
- Download URL: sql_code_graph-0.3.0.tar.gz
- Upload date:
- Size: 424.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bdc705508a44889bc402e57fb02a139d5013eac834be9b57d08321c9919f64ee
|
|
| MD5 |
14ba4dbd209c875c503b7123c96cb14f
|
|
| BLAKE2b-256 |
f76332f1fd84604174f6a702aca02d509238ad450120f2c25db8157a178d70ae
|
Provenance
The following attestation bundles were made for sql_code_graph-0.3.0.tar.gz:
Publisher:
release.yml on Warhorze/sql-code-graph
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sql_code_graph-0.3.0.tar.gz -
Subject digest:
bdc705508a44889bc402e57fb02a139d5013eac834be9b57d08321c9919f64ee - Sigstore transparency entry: 1449265512
- Sigstore integration time:
-
Permalink:
Warhorze/sql-code-graph@61c72860dd452aca73a127dc664895b8584f49c8 -
Branch / Tag:
refs/tags/v0.3.0 - Owner: https://github.com/Warhorze
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@61c72860dd452aca73a127dc664895b8584f49c8 -
Trigger Event:
push
-
Statement type:
File details
Details for the file sql_code_graph-0.3.0-py3-none-any.whl.
File metadata
- Download URL: sql_code_graph-0.3.0-py3-none-any.whl
- Upload date:
- Size: 71.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8c30f208c869a133293496b01927708f170cac42775ea4f033d1d9ff1968df03
|
|
| MD5 |
41456509f908d3715b8809f7a5d69fa9
|
|
| BLAKE2b-256 |
9f8742ee645c177acfbbfbb711dbf49bf34094de6ac0ade3094ed8e4d7a7dc70
|
Provenance
The following attestation bundles were made for sql_code_graph-0.3.0-py3-none-any.whl:
Publisher:
release.yml on Warhorze/sql-code-graph
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sql_code_graph-0.3.0-py3-none-any.whl -
Subject digest:
8c30f208c869a133293496b01927708f170cac42775ea4f033d1d9ff1968df03 - Sigstore transparency entry: 1449265537
- Sigstore integration time:
-
Permalink:
Warhorze/sql-code-graph@61c72860dd452aca73a127dc664895b8584f49c8 -
Branch / Tag:
refs/tags/v0.3.0 - Owner: https://github.com/Warhorze
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@61c72860dd452aca73a127dc664895b8584f49c8 -
Trigger Event:
push
-
Statement type: