SQL code graph analyzer and lineage tracer

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

workhorze

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Database
- Software Development :: Libraries :: Python Modules

Project description

sql-code-graph

Pre-1.0 — expect breaking changes. APIs, CLI flags, and graph schema may change between releases without a deprecation period. Pin to an exact version in production. Re-indexing is always the migration path.

Dialect support. sqlcg is built on sqlglot, so in theory it can work for any dialect sqlglot parses. In practice we've only tested it against a production Snowflake warehouse (~1,400 SQL files) — and getting that one dialect working properly was already challenging. Other dialects (bigquery, postgres, ansi, tsql, dbt) may well work, but their lineage hasn't been validated, so expect rough edges. We'd gladly collaborate to get another dialect integrated — open an issue with a minimal, anonymised corpus we can test against.

SQL lineage and dependency analysis as an MCP server for Claude Code.

Indexes a directory of .sql files into a graph database and exposes lineage queries as MCP tools — so Claude can answer questions like "what tables does this view depend on?" or "where is orders.customer_id derived from?" without reading every file.

Quick start

Choose one:

Permanent install (recommended):

uv tool install sql-code-graph    # Fast, managed, no isolation needed
sqlcg install                     # Register MCP server in Claude Code

One-shot try (cold cache warning):

uvx sql-code-graph                # First run is slow (downloads deps)
                                  # Subsequent runs use cache, ~1s startup

Restart Claude Code, then inside your project ask:

Index my SQL files at ./sql --dialect snowflake

That's it. The MCP tools are now available to Claude in every conversation for that project.

Workflow (3 steps)

Initialize: sqlcg db init
Index: sqlcg index ./sql --dialect snowflake
Keep fresh: sqlcg git install-hooks (optional)

Full setup (recommended)

# 1. Install
pip install sql-code-graph

# 2. Register with Claude Code (~/.claude/settings.json)
sqlcg install

# 3. Restart Claude Code

# 4. Index your SQL repo
# Only git-tracked files are indexed — build artefacts, node_modules,
# and .venv are ignored automatically.
sqlcg db init
sqlcg index ./sql --dialect snowflake   # snowflake is the only tested dialect

# 5. (Optional) Keep the graph fresh on branch switches
cd /your/sql/repo
sqlcg git install-hooks

Step 5 installs a post-checkout git hook that re-indexes automatically whenever you switch branches. Without it the graph may be stale after a git checkout until you re-run sqlcg index manually.

Dialect config

To avoid passing --dialect every time, create .sqlcg.toml in your repo root:

[sqlcg]
dialect = "snowflake"   # the only tested dialect

The git hook and sqlcg index --dialect auto both read this file.

Add to your project CLAUDE.md (recommended)

Adding a short note to your project's CLAUDE.md helps Claude know the tools are available and when to use them:

## SQL lineage
This project uses sql-code-graph. MCP tools are available:
- `db_info` — check graph health and parse quality before running lineage queries
- `index_repo` — index or re-index a directory of SQL files
- `find_table_usages` — find all queries that read a table
- `trace_column_lineage` — trace where a column's value comes from
- `get_upstream_dependencies` / `get_downstream_dependencies` — dependency chains
- `search_sql_pattern` — full-text search across all indexed SQL
- `execute_cypher` — raw graph query for advanced analysis

The MCP server works without this — Claude can discover the tools on its own — but the CLAUDE.md snippet ensures they get used proactively.

Parse quality

After indexing, sqlcg gain shows a parse quality breakdown that tells you how much column-level lineage was extracted:

Quality	Meaning	Tools affected
`FULL`	Column-level lineage extracted	All tools work
`TABLE_ONLY`	Table edges only — no column lineage	`trace_column_lineage`, `get_*_dependencies` return empty
`SCRIPTING_FALLBACK`	sqlglot fell back to raw command node	Partial table edges; column lineage unavailable
`FAILED`	File failed to parse entirely	File invisible to all queries

Quality is shown per-file after sqlcg index and in sqlcg gain Section F. list_dialects_and_repos() warns when scripting fallback exceeds 20% of queries.

What causes TABLE_ONLY? Mostly SELECT * — sqlglot can't trace column names through a wildcard without knowing the source table's columns. See Resolving SELECT * below for how sqlcg expands wildcards automatically from your DDL and CTAS bodies.

What causes SCRIPTING_FALLBACK? Snowflake $$ procedure bodies or BEGIN…END scripting blocks. sqlglot parses the block as a raw Command node and extracts DML via tokenizer fallback. Table edges are usually correct; column edges are not.

Check sqlcg db info for the parsing mode distribution across all indexed queries.

Resolving SELECT *

A SELECT * ETL produces TABLE_ONLY parse quality because sqlglot needs the source table's column list to expand the wildcard into individual columns. sqlcg resolves this automatically, with no extra setup — there is no CSV to export or command to run.

Wildcards are expanded from two sources harvested while indexing:

DDL files — CREATE TABLE / CREATE VIEW statements give sqlcg the column list for any table they define.
Cross-file CTAS bodies — a CREATE TABLE … AS SELECT (or CTE) in one file is used to resolve SELECT * against that table in another file.

So the only thing you need to do is index the DDL alongside your ETLs — point sqlcg index at a path that contains both (e.g. the repo root, or both ddl/ and etl/). The more of your CREATE statements sqlcg can see, the more wildcards it resolves.

sqlcg index . --dialect snowflake   # index DDL + ETLs together

After indexing, sqlcg db info shows non-zero STAR_EXPANSION lineage edges, and trace_column_lineage returns results for queries that previously returned empty.

Note: earlier versions accepted an exported INFORMATION_SCHEMA CSV (sqlcg load-schema). That path was removed — profiling showed it added zero lineage edges over DDL + cross-file CTAS resolution on a real warehouse. DDL is now the single source of column truth; no CSV is needed or accepted.

MCP tools reference

Tool	Description
`index_repo(repo_path, dialect)`	Index a directory of SQL files
Lineage & dependencies
`trace_column_lineage(table_col)`	Trace a column's value upstream to its sources
`get_upstream_dependencies(table_col)`	Full upstream dependency chain
`get_downstream_dependencies(table_col)`	Full downstream dependency chain
`find_table_usages(table_name)`	Find all queries that read a table
`find_definition(table_qualified)`	Find where a table/view is defined
Change impact
`get_change_scope(table_qualified)`	Blast radius of changing a table (impact + risk)
`diff_impact(changed_files)`	What a set of changed files affects downstream
`get_backfill_order(table_qualified)`	Topological rebuild/backfill order
`scope_change(target)`	Synthesised change-scope summary for a target
Search & meta
`search_sql_pattern(query)`	Full-text search across indexed SQL
`list_dialects_and_repos()`	List indexed repos and dialects (catalogue)
`db_info()`	Graph health, node counts, parse quality breakdown, warnings, freshness (indexed SHA vs HEAD)
`execute_cypher(query)`	Raw Cypher query against the graph
`submit_feedback(...)`	Report a false positive/negative to improve metrics

Input format: lineage/dependency tools expect a schema-qualified column reference — schema.table.column (e.g. ba.orders.customer_id), not a bare table.column. Each returned node carries both name (the bare column) and table (the owning schema.table), so results are navigable without a second lookup.

Provenance fields: lineage edges now carry file, line, and expression (where the lineage was derived from), a confidence of 1.0 for plainly-parsed facts (lower for inferred edges, with a reason), and a table_kind (table / cte / derived / external) so CTE and derived aliases are distinguishable from real tables.

LLM agent tip: call db_info() before lineage queries to check that SqlColumn > 0 and warnings is empty. If parse_quality["scripting_block"] is high, column lineage will be limited for those files — use table-level tools (find_table_usages, get_*_dependencies) instead.

CLI reference

Full option reference: docs/cli.md

sqlcg install                          # register MCP server in Claude Code
sqlcg db init                          # initialise graph database
sqlcg index <path> --dialect snowflake # index SQL files (snowflake is the tested dialect)
sqlcg index <path> --dialect auto      # read dialect from .sqlcg.toml
sqlcg index <path> --profile           # index + print per-stage timing and slowest files
sqlcg index <path> --include-working-tree  # also index uncommitted changes (marks graph dirty)
sqlcg reindex <path> --from <sha> --to <sha>  # incremental resync of only changed files
sqlcg analyze unused                   # tables with no query references
sqlcg analyze upstream/downstream      # trace lineage from the CLI
sqlcg find table/column/pattern        # search the graph
sqlcg watch <path>                     # watch for file changes
sqlcg db info                          # graph stats + freshness (indexed SHA vs HEAD)
sqlcg git install-hooks                # install post-checkout + post-merge resync hooks
sqlcg gain                             # show usage metrics
sqlcg report                           # generate FP/error report
sqlcg mcp best-practices               # print the fact/heuristic boundary for the MCP tools
sqlcg mcp start                        # start MCP server manually
sqlcg mcp status                       # server status JSON (via control socket)
sqlcg mcp stop                         # stop the running MCP server gracefully
sqlcg mcp restart                      # stop the server (client must respawn it)
sqlcg version                          # show installed version

Reads while the server is running (v1.2.0)

KuzuDB allows a single writer, so while the MCP server is live it holds the database lock. CLI read commands (find, analyze, db info, list-repos, gain) automatically route their query through the running server over its control socket and return rows as usual — no flag, no config. When no server is running they open the database directly, exactly as before. If the server is mid-reindex the read waits for it to finish rather than failing with "Database is locked".

Supported dialects

sqlcg is built on sqlglot, so other dialects can be parsed in theory. Only Snowflake has been tested against a real corpus, though — the table reflects what's actually been exercised, not what sqlglot can do.

Dialect	Status
`snowflake`	✅ Tested against a production DWH (~1,400 files)
`bigquery`	⚠️ Unproven — parses via sqlglot, lineage not validated
`postgres`	⚠️ Unproven — parses via sqlglot, lineage not validated
`ansi`	⚠️ Unproven — parses via sqlglot, lineage not validated
`tsql`	⚠️ Unproven — parses via sqlglot, lineage not validated
`dbt`	⚠️ Unproven — via optional extra, lineage not validated

Want another dialect properly supported? We'd be glad to collaborate — open an issue with a minimal, anonymised corpus we can develop and test against. Getting Snowflake right was real work, so a representative corpus is what makes the difference.

Development

git clone https://github.com/Warhorze/sql-code-graph
cd sql-code-graph
uv sync --all-extras
uv run pytest tests/unit

Issues

Bug reports and feature requests: github.com/Warhorze/sql-code-graph/issues

Questions and discussion: github.com/Warhorze/sql-code-graph/discussions

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

workhorze

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Database
- Software Development :: Libraries :: Python Modules

Release history Release notifications | RSS feed

1.5.1

Jun 8, 2026

1.5.0

Jun 8, 2026

1.4.3

Jun 7, 2026

1.4.1

Jun 6, 2026

1.4.0

Jun 6, 2026

This version

1.3.0

Jun 4, 2026

1.2.2

Jun 3, 2026

1.1.3

Jun 1, 2026

1.1.0

Jun 1, 2026

1.0.2

May 30, 2026

1.0.1

May 30, 2026

1.0.0

May 30, 2026

0.3.0

May 6, 2026

0.2.1

May 5, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sql_code_graph-1.3.0.tar.gz (1.6 MB view details)

Uploaded Jun 4, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sql_code_graph-1.3.0-py3-none-any.whl (183.0 kB view details)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file sql_code_graph-1.3.0.tar.gz.

File metadata

Download URL: sql_code_graph-1.3.0.tar.gz
Upload date: Jun 4, 2026
Size: 1.6 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sql_code_graph-1.3.0.tar.gz
Algorithm	Hash digest
SHA256	`50955bd7042a1d72c20371e727a63c1284b37a2ce912e8f4f6f897ef6c3e3485`
MD5	`84c0b5787e21626b135d7666b5b884fb`
BLAKE2b-256	`904fb7c0920625e7f34334e160ec3e496b1cd94917a3e00bb6bae5b4934cc9d0`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sql_code_graph-1.3.0.tar.gz:

Publisher: release.yml on Warhorze/sql-code-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sql_code_graph-1.3.0.tar.gz
- Subject digest: 50955bd7042a1d72c20371e727a63c1284b37a2ce912e8f4f6f897ef6c3e3485
- Sigstore transparency entry: 1723294433
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: Warhorze/sql-code-graph@c7f1dfef31938e30328c3334124ec199bbed6634
- Branch / Tag: refs/tags/v1.3.0
- Owner: https://github.com/Warhorze
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c7f1dfef31938e30328c3334124ec199bbed6634
- Trigger Event: push

File details

Details for the file sql_code_graph-1.3.0-py3-none-any.whl.

File metadata

Download URL: sql_code_graph-1.3.0-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 183.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sql_code_graph-1.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8bf0132f0feefa3803309a6019457411d91687fe5ff1c4a1a3103b79373392bf`
MD5	`d9033fbd37666cec65f34b2aedb381f1`
BLAKE2b-256	`25ecbe8527bc04877eb08e0ed72f323208a05dceba33b285e1b0740f88332a2b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sql_code_graph-1.3.0-py3-none-any.whl:

Publisher: release.yml on Warhorze/sql-code-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sql_code_graph-1.3.0-py3-none-any.whl
- Subject digest: 8bf0132f0feefa3803309a6019457411d91687fe5ff1c4a1a3103b79373392bf
- Sigstore transparency entry: 1723294537
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: Warhorze/sql-code-graph@c7f1dfef31938e30328c3334124ec199bbed6634
- Branch / Tag: refs/tags/v1.3.0
- Owner: https://github.com/Warhorze
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c7f1dfef31938e30328c3334124ec199bbed6634
- Trigger Event: push

sql-code-graph 1.3.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

sql-code-graph

Quick start

Workflow (3 steps)

Full setup (recommended)

Dialect config

Add to your project CLAUDE.md (recommended)

Parse quality

Resolving SELECT *

MCP tools reference

CLI reference

Reads while the server is running (v1.2.0)

Supported dialects

Development

Issues

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance