SQL inspection CLI: static checks, lineage, PR review, cost diff
Project description
querylens
Catch risky analytics SQL before it runs and explain what complex queries actually do.
querylens scans SQL repositories, ranks the worst files first, reviews pull requests for regressions, and highlights reliability and cost risks such as full table scans, wide joins, missing filters, and expensive query patterns before they merge and hit production.
Quick Demo
Input SQL:
SELECT
c.customer_id,
SUM(o.amount) AS revenue
FROM customers c
JOIN orders o
ON c.id = o.customer_id
GROUP BY c.customer_id
Command:
cargo run -- lineage examples/revenue.sql
Output:
examples/revenue.sql
Projections:
revenue
└─ SUM(orders.amount)
o.customer_id
└─ orders.customer_id
Install
Prebuilt binary:
curl -L -o querylens.tar.gz https://github.com/kraftaa/querylens/releases/download/v0.1.11/querylens-macos-aarch64.tar.gz
tar -xzf querylens.tar.gz
./querylens --help
Homebrew:
brew install kraftaa/tap/querylens
Build from source:
cargo build
# enable AWS Bedrock provider if needed
# cargo build --features bedrock
Scan estimates (optional)
- Athena:
querylens explain --athena-query-execution-id <id> --athena-region us-east-1 --file your.sql(usesaws athena get-query-executionand your AWS creds). - Manual override:
--scan-tb 1.5or--scan-bytes 1500000000000. - Table stats file (offline, no cloud calls):
--stats-file stats.json
Example stats.json:
{
"tables": {
"athena.sales": {
"total_bytes": 1200000000000,
"row_count": 3500000000,
"partition_columns": ["ds"],
"partitions_per_year": 365
},
"athena.customers": {
"total_bytes": 8000000000
}
}
}
Heuristics:
- No WHERE → assume full scan.
- Filters on non-partition columns → ~0.7 of table size.
- Equality/range on partition column → small slice (2–5% by default, or days/partitions if provided).
- Sum per-table bytes to show “Estimated scan” and PR cost deltas.
Features
| Feature | Description |
|---|---|
| Static SQL checks | Detect risky patterns (SELECT *, possible Cartesian joins, wildcard LIKE, etc.) |
| Column lineage | Trace projection, filter, and join lineage |
| Query explanation | Summarize tables, joins, aggregations, and likely query meaning |
| Table extraction | List tables used by a query |
| Folder scanning | Analyze a directory of SQL files |
| Rule controls | Disable rules or override severity by rule_id |
| Athena mode | Extra heuristics for partition/cost patterns |
| CI thresholds | Fail on `low |
Why This Exists
SQL pipelines grow quickly and are hard to review.
When a metric looks wrong, teams need to:
- trace where output columns come from
- catch risky query patterns before they hit production
- understand query intent quickly
querylens helps with deterministic checks and optional LLM explanations.
Detect Risky SQL Patterns
Example query:
SELECT *
FROM orders o
JOIN customers c
Command:
cargo run -- --file examples/bad_join.sql --static-only
Expected findings include:
SELECT *- possible Cartesian join (
JOINwithoutON/USING)
Example subquery pattern:
cargo run -- --file examples/subquery.sql --static-only
Expected suggestion includes:
- consider replacing
IN (SELECT ...)withJOINorEXISTS
How Is This Different From dbt?
dbt builds and runs transformation pipelines.
querylens analyzes SQL itself:
- detect risky query patterns
- trace lineage in query text
- explain query logic
They complement each other: dbt for orchestration/modeling, querylens for query inspection.
Distribution
GitHub Releases (prebuilt binaries)
Tagging v* triggers .github/workflows/release.yml and publishes:
querylens-macos-aarch64.tar.gzquerylens-linux-x86_64.tar.gzSHA256SUMS
Create a release tag:
git tag v0.1.2
git push origin v0.1.2
Homebrew tap
Use the formula template at:
packaging/homebrew/querylens.rb
For each release:
- Set
version(withoutv). - Fill
__SHA256_MACOS_AARCH64__and__SHA256_LINUX_X86_64__fromSHA256SUMS. - Commit the formula in your tap repo (for example
kraftaa/homebrew-tap) asFormula/querylens.rb. - Users install with:
brew install kraftaa/tap/querylens
Usage
Note: all analysis commands are static-only; they never execute your SQL against a database.
Subcommands
cargo run -- lineage <file.sql>
cargo run -- lineage <file.sql> --column revenue
cargo run -- risk <file.sql>
cargo run -- risk <file.sql> --summary-only
cargo run -- guard <file.sql> --max-risk high --deny-rule CROSS_JOIN
cargo run -- simulate <file.sql> --limit 100
cargo run -- tables <file.sql>
cargo run -- explain <file.sql>
cargo run -- analyze <dir> --glob "*.sql"
cargo run -- analyze <dir> --glob "*.sql" --changed-only
cargo run -- analyze <dir> --glob "*.sql" --changed-only --changed-base main
cargo run -- analyze <dir> --glob "*.sql" --top 10
cargo run -- analyze <dir> --glob "*.sql" --top 10 --verbose
cargo run -- pr-review --base main --head HEAD --dir models --glob "*.sql"
cargo run -- pr-review --base main --head HEAD --dir models --glob "*.sql" --ci
cargo run -- pr-review --base main --head HEAD --dir models --glob "*.sql" --markdown
cargo run -- pr-review --base main --head HEAD --dir models --glob "*.sql" --cost-diff --stats-file stats.json
PR review mode
cargo run -- pr-review --base main --head HEAD --dir models --glob "*.sql"
Example output:
SQL Inspect PR Review
Base: main
Head: HEAD
PR status: PASS
No new SQL risk regressions detected.
1 changed SQL file
0 new HIGH-risk queries
0 query lost partition filter
0 ORDER BY without LIMIT regressions
0 possible join amplification regressions
0 files increased estimated scan cost
File: models/example.sql
Previous risk: HIGH
Current risk: HIGH
Risk trend: unchanged
Still risky because:
- SELECT_STAR
Estimated scan: unknown -> unknown
Estimated scan delta: unknown
Compact CI mode:
cargo run -- pr-review --base main --head HEAD --dir models --glob "*.sql" --ci
PR status: PASS
Changed SQL files: 1
New HIGH-risk queries: 0
Partition filter regressions: 0
ORDER BY without LIMIT regressions: 0
Join amplification regressions: 0
Files with higher estimated scan: 0
Markdown mode for PR comments:
cargo run -- pr-review --base main --head HEAD --dir models --glob "*.sql" --markdown
Cost regression mode:
cargo run -- pr-review --base main --head HEAD --dir models --glob "*.sql" --cost-diff --stats-file stats.json
Example output:
SQL Cost Regression
1 changed SQL file
File: models/revenue.sql
Estimated scan change:
Before: 22 GB
After: 1.40 TB
Increase: 63.6x
Cost regression: HIGH
Reason:
Filter removed: orders.order_date >= DATE '2026-01-01'
Recommendation:
Restore a selective WHERE or partition predicate.
Repo scan summary
analyze now highlights severity shape and hotspots, not only rule counts.
Useful flags:
--top <N>number of hotspot files to display (default5)--changed-onlylimit to changed files in working tree/staging--changed-base <ref>(with--changed-only) limit to committed diff vs ref (for examplemain)
Example:
SQL Inspect Report
Scope: current selection
Analyzed 225 SQL files
Top risks:
1. 37 HIGH-risk files
2. 117 files likely scan full tables
3. 18 files have complex multi-join patterns
4. 12 files contain CROSS JOIN or likely Cartesian behavior
Severity shape:
HIGH: 37 files
MEDIUM: 88 files
LOW: 100 files
Most severe files:
- models/a.sql HIGH SELECT_STAR, FULL_TABLE_SCAN_LIKELY
- models/b.sql HIGH WIDE_JOIN_GRAPH, MISSING_WHERE
Verbose mode:
cargo run -- analyze . --glob "*.sql" --top 10 --verbose
Query Risk Scanner
cargo run -- risk examples/bad_join.sql
cargo run -- risk examples/bad_join.sql --summary-only
cargo run -- risk examples/bad_join.sql --scan-tb 2.3
cargo run -- risk examples/bad_join.sql --scan-bytes 2300000000000
Example output:
examples/bad_join.sql
Risk score: HIGH
Reasons:
- select star: SELECT *
- missing where: No WHERE clause
Estimated scan: 2.30 TB
Compact risk summary:
SQL Inspect Risk
File: examples/bad_join.sql
Risk: HIGH
Estimated scan: unknown
Top reasons:
- select star: SELECT *
- missing where: No WHERE clause
- full table scan likely: Likely full table scan
Use one of:
--scan-tb <value>for TB units directly--scan-bytes <value>to auto-convert bytes to TB--athena-query-execution-id <id>to fetchDataScannedInBytesfrom Athena viaawsCLI--athena-region <region>optional region override for Athena lookup--stats-file <path.json>to estimate scan bytes from table-level stats
Example stats.json:
{
"tables": {
"orders": { "bytes": 1000000000000 },
"customers": 200000000000
}
}
Query explanation
cargo run -- explain examples/query.sql
Example output:
Query explanation
Meaning: total amount per id, created at, email
Tables: orders, customers, order_items
Join: customers.id = orders.customer_id; order_items.order_id = orders.id
Aggregation: SUM(order_items.quantity * order_items.unit_price) AS total_amount
Block dangerous queries
cargo run -- guard examples/bad_join.sql --max-risk high --deny-rule CROSS_JOIN --deny-rule FULL_TABLE_SCAN_LIKELY
Example output:
SQL Inspect Guard
Policy: default
Status: FAIL
Risk: HIGH
Blocking violations
- FULL_TABLE_SCAN_LIKELY
- SELECT_STAR
Why blocked
This query shape is likely to scan most rows and can materially increase cost
Exit code: 2
Exit code is 2 when blocked, so this works directly in CI.
Simulate safer preview query
cargo run -- simulate examples/query.sql --limit 100
Column-specific lineage
cargo run -- lineage examples/revenue.sql --column revenue
Example output:
examples/revenue.sql
Projections:
revenue
└─ SUM(orders.amount)
Main Analyze Command (LLM + static)
Provide one of:
--sql "<query>"--file <path>--dir <path>
Optional:
--provider openai|bedrock|local--dialect generic|athena--static-only--fail-on low|medium|high--glob "*.sql"--config querylens.toml--json
OpenAI example:
export OPENAI_API_KEY="..."
export OPENAI_MODEL="gpt-4.1-mini"
cargo run -- --provider openai --file examples/query.sql
Bedrock example:
export AWS_REGION="us-east-1"
export BEDROCK_MODEL_ID="anthropic.claude-3-5-sonnet-20241022-v2:0"
cargo run -- --provider bedrock --file examples/query.sql
Local OpenAI-compatible server example:
export LOCAL_LLM_BASE_URL="http://127.0.0.1:8080"
export LOCAL_LLM_MODEL="llama_instruct.gguf"
cargo run -- --provider local --file examples/query.sql --json
Config
Create querylens.toml:
dialect = "athena"
fail_on = "high"
glob = "*.sql"
suggest_limit_for_exploratory = true
static_only = false
[rules.SELECT_STAR]
enabled = true
severity = "high"
[rules.MISSING_WHERE]
enabled = true
severity = "medium"
Rule controls:
enabled = falsedisables a finding byrule_idseverity = "low|medium|high"overrides severity
Example:
[rules.SELECT_STAR]
enabled = false
[rules.MISSING_WHERE]
severity = "low"
CI Usage
Fail a build when risky SQL is found:
cargo run -- --dir models --dialect athena --fail-on high
Or subcommand mode:
cargo run -- analyze models --glob "*.sql"
cargo run -- analyze models --glob "*.sql" --changed-only
Inline rule suppression
Add suppression comments directly in SQL when needed:
-- querylens: disable=SELECT_STAR,MISSING_WHERE
SELECT * FROM some_small_reference_table
Suppression applies to matching rule_id values for that file.
Examples Folder
Ready-to-run examples:
examples/query.sqlexamples/revenue.sqlexamples/bad_join.sqlexamples/subquery.sqlexamples/silver_proposal_attachments.sql
Project Layout
querylens/
Cargo.toml
src/
main.rs
analyzer.rs
insights.rs
config.rs
prompt.rs
providers/
openai.rs
bedrock.rs
local.rs
examples/
Troubleshooting
- missing env vars: set required provider vars
- unexpected model JSON shape: run with
--jsonand inspect response - no secrets in repo:
.envand.env.*are gitignored
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file querylens-0.1.13-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: querylens-0.1.13-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 2.4 MB
- Tags: Python 3, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c23c5ee0d59b5b78baad876495840d35fc278e99a3b4c56715b46b5b66cd4853
|
|
| MD5 |
1395676a47cbf066ac75f809ac28d6eb
|
|
| BLAKE2b-256 |
551e09209ef35c2a814e2daf60bbafdec7fd7ecaae1c2ccd14dd1b7168add6c3
|
Provenance
The following attestation bundles were made for querylens-0.1.13-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher:
release.yml on kraftaa/querylens
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
querylens-0.1.13-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -
Subject digest:
c23c5ee0d59b5b78baad876495840d35fc278e99a3b4c56715b46b5b66cd4853 - Sigstore transparency entry: 1162468212
- Sigstore integration time:
-
Permalink:
kraftaa/querylens@d88dc5017e667519bdbbf810259ce4f33079c11e -
Branch / Tag:
refs/tags/v0.1.13 - Owner: https://github.com/kraftaa
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@d88dc5017e667519bdbbf810259ce4f33079c11e -
Trigger Event:
push
-
Statement type:
File details
Details for the file querylens-0.1.13-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: querylens-0.1.13-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 2.3 MB
- Tags: Python 3, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
df556d9f37cef00ffb795b877fe1e5742535653ae6d94d5bd2eb5d1f37dc7700
|
|
| MD5 |
80030168b913651c6bedf29dac28600f
|
|
| BLAKE2b-256 |
a4b233d1b48947997b8e0623b563edcb3b3fddeb2116b0f4fac73b465fa4ac69
|
Provenance
The following attestation bundles were made for querylens-0.1.13-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:
Publisher:
release.yml on kraftaa/querylens
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
querylens-0.1.13-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl -
Subject digest:
df556d9f37cef00ffb795b877fe1e5742535653ae6d94d5bd2eb5d1f37dc7700 - Sigstore transparency entry: 1162468288
- Sigstore integration time:
-
Permalink:
kraftaa/querylens@d88dc5017e667519bdbbf810259ce4f33079c11e -
Branch / Tag:
refs/tags/v0.1.13 - Owner: https://github.com/kraftaa
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@d88dc5017e667519bdbbf810259ce4f33079c11e -
Trigger Event:
push
-
Statement type:
File details
Details for the file querylens-0.1.13-py3-none-macosx_11_0_arm64.whl.
File metadata
- Download URL: querylens-0.1.13-py3-none-macosx_11_0_arm64.whl
- Upload date:
- Size: 2.2 MB
- Tags: Python 3, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a9115c6eac9bb46342bbdd40d6b184a311ebd34eff2a61114be5f005716ff986
|
|
| MD5 |
d710897caf4865d663570de22bfbde7f
|
|
| BLAKE2b-256 |
7175ba4a56453c35144b20a6d8c35da1235adb5102abd613fd88ebb61b9c47a9
|
Provenance
The following attestation bundles were made for querylens-0.1.13-py3-none-macosx_11_0_arm64.whl:
Publisher:
release.yml on kraftaa/querylens
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
querylens-0.1.13-py3-none-macosx_11_0_arm64.whl -
Subject digest:
a9115c6eac9bb46342bbdd40d6b184a311ebd34eff2a61114be5f005716ff986 - Sigstore transparency entry: 1162468151
- Sigstore integration time:
-
Permalink:
kraftaa/querylens@d88dc5017e667519bdbbf810259ce4f33079c11e -
Branch / Tag:
refs/tags/v0.1.13 - Owner: https://github.com/kraftaa
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@d88dc5017e667519bdbbf810259ce4f33079c11e -
Trigger Event:
push
-
Statement type:
File details
Details for the file querylens-0.1.13-py3-none-macosx_10_12_x86_64.whl.
File metadata
- Download URL: querylens-0.1.13-py3-none-macosx_10_12_x86_64.whl
- Upload date:
- Size: 2.3 MB
- Tags: Python 3, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5f6e62e53264ec170ee54fd368e7596ee5df53b3628be4dd074910d16acfb107
|
|
| MD5 |
c9c8724693333be73114525f208f023d
|
|
| BLAKE2b-256 |
10170fbdf8b681877e8a8b42c0acc042062c3ee010c80407b4e8828f92849586
|
Provenance
The following attestation bundles were made for querylens-0.1.13-py3-none-macosx_10_12_x86_64.whl:
Publisher:
release.yml on kraftaa/querylens
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
querylens-0.1.13-py3-none-macosx_10_12_x86_64.whl -
Subject digest:
5f6e62e53264ec170ee54fd368e7596ee5df53b3628be4dd074910d16acfb107 - Sigstore transparency entry: 1162468337
- Sigstore integration time:
-
Permalink:
kraftaa/querylens@d88dc5017e667519bdbbf810259ce4f33079c11e -
Branch / Tag:
refs/tags/v0.1.13 - Owner: https://github.com/kraftaa
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@d88dc5017e667519bdbbf810259ce4f33079c11e -
Trigger Event:
push
-
Statement type: