Skip to main content

SQL safety validator for LLM-generated queries

Project description

sql-guardrail

CI PyPI version Python versions License: MIT

A safety check for SQL that an LLM wrote.

If your app lets a language model write SQL, sql-guardrail reads each query before it runs and blocks the dangerous ones — a write when only reads are allowed, a WHERE 1=1, a missing tenant filter, a banned column, a runaway scan — with a clear reason you can log or feed back to the model.

It never touches your database, never rewrites your SQL, and never raises. You write a policy; you call verify(); you get back allowed plus a list of reasons. If it says yes, you run the exact string you gave it.

from sqlguard import Policy, verify

policy = Policy.from_yaml("policy.yml")          # load once at startup

sql = my_llm_app(question)                       # the query your model wrote
result = verify(sql, policy, context={"tenant_id": user.tenant_id})

if not result.allowed:
    raise PermissionError([v.message for v in result.violations])
conn.execute(sql)

One guardrail, not the only one. Pair it with a read-only DB role, row-level security, and a statement timeout — see Where it fits.

Why this exists

SQL injection used to be solved: you wrote the query once with placeholders and passed user input as values, so input could never change the shape of the query. Text-to-SQL breaks that — the LLM writes the whole query as text every time, so there's no fixed shape to protect and prepared statements can't help. A read-only role and row-level security catch permission errors, but the database will still run SELECT password_hash ... WHERE 1=1 and fail silently, telling you nothing. sql-guardrail is the fast first check that blocks the obvious problems with a clear reason, before the query reaches the database.

Install

uv add sql-guardrail        # or: pip install sql-guardrail

Python 3.10+. Runtime deps: sqlglot, pydantic, PyYAML, click.

The policy

A policy lists which tables and columns the LLM may touch and the rules every query must follow. A small one for a multi-tenant orders table:

# policy.yml
read_only: true                   # only reads, no writes

tables:
  - name: orders
    allow_columns: [id, product_name, account_id, total]
    require_predicate:            # every query must filter by tenant
      column: account_id
      op: "="
      value: "${tenant_id}"       # filled per request from context=

forbid:
  select_star: true               # no `SELECT *`
  always_true_predicates: true    # no `WHERE 1=1`

limits:
  max_limit_value: 1000           # no `LIMIT 1000000`

context carries per-request values that fill in ${...} placeholders. The tenant_id must come from the authenticated session, never the LLM prompt — if the LLM picks its own tenant id, every tenant rule collapses.

See examples/policy.yml for a fully-annotated policy and examples/backend_integration.py for a FastAPI-style request handler.

Validate the string you actually run. If you strip comments or normalize the SQL after verify(), call verify() again on the final form.

What it checks

  • Statement shapeSELECT only (DML when read_only=false); no multi-statement, DDL/DCL, data-modifying CTEs, locks, or SELECT INTO.
  • Always-true filtersWHERE 1=1, id = id, every constant-folded shape, across WHERE / JOIN ON / HAVING / QUALIFY / MERGE.
  • Tables & columns — per-table/column allowlists, plus a denylist that blocks a column in any position (even JOIN USING); no SELECT *.
  • Tenant isolationrequire_predicate forces every alias of a table to carry your tenant filter in WHERE / INNER JOIN ON.
  • Function allowlist (opt-in) — every function call must be approved; closes pg_sleep, pg_read_file, and unvetted helpers.
  • DoS caps — joins, nesting depth, LIMIT, OFFSET, SQL length, AST size; blocks cartesian and recursive queries.

Full field reference and per-check semantics: docs/POLICY.md.

Where it fits

Safety is layers. Each catches different problems; no single one is enough.

┌──────────────────────────────────────────┐
│ 1. Prompt design + tool wrappers         │  Shape what the LLM tries.
├──────────────────────────────────────────┤
│ 2. sql-guardrail.verify()   ← THIS LIB   │  Check the SQL against a policy.
├──────────────────────────────────────────┤  (only if allowed)
│ 3. Least-privileged database user        │  Read-only, no catalog access.
├──────────────────────────────────────────┤
│ 4. Row-level security (RLS)              │  DB rule pinned to the user.
├──────────────────────────────────────────┤
│ 5. Statement timeout                     │  Slow queries get killed.
└──────────────────────────────────────────┘
                     ▼  Database

This library is layer 2: fast, in-process, no network calls. It blocks obviously bad queries with a reason you can act on. Layers 3–5 are the hard outer wall — even if a query slips past, a read-only user can't write, RLS hides other tenants' rows, and the timeout kills runaways. If this library is all you have, you've improved things but you're not safe. Set up the database layers too.

What it does NOT catch

  • Bare INSERT ... VALUES / MERGE tenant isolation — no WHERE on the target row; use DB CHECK constraints or RLS write policies.
  • Column-arithmetic tautologies (id*0=0) — safe by construction; lean on require_predicate + RLS.
  • Plan-time cost surprises — only the planner knows; use statement_timeout.
  • Bugs in your auth — if the tenant id comes from the LLM, not the session, the library can't know.

More in docs/POLICY.md.

CLI & HTTP server

Beyond the Python API, sqlguard verify validates a file, stdin, or a CI step:

echo 'SELECT id FROM orders WHERE 1=1' | sqlguard verify --policy policy.yml --stdin

Or run the HTTP sidecar — same rules, same response shape — from the image:

docker run --rm -p 8000:8000 \
  -v "$(pwd)/policy.yml:/etc/sqlguard/policy.yml:ro" \
  ghcr.io/nickusevich/sql-guardrail:latest

POST /verify returns the verdict in the body; GET /docs is the OpenAPI UI. The server does no auth — keep it internal. Exit codes, every endpoint, and env config live in docs/DEPLOYMENT.md.

Docs

  • docs/POLICY.md — every policy field and what each check does, the result object, and what the library doesn't catch.
  • docs/DEPLOYMENT.md — CLI flags and exit codes, HTTP endpoints, env-var config, Docker.
  • docs/ARCHITECTURE.md — the pipeline, module map, and the four structural defenses.
  • CONTRIBUTING.md — local checks, test layout, how to add a rule.
  • SECURITY.md — how to report a bypass privately.

The test suite is 500+ tests organized by attack class, run on Python 3.10–3.13 across Ubuntu and macOS with ruff, mypy --strict, and coverage. Every tests/malicious/*.sql must be denied and every tests/benign/*.sql allowed.

Versioning

0.5.x, SemVer. Patch bumps fix bugs (including security bypasses); minor bumps may add fields without changing existing meaning. 1.0 will commit to the public API: verify, Policy, VerificationResult, Violation, ViolationCode, ViolationCategory.

License

MIT. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sql_guardrail-0.5.2.tar.gz (94.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sql_guardrail-0.5.2-py3-none-any.whl (54.8 kB view details)

Uploaded Python 3

File details

Details for the file sql_guardrail-0.5.2.tar.gz.

File metadata

  • Download URL: sql_guardrail-0.5.2.tar.gz
  • Upload date:
  • Size: 94.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sql_guardrail-0.5.2.tar.gz
Algorithm Hash digest
SHA256 648233cf75d2f5076b5e6c8f6d1118f3dcc3f099834d7f910612d00031771143
MD5 b5dff886e05e45598a1437451acea531
BLAKE2b-256 e6c452ca3d5e7ceca657b6a2bba014be27fa536b2ae5292bf63890ca760dda5a

See more details on using hashes here.

Provenance

The following attestation bundles were made for sql_guardrail-0.5.2.tar.gz:

Publisher: release.yml on nickusevich/sql-guardrail

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sql_guardrail-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: sql_guardrail-0.5.2-py3-none-any.whl
  • Upload date:
  • Size: 54.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sql_guardrail-0.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7937c03fd39ed3999dcf7a213d1840efeb682dae29a3946fd325ece7ab3c34f6
MD5 47065e166980ec02a81bda02d35a89d5
BLAKE2b-256 4432c77b26f60113de10683a9e85595f51ea67b23cca00dc4c069b9c0018ff4b

See more details on using hashes here.

Provenance

The following attestation bundles were made for sql_guardrail-0.5.2-py3-none-any.whl:

Publisher: release.yml on nickusevich/sql-guardrail

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page