Natural-language SQL for production-scale Postgres schemas
Project description
PromptQuery
Natural-language SQL for production-scale Postgres schemas.
PromptQuery is an open-source CLI that lets you query Postgres in plain English — engineered for real production schemas with hundreds of tables, not toy demos. It introspects your schema, generates SQL, shows it for confirmation, and runs it read-only.
$ prq postgresql://prod-db/mycompany
✓ 675 tables found (sql: openai/gpt-4o, selector: openai/gpt-4o-mini)
PromptQuery — ask a question in plain English, or type "exit".
? unpaid invoices over EUR 1000 with the customer name
Selecting from 50 candidates...
Generating SQL...
Using 14 tables: account_move, res_partner, account_payment, ...
SELECT am.name AS invoice,
am.amount_total AS total,
p.name AS customer
FROM account_move am
JOIN res_partner p ON p.id = am.partner_id
WHERE am.move_type = 'out_invoice'
AND am.payment_state IN ('not_paid', 'partial')
AND am.amount_total > 1000
ORDER BY am.amount_total DESC;
Run? [y/N] y
invoice │ total │ customer
───────────────┼─────────┼──────────────────────
INV/2026/0042 │ 1899.00 │ Marco Rossi
INV/2026/0067 │ 1299.00 │ Acme Industries SRL
2 row(s)
The numbers
Two independent production-scale schemas. SQL generation: gpt-4o. Table selection: gpt-4o-mini.
Odoo 18 ERP — 675 tables (eval/fixtures/odoo.schema.json)
| Pipeline | Accuracy | Tokens / query | Latency |
|---|---|---|---|
| Naive (full schema in prompt) | 84.0 % | ~50,000 | 3.4 s |
| PromptQuery v0.1 (TF-IDF only) | 76.0 % | ~2,000 | 2.0 s |
| PromptQuery v0.2 (TF-IDF + LLM selector) | 100.0 % | ~5,000 | 5.6 s |
EMBL-EBI RNAcentral — 216 tables, biology domain, public read-only DB
| Pipeline | Accuracy | Tokens / query | Latency |
|---|---|---|---|
| Naive (full schema in prompt) | 82.0 % | ~22,000 | 3.0 s |
| PromptQuery v0.1 (TF-IDF only) | 74.0 % | ~2,000 | 1.9 s |
| PromptQuery v0.2 (TF-IDF + LLM selector) | 94.0 % | ~5,000 | 4.8 s |
Pattern across both benchmarks: PromptQuery v0.2 wins by +12 to +16 percentage-points over the naive "stuff the whole schema into a prompt" baseline, at ~5-10× lower per-query token cost, validated independently on two different production schemas and domains.
Receipts in eval/results_odoo_v2.json and eval/results_rnacentral.json. Reproduce both with one command each — see Benchmark below.
Quick start
pip install promptquery
# Set ONE of these (PromptQuery auto-detects):
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
# Connect and start asking:
prq postgresql://localhost/mydb
prq and pquery are short aliases for promptquery. All three commands work identically.
One-shot mode (scripting / CI)
--query skips the REPL and returns machine-readable output. Progress messages go to stderr, results to stdout — pipe-friendly:
prq --query "how many users in Italy" postgresql://localhost/mydb # JSON to stdout
prq --query "top 10 orders by total" --out csv postgresql://... > out.csv
prq --query "..." --out table postgresql://... # rich-formatted table
Exit codes: 0 success · 1 LLM/connection error · 2 safety-guard rejection · 3 execution error.
Try it without setting up a database
EMBL-EBI publishes a public read-only Postgres with real biological RNA-sequence data (216 tables). Install PromptQuery and try it in under a minute:
pip install promptquery
export OPENAI_API_KEY=...
prq --query "show me the 5 latest blog posts with their title" \
--selector-model gpt-4o-mini \
postgresql://reader:NWDMCE5xdipIjRrp@hh-pgsql-public.ebi.ac.uk:5432/pfmegrnargs
(Credentials above are EMBL-EBI's public read-only — published for tutorial use.)
How it works
question
│
▼
┌───────────────────┐
│ TF-IDF (stemmed) │ Microseconds. Free. Surfaces ~50 candidate tables
│ retriever │ by lexical match on names, columns, and comments.
└────────┬──────────┘
│
▼
┌───────────────────┐
│ LLM table │ One small LLM call. Handles semantic mismatches
│ selector │ TF-IDF cannot: "invoice" → `account_move`,
│ (cheap model) │ "shipment" → `stock_picking`. Picks ~15 tables.
└────────┬──────────┘
│
▼
┌───────────────────┐
│ FK-graph │ One hop outward + inward to pick up join targets
│ expansion │ the question didn't name explicitly. Cap at 25.
└────────┬──────────┘
│
▼
┌───────────────────┐
│ SQL generator │ Your real LLM call. Receives ~25 tables, not 675.
│ (frontier model) │
└────────┬──────────┘
│
▼
┌───────────────────┐
│ Safety guard │ sqlglot validator: rejects anything that isn't a
│ (sqlglot) │ pure SELECT/CTE/UNION. Catches CTEs that hide DML.
└────────┬──────────┘
│
▼
"Run? [y/N]" → execute against a read-only Postgres session
See ARCHITECTURE.md for the deep dive (file inventory, design bets, the patent-landmine non-goals).
Configuration
| Flag | Default | Description |
|---|---|---|
--model |
auto-detect | LLM for SQL generation (e.g. gpt-4o, claude-sonnet-4-6, anthropic/claude-opus-4-7) |
--selector-model |
same as --model |
LLM for the table-selector step. A cheaper model is recommended (e.g. gpt-4o-mini) |
--top-k |
50 | TF-IDF candidates passed to the LLM selector |
--select |
15 | Tables the LLM selector picks from those candidates |
--max-tables |
25 | Cap after FK expansion — what the SQL generator actually sees |
--no-selector |
— | Skip the LLM selector (v0.1 behaviour: TF-IDF + FK only) |
-y, --yes |
— | Skip the confirmation prompt before running |
Environment
| Variable | Purpose |
|---|---|
OPENAI_API_KEY |
Use OpenAI as the LLM provider |
ANTHROPIC_API_KEY |
Use Anthropic as the LLM provider |
If both are set, Anthropic is preferred. Override either with --model anthropic/<name> or --model openai/<name>.
Safety
PromptQuery has two independent layers so a write is impossible, even if one layer fails:
- Session-level: every Postgres session opens with
default_transaction_read_only = onand a 60-secondstatement_timeout. The database itself refuses non-SELECT operations. - Pre-execution: every generated query is parsed with
sqlglotand rejected unless it's a singleSELECT/WITH/UNION/INTERSECT/EXCEPT. The validator also catches CTEs that hide DML (WITH x AS (DELETE …) SELECT * FROM x) and dangerous-function calls (pg_terminate_backend,set_config,lo_export,dblink_exec).
Every query is also shown to you before it runs. Confirm with y.
Benchmark
The eval suite is part of the repo and reproducible:
# End-to-end (real Postgres + execution-equality scoring on the shop schema):
docker compose -f eval/docker-compose.yml up -d
PGPASSWORD=promptquery psql -h 127.0.0.1 -p 55432 -U promptquery -d shop \
-f eval/fixtures/shop.sql \
-f eval/fixtures/shop_seed.sql
python -m eval.end_to_end --model gpt-4o --pad 0 --pad 200
# Parsing-mode on Odoo 18 (675 tables):
python -m eval.parsing_bench \
--fixture eval/fixtures/odoo.schema.json \
--questions eval.questions.odoo \
--model gpt-4o --selector-model gpt-4o-mini
# Parsing-mode on EMBL-EBI's public RNAcentral (216 tables, real biology data):
python -m eval.parsing_bench \
--fixture eval/fixtures/rnacentral.schema.json \
--questions eval.questions.rnacentral \
--model gpt-4o --selector-model gpt-4o-mini
The committed eval/results_*.json files are receipts of every bench we've run — including unfavourable ones, on purpose.
See eval/END_TO_END.md for the harness internals.
What PromptQuery does NOT do (yet)
- No writes.
SELECTonly, by design and by belt-and-suspenders. - Postgres only. MySQL and SQLite are on the v0.4 roadmap.
- One database at a time. No multi-DB sessions.
- No data visualisation. Rows out, that's it. Pipe to
csv/jq/ your tool of choice.
Roadmap
- v0.2 (shipped) — LLM-assisted table selector, stemmed TF-IDF.
- v0.3 — local LLMs (Ollama), schema anonymisation (GDPR-by-default), query-history-as-few-shot.
- v0.4 — MySQL + SQLite adapters, MCP server mode, public competitor benchmark.
Development
git clone https://github.com/Cyberfilo/promptquery
cd promptquery
python3.12 -m venv .venv
.venv/bin/pip install -e ".[dev,openai]"
# Run the unit tests:
.venv/bin/pytest
# Run the retrieval eval (no API key needed, no DB needed):
.venv/bin/python -m eval.retrieval
37 tests, all pure-Python — no live database or API key required for the core suite.
License
Apache-2.0. Apache-2.0 was chosen over MIT specifically for its explicit patent grant and automatic termination clauses, which matter for a tool that operates in an active NL-to-SQL patent landscape.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file promptquery-0.2.1.tar.gz.
File metadata
- Download URL: promptquery-0.2.1.tar.gz
- Upload date:
- Size: 172.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e367421d5faca7ed33d707ca5fbeb7961c808903151dd051f09b72192bbfdb73
|
|
| MD5 |
0d81d1f8edf61fa3e3ae94fbf8b59004
|
|
| BLAKE2b-256 |
f6b0371b08f59b964c46f3286fa492156d715bd45d26925d31660a22b1799879
|
Provenance
The following attestation bundles were made for promptquery-0.2.1.tar.gz:
Publisher:
publish.yml on Cyberfilo/promptquery
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
promptquery-0.2.1.tar.gz -
Subject digest:
e367421d5faca7ed33d707ca5fbeb7961c808903151dd051f09b72192bbfdb73 - Sigstore transparency entry: 1646800728
- Sigstore integration time:
-
Permalink:
Cyberfilo/promptquery@e206415f2c6aa0335646299aed78c29081df7539 -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/Cyberfilo
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@e206415f2c6aa0335646299aed78c29081df7539 -
Trigger Event:
push
-
Statement type:
File details
Details for the file promptquery-0.2.1-py3-none-any.whl.
File metadata
- Download URL: promptquery-0.2.1-py3-none-any.whl
- Upload date:
- Size: 24.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9bf3c794b018ddf64945b44b3e576a4f21b2c4cf4bd877673e87833625f31fb1
|
|
| MD5 |
2783ac0ffe936d1b4c9ffb0b5c8124d0
|
|
| BLAKE2b-256 |
f9c768e8fc6a5976be982a6e614c91eb1e787dc843f557f28a2b98ad4c786351
|
Provenance
The following attestation bundles were made for promptquery-0.2.1-py3-none-any.whl:
Publisher:
publish.yml on Cyberfilo/promptquery
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
promptquery-0.2.1-py3-none-any.whl -
Subject digest:
9bf3c794b018ddf64945b44b3e576a4f21b2c4cf4bd877673e87833625f31fb1 - Sigstore transparency entry: 1646800794
- Sigstore integration time:
-
Permalink:
Cyberfilo/promptquery@e206415f2c6aa0335646299aed78c29081df7539 -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/Cyberfilo
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@e206415f2c6aa0335646299aed78c29081df7539 -
Trigger Event:
push
-
Statement type: