Deterministic, validation-first Text-to-SQL system for business databases
Project description
AetherDialect — validation-first Text-to-SQL for PostgreSQL and Databricks
aetherdialect turns analytical questions into read-only SELECT pipelines: a structured intent representation, multi-stage validation (including dialect EXPLAIN), template reuse from accepted answers, and bounded learning from rejections. The language model fills bounded slots in that intent; it does not author unconstrained SQL.
Why this exists
Teams need answers from relational data without shipping opaque generated SQL. AetherDialect targets analysts and integrators who want a repeatable path from question to result: the same question can return cached SQL with no model round-trip, schema drift surfaces as an explicit migration stop instead of silent breakage, and every generated statement is checked against the catalog and engine before it runs.
Install
pip install aetherdialect
pip install "aetherdialect[postgresql]"
pip install "aetherdialect[databricks]"
pip install "aetherdialect[postgresql,databricks]"
Requires Python 3.10 or newer. Configure the LLM and database via a TOML config_file (recommended) and/or process environment; the full key list lives in the API reference.
Quick start
from aetherdialect import SchemaContext, Text2SQL
t2s = Text2SQL(
SchemaContext(),
artifacts_dir="./my_run",
config_file="./aetherdialect.toml",
)
t2s.run_interactive()
run_interactive prompts once per invocation; call it again for another question. For programmatic UIs, use Text2SQL.session() or Text2SQL.asession() and drive SessionStep objects — see the Integrator guide.
dry_run_warmup exercises a newline-delimited seed question file through validation and execution without persisting templates; see the User guide — Seed warmup.
What makes this different
-
Constant-learning cache: exact
q_normreuse returns SQL with zero LLM calls; near-paraphrases (token Levenshtein at most 2) reuse the same template with one bounded LLM call that only extracts parameters. (How it works) -
Schema overrides are a JSON file you read, edit, and version. Every override (descriptions, roles, sensitivity, added or suppressed foreign keys, primary key endorsements) is replayed on every cache invalidation. (API reference)
-
Migration is never silent. When the catalog changes structurally, the engine writes a
schema_migration_map.jsonskeleton and stops. You decide the action; it resumes. (User guide) -
Generated SQL passes through four validation layers (intent JSON, dialect AST, schema/catalog alignment, dialect EXPLAIN). The LLM never emits raw SQL; it fills bounded slots in a structured intent IR. (Security)
-
Reader / writer split is built in. Many readers can ask questions; the engine drains
write_queue.jsonlat the start of every writer-mode turn under the artifacts lock so learning persists without readers touching the partitioned template store files. (Integrator guide)
Documentation
| Doc | When to read it |
|---|---|
| User guide | Install, first run, asking questions, notes, overrides, migration, seed warmup and dry-run warmup, pitfalls. |
| Integrator guide | Embedding patterns, sessions, multi-turn relay, threading, reader/writer split and queue, audit and diagnostics, cache reset. |
| API reference | Types, config_file TOML schema, methods, schema overrides JSON, diagnostic codes, exceptions. |
| How it works | Architecture diagrams, schema build, engine storage, question pipeline, migration, overrides, validation, learning model, configuration, observability, warmup/QSim, offline-mock design pointer. |
| Offline testing and mock LLM (design) | Planned mock LLM provider and fixture workflow for hermetic tests; links to dev_workspace/mock.txt (not on PyPI). |
| Security | Threat model, LLM context inventory, on-disk inventory, sensitivity model, deny lists, raw-SQL impossibility, EXPLAIN gate, network. |
| Support matrix | Per-engine table, IR-unsupported constructs and reformulations. |
License
See LICENSE.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file aetherdialect-0.1.7.tar.gz.
File metadata
- Download URL: aetherdialect-0.1.7.tar.gz
- Upload date:
- Size: 958.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a58fafd3fe3972bf655ef19f56e4f2c1209ca6117778e8350938f4649154c2de
|
|
| MD5 |
a248328482b6e5dff6f02175b219eebb
|
|
| BLAKE2b-256 |
477cbc09f953bd39cb7acccce11ed9f5f5ed415237fc1c959ddc924eb4f3f3e1
|
File details
Details for the file aetherdialect-0.1.7-py3-none-any.whl.
File metadata
- Download URL: aetherdialect-0.1.7-py3-none-any.whl
- Upload date:
- Size: 659.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
328edc535d502a23084637db5a75ef403c6b1e7f8bb79e68e3065858c426f915
|
|
| MD5 |
87457f0eaa330ba7f6079275636dc0aa
|
|
| BLAKE2b-256 |
854a3186065bee1c9ebc247f7afe86555248801e010eb23998af43d418a012c2
|