AST-driven SQL pattern extraction and index recommendation engine.
Project description
AuraIndex
AuraIndex is an AST-driven index recommendation engine for databases. It statically scans application code, extracts SQL and ORM query patterns, parses them into a SQL AST, aggregates workload-style usage signals, and proposes guarded index recommendations.
Why this project
Traditional index tuning tools depend on live database telemetry. AuraIndex targets earlier design stages by mining query intent from source code:
- Language AST frontends extract query intent from code.
- A shared SQL AST backend normalizes patterns across sources.
- Heuristics generate index suggestions with explicit confidence and review flags.
This architecture makes it practical to support multiple languages over time while keeping one recommendation core.
Current scope
- Tree-sitter WASM extraction for Python, JavaScript/TypeScript, and PHP
- raw SQL call detection (
execute,executemany,query,raw,text) - SQL-like query builder chain lifting (
where/filter/join/order/grouppatterns)
- raw SQL call detection (
- SQL file extraction (
.sql) - SQL AST analysis with
sqlglot - Index heuristics:
- single-column filter keys
- join keys
- composite filter keys
- where+order composite patterns
- Impact model (enabled by default):
- read benefit vs write amplification scoring
- net impact adjustment on suggestion ranking
- Schema-aware checks (enabled by default):
- existing index detection from SQL DDL (
CREATE INDEX) - exact/prefix overlap detection (including unique/partial overlaps)
- automatic suppression of already-covered suggestions
- existing index detection from SQL DDL (
- Migration artifact generation (enabled by default):
- up/down SQL script for suggested indexes
- Optional EXPLAIN validation:
- sqlite built-in runner (
EXPLAIN QUERY PLAN) - custom external command runner for postgres/mysql
- sqlite built-in runner (
- Engine-aware index type hints (enabled by default):
- btree / hash / brin / gin heuristics
- typed DDL variant rendering (when applicable)
- Schema drift watch (enabled by default):
- possibly unused existing index candidates
- missing index candidate summary
- Query optimization advisories (enabled by default):
SELECT *detectionORDER BYwithoutLIMIT- leading wildcard
LIKEwarnings
- Guardrails:
- deduplication
- confidence scoring
- per-table cap on suggestions
- manual-review flags
- Unified Tree-sitter WASM frontend (
web-tree-sitter) for supported languages
Architecture
-
Extractors (language-specific frontend)
WasmTreeSitterExtractorinvokes a Node script using Tree-sitter WASM for Python/JS/TS/PHP.extract_sql_fileloads SQL statements directly from.sqlfiles.
-
SQL AST analyzer (shared backend)
- Parses candidate SQL with
sqlglot. - Extracts tables, predicate columns, join columns, ordering/grouping columns.
- Builds query-level feature records.
- Parses candidate SQL with
-
Workload aggregation
- Counts column participation by role (
WHERE,JOIN,ORDER BY,GROUP BY). - Tracks composite filter patterns and where+order pair frequency.
- Counts column participation by role (
-
Index suggestion engine
- Applies heuristic scoring with confidence.
- Applies impact scoring (read gain vs write cost).
- Performs schema-aware overlap checks against discovered existing indexes.
- Emits dialect-aware
CREATE INDEXstatements. - Adds ORM snippet equivalents for SQLAlchemy and Django.
Installation
cd AuraIndex
python -m venv .venv
.venv\Scripts\activate
pip install -e .[dev]
AuraIndex wheels now bundle WASM runtime + grammar assets (Python/JS/TS/TSX/PHP), so npm setup is not required for normal package usage.
Optional (source-development override of local grammars/runtime):
npm install web-tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-php
For live database introspection (Postgres/MySQL drivers):
pip install -e .[dev,db]
pip install AuraIndex
Quick start
auraindex --path . --dialect postgres --min-support 2 --show-top
Outputs:
- By default, AuraIndex writes reports to a temp folder:
%TEMP%\auraindex-report-* - On completion, AuraIndex automatically opens the interactive HTML report in your default browser.
- Artifacts:
auraindex_report.json: extraction, parsing, workload, and suggestion detailsauraindex_indexes.sql: executable SQL index script (DDL = Data Definition Language)auraindex_migration.sql: up/down migration SQL (create + rollback)auraindex_orm_snippets.md: ORM equivalentsauraindex_report.md: detailed human-readable report with ranking and evidenceauraindex_report.html: interactive report with filters, copy buttons, and evidence drill-down
Ranked suggestions column meaning
In HTML/Markdown ranked tables:
#: final rank after scoring and enabled adjustments.Index: suggested index name.Table: target table.Columns: indexed columns in order (critical for composite indexes).Kind: suggestion class (filter_key,join_key,composite_filter,where_order_composite, etc.).Score: ranking priority signal.Confidence: strength of static evidence.Manual:Yesmeans human review is recommended before rollout.
HTML filter usage
The interactive HTML report supports:
- free-text search
- token filters:
table:<name>,column:<name>,kind:<type>,manual:yes|no - quick filter chips and filter suggestions in the search input
Useful CLI flags
--report-dir D:\reports\scan-01: write outputs to a fixed directory.--json-out custom.json,--ddl-out custom.sql,--migration-out migration.sql,--orm-out custom.md,--md-out details.md,--html-out ui.html--show-top: print ranked recommendations in terminal.--limit-suggestions 25: optionally limit ranked suggestions shown in markdown/HTML/terminal; by default AuraIndex shows all suggestions.--strict-parse: exit code2when parse failures are found.--disable-wasm: scan only.sqlfiles (debug mode).--disable-impact-scoring: turn off net impact rank adjustment.--disable-schema-checks: turn off existing-index overlap suppression.--disable-index-type-hints: turn off engine-aware index type recommendation metadata.--disable-drift-watch: turn off schema drift watch output.--disable-query-advisories: turn off query optimization advisories.--max-query-advisories 150: cap advisory rows in report metadata.--enable-db-introspection: enable live DB schema/index introspection.--db-dialect postgres|mysql|sqlite: live DB dialect (defaults to--dialectif omitted).--db-host,--db-port,--db-user,--db-password/--db-password-env,--db-name--db-path(sqlite),--db-schema(postgres schema filter),--db-timeout-seconds--disable-migration-output: skip writing migration SQL artifact.--explain-validate --explain-db-path D:\db.sqlite: run sqlite EXPLAIN validation.--explain-validate --explain-command "psql ... -c \"EXPLAIN {sql}\"": run custom EXPLAIN command.
Use help anytime:
auraindex --help
WASM runtime assets
npm install web-tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-php
auraindex --path .
AuraIndex bundles default WASM grammar/runtime assets in the wheel.
It can also auto-detect external grammars from local node_modules, and supports AURAINDEX_WASM_GRAMMAR_DIR.
If WASM grammars are unavailable, AuraIndex now degrades gracefully:
- Python files are still scanned using a built-in Python AST fallback.
- SQL files continue to be scanned normally.
- JS/TS/PHP source files are scanned with a built-in text fallback extractor (raw SQL + query-builder pattern lifting) if WASM is unavailable.
Default excluded directories include generated assets and build outputs:
.git,node_modules,__pycache__,.venv,venvbuild,dist,staticfiles,.next
This avoids parse noise from minified vendor bundles in production/static folders.
Cardinality hints (optional)
You can provide a JSON file to bias recommendations:
{
"users.email": "high",
"orders.status": "low"
}
Use:
auraindex --path . --cardinality-hints cardinality_hints.json
Hints are guardrails only; they do not replace runtime telemetry.
Example report fields
auraindex_report.json includes:
summary: scanned files, candidate count, parse success/failures, suggestion countcandidates: extracted SQL withfile,line,function,source_kindworkload: per-table and per-column usage metricssuggestions: index DDL, confidence, evidence, impact metadata, schema checks, and manual-review flagsdetailed_index_evidence: per-index support counts, matched query excerpts, and source locations
How confidence is computed
Confidence is a bounded heuristic score in [0.50, 0.98], not a probabilistic guarantee.
- Join-key suggestions start higher (
~0.80) and rise with repeated JOIN evidence. - Filter-key suggestions start lower (
~0.74) and rise with repeated WHERE evidence. - Composite suggestions start higher (
~0.82-0.84) because repeated multi-column patterns are stronger signals. - Cardinality hints can nudge confidence (
highraises,lowlowers). - Manual-review flag is set for lower-confidence suggestions and wider composites.
In short: more repeated, structurally strong evidence => higher confidence.
What --min-support means
--min-support is the minimum repetition threshold before AuraIndex recommends an index.
Example:
--min-support 1: include one-off patterns (higher recall, more noise)--min-support 3: include only repeated patterns (cleaner, more conservative)
Use lower values for exploration, higher values for production-oriented recommendations.
Composite index handling
AuraIndex explicitly models composite opportunities and prunes redundant suggestions:
- Detects repeated multi-column WHERE patterns
- Detects WHERE + ORDER BY patterns for left-prefix composites
- Suppresses redundant single-column suggestions when a stronger composite prefix exists
- Limits per-table output via
--max-indexes-per-table
Impact scoring model (default ON)
AuraIndex estimates whether an index is likely to be worth it:
- Read benefit: weighted by WHERE/JOIN/ORDER/GROUP usage and support evidence.
- Write cost: estimated from table write ratio and index width.
- Net impact = read benefit - write cost.
This net impact adjusts ranking (score_adjustment) and is shown in JSON/Markdown/HTML reports.
Disable with --disable-impact-scoring when you want raw heuristic-only ranking.
Schema-aware checks (default ON)
AuraIndex scans discovered SQL DDL and extracts existing indexes, then checks each suggestion for:
- exact match (
skip_exact_existing) - covered by existing left-prefix (
skip_covered_by_existing_prefix) - narrower overlap requiring review (
review_overlaps_existing_prefix) - unique/partial overlap signals
Suggestions already covered by existing indexes are skipped by default to reduce noise.
Disable with --disable-schema-checks if you want full raw suggestions.
With --enable-db-introspection, AuraIndex also pulls existing indexes directly from live DB metadata:
- PostgreSQL:
pg_indexes - MySQL:
INFORMATION_SCHEMA.STATISTICS - SQLite:
PRAGMA index_list+PRAGMA index_info
This significantly improves practical overlap detection when schema is not fully represented in code migrations.
Migration SQL (up/down)
AuraIndex emits auraindex_migration.sql with:
- Up: suggested
CREATE INDEX ... - Down: corresponding rollback (
DROP INDEX ...)
Disable file generation with --disable-migration-output.
Optional EXPLAIN validation
AuraIndex can annotate suggestions with execution-plan checks:
- SQLite mode:
--explain-validate --explain-db-path <path> - External mode:
--explain-validate --explain-command "<command with {sql}>"
Results are included in report metadata and rendered in Markdown/HTML.
Engine-aware index type hints (default ON)
AuraIndex recommends an index access method per suggestion (dialect-aware):
btreeas baseline defaulthashfor equality-heavy single-column patterns (low write ratio)brinfor time-correlated columns in append-style workloadsginfor JSON/metadata-like columnsivfflathint for vector-like columns (manual operator-class tuning required)
The report includes rationale and typed DDL variant when safely renderable.
Disable with --disable-index-type-hints.
Schema drift watch (default ON)
AuraIndex emits drift-oriented metadata:
- possibly unused existing indexes (non-unique indexes with no observed column usage)
- missing index candidates from current scan
Rendered in Markdown/HTML under Schema Drift Watch.
Disable with --disable-drift-watch.
When live DB introspection is enabled, drift watch uses live index inventory as the primary signal.
Query optimization advisories (default ON)
AuraIndex includes static query-level improvement hints:
SELECT *projection warningORDER BYwithoutLIMITwarning- leading wildcard
LIKE '%x'warning
Rendered in Markdown/HTML under Query Optimization Advisories.
Disable with --disable-query-advisories.
Latest end-to-end validation snapshot
Run command:
auraindex --path D:\SuccessPilot --dialect postgres --min-support 2 --report-dir D:\AutoCure\AuraIndex\reports\successpilot-strong-adds
Observed output snapshot (latest run):
- scanned files:
1002 - query candidates:
62 - parsed queries:
62 - parse failures:
0 - suggested indexes:
17 - schema checks: existing index discovery and overlap checks enabled by default
Generated artifacts:
auraindex_report.jsonauraindex_indexes.sqlauraindex_migration.sqlauraindex_orm_snippets.mdauraindex_report.mdauraindex_report.html
Live DB mode examples:
# PostgreSQL
auraindex --path . --dialect postgres --enable-db-introspection --db-dialect postgres --db-host localhost --db-port 5432 --db-user app --db-password-env DB_PASSWORD --db-name appdb --db-schema public
# MySQL
auraindex --path . --dialect mysql --enable-db-introspection --db-dialect mysql --db-host localhost --db-port 3306 --db-user app --db-password-env DB_PASSWORD --db-name appdb
# SQLite
auraindex --path . --dialect sqlite --enable-db-introspection --db-dialect sqlite --db-path D:\data\app.db
Per-service verification (SuccessPilot microservices):
ai-worker:fail=0analytics:fail=0backend:fail=0(after excludingstaticfilesgenerated assets)frontend:fail=0(after excludingbuildartifacts)frontend_new:fail=0live-updates:fail=0onboarding:fail=0
Limitations
- Static analysis cannot observe runtime frequency, parameter distributions, or write pressure.
- ORM-to-SQL reconstruction is approximate for complex query builders.
- EXPLAIN validation is optional and environment-dependent (DB connectivity and compatible SQL needed).
- Suggestions are intentionally conservative and should still be reviewed before production rollout.
Testing
pytest -q
Build and publish (PyPI)
python -m pip install -U build twine
python -m build
twine check dist/*
# twine upload dist/*
Release checklist:
- Package name available on PyPI.
- Version bumped (
pyproject.toml). - Long description renders (
twine checkpasses). LICENSEandREADME.mdincluded in source distribution.
Roadmap
- Add richer ORM semantic lifting (SQLAlchemy Core/ORM constructs).
- Add log-based dynamic signal fusion.
- Add adaptive suggestion ranking using execution metrics.
- Expand language frontends while retaining the shared SQL AST backend.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file auraindex-0.1.4.tar.gz.
File metadata
- Download URL: auraindex-0.1.4.tar.gz
- Upload date:
- Size: 766.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cce508c357482486f7954d79e3133b351aee10fbb46d6c5a23985a554fe516ae
|
|
| MD5 |
1469ab9ac30ade19e47b115260909113
|
|
| BLAKE2b-256 |
c0fee6a9375b5eb7ddf5409893c99613a348d990fd64d6f2e34fb3e6b7f2a859
|
File details
Details for the file auraindex-0.1.4-py3-none-any.whl.
File metadata
- Download URL: auraindex-0.1.4-py3-none-any.whl
- Upload date:
- Size: 765.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fdb2f7de4a0c0fe64ee63f034c620a339502032b2d1a8b71828e7b66a50dc965
|
|
| MD5 |
d8f41065c237d99e989cd6e892c86702
|
|
| BLAKE2b-256 |
64a904d5cdf307f29cb59e7b456e8c586c610de05fa04b2a557b4530d1866797
|