Skip to main content

AST-driven SQL pattern extraction and index recommendation engine.

Project description

AuraIndex

AuraIndex is an AST-driven index recommendation engine for databases. It statically scans application code, extracts SQL and ORM query patterns, parses them into a SQL AST, aggregates workload-style usage signals, and proposes guarded index recommendations.

Why this project

Traditional index tuning tools depend on live database telemetry. AuraIndex targets earlier design stages by mining query intent from source code:

  • Language AST frontends extract query intent from code.
  • A shared SQL AST backend normalizes patterns across sources.
  • Heuristics generate index suggestions with explicit confidence and review flags.

This architecture makes it practical to support multiple languages over time while keeping one recommendation core.

Current scope

  • Tree-sitter WASM extraction for Python, JavaScript/TypeScript, and PHP
    • raw SQL call detection (execute, executemany, query, raw, text)
    • SQL-like query builder chain lifting (where/filter/join/order/group patterns)
  • SQL file extraction (.sql)
  • SQL AST analysis with sqlglot
  • Index heuristics:
    • single-column filter keys
    • join keys
    • composite filter keys
    • where+order composite patterns
  • Impact model (enabled by default):
    • read benefit vs write amplification scoring
    • net impact adjustment on suggestion ranking
  • Schema-aware checks (enabled by default):
    • existing index detection from SQL DDL (CREATE INDEX)
    • exact/prefix overlap detection (including unique/partial overlaps)
    • automatic suppression of already-covered suggestions
  • Migration artifact generation (enabled by default):
    • up/down SQL script for suggested indexes
  • Optional EXPLAIN validation:
    • sqlite built-in runner (EXPLAIN QUERY PLAN)
    • custom external command runner for postgres/mysql
  • Engine-aware index type hints (enabled by default):
    • btree / hash / brin / gin heuristics
    • typed DDL variant rendering (when applicable)
  • Schema drift watch (enabled by default):
    • possibly unused existing index candidates
    • missing index candidate summary
  • Query optimization advisories (enabled by default):
    • SELECT * detection
    • ORDER BY without LIMIT
    • leading wildcard LIKE warnings
  • Guardrails:
    • deduplication
    • confidence scoring
    • per-table cap on suggestions
    • manual-review flags
  • Unified Tree-sitter WASM frontend (web-tree-sitter) for supported languages

Architecture

  1. Extractors (language-specific frontend)

    • WasmTreeSitterExtractor invokes a Node script using Tree-sitter WASM for Python/JS/TS/PHP.
    • extract_sql_file loads SQL statements directly from .sql files.
  2. SQL AST analyzer (shared backend)

    • Parses candidate SQL with sqlglot.
    • Extracts tables, predicate columns, join columns, ordering/grouping columns.
    • Builds query-level feature records.
  3. Workload aggregation

    • Counts column participation by role (WHERE, JOIN, ORDER BY, GROUP BY).
    • Tracks composite filter patterns and where+order pair frequency.
  4. Index suggestion engine

    • Applies heuristic scoring with confidence.
    • Applies impact scoring (read gain vs write cost).
    • Performs schema-aware overlap checks against discovered existing indexes.
    • Emits dialect-aware CREATE INDEX statements.
    • Adds ORM snippet equivalents for SQLAlchemy and Django.

Installation

cd AuraIndex
python -m venv .venv
.venv\Scripts\activate
pip install -e .[dev]

AuraIndex wheels now bundle WASM runtime + grammar assets (Python/JS/TS/TSX/PHP), so npm setup is not required for normal package usage.

Optional (source-development override of local grammars/runtime):

npm install web-tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-php

For live database introspection (Postgres/MySQL drivers):

pip install -e .[dev,db]
pip install AuraIndex

Quick start

auraindex --path . --dialect postgres --min-support 2 --show-top

Outputs:

  • By default, AuraIndex writes reports to a temp folder: %TEMP%\auraindex-report-*
  • On completion, AuraIndex automatically opens the interactive HTML report in your default browser.
  • Artifacts:
    • auraindex_report.json: extraction, parsing, workload, and suggestion details
    • auraindex_indexes.sql: executable SQL index script (DDL = Data Definition Language)
    • auraindex_migration.sql: up/down migration SQL (create + rollback)
    • auraindex_orm_snippets.md: ORM equivalents
    • auraindex_report.md: detailed human-readable report with ranking and evidence
    • auraindex_report.html: interactive report with filters, copy buttons, and evidence drill-down

Ranked suggestions column meaning

In HTML/Markdown ranked tables:

  • #: final rank after scoring and enabled adjustments.
  • Index: suggested index name.
  • Table: target table.
  • Columns: indexed columns in order (critical for composite indexes).
  • Kind: suggestion class (filter_key, join_key, composite_filter, where_order_composite, etc.).
  • Score: ranking priority signal.
  • Confidence: strength of static evidence.
  • Manual: Yes means human review is recommended before rollout.

HTML filter usage

The interactive HTML report supports:

  • free-text search
  • token filters: table:<name>, column:<name>, kind:<type>, manual:yes|no
  • quick filter chips and filter suggestions in the search input

Useful CLI flags

  • --report-dir D:\reports\scan-01: write outputs to a fixed directory.
  • --json-out custom.json, --ddl-out custom.sql, --migration-out migration.sql, --orm-out custom.md, --md-out details.md, --html-out ui.html
  • --show-top: print ranked recommendations in terminal.
  • --limit-suggestions 25: optionally limit ranked suggestions shown in markdown/HTML/terminal; by default AuraIndex shows all suggestions.
  • --strict-parse: exit code 2 when parse failures are found.
  • --disable-wasm: scan only .sql files (debug mode).
  • --disable-impact-scoring: turn off net impact rank adjustment.
  • --disable-schema-checks: turn off existing-index overlap suppression.
  • --disable-index-type-hints: turn off engine-aware index type recommendation metadata.
  • --disable-drift-watch: turn off schema drift watch output.
  • --disable-query-advisories: turn off query optimization advisories.
  • --max-query-advisories 150: cap advisory rows in report metadata.
  • --enable-db-introspection: enable live DB schema/index introspection.
  • --db-dialect postgres|mysql|sqlite: live DB dialect (defaults to --dialect if omitted).
  • --db-host, --db-port, --db-user, --db-password / --db-password-env, --db-name
  • --db-path (sqlite), --db-schema (postgres schema filter), --db-timeout-seconds
  • --disable-migration-output: skip writing migration SQL artifact.
  • --explain-validate --explain-db-path D:\db.sqlite: run sqlite EXPLAIN validation.
  • --explain-validate --explain-command "psql ... -c \"EXPLAIN {sql}\"": run custom EXPLAIN command.

Use help anytime:

auraindex --help

WASM runtime assets

npm install web-tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-php
auraindex --path .

AuraIndex bundles default WASM grammar/runtime assets in the wheel. It can also auto-detect external grammars from local node_modules, and supports AURAINDEX_WASM_GRAMMAR_DIR.

If WASM grammars are unavailable, AuraIndex now degrades gracefully:

  • Python files are still scanned using a built-in Python AST fallback.
  • SQL files continue to be scanned normally.
  • JS/TS/PHP source files are scanned with a built-in text fallback extractor (raw SQL + query-builder pattern lifting) if WASM is unavailable.

Default excluded directories include generated assets and build outputs:

  • .git, node_modules, __pycache__, .venv, venv
  • build, dist, staticfiles, .next

This avoids parse noise from minified vendor bundles in production/static folders.

Cardinality hints (optional)

You can provide a JSON file to bias recommendations:

{
  "users.email": "high",
  "orders.status": "low"
}

Use:

auraindex --path . --cardinality-hints cardinality_hints.json

Hints are guardrails only; they do not replace runtime telemetry.

Example report fields

auraindex_report.json includes:

  • summary: scanned files, candidate count, parse success/failures, suggestion count
  • candidates: extracted SQL with file, line, function, source_kind
  • workload: per-table and per-column usage metrics
  • suggestions: index DDL, confidence, evidence, impact metadata, schema checks, and manual-review flags
  • detailed_index_evidence: per-index support counts, matched query excerpts, and source locations

How confidence is computed

Confidence is a bounded heuristic score in [0.50, 0.98], not a probabilistic guarantee.

  • Join-key suggestions start higher (~0.80) and rise with repeated JOIN evidence.
  • Filter-key suggestions start lower (~0.74) and rise with repeated WHERE evidence.
  • Composite suggestions start higher (~0.82-0.84) because repeated multi-column patterns are stronger signals.
  • Cardinality hints can nudge confidence (high raises, low lowers).
  • Manual-review flag is set for lower-confidence suggestions and wider composites.

In short: more repeated, structurally strong evidence => higher confidence.

What --min-support means

--min-support is the minimum repetition threshold before AuraIndex recommends an index.

Example:

  • --min-support 1: include one-off patterns (higher recall, more noise)
  • --min-support 3: include only repeated patterns (cleaner, more conservative)

Use lower values for exploration, higher values for production-oriented recommendations.

Composite index handling

AuraIndex explicitly models composite opportunities and prunes redundant suggestions:

  • Detects repeated multi-column WHERE patterns
  • Detects WHERE + ORDER BY patterns for left-prefix composites
  • Suppresses redundant single-column suggestions when a stronger composite prefix exists
  • Limits per-table output via --max-indexes-per-table

Impact scoring model (default ON)

AuraIndex estimates whether an index is likely to be worth it:

  • Read benefit: weighted by WHERE/JOIN/ORDER/GROUP usage and support evidence.
  • Write cost: estimated from table write ratio and index width.
  • Net impact = read benefit - write cost.

This net impact adjusts ranking (score_adjustment) and is shown in JSON/Markdown/HTML reports.

Disable with --disable-impact-scoring when you want raw heuristic-only ranking.

Schema-aware checks (default ON)

AuraIndex scans discovered SQL DDL and extracts existing indexes, then checks each suggestion for:

  • exact match (skip_exact_existing)
  • covered by existing left-prefix (skip_covered_by_existing_prefix)
  • narrower overlap requiring review (review_overlaps_existing_prefix)
  • unique/partial overlap signals

Suggestions already covered by existing indexes are skipped by default to reduce noise.

Disable with --disable-schema-checks if you want full raw suggestions.

With --enable-db-introspection, AuraIndex also pulls existing indexes directly from live DB metadata:

  • PostgreSQL: pg_indexes
  • MySQL: INFORMATION_SCHEMA.STATISTICS
  • SQLite: PRAGMA index_list + PRAGMA index_info

This significantly improves practical overlap detection when schema is not fully represented in code migrations.

Migration SQL (up/down)

AuraIndex emits auraindex_migration.sql with:

  • Up: suggested CREATE INDEX ...
  • Down: corresponding rollback (DROP INDEX ...)

Disable file generation with --disable-migration-output.

Optional EXPLAIN validation

AuraIndex can annotate suggestions with execution-plan checks:

  • SQLite mode: --explain-validate --explain-db-path <path>
  • External mode: --explain-validate --explain-command "<command with {sql}>"

Results are included in report metadata and rendered in Markdown/HTML.

Engine-aware index type hints (default ON)

AuraIndex recommends an index access method per suggestion (dialect-aware):

  • btree as baseline default
  • hash for equality-heavy single-column patterns (low write ratio)
  • brin for time-correlated columns in append-style workloads
  • gin for JSON/metadata-like columns
  • ivfflat hint for vector-like columns (manual operator-class tuning required)

The report includes rationale and typed DDL variant when safely renderable.

Disable with --disable-index-type-hints.

Schema drift watch (default ON)

AuraIndex emits drift-oriented metadata:

  • possibly unused existing indexes (non-unique indexes with no observed column usage)
  • missing index candidates from current scan

Rendered in Markdown/HTML under Schema Drift Watch.

Disable with --disable-drift-watch.

When live DB introspection is enabled, drift watch uses live index inventory as the primary signal.

Query optimization advisories (default ON)

AuraIndex includes static query-level improvement hints:

  • SELECT * projection warning
  • ORDER BY without LIMIT warning
  • leading wildcard LIKE '%x' warning

Rendered in Markdown/HTML under Query Optimization Advisories.

Disable with --disable-query-advisories.

Latest end-to-end validation snapshot

Run command:

auraindex --path D:\SuccessPilot --dialect postgres --min-support 2 --report-dir D:\AutoCure\AuraIndex\reports\successpilot-strong-adds

Observed output snapshot (latest run):

  • scanned files: 1002
  • query candidates: 62
  • parsed queries: 62
  • parse failures: 0
  • suggested indexes: 17
  • schema checks: existing index discovery and overlap checks enabled by default

Generated artifacts:

  • auraindex_report.json
  • auraindex_indexes.sql
  • auraindex_migration.sql
  • auraindex_orm_snippets.md
  • auraindex_report.md
  • auraindex_report.html

Live DB mode examples:

# PostgreSQL
auraindex --path . --dialect postgres --enable-db-introspection --db-dialect postgres --db-host localhost --db-port 5432 --db-user app --db-password-env DB_PASSWORD --db-name appdb --db-schema public

# MySQL
auraindex --path . --dialect mysql --enable-db-introspection --db-dialect mysql --db-host localhost --db-port 3306 --db-user app --db-password-env DB_PASSWORD --db-name appdb

# SQLite
auraindex --path . --dialect sqlite --enable-db-introspection --db-dialect sqlite --db-path D:\data\app.db

Per-service verification (SuccessPilot microservices):

  • ai-worker: fail=0
  • analytics: fail=0
  • backend: fail=0 (after excluding staticfiles generated assets)
  • frontend: fail=0 (after excluding build artifacts)
  • frontend_new: fail=0
  • live-updates: fail=0
  • onboarding: fail=0

Limitations

  • Static analysis cannot observe runtime frequency, parameter distributions, or write pressure.
  • ORM-to-SQL reconstruction is approximate for complex query builders.
  • EXPLAIN validation is optional and environment-dependent (DB connectivity and compatible SQL needed).
  • Suggestions are intentionally conservative and should still be reviewed before production rollout.

Testing

pytest -q

Build and publish (PyPI)

python -m pip install -U build twine
python -m build
twine check dist/*
# twine upload dist/*

Release checklist:

  • Package name available on PyPI.
  • Version bumped (pyproject.toml).
  • Long description renders (twine check passes).
  • LICENSE and README.md included in source distribution.

Roadmap

  • Add richer ORM semantic lifting (SQLAlchemy Core/ORM constructs).
  • Add log-based dynamic signal fusion.
  • Add adaptive suggestion ranking using execution metrics.
  • Expand language frontends while retaining the shared SQL AST backend.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

auraindex-0.1.4.tar.gz (766.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

auraindex-0.1.4-py3-none-any.whl (765.3 kB view details)

Uploaded Python 3

File details

Details for the file auraindex-0.1.4.tar.gz.

File metadata

  • Download URL: auraindex-0.1.4.tar.gz
  • Upload date:
  • Size: 766.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for auraindex-0.1.4.tar.gz
Algorithm Hash digest
SHA256 cce508c357482486f7954d79e3133b351aee10fbb46d6c5a23985a554fe516ae
MD5 1469ab9ac30ade19e47b115260909113
BLAKE2b-256 c0fee6a9375b5eb7ddf5409893c99613a348d990fd64d6f2e34fb3e6b7f2a859

See more details on using hashes here.

File details

Details for the file auraindex-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: auraindex-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 765.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for auraindex-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 fdb2f7de4a0c0fe64ee63f034c620a339502032b2d1a8b71828e7b66a50dc965
MD5 d8f41065c237d99e989cd6e892c86702
BLAKE2b-256 64a904d5cdf307f29cb59e7b456e8c586c610de05fa04b2a557b4530d1866797

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page