Skip to main content

AST-driven SQL pattern extraction and index recommendation engine.

Project description

AuraIndex

AuraIndex is an AST-driven index recommendation engine for databases. It statically scans application code, extracts SQL and ORM query patterns, parses them into a SQL AST, aggregates workload-style usage signals, and proposes guarded index recommendations.

Why this project

Traditional index tuning tools depend on live database telemetry. AuraIndex targets earlier design stages by mining query intent from source code:

  • Language AST frontends extract query intent from code.
  • A shared SQL AST backend normalizes patterns across sources.
  • Heuristics generate index suggestions with explicit confidence and review flags.

This architecture makes it practical to support multiple languages over time while keeping one recommendation core.

Current scope

  • Tree-sitter WASM extraction for Python, JavaScript/TypeScript, and PHP
    • raw SQL call detection (execute, executemany, query, raw, text)
    • SQL-like query builder chain lifting (where/filter/join/order/group patterns)
  • SQL file extraction (.sql)
  • SQL AST analysis with sqlglot
  • Index heuristics:
    • single-column filter keys
    • join keys
    • composite filter keys
    • where+order composite patterns
  • Impact model (enabled by default):
    • read benefit vs write amplification scoring
    • net impact adjustment on suggestion ranking
  • Schema-aware checks (enabled by default):
    • existing index detection from SQL DDL (CREATE INDEX)
    • exact/prefix overlap detection (including unique/partial overlaps)
    • automatic suppression of already-covered suggestions
  • Migration artifact generation (enabled by default):
    • up/down SQL script for suggested indexes
  • Optional EXPLAIN validation:
    • sqlite built-in runner (EXPLAIN QUERY PLAN)
    • custom external command runner for postgres/mysql
  • Engine-aware index type hints (enabled by default):
    • btree / hash / brin / gin heuristics
    • typed DDL variant rendering (when applicable)
  • Schema drift watch (enabled by default):
    • possibly unused existing index candidates
    • missing index candidate summary
  • Query optimization advisories (enabled by default):
    • SELECT * detection
    • ORDER BY without LIMIT
    • leading wildcard LIKE warnings
  • Guardrails:
    • deduplication
    • confidence scoring
    • per-table cap on suggestions
    • manual-review flags
  • Unified Tree-sitter WASM frontend (web-tree-sitter) for supported languages

Architecture

  1. Extractors (language-specific frontend)

    • WasmTreeSitterExtractor invokes a Node script using Tree-sitter WASM for Python/JS/TS/PHP.
    • extract_sql_file loads SQL statements directly from .sql files.
  2. SQL AST analyzer (shared backend)

    • Parses candidate SQL with sqlglot.
    • Extracts tables, predicate columns, join columns, ordering/grouping columns.
    • Builds query-level feature records.
  3. Workload aggregation

    • Counts column participation by role (WHERE, JOIN, ORDER BY, GROUP BY).
    • Tracks composite filter patterns and where+order pair frequency.
  4. Index suggestion engine

    • Applies heuristic scoring with confidence.
    • Applies impact scoring (read gain vs write cost).
    • Performs schema-aware overlap checks against discovered existing indexes.
    • Emits dialect-aware CREATE INDEX statements.
    • Adds ORM snippet equivalents for SQLAlchemy and Django.

Installation

cd AuraIndex
python -m venv .venv
.venv\Scripts\activate
pip install -e .[dev]
npm install

For live database introspection (Postgres/MySQL drivers):

pip install -e .[dev,db]

After publishing to PyPI, install the package with:

pip install AuraIndex

Quick start

auraindex --path . --dialect postgres --min-support 2 --show-top

Outputs:

  • By default, AuraIndex writes reports to a temp folder: %TEMP%\auraindex-report-*
  • On completion, AuraIndex automatically opens the interactive HTML report in your default browser.
  • Artifacts:
    • auraindex_report.json: extraction, parsing, workload, and suggestion details
    • auraindex_indexes.sql: executable SQL index script (DDL = Data Definition Language)
    • auraindex_migration.sql: up/down migration SQL (create + rollback)
    • auraindex_orm_snippets.md: ORM equivalents
    • auraindex_report.md: detailed human-readable report with ranking and evidence
    • auraindex_report.html: interactive report with filters, copy buttons, and evidence drill-down

Useful CLI flags

  • --report-dir D:\reports\scan-01: write outputs to a fixed directory.
  • --json-out custom.json, --ddl-out custom.sql, --migration-out migration.sql, --orm-out custom.md, --md-out details.md, --html-out ui.html
  • --show-top: print ranked recommendations in terminal.
  • --limit-suggestions 25: optionally limit ranked suggestions shown in markdown/HTML/terminal; by default AuraIndex shows all suggestions.
  • --strict-parse: exit code 2 when parse failures are found.
  • --disable-wasm: scan only .sql files (debug mode).
  • --disable-impact-scoring: turn off net impact rank adjustment.
  • --disable-schema-checks: turn off existing-index overlap suppression.
  • --disable-index-type-hints: turn off engine-aware index type recommendation metadata.
  • --disable-drift-watch: turn off schema drift watch output.
  • --disable-query-advisories: turn off query optimization advisories.
  • --max-query-advisories 150: cap advisory rows in report metadata.
  • --enable-db-introspection: enable live DB schema/index introspection.
  • --db-dialect postgres|mysql|sqlite: live DB dialect (defaults to --dialect if omitted).
  • --db-host, --db-port, --db-user, --db-password / --db-password-env, --db-name
  • --db-path (sqlite), --db-schema (postgres schema filter), --db-timeout-seconds
  • --disable-migration-output: skip writing migration SQL artifact.
  • --explain-validate --explain-db-path D:\db.sqlite: run sqlite EXPLAIN validation.
  • --explain-validate --explain-command "psql ... -c \"EXPLAIN {sql}\"": run custom EXPLAIN command.

Use help anytime:

auraindex --help

WASM grammar files (required for source-code scanning)

npm install tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-php
auraindex --path .

AuraIndex auto-detects grammars from local node_modules.

Default excluded directories include generated assets and build outputs:

  • .git, node_modules, __pycache__, .venv, venv
  • build, dist, staticfiles, .next

This avoids parse noise from minified vendor bundles in production/static folders.

Cardinality hints (optional)

You can provide a JSON file to bias recommendations:

{
  "users.email": "high",
  "orders.status": "low"
}

Use:

auraindex --path . --cardinality-hints cardinality_hints.json

Hints are guardrails only; they do not replace runtime telemetry.

Example report fields

auraindex_report.json includes:

  • summary: scanned files, candidate count, parse success/failures, suggestion count
  • candidates: extracted SQL with file, line, function, source_kind
  • workload: per-table and per-column usage metrics
  • suggestions: index DDL, confidence, evidence, impact metadata, schema checks, and manual-review flags
  • detailed_index_evidence: per-index support counts, matched query excerpts, and source locations

How confidence is computed

Confidence is a bounded heuristic score in [0.50, 0.98], not a probabilistic guarantee.

  • Join-key suggestions start higher (~0.80) and rise with repeated JOIN evidence.
  • Filter-key suggestions start lower (~0.74) and rise with repeated WHERE evidence.
  • Composite suggestions start higher (~0.82-0.84) because repeated multi-column patterns are stronger signals.
  • Cardinality hints can nudge confidence (high raises, low lowers).
  • Manual-review flag is set for lower-confidence suggestions and wider composites.

In short: more repeated, structurally strong evidence => higher confidence.

What --min-support means

--min-support is the minimum repetition threshold before AuraIndex recommends an index.

Example:

  • --min-support 1: include one-off patterns (higher recall, more noise)
  • --min-support 3: include only repeated patterns (cleaner, more conservative)

Use lower values for exploration, higher values for production-oriented recommendations.

Composite index handling

AuraIndex explicitly models composite opportunities and prunes redundant suggestions:

  • Detects repeated multi-column WHERE patterns
  • Detects WHERE + ORDER BY patterns for left-prefix composites
  • Suppresses redundant single-column suggestions when a stronger composite prefix exists
  • Limits per-table output via --max-indexes-per-table

Impact scoring model (default ON)

AuraIndex estimates whether an index is likely to be worth it:

  • Read benefit: weighted by WHERE/JOIN/ORDER/GROUP usage and support evidence.
  • Write cost: estimated from table write ratio and index width.
  • Net impact = read benefit - write cost.

This net impact adjusts ranking (score_adjustment) and is shown in JSON/Markdown/HTML reports.

Disable with --disable-impact-scoring when you want raw heuristic-only ranking.

Schema-aware checks (default ON)

AuraIndex scans discovered SQL DDL and extracts existing indexes, then checks each suggestion for:

  • exact match (skip_exact_existing)
  • covered by existing left-prefix (skip_covered_by_existing_prefix)
  • narrower overlap requiring review (review_overlaps_existing_prefix)
  • unique/partial overlap signals

Suggestions already covered by existing indexes are skipped by default to reduce noise.

Disable with --disable-schema-checks if you want full raw suggestions.

With --enable-db-introspection, AuraIndex also pulls existing indexes directly from live DB metadata:

  • PostgreSQL: pg_indexes
  • MySQL: INFORMATION_SCHEMA.STATISTICS
  • SQLite: PRAGMA index_list + PRAGMA index_info

This significantly improves practical overlap detection when schema is not fully represented in code migrations.

Migration SQL (up/down)

AuraIndex emits auraindex_migration.sql with:

  • Up: suggested CREATE INDEX ...
  • Down: corresponding rollback (DROP INDEX ...)

Disable file generation with --disable-migration-output.

Optional EXPLAIN validation

AuraIndex can annotate suggestions with execution-plan checks:

  • SQLite mode: --explain-validate --explain-db-path <path>
  • External mode: --explain-validate --explain-command "<command with {sql}>"

Results are included in report metadata and rendered in Markdown/HTML.

Engine-aware index type hints (default ON)

AuraIndex recommends an index access method per suggestion (dialect-aware):

  • btree as baseline default
  • hash for equality-heavy single-column patterns (low write ratio)
  • brin for time-correlated columns in append-style workloads
  • gin for JSON/metadata-like columns
  • ivfflat hint for vector-like columns (manual operator-class tuning required)

The report includes rationale and typed DDL variant when safely renderable.

Disable with --disable-index-type-hints.

Schema drift watch (default ON)

AuraIndex emits drift-oriented metadata:

  • possibly unused existing indexes (non-unique indexes with no observed column usage)
  • missing index candidates from current scan

Rendered in Markdown/HTML under Schema Drift Watch.

Disable with --disable-drift-watch.

When live DB introspection is enabled, drift watch uses live index inventory as the primary signal.

Query optimization advisories (default ON)

AuraIndex includes static query-level improvement hints:

  • SELECT * projection warning
  • ORDER BY without LIMIT warning
  • leading wildcard LIKE '%x' warning

Rendered in Markdown/HTML under Query Optimization Advisories.

Disable with --disable-query-advisories.

Latest end-to-end validation snapshot

Run command:

auraindex --path D:\SuccessPilot --dialect postgres --min-support 2 --report-dir D:\AutoCure\AuraIndex\reports\successpilot-strong-adds

Observed output snapshot (latest run):

  • scanned files: 1002
  • query candidates: 62
  • parsed queries: 62
  • parse failures: 0
  • suggested indexes: 17
  • schema checks: existing index discovery and overlap checks enabled by default

Generated artifacts:

  • auraindex_report.json
  • auraindex_indexes.sql
  • auraindex_migration.sql
  • auraindex_orm_snippets.md
  • auraindex_report.md
  • auraindex_report.html

Live DB mode examples:

# PostgreSQL
auraindex --path . --dialect postgres --enable-db-introspection --db-dialect postgres --db-host localhost --db-port 5432 --db-user app --db-password-env DB_PASSWORD --db-name appdb --db-schema public

# MySQL
auraindex --path . --dialect mysql --enable-db-introspection --db-dialect mysql --db-host localhost --db-port 3306 --db-user app --db-password-env DB_PASSWORD --db-name appdb

# SQLite
auraindex --path . --dialect sqlite --enable-db-introspection --db-dialect sqlite --db-path D:\data\app.db

Per-service verification (SuccessPilot microservices):

  • ai-worker: fail=0
  • analytics: fail=0
  • backend: fail=0 (after excluding staticfiles generated assets)
  • frontend: fail=0 (after excluding build artifacts)
  • frontend_new: fail=0
  • live-updates: fail=0
  • onboarding: fail=0

Limitations

  • Static analysis cannot observe runtime frequency, parameter distributions, or write pressure.
  • ORM-to-SQL reconstruction is approximate for complex query builders.
  • EXPLAIN validation is optional and environment-dependent (DB connectivity and compatible SQL needed).
  • Suggestions are intentionally conservative and should still be reviewed before production rollout.

Testing

pytest -q

Build and publish (PyPI)

python -m pip install -U build twine
python -m build
twine check dist/*
# twine upload dist/*

Release checklist:

  • Package name available on PyPI.
  • Version bumped (pyproject.toml).
  • Long description renders (twine check passes).
  • LICENSE and README.md included in source distribution.

Roadmap

  • Add richer ORM semantic lifting (SQLAlchemy Core/ORM constructs).
  • Add log-based dynamic signal fusion.
  • Add adaptive suggestion ranking using execution metrics.
  • Expand language frontends while retaining the shared SQL AST backend.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

auraindex-0.1.0.tar.gz (54.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

auraindex-0.1.0-py3-none-any.whl (47.4 kB view details)

Uploaded Python 3

File details

Details for the file auraindex-0.1.0.tar.gz.

File metadata

  • Download URL: auraindex-0.1.0.tar.gz
  • Upload date:
  • Size: 54.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for auraindex-0.1.0.tar.gz
Algorithm Hash digest
SHA256 97a1e5b24e306490f161a6f3340d36e95043a6a6fe84a3178f0d3b94f6cd9291
MD5 e807cd24d1cdc08ca0b433ad75ed40c1
BLAKE2b-256 355efa4979100cba1e27ccb2db0a11ac1a3505fc5da29ebb4606d3fcee507368

See more details on using hashes here.

File details

Details for the file auraindex-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: auraindex-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 47.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for auraindex-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5bbba991a3a4e621f83e675526ff5135a83473f56144eb54c16b5616a3494390
MD5 58e3fa9080eef3504add4fa9db0eeade
BLAKE2b-256 a31f1d0de676876bc1942c424c992b319ea7a818d33fa754ced8c537d16a0a68

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page