Skip to main content

AST-driven SQL pattern extraction and index recommendation engine.

Project description

AuraIndex

AuraIndex is an AST-driven index recommendation engine for databases. It statically scans application code, extracts SQL and ORM query patterns, parses them into a SQL AST, aggregates workload-style usage signals, and proposes guarded index recommendations.

Why this project

Traditional index tuning tools depend on live database telemetry. AuraIndex targets earlier design stages by mining query intent from source code:

  • Language AST frontends extract query intent from code.
  • A shared SQL AST backend normalizes patterns across sources.
  • Heuristics generate index suggestions with explicit confidence and review flags.

This architecture makes it practical to support multiple languages over time while keeping one recommendation core.

Current scope

  • Tree-sitter WASM extraction for Python, JavaScript/TypeScript, and PHP
    • raw SQL call detection (execute, executemany, query, raw, text)
    • SQL-like query builder chain lifting (where/filter/join/order/group patterns)
  • SQL file extraction (.sql)
  • SQL AST analysis with sqlglot
  • Index heuristics:
    • single-column filter keys
    • join keys
    • composite filter keys
    • where+order composite patterns
  • Impact model (enabled by default):
    • read benefit vs write amplification scoring
    • net impact adjustment on suggestion ranking
  • Schema-aware checks (enabled by default):
    • existing index detection from SQL DDL (CREATE INDEX)
    • exact/prefix overlap detection (including unique/partial overlaps)
    • automatic suppression of already-covered suggestions
  • Migration artifact generation (enabled by default):
    • up/down SQL script for suggested indexes
  • Optional EXPLAIN validation:
    • sqlite built-in runner (EXPLAIN QUERY PLAN)
    • custom external command runner for postgres/mysql
  • Engine-aware index type hints (enabled by default):
    • btree / hash / brin / gin heuristics
    • typed DDL variant rendering (when applicable)
  • Schema drift watch (enabled by default):
    • possibly unused existing index candidates
    • missing index candidate summary
  • Query optimization advisories (enabled by default):
    • SELECT * detection
    • ORDER BY without LIMIT
    • leading wildcard LIKE warnings
  • Guardrails:
    • deduplication
    • confidence scoring
    • per-table cap on suggestions
    • manual-review flags
  • Unified Tree-sitter WASM frontend (web-tree-sitter) for supported languages

Architecture

  1. Extractors (language-specific frontend)

    • WasmTreeSitterExtractor invokes a Node script using Tree-sitter WASM for Python/JS/TS/PHP.
    • extract_sql_file loads SQL statements directly from .sql files.
  2. SQL AST analyzer (shared backend)

    • Parses candidate SQL with sqlglot.
    • Extracts tables, predicate columns, join columns, ordering/grouping columns.
    • Builds query-level feature records.
  3. Workload aggregation

    • Counts column participation by role (WHERE, JOIN, ORDER BY, GROUP BY).
    • Tracks composite filter patterns and where+order pair frequency.
  4. Index suggestion engine

    • Applies heuristic scoring with confidence.
    • Applies impact scoring (read gain vs write cost).
    • Performs schema-aware overlap checks against discovered existing indexes.
    • Emits dialect-aware CREATE INDEX statements.
    • Adds ORM snippet equivalents for SQLAlchemy and Django.

Installation

cd AuraIndex
python -m venv .venv
.venv\Scripts\activate
pip install -e .[dev]

AuraIndex wheels now bundle WASM runtime + grammar assets (Python/JS/TS/TSX/PHP), so npm setup is not required for normal package usage.

Optional (source-development override of local grammars/runtime):

npm install web-tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-php

For live database introspection (Postgres/MySQL drivers):

pip install -e .[dev,db]

After publishing to PyPI, install the package with:

pip install AuraIndex

Quick start

auraindex --path . --dialect postgres --min-support 2 --show-top

Outputs:

  • By default, AuraIndex writes reports to a temp folder: %TEMP%\auraindex-report-*
  • On completion, AuraIndex automatically opens the interactive HTML report in your default browser.
  • Artifacts:
    • auraindex_report.json: extraction, parsing, workload, and suggestion details
    • auraindex_indexes.sql: executable SQL index script (DDL = Data Definition Language)
    • auraindex_migration.sql: up/down migration SQL (create + rollback)
    • auraindex_orm_snippets.md: ORM equivalents
    • auraindex_report.md: detailed human-readable report with ranking and evidence
    • auraindex_report.html: interactive report with filters, copy buttons, and evidence drill-down

Useful CLI flags

  • --report-dir D:\reports\scan-01: write outputs to a fixed directory.
  • --json-out custom.json, --ddl-out custom.sql, --migration-out migration.sql, --orm-out custom.md, --md-out details.md, --html-out ui.html
  • --show-top: print ranked recommendations in terminal.
  • --limit-suggestions 25: optionally limit ranked suggestions shown in markdown/HTML/terminal; by default AuraIndex shows all suggestions.
  • --strict-parse: exit code 2 when parse failures are found.
  • --disable-wasm: scan only .sql files (debug mode).
  • --disable-impact-scoring: turn off net impact rank adjustment.
  • --disable-schema-checks: turn off existing-index overlap suppression.
  • --disable-index-type-hints: turn off engine-aware index type recommendation metadata.
  • --disable-drift-watch: turn off schema drift watch output.
  • --disable-query-advisories: turn off query optimization advisories.
  • --max-query-advisories 150: cap advisory rows in report metadata.
  • --enable-db-introspection: enable live DB schema/index introspection.
  • --db-dialect postgres|mysql|sqlite: live DB dialect (defaults to --dialect if omitted).
  • --db-host, --db-port, --db-user, --db-password / --db-password-env, --db-name
  • --db-path (sqlite), --db-schema (postgres schema filter), --db-timeout-seconds
  • --disable-migration-output: skip writing migration SQL artifact.
  • --explain-validate --explain-db-path D:\db.sqlite: run sqlite EXPLAIN validation.
  • --explain-validate --explain-command "psql ... -c \"EXPLAIN {sql}\"": run custom EXPLAIN command.

Use help anytime:

auraindex --help

WASM runtime assets

npm install web-tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-php
auraindex --path .

AuraIndex bundles default WASM grammar/runtime assets in the wheel. It can also auto-detect external grammars from local node_modules, and supports AURAINDEX_WASM_GRAMMAR_DIR.

If WASM grammars are unavailable, AuraIndex now degrades gracefully:

  • Python files are still scanned using a built-in Python AST fallback.
  • SQL files continue to be scanned normally.
  • JS/TS/PHP source files are scanned with a built-in text fallback extractor (raw SQL + query-builder pattern lifting) if WASM is unavailable.

Default excluded directories include generated assets and build outputs:

  • .git, node_modules, __pycache__, .venv, venv
  • build, dist, staticfiles, .next

This avoids parse noise from minified vendor bundles in production/static folders.

Cardinality hints (optional)

You can provide a JSON file to bias recommendations:

{
  "users.email": "high",
  "orders.status": "low"
}

Use:

auraindex --path . --cardinality-hints cardinality_hints.json

Hints are guardrails only; they do not replace runtime telemetry.

Example report fields

auraindex_report.json includes:

  • summary: scanned files, candidate count, parse success/failures, suggestion count
  • candidates: extracted SQL with file, line, function, source_kind
  • workload: per-table and per-column usage metrics
  • suggestions: index DDL, confidence, evidence, impact metadata, schema checks, and manual-review flags
  • detailed_index_evidence: per-index support counts, matched query excerpts, and source locations

How confidence is computed

Confidence is a bounded heuristic score in [0.50, 0.98], not a probabilistic guarantee.

  • Join-key suggestions start higher (~0.80) and rise with repeated JOIN evidence.
  • Filter-key suggestions start lower (~0.74) and rise with repeated WHERE evidence.
  • Composite suggestions start higher (~0.82-0.84) because repeated multi-column patterns are stronger signals.
  • Cardinality hints can nudge confidence (high raises, low lowers).
  • Manual-review flag is set for lower-confidence suggestions and wider composites.

In short: more repeated, structurally strong evidence => higher confidence.

What --min-support means

--min-support is the minimum repetition threshold before AuraIndex recommends an index.

Example:

  • --min-support 1: include one-off patterns (higher recall, more noise)
  • --min-support 3: include only repeated patterns (cleaner, more conservative)

Use lower values for exploration, higher values for production-oriented recommendations.

Composite index handling

AuraIndex explicitly models composite opportunities and prunes redundant suggestions:

  • Detects repeated multi-column WHERE patterns
  • Detects WHERE + ORDER BY patterns for left-prefix composites
  • Suppresses redundant single-column suggestions when a stronger composite prefix exists
  • Limits per-table output via --max-indexes-per-table

Impact scoring model (default ON)

AuraIndex estimates whether an index is likely to be worth it:

  • Read benefit: weighted by WHERE/JOIN/ORDER/GROUP usage and support evidence.
  • Write cost: estimated from table write ratio and index width.
  • Net impact = read benefit - write cost.

This net impact adjusts ranking (score_adjustment) and is shown in JSON/Markdown/HTML reports.

Disable with --disable-impact-scoring when you want raw heuristic-only ranking.

Schema-aware checks (default ON)

AuraIndex scans discovered SQL DDL and extracts existing indexes, then checks each suggestion for:

  • exact match (skip_exact_existing)
  • covered by existing left-prefix (skip_covered_by_existing_prefix)
  • narrower overlap requiring review (review_overlaps_existing_prefix)
  • unique/partial overlap signals

Suggestions already covered by existing indexes are skipped by default to reduce noise.

Disable with --disable-schema-checks if you want full raw suggestions.

With --enable-db-introspection, AuraIndex also pulls existing indexes directly from live DB metadata:

  • PostgreSQL: pg_indexes
  • MySQL: INFORMATION_SCHEMA.STATISTICS
  • SQLite: PRAGMA index_list + PRAGMA index_info

This significantly improves practical overlap detection when schema is not fully represented in code migrations.

Migration SQL (up/down)

AuraIndex emits auraindex_migration.sql with:

  • Up: suggested CREATE INDEX ...
  • Down: corresponding rollback (DROP INDEX ...)

Disable file generation with --disable-migration-output.

Optional EXPLAIN validation

AuraIndex can annotate suggestions with execution-plan checks:

  • SQLite mode: --explain-validate --explain-db-path <path>
  • External mode: --explain-validate --explain-command "<command with {sql}>"

Results are included in report metadata and rendered in Markdown/HTML.

Engine-aware index type hints (default ON)

AuraIndex recommends an index access method per suggestion (dialect-aware):

  • btree as baseline default
  • hash for equality-heavy single-column patterns (low write ratio)
  • brin for time-correlated columns in append-style workloads
  • gin for JSON/metadata-like columns
  • ivfflat hint for vector-like columns (manual operator-class tuning required)

The report includes rationale and typed DDL variant when safely renderable.

Disable with --disable-index-type-hints.

Schema drift watch (default ON)

AuraIndex emits drift-oriented metadata:

  • possibly unused existing indexes (non-unique indexes with no observed column usage)
  • missing index candidates from current scan

Rendered in Markdown/HTML under Schema Drift Watch.

Disable with --disable-drift-watch.

When live DB introspection is enabled, drift watch uses live index inventory as the primary signal.

Query optimization advisories (default ON)

AuraIndex includes static query-level improvement hints:

  • SELECT * projection warning
  • ORDER BY without LIMIT warning
  • leading wildcard LIKE '%x' warning

Rendered in Markdown/HTML under Query Optimization Advisories.

Disable with --disable-query-advisories.

Latest end-to-end validation snapshot

Run command:

auraindex --path D:\SuccessPilot --dialect postgres --min-support 2 --report-dir D:\AutoCure\AuraIndex\reports\successpilot-strong-adds

Observed output snapshot (latest run):

  • scanned files: 1002
  • query candidates: 62
  • parsed queries: 62
  • parse failures: 0
  • suggested indexes: 17
  • schema checks: existing index discovery and overlap checks enabled by default

Generated artifacts:

  • auraindex_report.json
  • auraindex_indexes.sql
  • auraindex_migration.sql
  • auraindex_orm_snippets.md
  • auraindex_report.md
  • auraindex_report.html

Live DB mode examples:

# PostgreSQL
auraindex --path . --dialect postgres --enable-db-introspection --db-dialect postgres --db-host localhost --db-port 5432 --db-user app --db-password-env DB_PASSWORD --db-name appdb --db-schema public

# MySQL
auraindex --path . --dialect mysql --enable-db-introspection --db-dialect mysql --db-host localhost --db-port 3306 --db-user app --db-password-env DB_PASSWORD --db-name appdb

# SQLite
auraindex --path . --dialect sqlite --enable-db-introspection --db-dialect sqlite --db-path D:\data\app.db

Per-service verification (SuccessPilot microservices):

  • ai-worker: fail=0
  • analytics: fail=0
  • backend: fail=0 (after excluding staticfiles generated assets)
  • frontend: fail=0 (after excluding build artifacts)
  • frontend_new: fail=0
  • live-updates: fail=0
  • onboarding: fail=0

Limitations

  • Static analysis cannot observe runtime frequency, parameter distributions, or write pressure.
  • ORM-to-SQL reconstruction is approximate for complex query builders.
  • EXPLAIN validation is optional and environment-dependent (DB connectivity and compatible SQL needed).
  • Suggestions are intentionally conservative and should still be reviewed before production rollout.

Testing

pytest -q

Build and publish (PyPI)

python -m pip install -U build twine
python -m build
twine check dist/*
# twine upload dist/*

Release checklist:

  • Package name available on PyPI.
  • Version bumped (pyproject.toml).
  • Long description renders (twine check passes).
  • LICENSE and README.md included in source distribution.

Roadmap

  • Add richer ORM semantic lifting (SQLAlchemy Core/ORM constructs).
  • Add log-based dynamic signal fusion.
  • Add adaptive suggestion ranking using execution metrics.
  • Expand language frontends while retaining the shared SQL AST backend.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

auraindex-0.1.3.tar.gz (764.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

auraindex-0.1.3-py3-none-any.whl (763.3 kB view details)

Uploaded Python 3

File details

Details for the file auraindex-0.1.3.tar.gz.

File metadata

  • Download URL: auraindex-0.1.3.tar.gz
  • Upload date:
  • Size: 764.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for auraindex-0.1.3.tar.gz
Algorithm Hash digest
SHA256 165747b7093e1359ef05bc276cc702a3e5546212c199900d043d248ac2480e2b
MD5 1dbe4bd5963bef73d2a5ea79972ceb00
BLAKE2b-256 e7a81da9d73d9729f2378437a1d5fc8e9fa4474e8ee106be0e817be4c29784f0

See more details on using hashes here.

File details

Details for the file auraindex-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: auraindex-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 763.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for auraindex-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 65e2c65e0562cb3e0ba93fb6cc84cf1bfa554644179328f8d15b079ff8e7a471
MD5 4d712560d7ece649f370c78acfdf840f
BLAKE2b-256 5c080ae4e17b4b9fd14b015dac4f2df9920e5e22878baaa5899fb0345f83a323

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page