Skip to main content

ora — The NL2SQL Agentic Runtime. Connect any database. Ask in plain English.

Project description

ora

The NL2SQL Agentic Runtime

Not a chatbot wrapper. A fully agentic system that reasons, self-corrects, and learns from every query.

PyPI CI License: MIT Python 3.11+ Docs

An on-machine AI agent that turns natural language into SQL across any database — PostgreSQL, Snowflake, BigQuery, DuckDB, Shopify, Salesforce, or plain CSV files. Ora decomposes complex questions, resolves entity semantics, generates and self-corrects SQL, and builds a living semantic layer that gets smarter with every query.

Quickstart · v2.0 Docs · Architecture · What's New · Changelog


import ora

db = ora.connect("sqlite:///retail.db")
result = db.query("top 10 stores by revenue")
print(result.dataframe)
      store_name  total_revenue
 Lotus Sukhumvit       133500.0
     Lotus Korat        35700.0
Lotus Chiang Mai        31000.0

One call. Works with any database. Works with any LLM.


What makes ora different

Every other NL2SQL tool stops at "generate SQL." ora is the agentic runtime that makes it production-worthy:

Challenge Others ora
Complex schemas (500+ cols) Hallucinate or fail CHESS LSH schema pruning → 8 relevant columns
SQL fails on first try Crash or retry blindly 3-stage ReFoRCE self-correction
Multiple databases Impossible Decompose → parallel agents → DuckDB in-memory JOIN
No learning over time Starts fresh every query 3-tier memory + SOUL user mental model
Black box decisions No visibility Full execution trace (10-node LangGraph pipeline)
No governance None PolicyGateway: DDL block, PII protection, cost ceiling

Architecture

Built on LangGraph for orchestration, OpenTelemetry for observability, LiteLLM for any LLM provider.

User asks question
  ↓
LangGraph StateGraph (12 nodes, conditional routing)
  ↓
understand → prune → retrieve → plan → generate → execute → respond → learn
                                           ↓ (on error)
                                        correct → execute (retry, up to 3x)
                                           ↓ (cross-source)
                              decompose → fan_out → synthesize → respond
  ↓
Execution Trace (Beam.ai-style node timeline) + NL Response + SQL + Data

Every node emits OpenTelemetry spans. Every query is audited. Every decision is traceable.

v2.0 Architecture (ReAct)

User asks question
  |
Ora ReAct Orchestrator (single node — does ALL thinking)
  |
  +-- Phase 1: Decompose (intent, entities, query structure, cross-source detection)
  +-- Phase 2: Semantic Resolve (alias pre-check, pattern injection, failure reflection)
  +-- Phase 3: Schema Context (CHESS LSH pruning, column validation against actual schema)
  +-- Phase 4: SQL Generation (3 candidates, parallel, self-correct up to 3x)
  +-- Phase 5: Semantic Fitness Check (LLM reviews "does this answer the question?")
  +-- Phase 6: Validate & Learn (evolve semantic layer, record rule outcomes)
  |
  +-> respond (NL summary + chart + follow-ups)
  +-> learn (aliases, patterns, enrichments, rules)

3-node LangGraph: ora -> respond -> learn. Ora does everything in one ReAct loop.


What's New in v2.0

Ora ReAct Orchestrator

The v1.0 multi-node pipeline (12 nodes, conditional routing) is replaced by a single ReAct orchestrator that thinks, delegates, validates, and re-routes — all in one node. This eliminates routing errors and gives Ora full context at every decision point.

  • Semantic fitness check — after SQL executes, Ora asks: "Does this result actually answer the question?" Not just "did rows come back?" but "are the right columns, entities, and dimensions present?" If not, Ora re-routes with specific fix instructions.
  • Entity coverage validation — if a query asks for "ALL ASEAN countries" but the semantic resolution only produced 2 filter values, Ora catches it before SQL generation. Known group sizes: ASEAN (10), G7 (7), EU (27), BRICS (5), OECD (38).
  • Schema gap detection — before SQL generation, Ora checks if the query requires dimensions (time trends, geographic breakdown) that don't exist in the target tables. Flags data limitations honestly.
  • Source-aware decomposition — the decomposition prompt now sees all available data sources and assigns each query part to a target domain. Cross-source queries are detected and flagged.
  • Column validation — semantic agent output is validated against the actual pruned schema. Non-existent columns are stripped with a WARNING to the SQL agent.

Semantic Layer Evolution

The semantic layer is no longer static. It evolves with every query through a 4-layer taxonomy:

Foundation (connect time) — analyze_source() discovers domain, column meanings, abbreviation maps, dimension/measure classification, filter tips. build_initial_taxonomy() detects cross-source join candidates and entity types with a single LLM call.

Inferred (from data patterns) — entity aliases, value mappings, cross-source joins detected via column name/type matching.

Confirmed (from successful queries) — evolve_semantic_layer() runs after every successful query:

  • New aliases saved with confidence scores (strengthened +3% per confirmation, cap 0.99)
  • Table relationships confirmed with query count
  • Filter patterns saved (auto-injected as defaults after 3+ uses)
  • Column enrichments tracked (what each column was used for)
  • Semantic manifest incremented with iteration ID + history

Corrected (from user feedback) — structured rules created from corrections with confidence lifecycle:

  • Created at 0.9 confidence from user corrections
  • Applied during SQL generation (top 5 rules injected)
  • Confirmed (+0.05) when query succeeds with rule applied
  • Weakened (-0.10) when query fails with rule applied
  • Expired when confidence drops below 0.30

Each evolution step is tracked in semantic_manifest.json with iteration ID, timestamp, trigger, and exact items learned.

Semantic Agent Reasoning Loop

The semantic agent now uses a multi-pass iterative reasoning approach:

  1. Pre-check — high-confidence aliases (>=0.93) resolved deterministically before the LLM call. Known patterns and column enrichments injected into prompt. Past failures loaded as anti-patterns.
  2. LLM reasoning — entity mapping with full schema context, column meanings, and learned vocabulary.
  3. Schema search — for unresolved entities, targeted DB lookups across text columns.
  4. Refinement — merge findings, update confidence, save resolution log entry.
  5. Failure reflection — when queries fail, negative signals recorded so the agent avoids repeating mistakes.

Structured Rule Engine

sqlagent/rules.py — learned rules with confidence, scope, and lifecycle:

from sqlagent.rules import load_rules, create_rule, record_rule_outcome

# Rules are created from corrections, applied during SQL generation,
# confirmed/weakened based on outcomes, and expired when stale.
rules = load_rules(workspace_id)  # sorted by confidence * hit_count

Observable Learning in Traces

Every query trace now shows an "Applied learned context" node:

Applied learned context
  3 past examples (best: 'top selling product' sim:0.94)
  2 rules: [sex='Total' default; DuckDB no YEAR()]
  4 context notes
  12 pre-resolved aliases
  semantic layer v8

REST API Connector Framework

New RestConnector base class for SaaS/POS/ERP/CRM integrations:

from sqlagent.connectors.catalog.shopify import ShopifyConnector

conn = ShopifyConnector(
    source_id="shopify_store",
    store_name="mystore",
    api_key="shpat_xxx",
)
await conn.connect()  # pulls data into DuckDB for SQL querying

Built-in connectors:

Connector Tables Connection URL
Shopify orders, customers, products, inventory, collections shopify://store?api_key=xxx
Salesforce accounts, contacts, opportunities, leads, cases salesforce://instance?access_token=xxx
Stripe charges, customers, subscriptions, invoices, payouts stripe://?api_key=xxx
HubSpot contacts, companies, deals, tickets hubspot://?api_key=xxx
Google Analytics 4 sessions, users, events ga4://property_id?access_token=xxx
Airbyte 300+ sources via PyAirbyte or managed instance airbyte://?source=source-name

All REST connectors handle auth (OAuth2, API key, Bearer), pagination (cursor, offset, link-header), rate limiting (token bucket with backoff), and schema inference automatically.

Knowledge Page (Semantic Taxonomy)

The Knowledge page is the Semantic Agent's working memory — a live, observable taxonomy showing how the agent's understanding evolves:

  • Graph tab — force-directed semantic graph with table nodes, relationship edges, confidence halos, and learned term annotations
  • Taxonomy tab — 4-layer knowledge feed (Foundation, Inferred, Confirmed, Corrected) with expandable entries showing exact items learned per query
  • Agent tab — conversational interface to the Semantic Agent with full semantic layer context (aliases, patterns, rules, relationships, evolution history)

Learn Page Improvements

  • Impact metrics — accuracy trend (early vs recent), rule inventory with hit count + success rate
  • Persistent training pairs — Qdrant stored on disk per workspace (survives server restarts)
  • Re-execute button — edit corrected SQL and re-run before saving to training
  • CTE-aware rewriting — Learn Agent's rewrite prompt prevents invalid CTE references

UI Updates

  • Landing page updated for v2.0 architecture (Ora phases, Semantic Agent passes)
  • Sources page: SaaS connector cards with branded letter icons (Shopify, Salesforce, Stripe, etc.)
  • Chart axis selection by LLM (analyzes varying vs constant columns)
  • Token arrow tooltips (hover to see "input tokens" vs "output tokens")
  • Setup chat: persistent messages across reloads, streamed greeting, workspace naming before data
  • Error handling: "Fetch is aborted" shows helpful message, SSE stream catches exceptions

Install

pip install ora-sql            # latest stable release
pip install ora-sql --upgrade  # upgrade to latest
pip install ora-sql==2.1.2     # specific version

Check installed version:

python -c "import sqlagent; print(sqlagent.__version__)"

Works with any LLM. Works offline with Ollama. No API key required for local models.

All releases: PyPI · GitHub Releases


Quickstart

One query

import ora

# Connect to any database
db = ora.connect("sqlite:///northwind.db")
db = ora.connect("postgresql://user:pass@host/mydb")
db = ora.connect("data/sales.csv")

result = db.query("top 10 customers by revenue")
print(result.dataframe)   # pandas DataFrame
print(result.sql)          # generated SQL
print(result.nl_response)  # natural language summary
print(result.trace)        # full execution trace

Cross-source queries

import ora

with ora.connect(
    sales="postgresql://prod/sales",
    inventory="snowflake://warehouse/inventory",
    staff="headcount.csv",
) as db:
    result = db.query("which stores have the highest revenue per employee this quarter?")
    print(result.dataframe)
    # → One DataFrame. Three databases. Zero manual join keys.

One-liner

import ora

result = ora.ask("top 10 customers by revenue", db="sqlite:///northwind.db")
print(result.dataframe)

Choose your LLM

import ora

# Anthropic Claude
db = ora.connect("sqlite:///retail.db")
result = db.query("top stores", model="claude-sonnet-4-5")

# OpenAI GPT-4o
result = db.query("top stores", model="gpt-4o")

# Local (free, no API key)
result = db.query("top stores", model="ollama/llama3")

Workspace UI

export ANTHROPIC_API_KEY="sk-ant-..."  # or OPENAI_API_KEY
ora serve --db sqlite:///retail.db --model claude-sonnet-4-5 --port 8080

Open http://localhost:8080 → Beam.ai-inspired workspace with:

  • Chat interface (Ask/Execute modes)
  • Execution trace timeline (node-by-node with green/red status dots)
  • SQL + data table + NL response + follow-up suggestions
  • Schema browser (Data view)
  • Task history (Tasks view)
  • QueryHub training packs (Learn view)
  • 👍/👎 feedback → auto-trains the system

The 10-Step Agentic Pipeline

Every query runs through a LangGraph StateGraph with 12 nodes and conditional routing:

Step What happens Real?
1. Routing Detect single vs cross-source, assess complexity Yes
2. Schema Pruning CHESS LSH: 500 cols → 8 relevant via embeddings Yes
3. Example Retrieval Vector search for similar NL→SQL pairs (Qdrant) Yes
4. Query Planning LLM plans JOIN strategy, filters, aggregations Yes
5. SQL Generation 3 generators run in parallel (asyncio.gather) Yes
6. Execution Run SQL + policy check (DDL block, PII, budget) Yes
7. Self-Correction ReFoRCE 3-stage: error-aware → schema-aware → DB-confirmed Yes
8. Response NL summary with bold highlights + follow-ups + chart Yes
9. Learning Write to episodic memory, update SOUL profile Yes
10. Audit OTel span, Prometheus metric, SQLite audit log Yes

Nothing is mocked. Nothing is faked. Every step is a real LLM call + real database operation.


Supported databases & integrations

Source Status Type
SQLite Database
PostgreSQL Database
MySQL Database
DuckDB Database
Snowflake Data Warehouse
BigQuery Data Warehouse
Redshift Data Warehouse
CSV / XLSX / Parquet / JSON File (via DuckDB)
Shopify ✅ v2.0 SaaS (REST API)
Salesforce ✅ v2.0 SaaS (REST API)
Stripe ✅ v2.0 SaaS (REST API)
HubSpot ✅ v2.0 SaaS (REST API)
Google Analytics 4 ✅ v2.0 SaaS (REST API)
Airbyte (300+ sources) ✅ v2.0 Integration Platform

Supported LLMs

Any model via LiteLLM:

claude-sonnet-4-5 · gpt-4o · gpt-4o-mini · bedrock/anthropic.claude-sonnet-4-5 · ollama/llama3 · azure/gpt-4o · vertex_ai/gemini-pro


Observability

Built-in from day one, not bolted on:

  • OpenTelemetry — span tree per query, exportable to Jaeger/Datadog/etc.
  • Langfuse — optional deep LLM tracing (set LANGFUSE_PUBLIC_KEY)
  • Prometheus — 6 metrics: queries, latency, corrections, cost, training pairs, sessions
  • Audit log — immutable SQLite record per query
  • Execution traces — full node-by-node trace persisted and viewable in UI

Run the demo

git clone https://github.com/ashfrnndz21/ora
cd ora
pip install -e ".[dev]"
export ANTHROPIC_API_KEY="sk-ant-..."  # or OPENAI_API_KEY
python demo.py

Project structure

sqlagent/                          ← core implementation
ora/                               ← public API shim (import ora)
sqlagent/
├── agent.py                       # SQLAgent + ask() + connect()
├── agents.py                      # SchemaAgent (full), SetupAgent, DecomposeAgent, SynthesisAgent
├── auth.py                        # JWT + Google OAuth + magic link
├── config.py                      # AgentConfig (40+ fields, bootstrap_aliases flag)
├── exceptions.py                  # 20+ typed exceptions
├── execution.py                   # SQLExecutor + 3-stage ReFoRCE correction
├── generators.py                  # Fewshot, Plan, Decompose + parallel Ensemble
├── hub.py                         # QueryHub community training packs
├── llm.py                         # LiteLLM provider + FastEmbed embedder
├── models.py                      # All data models (schema, KG, trace, events)
├── retrieval.py                   # Qdrant vector store + ExampleStore
├── rules.py                       # v2.0: Structured rule engine (confidence lifecycle)
├── runtime.py                     # PolicyGateway + Sessions + 3-tier Memory
├── schema.py                      # CHESS LSH pruning + M-Schema serializer
├── schema_agent.py                # v2.0: Bridge to full SchemaAgent with analyze()
├── semantic_agent.py              # v2.0: Semantic layer evolution + taxonomy + manifest
├── server.py                      # FastAPI (60+ endpoints, SSE streaming)
├── soul.py                        # SOUL user mental model (learns over 20 queries)
├── telemetry.py                   # OTel + Prometheus + audit + Langfuse
├── trace.py                       # TraceCollector + TraceStore
├── visualization.py               # LLM-powered chart generation (Vega-Lite)
├── confidence.py                  # LLM-assessed query confidence scoring
├── workspace.py                   # Multi-workspace persistence
├── agents/                        # v2.0: Multi-agent package
│   ├── orchestrator.py            # Ora ReAct orchestrator (fitness check, coverage, gaps)
│   ├── semantic.py                # SemanticAgent (iterative reasoning wrapper)
│   ├── schema.py                  # SchemaAgent (lightweight, for orchestrator)
│   ├── sql.py                     # SQLAgent (generation + validation)
│   ├── learn.py                   # LearnAgent (semantic layer evolution)
│   ├── setup.py                   # SetupAgent (conversational workspace creation)
│   ├── learning_loop.py           # LearningLoop (trace-aware training)
│   └── protocol.py                # AgentRequest/Response/ValidationResult/QueryDecomposition
├── graph/
│   ├── state.py                   # QueryState TypedDict (LangGraph)
│   ├── nodes.py                   # Graph node functions (respond, learn)
│   ├── builder.py                 # v2.0: 3-node graph (ora → respond → learn)
│   ├── ora_react.py               # v2.0: Ora ReAct entry point + AgentTrace
│   └── learn_graph.py             # v2.0: Learn Agent 5-node correction pipeline
├── connectors/
│   ├── __init__.py                # Connector protocol + ConnectorRegistry
│   ├── sql_connectors.py          # SQLite, Postgres, MySQL, Redshift
│   ├── warehouse_connectors.py    # Snowflake, BigQuery
│   ├── file_connector.py          # DuckDB + CSV/XLSX/Parquet/JSON
│   ├── rest_connector.py          # v2.0: REST API base class (auth, pagination, rate limit)
│   ├── airbyte_connector.py       # v2.0: Airbyte adapter (embedded + managed)
│   └── catalog/                   # v2.0: SaaS connector definitions
│       ├── shopify.py             # Shopify Admin API
│       ├── salesforce.py          # Salesforce REST API
│       ├── stripe.py              # Stripe API
│       ├── hubspot.py             # HubSpot CRM API
│       └── google_analytics.py    # Google Analytics 4 Data API
└── ui/
    ├── app.html                   # Workspace app (React 18 UMD, single file)
    └── preview_bp.html            # Landing page with architecture visualization

Contributing

git clone https://github.com/ora-ai/ora
pip install -e ".[dev]"
pytest tests/ -q    # 194+ tests, <2s

License

MIT — use freely, modify as needed, contribute back if you can.


Get started

pip install ora-sql

Quickstart · v2.0 Docs · Changelog · Contributing

Built by @ashfrnndz21

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ora_sql-2.1.2.tar.gz (596.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ora_sql-2.1.2-py3-none-any.whl (614.5 kB view details)

Uploaded Python 3

File details

Details for the file ora_sql-2.1.2.tar.gz.

File metadata

  • Download URL: ora_sql-2.1.2.tar.gz
  • Upload date:
  • Size: 596.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ora_sql-2.1.2.tar.gz
Algorithm Hash digest
SHA256 078750e5c67d537a366f6427706d58f9fea6ec426e59a80e6c69413259ba25a2
MD5 1e958397ed9b96ef21bea1fcf906069a
BLAKE2b-256 7581d008978b0f93ad4221772f210c1caa5501f9b46a545290c96e2a6df9e72e

See more details on using hashes here.

Provenance

The following attestation bundles were made for ora_sql-2.1.2.tar.gz:

Publisher: publish.yml on ashfrnndz21/ora

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ora_sql-2.1.2-py3-none-any.whl.

File metadata

  • Download URL: ora_sql-2.1.2-py3-none-any.whl
  • Upload date:
  • Size: 614.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ora_sql-2.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 ada7ae8c14825171b8388cd081e4f06ec318ec064266c2f50ffff61ad2336bca
MD5 c343554591c339a826da14300f10ecf1
BLAKE2b-256 86213cfa07e1f8d0fb56e2827cce759a6b54ce6a71495f01039416a82689b49c

See more details on using hashes here.

Provenance

The following attestation bundles were made for ora_sql-2.1.2-py3-none-any.whl:

Publisher: publish.yml on ashfrnndz21/ora

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page