ora — The NL2SQL Agentic Runtime. Connect any database. Ask in plain English.
Project description
ora
The NL2SQL Agentic Runtime
Not a chatbot wrapper. A fully agentic system that reasons, self-corrects, and learns from every query.
An on-machine AI agent that turns natural language into SQL across any database — PostgreSQL, Snowflake, BigQuery, DuckDB, Shopify, Salesforce, or plain CSV files. Ora decomposes complex questions, resolves entity semantics, generates and self-corrects SQL, and builds a living semantic layer that gets smarter with every query.
Quickstart · v2.0 Docs · Architecture · What's New · Changelog
import ora
db = ora.connect("sqlite:///retail.db")
result = db.query("top 10 stores by revenue")
print(result.dataframe)
store_name total_revenue
Lotus Sukhumvit 133500.0
Lotus Korat 35700.0
Lotus Chiang Mai 31000.0
One call. Works with any database. Works with any LLM.
What makes ora different
Every other NL2SQL tool stops at "generate SQL." ora is the agentic runtime that makes it production-worthy:
| Challenge | Others | ora |
|---|---|---|
| Complex schemas (500+ cols) | Hallucinate or fail | CHESS LSH schema pruning → 8 relevant columns |
| SQL fails on first try | Crash or retry blindly | 3-stage ReFoRCE self-correction |
| Multiple databases | Impossible | Decompose → parallel agents → DuckDB in-memory JOIN |
| No learning over time | Starts fresh every query | 3-tier memory + SOUL user mental model |
| Black box decisions | No visibility | Full execution trace (10-node LangGraph pipeline) |
| No governance | None | PolicyGateway: DDL block, PII protection, cost ceiling |
Architecture
Built on LangGraph for orchestration, OpenTelemetry for observability, LiteLLM for any LLM provider.
User asks question
↓
LangGraph StateGraph (12 nodes, conditional routing)
↓
understand → prune → retrieve → plan → generate → execute → respond → learn
↓ (on error)
correct → execute (retry, up to 3x)
↓ (cross-source)
decompose → fan_out → synthesize → respond
↓
Execution Trace (Beam.ai-style node timeline) + NL Response + SQL + Data
Every node emits OpenTelemetry spans. Every query is audited. Every decision is traceable.
v2.0 Architecture (ReAct)
User asks question
|
Ora ReAct Orchestrator (single node — does ALL thinking)
|
+-- Phase 1: Decompose (intent, entities, query structure, cross-source detection)
+-- Phase 2: Semantic Resolve (alias pre-check, pattern injection, failure reflection)
+-- Phase 3: Schema Context (CHESS LSH pruning, column validation against actual schema)
+-- Phase 4: SQL Generation (3 candidates, parallel, self-correct up to 3x)
+-- Phase 5: Semantic Fitness Check (LLM reviews "does this answer the question?")
+-- Phase 6: Validate & Learn (evolve semantic layer, record rule outcomes)
|
+-> respond (NL summary + chart + follow-ups)
+-> learn (aliases, patterns, enrichments, rules)
3-node LangGraph: ora -> respond -> learn. Ora does everything in one ReAct loop.
What's New in v2.0
Ora ReAct Orchestrator
The v1.0 multi-node pipeline (12 nodes, conditional routing) is replaced by a single ReAct orchestrator that thinks, delegates, validates, and re-routes — all in one node. This eliminates routing errors and gives Ora full context at every decision point.
- Semantic fitness check — after SQL executes, Ora asks: "Does this result actually answer the question?" Not just "did rows come back?" but "are the right columns, entities, and dimensions present?" If not, Ora re-routes with specific fix instructions.
- Entity coverage validation — if a query asks for "ALL ASEAN countries" but the semantic resolution only produced 2 filter values, Ora catches it before SQL generation. Known group sizes: ASEAN (10), G7 (7), EU (27), BRICS (5), OECD (38).
- Schema gap detection — before SQL generation, Ora checks if the query requires dimensions (time trends, geographic breakdown) that don't exist in the target tables. Flags data limitations honestly.
- Source-aware decomposition — the decomposition prompt now sees all available data sources and assigns each query part to a target domain. Cross-source queries are detected and flagged.
- Column validation — semantic agent output is validated against the actual pruned schema. Non-existent columns are stripped with a WARNING to the SQL agent.
Semantic Layer Evolution
The semantic layer is no longer static. It evolves with every query through a 4-layer taxonomy:
Foundation (connect time) — analyze_source() discovers domain, column meanings, abbreviation maps, dimension/measure classification, filter tips. build_initial_taxonomy() detects cross-source join candidates and entity types with a single LLM call.
Inferred (from data patterns) — entity aliases, value mappings, cross-source joins detected via column name/type matching.
Confirmed (from successful queries) — evolve_semantic_layer() runs after every successful query:
- New aliases saved with confidence scores (strengthened +3% per confirmation, cap 0.99)
- Table relationships confirmed with query count
- Filter patterns saved (auto-injected as defaults after 3+ uses)
- Column enrichments tracked (what each column was used for)
- Semantic manifest incremented with iteration ID + history
Corrected (from user feedback) — structured rules created from corrections with confidence lifecycle:
- Created at 0.9 confidence from user corrections
- Applied during SQL generation (top 5 rules injected)
- Confirmed (+0.05) when query succeeds with rule applied
- Weakened (-0.10) when query fails with rule applied
- Expired when confidence drops below 0.30
Each evolution step is tracked in semantic_manifest.json with iteration ID, timestamp, trigger, and exact items learned.
Semantic Agent Reasoning Loop
The semantic agent now uses a multi-pass iterative reasoning approach:
- Pre-check — high-confidence aliases (>=0.93) resolved deterministically before the LLM call. Known patterns and column enrichments injected into prompt. Past failures loaded as anti-patterns.
- LLM reasoning — entity mapping with full schema context, column meanings, and learned vocabulary.
- Schema search — for unresolved entities, targeted DB lookups across text columns.
- Refinement — merge findings, update confidence, save resolution log entry.
- Failure reflection — when queries fail, negative signals recorded so the agent avoids repeating mistakes.
Structured Rule Engine
sqlagent/rules.py — learned rules with confidence, scope, and lifecycle:
from sqlagent.rules import load_rules, create_rule, record_rule_outcome
# Rules are created from corrections, applied during SQL generation,
# confirmed/weakened based on outcomes, and expired when stale.
rules = load_rules(workspace_id) # sorted by confidence * hit_count
Observable Learning in Traces
Every query trace now shows an "Applied learned context" node:
Applied learned context
3 past examples (best: 'top selling product' sim:0.94)
2 rules: [sex='Total' default; DuckDB no YEAR()]
4 context notes
12 pre-resolved aliases
semantic layer v8
REST API Connector Framework
New RestConnector base class for SaaS/POS/ERP/CRM integrations:
from sqlagent.connectors.catalog.shopify import ShopifyConnector
conn = ShopifyConnector(
source_id="shopify_store",
store_name="mystore",
api_key="shpat_xxx",
)
await conn.connect() # pulls data into DuckDB for SQL querying
Built-in connectors:
| Connector | Tables | Connection URL |
|---|---|---|
| Shopify | orders, customers, products, inventory, collections | shopify://store?api_key=xxx |
| Salesforce | accounts, contacts, opportunities, leads, cases | salesforce://instance?access_token=xxx |
| Stripe | charges, customers, subscriptions, invoices, payouts | stripe://?api_key=xxx |
| HubSpot | contacts, companies, deals, tickets | hubspot://?api_key=xxx |
| Google Analytics 4 | sessions, users, events | ga4://property_id?access_token=xxx |
| Airbyte | 300+ sources via PyAirbyte or managed instance | airbyte://?source=source-name |
All REST connectors handle auth (OAuth2, API key, Bearer), pagination (cursor, offset, link-header), rate limiting (token bucket with backoff), and schema inference automatically.
Knowledge Page (Semantic Taxonomy)
The Knowledge page is the Semantic Agent's working memory — a live, observable taxonomy showing how the agent's understanding evolves:
- Graph tab — force-directed semantic graph with table nodes, relationship edges, confidence halos, and learned term annotations
- Taxonomy tab — 4-layer knowledge feed (Foundation, Inferred, Confirmed, Corrected) with expandable entries showing exact items learned per query
- Agent tab — conversational interface to the Semantic Agent with full semantic layer context (aliases, patterns, rules, relationships, evolution history)
Learn Page Improvements
- Impact metrics — accuracy trend (early vs recent), rule inventory with hit count + success rate
- Persistent training pairs — Qdrant stored on disk per workspace (survives server restarts)
- Re-execute button — edit corrected SQL and re-run before saving to training
- CTE-aware rewriting — Learn Agent's rewrite prompt prevents invalid CTE references
UI Updates
- Landing page updated for v2.0 architecture (Ora phases, Semantic Agent passes)
- Sources page: SaaS connector cards with branded letter icons (Shopify, Salesforce, Stripe, etc.)
- Chart axis selection by LLM (analyzes varying vs constant columns)
- Token arrow tooltips (hover to see "input tokens" vs "output tokens")
- Setup chat: persistent messages across reloads, streamed greeting, workspace naming before data
- Error handling: "Fetch is aborted" shows helpful message, SSE stream catches exceptions
Install
pip install ora-sql # latest stable release
pip install ora-sql --upgrade # upgrade to latest
pip install ora-sql==2.1.2 # specific version
Check installed version:
python -c "import sqlagent; print(sqlagent.__version__)"
Works with any LLM. Works offline with Ollama. No API key required for local models.
All releases: PyPI · GitHub Releases
Quickstart
One query
import ora
# Connect to any database
db = ora.connect("sqlite:///northwind.db")
db = ora.connect("postgresql://user:pass@host/mydb")
db = ora.connect("data/sales.csv")
result = db.query("top 10 customers by revenue")
print(result.dataframe) # pandas DataFrame
print(result.sql) # generated SQL
print(result.nl_response) # natural language summary
print(result.trace) # full execution trace
Cross-source queries
import ora
with ora.connect(
sales="postgresql://prod/sales",
inventory="snowflake://warehouse/inventory",
staff="headcount.csv",
) as db:
result = db.query("which stores have the highest revenue per employee this quarter?")
print(result.dataframe)
# → One DataFrame. Three databases. Zero manual join keys.
One-liner
import ora
result = ora.ask("top 10 customers by revenue", db="sqlite:///northwind.db")
print(result.dataframe)
Choose your LLM
import ora
# Anthropic Claude
db = ora.connect("sqlite:///retail.db")
result = db.query("top stores", model="claude-sonnet-4-5")
# OpenAI GPT-4o
result = db.query("top stores", model="gpt-4o")
# Local (free, no API key)
result = db.query("top stores", model="ollama/llama3")
Workspace UI
export ANTHROPIC_API_KEY="sk-ant-..." # or OPENAI_API_KEY
ora serve --db sqlite:///retail.db --model claude-sonnet-4-5 --port 8080
Open http://localhost:8080 → Beam.ai-inspired workspace with:
- Chat interface (Ask/Execute modes)
- Execution trace timeline (node-by-node with green/red status dots)
- SQL + data table + NL response + follow-up suggestions
- Schema browser (Data view)
- Task history (Tasks view)
- QueryHub training packs (Learn view)
- 👍/👎 feedback → auto-trains the system
The 10-Step Agentic Pipeline
Every query runs through a LangGraph StateGraph with 12 nodes and conditional routing:
| Step | What happens | Real? |
|---|---|---|
| 1. Routing | Detect single vs cross-source, assess complexity | Yes |
| 2. Schema Pruning | CHESS LSH: 500 cols → 8 relevant via embeddings | Yes |
| 3. Example Retrieval | Vector search for similar NL→SQL pairs (Qdrant) | Yes |
| 4. Query Planning | LLM plans JOIN strategy, filters, aggregations | Yes |
| 5. SQL Generation | 3 generators run in parallel (asyncio.gather) | Yes |
| 6. Execution | Run SQL + policy check (DDL block, PII, budget) | Yes |
| 7. Self-Correction | ReFoRCE 3-stage: error-aware → schema-aware → DB-confirmed | Yes |
| 8. Response | NL summary with bold highlights + follow-ups + chart | Yes |
| 9. Learning | Write to episodic memory, update SOUL profile | Yes |
| 10. Audit | OTel span, Prometheus metric, SQLite audit log | Yes |
Nothing is mocked. Nothing is faked. Every step is a real LLM call + real database operation.
Supported databases & integrations
| Source | Status | Type |
|---|---|---|
| SQLite | ✅ | Database |
| PostgreSQL | ✅ | Database |
| MySQL | ✅ | Database |
| DuckDB | ✅ | Database |
| Snowflake | ✅ | Data Warehouse |
| BigQuery | ✅ | Data Warehouse |
| Redshift | ✅ | Data Warehouse |
| CSV / XLSX / Parquet / JSON | ✅ | File (via DuckDB) |
| Shopify | ✅ v2.0 | SaaS (REST API) |
| Salesforce | ✅ v2.0 | SaaS (REST API) |
| Stripe | ✅ v2.0 | SaaS (REST API) |
| HubSpot | ✅ v2.0 | SaaS (REST API) |
| Google Analytics 4 | ✅ v2.0 | SaaS (REST API) |
| Airbyte (300+ sources) | ✅ v2.0 | Integration Platform |
Supported LLMs
Any model via LiteLLM:
claude-sonnet-4-5 · gpt-4o · gpt-4o-mini · bedrock/anthropic.claude-sonnet-4-5 · ollama/llama3 · azure/gpt-4o · vertex_ai/gemini-pro
Observability
Built-in from day one, not bolted on:
- OpenTelemetry — span tree per query, exportable to Jaeger/Datadog/etc.
- Langfuse — optional deep LLM tracing (set
LANGFUSE_PUBLIC_KEY) - Prometheus — 6 metrics: queries, latency, corrections, cost, training pairs, sessions
- Audit log — immutable SQLite record per query
- Execution traces — full node-by-node trace persisted and viewable in UI
Run the demo
git clone https://github.com/ashfrnndz21/ora
cd ora
pip install -e ".[dev]"
export ANTHROPIC_API_KEY="sk-ant-..." # or OPENAI_API_KEY
python demo.py
Project structure
sqlagent/ ← core implementation
ora/ ← public API shim (import ora)
sqlagent/
├── agent.py # SQLAgent + ask() + connect()
├── agents.py # SchemaAgent (full), SetupAgent, DecomposeAgent, SynthesisAgent
├── auth.py # JWT + Google OAuth + magic link
├── config.py # AgentConfig (40+ fields, bootstrap_aliases flag)
├── exceptions.py # 20+ typed exceptions
├── execution.py # SQLExecutor + 3-stage ReFoRCE correction
├── generators.py # Fewshot, Plan, Decompose + parallel Ensemble
├── hub.py # QueryHub community training packs
├── llm.py # LiteLLM provider + FastEmbed embedder
├── models.py # All data models (schema, KG, trace, events)
├── retrieval.py # Qdrant vector store + ExampleStore
├── rules.py # v2.0: Structured rule engine (confidence lifecycle)
├── runtime.py # PolicyGateway + Sessions + 3-tier Memory
├── schema.py # CHESS LSH pruning + M-Schema serializer
├── schema_agent.py # v2.0: Bridge to full SchemaAgent with analyze()
├── semantic_agent.py # v2.0: Semantic layer evolution + taxonomy + manifest
├── server.py # FastAPI (60+ endpoints, SSE streaming)
├── soul.py # SOUL user mental model (learns over 20 queries)
├── telemetry.py # OTel + Prometheus + audit + Langfuse
├── trace.py # TraceCollector + TraceStore
├── visualization.py # LLM-powered chart generation (Vega-Lite)
├── confidence.py # LLM-assessed query confidence scoring
├── workspace.py # Multi-workspace persistence
├── agents/ # v2.0: Multi-agent package
│ ├── orchestrator.py # Ora ReAct orchestrator (fitness check, coverage, gaps)
│ ├── semantic.py # SemanticAgent (iterative reasoning wrapper)
│ ├── schema.py # SchemaAgent (lightweight, for orchestrator)
│ ├── sql.py # SQLAgent (generation + validation)
│ ├── learn.py # LearnAgent (semantic layer evolution)
│ ├── setup.py # SetupAgent (conversational workspace creation)
│ ├── learning_loop.py # LearningLoop (trace-aware training)
│ └── protocol.py # AgentRequest/Response/ValidationResult/QueryDecomposition
├── graph/
│ ├── state.py # QueryState TypedDict (LangGraph)
│ ├── nodes.py # Graph node functions (respond, learn)
│ ├── builder.py # v2.0: 3-node graph (ora → respond → learn)
│ ├── ora_react.py # v2.0: Ora ReAct entry point + AgentTrace
│ └── learn_graph.py # v2.0: Learn Agent 5-node correction pipeline
├── connectors/
│ ├── __init__.py # Connector protocol + ConnectorRegistry
│ ├── sql_connectors.py # SQLite, Postgres, MySQL, Redshift
│ ├── warehouse_connectors.py # Snowflake, BigQuery
│ ├── file_connector.py # DuckDB + CSV/XLSX/Parquet/JSON
│ ├── rest_connector.py # v2.0: REST API base class (auth, pagination, rate limit)
│ ├── airbyte_connector.py # v2.0: Airbyte adapter (embedded + managed)
│ └── catalog/ # v2.0: SaaS connector definitions
│ ├── shopify.py # Shopify Admin API
│ ├── salesforce.py # Salesforce REST API
│ ├── stripe.py # Stripe API
│ ├── hubspot.py # HubSpot CRM API
│ └── google_analytics.py # Google Analytics 4 Data API
└── ui/
├── app.html # Workspace app (React 18 UMD, single file)
└── preview_bp.html # Landing page with architecture visualization
Contributing
git clone https://github.com/ora-ai/ora
pip install -e ".[dev]"
pytest tests/ -q # 194+ tests, <2s
License
MIT — use freely, modify as needed, contribute back if you can.
Get started
pip install ora-sql
Quickstart · v2.0 Docs · Changelog · Contributing
Built by @ashfrnndz21
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ora_sql-2.1.2.tar.gz.
File metadata
- Download URL: ora_sql-2.1.2.tar.gz
- Upload date:
- Size: 596.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
078750e5c67d537a366f6427706d58f9fea6ec426e59a80e6c69413259ba25a2
|
|
| MD5 |
1e958397ed9b96ef21bea1fcf906069a
|
|
| BLAKE2b-256 |
7581d008978b0f93ad4221772f210c1caa5501f9b46a545290c96e2a6df9e72e
|
Provenance
The following attestation bundles were made for ora_sql-2.1.2.tar.gz:
Publisher:
publish.yml on ashfrnndz21/ora
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
ora_sql-2.1.2.tar.gz -
Subject digest:
078750e5c67d537a366f6427706d58f9fea6ec426e59a80e6c69413259ba25a2 - Sigstore transparency entry: 1249555000
- Sigstore integration time:
-
Permalink:
ashfrnndz21/ora@f4299a63f9556ba030114d9dd79729cb25b80829 -
Branch / Tag:
refs/tags/v2.1.2 - Owner: https://github.com/ashfrnndz21
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f4299a63f9556ba030114d9dd79729cb25b80829 -
Trigger Event:
push
-
Statement type:
File details
Details for the file ora_sql-2.1.2-py3-none-any.whl.
File metadata
- Download URL: ora_sql-2.1.2-py3-none-any.whl
- Upload date:
- Size: 614.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ada7ae8c14825171b8388cd081e4f06ec318ec064266c2f50ffff61ad2336bca
|
|
| MD5 |
c343554591c339a826da14300f10ecf1
|
|
| BLAKE2b-256 |
86213cfa07e1f8d0fb56e2827cce759a6b54ce6a71495f01039416a82689b49c
|
Provenance
The following attestation bundles were made for ora_sql-2.1.2-py3-none-any.whl:
Publisher:
publish.yml on ashfrnndz21/ora
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
ora_sql-2.1.2-py3-none-any.whl -
Subject digest:
ada7ae8c14825171b8388cd081e4f06ec318ec064266c2f50ffff61ad2336bca - Sigstore transparency entry: 1249555243
- Sigstore integration time:
-
Permalink:
ashfrnndz21/ora@f4299a63f9556ba030114d9dd79729cb25b80829 -
Branch / Tag:
refs/tags/v2.1.2 - Owner: https://github.com/ashfrnndz21
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f4299a63f9556ba030114d9dd79729cb25b80829 -
Trigger Event:
push
-
Statement type: