
Open-source, self-hostable, framework-agnostic, multi-modal governed answer layer for agentic Q&A.


semqa

A governed Semantic Answer Layer for agentic Q&A

Open-source infrastructure that sits above your data, documents, and tools, and lets any chatbot or agent answer natural-language questions — only when it can stand behind the answer, with citations and permissions enforced, and otherwise clarifying or abstaining.



Note: Alpha (v0.1.0), an open-source reference implementation. The trust spine and all five answer modes are implemented and tested (152 passing). See Status for exactly what works today and what it deliberately does not try to solve.

semqa is not a chatbot, not a RAG wrapper, and not a text-to-SQL tool. The way to think about it: as Cube is to metrics, semqa is to governed agentic Q&A — a horizontal layer you point at your own world, callable from any framework.


Why it exists (the thinking)

The naive way to build "chat with your data" (let an LLM write SQL, or retrieve some chunks and summarize) produces impressive demos and unreliable products. In production the problem is not that the model is unintelligent but how it fails: it returns confident, silently wrong answers. The query runs, a number comes back, and the number is quietly incorrect. A stale policy gets cited as current. A user sees data they shouldn't. The system answers a question it had no business answering.

The expensive realization is that the hard part of enterprise Q&A is not generating an answer — it's knowing whether you should. Which source is authoritative? Is it stale? Is this user allowed to see it? Is there actually enough evidence, or is the model guessing? Should the honest response be a clarifying question, or "I don't know"?

Almost every tool in this space optimizes for generating an answer (SQL, a chart, a dashboard, a paragraph). Very few make abstention, sufficiency, verification, and citation quality the core product primitive. That gap is the entire point of semqa:

Trust is the primary product surface — not accuracy as a supporting feature. The differentiator is boundary behavior: knowing what it cannot answer.


The core ideas

A few principles shape every decision:

  1. The governed answer layer is the show-runner; a query mechanism is never the star. semqa decides what kind of question this is, whether it can be answered, and how to govern it. Text-to-SQL is one mode — a handler for the structured-data slice — and even there a governed semantic query is preferred over raw free-form SQL. Execution is delegated (to Cube, Wren, SQLite, …), never rebuilt.

  2. The LLM proposes; deterministic code disposes. The LLM is constrained to the smallest fuzzy job — mapping a natural-language question to governed concept names. Everything that must be correct or governed — routing, authorization, sufficiency, verification, grounding, citation — is deterministic, typed, testable code. (We verified this matters: a small local model gave sloppy, sometimes wrong selections, yet outcomes were correct because the typed governance layer rejected its bad guesses.)

  3. Multi-modal by construction. A real user asks metric questions, policy questions, how-to questions, and diagnostic questions — most of which are not SQL at all. semqa routes each to the right mode behind one interface.

  4. Open, self-hostable, and provider-neutral. No proprietary lock-in: run it with a local model, a cloud model, or a gateway — your choice, your data residency.
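
Principle 2 can be illustrated with a minimal sketch. It assumes a hypothetical closed vocabulary (`GOVERNED_CONCEPTS`) and a hypothetical `validate_proposal` helper; neither name is taken from semqa's actual API. The model's raw output is treated as an untrusted proposal, and deterministic code accepts only names on the list.

```python
from dataclasses import dataclass

# Hypothetical closed vocabulary; a real deployment would load it from the governed model.
GOVERNED_CONCEPTS = frozenset({"active_users", "monthly_revenue", "refund_policy"})


@dataclass(frozen=True)
class Proposal:
    """Result of validating a model-proposed list of concept names."""
    accepted: tuple[str, ...]
    rejected: tuple[str, ...]


def validate_proposal(raw_names: list[str]) -> Proposal:
    """Keep only names in the closed vocabulary; everything else is rejected."""
    accepted: list[str] = []
    rejected: list[str] = []
    for name in raw_names:
        key = name.strip().lower()
        if key in GOVERNED_CONCEPTS:
            if key not in accepted:  # drop duplicates, keep first-seen order
                accepted.append(key)
        else:
            rejected.append(name)
    return Proposal(tuple(accepted), tuple(rejected))


proposal = validate_proposal(["Active_Users", "weather_today", "active_users"])
print(proposal.accepted)  # ('active_users',)
print(proposal.rejected)  # ('weather_today',)
```

Because the check is a plain set lookup, a sloppy model selection can at worst produce an empty accepted list, which downstream code can turn into a clarification or an abstention.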


How it works

Every question flows through a deterministic trust spine:

intake → route → authorize → collect evidence → sufficiency gate → verify → ground / clarify / abstain (with citations)
  • route — an intake step maps the question onto the governed model (today rule-based or LLM-backed; the LLM only picks concept names).
  • authorize — identity comes from a verified, signed token (never the prompt); restricted concepts are never even shown to the model, and row-level security is enforced at the source.
  • collect — a pluggable SemanticSource returns evidence: a metric source compiling a typed request to real SQL, a document source doing authority- and freshness-aware retrieval, and more later — all behind one interface, routed by mode.
  • sufficiency / verify — deterministic checks decide whether there is enough trustworthy evidence; if not, the system clarifies or abstains instead of fabricating.
  • ground — the answer is built strictly from the evidence, with citations and an explicit interpretation of what was measured.
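
The authorize step relies on identity coming from a verified, signed token rather than from the prompt. The sketch below shows one way such a check could look using only the standard library. The token layout (`base64url(payload).hex_signature`) and the helper names are assumptions made for illustration; this README does not describe semqa's real token format.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def sign_token(claims: dict, secret: bytes) -> str:
    """Serialize claims and append an HMAC-SHA256 signature (hypothetical format)."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    """Return the claims if the signature checks out, otherwise None."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return None  # no separator: not a token in the expected format
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


secret = b"demo-secret"
token = sign_token({"sub": "alice", "tenant": "acme", "roles": ["analyst"]}, secret)
print(verify_token(token, secret))        # the original claims
print(verify_token(token + "0", secret))  # None: signature no longer matches
```

The point of the design is that the subject, tenant, and roles are read only from claims that passed verification, so nothing the user types into the question can change who they are.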

The result is one of four first-class outcomes: answered (cited), clarify, abstained, or refused — never a confident guess.
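
As a rough illustration of how a deterministic gate can map to those four outcomes, here is a minimal sketch. The `decide` function, the `Evidence` score, the 0.6 threshold, and the rule that an unauthorized request is the case that yields `refused` are all assumptions for the example, not semqa's actual logic.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    ANSWERED = "answered"
    CLARIFY = "clarify"
    ABSTAINED = "abstained"
    REFUSED = "refused"


@dataclass(frozen=True)
class Evidence:
    source: str   # label used as the citation
    score: float  # hypothetical trust score in [0, 1]


@dataclass(frozen=True)
class Outcome:
    status: Status
    reason: str   # typed, human-readable reason for the decision
    citations: tuple[str, ...] = ()


def decide(authorized: bool, concepts: list[str], evidence: list[Evidence],
           min_score: float = 0.6) -> Outcome:
    """Deterministic gate: refuse, clarify, abstain, or answer with citations."""
    if not authorized:
        return Outcome(Status.REFUSED, "caller is not permitted to see this concept")
    if len(concepts) != 1:
        return Outcome(Status.CLARIFY, "question did not map to exactly one concept")
    trusted = [e for e in evidence if e.score >= min_score]
    if not trusted:
        return Outcome(Status.ABSTAINED, "no evidence cleared the sufficiency threshold")
    return Outcome(Status.ANSWERED, "sufficient evidence", tuple(e.source for e in trusted))


result = decide(True, ["active_users"], [Evidence("warehouse.events", 0.9), Evidence("wiki", 0.3)])
print(result.status.value, result.citations)  # answered ('warehouse.events',)
```

Every branch returns a reason, so a non-answer is as inspectable as an answer.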


Installation

# from source (today)
git clone https://github.com/pankajniet/semqa && cd semqa
uv sync                 # or: pip install -e .

# from PyPI (after the first release)
pip install semqa

The core depends only on pydantic. A model is optional — with no API key and no network, semqa falls back to a deterministic resolver, so you can run everything below (and the demos) entirely offline.
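
The offline fallback can be pictured with a small sketch. It assumes a hypothetical `choose_resolver` helper and `ResolverConfig` type (not semqa's API) and reuses the environment variable names from the provider configuration snippet in this README. Defaulting to the public OpenAI endpoint when only a key is set is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ResolverConfig:
    kind: str                       # "llm" or "deterministic"
    base_url: Optional[str] = None
    model: Optional[str] = None


def choose_resolver(env: Mapping[str, str] = os.environ) -> ResolverConfig:
    """Pick an LLM-backed resolver only when an endpoint or key is configured."""
    base_url = env.get("LLM_BASE_URL")
    api_key = env.get("OPENAI_API_KEY")
    if not base_url and not api_key:
        # Nothing configured: stay fully offline.
        return ResolverConfig(kind="deterministic")
    return ResolverConfig(
        kind="llm",
        # Assumption: with only a key set, fall back to the public OpenAI endpoint.
        base_url=base_url or "https://api.openai.com/v1",
        model=env.get("LLM_MODEL"),
    )


print(choose_resolver({}).kind)  # deterministic
print(choose_resolver({"LLM_BASE_URL": "http://localhost:11434/v1", "LLM_MODEL": "llama3.2"}).kind)  # llm
```

Passing the environment in as a mapping keeps the choice testable without touching the real process environment.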


Provider & model neutrality

semqa leads with the open OpenAI-compatible API as the common surface, so the same adapter reaches OpenAI cloud and every local server (Ollama, vLLM, llama.cpp, LM Studio) with just a base_url. Local and cloud are equally first-class; the adopter chooses. Per-stage hybrid (a cheap/local model to route, a stronger model to ground) is the cost sweet spot.

Gateways like LiteLLM (self-hosted), OpenRouter, or Portkey are configuration, not code — point LLM_BASE_URL at the gateway and you get 100+ providers plus routing, fallbacks, and budgets, with no per-provider SDKs baked into semqa.

# pick any: a cloud key, a local model, a gateway — or nothing (deterministic fallback)
export OPENAI_API_KEY=sk-...                              # cloud
export LLM_BASE_URL=http://localhost:11434/v1            # local Ollama / vLLM / gateway
export LLM_MODEL=llama3.2

from semqa import Engine, auto_resolver, demo_source, context_for

engine = Engine(demo_source(), secret="...", resolver=auto_resolver())
answer = engine.ask("how are active users trending this month?",
                    context_for("...", subject="alice", tenant="acme", roles=["analyst"]))
print(answer.status, answer.text, answer.citations)

Run the demos across all modes and outcomes:

uv run python examples/use_cases.py            # six real-world use cases, real governed outputs
uv run python examples/quickstart.py           # smaller; uses your LLM if configured, else deterministic
uv run python -m semqa.eval.scenario_live      # a realistic SaaS scenario through a live local model

What makes it different

The strong players — Cube, Wren, dbt, Snowflake Cortex, Databricks Genie — are excellent at structured-data analytics, and semqa sits above and delegates to them rather than competing. But they are largely structured-data only and platform-locked, and they optimize for generating an answer. The gap semqa aims at is the intersection none of them occupy: open, self-hostable, vendor-neutral, multi-modal (including graph), and trust-first, with abstention, verification, and citation as the core primitive. The bet is that governance, provenance, and calibrated abstention — the trust layer, not autonomy and not raw accuracy alone — are what make agentic Q&A adoptable in the enterprise. That bet is grounded in real, sourced production failures — Microsoft Copilot oversharing, Uber QueryGPT hallucinations, the Air Canada chatbot ruling, the ~11% real RCA solve rate, buyer surveys of 600–1,006 orgs — collected in docs/validated-problems.md.


Design principles (in the code)

  • LLM proposes, code disposes — minimize the model's surface; deterministic code owns correctness.
  • Closed-vocabulary, validated output — the model can only reference defined concepts; anything off-list is rejected.
  • Ports & adapters — narrow Resolver / SemanticSource / LLMClient seams; swap provider or backend without touching the spine.
  • Trust-first, safe-by-default — abstain/clarify are first-class; helpfulness is the opt-in, never the default.
  • Defense in depth — never trust the prompt for identity or authorization; enforce at every layer.
  • Bounded, verified loops — deterministic checks first; bounded retries; no unbounded autonomy.
  • Delegate, don't rebuild — real engines behind SemanticSource; gateways for providers.
  • Evals first-class — measure outcomes (execution, abstention, RLS correctness), not vibes.
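
The ports-and-adapters principle can be sketched with `typing.Protocol`. The `SemanticSource` name comes from the list above, but the `collect` method, its signature, and the `InMemorySource` adapter are assumptions for illustration; semqa's real interface may differ.

```python
from typing import Protocol


class SemanticSource(Protocol):
    """Hypothetical port: anything that can return evidence for a concept."""

    def collect(self, concept: str) -> list[str]: ...


class InMemorySource:
    """Toy adapter backed by a dict; a real adapter would delegate to an engine."""

    def __init__(self, facts: dict[str, list[str]]) -> None:
        self._facts = facts

    def collect(self, concept: str) -> list[str]:
        return list(self._facts.get(concept, []))


def gather(source: SemanticSource, concept: str) -> list[str]:
    """The spine depends only on the port, never on a concrete backend."""
    return source.collect(concept)


source = InMemorySource({"active_users": ["events table, 2024-06"]})
print(gather(source, "active_users"))  # ['events table, 2024-06']
print(gather(source, "unknown"))       # []
```

`InMemorySource` never inherits from `SemanticSource`; structural typing is enough, which is what makes swapping a backend a local change.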

Notes on the thinking

A few convictions behind the design — and a few things we deliberately don't do:

  • We reframed the question. "Can we build a chatbot over data?" is the wrong question — it leads to demos. The right one is: "can we build a governed layer that decides what to answer, from which authoritative source, for which user, and when to say no?" Once the question is about governance and evidence rather than generation, most of the design falls out on its own.

  • We bet on trust over autonomy. A lot of the agentic-AI energy is about giving models more freedom. For the enterprise we bet the opposite: bounded loops, abstention over guessing, and a governed substrate the model cannot escape. A confident wrong answer is worse than an honest "I don't know" — so "I don't know" is a first-class outcome, not a failure.

  • We let the evidence correct us. The first time we ran a real local model end-to-end, it was sloppy — it over-added fields, mis-filed a policy as a metric, and once picked "active users" for a question about the weather. The outcomes were still correct, because the typed governance layer rejected the junk. That was the most useful result we got: it showed exactly where to place trust (deterministic code) and where not to (the model's raw output). We try to observe what the system actually does, not infer it from the outcome.

  • We delegate instead of rebuilding. The structured-query problem is already well-solved by Cube, Wren, dbt, and a plain SQL engine; rebuilding it would only produce a worse version. Our value is the governed, multi-modal, trust-first layer above them — so the metric mode delegates execution, and we spend our effort on routing, modes, and the trust layer.

  • We don't privilege a provider — in either direction. Defaulting to one cloud model is a bias; swinging to "local only" is the same bias inverted. The neutral truth is an open interface where local and cloud are equally first-class and the adopter chooses — so we lead with the open standard everyone already implements.

  • We try to stay calibrated. We keep what we measured, what we modeled, and what we assumed separate; we verify claims against primary sources; and we treat "it worked five times" as a smoke test, not validation. The honest open question here isn't technical: it's whether a real team needs this enough to adopt it, and only real conversations answer that. We'd rather say that plainly than oversell.


Status

Early but real, and honestly scoped — an open-source reference implementation, not a battle-tested product.

Working today (152 tests green; builds + installs as a pydantic-only package):

  • The full trust spine with per-stage observability, an explicit tunable sufficiency/abstain gate, and a typed reason on every non-answer.
  • Five governed modes: metric (real SQL via SQLite + a Cube delegate adapter), document/policy (lexical and dense hybrid retrieval, authority→relevance→freshness ranking, staleness abstention, competing-source surfacing, a verified-answer repository), tool/live-status (read-only, freshness-stamped, read/write partition), graph (multi-hop, adaptively gated, node-level authz), and diagnostic/RCA (bounded, correlational, confidence-capped — never a confident root cause).
  • A framework-agnostic surface — a pure handler + a contract-derived tool schema + a zero-dependency HTTP server — so any chatbot can call it over the wire.
  • Provider-neutral resolver run live end-to-end against a local model; a realistic SaaS scenario passing 21/21 deterministically and 21/21 live.
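
The authority→relevance→freshness ordering and staleness abstention described for the document mode can be sketched as a lexicographic sort followed by an age check. The `Doc` fields, the integer authority scale, the 365-day limit, and the choice to abstain rather than fall back to a less authoritative document are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Doc:
    title: str
    authority: int    # higher means more authoritative (hypothetical scale)
    relevance: float  # retrieval score
    updated: date


def pick_document(docs: list[Doc], today: date, max_age_days: int = 365) -> Optional[Doc]:
    """Rank by authority, then relevance, then freshness; return None if the winner is stale."""
    if not docs:
        return None
    # Tuple comparison gives the lexicographic authority > relevance > freshness order.
    best = max(docs, key=lambda d: (d.authority, d.relevance, d.updated))
    if today - best.updated > timedelta(days=max_age_days):
        return None  # abstain instead of citing a stale source as current
    return best


docs = [
    Doc("wiki note", authority=1, relevance=0.9, updated=date(2025, 1, 1)),
    Doc("official policy", authority=3, relevance=0.7, updated=date(2024, 11, 1)),
]
print(pick_document(docs, today=date(2025, 2, 1)).title)  # official policy
print(pick_document(docs, today=date(2026, 6, 1)))        # None
```

In the first call the official policy wins despite its lower relevance score, because authority is compared first.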

What it does NOT solve, to be clear: it does not fix the raw text-to-SQL accuracy cliff (it converts confident-wrong answers into abstentions); it does not stop prompt injection (it bounds the blast radius via read-only / Rule-of-Two); it inherits the curation/governance setup cost; and it depends on, but does not itself fix, upstream index freshness and source-data quality.

The honest open question is not technical — it is whether a real team needs this enough to adopt it. The build is validated against a realistic scenario and a real local model, not yet against a real adopter's data. The problems it targets are real and sourced — see docs/validated-problems.md.

Deliberately deferred (not built ahead of demand): production MCP/FastAPI server wrappers; identity → OAuth 2.1 + on-behalf-of; an LLM grounder + citation-faithfulness; more backend adapters (Wren/dbt/Neo4j). See TODO.md.

Stack: Python 3.11+, pydantic v2, zero other runtime dependencies. Licensed Apache-2.0.
