Framework-free backend RAG: deterministic dual-channel retrieval + agent orchestration with built-in anti-fabrication and source provenance.
RAGSpine
The framework-free backbone for backend RAG. Deterministic dual-channel retrieval and agent orchestration, with anti-fabrication and source provenance built in — no Dify, no LangGraph, no DSL. Just composable Python.
What is RAGSpine?
RAGSpine is a backend RAG engine you assemble in plain Python — not a framework you
submit to. Most stacks force a choice between hand-rolled glue and heavyweight
orchestration platforms (Dify, LangGraph) that drag in their own runtime, graph DSL, UI,
and lock-in. RAGSpine is the middle path: a coherent, batteries-included library of
composable parts — retrieval, agent orchestration, document extraction, evaluation, and an
HTTP service layer — wired together by ordinary functions and typed Protocols.
It was built for a demanding use case (executive insight Q&A over financial and operational reports) and ships that rigor as first-class, code-enforced invariants:
- Never fabricate. When the data isn't there, the orchestrator deterministically refuses ("not found") regardless of what the LLM says. Anti-fabrication lives in the control flow, not in a prompt you hope the model obeys.
- Always cite. Every answer carries source lineage (document + locator).
- Two channels, one router. A deterministic structured/numeric channel (a fact table + function-calling) answers "what's the number," and a narrative RAG channel (hybrid retrieval + rerank) answers "why / what happened." An agent routes each question — or splits a composite one and runs both.
- Everything is pluggable. LLM provider, embedding backend, reranker, OCR, retriever, and task queue are all typed Protocols injected at the edges. The core imports zero SDKs and runs fully offline with a deterministic MockProvider.
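To make "anti-fabrication lives in the control flow" concrete, here is a minimal sketch of such a guard. It is not RAGSpine's actual code: `Fact`, `Answer`, and `guard_answer` are hypothetical names introduced only for illustration. The point is that the final text and citation are derived from the lookup result, so the model's wording cannot introduce a number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact row: a value plus the lineage needed to cite it."""
    value: float
    unit: str
    doc: str
    locator: str


@dataclass(frozen=True)
class Answer:
    """Hypothetical result object: answer text plus its source list."""
    text: str
    sources: list = field(default_factory=list)


def guard_answer(llm_text: str, fact: Optional[Fact]) -> Answer:
    """Decide the final answer from the lookup result, not from the model text."""
    if fact is None:
        # No fact was found: refuse deterministically, whatever the model wrote.
        return Answer(text="not found", sources=[])
    # A fact exists: the number and the citation come from the store.
    # llm_text is deliberately ignored in this simplified sketch.
    return Answer(
        text=f"{fact.value} {fact.unit}",
        sources=[{"doc": fact.doc, "locator": fact.locator}],
    )
```

Calling `guard_answer("Revenue was 999", None)` yields a refusal even though the model text contains a number, which is the behavior the first invariant describes.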
Why RAGSpine
| No framework lock-in | Pure Python; bring your own everything. Drop into any backend. |
| Dual-channel | Deterministic numbers + narrative RAG, unified by an agent router. |
| Anti-fabrication + provenance | Enforced invariants, not prompt suggestions. |
| Office-document extraction | xlsx / pptx / pdf → structured facts, style- and color-aware — not just text splitting. |
| Hybrid retrieval | CJK-aware BM25 + injectable vector channel + RRF fusion + optional LLM listwise rerank. |
| FAQ short-circuit | SME-vetted answers bypass the LLM, behind conservative exclusion guards. |
| Built-in evaluation | Four-gate metrics (numeric accuracy / citation validity / refusal / clarification) + baseline regression gating. |
| Async ingestion | FastAPI service + RQ/Redis job queue, worker-owned resources. |
| Privacy-aware observability | Traces carry codes/counts/timings only — never answer, fact, or chunk text. |
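The "baseline regression gating" row can be read as a simple comparison of the four gate scores against a stored baseline. The sketch below is an assumption about how such a gate might work, not the harness shipped in eval/: the gate keys, the `regression_failures` function, and the `tolerance` parameter are all hypothetical.

```python
# Hypothetical keys mirroring the four gates named in the table above.
GATES = ("numeric_accuracy", "citation_validity", "refusal", "clarification")


def regression_failures(current: dict, baseline: dict, tolerance: float = 0.0) -> list:
    """Return the gates whose score fell below baseline by more than tolerance."""
    failures = []
    for gate in GATES:
        if current[gate] < baseline[gate] - tolerance:
            failures.append(gate)
    return failures


if __name__ == "__main__":
    baseline = {gate: 0.9 for gate in GATES}
    current = dict(baseline, refusal=0.8)
    # A non-empty list means at least one gate regressed.
    print(regression_failures(current, baseline))
```

A CI script could exit non-zero whenever the returned list is non-empty, which is one straightforward way to turn metrics into a gate.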
Architecture
A deep, domain-grouped package layout — find the file by folder before you read a name.
src/ragspine/
├── common/ cross-cutting: company profile, sensitivity, glossary, observability
├── extraction/ documents → a frozen StyledGrid intermediate representation (IR)
│ ├── extractors/ xlsx / pptx / pdf (digital + scanned/OCR), style- & color-aware
│ ├── routing/ per-page PDF triage (digital vs scanned vs export)
│ ├── color/ controlled color-semantics registry
│ └── verification/ dual-channel cross-check → review queue
├── ingestion/ IR/text → stores
│ ├── structured/ fact ingestion + batch manifest ledger (idempotent)
│ ├── narrative/ document chunk ingestion + extraction
│ └── review/ human review-queue state machine (SME)
├── storage/ fact store (numeric) + chunk store (narrative), sqlite, full lineage
├── retrieval/ narrative RAG
│ ├── chunking/ paragraph-granular chunker + versioned chunk store
│ ├── lexical/ Okapi BM25 (CJK uni+bigram) + RRF fusion
│ ├── vector/ injectable embedding backends (default: none = pure BM25)
│ ├── rerank/ LLM listwise reranker (RRF-fallback)
│ └── link/ adapter wiring retrieval into the agent (strips RESTRICTED at exit)
├── agent/ intent parsing, clarification gateway, tool-use loop, llm provider
├── eval/ QA + extraction evaluation harnesses with baseline gates
└── service/ FastAPI app, RQ task queue, ingestion jobs, FAQ short-circuit cache
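The lexical/ entry mentions CJK unigram + bigram tokenization for BM25. As a rough illustration of that idea (not the library's tokenizer), the sketch below emits every character and every adjacent character pair for runs of CJK text, and lowercased word tokens for everything else. The `tokenize` name and the choice to treat only the CJK Unified Ideographs block as "CJK" are assumptions.

```python
import re

# Assumption: "CJK" is approximated by the CJK Unified Ideographs block only.
_RUNS = re.compile(r"(?P<cjk>[\u4e00-\u9fff]+)|(?P<word>[A-Za-z0-9_]+)")


def tokenize(text: str) -> list:
    """Split text into CJK unigrams + bigrams and lowercased Latin word tokens."""
    tokens = []
    for match in _RUNS.finditer(text):
        cjk = match.group("cjk")
        if cjk is not None:
            tokens.extend(cjk)  # unigrams: one token per character
            # bigrams: every pair of adjacent characters in the run
            tokens.extend(cjk[i:i + 2] for i in range(len(cjk) - 1))
        else:
            tokens.append(match.group("word").lower())
    return tokens
```

Indexing both unigrams and bigrams lets a lexical scorer match Chinese text without a word segmenter: bigrams capture most two-character words, while unigrams keep single-character queries matchable.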
Request flow
question
→ intent parse (metric / entity / period / channel)
→ clarification gate ──(ambiguous)→ ask ──(out-of-scope entity)→ refuse
→ FAQ short-circuit (service edge) ──(vetted hit)→ cached answer + provenance
→ route:
structured → function-calling over the fact store → found / not_found / unrecognized
narrative → hybrid retrieve → listwise rerank → synthesize with citations
composite → run both, compare, merge
→ answer + sources (anti-fabrication guard rewrites to "not found" if no fact)
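The routing step of this flow can be sketched as a small dispatcher. This is a simplified illustration under stated assumptions: `route` and the `Handler` type are hypothetical, handlers return `None` when they find nothing, and the intent parse, clarification gate, FAQ short-circuit, and the "compare" part of the composite path are all omitted.

```python
from typing import Callable, Optional

# Hypothetical handler type: takes the question, returns answer text or None.
Handler = Callable[[str], Optional[str]]


def route(question: str, channel: str, structured: Handler, narrative: Handler) -> str:
    """Dispatch a question to one or both channels and merge what was found."""
    if channel == "structured":
        parts = [structured(question)]
    elif channel == "narrative":
        parts = [narrative(question)]
    elif channel == "composite":
        # Composite questions run both channels; this sketch only merges.
        parts = [structured(question), narrative(question)]
    else:
        raise ValueError(f"unknown channel: {channel!r}")
    found = [part for part in parts if part is not None]
    # If no channel produced anything, refuse rather than guess.
    return "\n".join(found) if found else "not found"
```

Because the channels are passed in as plain callables, the dispatcher can be exercised in tests with stubs and no LLM or database.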
Install
pip install rag-spine # distribution name is hyphenated; the import is: import ragspine
Optional extras:
| Extra | Pulls in | For |
|---|---|---|
| [service] | fastapi, uvicorn, rq, redis, httpx | the HTTP + async-queue layer |
| [pdf] | docling | digital-PDF table extraction |
| [ocr] | paddleocr | scanned-PDF OCR (Linux + NVIDIA GPU) |
| [llm] | anthropic, openai | real LLM providers (lazy-imported) |
| [embed] | sentence-transformers | real embedding models for the vector channel |
| [dev] | pytest, reportlab, markdown | tests + fixture generation |
From source
git clone https://github.com/VoldemortGin/ragspine.git && cd ragspine
uv venv .venv
VIRTUAL_ENV="$(pwd)/.venv" uv pip install -e ".[dev,service]"
Quickstart
1. End-to-end demo on synthetic data — offline, no API key:
.venv/bin/python scripts/run_demo.py # → ALL CHECKS PASSED
2. Ask a question (offline deterministic MockProvider):
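.venv/bin/python scripts/ask.py --provider mock --db data/fact_metric.db "中国内地FY2024的REVENUE是多少"   # "What is Mainland China's FY2024 REVENUE?"
# → ACME_CN FY2024 REVENUE 为 1320 USD_M(来源:ACME_FY2024_Review.pptx · slide=2,table=1,row=REVENUE,col=FY2024)
# (English: ACME_CN FY2024 REVENUE is 1320 USD_M; source: ACME_FY2024_Review.pptx · slide=2,table=1,row=REVENUE,col=FY2024)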
Ask for something the data doesn't have and you get an honest refusal, never a guess:
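.venv/bin/python scripts/ask.py --provider mock --db data/fact_metric.db "中国内地FY2025的REVENUE是多少"   # "What is Mainland China's FY2025 REVENUE?"
# → 查不到:REVENUE / ACME_CN / 2025 …未在事实表中找到。为避免误导,不提供任何推测数字。
# (English: Not found: REVENUE / ACME_CN / 2025 … was not found in the fact table. To avoid misleading you, no speculative number is provided.)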
3. Python API:
from ragspine.agent.agent import answer_question
from ragspine.agent.llm_provider import MockProvider
from ragspine.storage.fact_store import FactStore
store = FactStore("data/fact_metric.db")
store.init_schema()
result = answer_question("中国内地FY2024的REVENUE是多少", store, MockProvider())
print(result.answer) # deterministic value, or an honest "not found"
print(result.sources) # [{'doc': ..., 'locator': ...}]
4. HTTP service + async ingestion:
# API
RAGSPINE_DB_PATH=data/fact_metric.db .venv/bin/python scripts/run_server.py --port 8000
curl -s localhost:8000/v1/ask -H 'content-type: application/json' \
-d '{"question":"中国内地FY2024的REVENUE是多少"}'
# worker (needs Redis) — ingestion jobs run out-of-process
RAGSPINE_REDIS_URL=redis://localhost:6379/0 .venv/bin/python scripts/run_worker.py
Endpoints: GET /healthz, GET /readyz, POST /v1/ask,
POST /v1/ingest/structured/jobs, POST /v1/ingest/narrative/jobs, GET /v1/jobs/{id}.
Core concepts
- Structured channel: every number lives in a fact_metric table with full lineage (source_doc_id + source_locator). A glossary normalizes ZH/EN/abbrev synonyms to controlled metric/entity/period codes (returns None rather than guessing). A function-calling query_metric tool returns found / not_found / unrecognized, and the agent never lets the model invent a number.
- Narrative channel: chunking → hybrid retrieval (BM25 + injectable vector + RRF k=60 + glossary multi-query) → optional LLM listwise_rerank → synthesis with citations. RESTRICTED-tier content is filtered at two exits before it can reach a prompt.
- Agent: four-slot intent parse → clarification gateway (answer-first, expose assumptions, one-click narrow) → route → anti-fabrication guard.
- Ingestion: extractors emit a frozen StyledGrid IR; structured & narrative ingestion are hash-idempotent; low-confidence/conflicting items go to a human review queue (distinct from the async job queue).
- FAQ cache: SME-vetted Q→A short-circuit with conservative exclusions: structured numeric questions, competitor/external entities, real-time queries, expired, disabled, and RESTRICTED items never short-circuit.
- Config: ServiceConfig (env-driven, RAGSPINE_*) + CompanyProfile (config/company.example.toml → copy to config/company.toml).
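The narrative channel's "RRF k=60" refers to Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the ranked lists it appears in. Below is a minimal sketch of that standard formula. The `rrf_fuse` name and the alphabetical tie-break are assumptions, and RAGSpine's own implementation may differ in details.

```python
from collections import defaultdict


def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each ranking is a list of ids, best first. Ranks are 1-based.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_ranking = ["a", "b", "c"]    # e.g. lexical channel output
    vector_ranking = ["b", "c", "a"]  # e.g. vector channel output
    print(rrf_fuse([bm25_ranking, vector_ranking]))
```

RRF only needs ranks, not comparable scores, which is why it suits fusing a BM25 list with a vector list whose score scales are unrelated. With a single input list it simply preserves that list's order, consistent with the BM25-only default.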
Extension points (just implement a Protocol)
LLMProvider · EmbeddingBackend · ListwiseJudge · OcrBackend · NarrativeRetriever ·
TaskQueue — implement and inject. The core depends on the abstraction, never the SDK, so
adding a provider / vector store / reranker / OCR engine touches one new file.
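As an illustration of "implement and inject", the sketch below defines a stand-in Protocol and a toy implementation. The method name and signature of `EmbeddingBackend` here are assumptions made for this example; the README does not show the real Protocol, so check the source before implementing against it. `HashingEmbedder` and `index_texts` are likewise hypothetical.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class EmbeddingBackend(Protocol):
    """Assumed shape of the Protocol; the actual RAGSpine definition may differ."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy offline backend: derives a small deterministic vector from SHA-256."""

    def __init__(self, dim: int = 8) -> None:
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dim` bytes into [0, 1].
            vectors.append([digest[i] / 255.0 for i in range(self.dim)])
        return vectors


def index_texts(backend: EmbeddingBackend, texts: list[str]) -> dict[str, list[float]]:
    """Caller code depends only on the Protocol, never on a concrete SDK."""
    return dict(zip(texts, backend.embed(texts)))
```

`HashingEmbedder` never inherits from `EmbeddingBackend`; it satisfies the Protocol structurally, so a real backend could be swapped in by passing a different object to `index_texts`. The hash vectors carry no semantic meaning and are only useful for wiring tests.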
Configuration
| Env var | Default | Purpose |
|---|---|---|
| RAGSPINE_DB_PATH | data/fact_metric.db | fact (numeric) store |
| RAGSPINE_CHUNK_DB_PATH | (unset → narrative degrades honestly) | narrative chunk store |
| RAGSPINE_PROVIDER | mock | mock or anthropic |
| RAGSPINE_MODEL / RAGSPINE_BASE_URL | (none) | model + enterprise-gateway override |
| RAGSPINE_REDIS_URL | redis://localhost:6379/0 | RQ job queue |
| RAGSPINE_FAQ_SOURCE | (none) | path to the FAQ JSON |
| RAGSPINE_ALLOWED_UPLOAD_ROOT | (none) | ingestion path allowlist (rejects traversal) |
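A rough sketch of how env-driven settings and a traversal-rejecting allowlist could fit together is shown below. It is not the library's ServiceConfig: `ServiceSettings`, `load_settings`, and `is_allowed_upload` are hypothetical names, and only the variable names and defaults are taken from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceSettings:
    """Hypothetical stand-in for an env-driven config object."""
    db_path: str
    chunk_db_path: Optional[str]
    provider: str
    redis_url: str
    allowed_upload_root: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> ServiceSettings:
    """Read RAGSPINE_* variables, falling back to the documented defaults."""
    return ServiceSettings(
        db_path=env.get("RAGSPINE_DB_PATH", "data/fact_metric.db"),
        chunk_db_path=env.get("RAGSPINE_CHUNK_DB_PATH"),
        provider=env.get("RAGSPINE_PROVIDER", "mock"),
        redis_url=env.get("RAGSPINE_REDIS_URL", "redis://localhost:6379/0"),
        allowed_upload_root=env.get("RAGSPINE_ALLOWED_UPLOAD_ROOT"),
    )


def is_allowed_upload(path: str, root: str) -> bool:
    """Accept a path only if it resolves to a location inside root."""
    resolved_root = Path(root).resolve()
    # Joining an absolute path replaces the root, so absolute inputs are
    # checked as-is; ".." segments are collapsed by resolve().
    resolved = (resolved_root / path).resolve()
    return resolved == resolved_root or resolved_root in resolved.parents
```

Passing the environment in as a mapping keeps `load_settings` testable without mutating `os.environ`, and resolving both paths before comparing them is what defeats `../` traversal.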
Testing
.venv/bin/python -m pytest tests/ -q # 943 passed, 1 gpu-skipped
The project is test-driven: tests are the spec. The gpu marker gates real-OCR
integration tests to a Linux + NVIDIA GPU box; everything else runs anywhere.
Continuous integration (local)
CI runs on your machine, not on GitHub Actions. scripts/ci.sh is the gate (full test
suite, gpu-excluded, + demo smoke), and a pre-push hook enforces it so red code never gets
pushed:
scripts/ci.sh # run the gate manually
git config core.hooksPath .githooks # enable the pre-push gate (once per clone)
.github/workflows/ci.yml is included but dormant — manual-trigger only — so it consumes
zero Actions minutes. Uncomment its push: / pull_request: triggers to enable server-side
CI; it runs the exact same scripts/ci.sh. Lint / type-check (scripts/lint.sh, ruff + mypy)
is opt-in and informational for now (the inherited code predates linting).
Demo data
The bundled demo uses a fictional company (ACME), synthetic figures, and a fictional
competitor set — all generated by scripts/make_*.py (regenerable, deterministic). The
version-controlled evaluation sets live under data/golden/. Nothing here is real-world data.
Status & roadmap
Solid: structured channel, narrative hybrid retrieval, agent orchestration, office extraction (xlsx/pptx/pdf), FastAPI + RQ service, FAQ cache, evaluation harness, 943 tests.
Honest gaps (contributions welcome): the vector channel is injectable but ships with no
default backend, so retrieval is BM25-only out of the box; real embedding models run behind
the [embed]/GPU extras, and there is no persisted ANN index yet. Pipeline-topology export
(.topology() → Mermaid/DOT/JSON, plus scripts/topology.py) is no longer a gap: it now
ships, see src/ragspine/pipeline/.
License
Apache License 2.0. See NOTICE.