Framework-free backend RAG: deterministic dual-channel retrieval + agent orchestration with built-in anti-fabrication and source provenance.

These details have not been verified by PyPI

Project links

Project description

RAGSpine

The framework-free backbone for backend RAG. Deterministic dual-channel retrieval and agent orchestration, with anti-fabrication and source provenance built in — no Dify, no LangGraph, no DSL. Just composable Python.

Python Tests

What is RAGSpine?

RAGSpine is a backend RAG engine you assemble in plain Python — not a framework you submit to. Most stacks force a choice between hand-rolled glue and heavyweight orchestration platforms (Dify, LangGraph) that drag in their own runtime, graph DSL, UI, and lock-in. RAGSpine is the middle path: a coherent, batteries-included library of composable parts — retrieval, agent orchestration, document extraction, evaluation, and an HTTP service layer — wired together by ordinary functions and typed Protocols.

It was built for a demanding use case (executive insight Q&A over financial and operational reports) and ships that rigor as first-class, code-enforced invariants:

Never fabricate. When the data isn't there, the orchestrator deterministically refuses ("not found") regardless of what the LLM says. Anti-fabrication lives in the control flow, not in a prompt you hope the model obeys.
Always cite. Every answer carries source lineage (document + locator).
Two channels, one router. A deterministic structured/numeric channel (a fact table + function-calling) answers "what's the number," and a narrative RAG channel (hybrid retrieval + rerank) answers "why / what happened." An agent routes each question — or splits a composite one and runs both.
Everything is pluggable. LLM provider, embedding backend, reranker, OCR, retriever, task queue — all are typed Protocols injected at the edges. The core imports zero SDKs and runs fully offline with a deterministic MockProvider.

Why RAGSpine


No framework lock-in	Pure Python; bring your own everything. Drop into any backend.
Dual-channel	Deterministic numbers + narrative RAG, unified by an agent router.
Anti-fabrication + provenance	Enforced invariants, not prompt suggestions.
Office-document extraction	xlsx / pptx / pdf → structured facts, style- and color-aware — not just text splitting.
Hybrid retrieval	CJK-aware BM25 + injectable vector channel + RRF fusion + optional LLM listwise rerank.
FAQ short-circuit	SME-vetted answers bypass the LLM, behind conservative exclusion guards.
Built-in evaluation	Four-gate metrics (numeric accuracy / citation validity / refusal / clarification) + baseline regression gating.
Async ingestion	FastAPI service + RQ/Redis job queue, worker-owned resources.
Privacy-aware observability	Traces carry codes/counts/timings only — never answer, fact, or chunk text.

Architecture

A deep, domain-grouped package layout — find the file by folder before you read a name.

src/ragspine/
├── common/         cross-cutting: company profile, sensitivity, glossary, observability
├── extraction/     documents → a frozen StyledGrid intermediate representation (IR)
│   ├── extractors/   xlsx / pptx / pdf (digital + scanned/OCR), style- & color-aware
│   ├── routing/      per-page PDF triage (digital vs scanned vs export)
│   ├── color/        controlled color-semantics registry
│   └── verification/ dual-channel cross-check → review queue
├── ingestion/      IR/text → stores
│   ├── structured/   fact ingestion + batch manifest ledger (idempotent)
│   ├── narrative/    document chunk ingestion + extraction
│   └── review/       human review-queue state machine (SME)
├── storage/        fact store (numeric) + chunk store (narrative), sqlite, full lineage
├── retrieval/      narrative RAG
│   ├── chunking/     paragraph-granular chunker + versioned chunk store
│   ├── lexical/      Okapi BM25 (CJK uni+bigram) + RRF fusion
│   ├── vector/       injectable embedding backends (default: none = pure BM25)
│   ├── rerank/       LLM listwise reranker (RRF-fallback)
│   └── link/         adapter wiring retrieval into the agent (strips RESTRICTED at exit)
├── agent/          intent parsing, clarification gateway, tool-use loop, llm provider
├── eval/           QA + extraction evaluation harnesses with baseline gates
└── service/        FastAPI app, RQ task queue, ingestion jobs, FAQ short-circuit cache

Request flow

question
  → intent parse (metric / entity / period / channel)
  → clarification gate ──(ambiguous)→ ask  ──(out-of-scope entity)→ refuse
  → FAQ short-circuit (service edge) ──(vetted hit)→ cached answer + provenance
  → route:
       structured → function-calling over the fact store → found / not_found / unrecognized
       narrative  → hybrid retrieve → listwise rerank → synthesize with citations
       composite  → run both, compare, merge
  → answer + sources   (anti-fabrication guard rewrites to "not found" if no fact)

Install

pip install rag-spine      # distribution name is hyphenated; the import is:  import ragspine

Optional extras:

Extra	Pulls in	For
`[service]`	fastapi, uvicorn, rq, redis, httpx	the HTTP + async-queue layer
`[pdf]`	docling	digital-PDF table extraction
`[ocr]`	paddleocr	scanned-PDF OCR (Linux + NVIDIA GPU)
`[llm]`	anthropic, openai	real LLM providers (lazy-imported)
`[embed]`	sentence-transformers	real embedding models for the vector channel
`[dev]`	pytest, reportlab, markdown	tests + fixture generation

From source

git clone https://github.com/VoldemortGin/ragspine.git && cd ragspine
uv venv .venv
VIRTUAL_ENV="$(pwd)/.venv" uv pip install -e ".[dev,service]"

Quickstart

1. End-to-end demo on synthetic data — offline, no API key:

.venv/bin/python scripts/run_demo.py        # → ALL CHECKS PASSED

2. Ask a question (offline deterministic MockProvider):

.venv/bin/python scripts/ask.py --provider mock --db data/fact_metric.db "中国内地FY2024的REVENUE是多少"
# → ACME_CN FY2024 REVENUE 为 1320 USD_M（来源：ACME_FY2024_Review.pptx · slide=2,table=1,row=REVENUE,col=FY2024）

Ask for something the data doesn't have and you get an honest refusal, never a guess:

.venv/bin/python scripts/ask.py --provider mock --db data/fact_metric.db "中国内地FY2025的REVENUE是多少"
# → 查不到：REVENUE / ACME_CN / 2025 …未在事实表中找到。为避免误导，不提供任何推测数字。

3. Python API:

from ragspine.agent.agent import answer_question
from ragspine.agent.llm_provider import MockProvider
from ragspine.storage.fact_store import FactStore

store = FactStore("data/fact_metric.db"); store.init_schema()
result = answer_question("中国内地FY2024的REVENUE是多少", store, MockProvider())
print(result.answer)     # deterministic value, or an honest "not found"
print(result.sources)    # [{'doc': ..., 'locator': ...}]

4. HTTP service + async ingestion:

# API
RAGSPINE_DB_PATH=data/fact_metric.db .venv/bin/python scripts/run_server.py --port 8000
curl -s localhost:8000/v1/ask -H 'content-type: application/json' \
     -d '{"question":"中国内地FY2024的REVENUE是多少"}'

# worker (needs Redis) — ingestion jobs run out-of-process
RAGSPINE_REDIS_URL=redis://localhost:6379/0 .venv/bin/python scripts/run_worker.py

Endpoints: GET /healthz, GET /readyz, POST /v1/ask, POST /v1/ingest/structured/jobs, POST /v1/ingest/narrative/jobs, GET /v1/jobs/{id}.

Core concepts

Structured channel — every number lives in a fact_metric table with full lineage (source_doc_id + source_locator). A glossary normalizes ZH/EN/abbrev synonyms to controlled metric/entity/period codes (returns None rather than guessing). A function-calling query_metric tool returns found / not_found / unrecognized — and the agent never lets the model invent a number.
Narrative channel — chunking → hybrid retrieval (BM25 + injectable vector + RRF k=60 + glossary multi-query) → optional LLM listwise_rerank → synthesis with citations. RESTRICTED-tier content is filtered at two exits before it can reach a prompt.
Agent — four-slot intent parse → clarification gateway (answer-first, expose assumptions, one-click narrow) → route → anti-fabrication guard.
Ingestion — extractors emit a frozen StyledGrid IR; structured & narrative ingestion are hash-idempotent; low-confidence/conflicting items go to a human review queue (distinct from the async job queue).
FAQ cache — SME-vetted Q→A short-circuit with conservative exclusions: structured numeric questions, competitor/external entities, real-time queries, expired, disabled, and RESTRICTED items never short-circuit.
Config — ServiceConfig (env-driven, RAGSPINE_*) + CompanyProfile (config/company.example.toml → copy to config/company.toml).

Extension points (just implement a Protocol)

LLMProvider · EmbeddingBackend · ListwiseJudge · OcrBackend · NarrativeRetriever · TaskQueue — implement and inject. The core depends on the abstraction, never the SDK, so adding a provider / vector store / reranker / OCR engine touches one new file.

Configuration

Env var	Default	Purpose
`RAGSPINE_DB_PATH`	`data/fact_metric.db`	fact (numeric) store
`RAGSPINE_CHUNK_DB_PATH`	(unset → narrative degrades honestly)	narrative chunk store
`RAGSPINE_PROVIDER`	`mock`	`mock` \| `anthropic`
`RAGSPINE_MODEL` / `RAGSPINE_BASE_URL`	—	model + enterprise-gateway override
`RAGSPINE_REDIS_URL`	`redis://localhost:6379/0`	RQ job queue
`RAGSPINE_FAQ_SOURCE`	—	path to the FAQ JSON
`RAGSPINE_ALLOWED_UPLOAD_ROOT`	—	ingestion path allowlist (rejects traversal)

Testing

.venv/bin/python -m pytest tests/ -q        # 943 passed, 1 gpu-skipped

The project is test-driven: tests are the spec. The gpu marker gates real-OCR integration tests to a Linux + NVIDIA GPU box; everything else runs anywhere.

Continuous integration (local)

CI runs on your machine, not on GitHub Actions. scripts/ci.sh is the gate (full test suite, gpu-excluded, + demo smoke), and a pre-push hook enforces it so red code never gets pushed:

scripts/ci.sh                        # run the gate manually
git config core.hooksPath .githooks  # enable the pre-push gate (once per clone)

.github/workflows/ci.yml is included but dormant — manual-trigger only — so it consumes zero Actions minutes. Uncomment its push: / pull_request: triggers to enable server-side CI; it runs the exact same scripts/ci.sh. Lint / type-check (scripts/lint.sh, ruff + mypy) is opt-in and informational for now (the inherited code predates linting).

Demo data

The bundled demo uses a fictional company (ACME), synthetic figures, and a fictional competitor set — all generated by scripts/make_*.py (regenerable, deterministic). The version-controlled evaluation sets live under data/golden/. Nothing here is real-world data.

Status & roadmap

Solid: structured channel, narrative hybrid retrieval, agent orchestration, office extraction (xlsx/pptx/pdf), FastAPI + RQ service, FAQ cache, evaluation harness, 943 tests.

Honest gaps (contributions welcome): the vector channel ships as an injectable channel — the default is BM25-only, and real embedding models run behind the [embed]/GPU extras; there is no persisted ANN index yet. Pipeline-topology export (.topology() → Mermaid/DOT/JSON, plus scripts/topology.py) now ships — see src/ragspine/pipeline/.

License

Apache License 2.0. See NOTICE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.2

Jun 18, 2026

0.1.1

Jun 18, 2026

0.1.0

Jun 18, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rag_spine-0.1.2.tar.gz (756.1 kB view details)

Uploaded Jun 18, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

rag_spine-0.1.2-py3-none-any.whl (206.3 kB view details)

Uploaded Jun 18, 2026 Python 3

File details

Details for the file rag_spine-0.1.2.tar.gz.

File metadata

Download URL: rag_spine-0.1.2.tar.gz
Upload date: Jun 18, 2026
Size: 756.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for rag_spine-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`fd9f00f2e3a21cdd0bf5ace4443175c9fbb6a9cd202ae7aea047340057b4212d`
MD5	`212c5fb2ce94758465db7da8a859671c`
BLAKE2b-256	`9551979f9583cad820a5bcea8aebbfb0af64636b049a1d8e8925363e4acb04da`

See more details on using hashes here.

File details

Details for the file rag_spine-0.1.2-py3-none-any.whl.

File metadata

Download URL: rag_spine-0.1.2-py3-none-any.whl
Upload date: Jun 18, 2026
Size: 206.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for rag_spine-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4df1e90246142fb2d0f95d20907e992c6a889e0bb877c527f486d95aad735752`
MD5	`840c8a12abaf5408eef45566e0e72141`
BLAKE2b-256	`f727e347c73bea04a9b685bc55e45d4e7f4c1469ff20e3db7e49aad0f1a409d9`

See more details on using hashes here.

rag-spine 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

RAGSpine

What is RAGSpine?

Why RAGSpine

Architecture

Install

Quickstart

Core concepts

Extension points (just implement a Protocol)

Configuration

Testing

Continuous integration (local)

Demo data

Status & roadmap

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes