A minimalistic RAG system that prevents hallucination by ensuring all generated content is explicitly derived from source documents

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Verbatim RAG

A minimalistic approach to Retrieval-Augmented Generation (RAG) that prevents hallucination by ensuring all generated content is explicitly derived from source documents.

Concept

Traditional RAG systems retrieve relevant documents and then allow an LLM to freely generate responses based on that context. This can lead to hallucinations where the model invents facts not present in the source material.

Verbatim RAG solves this by extracting verbatim text spans from documents and composing responses entirely from these exact passages, with direct citations linking back to sources.
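As a minimal sketch of the idea, assuming a span is just a document id plus character offsets, an answer can be assembled purely by slicing source text. The `Span`, `compose_answer`, and `is_verbatim` names are illustrative and are not part of the verbatim-rag API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    doc_id: str  # which source document the text came from
    start: int   # character offset where the span begins
    end: int     # character offset one past the span's last character


def compose_answer(spans: list[Span], sources: dict[str, str]) -> str:
    """Build an answer only by slicing source documents; no text is generated."""
    parts = []
    for number, span in enumerate(spans, start=1):
        text = sources[span.doc_id][span.start:span.end]
        parts.append(f"[{number}] {text} (source: {span.doc_id})")
    return "\n".join(parts)


def is_verbatim(text: str, sources: dict[str, str]) -> bool:
    """True if `text` occurs character-for-character in some source document."""
    return any(text in body for body in sources.values())


# Toy data for illustration only.
sources = {"paper.txt": "The pipeline runs on CPU. It uses SPLADE embeddings."}
answer = compose_answer([Span("paper.txt", 0, 25)], sources)
print(answer)
```

A paraphrase such as "The pipeline runs on GPU." fails the `is_verbatim` check, which is the property that distinguishes this approach from free-form generation.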

For extraction, we can use LLM-based span extractors or fine-tuned encoder-based models like ModernBERT. We trained our own ModernBERT model on the RAGBench dataset for this purpose; it is available on HuggingFace.

With an encoder-based extractor, the whole RAG pipeline can run without any LLM, and with SPLADE embeddings it can run entirely on CPU, which keeps it lightweight and efficient.
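To illustrate why sparse retrieval is cheap enough for CPU, here is a small sketch of scoring with sparse vectors stored as term-to-weight dictionaries. SPLADE produces sparse term weights of this general kind, but the weights below are hand-written toy values, and `sparse_dot` and `top_k` are hypothetical helpers rather than functions from the package.

```python
def sparse_dot(query: dict[str, float], doc: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as {term: weight}."""
    # Iterate over the smaller vector; only terms present in both contribute.
    if len(query) > len(doc):
        query, doc = doc, query
    return sum(weight * doc[term] for term, weight in query.items() if term in doc)


def top_k(query: dict[str, float], docs: dict[str, dict[str, float]], k: int = 2) -> list[str]:
    """Return the ids of the k highest-scoring documents, best first."""
    ranked = sorted(docs, key=lambda doc_id: sparse_dot(query, docs[doc_id]), reverse=True)
    return ranked[:k]


# Hand-written toy weights; a real sparse encoder would produce these.
docs = {
    "chunk-a": {"verbatim": 1.2, "span": 0.8, "extract": 0.5},
    "chunk-b": {"frontend": 1.0, "react": 0.9},
    "chunk-c": {"span": 0.4, "citation": 1.1},
}
query = {"span": 1.0, "extract": 0.7}
ranking = top_k(query, docs, k=2)
print(ranking)
```

Because most weights are zero and never stored, scoring touches only the handful of terms a query and a chunk share.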

Installation

# Install the package
pip install verbatim-rag

Quick Start

from verbatim_rag import VerbatimIndex, VerbatimRAG
from verbatim_rag.ingestion import DocumentProcessor

# Process documents with intelligent chunking
processor = DocumentProcessor()

# Process PDFs from URLs
document = processor.process_url(
    url="https://aclanthology.org/2025.bionlp-share.8.pdf",
    title="KR Labs at ArchEHR-QA 2025: A Verbatim Approach for Evidence-Based Question Answering",
    metadata={"authors": ["Adam Kovacs", "Paul Schmitt", "Gabor Recski"]}
)

# Define SPLADE index with a sparse model
index = VerbatimIndex(
    sparse_model="naver/splade-v3", 
    db_path="./index.db"
)
index.add_documents([document])

# Then query the index
rag = VerbatimRAG(index)

response = rag.query("What is the main contribution of the paper?")
print(response.answer)

Environment Setup

Set your OpenAI API key before using any LLM-based component (for example, LLMSpanExtractor):

export OPENAI_API_KEY=your_api_key_here
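If you prefer to fail early with a clear message, a small check like the following can run at startup. This is a sketch; `require_api_key` is a hypothetical helper, not something the package provides.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment or raise a clear error."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before using LLM-based extraction.")
    return key
```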

How It Works

1. Document Processing: Documents are processed using docling for format conversion and chonkie for chunking
2. Document Indexing: Documents are indexed using vector embeddings (both dense and sparse)
3. Template Management: Response templates are created and stored for common question types
4. Query Processing:
   - Relevant documents are retrieved
   - Key passages are extracted verbatim using either LLM-based or fine-tuned span extractors
   - Responses are structured using templates
   - Citations link back to source documents

This ensures all responses are grounded in the source material, preventing hallucinations.
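The template step can be pictured with a short sketch. It assumes a placeholder syntax of the form `[FACT_n]`, which is an illustrative choice and may differ from the templates the package actually uses; `fill_template` is likewise a hypothetical helper.

```python
import re


def fill_template(template: str, spans: list[str]) -> str:
    """Replace [FACT_n] placeholders with the n-th verbatim span and a citation mark."""

    def substitute(match: re.Match) -> str:
        number = int(match.group(1))
        if not 1 <= number <= len(spans):
            return ""  # drop placeholders that have no extracted span
        return f'"{spans[number - 1]}" [{number}]'

    return re.sub(r"\[FACT_(\d+)\]", substitute, template).strip()


# Toy template and spans for illustration only.
template = "Summary: [FACT_1] In addition: [FACT_2] [FACT_3]"
spans = [
    "Documents are chunked before indexing.",
    "Each span keeps a link to its source.",
]
filled = fill_template(template, spans)
print(filled)
```

Only the connective wording comes from the template; every quoted sentence is copied unchanged from the extraction step, and the bracketed number points back to its span.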

Architecture

Core Components

VerbatimRAG (verbatim_rag/core.py): Main orchestrator that coordinates document retrieval, span extraction, and response generation
VerbatimIndex (verbatim_rag/index.py): Vector-based document indexing and retrieval
SpanExtractor (verbatim_rag/extractors.py): Abstract interface for extracting relevant text spans from documents
- LLMSpanExtractor: Uses OpenAI models to identify relevant spans
- ModelSpanExtractor: Uses fine-tuned BERT-based models for span classification
DocumentProcessor (verbatim_rag/ingestion/): Docling + Chonkie integration for intelligent document processing
Document (verbatim_rag/document.py): Core document representation with metadata
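The extractor abstraction can be sketched as an abstract base class with interchangeable implementations. The class and method names below are hypothetical and chosen for illustration; the actual signatures in verbatim_rag/extractors.py may differ. The keyword-based subclass stands in for an LLM or fine-tuned model.

```python
import re
from abc import ABC, abstractmethod


class ToySpanExtractor(ABC):
    """Hypothetical interface: map a question and documents to verbatim spans."""

    @abstractmethod
    def extract_spans(self, question: str, documents: dict[str, str]) -> dict[str, list[str]]:
        """Return, per document id, the sentences judged relevant."""


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordExtractor(ToySpanExtractor):
    """Keeps sentences that share a longer word with the question."""

    def extract_spans(self, question, documents):
        keywords = {w for w in _words(question) if len(w) > 3}
        result = {}
        for doc_id, text in documents.items():
            # Each sentence is an exact substring of the document text.
            sentences = [m.strip() for m in re.findall(r"[^.]+\.", text)]
            result[doc_id] = [s for s in sentences if keywords & _words(s)]
        return result


documents = {"doc-1": "Spans are copied verbatim. The frontend uses React."}
result = KeywordExtractor().extract_spans("Which frontend is used?", documents)
print(result)
```

Swapping the extraction strategy then only requires another subclass; the orchestrator can stay unchanged.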

Data Flow

1. Documents are processed and chunked using docling and chonkie
2. Documents are indexed using vector embeddings
3. User queries retrieve relevant documents
4. Span extractors identify verbatim passages that answer the question
5. Response templates structure the final answer with citations
6. All responses include exact text spans and document references
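The flow above can be traced end to end in a toy, dependency-free sketch. Word overlap stands in for both embedding retrieval and span extraction, and `ToyResponse` and `answer_query` are hypothetical names, so this shows only the shape of the pipeline, not the package's implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyResponse:
    answer: str
    citations: list[tuple[str, str]] = field(default_factory=list)  # (doc_id, exact span)


def chunk(text: str) -> list[str]:
    """Split into sentence chunks; each chunk is an exact substring of `text`."""
    return [m.strip() for m in re.findall(r"[^.!?]+[.!?]", text)]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_query(question: str, corpus: dict[str, str], k: int = 2) -> ToyResponse:
    # Chunk every document and "index" each chunk by its set of words.
    index = [(doc_id, c, words(c)) for doc_id, text in corpus.items() for c in chunk(text)]
    # Retrieve the k chunks that share the most words with the question.
    q = words(question)
    ranked = sorted(index, key=lambda item: len(q & item[2]), reverse=True)[:k]
    # Extract: keep only retrieved chunks that overlap with the question at all.
    hits = [(doc_id, c) for doc_id, c, w in ranked if q & w]
    # Structure the answer and attach the document reference for every span.
    lines = [f"[{i}] {c}" for i, (_, c) in enumerate(hits, start=1)]
    return ToyResponse(answer="\n".join(lines), citations=hits)


corpus = {
    "notes.txt": "Chunks are indexed with sparse vectors. The API server uses FastAPI.",
    "faq.txt": "Citations link back to source documents.",
}
response = answer_query("How are chunks indexed?", corpus)
print(response.answer)
```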

Web Interface

The package includes a full web interface with a React frontend and a FastAPI backend:

# Start API server
python api/app.py

# Start React frontend (in another terminal)
cd frontend/
npm install
npm start

ModernBERT Based Span Extractor

We trained our own ModernBERT-based encoder model for sentence classification. The model classifies each text span as relevant or not relevant, providing an alternative to LLM-based extractors.

You can find our model on HuggingFace: KRLabsOrg/verbatim-rag-modern-bert-v1.

You can use it with the defined index as follows:

from verbatim_rag.core import VerbatimRAG
from verbatim_rag.index import VerbatimIndex
from verbatim_rag.extractors import ModelSpanExtractor

# Load your trained extractor
extractor = ModelSpanExtractor("path/to/your/model")

# Create VerbatimRAG system with custom extractor
index = VerbatimIndex(
    sparse_model="naver/splade-v3", 
    db_path="./index.db"
)

rag_system = VerbatimRAG(
    index=index,
    extractor=extractor,
    k=5
)

# Query the system
response = rag_system.query("Main findings of the paper?")
print(response.answer)
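Conceptually, extraction by sentence classification reduces to scoring each sentence against the question and keeping those above a threshold. The sketch below uses hard-coded scores in place of a classifier; `select_relevant`, `toy_score`, and the 0.5 threshold are assumptions made for illustration, not values taken from the model.

```python
from typing import Callable


def select_relevant(
    question: str,
    sentences: list[str],
    score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> list[str]:
    """Keep sentences whose relevance score meets the threshold, in document order."""
    return [s for s in sentences if score(question, s) >= threshold]


# Hard-coded scores standing in for classifier probabilities.
toy_scores = {
    "The model was trained on RAGBench.": 0.91,
    "The frontend is written in React.": 0.07,
    "It classifies each sentence as relevant or not.": 0.64,
}


def toy_score(question: str, sentence: str) -> float:
    return toy_scores.get(sentence, 0.0)


kept = select_relevant("How was the model trained?", list(toy_scores), toy_score)
print(kept)
```

Since sentences are selected rather than rewritten, the output stays verbatim regardless of how the scores are produced.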

Citation

If you use Verbatim RAG in your research, please cite our paper:

@inproceedings{kovacs-etal-2025-kr,
    title = "{KR} Labs at {A}rch{EHR}-{QA} 2025: A Verbatim Approach for Evidence-Based Question Answering",
    author = "Kovacs, Adam  and
      Schmitt, Paul  and
      Recski, Gabor",
    editor = "Soni, Sarvesh  and
      Demner-Fushman, Dina",
    booktitle = "Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks)",
    month = aug,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.bionlp-share.8/",
    pages = "69--74",
    ISBN = "979-8-89176-276-3",
    abstract = "We present a lightweight, domain{-}agnostic verbatim pipeline for evidence{-}grounded question answering. Our pipeline operates in two steps: first, a sentence-level extractor flags relevant note sentences using either zero-shot LLM prompts or supervised ModernBERT classifiers. Next, an LLM drafts a question-specific template, which is filled verbatim with sentences from the extraction step. This prevents hallucinations and ensures traceability. In the ArchEHR{-}QA 2025 shared task, our system scored 42.01{\%}, ranking top{-}10 in core metrics and outperforming the organiser{'}s 70B{-}parameter Llama{-}3.3 baseline. We publicly release our code and inference scripts under an MIT license."
}

Release history

0.2.6

May 18, 2026

0.2.5

Apr 27, 2026

0.2.4

Apr 27, 2026

0.2.3

Mar 22, 2026

0.2.2

Mar 22, 2026

0.2.1

Mar 16, 2026

0.2.0

Mar 10, 2026

0.1.9

Jan 19, 2026

0.1.8

Dec 5, 2025

0.1.7

Nov 17, 2025

0.1.6

Nov 3, 2025

0.1.5

Oct 8, 2025

0.1.4

Sep 21, 2025

0.1.3

Sep 20, 2025

0.1.2

Sep 3, 2025

0.1.1

Jul 28, 2025

This version

0.1.0

Jul 26, 2025

Download files

Source Distribution

verbatim_rag-0.1.0.tar.gz (235.1 kB)

Uploaded Jul 26, 2025 Source

Built Distribution

verbatim_rag-0.1.0-py3-none-any.whl (52.6 kB)

Uploaded Jul 26, 2025 Python 3

File details

Details for the file verbatim_rag-0.1.0.tar.gz.

File metadata

Download URL: verbatim_rag-0.1.0.tar.gz
Upload date: Jul 26, 2025
Size: 235.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for verbatim_rag-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`9150edf6017ef5e59c59d34a23de1907427e3cddc3a26f78fa72af1e0eb73660`
MD5	`1a897a332189d6a2783567b390e1f92c`
BLAKE2b-256	`92e76b457bdfa6398ae795b29e77149292c63fe274e276c4c8fdb3394d527d4c`

File details

Details for the file verbatim_rag-0.1.0-py3-none-any.whl.

File metadata

Download URL: verbatim_rag-0.1.0-py3-none-any.whl
Upload date: Jul 26, 2025
Size: 52.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for verbatim_rag-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f19c6a38492b9bf341430ebcbc59ea3c967c0d83f34af702a7f88b09bc892b08`
MD5	`bfd498395166ca5d766671c6cf91c388`
BLAKE2b-256	`4a2536da6072e6745af944003c1b819e03299e8d2ed73fbc7a252b534da0e0e8`
