RAG pipeline for omnidoc-sdk — intent-aware chunking, evaluation, streaming, graph linking, and vector DB integrations
omnidoc-rag
Intent-aware RAG pipeline for the OmniDoc document intelligence ecosystem
What is omnidoc-rag?
omnidoc-rag is the companion RAG SDK for omnidoc-sdk. It takes the clean Document objects produced by the extraction layer and turns them into vector-DB-ready semantic chunks with:
- Intent classification — 6 canonical intent types (metric, table, process, value_proposition, heading, narrative)
- Adaptive chunking — token budget varies by intent; overlap preserves context across boundaries
- Deterministic confidence scoring — per-chunk quality signal for retrieval ranking
- Streaming — true lazy generator that emits one chunk at a time
- Retrieval evaluation — query-term coverage, source diversity, verdict
- Graph linking — NEXT / SAME_INTENT / METRIC_OF edges between chunks
- Cross-document stitching — merge equivalent sections from multiple documents
- Vector DB adapters — ChromaDB, Pinecone, Weaviate, PostgreSQL/pgvector
Table of Contents
- Installation
- Quick Start
- Intent Types
- Chunking
- Streaming
- Confidence Scoring
- Retrieval Evaluation
- Graph Linking
- Cross-Document Stitching
- Vector DB Adapters
- Schema Reference
- Optional Extras
- Contributing & Development
- Changelog
Installation
Core (no vector DB)
pip install omnidoc-rag
With ChromaDB
pip install "omnidoc-rag[chroma]"
With Pinecone
pip install "omnidoc-rag[pinecone]"
With Weaviate
pip install "omnidoc-rag[weaviate]"
With PostgreSQL / pgvector
pip install "omnidoc-rag[pgvector]"
Everything
pip install "omnidoc-rag[all]"
Quick Start
from omnidoc.loader.load import load_document # omnidoc-sdk
from omnidocrag.chunker import chunk_document
from omnidocrag.evaluation import evaluate_rag_result
# 1. Extract
doc = load_document("investor_deck.pdf")
# 2. Chunk
chunks = chunk_document(doc)
for c in chunks:
print(f"[{c.intent:<18}] conf={c.confidence:.2f} p{c.page} {c.text[:80]}")
# 3. Evaluate a retrieval result
result = evaluate_rag_result(
query="What was the revenue growth rate?",
answer="Revenue grew 24% year-over-year to $4.2B.",
chunks=chunks,
)
print(result["overall"], result["verdict"]) # 0.87 excellent
Intent Types
Every chunk is labelled with one of six canonical intents. The intent drives chunk sizing and confidence scoring.
| Intent | Token budget | Typical content |
|---|---|---|
| `heading` | 60 | Section/slide titles |
| `metric` | 150 | KPIs, financial figures, percentages |
| `table` | 200 | One row from an extracted table |
| `value_proposition` | 250 | Benefits, ROI claims, competitive statements |
| `narrative` | 350 | Prose, analysis, background paragraphs |
| `process` | 400 | Numbered steps, workflows, procedures |
Classification uses a deterministic regex classifier — no LLM call required:
from omnidocrag.intent import classify_intent
classify_intent("Revenue grew 24% YoY to $4.2B") # "metric"
classify_intent("Step 1: Configure the API key") # "process"
classify_intent("This solution reduces costs by 30%") # "value_proposition"
classify_intent("EXECUTIVE SUMMARY") # "heading"
classify_intent("The company was founded in 2012.") # "narrative"
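The rule-based approach can be sketched with a few ordered regexes. This is an illustrative stand-in, not the SDK's actual pattern set (which covers all six intents, including `table` and `value_proposition`):

```python
import re

# Ordered rules: first match wins. Patterns here are simplified examples.
RULES = [
    ("process", re.compile(r"^\s*(step\s*\d+|\d+\.)", re.I)),
    ("metric", re.compile(r"\d+(\.\d+)?\s*%|\$\d|\b(yoy|arr|ebitda)\b", re.I)),
    ("heading", re.compile(r"^[A-Z][A-Z \d&:-]{3,60}$")),
]

def classify_sketch(text: str) -> str:
    for intent, pattern in RULES:
        if pattern.search(text):
            return intent
    return "narrative"  # fallback when no rule fires
```

Because matching is pure regex, classification is fast, deterministic, and reproducible across runs.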
Chunking
chunk_document converts a Document object into a list of SemanticChunk dataclasses.
from omnidocrag.chunker import chunk_document
chunks = chunk_document(
doc,
overlap_chars=100, # characters carried from previous chunk (default: 100)
min_chars=20, # discard chunks shorter than this (default: 20)
)
What it does
- Iterates over `doc.sections` line by line
- Detects headings — flushes the current buffer and starts a new heading chunk
- Uses `classify_intent()` on the first line of each new buffer to set the intent
- Token budget comes from `tokens_for_intent(intent)` (see intent table above)
- When the buffer exceeds the budget it is flushed as a chunk; the last `overlap_chars` characters carry over
- Each row of `doc.tables` becomes a separate `metric` chunk with a header prefix
- Chunk IDs are SHA1 hashes of `source + page + text` — deterministic and reproducible
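The deterministic ID rule can be sketched with the standard library. This illustrates the scheme (SHA1 over source, page, and text); the separator and exact byte layout are assumptions, not the SDK's internals:

```python
import hashlib

def chunk_id(source: str, page: int, text: str) -> str:
    # Hashing source + page + text means re-chunking the same document
    # always reproduces the same IDs, so repeated upserts stay idempotent.
    payload = f"{source}|{page}|{text}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()
```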
Chunk fields
chunk.id # str — SHA1 deterministic ID
chunk.text # str — chunk content
chunk.intent # str — one of the 6 intent types
chunk.confidence # float — 0.1 … 1.0
chunk.page # int — source page number
chunk.heading # str | None — nearest heading above this chunk
chunk.keywords # List[str] — BM25-weighted non-stopword terms
chunk.metadata # dict — source, chunk_index, char_length, embedding_hint
chunk.to_dict() # → dict, ready for JSON / vector DB
Streaming
stream_chunks is a true Python generator — chunks are computed and emitted one at a time without building a full list first. Use this for large documents or memory-constrained environments.
from omnidocrag.stream import stream_chunks
for chunk in stream_chunks(doc, overlap_chars=100):
# process or upsert each chunk immediately
print(chunk.text[:80])
The generator produces the same chunks as chunk_document (same algorithm, same overlap logic) but yields each one lazily.
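A common pattern with a lazy generator is upserting in fixed-size batches so memory stays flat. A minimal batching helper (standard library only; the commented usage assumes `stream_chunks` and an adapter as described in this README):

```python
from itertools import islice

def batched(iterable, size):
    # Yield lists of up to `size` items without materialising the whole stream.
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Hypothetical usage with the generator above:
# for batch in batched(stream_chunks(doc), 100):
#     adapter.upsert(batch)  # adapter = any of the vector DB adapters below
```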
Confidence Scoring
score_chunk returns a float in [0.1, 1.0] — a deterministic quality signal based on text density, length, and structure.
from omnidocrag.confidence import score_chunk
score_chunk("Revenue grew 24% YoY to $4.2B in fiscal 2024.") # ≥ 0.8
score_chunk("ROI: 38%", intent="metric") # ≥ 0.7 (not penalised for short length)
score_chunk("See below") # < 0.7
score_chunk("") # 0.1 (floor)
Scoring factors:
- Length bonus (scales up to ≥ 100 chars)
- Multi-line bonus (≥ 4 lines)
- Dense-fact pattern bonus (financial terms, percentages, currency)
- Short-text penalty — not applied to `metric`, `table`, or `value_proposition` intents
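The factors above combine roughly as follows. The weights in this sketch are invented for illustration; the real values live in `omnidocrag.confidence`:

```python
import re

def score_chunk_sketch(text: str, intent: str = "narrative") -> float:
    # Illustrative weights only; mirrors the documented factors, not the SDK.
    if not text.strip():
        return 0.1                                        # floor
    score = 0.5
    score += min(len(text), 100) / 100 * 0.2              # length bonus
    if text.count("\n") >= 3:
        score += 0.1                                      # multi-line bonus
    if re.search(r"[%$€£]|\b(revenue|ebitda|margin|roi)\b", text, re.I):
        score += 0.2                                      # dense-fact bonus
    if len(text) < 40 and intent not in ("metric", "table", "value_proposition"):
        score -= 0.2                                      # short-text penalty
    return max(0.1, min(1.0, score))
```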
Retrieval Evaluation
Score a RAG result without an LLM. All logic is deterministic and runs locally.
from omnidocrag.evaluation import evaluate_rag_result
result = evaluate_rag_result(
query="What was the EBITDA margin in fiscal 2024?",
answer="EBITDA margin reached 28% driven by cost efficiencies.",
chunks=chunks, # List[SemanticChunk] retrieved
)
Return value
{
"overall": 0.84, # composite score 0.0 – 1.0
"coverage": 0.75, # fraction of query terms found in chunks
"confidence": 0.91, # average chunk confidence
"source_diversity": 3, # unique pages used
"chunks_used": 4, # number of chunks evaluated
"verdict": "good", # "excellent" | "good" | "weak" | "unsafe"
"missing_terms": ["ebitda"], # query terms absent from chunks
}
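One way an application might act on this result is to gate answers on the verdict. A hypothetical helper (not part of the SDK):

```python
def gate_answer(result: dict, min_verdict: str = "good") -> bool:
    # Accept an answer only if the evaluation verdict meets the threshold.
    order = {"unsafe": 0, "weak": 1, "good": 2, "excellent": 3}
    return order.get(result.get("verdict", "unsafe"), 0) >= order[min_verdict]
```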
Verdict thresholds
| Verdict | Condition |
|---|---|
| `unsafe` | No chunks provided |
| `weak` | overall < 0.5 |
| `good` | 0.5 ≤ overall < 0.75 |
| `excellent` | overall ≥ 0.75 |
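The `coverage` component can be approximated as the share of non-trivial query terms present in the retrieved text. A simplified sketch (the stopword list and substring matching are stand-ins for the SDK's logic):

```python
import re

def term_coverage(query: str, chunk_texts: list) -> float:
    # Fraction of query terms (minus stopwords) found anywhere in the chunks.
    # Substring matching keeps the sketch short; real tokenisation may differ.
    stop = {"what", "was", "the", "in", "a", "an", "of", "to"}
    terms = {t for t in re.findall(r"[a-z0-9]+", query.lower()) if t not in stop}
    blob = " ".join(chunk_texts).lower()
    return sum(t in blob for t in terms) / len(terms) if terms else 0.0
```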
Graph Linking
Build a lightweight in-memory knowledge graph from a list of chunks.
from omnidocrag.graph import link_chunks
graph = link_chunks(chunks, source="investor_deck.pdf")
graph["nodes"] # [{"id": ..., "text": ..., "intent": ..., ...}, ...]
graph["edges"] # [{"from": ..., "to": ..., "relation": ...}, ...]
Edge types
| Relation | Description |
|---|---|
| `NEXT` | Sequential order — every adjacent pair of chunks |
| `SAME_INTENT` | Consecutive chunks sharing the same intent |
| `METRIC_OF` | Metric chunk → nearest preceding heading chunk |
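Given the node and edge shapes shown above, the graph is easy to traverse directly. For example, grouping metrics under their heading via `METRIC_OF` edges (an illustrative helper, not part of the SDK):

```python
def metrics_by_heading(graph: dict) -> dict:
    # Map each heading's text to the metric chunks linked to it.
    nodes = {n["id"]: n for n in graph["nodes"]}
    out: dict = {}
    for e in graph["edges"]:
        if e["relation"] == "METRIC_OF":
            heading = nodes[e["to"]]["text"]
            out.setdefault(heading, []).append(nodes[e["from"]]["text"])
    return out
```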
Cross-Document Stitching
Merge semantically equivalent sections from multiple documents into a single unified chunk set.
from omnidocrag.stitcher import stitch_documents
# Each item: {"metadata": {...}, "chunks": [chunk.to_dict(), ...]}
docs = [
{"metadata": {"file": "q3_report.pdf"}, "chunks": [c.to_dict() for c in chunks_q3]},
{"metadata": {"file": "q4_report.pdf"}, "chunks": [c.to_dict() for c in chunks_q4]},
{"metadata": {"file": "annual_report.pdf"}, "chunks": [c.to_dict() for c in chunks_annual]},
]
merged = stitch_documents(docs, similarity_threshold=0.80)
# Merged chunks have a "sources" list
for chunk in merged:
if len(chunk["sources"]) > 1:
print(f"Merged from: {chunk['sources']} — {chunk['text'][:80]}")
Two chunks are merged when their heading and intent match with SequenceMatcher similarity ≥ similarity_threshold. The merged chunk's text is the highest-confidence version; sources lists all contributing files.
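The merge test described above can be sketched with `difflib.SequenceMatcher` from the standard library. This mirrors the documented rule (same intent, heading similarity at or above the threshold); the SDK's internals may differ in detail:

```python
from difflib import SequenceMatcher

def headings_match(a: dict, b: dict, threshold: float = 0.80) -> bool:
    # Chunks merge only when intents agree and headings are similar enough.
    if a.get("intent") != b.get("intent"):
        return False
    ratio = SequenceMatcher(
        None,
        (a.get("heading") or "").lower(),
        (b.get("heading") or "").lower(),
    ).ratio()
    return ratio >= threshold
```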
Vector DB Adapters
All four adapters share the same upsert(chunks) / query(query_text) interface and accept either SemanticChunk objects or plain dicts.
ChromaDB
Local or persistent ChromaDB. No embedding function required — ChromaDB provides a built-in one.
from omnidocrag.vectordb.chroma import ChromaAdapter
# In-memory (default)
adapter = ChromaAdapter(collection_name="omnidoc")
# Persistent on disk
import chromadb
client = chromadb.PersistentClient(path="/data/chroma")
adapter = ChromaAdapter(collection_name="omnidoc", client=client)
# Custom embedding function
adapter = ChromaAdapter(
collection_name="omnidoc",
embedding_fn=lambda texts: my_model.encode(texts).tolist(),
)
# Upsert
count = adapter.upsert(chunks) # accepts SemanticChunk or dict
# Query
results = adapter.query(
query_text="What was the Q3 revenue growth?",
n_results=5,
where={"intent": "metric"}, # optional metadata filter
)
for r in results:
print(r["score"], r["text"][:80])
Requires pip install "omnidoc-rag[chroma]".
Pinecone
Pinecone serverless or pod-based index. Embedding function is required.
from omnidocrag.vectordb.pinecone import PineconeAdapter
adapter = PineconeAdapter(
index_name="omnidoc-prod", # must already exist in Pinecone
embedding_fn=lambda text: model.encode(text).tolist(),
api_key="pc-xxxxxxxxxxxxx",
namespace="reports", # optional
)
# Upsert
count = adapter.upsert(chunks)
# Query
results = adapter.query(query_text="EBITDA margin", top_k=5)
for r in results:
print(r["score"], r["text"][:80])
Requires pip install "omnidoc-rag[pinecone]".
Weaviate
Weaviate v4 — local instance or Weaviate Cloud (WCD). Collection is created automatically.
from omnidocrag.vectordb.weaviate import WeaviateAdapter
# Local Weaviate (localhost:8080)
adapter = WeaviateAdapter(
collection_name="OmnidocChunks",
embedding_fn=lambda text: model.encode(text).tolist(),
)
# Weaviate Cloud (WCD)
adapter = WeaviateAdapter(
collection_name="OmnidocChunks",
embedding_fn=lambda text: model.encode(text).tolist(),
wcd_url="https://my-cluster.weaviate.network",
wcd_api_key="wcd-api-key",
)
# Existing connected client
import weaviate
client = weaviate.connect_to_local()
adapter = WeaviateAdapter(collection_name="OmnidocChunks", client=client)
# Upsert — idempotent (deterministic UUID per chunk ID)
count = adapter.upsert(chunks)
# Query — vector search when embedding_fn provided, BM25 fallback otherwise
from weaviate.classes.query import Filter
results = adapter.query(
query_text="revenue growth",
limit=5,
filters=Filter.by_property("intent").equal("metric"), # optional
certainty=0.7, # min cosine similarity
)
for r in results:
print(r["score"], r["text"][:80])
# Always close the connection when done
adapter.close()
Requires pip install "omnidoc-rag[weaviate]".
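The idempotent upsert works because each Weaviate object ID is derived deterministically from the chunk's ID. The idea can be sketched with `uuid.uuid5`; the namespace choice here is illustrative, not necessarily what the adapter uses:

```python
import uuid

def chunk_uuid(chunk_id: str) -> str:
    # Same chunk ID -> same UUID, so re-upserting overwrites in place.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, chunk_id))
```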
PostgreSQL / pgvector
Stores embeddings in a PostgreSQL table using the vector column type. Table and IVFFlat index are created automatically.
Prerequisites:
-- Run once on your PostgreSQL instance
CREATE EXTENSION IF NOT EXISTS vector;
from omnidocrag.vectordb.pgvector import PgVectorAdapter
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
adapter = PgVectorAdapter(
embedding_fn=lambda text: model.encode(text).tolist(),
dsn="postgresql://user:password@localhost:5432/ragdb",
table="omnidoc_chunks", # created automatically
dimensions=384, # must match embedding_fn output size
create_index=True, # IVFFlat index for fast ANN search
)
# Upsert — ON CONFLICT DO UPDATE, safe to call repeatedly
count = adapter.upsert(chunks)
# Query by cosine similarity (<=> operator)
results = adapter.query(query_text="revenue growth", top_k=5)
# Query with SQL filter
results = adapter.query(
query_text="revenue growth",
top_k=5,
where="intent = %s AND page > %s",
where_params=("metric", 2),
)
for r in results:
print(r["score"], r["text"][:80])
# Delete specific chunks
adapter.delete(["chunk-id-1", "chunk-id-2"])
# Context manager — closes connection automatically
with PgVectorAdapter(embedding_fn=..., dsn=..., dimensions=384) as adapter:
adapter.upsert(chunks)
results = adapter.query("revenue", top_k=5)
Requires pip install "omnidoc-rag[pgvector]".
Adapter comparison
| | ChromaDB | Pinecone | Weaviate | pgvector |
|---|---|---|---|---|
| Embedding fn required | No (built-in) | Yes | No (built-in or custom) | Yes |
| Self-hosted | Yes | No | Yes / WCD | Yes |
| Persistent by default | No (in-memory) | Yes | Yes | Yes |
| Filter on query | Yes (`where` dict) | No | Yes (Filter API) | Yes (raw SQL) |
| Similarity metric | Cosine (distance) | Cosine | Cosine / certainty | Cosine (`<=>`) |
| Install extra | `chroma` | `pinecone` | `weaviate` | `pgvector` |
Schema Reference
from omnidocrag.schema import SemanticChunk
from dataclasses import fields
for f in fields(SemanticChunk):
print(f.name, f.type)
# id str
# text str
# intent str — metric|table|process|value_proposition|heading|narrative
# confidence float — 0.1 … 1.0
# page int — default 1
# heading Optional[str]
# keywords List[str]
# metadata Dict[str, Any]
SemanticChunk.to_dict() returns a plain dict safe for JSON serialisation and vector DB metadata fields.
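The JSON-safety guarantee is easy to see with a minimal stand-in dataclass that mirrors the fields listed above (illustrative only; the real class lives in `omnidocrag.schema`):

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Optional

@dataclass
class ChunkLike:
    # Same field names and defaults as documented for SemanticChunk.
    id: str
    text: str
    intent: str
    confidence: float
    page: int = 1
    heading: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)

c = ChunkLike(id="abc", text="Revenue grew 24%", intent="metric", confidence=0.9)
payload = json.dumps(asdict(c))  # every field serialises without a custom encoder
```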
Optional Extras
| Extra | Install | Unlocks |
|---|---|---|
| `chroma` | `pip install "omnidoc-rag[chroma]"` | ChromaDB adapter |
| `pinecone` | `pip install "omnidoc-rag[pinecone]"` | Pinecone adapter |
| `weaviate` | `pip install "omnidoc-rag[weaviate]"` | Weaviate v4 adapter |
| `pgvector` | `pip install "omnidoc-rag[pgvector]"` | PostgreSQL + pgvector adapter |
| `all` | `pip install "omnidoc-rag[all]"` | All four adapters |
Contributing & Development
Setup
git clone https://github.com/your-org/omnidoc-rag.git
cd omnidoc-rag
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Editable install with all extras
pip install -e ".[all]"
# Dev tools
pip install build twine pytest pytest-cov ruff black mypy
Verify:
python -c "import omnidocrag; print('OK')"
omnidoc-sdk is a required dependency — install it first if not already present:
pip install omnidoc-sdk
# or from local source:
pip install -e ../omnidoc-sdk
Project Structure
omnidoc-rag/
├── omnidocrag/
│ ├── __init__.py # Public API — lazy wrappers for all modules
│ ├── schema.py # SemanticChunk dataclass
│ ├── intent.py # classify_intent() — deterministic regex classifier
│ ├── adaptive.py # tokens_for_intent() — per-intent token budgets
│ ├── confidence.py # score_chunk() — deterministic quality scoring
│ ├── hybrid_metadata.py # hybrid_metadata() — BM25 keywords + SHA1 hint
│ ├── chunker.py # chunk_document() — full list output
│ ├── stream.py # stream_chunks() — lazy generator
│ ├── evaluation.py # evaluate_rag_result() — retrieval scoring
│ ├── graph.py # link_chunks() — NEXT / SAME_INTENT / METRIC_OF
│ ├── stitcher.py # stitch_documents() — cross-doc merging
│ └── vectordb/
│ ├── __init__.py # Adapter index + docstring
│ ├── chroma.py # ChromaAdapter
│ ├── pinecone.py # PineconeAdapter
│ ├── weaviate.py # WeaviateAdapter (v4 Collections API)
│ └── pgvector.py # PgVectorAdapter (psycopg2 + pgvector)
├── tests/
│ ├── __init__.py
│ ├── test_rag.py # 62 tests — intent, confidence, chunker, eval, graph, stitcher, stream
│ └── test_vectordb.py # 49 tests — all 4 vector DB adapters (mock-based, no live DB)
├── pyproject.toml
└── README.md
Running Tests
The test suite uses fake Document objects and mocked DB clients — no omnidoc-sdk or live database required to run tests.
# All 111 tests
pytest tests/ -v
# Single class
pytest tests/test_rag.py::TestChunkDocument -v
pytest tests/test_vectordb.py::TestChromaAdapter -v
# With coverage
pytest tests/ --cov=omnidocrag --cov-report=term-missing
# Skip if vector DB packages not installed
pytest tests/ -v -m "not integration"
Coverage report (111 tests — 100% coverage):
| Module | Stmts | Cover | Test file |
|---|---|---|---|
| `__init__.py` | 16 | 100% | test_rag.py::TestTopLevelAPI |
| `schema.py` | 10 | 100% | test_rag.py |
| `adaptive.py` | 4 | 100% | test_rag.py |
| `intent.py` | 30 | 100% | test_rag.py::TestClassifyIntent + TestIntentEdgeCases |
| `confidence.py` | 23 | 100% | test_rag.py::TestScoreChunk + TestConfidenceEdgeCases |
| `hybrid_metadata.py` | 22 | 100% | test_rag.py::TestHybridMetadataEdgeCases |
| `chunker.py` | 89 | 100% | test_rag.py::TestChunkDocument + TestChunkerEdgeCases |
| `stream.py` | 77 | 100% | test_rag.py::TestStreamChunks + TestStreamEdgeCases |
| `evaluation.py` | 44 | 100% | test_rag.py::TestEvaluateRagResult + TestEvaluationEdgeCases |
| `graph.py` | 30 | 100% | test_rag.py::TestLinkChunks + TestGraphEdgeCases |
| `stitcher.py` | 29 | 100% | test_rag.py::TestStitchDocuments |
| `vectordb/chroma.py` | 32 | 100% | test_vectordb.py::TestChromaAdapter |
| `vectordb/pinecone.py` | 27 | 100% | test_vectordb.py::TestPineconeAdapter |
| `vectordb/weaviate.py` | 69 | 100% | test_vectordb.py::TestWeaviateAdapter |
| `vectordb/pgvector.py` | 80 | 100% | test_vectordb.py::TestPgVectorAdapter |
| **Total** | 582 | 100% | — |
Vector DB adapter tests use `unittest.mock` / `sys.modules` injection — no live ChromaDB, Pinecone, Weaviate, or PostgreSQL connection required.
Lint and type-check:
ruff check omnidocrag/
black --check omnidocrag/
mypy omnidocrag/ --ignore-missing-imports
# Auto-fix
black omnidocrag/
ruff check omnidocrag/ --fix
Building & Publishing
Build:
rm -rf dist/ build/ omnidocrag.egg-info/
python -m build
twine check dist/*
Test on TestPyPI first:
twine upload --repository testpypi dist/*
# Verify install
pip install \
--index-url https://test.pypi.org/simple/ \
--extra-index-url https://pypi.org/simple/ \
omnidoc-rag
Publish to PyPI:
twine upload dist/*
pip install omnidoc-rag==0.1.0
Credentials — ~/.pypirc:
[distutils]
index-servers = pypi testpypi
[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = pypi-YOUR_TEST_TOKEN
[pypi]
repository = https://upload.pypi.org/legacy/
username = __token__
password = pypi-YOUR_PROD_TOKEN
chmod 600 ~/.pypirc
Store PYPI_API_TOKEN and TEST_PYPI_API_TOKEN as GitHub repository secrets for CI/CD publishing on version tags.
Versioning
Version is defined once in pyproject.toml. Follow Semantic Versioning:
| Change | Example | Bump |
|---|---|---|
| Bug fix | Fix coverage calculation | 0.1.0 → 0.1.1 |
| New feature | Add new vector DB adapter | 0.1.0 → 0.2.0 |
| Breaking change | Rename SemanticChunk fields |
0.1.0 → 1.0.0 |
PyPI does not allow re-uploading the same version. Always bump before rebuilding.
Release Checklist
[ ] ruff check omnidocrag/ — zero errors
[ ] black --check omnidocrag/ — no formatting changes
[ ] pytest tests/ -v — all 111 tests pass
[ ] Version bumped in pyproject.toml
[ ] Changelog updated below
[ ] rm -rf dist/ && python -m build
[ ] twine check dist/* — both artifacts PASSED
[ ] TestPyPI round-trip verified
[ ] twine upload dist/* — production upload
[ ] git tag vX.Y.Z && git push origin vX.Y.Z
Troubleshooting
ImportError: ChromaDB support requires the chromadb package
pip install "omnidoc-rag[chroma]"
ImportError: Pinecone support requires the pinecone package
pip install "omnidoc-rag[pinecone]"
ImportError: Weaviate support requires the weaviate-client package
pip install "omnidoc-rag[weaviate]"
ImportError: pgvector support requires psycopg2-binary and pgvector
pip install "omnidoc-rag[pgvector]"
Pinecone / pgvector: embedding function is required — both adapters need you to supply an embedding_fn:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
embedding_fn = lambda text: model.encode(text).tolist()
pgvector: could not open extension "vector" — the pgvector extension is not installed on the PostgreSQL server. Run as a superuser:
CREATE EXTENSION IF NOT EXISTS vector;
Weaviate: WeaviateConnectionError — no Weaviate instance running locally. Start one with Docker:
docker run -d -p 8080:8080 -p 50051:50051 cr.weaviate.io/semitechnologies/weaviate:latest
Or pass wcd_url and wcd_api_key to connect to Weaviate Cloud instead.
evaluate_rag_result returns overall=0.0 / verdict unsafe — chunks list is empty. Ensure chunk_document(doc) ran successfully and the document has non-empty sections.
twine check fails with "description failed to render"
pip install readme-renderer[md]
python -m readme_renderer README.md -o /tmp/preview.html
Changelog
[0.1.0] — 2026-04-10
Initial release of omnidoc-rag as a standalone SDK, split from omnidoc-sdk.
Added
- `schema.py` — `SemanticChunk` dataclass with `to_dict()` serialisation
- `intent.py` — `classify_intent()` deterministic regex classifier; 6 canonical labels: `metric`, `table`, `process`, `value_proposition`, `heading`, `narrative`
- `adaptive.py` — `tokens_for_intent()` per-intent token budgets (heading=60 → process=400)
- `confidence.py` — `score_chunk(text, intent=)` with intent-aware short-text penalty exemption; floor=0.1
- `hybrid_metadata.py` — `hybrid_metadata()` BM25 keyword extraction with stopword removal; SHA1 embedding hint
- `chunker.py` — `chunk_document(doc, overlap_chars, min_chars)` with heading detection, overlap carry-over, table row expansion, deterministic SHA1 chunk IDs
- `stream.py` — `stream_chunks()` true lazy generator; yields one chunk at a time
- `evaluation.py` — `evaluate_rag_result(query, answer, chunks)` returning `overall`, `coverage`, `confidence`, `source_diversity`, `verdict`, `missing_terms`
- `graph.py` — `link_chunks()` producing NEXT, SAME_INTENT, and METRIC_OF edges
- `stitcher.py` — `stitch_documents()` cross-document merging by heading+intent similarity via `SequenceMatcher`
- `vectordb/chroma.py` — `ChromaAdapter`: in-memory or persistent; optional custom embedding function; `where` filter on query
- `vectordb/pinecone.py` — `PineconeAdapter`: serverless and pod indexes; namespace support; `top_k` query
- `vectordb/weaviate.py` — `WeaviateAdapter`: Weaviate v4 Collections API; auto-creates collection and schema; `near_vector` or BM25 fallback; local and WCD support; deterministic UUID upserts
- `vectordb/pgvector.py` — `PgVectorAdapter`: PostgreSQL + pgvector `<=>` cosine operator; auto-creates table and IVFFlat index; `ON CONFLICT DO UPDATE` upserts; raw SQL `WHERE` filter; `delete(ids)`, `drop_table()`, context manager
- `tests/test_rag.py` — 62 tests covering all core modules; fake `Document` objects require no `omnidoc-sdk` install
- `tests/test_vectordb.py` — 49 tests covering all 4 vector DB adapters using mocks (no live DB required)
- `pyproject.toml` — `omnidoc-rag` v0.1.0; extras: `chroma`, `pinecone`, `weaviate`, `pgvector`, `all`
Fixed
- Import path corrected from `omnidoc_rag` to `omnidocrag` across all vectordb adapters
- `pyproject.toml` package discovery path corrected to `omnidocrag`
omnidoc-rag · v0.1.0 · Apache 2.0 · Extraction layer → omnidoc-sdk