Reasoning-aware context runtime for RAG — chunk, retrieve, and allocate the document context an LLM should see, with citations and a Decision Report. In-process, no vector DB.

RedHop

A reasoning-preserving context runtime for RAG.

Hand it a document and a question. RedHop chunks, retrieves, and allocates the context your model should actually see — then tells you what it kept, what it dropped, and why, with citations back to the source. No vector database, no LLM, all in-process.

import redhop

doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")

answer = llm.generate(ctx.text())   # any LLM provider, no lock-in (`llm` is your own client)

pip install redhop

One self-contained wheel — no Python dependencies. The default lexical tier needs no model at all; the semantic/rerank tiers download a small model on first use (cached).

How it compares

Measured on identical documents, budgets, and BM25 retrieval, RedHop beats both LangChain and LlamaIndex on multi-hop evidence retention (80% vs 71% and 72%) and beats LangChain on contracts (82% vs 73%). It trails LlamaIndex by 4 points on CUAD's raw-template query; the mechanism behind that gap is known, and a Stripper + Vocabulary chain closes it (RedHop reaches 90.7%, +4.7 over LlamaIndex); see CUAD_CLAUSE_EXPANSION.md. All of this runs without a vector database, an agent framework, or model fine-tuning.

[Figure: Evidence retention vs LangChain vs LlamaIndex]

Methodology + raw runs: FRAMEWORK_COMPARISON.md · framework_comparison_2026-06-06.txt.

How it works

[Figure: RedHop pipeline]

Five stages: you bring documents and a query, RedHop owns parsing, chunking, retrieval, and context allocation, and you get a BuiltContext with the assembled prompt, citations, and a Decision Report. Each stage has an evidence-backed default that traces to a finding in docs/findings/.

The idea

Retrieval quality is not the same as reasoning quality. Transformers tolerate irrelevant context far better than they tolerate missing reasoning links — so the chunk a multi-hop answer depends on is often low-relevance to the query and gets silently pruned. RedHop's default keeps it, and makes the trade-off visible. It is not a retriever, vector database, agent framework, or workflow engine — it does one thing: turn a document and a query into the right prompt context, and explain the decision.
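
The keep-the-link idea can be illustrated with a toy selector. This is a minimal sketch, not RedHop's implementation: `ToyChunk`, the `relevance` score, the `links` set, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChunk:
    id: str
    relevance: float                           # hypothetical query-relevance score in [0, 1]
    links: set = field(default_factory=set)    # ids of chunks this one is linked to


def select_reasoning_preserving(chunks, seed_threshold=0.5):
    """Keep high-relevance seeds and rescue low-relevance chunks linked to a seed."""
    seeds = {c.id for c in chunks if c.relevance >= seed_threshold}
    kept, dropped = [], []
    for c in chunks:
        if c.id in seeds:
            kept.append((c.id, "seed"))
        elif c.links & seeds:
            kept.append((c.id, "rescued: linked to a seed"))
        else:
            dropped.append((c.id, "unlinked and low relevance"))
    return kept, dropped


chunks = [
    ToyChunk("a", 0.9, {"b"}),   # answers the question directly
    ToyChunk("b", 0.1, {"a"}),   # defines a term that "a" depends on
    ToyChunk("c", 0.1),          # unrelated filler
]
kept, dropped = select_reasoning_preserving(chunks)
print(kept)
print(dropped)
```

A pure relevance cutoff would drop both "b" and "c"; the link check is what separates the chunk the answer depends on from the one it does not.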

It explains every decision

Every call returns a Decision Report — what it kept, what it dropped, and why, including when it deliberately leaves a small context untouched.

[Figure: Sample Decision Report]

Read the fields directly via ctx.report.auto_decision, total_tokens, retained_evidence_ratio, or call doc.analyze(query) for the report without assembling a context.

Cite the evidence

Every selected chunk remembers where it came from:

for c in ctx.citations:
    print(c["source"], c["page"], c["heading"])
    # contract.pdf  3     None      ->  "contract.pdf, p.3"
    # notes.md      None  "Refunds" ->  "notes.md -> Refunds"
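
If you want the short labels shown in the comments above as strings, a small helper along these lines would do it. It assumes each citation behaves like a dict with "source", "page", and "heading" keys, as in the loop above; `format_citation` and the choice to prefer the page over the heading are illustrative, not part of the library.

```python
def format_citation(c):
    """Render one citation dict as a short human-readable reference."""
    if c.get("page") is not None:
        return f'{c["source"]}, p.{c["page"]}'
    if c.get("heading"):
        return f'{c["source"]} -> {c["heading"]}'
    return c["source"]


# Stand-in data shaped like the two example rows above.
sample = [
    {"source": "contract.pdf", "page": 3, "heading": None},
    {"source": "notes.md", "page": None, "heading": "Refunds"},
]
labels = [format_citation(c) for c in sample]
print(labels)
```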

Show your work — query rewrites with an audit trail

Every transformation between the raw query and what BM25 actually saw is recorded on the same Decision Report. Compile a Stripper (boilerplate removal), a Vocabulary (workload-curated synonyms), or both, run them as a chain via doc.context_with_rewrites(...), and the per-stage records land on ctx.report.query_rewrites:

stripper = redhop.Stripper(["highlight", "the", "parts", "of", "this", "contract"])
vocab    = redhop.Vocabulary({"change of control": ["merger", "successor", "acquisition"]})

ctx = doc.context_with_rewrites(query, [stripper, vocab])

for rec in ctx.report.query_rewrites:
    print(rec.stage, "matched=", rec.matched, "added=", rec.added)

The same Vocabulary works chunk-side at ingest via vocab.enrich(chunk_text). It lifts retrieval by +0.19 mean recall on schema-style corpora (SPIDER_ENRICH) but is measured to hurt (−2.0pt) on long prose chunks (CUAD_ENRICH_DEFINITIONS_NULL). A/B with redhop.evaluate(...) to confirm before adopting.
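
To show what a query-side rewrite chain with an audit trail looks like in principle, here is a pure-Python sketch. It does not call RedHop and is not how Stripper or Vocabulary are implemented; the function names, the record fields (borrowed from the `stage` / `matched` / `added` attributes printed above), and the "governing law" synonyms are assumptions made for illustration.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def strip_boilerplate(query, boilerplate):
    """Drop whole tokens listed in `boilerplate`; substrings are never touched."""
    stop = set(boilerplate)
    tokens = tokenize(query)
    kept = [t for t in tokens if t not in stop]
    removed = [t for t in tokens if t in stop]
    return " ".join(kept), {"stage": "strip", "matched": removed, "added": []}


def expand_vocabulary(query, vocab):
    """Append synonyms when a key's tokens appear contiguously in the query."""
    tokens = tokenize(query)
    matched, added = [], []
    for key, synonyms in vocab.items():
        key_tokens = tokenize(key)
        n = len(key_tokens)
        if any(tokens[i:i + n] == key_tokens for i in range(len(tokens) - n + 1)):
            matched.append(key)
            for s in synonyms:
                if s not in tokens and s not in added:
                    added.append(s)
    return " ".join(tokens + added), {"stage": "vocabulary", "matched": matched, "added": added}


def run_chain(query, stages):
    """Apply each stage in order and collect one audit record per stage."""
    records = []
    for stage in stages:
        query, record = stage(query)
        records.append(record)
    return query, records


boilerplate = ["highlight", "the", "parts", "of", "this", "contract"]
vocab = {"governing law": ["jurisdiction", "venue"]}   # illustrative synonyms
query = "Highlight the parts of this contract related to governing law for the office"

rewritten, records = run_chain(query, [
    lambda q: strip_boilerplate(q, boilerplate),
    lambda q: expand_vocabulary(q, vocab),
])
print(rewritten)
for rec in records:
    print(rec)
```

Because the strip works on whole tokens, "office" survives even though "of" is stripped. In this toy version a stripped word that is also part of a vocabulary key (for example "of" in "change of control") would stop the key from matching in a later stage; how the library reconciles that is not shown here.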

Score the change deterministically — no LLM judge

redhop.evaluate(query, ctx, gold_chunks=[...]) returns context_recall / context_precision / answer_token_recall + a composite overall, all computed from the same primitives the Decision Report uses (no LLM call, deterministic across runs, ~ms per query):

ctx_a = doc.context(user_query)
ctx_b = doc.context_with_rewrites(user_query, [stripper, vocab])
eval_a = redhop.evaluate(user_query, ctx_a, gold_chunks=gold_ids)
eval_b = redhop.evaluate(user_query, ctx_b, gold_chunks=gold_ids)
print("lift on overall:", eval_b.overall - eval_a.overall)

Design rationale + the full field list in EVALUATE_API.
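
For intuition about why such scores are deterministic, the two context metrics can be written as plain set arithmetic over chunk ids. This is a sketch using the textbook definitions of recall and precision; the exact formulas RedHop uses are documented in EVALUATE_API and may differ, and the chunk ids below are made up.

```python
def context_recall(selected_ids, gold_ids):
    """Fraction of gold chunks that made it into the context."""
    gold = set(gold_ids)
    return len(gold & set(selected_ids)) / len(gold) if gold else 0.0


def context_precision(selected_ids, gold_ids):
    """Fraction of selected chunks that are gold."""
    selected = set(selected_ids)
    return len(selected & set(gold_ids)) / len(selected) if selected else 0.0


gold = ["c2", "c7"]
baseline = ["c1", "c2", "c3", "c4"]     # hypothetical context A
rewritten = ["c2", "c7", "c3", "c4"]    # hypothetical context B

recall_lift = context_recall(rewritten, gold) - context_recall(baseline, gold)
print("recall lift:", recall_lift)
print("precision A/B:", context_precision(baseline, gold), context_precision(rewritten, gold))
```

No model is involved at any point, so the same inputs always produce the same numbers.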

Loading documents

| On-ramp | For |
|---|---|
| Document.from_text(text, source="document") | text you already have |
| Document.from_chunks([redhop.Chunk(...), ...]) | content you already chunked; pass typed redhop.Chunk(text, source=..., id=..., metadata={...}) instances |
| Document.from_file("x.pdf") | a file: PDF, DOCX, PPTX, XLSX, Markdown, or text/code |
| Document.from_bytes(data, source="x.pdf") | bytes you fetched (S3 / GCS / HTTP / DB) |
| Document.from_folder("./docs", persist=True) | a whole directory, with an optional incremental on-disk index |

Retrieval tiers — no vector database

Start at the lexical default — it handles most document QA because the words in the question are usually the words in the answer — and climb only when the failure shape calls for it. All in-process, no ANN, no index server.

# Default — most docs (code, API refs, runbooks, financial reports, handbooks)
doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")

# Structured docs with parallel clauses (regional overrides, per-region sub-sections):
doc = redhop.Document.from_file("msa.pdf", retrieval="hybrid", model="bge-small")
ctx = doc.context("What law applies in the UK?", include_heading=True, neighbors=1)

# Synonym-mismatch corpora (HR FAQs, support tickets where users phrase
# things very differently from the docs). Cross-encoder adds 5–10× latency
# — verify it helps on your corpus before enabling.
doc = redhop.Document.from_file("support.md",
    retrieval="hybrid", model="bge-small", rerank="cross-encoder")

The 60-second decision guide with trade-offs and query-writing tips: CHOOSING_A_CONFIG.
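
To make the lexical default concrete, here is a compact Okapi BM25 scorer in plain Python. It is a generic sketch of the scoring formula, not RedHop's retriever: the tokenizer, the `k1` and `b` values, and the sample sentences are assumptions, and it does no stemming.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "This agreement is governed by the law of England and Wales.",
    "Either party may terminate with thirty days written notice.",
    "Fees are payable within thirty days of invoice.",
]
scores = bm25_scores("What is the governing law?", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

Only the first sentence shares words with the question, so it is the only one with a non-zero score. Note that "governing" does not match "governed" here because the sketch has no stemmer; that gap is what a language analyzer is for.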

Non-English content

Default is English Snowball. Swap with the language= kwarg — any of the 18 Snowball Porter2 languages (arabic, danish, dutch, english, finnish, french, german, greek, hungarian, italian, norwegian, portuguese, romanian, russian, spanish, swedish, tamil, turkish):

doc = redhop.Document.from_text(german_text, language="german")
# Now `Buch` finds chunks containing `Bücher` (and vice versa)

One analyzer drives both BM25 retrieval and the grounding scorer, so they can't drift on what "the same term" means. Unknown language names raise an error rather than silently falling back to English. See the language guide for the full breakdown and the calibration disclaimer: we ship the stemmers, but checking ranking quality on a real domain corpus is up to you.

Assembly strategies

| strategy= | What it does |
|---|---|
| reasoning_preserving (default) | keep query-relevant seeds and rescue low-relevance chunks linked to one; drop only unlinked junk |
| distractor_filtered | drop everything below a query-grounding bar |
| max_density | greedily pack the densest chunks into the budget |
| raw_topk | keep retrieval order until the budget fills |
| auto | size-gated: pass small contexts through, prune large/diluted ones |

Already have chunks from your own retriever? Wrap each as redhop.Chunk(text, source=..., id=..., metadata={...}) and pass into redhop.build_context(query, retrieved_chunks=chunks, ...) (low-level) or redhop.Document.from_chunks(chunks) (full indexing).
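
The difference between two of the simpler strategies can be sketched in a few lines. This is an illustration of the descriptions in the table, not RedHop's code: the chunk dicts, the "score per token" reading of density, and the stop-at-first-overflow rule for retrieval order are assumptions.

```python
def raw_topk(chunks, budget):
    """Keep chunks in retrieval order until the next one would exceed the budget."""
    kept, used = [], 0
    for c in chunks:
        if used + c["tokens"] > budget:
            break
        kept.append(c["id"])
        used += c["tokens"]
    return kept


def max_density(chunks, budget):
    """Greedily pack chunks by score per token, skipping any that do not fit."""
    kept, used = [], 0
    for c in sorted(chunks, key=lambda c: c["score"] / c["tokens"], reverse=True):
        if used + c["tokens"] <= budget:
            kept.append(c["id"])
            used += c["tokens"]
    return kept


# Hypothetical retrieval output, already ordered by score.
retrieved = [
    {"id": "c1", "score": 9.0, "tokens": 900},
    {"id": "c2", "score": 6.0, "tokens": 300},
    {"id": "c3", "score": 5.0, "tokens": 200},
    {"id": "c4", "score": 1.0, "tokens": 600},
]
print(raw_topk(retrieved, budget=1000))
print(max_density(retrieved, budget=1000))
```

With a 1,000-token budget the first strategy keeps only the single top-ranked long chunk, while the second trades it for two shorter, denser ones. Neither looks at links between chunks, which is the part reasoning_preserving adds.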

Templated workloads — the +9 retention lift (BM25, no model needed)

If every query in your workload follows a fixed template — legal QA ("Highlight the parts (if any) of this contract related to X. Details: …"), support-ticket triage ("Help me with X, my account is Y, the error is Z"), form-filled queries from a structured UI — BM25 weights every query term by corpus IDF, not by how often the term repeats across your query set. The boilerplate words dilute the real signal words, and retention suffers. This is the mechanism behind the 4-point CUAD gap on the head-to-head; closing it doesn't need a vector DB or a different retriever — it needs two small preprocessing helpers on the query side.
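
The dilution described above is easy to see on a handful of queries. The heuristic below is a rough sketch of the idea, not the method behind analyze_query_set: `template_profile`, the 0.8 share threshold, and the sample queries are all assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def template_profile(queries, min_query_share=0.8):
    """Find terms that recur in most queries and the share of tokens they occupy."""
    tokenized = [tokenize(q) for q in queries]
    # In how many queries each term appears at least once.
    query_freq = Counter(term for t in tokenized for term in set(t))
    boilerplate = sorted(
        term for term, n in query_freq.items() if n / len(queries) >= min_query_share
    )
    total = sum(len(t) for t in tokenized)
    covered = sum(1 for t in tokenized for term in t if term in boilerplate)
    return boilerplate, covered / total


queries = [
    "Highlight the parts of this contract related to governing law",
    "Highlight the parts of this contract related to termination",
    "Highlight the parts of this contract related to audit rights",
]
boilerplate, share = template_profile(queries)
print(boilerplate)
print(round(share, 2))
```

Here 24 of the 29 query tokens belong to the shared wrapper, so only a small minority of what the retriever sees distinguishes one query from another.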

CUAD retention rises 81% → 88% → 90.7% across the detect → Stripper → Vocabulary workflow; LlamaIndex is at 86%

Measured on the CUAD framework comparison (n=300, BM25, budget 2,000 tok):

| step | helper | retention | Δ |
|---|---|---|---|
| raw 24-word template | | 81.3% | |
| + strip the wrapper | Stripper | 87.7% | +6.4 |
| + add workload synonyms | Vocabulary | 90.7% | +3.0 |

RedHop with the full workflow is at 90.7%, beating LlamaIndex (86%) by 4.7 points on the same setup, at native BM25 latency (~2.5ms/query). Mechanism + worked clause dict: CUAD_CLAUSE_EXPANSION.md.

Recommended workflow: detect → strip → (optional) expand → A/B. The rewrite chain runs inside Document.context_with_rewrites(...) so each stage's audit trail lands on report.query_rewrites automatically.

import redhop

# 1 — Detect. Hand a representative sample of your queries to the analyzer.
report = redhop.analyze_query_set(my_queries[:300])
# report.is_templated            → True / False
# report.template_word_share     → e.g. 0.66 on CUAD
# report.boilerplate_terms       → ["highlight", "contract", "lawyer", …]
# report.estimated_dilution_cost → "high" | "medium" | "low" | "none"

if report.is_templated:
    # 2 — Compile the rewrite chain.
    stripper = redhop.Stripper(report.boilerplate_terms)

    # 3 — (optional) Vocabulary. If your workload has known topic synonyms
    #     (clause types, error codes), compile them once.
    vocab = redhop.Vocabulary({
        # YOUR keys → synonyms; CUAD worked example in CUAD_CLAUSE_EXPANSION.md
        "change of control": ["merger", "successor", "acquisition"],
    })

    # 4 — Run the chain through retrieval; audit lands on report.query_rewrites.
    doc = redhop.Document.from_file("contract.pdf")
    ctx_a = doc.context(user_query)                              # baseline
    ctx_b = doc.context_with_rewrites(user_query, [stripper, vocab])
    eval_a = redhop.evaluate(user_query, ctx_a, gold_chunks=gold_ids)
    eval_b = redhop.evaluate(user_query, ctx_b, gold_chunks=gold_ids)
    print(eval_b.overall - eval_a.overall)   # the lift, deterministically
  • Only matters if your queries are templated. analyze_query_set is conservative by design — HotpotQA and MuSiQue both register quiet (is_templated=False) in the cross-workload probe; CUAD fires. If yours doesn't fire, skip this section.
  • The analyzer measures the shape of your query set, not your retention. It says "this looks like a templated workload" with the boilerplate terms it found; it does not promise a specific lift. Always A/B on your gold-evidence sample before committing.
  • For single-doc extraction workloads, also set strategy="raw_topk". auto routes large contexts to reasoning_preserving, which solves a multi-hop problem that contract extraction doesn't have. raw_topk beats it by ~4 points at every chunk size on CUAD.
  • We deliberately don't ship a CUAD-specific strip_template() helper. Templates are workload-specific; baking one in would make the wrong call for the next workload. Stripper(...) and Vocabulary({...}) take your boilerplate / synonym dict so the call stays on your side.
  • Or take the one-knob alternative: retrieval="hybrid". Dense retrieval reads chunks as semantic content rather than counting tokens, so the boilerplate ratio stops mattering. It substitutes for stripping by a different mechanism (+5.3 on raw CUAD at ~10ms/query). On CUAD specifically, BM25 + strip + vocabulary still wins: 90.7% at 2.5ms vs 89.0% at 683ms for hybrid plus cross-encoder rerank. The two paths are substitutes, not complements; pick one. See CUAD_HYBRID_RERANK.md.

| helper | what it does | finding |
|---|---|---|
| analyze_query_set(queries) | Inspects your queries; flags whether they're templated and which terms are doing the dilution | QUERY_SET_ANALYZER |
| Stripper(boilerplate) | Compiled token-level boilerplate strip; word-boundary safe (an "of" strip does not erase "of" inside "office"). Plugs into the rewrite chain so the audit trail is captured | CUAD_RECALL_GAP · MULTILINGUAL_ANALYZER |
| Vocabulary({key: [synonyms]}) | Compiled workload-curated equivalence classes; appends high-IDF synonyms when the token-level key matches. Vocabulary.bidirectional({...}) for symmetric maps (PTO ↔ paid time off). Opposite mechanism to PRF (falsified) | CUAD_CLAUSE_EXPANSION |
| vocab.enrich(chunk_text) | Chunk-side mirror. Measured to lift retrieval +0.19 mean recall on Spider-shape schemas; use it when your retrieval units are short and opaque (schema columns, error codes, API symbols, defined contract terms). Measured to hurt (−2.0pt) on long prose chunks; don't use it there. A/B with redhop.evaluate(...) against your gold before adopting | SPIDER_ENRICH + VOCABULARY_ENRICH + CUAD_ENRICH_DEFINITIONS_NULL |
| Document.context_with_rewrites(query, [stripper, vocab]) | Runs the chain through retrieval; per-stage audit lands on report.query_rewrites | (same finding as above) |
| evaluate(query, ctx, gold_chunks=, gold_answer=) | Deterministic A/B scoring against gold; no LLM judge. Same primitives the Decision Report uses | EVALUATE_API |

Decision rule + the recipe on the docs site: Choosing a configuration → "Templated queries with heavy boilerplate".

Documentation

Full docs, the comparison vs LangChain / LlamaIndex, and the evidence behind every default: https://www.redhopai.com

Apache-2.0. Also available for Node.js (npm install redhop) and Rust (cargo add redhop).
