Grounded document evidence engine for agents and developers that returns citations, snippets, highlights, source coverage, support strength, conflict hints, evidence gaps, and next questions.
MARE
MARE is an open-source grounded document evidence engine for agents and developers.
Point MARE at a document or folder, ask a question, and get inspectable proof:
documents -> exact evidence -> source coverage -> support strength -> gaps -> next questions
MARE is not another generic chat-with-PDF app. It is the document evidence layer underneath products, RAG systems, MCP tools, OpenClaw/Hermes-style agents, and local document workflows.
Optional retrieval stacks include FastEmbed semantic retrieval and reranking with lighter ONNX-based embeddings, experimental ColPali/ColQwen visual page retrieval for layout-heavy PDFs, and sentence-transformers, FAISS, and Qdrant for deeper vector workflows.
Trust-First Demo
From a repo checkout:
mare workflow --folder ./examples/mixed_docs --query "show me the onboarding steps" --task brief
Or:
PYTHONPATH=src python3 examples/evidence_brief_demo.py
Example output shape:
Evidence brief query: show me the onboarding steps
Weak support from 1 retrieved result across 1 source.
Support note: Evidence is weak or ambiguous. Inspect the proof carefully or refine the question.
Sources: employee-onboarding.docx
Source coverage: Single-source coverage
Proof assets: snippet, citation
Evidence gap 1: Support is weak; ask a narrower question or increase top-k.
Next question 1: Find stronger evidence for: show me the onboarding steps
That is the core MARE difference: it shows proof, source coverage, support level, conflict signals when detected, gaps, and the next evidence-seeking move.
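If a script only has the printed brief to work with, the fields shown in the example output can be pulled out with a small parser. This is a minimal sketch that assumes the exact line prefixes from the sample above; `parse_brief` and its dictionary keys are hypothetical names, not part of MARE, and the text layout may differ between versions. Where structured output is available, it is likely the more robust choice.

```python
import re

# Sample copied from the example output shape above.
SAMPLE = """\
Evidence brief query: show me the onboarding steps
Weak support from 1 retrieved result across 1 source.
Support note: Evidence is weak or ambiguous. Inspect the proof carefully or refine the question.
Sources: employee-onboarding.docx
Source coverage: Single-source coverage
Proof assets: snippet, citation
Evidence gap 1: Support is weak; ask a narrower question or increase top-k.
Next question 1: Find stronger evidence for: show me the onboarding steps
"""


def parse_brief(text):
    """Parse the printed brief into a dict (hypothetical helper, not a MARE API)."""
    brief = {"query": None, "support": None, "sources": [], "gaps": [], "next_questions": []}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Evidence brief query:"):
            brief["query"] = line.split(":", 1)[1].strip()
        elif m := re.match(r"(\w+) support from", line):
            # First word of the summary line, e.g. "Weak".
            brief["support"] = m.group(1).lower()
        elif line.startswith("Sources:"):
            names = line.split(":", 1)[1].split(",")
            brief["sources"] = [n.strip() for n in names if n.strip()]
        elif re.match(r"Evidence gap \d+:", line):
            brief["gaps"].append(line.split(":", 1)[1].strip())
        elif re.match(r"Next question \d+:", line):
            brief["next_questions"].append(line.split(":", 1)[1].strip())
    return brief


brief = parse_brief(SAMPLE)
print(brief["support"], brief["sources"])
```

An agent could use the parsed `support` value to decide whether to act on the evidence or to ask one of the suggested next questions first.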
Install
Install from PyPI:
pip install mare-retrieval
Install the visual playground:
pip install "mare-retrieval[ui]"
Install from a repo checkout for examples and development:
git clone https://github.com/mare-retrieval/MARE.git
cd MARE
pip install -e ".[dev]"
First Run
Use the guided entrypoint:
mare start
mare start ./examples/mixed_docs
mare start ./docs
Run an Evidence Brief over your own folder:
mare workflow --folder ./docs --query "what does this document set require?" --task brief
Choose a retrieval stack explicitly when you want to test an optional path:
mare workflow --folder ./docs --query "show me the diagram" --task brief --retriever colpali-visual
mare chat --folder ./docs --retriever fastembed
Ask one document a question:
mare ask manual.pdf "how do I connect the AC adapter"
Open the visual playground:
mare ui
Then open:
http://localhost:8501
Compare retrieval stacks before choosing one:
mare-eval --corpus generated/manual.json --eval examples/eval_cases.json --stack builtin --stack fastembed --stack hybrid-semantic
The comparison output includes a recommendation block with the best stack and ranked page/doc/object/no-result metrics.
If you install mare-retrieval[colpali], you can also compare --stack colpali-visual on corpora with rendered PDF page images.
If the corpus has no rendered page images, MARE will explain that the visual retriever needs PDF page images and suggest a text retriever instead.
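To make the idea of a stack comparison concrete, the sketch below computes two simple per-stack metrics (top-1 page hit rate and no-result rate) and picks a winner. It is an illustration only: the input tuples, metric names, and tie-breaking rule are assumptions, and mare-eval's actual metrics and recommendation logic are not shown in this README.

```python
from collections import defaultdict


def summarize(results):
    """Aggregate (stack, expected_page, ranked_pages) tuples into per-stack rates."""
    totals = defaultdict(lambda: {"cases": 0, "top1": 0, "no_result": 0})
    for stack, expected_page, ranked_pages in results:
        row = totals[stack]
        row["cases"] += 1
        if not ranked_pages:
            row["no_result"] += 1  # the stack returned nothing for this case
        elif ranked_pages[0] == expected_page:
            row["top1"] += 1  # the best-ranked page is the expected one
    return {
        stack: {
            "top1_rate": row["top1"] / row["cases"],
            "no_result_rate": row["no_result"] / row["cases"],
        }
        for stack, row in totals.items()
    }


def recommend(summary):
    """Prefer the higher top-1 rate, then the lower no-result rate."""
    return max(
        summary,
        key=lambda s: (summary[s]["top1_rate"], -summary[s]["no_result_rate"]),
    )


# Made-up evaluation results for two stacks on two cases.
results = [
    ("builtin", 3, [3, 7]),
    ("builtin", 5, [2, 5]),
    ("fastembed", 3, [3]),
    ("fastembed", 5, [5, 2]),
]
summary = summarize(results)
print(recommend(summary), summary)
```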
What You Get
MARE can return:
- best matching page, section, procedure, table-like object, or figure-like object
- exact snippet
- file, page, line, heading, or section-aware citation when available
- rendered PDF page image when available
- highlighted PDF proof image when localization is possible
- retrieval rationale and score
- optional visual page retrieval for image-, chart-, table-, and layout-heavy PDFs through mare-retrieval[colpali]
- Evidence Brief with source coverage, support strength, conflict hints, proof assets, gaps, and next questions
- evidence rescue in mare workflow and mare chat: when initial support is weak or missing, MARE tries alternate evidence-seeking queries and records whether stronger proof was found
- structured payloads for agents, tools, and applications
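The evidence rescue behavior described in the list above can be pictured as a retry loop over alternate queries. The sketch below is a generic illustration under stated assumptions: `rescue`, the `(snippet, score)` hit shape, the `threshold` value, and the rewrite function are all hypothetical, and MARE's own rescue logic may work differently.

```python
def top_score(hits):
    """Best score among (snippet, score) hits, or 0.0 when there are none."""
    return max((score for _, score in hits), default=0.0)


def rescue(query, retrieve, rewrites, threshold=0.5):
    """Try the original query, then alternates, until support reaches the threshold."""
    best_query, best_hits = query, retrieve(query)
    best_score = top_score(best_hits)
    attempts = [(query, best_score)]  # record every query that was tried
    if best_score < threshold:
        for alt in rewrites(query):
            hits = retrieve(alt)
            score = top_score(hits)
            attempts.append((alt, score))
            if score > best_score:
                best_query, best_hits, best_score = alt, hits, score
            if best_score >= threshold:
                break  # strong enough support found, stop searching
    return {
        "query": best_query,
        "hits": best_hits,
        "stronger_found": best_query != query,
        "attempts": attempts,
    }


# Toy retriever: only the narrower query has a strong hit.
index = {"onboarding steps": [("Step 1: sign the forms", 0.8)]}
outcome = rescue(
    "show me the onboarding steps",
    retrieve=lambda q: index.get(q, []),
    rewrites=lambda q: ["onboarding steps", "steps"],
)
print(outcome["query"], outcome["stronger_found"])
```

Keeping the full `attempts` list makes the rescue inspectable: a reader can see which queries were tried and how much support each one found.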
Supported Documents
Current local document-first workflows support:
- pdf
- md / markdown
- txt
- first-pass docx
PDFs currently have the strongest visual proof because MARE can render pages and highlight evidence. Markdown, text, and DOCX usually rely on snippet and citation proof first.
Product Surfaces
| Interface | Best for | What you get |
|---|---|---|
| mare start | guided onboarding | path-aware next commands |
| mare ask | fastest single-document test | best page, snippet, citation, image paths |
| mare workflow | terminal evaluation and agent-style output | corpus summary, object search, Evidence Brief, JSON payloads |
| mare chat | simple local document-agent loop | :brief, :review, :compare, :summary, findings, session history |
| mare ui | visual exploration | uploads, Evidence Briefs, summaries, findings, highlights |
| mare mcp | agent/app integrations | MCP tools returning structured evidence payloads |
Agent Integrations
MARE is useful for OpenClaw, Hermes Agent, and other tool-using agents because it gives them a grounded document-evidence tool instead of asking the model to guess from raw files.
Use CLI mode when an agent can run shell commands:
mare workflow --folder ./docs --query "what should I do before onboarding is complete?" --task brief --format json
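An agent written in Python could wrap that command with the standard library. This is a minimal sketch that uses only the flags shown above; the helper names are hypothetical, and it assumes the JSON document is the only thing written to stdout. The payload schema is not documented here, so the result is returned as parsed JSON without further interpretation.

```python
import json
import subprocess


def build_workflow_command(folder, query):
    """Build the argument list for the CLI call shown above."""
    return [
        "mare", "workflow",
        "--folder", folder,
        "--query", query,
        "--task", "brief",
        "--format", "json",
    ]


def run_workflow(folder, query, timeout=120):
    """Run the command and parse stdout as JSON (assumes stdout holds only JSON)."""
    proc = subprocess.run(
        build_workflow_command(folder, query),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)


# Example call (requires mare on PATH):
# payload = run_workflow("./docs", "what should I do before onboarding is complete?")
print(build_workflow_command("./docs", "example question"))
```

Passing the arguments as a list avoids shell quoting problems when a query contains quotes or other special characters.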
Use MCP mode when an agent platform supports MCP tools:
mare mcp
See AGENT_INTEGRATIONS.md for OpenClaw/Hermes recipes, tool prompts, and safety guidance.
Python API
from mare import load_document
app = load_document("guide.md", reuse=True)
best = app.best_match("how do I connect the AC adapter")
print(best.page)
print(best.snippet)
print(best.metadata.get("source"))
For richer agent payloads, use:
from mare.integrations import hits_to_evidence_payload
hits = app.retrieve("show me the onboarding steps", top_k=3)
payload = hits_to_evidence_payload("show me the onboarding steps", hits)
print(payload["evidence_brief"])
Optional Integrations
The base install stays lightweight. Add extras as needed:
pip install "mare-retrieval[ui]"
pip install "mare-retrieval[fastembed]"
pip install "mare-retrieval[colpali]"
pip install "mare-retrieval[sentence-transformers]"
pip install "mare-retrieval[faiss]"
pip install "mare-retrieval[langchain]"
pip install "mare-retrieval[llamaindex]"
pip install "mare-retrieval[mcp]"
pip install "mare-retrieval[integrations]"
Advanced optional paths include FastEmbed semantic retrieval and reranking, experimental ColPali/ColQwen visual page retrieval, hybrid semantic retrieval, sentence-transformers, FAISS, Qdrant, LangChain, LangGraph, LlamaIndex, Docling, Unstructured, PaddleOCR, and Surya.
Generated Files
MARE writes local artifacts under generated/ by default:
- corpus JSON: generated/<document-name>.json
- rendered PDF pages: generated/<document-name>/page-*.png
- highlighted proof images: generated/<document-name>/highlights/*.png
- chat session history: generated/chat_sessions/
- workflow run history: generated/workflow_runs/
- UI recent runs: generated/ui_sessions/playground-history.json
Use --no-history on mare chat or mare workflow when you want ephemeral runs.
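To inspect what a run left behind, the layout listed above can be walked with pathlib. The sketch below assumes the default generated/ root and the path patterns from the list; `list_artifacts` is a hypothetical helper, not part of MARE.

```python
from pathlib import Path


def list_artifacts(document_name, root="generated"):
    """Collect the per-document artifacts described in the list above."""
    base = Path(root)
    doc_dir = base / document_name
    corpus = base / f"{document_name}.json"
    return {
        # Corpus JSON path, or None if it has not been generated yet.
        "corpus": corpus if corpus.is_file() else None,
        # Rendered pages and highlights, sorted lexicographically by filename.
        "pages": sorted(doc_dir.glob("page-*.png")),
        "highlights": sorted((doc_dir / "highlights").glob("*.png")),
    }


found = list_artifacts("manual")
print(found["corpus"], len(found["pages"]), len(found["highlights"]))
```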
Development
git clone https://github.com/mare-retrieval/MARE.git
cd MARE
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest
Useful developer entrypoints:
- DEVELOPER_GUIDE.md
- examples/evidence_brief_demo.py
- examples/mixed_docs_workflow.py
- examples/developer_playground.ipynb
Architecture
query
-> modality routing
-> page/object retrieval
-> lexical, phrase, structure, and object-aware scoring
-> optional semantic retrieval and reranking
-> score fusion
-> snippet and evidence selection
-> proof rendering when available
-> Evidence Brief and structured payloads
Core modules:
- src/mare/engine.py
- src/mare/router.py
- src/mare/fusion.py
- src/mare/retrievers/text.py
- src/mare/integrations.py
- src/mare/workflow.py
- src/mare/mcp_server.py
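The score fusion step in the pipeline above combines signals from more than one scorer. One common way to do that is a weighted sum over min-max-normalized scores, sketched below. This is a generic technique shown for illustration; the function names, weights, and page identifiers are assumptions, and it is not a description of what src/mare/fusion.py actually does.

```python
def min_max(scores):
    """Rescale a {page: score} mapping to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {page: 1.0 for page in scores}  # all scores equal
    return {page: (value - lo) / (hi - lo) for page, value in scores.items()}


def fuse(score_sets, weights):
    """Weighted sum of normalized scores; pages missing from a scorer add 0."""
    fused = {}
    for name, scores in score_sets.items():
        weight = weights.get(name, 0.0)
        for page, value in min_max(scores).items():
            fused[page] = fused.get(page, 0.0) + weight * value
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores on different scales from two scorers.
lexical = {"page-1": 2.0, "page-2": 6.0, "page-3": 4.0}
semantic = {"page-1": 0.9, "page-3": 0.5}
ranked = fuse(
    {"lexical": lexical, "semantic": semantic},
    {"lexical": 0.6, "semantic": 0.4},
)
print([page for page, _ in ranked])
```

Normalizing before combining keeps a scorer with a larger numeric range from dominating the fused ranking.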
Current Limits
MARE is strongest today on text-bearing PDFs and local mixed-document folders. These areas are still early:
- scanned or camera-captured documents without OCR extras
- table and figure reasoning beyond lightweight object extraction
- deep contradiction analysis beyond deterministic conflict-language hints
- learned multimodal routing
Roadmap
Near-term priorities:
- stronger hybrid retrieval defaults
- tighter snippets and highlights
- better source diversity and contradiction analysis
- weak-support query rewriting
- evidence evaluation for retrieval quality
- stronger table/layout proof
- clearer agent integration recipes
License
MIT