Grounded document evidence engine for agents and developers that returns citations, snippets, highlights, source coverage, support strength, conflict hints, evidence gaps, and next questions.

MARE

MARE is an open-source grounded document evidence engine for agents and developers.

Point MARE at a document or folder, ask a question, and get inspectable proof:

documents -> exact evidence -> source coverage -> support strength -> gaps -> next questions

MARE is not another generic chat-with-PDF app. It is the document evidence layer underneath products, RAG systems, MCP tools, OpenClaw/Hermes-style agents, and local document workflows.

Optional retrieval stacks include FastEmbed semantic retrieval and reranking (lighter ONNX-based embeddings), experimental ColPali/ColQwen visual page retrieval for layout-heavy PDFs, and sentence-transformers, FAISS, and Qdrant for deeper vector workflows.

Trust-First Demo

From a repo checkout:

mare workflow --folder ./examples/mixed_docs --query "show me the onboarding steps" --task brief

Or:

PYTHONPATH=src python3 examples/evidence_brief_demo.py

Example output shape:

Evidence brief query: show me the onboarding steps
Weak support from 1 retrieved result across 1 source.
Support note: Evidence is weak or ambiguous. Inspect the proof carefully or refine the question.
Sources: employee-onboarding.docx
Source coverage: Single-source coverage
Proof assets: snippet, citation
Evidence gap 1: Support is weak; ask a narrower question or increase top-k.
Next question 1: Find stronger evidence for: show me the onboarding steps

That is the core MARE difference: it shows proof, source coverage, support level, conflict signals when detected, gaps, and the next evidence-seeking move.
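
To make the brief fields above concrete, here is a minimal sketch of how a support label and a source-coverage label could be derived from retrieved results. It is not MARE's implementation: the `Hit` class, the `summarize_support` function, the thresholds, and the "Multi-source coverage" label are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical retrieved result; not MARE's real hit type."""
    source: str
    score: float
    snippet: str


def summarize_support(hits, strong=0.75, weak=0.4):
    """Label support strength and source coverage using illustrative thresholds."""
    if not hits:
        return {"support": "none", "coverage": "No sources", "sources": []}
    sources = sorted({h.source for h in hits})
    top = max(h.score for h in hits)
    # "strong" requires a high top score AND agreement from more than one source.
    if top >= strong and len(sources) > 1:
        support = "strong"
    elif top >= weak:
        support = "moderate"
    else:
        support = "weak"
    coverage = "Single-source coverage" if len(sources) == 1 else "Multi-source coverage"
    return {"support": support, "coverage": coverage, "sources": sources}


hits = [Hit("employee-onboarding.docx", 0.31, "Complete the onboarding checklist.")]
summary = summarize_support(hits)
print(summary)
```

With one low-scoring hit from one file, the sketch reports weak support and single-source coverage, which mirrors the shape of the example brief.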

Install

Install from PyPI:

pip install mare-retrieval

Install the visual playground:

pip install "mare-retrieval[ui]"

Install from a repo checkout for examples and development:

git clone https://github.com/mare-retrieval/MARE.git
cd MARE
pip install -e ".[dev]"

First Run

Use the guided entrypoint:

mare start
mare start ./examples/mixed_docs
mare start ./docs

Run an Evidence Brief over your own folder:

mare workflow --folder ./docs --query "what does this document set require?" --task brief

Choose a retrieval stack explicitly when you want to test an optional path:

mare workflow --folder ./docs --query "show me the diagram" --task brief --retriever colpali-visual
mare chat --folder ./docs --retriever fastembed

Ask one document a question:

mare ask manual.pdf "how do I connect the AC adapter"

Open the visual playground:

mare ui

Then open:

http://localhost:8501

Compare retrieval stacks before choosing one:

mare-eval --corpus generated/manual.json --eval examples/eval_cases.json --stack builtin --stack fastembed --stack hybrid-semantic

The comparison output includes a recommendation block with the best stack and ranked page/doc/object/no-result metrics. If you install mare-retrieval[colpali], you can also compare --stack colpali-visual on corpora with rendered PDF page images. If the corpus has no rendered page images, MARE will explain that the visual retriever needs PDF page images and suggest a text retriever instead.
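
As a rough illustration of what a page-level comparison metric can look like, the sketch below computes a top-k page hit rate per stack and picks the highest. The data layout (pairs of expected page and retrieved pages) and the `page_hit_rate` helper are assumptions made for this example; they do not describe the mare-eval file format or its actual metrics.

```python
def page_hit_rate(cases, k=3):
    """Fraction of cases whose expected page appears in the top-k retrieved pages."""
    if not cases:
        return 0.0
    hits = sum(1 for expected, retrieved in cases if expected in retrieved[:k])
    return hits / len(cases)


# Hypothetical results: (expected_page, retrieved_pages) per eval case.
runs = {
    "builtin": [(3, [3, 7, 1]), (5, [2, 4, 6])],
    "fastembed": [(3, [1, 3, 9]), (5, [5, 2, 8])],
}

scores = {stack: page_hit_rate(cases) for stack, cases in runs.items()}
best = max(scores, key=scores.get)
print(scores, "recommended:", best)
```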

What You Get

MARE can return:

  • best matching page, section, procedure, table-like object, or figure-like object
  • exact snippet
  • file, page, line, heading, or section-aware citation when available
  • rendered PDF page image when available
  • highlighted PDF proof image when localization is possible
  • retrieval rationale and score
  • optional visual page retrieval for image-, chart-, table-, and layout-heavy PDFs through mare-retrieval[colpali]
  • Evidence Brief with source coverage, support strength, conflict hints, proof assets, gaps, and next questions
  • evidence rescue in mare workflow and mare chat: when initial support is weak or missing, MARE tries alternate evidence-seeking queries and records whether stronger proof was found
  • structured payloads for agents, tools, and applications
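
The evidence rescue behavior in the list above can be pictured as a small retry loop. This is a sketch under stated assumptions, not MARE's code: `rescue`, `fake_retrieve`, the score threshold, and the rewrite list are hypothetical, and a real retriever would replace the stand-in function.

```python
def rescue(query, retrieve, rewrites, threshold=0.5):
    """Try the original query, then alternates, until one clears the threshold."""
    attempts = []
    for candidate in [query, *rewrites]:
        score, snippet = retrieve(candidate)
        attempts.append({"query": candidate, "score": score})
        if score >= threshold:
            # Record whether stronger proof came from a rewritten query.
            return {"rescued": candidate != query, "snippet": snippet, "attempts": attempts}
    return {"rescued": False, "snippet": None, "attempts": attempts}


def fake_retrieve(query):
    """Stand-in retriever: only the narrower query finds strong evidence."""
    if "checklist" in query:
        return 0.8, "Day 1: sign the onboarding checklist."
    return 0.2, ""


result = rescue("show me the onboarding steps", fake_retrieve, ["onboarding checklist"])
print(result["rescued"], result["snippet"])
```

Keeping the full attempt list makes it possible to show which query produced the proof and which ones failed.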

Supported Documents

Current local document-first workflows support:

  • pdf
  • md / markdown
  • txt
  • docx (first-pass support)

PDFs currently have the strongest visual proof because MARE can render pages and highlight evidence. Markdown, text, and DOCX usually rely on snippet and citation proof first.

Product Surfaces

| Interface | Best for | What you get |
| --- | --- | --- |
| mare start | guided onboarding | path-aware next commands |
| mare ask | fastest single-document test | best page, snippet, citation, image paths |
| mare workflow | terminal evaluation and agent-style output | corpus summary, object search, Evidence Brief, JSON payloads |
| mare chat | simple local document-agent loop | :brief, :review, :compare, :summary, findings, session history |
| mare ui | visual exploration | uploads, Evidence Briefs, summaries, findings, highlights |
| mare mcp | agent/app integrations | MCP tools returning structured evidence payloads |

Agent Integrations

MARE is useful for OpenClaw, Hermes Agent, and other tool-using agents because it gives them a grounded document-evidence tool instead of asking the model to guess from raw files.

Use CLI mode when an agent can run shell commands:

mare workflow --folder ./docs --query "what should I do before onboarding is complete?" --task brief --format json
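
An agent host written in Python could wrap that command roughly as follows. The flags are the ones shown above; the helper names are hypothetical, and the sketch assumes that stdout contains only the JSON payload, which the source does not confirm. The payload schema is not assumed.

```python
import json
import subprocess


def build_brief_command(folder, query):
    """Build the argument list for the Evidence Brief command shown above."""
    return ["mare", "workflow", "--folder", folder, "--query", query,
            "--task", "brief", "--format", "json"]


def run_brief(folder, query, timeout=120):
    """Run the CLI and parse stdout as JSON; raises if the command fails."""
    proc = subprocess.run(
        build_brief_command(folder, query),
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return json.loads(proc.stdout)  # assumption: stdout is a single JSON document


cmd = build_brief_command("./docs", "what should I do before onboarding is complete?")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the query contains quotes or spaces.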

Use MCP mode when an agent platform supports MCP tools:

mare mcp

See AGENT_INTEGRATIONS.md for OpenClaw/Hermes recipes, tool prompts, and safety guidance.

Python API

from mare import load_document

app = load_document("guide.md", reuse=True)
best = app.best_match("how do I connect the AC adapter")

print(best.page)
print(best.snippet)
print(best.metadata.get("source"))

For richer agent payloads, use:

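from mare import load_document
from mare.integrations import hits_to_evidence_payload

app = load_document("guide.md", reuse=True)  # same loader as above
query = "show me the onboarding steps"
hits = app.retrieve(query, top_k=3)  # top three candidate hits
payload = hits_to_evidence_payload(query, hits)
print(payload["evidence_brief"])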

Optional Integrations

The base install stays lightweight. Add extras as needed:

pip install "mare-retrieval[ui]"
pip install "mare-retrieval[fastembed]"
pip install "mare-retrieval[colpali]"
pip install "mare-retrieval[sentence-transformers]"
pip install "mare-retrieval[faiss]"
pip install "mare-retrieval[langchain]"
pip install "mare-retrieval[llamaindex]"
pip install "mare-retrieval[mcp]"
pip install "mare-retrieval[integrations]"

Advanced optional paths include FastEmbed semantic retrieval and reranking, experimental ColPali/ColQwen visual page retrieval, hybrid semantic retrieval, sentence-transformers, FAISS, Qdrant, LangChain, LangGraph, LlamaIndex, Docling, Unstructured, PaddleOCR, and Surya.

Generated Files

MARE writes local artifacts under generated/ by default:

  • corpus JSON: generated/<document-name>.json
  • rendered PDF pages: generated/<document-name>/page-*.png
  • highlighted proof images: generated/<document-name>/highlights/*.png
  • chat session history: generated/chat_sessions/
  • workflow run history: generated/workflow_runs/
  • UI recent runs: generated/ui_sessions/playground-history.json

Use --no-history on mare chat or mare workflow when you want ephemeral runs.
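
If a script needs to locate these artifacts, the layout above can be expressed with pathlib. This is a sketch: `expected_artifacts` is a hypothetical helper, and it assumes that <document-name> is the file name without its extension (as in generated/manual.json), which may not hold for every input.

```python
from pathlib import Path


def expected_artifacts(document, root="generated"):
    """Map a document path to the artifact locations listed above."""
    name = Path(document).stem  # assumption: <document-name> is the file stem
    base = Path(root)
    return {
        "corpus": base / f"{name}.json",
        "pages_glob": (base / name / "page-*.png").as_posix(),
        "highlights_glob": (base / name / "highlights" / "*.png").as_posix(),
    }


paths = expected_artifacts("manual.pdf")
print(paths["corpus"])
```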

Development

git clone https://github.com/mare-retrieval/MARE.git
cd MARE
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest

Useful developer entrypoints:

Architecture

query
  -> modality routing
  -> page/object retrieval
  -> lexical, phrase, structure, and object-aware scoring
  -> optional semantic retrieval and reranking
  -> score fusion
  -> snippet and evidence selection
  -> proof rendering when available
  -> Evidence Brief and structured payloads

Core modules:

  • src/mare/engine.py
  • src/mare/router.py
  • src/mare/fusion.py
  • src/mare/retrievers/text.py
  • src/mare/integrations.py
  • src/mare/workflow.py
  • src/mare/mcp_server.py
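
The score fusion step in the pipeline above combines rankings from different scorers. The source does not say which method src/mare/fusion.py uses, so the sketch below shows reciprocal rank fusion, a common generic technique, purely as an illustration. The page identifiers and the `reciprocal_rank_fusion` name are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each item earns 1 / (k + rank) per list."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, page in enumerate(ranking, start=1):
            fused[page] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by page id for determinism.
    return sorted(fused, key=lambda page: (-fused[page], page))


lexical = ["p3", "p1", "p7"]   # hypothetical lexical ranking
semantic = ["p1", "p9", "p3"]  # hypothetical semantic ranking
order = reciprocal_rank_fusion([lexical, semantic])
print(order)
```

Rank-based fusion sidesteps the problem that lexical and semantic scores live on different scales, at the cost of ignoring how large the score gaps are.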

Current Limits

MARE is strongest today on text-bearing PDFs and local mixed-document folders. These areas are still early:

  • scanned or camera-captured documents without OCR extras
  • table and figure reasoning beyond lightweight object extraction
  • deep contradiction analysis beyond deterministic conflict-language hints
  • learned multimodal routing
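
To show what a deterministic conflict-language hint can mean in practice, here is a minimal sketch that flags snippets containing contrast or override wording. The cue list and the `conflict_hints` function are hypothetical; MARE's actual cues and output format are not described in this document.

```python
import re

# Hypothetical cue list; a real system would likely use a longer, tuned set.
CONFLICT_CUES = re.compile(
    r"\b(however|except|unless|instead|no longer|supersedes|contrary to)\b",
    re.IGNORECASE,
)


def conflict_hints(snippets):
    """Return the index and matched cues for each snippet with conflict wording."""
    hints = []
    for index, text in enumerate(snippets):
        cues = sorted({m.group(1).lower() for m in CONFLICT_CUES.finditer(text)})
        if cues:
            hints.append({"snippet": index, "cues": cues})
    return hints


snippets = [
    "Badges are issued on day one.",
    "However, contractors no longer receive badges.",
]
hints = conflict_hints(snippets)
print(hints)
```

A cue match only signals that a passage may qualify or override another one; it does not establish that two sources actually contradict each other, which is why deeper contradiction analysis is listed as an early area.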

Roadmap

Near-term priorities:

  • stronger hybrid retrieval defaults
  • tighter snippets and highlights
  • better source diversity and contradiction analysis
  • weak-support query rewriting
  • evidence evaluation for retrieval quality
  • stronger table/layout proof
  • clearer agent integration recipes

License

MIT

Download files

Source Distribution

mare_retrieval-0.4.5.tar.gz (117.3 kB)

Uploaded Source

Built Distribution

mare_retrieval-0.4.5-py3-none-any.whl (84.0 kB)

Uploaded Python 3

File details

Details for the file mare_retrieval-0.4.5.tar.gz.

File metadata

  • Download URL: mare_retrieval-0.4.5.tar.gz
  • Upload date:
  • Size: 117.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mare_retrieval-0.4.5.tar.gz
Algorithm Hash digest
SHA256 2f6f461754ec9625468f076f58a9c3732b6558f6eb6c1a872057aa18986c4d38
MD5 2999ea2aa585ba236dfab10e1cfce7c5
BLAKE2b-256 265156b516599bc4880f5db5999247eb48497640c05d2ffba534b7a1f72c1298
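
A downloaded archive can be checked against the SHA256 digest above with the standard library. The sketch below uses only hashlib; the helper names are illustrative, and the local file name assumes the archive was saved under its published name in the current directory.

```python
import hashlib
import os

# SHA256 digest listed above for mare_retrieval-0.4.5.tar.gz.
EXPECTED_SHA256 = "2f6f461754ec9625468f076f58a9c3732b6558f6eb6c1a872057aa18986c4d38"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected.lower()


archive = "mare_retrieval-0.4.5.tar.gz"
if os.path.exists(archive):
    print("hash ok" if matches_published_hash(archive) else "hash mismatch")
```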

Provenance

The following attestation bundles were made for mare_retrieval-0.4.5.tar.gz:

Publisher: publish.yml on mare-retrieval/MARE

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mare_retrieval-0.4.5-py3-none-any.whl.

File metadata

  • Download URL: mare_retrieval-0.4.5-py3-none-any.whl
  • Upload date:
  • Size: 84.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mare_retrieval-0.4.5-py3-none-any.whl
Algorithm Hash digest
SHA256 f0ea85fb172999b129fb6757707ad762c9a2b3dac76338bd2f2aa198e5d58bce
MD5 03e7e5c7be9f2a589de9a47e698e31dd
BLAKE2b-256 502cfd0311376b7ae9fcac7b5ea281211a22e76acf03c05378fc30fd7ecc9272

Provenance

The following attestation bundles were made for mare_retrieval-0.4.5-py3-none-any.whl:

Publisher: publish.yml on mare-retrieval/MARE

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
