
MCP server for structured document access — read, write, search, and annotate DOCX, LaTeX, Markdown, and scientific papers with minimal token usage

Project description

Precis — Structured Document MCP Server


Stop burning tokens on raw documents. Precis is an MCP server that gives AI agents structured, token-efficient access to DOCX, LaTeX, Markdown, and scientific papers — read, write, search, and annotate through four simple tools.

Instead of dumping a 100k-token PDF into your context window, Precis lets your agent navigate to exactly the section it needs, read just that chunk, and move on. One MCP server replaces the PDF-in-context anti-pattern that burns through Claude, ChatGPT, and Cursor usage limits in minutes.

Why Precis?

  • Slash context bloat — Navigate by heading, grep for keywords, or read specific paragraphs by slug. No more feeding entire documents into the context window.
  • Write back, not just read — Edit DOCX with tracked changes, insert LaTeX sections, append Markdown — all through the same put() tool. Your agent becomes a true co-author.
  • Semantic search over papers — Vector search across thousands of ingested PDFs. Get the relevant chunk, not the whole paper.
  • One server, every format — DOCX, LaTeX (multi-file projects), Markdown, plaintext, and scientific papers. No format-specific plugins to juggle.
  • Token-aware output — Automatic RAKE keyword summaries for large documents. Depth filtering, pagination, and adaptive truncation keep responses lean.
  • Plugin architecture — Add custom document types or URI schemes via Python entry points.

Works with

Claude Desktop · Cursor · Windsurf · Cline · any MCP-compatible client

Quick start

pip install precis-mcp            # core: Markdown, plaintext, LaTeX
pip install "precis-mcp[word]"    # + Word DOCX support (quotes keep zsh from globbing the brackets)
pip install "precis-mcp[paper]"   # + scientific paper library
pip install "precis-mcp[all]"     # everything

Add to your MCP client config:

{
  "mcpServers": {
    "precis": {
      "command": "precis"
    }
  }
}

That's it. Your agent now has structured document access.

Four tools, zero sprawl

| Tool | What it does |
| --- | --- |
| `get(id)` | Read any document node — heading, paragraph, table, figure, chunk, citation |
| `put(id, text, mode)` | Write, replace, delete, annotate, or comment on any node |
| `search(query)` | Semantic search across your paper library or grep within a document |
| `move(id, after)` | Reorder sections and paragraphs within a document |

get — Structured reading

get(id='report.docx')                    # table of contents with slugs
get(id='report.docx›PLXDX')             # read specific paragraph by slug
get(id='report.docx', grep='methods')    # find all nodes matching 'methods'
get(id='report.docx', depth=2)           # outline: H1 + H2 only
get(id='wang2020state')                  # paper overview + abstract
get(id='wang2020state›38')              # read chunk 38 of a paper
get(id='wang2020state›38..42')          # read chunk range
get(id='wang2020state/cite/bib')         # BibTeX citation
get(id='wang2020state/fig/3')            # figure 3 with caption
get(id='doi:10.1021/jacs.2c01234')       # lookup by DOI
get(id='arxiv:2301.12345')               # lookup by arXiv ID
get(grep='year:2020-2024')               # filter papers by date
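The selector syntax above (a slug like `PLXDX`, a chunk index, or a `38..42` range after the `›` separator) can be sketched as a tiny parser. The function name `parse_selector` and its return shape are illustrative, not Precis's actual API:

```python
def parse_selector(identifier: str):
    """Split a Precis-style id into (path, selector), expanding chunk ranges.

    Illustrative sketch only; the server's real parser may differ.
    """
    path, sep, selector = identifier.partition("›")
    if not sep:                      # no selector: whole-document id
        return path, None
    if ".." in selector:             # chunk range such as '38..42'
        start, end = selector.split("..")
        return path, list(range(int(start), int(end) + 1))
    return path, selector            # slug or single chunk index
```

For example, `parse_selector('wang2020state›38..42')` yields the path plus the expanded chunk list `[38, 39, 40, 41, 42]`.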

put — Write and annotate

put(id='report.docx', text='## Methods\n\nWe used...', mode='append')
put(id='report.docx›PLXDX', text='Revised paragraph.', mode='replace')  # tracked changes
put(id='report.docx›PLXDX', text='Needs citation.', mode='comment')     # margin comment
put(id='report.docx›PLXDX', mode='delete')
put(id='wang2020state', note='Key finding about selectivity')            # paper annotation
put(id='wang2020state', link='jones2023surface:cites')                   # link papers

search — Find what matters

search(query='CO2 capture metal-organic frameworks')         # semantic search
search(query='selectivity', scope='wang2020state')           # search within one paper
search(query='thermal stability', scope='chapter3.tex')      # search within a doc

Supported formats

| Format | Read | Write | Track changes | Comments | Extras |
| --- | --- | --- | --- | --- | --- |
| DOCX | ✓ | ✓ | ✓ | ✓ (margin comments) | Citations, tables, lists, bibliography |
| LaTeX | ✓ | ✓ | | | Multi-file projects, .bib parsing, equations, figures, raw file access |
| Markdown | ✓ | ✓ | | | Headings, code blocks, tables, lists. Zero deps |
| Plaintext | ✓ | | | | Paragraph-based. Zero deps |
| Papers | ✓ | notes | | | Semantic search, figures, citations, Semantic Scholar graph |

URI grammar

id = path[›selector][/view[/subview]]

Precis auto-detects the scheme from the identifier:

  • File extension (.docx, .tex, .md, .txt) → file handler
  • doi:, arxiv: prefix → paper lookup
  • Bare DOI pattern (10.1234/...) → auto-detected
  • Everything else → paper slug
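The detection rules above can be sketched as a small dispatcher. The function name `detect_scheme` and the returned labels are hypothetical; the real handler registry lives inside Precis:

```python
import re

# Extensions listed in the rules above (assumed set, for illustration).
FILE_EXTENSIONS = {".docx", ".tex", ".md", ".txt"}

def detect_scheme(identifier: str) -> str:
    """Mirror the auto-detection rules described above (sketch only)."""
    path = identifier.partition("›")[0]              # strip any ›selector
    if path.startswith(("doi:", "arxiv:")):
        return "paper-lookup"                        # explicit prefix
    if re.match(r"^10\.\d{4,}/", path):
        return "paper-lookup"                        # bare DOI pattern
    if any(path.endswith(ext) for ext in FILE_EXTENSIONS):
        return "file"                                # file handler by extension
    return "paper-slug"                              # fallback
```

Given the examples earlier on this page, `report.docx›PLXDX` routes to the file handler, `doi:10.1021/jacs.2c01234` to a paper lookup, and `wang2020state` to a paper slug.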

How it saves tokens

The problem: Feeding a raw 30-page PDF into the context window can burn ~100,000 tokens. Every follow-up message resends the entire conversation history, compounding the waste — a leading cause of hitting usage limits fast.

The fix: Precis parses documents into a navigable tree of headings, paragraphs, tables, and figures. Your agent reads the table of contents (tiny), drills into the section it needs (small), and gets exactly the paragraph it wants (minimal). Total tokens: a fraction of the raw dump.

| Approach | Tokens for a 30-page paper |
| --- | --- |
| Raw PDF in context | ~100,000 |
| Precis: TOC → section → paragraph | ~2,000–5,000 |

For large documents (100+ nodes), Precis automatically returns headings-only and lets the agent drill in. RAKE keyword extraction provides compressed summaries for scanning without reading full text.
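As a rough illustration of the RAKE idea (score stopword-delimited phrases by summed word degree over frequency), here is a minimal sketch — not the extractor Precis actually ships, and with a toy stopword list:

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "of", "a", "an", "and", "or", "to", "in", "is", "for", "on", "with"}

def rake_keywords(text: str, top_n: int = 3):
    """RAKE-style scoring: rank candidate phrases by word degree / frequency."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Candidate phrases are maximal runs of non-stopwords.
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # Word score = degree (total phrase length it appears in) / frequency.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)
    scores = {w: degree[w] / freq[w] for w in freq}
    ranked = sorted(phrases, key=lambda p: sum(scores[w] for w in p), reverse=True)
    return [" ".join(p) for p in ranked[:top_n]]
```

Longer multi-word phrases score higher, which is why RAKE summaries surface dense technical terms rather than common words.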

Output markers

Every line of output is prefixed with a provenance marker:

| Marker | Meaning | Safe to quote? |
| --- | --- | --- |
| `=` | Verbatim text from the document | yes |
| `~` | Derived (keywords, summary) | no |
| `%` | Annotation (user note or comment) | context-dependent |
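An agent can use these markers to decide what is safe to quote back verbatim. A minimal sketch, assuming the marker is the first space-delimited token of each output line (the helper name `verbatim_lines` is illustrative):

```python
def verbatim_lines(output: str):
    """Return only '='-marked lines of Precis output (verbatim document text).

    Assumes 'marker text...' per line, per the marker table above.
    """
    kept = []
    for line in output.splitlines():
        marker, _, text = line.partition(" ")
        if marker == "=":            # verbatim: safe to quote
            kept.append(text)
    return kept
```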

Plugin system

Extend Precis with new document types or URI schemes:

[project.entry-points."precis.schemes"]
chem = "my_plugin:ChemHandler"

[project.entry-points."precis.file_types"]
".sdf" = "my_plugin:SDFParser"

Implement the Handler protocol (just a read() method) and register via entry points. Precis discovers plugins at startup.

License

GPL-3.0-or-later

Download files

Download the file for your platform.

Source Distribution

precis_mcp-3.0.3.tar.gz (359.2 kB)

Uploaded Source

Built Distribution


precis_mcp-3.0.3-py3-none-any.whl (67.8 kB)

Uploaded Python 3

File details

Details for the file precis_mcp-3.0.3.tar.gz.

File metadata

  • Download URL: precis_mcp-3.0.3.tar.gz
  • Size: 359.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for precis_mcp-3.0.3.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | ed4b764a120d404d0a4d2257c030e08619e28dad1edb3e163859b784cfb8912a |
| MD5 | 32c8a1470a2f45dee074663030d67bdd |
| BLAKE2b-256 | 41aabcd2a1f316026d0e261110b6d1e02a33d09d70f24300392c5b767b3ad9ea |


Provenance

The following attestation bundles were made for precis_mcp-3.0.3.tar.gz:

Publisher: publish.yml on retospect/precis-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file precis_mcp-3.0.3-py3-none-any.whl.

File metadata

  • Download URL: precis_mcp-3.0.3-py3-none-any.whl
  • Size: 67.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for precis_mcp-3.0.3-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | b3fd084e20d76409daf97b5ed035b2d94228af50d4774a6fd0ac87ed871e2c65 |
| MD5 | 6bbe93327f5b3a2476f3d1b7f1376609 |
| BLAKE2b-256 | 8201c18711d38fb0b15f8f7924aa974f3ad54fc4cf2b8b093d9140fc6ddaca20 |


Provenance

The following attestation bundles were made for precis_mcp-3.0.3-py3-none-any.whl:

Publisher: publish.yml on retospect/precis-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
