MCP server for structured document access — read, write, search, and annotate DOCX, LaTeX, Markdown, and scientific papers with minimal token usage
Project description
Precis — Structured Document MCP Server
Stop burning tokens on raw documents. Precis is an MCP server that gives AI agents structured, token-efficient access to DOCX, LaTeX, Markdown, and scientific papers — read, write, search, and annotate through four simple tools.
Instead of dumping a 100k-token PDF into your context window, Precis lets your agent navigate to exactly the section it needs, read just that chunk, and move on. One MCP server replaces the PDF-in-context anti-pattern that burns through Claude, ChatGPT, and Cursor usage limits in minutes.
Why Precis?
- Slash context bloat — Navigate by heading, grep for keywords, or read specific paragraphs by slug. No more feeding entire documents into the context window.
- Write back, not just read — Edit DOCX with tracked changes, insert LaTeX sections, append Markdown — all through the same
put()tool. Your agent becomes a true co-author. - Semantic search over papers — Vector search across thousands of ingested PDFs. Get the relevant chunk, not the whole paper.
- One server, every format — DOCX, LaTeX (multi-file projects), Markdown, plaintext, and scientific papers. No format-specific plugins to juggle.
- Token-aware output — Automatic RAKE keyword summaries for large documents. Depth filtering, pagination, and adaptive truncation keep responses lean.
- Plugin architecture — Add custom document types or URI schemes via Python entry points.
Works with
Claude Desktop · Cursor · Windsurf · Cline · any MCP-compatible client
Quick start
pip install precis-mcp # core: Markdown, plaintext, LaTeX
pip install precis-mcp[word] # + Word DOCX support
pip install precis-mcp[paper] # + scientific paper library
pip install precis-mcp[all] # everything
Add to your MCP client config:
{
"mcpServers": {
"precis": {
"command": "precis"
}
}
}
That's it. Your agent now has structured document access.
Four tools, zero sprawl
| Tool | What it does |
|---|---|
get(id) |
Read any document node — heading, paragraph, table, figure, chunk, citation |
put(id, text, mode) |
Write, replace, delete, annotate, or comment on any node |
search(query) |
Semantic search across your paper library or grep within a document |
move(id, after) |
Reorder sections and paragraphs within a document |
get — Structured reading
get(id='report.docx') # table of contents with slugs
get(id='report.docx›PLXDX') # read specific paragraph by slug
get(id='report.docx', grep='methods') # find all nodes matching 'methods'
get(id='report.docx', depth=2) # outline: H1 + H2 only
get(id='wang2020state') # paper overview + abstract
get(id='wang2020state›38') # read chunk 38 of a paper
get(id='wang2020state›38..42') # read chunk range
get(id='wang2020state/cite/bib') # BibTeX citation
get(id='wang2020state/fig/3') # figure 3 with caption
get(id='doi:10.1021/jacs.2c01234') # lookup by DOI
get(id='arxiv:2301.12345') # lookup by arXiv ID
get(grep='year:2020-2024') # filter papers by date
put — Write and annotate
put(id='report.docx', text='## Methods\n\nWe used...', mode='append')
put(id='report.docx›PLXDX', text='Revised paragraph.', mode='replace') # tracked changes
put(id='report.docx›PLXDX', text='Needs citation.', mode='comment') # margin comment
put(id='report.docx›PLXDX', mode='delete')
put(id='wang2020state', note='Key finding about selectivity') # paper annotation
put(id='wang2020state', link='jones2023surface:cites') # link papers
search — Find what matters
search(query='CO2 capture metal-organic frameworks') # semantic search
search(query='selectivity', scope='wang2020state') # search within one paper
search(query='thermal stability', scope='chapter3.tex') # search within a doc
Supported formats
| Format | Read | Write | Track changes | Comments | Extras |
|---|---|---|---|---|---|
| DOCX | ✓ | ✓ | ✓ | ✓ margin comments | Citations, tables, lists, bibliography |
| LaTeX | ✓ | ✓ | — | — | Multi-file projects, .bib parsing, equations, figures, raw file access |
| Markdown | ✓ | ✓ | — | — | Headings, code blocks, tables, lists. Zero deps |
| Plaintext | ✓ | ✓ | — | — | Paragraph-based. Zero deps |
| Papers | ✓ | notes | — | — | Semantic search, figures, citations, Semantic Scholar graph |
URI grammar
id = path[›selector][/view[/subview]]
Precis auto-detects the scheme from the identifier:
- File extension (
.docx,.tex,.md,.txt) → file handler doi:,arxiv:prefix → paper lookup- Bare DOI pattern (
10.1234/...) → auto-detected - Everything else → paper slug
How it saves tokens
The problem: Feeding a raw 4,500-word PDF to Claude burns ~100,000 tokens. Every follow-up message resends the entire conversation history, compounding the waste. This is the #1 cause of hitting usage limits fast.
The fix: Precis parses documents into a navigable tree of headings, paragraphs, tables, and figures. Your agent reads the table of contents (tiny), drills into the section it needs (small), and gets exactly the paragraph it wants (minimal). Total tokens: a fraction of the raw dump.
| Approach | Tokens for a 30-page paper |
|---|---|
| Raw PDF in context | ~100,000 |
| Precis: TOC → section → paragraph | ~2,000–5,000 |
For large documents (100+ nodes), Precis automatically returns headings-only and lets the agent drill in. RAKE keyword extraction provides compressed summaries for scanning without reading full text.
Output markers
Every line of output is prefixed with a provenance marker:
| Marker | Meaning | Safe to quote? |
|---|---|---|
= |
Verbatim text from the document | ✓ |
~ |
Derived (keywords, summary) | ✗ |
% |
Annotation (user note or comment) | context-dependent |
Plugin system
Extend Precis with new document types or URI schemes:
[project.entry-points."precis.schemes"]
chem = "my_plugin:ChemHandler"
[project.entry-points."precis.file_types"]
".sdf" = "my_plugin:SDFParser"
Implement the Handler protocol (just a read() method) and register via entry points. Precis discovers plugins at startup.
License
GPL-3.0-or-later
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file precis_mcp-3.1.0.tar.gz.
File metadata
- Download URL: precis_mcp-3.1.0.tar.gz
- Upload date:
- Size: 369.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
085dd2ed05e060a9b683059af8b7efc7ff32975953d7367a0da8ad59ca4fe1c8
|
|
| MD5 |
396c95699819a81c936d6cea6166b358
|
|
| BLAKE2b-256 |
04f82540ae632ea615d04b81687e503b8caed6d506c26bdd169083e2efce34a3
|
Provenance
The following attestation bundles were made for precis_mcp-3.1.0.tar.gz:
Publisher:
publish.yml on retospect/precis-mcp
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
precis_mcp-3.1.0.tar.gz -
Subject digest:
085dd2ed05e060a9b683059af8b7efc7ff32975953d7367a0da8ad59ca4fe1c8 - Sigstore transparency entry: 1254178745
- Sigstore integration time:
-
Permalink:
retospect/precis-mcp@84719c02b73ea6e5903fd8da9728c86ae0cf8fdc -
Branch / Tag:
refs/tags/v3.1.0 - Owner: https://github.com/retospect
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@84719c02b73ea6e5903fd8da9728c86ae0cf8fdc -
Trigger Event:
push
-
Statement type:
File details
Details for the file precis_mcp-3.1.0-py3-none-any.whl.
File metadata
- Download URL: precis_mcp-3.1.0-py3-none-any.whl
- Upload date:
- Size: 76.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7b89b43f0374cb17d56768752803c83059ea2ad13e67248ed28361503a2023e1
|
|
| MD5 |
ffd44b7b7e192d56143154d7f050ed88
|
|
| BLAKE2b-256 |
8cf3c0855b9e104c28f6dbfa7bb87e3306e4ae00291b620f81454d098eec8a72
|
Provenance
The following attestation bundles were made for precis_mcp-3.1.0-py3-none-any.whl:
Publisher:
publish.yml on retospect/precis-mcp
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
precis_mcp-3.1.0-py3-none-any.whl -
Subject digest:
7b89b43f0374cb17d56768752803c83059ea2ad13e67248ed28361503a2023e1 - Sigstore transparency entry: 1254178803
- Sigstore integration time:
-
Permalink:
retospect/precis-mcp@84719c02b73ea6e5903fd8da9728c86ae0cf8fdc -
Branch / Tag:
refs/tags/v3.1.0 - Owner: https://github.com/retospect
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@84719c02b73ea6e5903fd8da9728c86ae0cf8fdc -
Trigger Event:
push
-
Statement type: