MCP Server for local document analysis: PDF text extraction, table detection, DOCX parsing, language detection and keyword search — no cloud API required
document-intelligence-mcp
Local document intelligence for AI agents — extract text, detect tables, read metadata, analyze structure, search keywords, and detect language from PDF and DOCX files. No cloud API required, no API key needed.
Features
- 10 MCP Tools for PDF and DOCX processing
- Local processing — no data leaves your machine
- No API key required
- Supports PDF (via PyMuPDF + pdfplumber) and Microsoft Word DOCX (via python-docx)
- Language detection via langdetect (55+ languages)
Tools
| Tool | Description |
|---|---|
| tool_extract_text_from_pdf | Extract all text from a PDF, page by page |
| tool_extract_tables_from_pdf | Detect and extract tables from a PDF |
| tool_get_pdf_metadata | Read PDF metadata: title, author, dates, outline |
| tool_analyze_document_structure | Detect headings, font sizes, section structure |
| tool_search_in_pdf | Search for keywords with context in a PDF |
| tool_extract_text_from_docx | Extract all text from a Word DOCX file |
| tool_extract_tables_from_docx | Extract all tables from a DOCX file |
| tool_analyze_docx_structure | Analyze headings, styles, and structure of a DOCX file |
| tool_count_words_and_stats | Word count, sentence count, reading time, top words |
| tool_detect_document_language | Detect language of a PDF or DOCX file (55+ languages) |
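To give a rough idea of what the word and reading-time statistics in the table involve, here is a minimal standard-library sketch that works on already-extracted text. It is not the package's implementation: the function name `text_stats`, the 200 words-per-minute reading speed, and the tokenization rules are all assumptions made for illustration.

```python
import re
from collections import Counter

WORDS_PER_MINUTE = 200  # assumed reading speed, not taken from the package


def text_stats(text: str, top_n: int = 5) -> dict:
    """Compute simple statistics for already-extracted document text."""
    # Words: runs of letters/digits, optionally with one internal apostrophe.
    words = re.findall(r"[^\W_]+(?:'[^\W_]+)?", text.lower())
    # Sentences: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "reading_time_minutes": round(len(words) / WORDS_PER_MINUTE, 2),
        "top_words": Counter(words).most_common(top_n),
    }


if __name__ == "__main__":
    print(text_stats("The cat sat. The cat ran! Did the dog run?", top_n=2))
```

A real tool would likely filter stop words before ranking top words; this sketch counts every token.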
Installation
pip install document-intelligence-mcp
Claude Desktop Configuration
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "document-intelligence": {
      "command": "document-intelligence-mcp"
    }
  }
}
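If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. The following is a small sketch of that merge, assuming you pass the path to your own claude_desktop_config.json (its location varies by platform and is not stated here); the helper name `add_server_entry` is hypothetical.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path) -> dict:
    """Add the document-intelligence entry, keeping other settings intact."""
    config = {}
    if config_path.exists():
        # An empty or whitespace-only file is treated as an empty config.
        raw = config_path.read_text(encoding="utf-8").strip()
        config = json.loads(raw or "{}")
    # Create "mcpServers" if missing, then add or overwrite only our entry.
    servers = config.setdefault("mcpServers", {})
    servers["document-intelligence"] = {"command": "document-intelligence-mcp"}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Back up the file before running anything like this against a real configuration.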
Usage Examples
Extract text from a PDF:
Extract the text from /path/to/report.pdf
Find tables in a PDF:
Find all tables in /path/to/financial_report.pdf
Search for a keyword:
Search for "revenue" in /path/to/annual_report.pdf
Get document stats:
Count the words and estimate reading time for /path/to/document.docx
Detect language:
What language is /path/to/document.pdf written in?
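For readers curious what "search with context" can look like under the hood, here is a minimal sketch that searches a list of per-page strings and returns each hit with surrounding characters. It assumes the page text has already been extracted; the function name `search_pages`, its parameters, and the result fields are illustrative and may differ from what the actual tool returns.

```python
import re


def search_pages(pages: list[str], keyword: str, context: int = 40) -> list[dict]:
    """Find case-insensitive keyword matches and return them with context."""
    hits: list[dict] = []
    if not keyword:
        return hits
    # Escape the keyword so it is matched literally, not as a regex.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    for page_number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            lo = max(0, match.start() - context)
            hi = match.end() + context
            hits.append(
                {
                    "page": page_number,
                    "offset": match.start(),
                    "snippet": text[lo:hi].strip(),
                }
            )
    return hits


if __name__ == "__main__":
    sample = ["Total revenue grew 10%.", "No match here.", "Revenue and REVENUE."]
    for hit in search_pages(sample, "revenue", context=5):
        print(hit)
```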
Requirements
- Python 3.10+
- PyMuPDF >= 1.24.0
- pdfplumber >= 0.11.0
- python-docx >= 1.1.0
- langdetect >= 1.0.9
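When the server fails to start, a quick check of the environment against the list above can help. The sketch below uses only the standard library; the helper names are hypothetical, and the simplified version parsing ignores pre-release suffixes such as `rc1`, so treat its verdicts as a hint rather than a definitive answer.

```python
import sys
from importlib import metadata

# Minimum versions copied from the Requirements list above.
REQUIRED = {
    "PyMuPDF": (1, 24, 0),
    "pdfplumber": (0, 11, 0),
    "python-docx": (1, 1, 0),
    "langdetect": (1, 0, 9),
}


def parse_version(version: str) -> tuple:
    """Turn '1.24.10' into (1, 24, 10); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad to three components so '1.24' compares equal to '1.24.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements() -> dict:
    """Map each requirement to 'ok', 'too old ...', or 'missing'."""
    report = {"python": "ok" if sys.version_info >= (3, 10) else "too old"}
    for name, minimum in REQUIRED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= minimum
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```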
License
MIT License — free to use, modify, and distribute.
Built by AiAgentKarl | Part of the AI Agent Economy toolkit
File details
Details for the file document_intelligence_mcp-0.1.0.tar.gz.
File metadata
- Download URL: document_intelligence_mcp-0.1.0.tar.gz
- Upload date:
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f0dd4394bfcb467d9d25e9fd08db106bd65b90f952dd80e521b926ec3e61354 |
| MD5 | 6f52cf29353805d22b283785b2f0cf8d |
| BLAKE2b-256 | 01cbd7c843f0cd85b51a2c8e286c2efb3972b2ac7054d2456f16bdf84611a730 |
File details
Details for the file document_intelligence_mcp-0.1.0-py3-none-any.whl.
File metadata
- Download URL: document_intelligence_mcp-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 503ccb77761b3a449d58c726dd334cb8103b54644162f7e7e835194204684442 |
| MD5 | 2f9d7ab1ad2e99e644cc7e41ab3722e7 |
| BLAKE2b-256 | d24b65ec158e1ab6102e15b0f134c68be896a9b646c4c50e72bafbffa5ca1b39 |
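The SHA256 digests listed for each file can be used to verify a manual download. A minimal sketch with the standard library follows; the helper names are hypothetical, and the file path in the usage comment is just wherever you saved the archive.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (path is wherever the download was saved):
# matches_published(
#     Path("document_intelligence_mcp-0.1.0.tar.gz"),
#     "5f0dd4394bfcb467d9d25e9fd08db106bd65b90f952dd80e521b926ec3e61354",
# )
```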