MCP Server for local document analysis: PDF text extraction, table detection, DOCX parsing, language detection and keyword search — no cloud API required

document-intelligence-mcp

License: MIT | Python 3.10+

Local document intelligence for AI agents — extract text, detect tables, read metadata, analyze structure, search keywords, and detect language from PDF and DOCX files. No cloud API required, no API key needed.

Features

  • 10 MCP Tools for PDF and DOCX processing
  • Local processing — no data leaves your machine
  • No API key required
  • Supports PDF (via PyMuPDF + pdfplumber) and Microsoft Word DOCX (via python-docx)
  • Language detection via langdetect (55+ languages)

Tools

| Tool | Description |
| --- | --- |
| tool_extract_text_from_pdf | Extract all text from a PDF, page by page |
| tool_extract_tables_from_pdf | Detect and extract tables from PDF |
| tool_get_pdf_metadata | Read PDF metadata: title, author, dates, outline |
| tool_analyze_document_structure | Detect headings, font sizes, section structure |
| tool_search_in_pdf | Search for keywords with context in PDF |
| tool_extract_text_from_docx | Extract all text from a Word DOCX file |
| tool_extract_tables_from_docx | Extract all tables from a DOCX file |
| tool_analyze_docx_structure | Analyze headings, styles, and structure of DOCX |
| tool_count_words_and_stats | Word count, sentence count, reading time, top words |
| tool_detect_document_language | Detect language of PDF or DOCX (55+ languages) |
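
For orientation, MCP clients invoke tools like the ones listed above with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request with the standard library only. The helper `build_tool_call` and the argument name `file_path` are assumptions for illustration; the real parameter names come from each tool's input schema, which a client can read via `tools/list`.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# `file_path` is a hypothetical argument name; check the tool's schema.
request = build_tool_call(
    1, "tool_extract_text_from_pdf", {"file_path": "/path/to/report.pdf"}
)
print(request)
```

In normal use the MCP client (for example Claude Desktop) builds and sends these messages itself, so this is only meant to show what travels over the wire.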

Installation

pip install document-intelligence-mcp

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "document-intelligence": {
      "command": "document-intelligence-mcp"
    }
  }
}
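
If the config file already lists other servers, the entry has to be merged into the existing `mcpServers` object rather than pasted over it. A minimal sketch of that merge follows. The helper names `add_server` and `update_config_file` are hypothetical, and because the location of `claude_desktop_config.json` varies by platform, the path is left to the caller.

```python
import json
from pathlib import Path


def add_server(config: dict, name: str, command: str) -> dict:
    """Return a copy of `config` with one MCP server entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command}
    return {**config, "mcpServers": servers}


def update_config_file(path: Path) -> None:
    """Merge the document-intelligence entry into the JSON file at `path`."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_server(config, "document-intelligence", "document-intelligence-mcp")
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Existing server entries and unrelated settings are preserved; only the `document-intelligence` key is written.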

Usage Examples

Extract text from a PDF:

Extract the text from /path/to/report.pdf

Find tables in a PDF:

Find all tables in /path/to/financial_report.pdf

Search for a keyword:

Search for "revenue" in /path/to/annual_report.pdf
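
To give a sense of what a keyword search "with context" can look like, here is a rough sketch of a case-insensitive search over page text that has already been extracted. It is not the package's implementation; the function name `search_pages`, the `context` width, and the result fields are all assumptions.

```python
import re


def search_pages(pages: list[str], keyword: str, context: int = 40) -> list[dict]:
    """Find `keyword` in each page's text and return hits with surrounding text."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for page_number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            # Clamp the context window to the bounds of the page text.
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            hits.append({"page": page_number, "snippet": text[start:end].strip()})
    return hits


pages = ["Total revenue grew this year.", "No figures here.", "Revenue by region"]
for hit in search_pages(pages, "revenue", context=10):
    print(hit)
```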

Get document stats:

Count the words and estimate reading time for /path/to/document.docx
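
As an illustration of the kind of statistics requested here, the sketch below computes word count, sentence count, estimated reading time, and top words from plain text. The 200 words-per-minute reading rate, the naive sentence-splitting rule, and the function name `text_stats` are assumptions, not details taken from the package.

```python
import re
from collections import Counter


def text_stats(text: str, words_per_minute: int = 200, top_n: int = 3) -> dict:
    """Compute simple statistics for a block of plain text."""
    # Words are runs of letters, optionally with one inner apostrophe ("don't").
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    # Naive rule: a sentence ends at ., ! or ? (abbreviations will be miscounted).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "reading_time_minutes": round(len(words) / words_per_minute, 2),
        "top_words": Counter(words).most_common(top_n),
    }


print(text_stats("The cat sat. The cat ran! Did the dog run?"))
```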

Detect language:

What language is /path/to/document.pdf written in?

Requirements

  • Python 3.10+
  • PyMuPDF >= 1.24.0
  • pdfplumber >= 0.11.0
  • python-docx >= 1.1.0
  • langdetect >= 1.0.9
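
To confirm that an environment meets these minimums, the installed versions can be read with the standard library. A sketch follows; the helper names are hypothetical, and the version parser only handles plain dotted numeric versions such as `1.24.0` (anything after the first non-numeric part is ignored).

```python
from importlib import metadata

# Minimum versions as listed above, keyed by distribution name.
MINIMUMS = {
    "PyMuPDF": (1, 24, 0),
    "pdfplumber": (0, 11, 0),
    "python-docx": (1, 1, 0),
    "langdetect": (1, 0, 9),
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.24.10' into (1, 24, 10); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(minimums: dict[str, tuple[int, ...]] = MINIMUMS) -> dict[str, str]:
    """Report 'ok', 'too old', or 'missing' for each distribution."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Pad short versions so that (1, 24) compares equal to (1, 24, 0).
        padded = installed + (0,) * max(0, len(minimum) - len(installed))
        report[name] = "ok" if padded >= minimum else "too old"
    return report


print(check_requirements())
```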

License

MIT License — free to use, modify, and distribute.


Built by AiAgentKarl | Part of the AI Agent Economy toolkit
