
Privacy-first document processing FastMCP server with PII anonymization


Inkognito

Privacy-first document processing FastMCP server. Extract, anonymize, and segment documents through FastMCP's modern tool interface.

Please note: as an MCP server, Inkognito cannot absolutely guarantee the privacy of file contents, but privacy is a central design consideration. File contents should carry a low (but non-zero) risk of leakage; file names, however, will unavoidably and by design be read and written by the MCP. Plan accordingly, and consider using a local model.

Quick Start

Installation

# Install via pip
pip install inkognito

# Or via uvx (no Python setup needed)
uvx inkognito

# Or run directly with FastMCP
fastmcp run inkognito

Configure Claude Desktop

If one is not already configured, you also need to add a filesystem MCP server.

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "inkognito": {
      "command": "uvx",
      "args": ["inkognito"],
      "env": {}
    },
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/you/input-folder",
        "/Users/you/output-folder"
      ]
    }
  }
}

The optional API keys for cloud extractors (AZURE_DI_KEY, LLAMAPARSE_API_KEY) belong in the "env" object of the inkognito entry once those extractors are implemented. JSON does not permit comments, so add them as ordinary key/value pairs.

Basic Usage

In Claude Desktop:

"Extract this PDF to markdown"
"Anonymize all documents in my contracts folder"
"Split this large document into chunks for processing"
"Create individual prompts from this documentation"

Features

🔒 Privacy-First Anonymization

  • Universal PII detection (50+ types)
  • Consistent replacements across all documents
  • Reversible with a secure vault file
  • No configuration needed: smart defaults
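To illustrate what "consistent replacements" and a reversible vault mean, here is a minimal sketch, not Inkognito's implementation. It assumes a toy detector that only recognizes email addresses, and the `Pseudonymizer` class and its placeholder format are hypothetical. The key idea is one mapping shared across every document processed in a run, so the same original value always receives the same replacement.

```python
import re
from itertools import count

# Toy detector (assumption): email addresses only. Inkognito's real detector
# covers many more PII types and is not reproduced here.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Pseudonymizer:
    """Hypothetical helper: gives each distinct PII value one stable replacement."""

    def __init__(self) -> None:
        self.vault: dict[str, str] = {}  # original value -> replacement
        self._ids = count(1)

    def _replacement(self, value: str) -> str:
        # Reuse an existing replacement so the same value is consistent everywhere.
        if value not in self.vault:
            self.vault[value] = f"person{next(self._ids)}@example.com"
        return self.vault[value]

    def anonymize(self, text: str) -> str:
        return EMAIL.sub(lambda m: self._replacement(m.group(0)), text)


# One instance shared across documents keeps replacements consistent between files.
p = Pseudonymizer()
doc_a = p.anonymize("Contact alice@corp.com or bob@corp.com.")
doc_b = p.anonymize("Reply to alice@corp.com.")
print(doc_a)  # Contact person1@example.com or person2@example.com.
print(doc_b)  # Reply to person1@example.com.
```

The `vault` dictionary is what makes the process reversible: persisting it securely is what allows the originals to be restored later.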

📄 Multiple Extraction Options

  • Available now: Docling (default, with OCR support)
  • Planned: Azure DI, LlamaIndex, MinerU (placeholders only)
  • Auto-selects the best available option
  • Falls back to Docling if no cloud extractor is available
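The auto-selection behaviour can be pictured as a preference-ordered registry that skips anything not yet implemented or missing its API key. This is a hypothetical sketch: the `Extractor` dataclass, the `REGISTRY` ordering, and `select_extractor` are illustrative and do not reproduce `extractors/registry.py`.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Extractor:
    name: str
    implemented: bool
    required_env: list[str] = field(default_factory=list)

    def is_available(self) -> bool:
        # Usable only if implemented and every required API key is set.
        return self.implemented and all(os.environ.get(k) for k in self.required_env)


# Hypothetical preference order (fastest first, per the Performance table).
REGISTRY = [
    Extractor("azure_di", implemented=False, required_env=["AZURE_DI_KEY"]),
    Extractor("llamaindex", implemented=False, required_env=["LLAMAPARSE_API_KEY"]),
    Extractor("mineru", implemented=False),
    Extractor("docling", implemented=True),
]


def select_extractor(method: str = "auto") -> Extractor:
    if method == "auto":
        # First available extractor wins; Docling is the last-resort fallback.
        for extractor in REGISTRY:
            if extractor.is_available():
                return extractor
        raise RuntimeError("no extractor available")
    for extractor in REGISTRY:
        if extractor.name == method and extractor.is_available():
            return extractor
    raise ValueError(f"extractor {method!r} is not available")


print(select_extractor().name)  # docling
```

Because only Docling is implemented today, `auto` always resolves to it in this sketch, which matches the fallback described above.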

โœ‚๏ธ Intelligent Segmentation

  • Large documents: 10k-30k token chunks
  • Prompt generation: Split by headings
  • Preserves context and structure
  • Markdown-native processing

FastMCP Tools

All tools are exposed through FastMCP's modern interface with automatic progress reporting and error handling.

anonymize_documents

Replace PII with consistent fake data across multiple files.

anonymize_documents(
    directory="/path/to/docs",
    output_dir="/secure/output"
)

extract_document

Convert PDF/DOCX to markdown.

extract_document(
    file_path="/path/to/document.pdf",
    extraction_method="auto"  # auto, docling (others coming soon)
)

segment_document

Split large documents for LLM processing.

segment_document(
    file_path="/path/to/large.md",
    output_dir="/output/segments",
    max_tokens=20000
)
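A rough sketch of how token-budgeted segmentation can work: blank-line-separated blocks are packed greedily until the next block would exceed `max_tokens`. Assumptions: token counts are approximated by whitespace-separated words (a real tokenizer will give different numbers), and `segment` and `count_tokens` are hypothetical helpers, not the tool's code.

```python
def count_tokens(text: str) -> int:
    # Crude proxy (assumption): whitespace-separated words, not model tokens.
    return len(text.split())


def segment(markdown: str, max_tokens: int = 20000) -> list[str]:
    """Hypothetical: greedily pack blank-line-separated blocks into chunks."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for block in markdown.split("\n\n"):
        size = count_tokens(block)
        # Close the current chunk if adding this block would exceed the budget.
        if current and used + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(block)
        used += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks


text = "\n\n".join(["word " * 8] * 5)  # five 8-word paragraphs
parts = segment(text, max_tokens=20)
print([count_tokens(c) for c in parts])  # [16, 16, 8]
```

Splitting only at block boundaries keeps paragraphs intact, but a single block larger than `max_tokens` still becomes its own oversized chunk here; a production splitter would need to break it further.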

split_into_prompts

Create individual prompts from structured content.

split_into_prompts(
    file_path="/path/to/guide.md",
    output_dir="/output/prompts",
    split_level="h2",  # configurable; the LLM should be able to read these files safely
)
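Splitting by headings, as `split_level="h2"` suggests, can be sketched by starting a new section at every line that begins with exactly two `#` characters. `split_by_heading` is a hypothetical helper, and this version deliberately ignores edge cases such as `##` lines inside fenced code blocks.

```python
import re


def split_by_heading(markdown: str, level: int = 2) -> list[str]:
    """Hypothetical: start a new section at each heading of exactly `level`."""
    # For level=2 this builds the pattern "^#{2} ", which matches "## " but not "### ".
    heading = re.compile(rf"^#{{{level}}} ", re.MULTILINE)
    starts = [m.start() for m in heading.finditer(markdown)]
    if not starts:
        return [markdown.strip()] if markdown.strip() else []
    sections = []
    preamble = markdown[: starts[0]].strip()
    if preamble:  # keep text before the first heading as its own section
        sections.append(preamble)
    for begin, end in zip(starts, starts[1:] + [len(markdown)]):
        sections.append(markdown[begin:end].strip())
    return sections


guide = "# API\nIntro\n## GET /users\nList users\n### Params\nnone\n## POST /users\nCreate user\n"
sections = split_by_heading(guide, level=2)
for section in sections:
    print(section.splitlines()[0])  # "# API", then "## GET /users", then "## POST /users"
```

Deeper headings (here `### Params`) stay inside their parent section. Whether the actual tool keeps a preamble section the same way is not documented.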

restore_documents

Restore original PII using vault.

restore_documents(
    directory="/anonymized/docs",
    output_dir="/restored",
    vault_path="/secure/vault.json"
)
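Restoration is the inverse of anonymization: each replacement is swapped back for its original. The sketch below assumes the vault decodes to a JSON object mapping original values to replacements; the README describes vaults as encrypted and does not document their real format, so both `restore` and this layout are hypothetical.

```python
import json
import re


def restore(text: str, vault: dict[str, str]) -> str:
    """Hypothetical: swap replacements back to originals (vault: original -> replacement)."""
    reverse = {fake: real for real, fake in vault.items()}
    if not reverse:
        return text
    # Longest replacements first, so a short one cannot match inside a longer one.
    pattern = "|".join(re.escape(f) for f in sorted(reverse, key=len, reverse=True))
    return re.sub(pattern, lambda m: reverse[m.group(0)], text)


# Assumed vault layout; in practice this would be read from vault_path.
vault = json.loads('{"alice@corp.com": "person1@example.com", "bob@corp.com": "person2@example.com"}')
restored = restore("Contact person1@example.com or person2@example.com.", vault)
print(restored)  # Contact alice@corp.com or bob@corp.com.
```

Doing every substitution in a single regex pass avoids re-replacing text that an earlier substitution has just inserted.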

Extractor Status

| Extractor | Status | Notes |
|---|---|---|
| Docling | ✅ Fully implemented | Default extractor with OCR support (OCRMac on macOS, EasyOCR on other platforms) |
| Azure DI | ⚠️ Placeholder | Will require the AZURE_DI_KEY environment variable once implemented |
| LlamaIndex | ⚠️ Placeholder | Will require the LLAMAPARSE_API_KEY environment variable once implemented |
| MinerU | ⚠️ Placeholder | Will require the magic-pdf library once implemented |

Configuration

Following FastMCP conventions, all configuration is via environment variables:

# Optional API keys for cloud extractors (when implemented)
export AZURE_DI_KEY="your-key-here"
export LLAMAPARSE_API_KEY="your-key-here"

# Optional OCR languages (comma-separated, default: all available)
export INKOGNITO_OCR_LANGUAGES="en,fr,de"
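A sketch of how a comma-separated variable such as INKOGNITO_OCR_LANGUAGES might be parsed, treating an unset or empty value as "use all available languages". The `ocr_languages` helper is hypothetical; the server's own parsing is not shown in this README.

```python
import os
from typing import Mapping, Optional


def ocr_languages(env: Mapping[str, str] = os.environ) -> Optional[list[str]]:
    """Hypothetical: parse INKOGNITO_OCR_LANGUAGES; None means 'all available'."""
    raw = env.get("INKOGNITO_OCR_LANGUAGES", "")
    # Tolerate spaces around commas and ignore empty entries.
    langs = [code.strip() for code in raw.split(",") if code.strip()]
    return langs or None


print(ocr_languages({"INKOGNITO_OCR_LANGUAGES": "en, fr,de"}))  # ['en', 'fr', 'de']
print(ocr_languages({}))  # None
```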

Examples

Legal Document Processing

You: "Anonymize all contracts in the merger folder for review"

Claude: "I'll anonymize those contracts for you..."

[Processing 23 files...]

✓ Anonymized 23 contracts
✓ Replaced: 145 company names, 89 person names, 67 case numbers
✓ Vault saved to: /output/vault.json

Research Paper Extraction

You: "Extract this 300-page research PDF"

Claude: "I'll extract that PDF to markdown..."

[Using Docling for extraction...]

✓ Extracted 300 pages
✓ Preserved: tables, figures, citations
✓ Output size: 487,000 tokens
✓ Saved to: research_paper.md

Documentation to Prompts

You: "Split this API documentation into individual prompts"

Claude: "I'll split the documentation by endpoints..."

[Splitting by H2 headings...]

✓ Created 47 prompt files
✓ Each prompt includes endpoint context
✓ Ready for training or testing

Performance

| Extractor | Speed | Requirements | Status |
|---|---|---|---|
| Azure DI | 0.2-1 sec/page | API key | Planned |
| LlamaIndex | 1-2 sec/page | API key | Planned |
| MinerU | 3-7 sec/page | Local, GPU | Planned |
| Docling | 5-10 sec/page | Local, CPU | ✅ Available |
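As a rough planning aid, the per-page ranges in the table translate directly into wall-clock estimates. For the 300-page paper in the Research Paper Extraction example, Docling at 5-10 sec/page works out to 1,500-3,000 seconds, or 25-50 minutes. The helper below simply encodes that arithmetic; `estimate_minutes` is hypothetical, and real throughput depends on hardware and document content.

```python
# Per-page ranges copied from the Performance table (seconds per page).
SECONDS_PER_PAGE = {
    "Azure DI": (0.2, 1.0),
    "LlamaIndex": (1.0, 2.0),
    "MinerU": (3.0, 7.0),
    "Docling": (5.0, 10.0),
}


def estimate_minutes(pages: int, extractor: str) -> tuple[float, float]:
    """Hypothetical: (fastest, slowest) extraction time in minutes."""
    low, high = SECONDS_PER_PAGE[extractor]
    return pages * low / 60, pages * high / 60


print(estimate_minutes(300, "Docling"))  # (25.0, 50.0)
```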

Privacy & Security

  • Local processing: No cloud services required
  • No persistence: Nothing saved without explicit paths
  • Secure vaults: Encrypted mapping storage
  • API key safety: Never logged or transmitted

Development

Running Locally

# Clone the repository
git clone https://github.com/phren0logy/inkognito
cd inkognito

# Run with FastMCP CLI
fastmcp dev

# Or run directly in development
uv run python server.py

Testing with FastMCP

# Install the server configuration
fastmcp install inkognito

# Test a specific tool
fastmcp test inkognito extract_document

Project Structure

inkognito/
├── pyproject.toml          # FastMCP-compatible packaging
├── LICENSE                 # MIT license
├── README.md               # This file
├── server.py               # FastMCP server and entry point
├── anonymizer.py           # PII detection and anonymization
├── vault.py                # Vault management for reversibility
├── segmenter.py            # Document segmentation
├── exceptions.py           # Custom exceptions
├── extractors/             # PDF extraction backends
│   ├── __init__.py
│   ├── base.py
│   ├── registry.py
│   ├── docling.py          # ✅ Implemented
│   ├── azure_di.py         # Placeholder
│   ├── llamaindex.py       # Placeholder
│   └── mineru.py           # Placeholder
└── tests/

License

MIT License - see LICENSE file for details.

Download files


Source Distribution

inkognito-0.1.0.tar.gz (88.3 kB)

Uploaded Source

Built Distribution


inkognito-0.1.0-py3-none-any.whl (30.0 kB)

Uploaded Python 3

File details

Details for the file inkognito-0.1.0.tar.gz.

File metadata

  • Download URL: inkognito-0.1.0.tar.gz
  • Upload date:
  • Size: 88.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for inkognito-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9c2983fce901decb919165ace412603c7f1ec8376581c176fb8283316e92823 |
| MD5 | bce1fe51de78838bae2db8f26d1376b4 |
| BLAKE2b-256 | 1be33ecc1b1c8650c06c277b4123326a804beb412d0b77617e8e88c014d90b74 |


File details

Details for the file inkognito-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: inkognito-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 30.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for inkognito-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | de866f176003e0e83a24759e719599c96f363b911685e2191fa7d0cb75b3fda4 |
| MD5 | bd2fe9a9d21281206c9e9f46b875ff48 |
| BLAKE2b-256 | 78c31490de84b78e78dbd004f336f788e5ad299ae71eea03a1136e251a27780d |

