Privacy-first document processing FastMCP server with PII anonymization
Inkognito
Privacy-first document processing FastMCP server. Extract, anonymize, and segment documents through FastMCP's modern tool interface.
Please note: As an MCP server, Inkognito cannot absolutely guarantee the privacy of file contents, but privacy is a central design consideration. The risk of file contents leaking should be low, though not zero. File names, however, will unavoidably and by design be read and written by the MCP. Plan accordingly, and consider using a local model.
Quick Start
Installation
# Install via pip
pip install inkognito
# Or via uvx (no Python setup needed)
uvx inkognito
# Or run directly with FastMCP
fastmcp run inkognito
Configure Claude Desktop
If you do not already have a filesystem MCP server configured, add one alongside Inkognito.
Add to your claude_desktop_config.json (JSON does not allow comments, so the optional AZURE_DI_KEY and LLAMAPARSE_API_KEY entries are left out of "env" until those extractors are implemented):
{
"mcpServers": {
"inkognito": {
"command": "uvx",
"args": ["inkognito"],
"env": {}
},
"filesystem": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/Users/you/input-files-or-whatever",
"/Users/you/output-folder-if-you-want-one"
],
"env": {},
"transport": "stdio",
"type": null,
"cwd": null,
"timeout": null,
"description": null,
"icon": null,
"authentication": null
}
}
}
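If you would rather not edit the file by hand, the same entry can be merged in with a few lines of Python. This is a minimal sketch, not part of Inkognito: the helper name `add_inkognito` is hypothetical, and you have to pass the location of your own claude_desktop_config.json, since it differs by platform.

```python
import json
from pathlib import Path

def add_inkognito(config_path):
    """Add an 'inkognito' server entry to a Claude Desktop config file.

    Hypothetical helper for illustration. Existing entries are preserved,
    and an existing 'inkognito' entry is left untouched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers.setdefault(
        "inkognito",
        {"command": "uvx", "args": ["inkognito"], "env": {}},
    )
    path.write_text(json.dumps(config, indent=2))
    return config
```

Because `json.dumps` writes strict JSON, the result stays loadable by Claude Desktop. The filesystem server entry still needs to be added separately with your own directories.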
Basic Usage
In Claude Desktop:
"Extract this PDF to markdown"
"Anonymize all documents in my contracts folder"
"Split this large document into chunks for processing"
"Create individual prompts from this documentation"
Features
Privacy-First Anonymization
- Universal PII detection (50+ types)
- Consistent replacements across all documents
- Reversible with secure vault file
- No configuration needed - smart defaults
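To make "consistent replacements" and "reversible" concrete, here is a minimal sketch of the idea. It is not Inkognito's implementation: a single e-mail regex stands in for the real PII detection, the vault is a plain in-memory dict rather than a secure vault file, and the names `anonymize` and `restore` are hypothetical.

```python
import json
import re

# Illustrative only: one e-mail pattern stands in for broad PII detection.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def anonymize(text: str, vault: dict) -> str:
    """Replace each e-mail with a placeholder, reusing earlier mappings."""
    def swap(match: re.Match) -> str:
        original = match.group(0)
        if original not in vault:
            # First sighting: mint a new placeholder and remember it.
            vault[original] = f"person{len(vault) + 1}@example.com"
        return vault[original]
    return EMAIL_RE.sub(swap, text)

def restore(text: str, vault: dict) -> str:
    """Reverse the mapping stored in the vault."""
    for original, fake in vault.items():
        text = text.replace(fake, original)
    return text

vault = {}
doc_a = anonymize("Contact alice@corp.com or bob@corp.com.", vault)
doc_b = anonymize("alice@corp.com signed the contract.", vault)
print(doc_a)
print(doc_b)
print(json.dumps(vault, indent=2))
```

Sharing one vault across both calls is what keeps the replacement for the same address identical in every document, and the same vault is all that is needed to undo the substitution later.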
Multiple Extraction Options
- Available Now: Docling (default, with OCR support)
- Planned: Azure DI, LlamaIndex, MinerU (placeholders only)
- Auto-selects best available option
- Falls back to Docling if no cloud options are available
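The auto-selection and fallback behavior could look roughly like the sketch below. This is an assumption about the shape of the logic, not the project's registry code: the function name `pick_extractor` and the priority order are hypothetical, and only the environment variable names come from this README. Since the cloud extractors are still placeholders, today the result would always be Docling.

```python
import os
from typing import Mapping, Optional

# Hypothetical priority order: cloud extractors first (if their key is set),
# then the local Docling default. The env var names come from this README.
CLOUD_EXTRACTORS = [
    ("azure_di", "AZURE_DI_KEY"),
    ("llamaindex", "LLAMAPARSE_API_KEY"),
]

def pick_extractor(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first extractor whose API key is configured."""
    env = os.environ if env is None else env
    for name, key_var in CLOUD_EXTRACTORS:
        if env.get(key_var):
            return name
    return "docling"  # local fallback, no key required

print(pick_extractor({}))
```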
Intelligent Segmentation
- Large documents: 10k-30k token chunks
- Prompt generation: Split by headings
- Preserves context and structure
- Markdown-native processing
FastMCP Tools
All tools are exposed through FastMCP's modern interface with automatic progress reporting and error handling.
anonymize_documents
Replace PII with consistent fake data across multiple files.
anonymize_documents(
directory="/path/to/docs",
output_dir="/secure/output"
)
extract_document
Convert PDF/DOCX to markdown.
extract_document(
file_path="/path/to/document.pdf",
extraction_method="auto" # auto, docling (others coming soon)
)
segment_document
Split large documents for LLM processing.
segment_document(
file_path="/path/to/large.md",
output_dir="/output/segments",
max_tokens=20000
)
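For intuition about what a `max_tokens` budget means, here is a minimal sketch of greedy chunking. It is not the tool's algorithm: it counts whitespace-separated words as a crude stand-in for a real tokenizer, splits only on blank lines, and the function name `segment` is hypothetical.

```python
def segment(text: str, max_tokens: int = 20000) -> list[str]:
    """Greedily pack paragraphs into chunks that stay within max_tokens.

    A single paragraph larger than the budget becomes its own oversized chunk.
    """
    chunks, current, used = [], [], 0
    for para in text.split("\n\n"):
        cost = len(para.split())  # crude stand-in for a real tokenizer
        if current and used + cost > max_tokens:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks

print(segment("a b c\n\nd e\n\nf g h i", max_tokens=5))
```

Splitting on paragraph boundaries rather than at an exact token offset is one simple way to keep each chunk readable on its own.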
split_into_prompts
Create individual prompts from structured content.
split_into_prompts(
file_path="/path/to/guide.md",
output_dir="/output/prompts",
split_level="h2",  # configurable; the LLM should be able to read the contents of these files safely
)
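Splitting at a heading level can be pictured with the sketch below. It is an illustration under stated assumptions rather than the tool's code: it handles only ATX-style headings (`## Title`), does not skip headings that appear inside fenced code, and the name `split_by_heading` is hypothetical.

```python
import re

def split_by_heading(markdown: str, level: int = 2) -> list[tuple[str, str]]:
    """Return (heading, body) pairs for each heading of exactly this level."""
    # Exactly `level` hashes followed by a space, so "###" is not matched by level 2.
    pattern = re.compile(rf"^{'#' * level} +(.+)$", re.MULTILINE)
    matches = list(pattern.finditer(markdown))
    sections = []
    for i, m in enumerate(matches):
        # A section runs until the next same-level heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections.append((m.group(1).strip(), markdown[m.end():end].strip()))
    return sections

doc = "# Title\n\n## A\ntext a\n\n### sub\nmore\n\n## B\ntext b\n"
for heading, body in split_by_heading(doc):
    print(heading, "->", repr(body))
```

Deeper headings stay inside their parent section, so each resulting prompt keeps its own sub-structure.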
restore_documents
Restore original PII using vault.
restore_documents(
directory="/anonymized/docs",
output_dir="/restored",
vault_path="/secure/vault.json"
)
Extractor Status
| Extractor | Status | Notes |
|---|---|---|
| Docling | ✅ Fully Implemented | Default extractor with OCR support (OCRMac on macOS, EasyOCR on other platforms) |
| Azure DI | ⚠️ Placeholder | Requires AZURE_DI_KEY environment variable when implemented |
| LlamaIndex | ⚠️ Placeholder | Requires LLAMAPARSE_API_KEY environment variable when implemented |
| MinerU | ⚠️ Placeholder | Will require magic-pdf library when implemented |
Configuration
Following FastMCP conventions, all configuration is via environment variables:
# Optional API keys for cloud extractors (when implemented)
export AZURE_DI_KEY="your-key-here"
export LLAMAPARSE_API_KEY="your-key-here"
# Optional OCR languages (comma-separated, default: all available)
export INKOGNITO_OCR_LANGUAGES="en,fr,de"
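A comma-separated variable like this is typically parsed along the following lines. This is a sketch of the convention, not Inkognito's own parsing code; the function name `ocr_languages` and the choice to return `None` for "all available" are assumptions.

```python
import os

def ocr_languages(env=None):
    """Parse INKOGNITO_OCR_LANGUAGES into a list of language codes.

    Returns None when the variable is unset or empty, which this sketch
    treats as "use all available languages".
    """
    env = os.environ if env is None else env
    raw = env.get("INKOGNITO_OCR_LANGUAGES", "")
    langs = [code.strip().lower() for code in raw.split(",") if code.strip()]
    return langs or None

print(ocr_languages({"INKOGNITO_OCR_LANGUAGES": "en,fr,de"}))
```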
Examples
Legal Document Processing
You: "Anonymize all contracts in the merger folder for review"
Claude: "I'll anonymize those contracts for you...
[Processing 23 files...]
✓ Anonymized 23 contracts
✓ Replaced: 145 company names, 89 person names, 67 case numbers
✓ Vault saved to: /output/vault.json
Research Paper Extraction
You: "Extract this 300-page research PDF"
Claude: "I'll extract that PDF to markdown...
[Using Docling for extraction...]
✓ Extracted 300 pages
✓ Preserved: tables, figures, citations
✓ Output size: 487,000 tokens
✓ Saved to: research_paper.md
Documentation to Prompts
You: "Split this API documentation into individual prompts"
Claude: "I'll split the documentation by endpoints...
[Splitting by H2 headings...]
✓ Created 47 prompt files
✓ Each prompt includes endpoint context
✓ Ready for training or testing
Performance
| Extractor | Speed | Requirements | Status |
|---|---|---|---|
| Azure DI | 0.2-1 sec/page | API key | Planned |
| LlamaIndex | 1-2 sec/page | API key | Planned |
| MinerU | 3-7 sec/page | Local, GPU | Planned |
| Docling | 5-10 sec/page | Local, CPU | ✅ Available |
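To turn the per-page figures into a rough wall-clock estimate, multiply by the page count. The sketch below does this with the ranges from the table; the helper name `estimate_minutes` is hypothetical, and the numbers are only as reliable as the table itself (the non-Docling rows describe extractors that are still planned).

```python
# Seconds-per-page ranges copied from the table above.
SPEEDS = {
    "Azure DI": (0.2, 1),
    "LlamaIndex": (1, 2),
    "MinerU": (3, 7),
    "Docling": (5, 10),
}

def estimate_minutes(pages, extractor):
    """Return (best case, worst case) processing time in minutes."""
    low, high = SPEEDS[extractor]
    return pages * low / 60, pages * high / 60

print(estimate_minutes(300, "Docling"))  # (25.0, 50.0)
```

So the 300-page PDF from the research paper example would take roughly 25 to 50 minutes with Docling on CPU.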
Privacy & Security
- Local processing: No cloud services required
- No persistence: Nothing saved without explicit paths
- Secure vaults: Encrypted mapping storage
- API key safety: Never logged or transmitted
Development
Running Locally
# Clone the repository
git clone https://github.com/phren0logy/inkognito
cd inkognito
# Run with FastMCP CLI
fastmcp dev
# Or run directly in development
uv run python server.py
Testing with FastMCP
# Install the server configuration
fastmcp install inkognito
# Test a specific tool
fastmcp test inkognito extract_document
Project Structure
inkognito/
├── pyproject.toml          # FastMCP-compatible packaging
├── LICENSE                 # MIT license
├── README.md               # This file
├── server.py               # FastMCP server and entry point
├── anonymizer.py           # PII detection and anonymization
├── vault.py                # Vault management for reversibility
├── segmenter.py            # Document segmentation
├── exceptions.py           # Custom exceptions
├── extractors/             # PDF extraction backends
│   ├── __init__.py
│   ├── base.py
│   ├── registry.py
│   ├── docling.py          # ✅ Implemented
│   ├── azure_di.py         # Placeholder
│   ├── llamaindex.py       # Placeholder
│   └── mineru.py           # Placeholder
└── tests/
License
MIT License - see LICENSE file for details.