MCP server for ClicheFactory — structured data extraction from documents
clichefactory-mcp
MCP (Model Context Protocol) server for ClicheFactory — structured data extraction from documents.
This server exposes ClicheFactory's extraction and document conversion capabilities as MCP tools, allowing AI assistants in Cursor, Claude Desktop, OpenClaw, and other MCP-compatible clients to extract structured data from PDFs, images, DOCX, XLSX, CSV, EML, and more.
Tools
| Tool | Description |
|---|---|
| `extract` | Extract structured JSON from a document using a schema |
| `to_markdown` | Convert a document to markdown text |
| `doctor` | Check configuration, dependencies, and system binaries |
extract
The main tool. Pass a document file and a JSON schema — get structured data back.
Supports all extraction modes:
| Mode | Description | Requires |
|---|---|---|
| (default) | OCR + LLM extraction | local: LLM key · service: API key |
| `fast` | Fastest pipeline | Same as default |
| `trained` | Trained pipeline artifact | Service + `artifact_id` |
| `robust` | Two-stage extract + verify | Service only |
| `robust-trained` | Trained extract + verification | Service + `artifact_id` |
The schema can be provided as:
- File path: absolute path to a `.json` schema file
- Inline dict: the LLM constructs a JSON schema from the conversation (e.g., the user says "extract the invoice number and total" and the LLM builds `{"type": "object", "properties": {"invoice_number": {"type": "string"}, "total": {"type": "number"}}}`)
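For the file-path option, a schema file can be prepared ahead of time. The following is a minimal sketch that writes the invoice schema from the example to disk and yields an absolute path; the file name `invoice.schema.json` and the helper `write_schema` are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path

# The same schema as in the inline example.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
}


def write_schema(schema: dict, directory: str) -> Path:
    """Write the schema as JSON and return its absolute path."""
    # The file name is a hypothetical choice for this sketch.
    path = Path(directory).resolve() / "invoice.schema.json"
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return path


schema_dir = tempfile.mkdtemp()
schema_path = write_schema(invoice_schema, schema_dir)
print(schema_path)  # absolute path that can be handed to the extract tool
```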
to_markdown
Converts any supported document to markdown. Useful for inspecting document contents or feeding them to the LLM for analysis before deciding on an extraction schema.
doctor
Runs diagnostics on the ClicheFactory setup — config file, API keys, Python dependencies, system binaries. Call this when things aren't working.
Execution Modes
The server supports two modes, matching the SDK and CLI:
- `local`: Runs extraction on your machine. You bring your own LLM key (BYOK). Supports Gemini, OpenAI, Anthropic, and Ollama models. Requires the `clichefactory[local]` dependencies for document parsing.
- `service`: Uses the ClicheFactory cloud service. Requires a ClicheFactory API key. Supports all extraction modes including trained pipelines and robust verification. Optionally accepts BYOK model overrides.
Installation
Prerequisites
- Python ≥ 3.12
- uv (recommended) or pip
From PyPI
pip install clichefactory-mcp
For local-mode extraction (document parsing on your machine), install with the local extras:
pip install "clichefactory-mcp[local]"
Configuration
Environment Variables
Set these in your MCP client configuration (see below) or in ~/.clichefactory/config.toml via clichefactory configure.
| Variable | Required | Description |
|---|---|---|
| `CLICHEFACTORY_API_KEY` | Service mode | ClicheFactory API key (format: `cliche-...`) |
| `CLICHEFACTORY_API_URL` | No | Service URL override (default: production) |
| `LLM_MODEL_NAME` | Local mode | Model name, e.g. `gemini/gemini-3-flash-preview` |
| `LLM_API_KEY` | Local mode | API key for the LLM provider |
| `OCR_MODEL_NAME` | No | Separate OCR/VLM model (defaults to main model) |
| `OCR_API_KEY` | No | API key for OCR model (defaults to main key) |
The config file at ~/.clichefactory/config.toml (created by clichefactory configure) is also respected. Environment variables take precedence over the config file.
Cursor
Add to .cursor/mcp.json in your project (or global Cursor settings):
{
  "mcpServers": {
    "clichefactory": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/cliche-mcp", "run", "clichefactory-mcp"],
      "env": {
        "LLM_MODEL_NAME": "gemini/gemini-3-flash-preview",
        "LLM_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
For service mode:
{
  "mcpServers": {
    "clichefactory": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/cliche-mcp", "run", "clichefactory-mcp"],
      "env": {
        "CLICHEFACTORY_API_KEY": "cliche-your-key-here",
        "CLICHEFACTORY_API_URL": "https://api.clichefactory.com"
      }
    }
  }
}
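If `.cursor/mcp.json` already lists other servers, the `clichefactory` entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows; the helper `add_server` and the `other-server` entry are hypothetical, and reading or writing the actual file is left out.

```python
import json


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry registered under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other"}}}

# Service-mode entry, copied from the configuration shown above.
entry = {
    "command": "uv",
    "args": ["--directory", "/absolute/path/to/cliche-mcp", "run", "clichefactory-mcp"],
    "env": {
        "CLICHEFACTORY_API_KEY": "cliche-your-key-here",
        "CLICHEFACTORY_API_URL": "https://api.clichefactory.com",
    },
}

updated = add_server(existing, "clichefactory", entry)
print(json.dumps(updated, indent=2))
```

The helper copies the top-level and `mcpServers` dictionaries, so the original configuration object is left unchanged.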
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "clichefactory": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/cliche-mcp", "run", "clichefactory-mcp"],
      "env": {
        "LLM_MODEL_NAME": "gemini/gemini-3-flash-preview",
        "LLM_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
OpenClaw
Register the MCP server with your OpenClaw agent:
openclaw mcp set clichefactory '{"command":"uv","args":["--directory","/absolute/path/to/cliche-mcp","run","clichefactory-mcp"],"env":{"LLM_MODEL_NAME":"gemini/gemini-3-flash-preview","LLM_API_KEY":"your-gemini-api-key"}}'
For service mode:
openclaw mcp set clichefactory '{"command":"uv","args":["--directory","/absolute/path/to/cliche-mcp","run","clichefactory-mcp"],"env":{"CLICHEFACTORY_API_KEY":"cliche-your-key-here","CLICHEFACTORY_API_URL":"https://api.clichefactory.com"}}'
Verify with openclaw mcp list. The agent can now use extract, to_markdown, and doctor tools in any conversation.
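The one-line JSON argument in the `openclaw mcp set` commands is easy to mistype by hand. As a sketch, it can be generated with `json.dumps` and quoted for the shell with `shlex.quote`; the helper name `openclaw_set_command` is an assumption, and the script only builds the command string without running it.

```python
import json
import shlex


def openclaw_set_command(name: str, command: str, args: list[str], env: dict[str, str]) -> str:
    """Build an `openclaw mcp set` command line with a safely quoted JSON payload."""
    payload = json.dumps(
        {"command": command, "args": args, "env": env},
        separators=(",", ":"),  # compact form, as in the commands above
    )
    return f"openclaw mcp set {shlex.quote(name)} {shlex.quote(payload)}"


cmd = openclaw_set_command(
    "clichefactory",
    "uvx",
    ["clichefactory-mcp"],
    {
        "LLM_MODEL_NAME": "gemini/gemini-3-flash-preview",
        "LLM_API_KEY": "your-gemini-api-key",
    },
)
print(cmd)
```

`shlex.quote` targets POSIX shells; on Windows shells the quoting rules differ, so the output would need adjusting there.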
An OpenClaw skill with agent instructions is also available in integrations/openclaw/. To install it into your workspace:
cp -r /path/to/cliche-mcp/integrations/openclaw ~/.openclaw/skills/clichefactory
Or, once published to ClawHub:
openclaw skills install clichefactory
Running from PyPI with uvx
The configurations above run the server from a local checkout. To run the published PyPI package instead, replace the command in any of them with uvx:
Cursor / Claude Desktop:
{
  "mcpServers": {
    "clichefactory": {
      "command": "uvx",
      "args": ["clichefactory-mcp"],
      "env": {
        "LLM_MODEL_NAME": "gemini/gemini-3-flash-preview",
        "LLM_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
OpenClaw:
openclaw mcp set clichefactory '{"command":"uvx","args":["clichefactory-mcp"],"env":{"LLM_MODEL_NAME":"gemini/gemini-3-flash-preview","LLM_API_KEY":"your-gemini-api-key"}}'
Supported File Types
PDF, PNG, JPG, JPEG, WebP, GIF, BMP, DOCX, DOC, ODT, XLSX, CSV, EML, TXT, MD.
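A client-side pre-check against this list can avoid sending unsupported files to the server. The sketch below assumes files are identified by extension alone, which may not match how the server detects types; `is_supported` is a hypothetical helper.

```python
from pathlib import Path

# Extensions taken from the supported file types list, lowercased.
SUPPORTED_EXTENSIONS = frozenset({
    "pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp",
    "docx", "doc", "odt", "xlsx", "csv", "eml", "txt", "md",
})


def is_supported(path: str) -> bool:
    """Check the file extension (case-insensitive) against the supported list."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


print(is_supported("Invoice.PDF"))
print(is_supported("archive.zip"))
```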
Differences from the CLI
This MCP server covers the core extraction and conversion workflows. The following CLI features are not included in v1:
| Feature | Reason |
|---|---|
| Batch operations (`extract-batch`, `to-markdown-batch`) | MCP tools are typically called one-at-a-time by the LLM. For multiple documents, the LLM calls `extract` in sequence. Batch support may be added in a future version. |
| `configure` | Interactive prompts don't work in MCP. Use env vars or run `clichefactory configure` in a terminal. |
| `--output` / `-o` flag | MCP tools return results directly to the LLM rather than writing to files. |
| `allow_partial` | Not exposed as a tool parameter in v1. |
| OCR engine selection | Uses the SDK defaults (RapidOCR). Configure via `~/.clichefactory/config.toml` or pass parsing options through the SDK if needed. |
Development
# Install in development mode
uv sync
# Run the server directly (stdio transport, for testing with MCP clients)
uv run clichefactory-mcp
# Inspect available tools (requires mcp CLI)
uv run mcp dev cliche_mcp/server.py
License
MIT — Copyright (c) 2026 Urban Susnik s.p.