MCP server for ClicheFactory — structured data extraction from documents

clichefactory-mcp

MCP (Model Context Protocol) server for ClicheFactory — structured data extraction from documents.

This server exposes ClicheFactory's extraction and document conversion capabilities as MCP tools, allowing AI assistants in Cursor, Claude Desktop, OpenClaw, and other MCP-compatible clients to extract structured data from PDFs, images, DOCX, XLSX, CSV, EML, and more.

Tools

| Tool | Description |
| --- | --- |
| extract | Extract structured JSON from a document using a schema |
| to_markdown | Convert a document to markdown text |
| doctor | Check configuration, dependencies, and system binaries |

extract

The main tool. Pass a document file and a JSON schema — get structured data back.

Supports all extraction modes:

| Mode | Description | Requires |
| --- | --- | --- |
| (default) | OCR + LLM extraction | local: LLM key · service: API key |
| fast | Fastest pipeline | Same as default |
| trained | Trained pipeline artifact | Service + artifact_id |
| robust | Two-stage extract + verify | Service only |
| robust-trained | Trained extract + verification | Service + artifact_id |

The schema can be provided as:

  • File path: absolute path to a .json schema file
  • Inline dict: the LLM constructs a JSON schema from the conversation (e.g., the user says "extract the invoice number and total" and the LLM builds {"type": "object", "properties": {"invoice_number": {"type": "string"}, "total": {"type": "number"}}})
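
To make the two schema forms concrete, here is a small sketch that builds the invoice schema from the example above as a dict and then writes the same schema to a .json file. The file name `invoice.schema.json` and the temporary directory are arbitrary choices for illustration.

```python
import json
import tempfile
from pathlib import Path

# Inline form: a plain dict following JSON Schema conventions.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
}

# File form: the same schema written to a .json file, referenced by absolute path.
schema_path = Path(tempfile.mkdtemp()) / "invoice.schema.json"
schema_path.write_text(json.dumps(invoice_schema, indent=2), encoding="utf-8")
absolute_schema_path = str(schema_path.resolve())
```

Either `invoice_schema` or `absolute_schema_path` can then be supplied as the schema, depending on which form is more convenient.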

to_markdown

Converts any supported document to markdown. Useful for inspecting document contents or feeding them to the LLM for analysis before deciding on an extraction schema.

doctor

Runs diagnostics on the ClicheFactory setup — config file, API keys, Python dependencies, system binaries. Call this when things aren't working.

Execution Modes

The server supports two modes, matching the SDK and CLI:

  • local — Runs extraction on your machine. You bring your own LLM key (BYOK). Supports Gemini, OpenAI, Anthropic, and Ollama models. Requires the clichefactory[local] dependencies for document parsing.

  • service — Uses the ClicheFactory cloud service. Requires a ClicheFactory API key. Supports all extraction modes including trained pipelines and robust verification. Optionally accepts BYOK model overrides.

Installation

Prerequisites

  • Python ≥ 3.12
  • uv (recommended) or pip

From PyPI

pip install clichefactory-mcp

For local-mode extraction (document parsing on your machine), install with the local extras:

pip install "clichefactory-mcp[local]"

Configuration

Environment Variables

Set these in your MCP client configuration (see below) or in ~/.clichefactory/config.toml via clichefactory configure.

| Variable | Required | Description |
| --- | --- | --- |
| CLICHEFACTORY_API_KEY | Service mode | ClicheFactory API key (format: cliche-...) |
| CLICHEFACTORY_API_URL | No | Service URL override (default: production) |
| LLM_MODEL_NAME | Local mode | Model name, e.g. gemini/gemini-3-flash-preview |
| LLM_API_KEY | Local mode | API key for the LLM provider |
| OCR_MODEL_NAME | No | Separate OCR/VLM model (defaults to main model) |
| OCR_API_KEY | No | API key for OCR model (defaults to main key) |

The config file at ~/.clichefactory/config.toml (created by clichefactory configure) is also respected. Environment variables take precedence over the config file.

Cursor

Add to .cursor/mcp.json in your project (or global Cursor settings):

{
  "mcpServers": {
    "clichefactory": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/cliche-mcp", "run", "clichefactory-mcp"],
      "env": {
        "LLM_MODEL_NAME": "gemini/gemini-3-flash-preview",
        "LLM_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

For service mode:

{
  "mcpServers": {
    "clichefactory": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/cliche-mcp", "run", "clichefactory-mcp"],
      "env": {
        "CLICHEFACTORY_API_KEY": "cliche-your-key-here",
        "CLICHEFACTORY_API_URL": "https://api.clichefactory.com"
      }
    }
  }
}

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "clichefactory": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/cliche-mcp", "run", "clichefactory-mcp"],
      "env": {
        "LLM_MODEL_NAME": "gemini/gemini-3-flash-preview",
        "LLM_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

OpenClaw

Register the MCP server with your OpenClaw agent:

openclaw mcp set clichefactory '{"command":"uv","args":["--directory","/absolute/path/to/cliche-mcp","run","clichefactory-mcp"],"env":{"LLM_MODEL_NAME":"gemini/gemini-3-flash-preview","LLM_API_KEY":"your-gemini-api-key"}}'

For service mode:

openclaw mcp set clichefactory '{"command":"uv","args":["--directory","/absolute/path/to/cliche-mcp","run","clichefactory-mcp"],"env":{"CLICHEFACTORY_API_KEY":"cliche-your-key-here","CLICHEFACTORY_API_URL":"https://api.clichefactory.com"}}'

Verify with openclaw mcp list. The agent can now use extract, to_markdown, and doctor tools in any conversation.
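
Because the server definition is passed as a single JSON argument, quoting mistakes are easy to make when typing it by hand. The sketch below builds the command line with the standard library's shlex module; `openclaw_set_command` is a hypothetical helper and only assembles the string, it does not run anything.

```python
import json
import shlex


def openclaw_set_command(name: str, server: dict) -> str:
    """Return a shell-safe `openclaw mcp set` command line for a server entry."""
    payload = json.dumps(server, separators=(",", ":"))
    return " ".join(["openclaw", "mcp", "set", shlex.quote(name), shlex.quote(payload)])


# Example server entry in local mode, mirroring the command shown above.
local_server = {
    "command": "uv",
    "args": ["--directory", "/absolute/path/to/cliche-mcp", "run", "clichefactory-mcp"],
    "env": {
        "LLM_MODEL_NAME": "gemini/gemini-3-flash-preview",
        "LLM_API_KEY": "your-gemini-api-key",
    },
}
command_line = openclaw_set_command("clichefactory", local_server)
```

The resulting `command_line` can be pasted into a POSIX shell; shlex quoting keeps the JSON intact even if a value contains spaces or quotes.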

An OpenClaw skill with agent instructions is also available in integrations/openclaw/. To install it into your workspace:

cp -r /path/to/cliche-mcp/integrations/openclaw ~/.openclaw/skills/clichefactory

Or, once published to ClawHub:

openclaw skills install clichefactory

Running from PyPI

To run the published PyPI package instead of a local checkout, replace the command and args in any of the above configurations with uvx:

Cursor / Claude Desktop:

{
  "mcpServers": {
    "clichefactory": {
      "command": "uvx",
      "args": ["clichefactory-mcp"],
      "env": {
        "LLM_MODEL_NAME": "gemini/gemini-3-flash-preview",
        "LLM_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

OpenClaw:

openclaw mcp set clichefactory '{"command":"uvx","args":["clichefactory-mcp"],"env":{"LLM_MODEL_NAME":"gemini/gemini-3-flash-preview","LLM_API_KEY":"your-gemini-api-key"}}'

Supported File Types

PDF, PNG, JPG, JPEG, WebP, GIF, BMP, DOCX, DOC, ODT, XLSX, CSV, EML, TXT, MD.
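
A client-side check against this list can avoid sending unsupported files to the server. The sketch below is illustrative only: `is_supported` is a hypothetical helper that looks at the file extension, and the server may apply its own detection.

```python
from pathlib import Path

# Extensions transcribed from the list above (lowercase, without the dot).
SUPPORTED_EXTENSIONS = frozenset({
    "pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp",
    "docx", "doc", "odt", "xlsx", "csv", "eml", "txt", "md",
})


def is_supported(path: str) -> bool:
    """True if the file's extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS
```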

Differences from the CLI

This MCP server covers the core extraction and conversion workflows. The following CLI features are not included in v1:

| Feature | Reason |
| --- | --- |
| Batch operations (extract-batch, to-markdown-batch) | MCP tools are typically called one-at-a-time by the LLM. For multiple documents, the LLM calls extract in sequence. Batch support may be added in a future version. |
| configure | Interactive prompts don't work in MCP. Use env vars or run clichefactory configure in a terminal. |
| --output / -o flag | MCP tools return results directly to the LLM rather than writing to files. |
| allow_partial | Not exposed as a tool parameter in v1. |
| OCR engine selection | Uses the SDK defaults (RapidOCR). Configure via ~/.clichefactory/config.toml or pass parsing options through the SDK if needed. |
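
The sequential pattern that stands in for batch operations might look like the sketch below when driven from your own script. It makes no assumption about the MCP client library: `call_extract` is a hypothetical callable that wraps however your client invokes the extract tool, and `extract_in_sequence` is an illustrative name.

```python
from typing import Any, Callable

# `call_extract` stands in for whatever invokes the `extract` tool in your client.
ExtractFn = Callable[[str, dict], Any]


def extract_in_sequence(paths: list[str], schema: dict,
                        call_extract: ExtractFn) -> dict[str, Any]:
    """Call extract once per document; record errors without aborting the run."""
    results: dict[str, Any] = {}
    for path in paths:
        try:
            results[path] = {"ok": True, "data": call_extract(path, schema)}
        except Exception as exc:  # keep going so one bad file does not stop the run
            results[path] = {"ok": False, "error": str(exc)}
    return results
```

Recording failures per file, rather than raising on the first one, keeps a long run useful even when a single document cannot be parsed.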

Development

# Install in development mode
uv sync

# Run the server directly (stdio transport, for testing with MCP clients)
uv run clichefactory-mcp

# Inspect available tools (requires mcp CLI)
uv run mcp dev cliche_mcp/server.py

License

MIT — Copyright (c) 2026 Urban Susnik s.p.
