zh-dict-mcp

MCP server for Chinese figurative language lookup, backed by CC-CEDICT.

What it does: given a Chinese word or phrase, tells you whether its figurative usage has been lexicalized (recorded in the dictionary as an independent sense) or is a one-off creative expression.

Why it exists: LLMs writing Chinese dialogue, fiction, or roleplay tend to invent purple-prose figurative expressions that no real person would say (e.g., "他把心锁进铁盒里", "he locked his heart in an iron box", or "墙比夜更厚", "the wall is thicker than the night"). This tool gives you an objective, dictionary-backed check.


Install + use in one line

In Claude Code (or any MCP-aware client):

claude mcp add zh-dict-mcp -- uvx zh-dict-mcp

Or paste into your .mcp.json:

{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp"]
    }
  }
}

That's it. On first run, uvx pulls the package from PyPI, caches it, and launches the stdio MCP server. No pip install needed.

The lookup_dictionary tool is now available in your Claude Code sessions.


What you get

A single MCP tool:

lookup_dictionary(word: string) → JSON

Example: lookup_dictionary("看见") returns:

{
  "word": "看见",
  "found_in_cedict": true,
  "simplified": "看见",
  "traditional": "看見",
  "pinyin": "kan4 jian4",
  "definitions": ["to see", "to catch sight of"],
  "tags": {
    "has_figurative": false,
    "is_neologism": false,
    "is_slang": false,
    "has_idiom_marker": false
  }
}

Example: lookup_dictionary("内卷") returns:

{
  "word": "内卷",
  "found_in_cedict": true,
  "definitions": [
    "(embryology) to involute; involution",
    "(neologism, attested by 2017) (of a society) to become more and more involuted..."
  ],
  "tags": { "is_neologism": true, ... }
}

Example: lookup_dictionary("锁进铁盒里") (a creative one-off) returns:

{
  "word": "锁进铁盒里",
  "found_in_cedict": false,
  "found_in_whitelist": false,
  "definitions": []
}
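The three results above differ mainly in the found_in_cedict / found_in_whitelist flags and the tags object. A minimal sketch of how a caller might reduce one result to a coarse review label follows. The function name classify_result and the label strings are hypothetical (they are not part of zh-dict-mcp), and the field names are assumed to match the example payloads shown above.

```python
from typing import Any, Dict


def classify_result(result: Dict[str, Any]) -> str:
    """Map one lookup_dictionary result (as a dict) to a coarse label.

    The labels are illustrative only; they are not defined by zh-dict-mcp.
    """
    in_cedict = result.get("found_in_cedict", False)
    in_whitelist = result.get("found_in_whitelist", False)
    if not (in_cedict or in_whitelist):
        # Neither the dictionary nor the whitelist knows the phrase.
        return "not_recorded"

    tags = result.get("tags", {})
    if tags.get("is_neologism"):
        return "neologism"
    if tags.get("has_figurative") or tags.get("has_idiom_marker"):
        return "lexicalized_figurative"
    if tags.get("is_slang"):
        return "slang"
    return "recorded"


# Payloads copied (and trimmed) from the examples above.
seen = {
    "word": "看见",
    "found_in_cedict": True,
    "tags": {"has_figurative": False, "is_neologism": False,
             "is_slang": False, "has_idiom_marker": False},
}
one_off = {"word": "锁进铁盒里", "found_in_cedict": False,
           "found_in_whitelist": False, "definitions": []}

print(classify_result(seen))     # recorded
print(classify_result(one_off))  # not_recorded
```

A "not_recorded" label only says the phrase is absent from the dictionary and whitelist; whether it is an acceptable creative expression is still a judgment for the reviewer.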

Use cases

  • AI-generated dialogue review: catch live metaphors that an LLM invents but no real speaker would use
  • AI writing lint: pipeline filter for game NPC dialogue / interactive fiction / chatbot scripts
  • Lexicalization research: check whether a figurative expression has been recorded in standard dictionaries
  • New word verification: confirm neologisms / slang with (neologism, attested by YEAR) attribution
  • Idiom / allusion (典故) lookup: get the figurative sense for idioms like "滑铁卢" → "(fig.) a defeat"

Data source

CC-CEDICT: open Chinese-English dictionary, about 125,000 entries, community-maintained, updated weekly.

License: CC BY-SA 4.0. Bundled in package. See LICENSE-CC-CEDICT.

Why CC-CEDICT vs 现代汉语词典 (XDHYCD) or other sources:

| Source | Coverage on AI-writing test set | Notes |
|---|---|---|
| chinese-xinhua (GitHub data) | 46% | Heavy bias toward Classical Chinese (古汉语) |
| 现代汉语词典 第7版 (XDHYCD7th) | 56% | Doesn't list literal compound words (放下, 抓住, etc.) |
| CC-CEDICT | ~95% | Modern usage + neologisms + (fig.) / (slang) / (neologism) markers |

CC-CEDICT explicitly tags figurative senses, neologisms with attestation years, slang, and idioms — exactly the structure needed for figurative-language analysis.


Optional: project whitelist

For project-specific overrides (e.g., words CC-CEDICT happens to miss):

# my_whitelist.yaml
allowed:
  - word: 凛然
    note: Standard literary usage, CC-CEDICT misses it
  - word: 头疼
    note: Override to include "annoyance" figurative sense

Pass it with the --whitelist CLI argument, for example in your .mcp.json:

{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp", "--whitelist", "/abs/path/to/my_whitelist.yaml"]
    }
  }
}

Or via env var ZH_DICT_WHITELIST=/path/to/file.yaml.

When a word is in the whitelist, the result includes "found_in_whitelist": true and the note.
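The sample config above uses an absolute whitelist path. As a small convenience sketch, the snippet below builds the same mcpServers entry from a relative path by resolving it first. The helper name build_mcp_config is hypothetical, and whether the server also accepts relative paths is not stated here, so resolving to an absolute path is a cautious assumption rather than a documented requirement.

```python
import json
from pathlib import Path


def build_mcp_config(whitelist: str) -> dict:
    """Return an .mcp.json-style dict that passes --whitelist to zh-dict-mcp.

    Hypothetical helper: it only mirrors the JSON structure shown above.
    """
    # Resolve to an absolute path so the entry does not depend on the
    # working directory the MCP client happens to launch from.
    abs_path = Path(whitelist).expanduser().resolve()
    return {
        "mcpServers": {
            "zh-dict-mcp": {
                "command": "uvx",
                "args": ["zh-dict-mcp", "--whitelist", str(abs_path)],
            }
        }
    }


config = build_mcp_config("my_whitelist.yaml")
# ensure_ascii=False keeps any non-ASCII characters in the path readable.
print(json.dumps(config, indent=2, ensure_ascii=False))
```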


Python API (no MCP needed)

Use the lookup library directly without launching a server:

from zh_dict_mcp import DictionaryLookup

lookup = DictionaryLookup()  # bundled CC-CEDICT loads in ~200ms
result = lookup.lookup("滑铁卢")

print(result.found)              # True
print(result.definitions)        # ['Waterloo (Belgium)', 'Battle of Waterloo (1815)', '(fig.) a defeat']
print(result.tags.has_figurative)  # True
print(result.pinyin)             # 'Hua2 tie3 lu2'

With custom whitelist:

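from pathlib import Path

from zh_dict_mcp import DictionaryLookup

# Load the bundled dictionary plus project-specific overrides.
lookup = DictionaryLookup(whitelist_path=Path("my_whitelist.yaml"))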

lookup.py has zero external dependencies (stdlib only). The mcp dependency is only needed for the MCP server.


Install standalone (no MCP, just Python library)

pip install zh-dict-mcp

Or with uv:

uv add zh-dict-mcp

Limitations

  • English-language definitions (CC-CEDICT is a Chinese-English dictionary). Works well with LLMs that handle cross-lingual judgment (Claude, GPT-4+, Gemini). For monolingual Chinese consumers you'd need a translation layer.
  • Sense matching is on the caller — this tool returns all senses; deciding whether the speaker's intended sense matches a returned sense is left to the LLM or human reviewer.
  • Single-word / single-phrase lookup — doesn't parse full sentences. Wrap with your own NLP layer for sentence-level work.
  • 9.4 MB data bundle — CC-CEDICT data is included in the wheel for offline use.
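Because the tool looks up one word or phrase at a time, sentence-level use needs something that proposes candidate phrases first. The following is a deliberately naive sketch of such a layer: it enumerates short substrings of each run of Chinese characters. The function candidate_phrases and its length limits are assumptions made for illustration, and a proper word segmenter would produce far fewer and better candidates.

```python
import re
from typing import List

# Runs of characters in the basic CJK Unified Ideographs block.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def candidate_phrases(sentence: str, min_len: int = 2, max_len: int = 4) -> List[str]:
    """List substrings of each CJK run, shortest first, without duplicates."""
    seen = set()
    out: List[str] = []
    for run in CJK_RUN.findall(sentence):
        for size in range(min_len, max_len + 1):
            for start in range(0, len(run) - size + 1):
                phrase = run[start:start + size]
                if phrase not in seen:
                    seen.add(phrase)
                    out.append(phrase)
    return out


print(candidate_phrases("他看见了。", min_len=2, max_len=2))
# ['他看', '看见', '见了']
```

Each candidate can then be passed to lookup_dictionary; most will simply come back as not found, which is expected for arbitrary substrings.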

How it fits with broader writing-quality pipelines

This tool is one piece of a larger "AI-generated text quality" framework. Typical usage flow:

LLM generates Chinese dialogue
   ↓
Scan for figurative expressions (metaphor 比喻 / metonymy 借代 / euphemism 委婉 / ...)
   ↓
For each: lookup_dictionary(expression)
   ↓
  ├── found + sense matches intent → pass
  └── not found or sense mismatch → flag for rewrite

A reference review prompt for this flow is documented in Forgewright (the project that spawned this tool).
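The pass/flag branch in the flow above can be sketched as a small gate. This is an illustrative outline, not the Forgewright implementation: review, Verdict, fake_lookup, and wants_figurative are hypothetical names, the lookup is a stand-in that returns dicts shaped like the lookup_dictionary examples, and the sense check is a placeholder for what would normally be an LLM or human judgment.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

LookupFn = Callable[[str], Dict[str, Any]]
SenseCheck = Callable[[str, List[str]], bool]


@dataclass
class Verdict:
    expression: str
    passed: bool
    reason: str


def review(expressions: Iterable[str], lookup: LookupFn,
           sense_matches: SenseCheck) -> List[Verdict]:
    """Apply the found / sense-match gate to each expression."""
    verdicts: List[Verdict] = []
    for expr in expressions:
        result = lookup(expr)
        found = (result.get("found_in_cedict", False)
                 or result.get("found_in_whitelist", False))
        if not found:
            verdicts.append(Verdict(expr, False, "not found"))
        elif not sense_matches(expr, result.get("definitions", [])):
            verdicts.append(Verdict(expr, False, "sense mismatch"))
        else:
            verdicts.append(Verdict(expr, True, "found + sense matches"))
    return verdicts


# Stand-in data so the sketch runs without the real server or package.
FAKE_DATA = {
    "滑铁卢": {
        "word": "滑铁卢",
        "found_in_cedict": True,
        "definitions": ["Waterloo (Belgium)", "Battle of Waterloo (1815)",
                        "(fig.) a defeat"],
    }
}


def fake_lookup(word: str) -> Dict[str, Any]:
    return FAKE_DATA.get(word, {"word": word, "found_in_cedict": False,
                                "found_in_whitelist": False, "definitions": []})


def wants_figurative(expr: str, definitions: List[str]) -> bool:
    # Placeholder sense check: accept if any sense carries a (fig.) marker.
    return any("(fig.)" in d for d in definitions)


for v in review(["滑铁卢", "锁进铁盒里"], fake_lookup, wants_figurative):
    print(v.expression, "pass" if v.passed else "flag", "-", v.reason)
```

Keeping the lookup and the sense check as injected callables means the same gate can be driven by the MCP tool, by the Python API, or by a test double.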


Project status

v0.1.0 — initial release. Validated on a 39-case test set covering 6 categories (dead metaphors / live metaphors / literal words / boundary cases / idioms / neologisms) with 100% accuracy.

Bug reports and PRs welcome.

License

  • Code: MIT (see LICENSE)
  • CC-CEDICT data: CC BY-SA 4.0 (see LICENSE-CC-CEDICT)
