MCP server for Chinese figurative language lookup, backed by CC-CEDICT
zh-dict-mcp
What it does: given a Chinese word or phrase, it tells you whether its figurative usage has been lexicalized (recorded in the dictionary as an independent sense) or is a one-off creative expression.
Why it exists: LLMs writing Chinese dialogue, fiction, or roleplay tend to invent purple-prose figurative expressions that no real speaker would say, e.g. "他把心锁进铁盒里" ("he locked his heart in an iron box") or "墙比夜更厚" ("the wall is thicker than the night"). This tool gives you an objective, dictionary-backed check.
Install + use in one line
In Claude Code (or any MCP-aware client):
claude mcp add zh-dict-mcp -- uvx zh-dict-mcp
Or paste into your .mcp.json:
{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp"]
    }
  }
}
That's it. On first run, uvx pulls the package from PyPI, caches it, and launches the stdio MCP server. No pip install needed.
The lookup_dictionary tool is now available in your Claude Code sessions.
What you get
A single MCP tool:
lookup_dictionary(word: string) → JSON
Example: lookup_dictionary("看见") returns:
{
  "word": "看见",
  "found_in_cedict": true,
  "simplified": "看见",
  "traditional": "看見",
  "pinyin": "kan4 jian4",
  "definitions": ["to see", "to catch sight of"],
  "tags": {
    "has_figurative": false,
    "is_neologism": false,
    "is_slang": false,
    "has_idiom_marker": false
  }
}
Example: lookup_dictionary("内卷") returns:
{
  "word": "内卷",
  "found_in_cedict": true,
  "definitions": [
    "(embryology) to involute; involution",
    "(neologism, attested by 2017) (of a society) to become more and more involuted..."
  ],
  "tags": { "is_neologism": true, ... }
}
Example: lookup_dictionary("锁进铁盒里") (a creative one-off) returns:
{
  "word": "锁进铁盒里",
  "found_in_cedict": false,
  "found_in_whitelist": false,
  "definitions": []
}
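The three results above can be reduced to a coarse review label. The following is a minimal sketch, not part of the package: `classify_result` and its label strings are hypothetical names, and the only things taken from the examples are the JSON keys (`found_in_cedict`, `found_in_whitelist`, `tags`).

```python
from typing import Any


def classify_result(result: dict[str, Any]) -> str:
    """Map a lookup_dictionary-style result to a coarse label (hypothetical helper)."""
    found = result.get("found_in_cedict", False) or result.get("found_in_whitelist", False)
    if not found:
        # Neither the dictionary nor the whitelist knows it: likely a one-off expression.
        return "not_lexicalized"
    tags = result.get("tags", {})
    if tags.get("has_figurative"):
        return "lexicalized_figurative"
    if tags.get("is_neologism") or tags.get("is_slang"):
        return "recent_or_informal"
    return "literal_or_unmarked"


# Trimmed copies of the example results shown above.
seen = {
    "word": "看见",
    "found_in_cedict": True,
    "tags": {"has_figurative": False, "is_neologism": False,
             "is_slang": False, "has_idiom_marker": False},
}
creative = {"word": "锁进铁盒里", "found_in_cedict": False,
            "found_in_whitelist": False, "definitions": []}

print(classify_result(seen))
print(classify_result(creative))
```

A label like this is only a first filter; whether a returned sense matches the speaker's intent still has to be judged by the caller.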
Use cases
- AI-generated dialogue review: catch live metaphors that an LLM invents but no real speaker would use
- AI writing lint: pipeline filter for game NPC dialogue / interactive fiction / chatbot scripts
- Lexicalization research: check whether a figurative expression has been recorded in standard dictionaries
- New word verification: confirm neologisms / slang with (neologism, attested by YEAR) attribution
- Idiom / allusion (典故) lookup: get the figurative sense for idioms like "滑铁卢" → "(fig.) a defeat"
Data source
CC-CEDICT: an open, community-maintained Chinese-English dictionary with about 125,000 entries and weekly updates.
License: CC BY-SA 4.0. Bundled in package. See LICENSE-CC-CEDICT.
Why CC-CEDICT rather than 现代汉语词典 (Xiandai Hanyu Cidian, XDHYCD) or other sources:
| Source | Coverage on AI-writing test set | Notes |
|---|---|---|
| chinese-xinhua (GitHub data) | 46% | Heavy bias toward classical Chinese |
| 现代汉语词典 第7版 (7th edition, XDHYCD7th) | 56% | Doesn't list literal compound words (放下, 抓住, etc.) |
| CC-CEDICT | ~95% | Modern usage + neologisms + (fig.) / (slang) / (neologism) markers |
CC-CEDICT explicitly tags figurative senses, neologisms with attestation years, slang, and idioms — exactly the structure needed for figurative-language analysis.
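To illustrate how those markers could feed the `tags` object, here is a minimal sketch that parses one line in CC-CEDICT's plain-text layout (`TRADITIONAL SIMPLIFIED [pinyin] /sense/sense/`) and checks the senses for marker strings. This is an assumption about one possible approach, not the package's actual implementation; `parse_entry` and `derive_tags` are hypothetical names, and the sample line is assembled from values shown elsewhere in this README rather than copied from the dictionary file.

```python
import re

# CC-CEDICT line layout: TRADITIONAL SIMPLIFIED [pin1 yin1] /sense 1/sense 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def derive_tags(definitions: list[str]) -> dict[str, bool]:
    """Flag marker strings found in any sense (hypothetical helper)."""
    text = " ".join(definitions)
    return {
        "has_figurative": "(fig.)" in text,
        # No closing parenthesis, so "(neologism, attested by 2017)" also matches.
        "is_neologism": "(neologism" in text,
        "is_slang": "(slang)" in text,
        "has_idiom_marker": "(idiom)" in text,
    }


def parse_entry(line: str) -> dict:
    """Split one dictionary line into headwords, pinyin, senses, and tags."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a CC-CEDICT entry line: {line!r}")
    traditional, simplified, pinyin, senses = match.groups()
    definitions = senses.split("/")
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        "definitions": definitions,
        "tags": derive_tags(definitions),
    }


# Illustrative sample line, assembled for this sketch.
sample = "滑鐵盧 滑铁卢 [Hua2 tie3 lu2] /Waterloo (Belgium)/Battle of Waterloo (1815)/(fig.) a defeat/"
entry = parse_entry(sample)
print(entry["definitions"])
print(entry["tags"])
```

Plain substring checks keep the sketch short; a real implementation may need to handle marker variants and per-sense tagging.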
Optional: project whitelist
For project-specific overrides (e.g., words CC-CEDICT happens to miss):
# my_whitelist.yaml
allowed:
  - word: 凛然
    note: Standard literary usage, CC-CEDICT misses it
  - word: 头疼
    note: Override to include "annoyance" figurative sense
Pass the file with the --whitelist CLI flag, for example in .mcp.json:
{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp", "--whitelist", "/abs/path/to/my_whitelist.yaml"]
    }
  }
}
Or via env var ZH_DICT_WHITELIST=/path/to/file.yaml.
When a word is in the whitelist, the result includes "found_in_whitelist": true and the note.
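As a rough illustration of that behaviour, the sketch below merges a whitelist into a lookup result. It assumes the YAML has already been parsed into the dictionary shown (the shape a YAML loader would produce for my_whitelist.yaml); `build_index`, `apply_whitelist`, and the `whitelist_note` key are hypothetical, since the README does not name the field that carries the note.

```python
from typing import Any

# Already-parsed whitelist, shaped like the my_whitelist.yaml example.
whitelist = {
    "allowed": [
        {"word": "凛然", "note": "Standard literary usage, CC-CEDICT misses it"},
        {"word": "头疼", "note": 'Override to include "annoyance" figurative sense'},
    ]
}


def build_index(parsed: dict[str, Any]) -> dict[str, str]:
    """Map each whitelisted word to its note for constant-time checks."""
    return {item["word"]: item.get("note", "") for item in parsed.get("allowed", [])}


def apply_whitelist(result: dict[str, Any], index: dict[str, str]) -> dict[str, Any]:
    """Return a copy of the result annotated with whitelist information."""
    merged = dict(result)
    word = merged.get("word")
    merged["found_in_whitelist"] = word in index
    if word in index:
        merged["whitelist_note"] = index[word]  # hypothetical key name
    return merged


index = build_index(whitelist)
miss = {"word": "凛然", "found_in_cedict": False, "definitions": []}
print(apply_whitelist(miss, index))
```

Returning a copy rather than mutating the input keeps the dictionary result and the project override easy to tell apart when debugging.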
Python API (no MCP needed)
Use the lookup library directly without launching a server:
from zh_dict_mcp import DictionaryLookup
lookup = DictionaryLookup() # bundled CC-CEDICT loads in ~200ms
result = lookup.lookup("滑铁卢")
print(result.found) # True
print(result.definitions) # ['Waterloo (Belgium)', 'Battle of Waterloo (1815)', '(fig.) a defeat']
print(result.tags.has_figurative) # True
print(result.pinyin) # 'Hua2 tie3 lu2'
With a custom whitelist:
from pathlib import Path
from zh_dict_mcp import DictionaryLookup
lookup = DictionaryLookup(whitelist_path=Path("my_whitelist.yaml"))
lookup.py has zero external dependencies (stdlib only). The mcp dependency is only needed for the MCP server.
Install standalone (no MCP, just Python library)
pip install zh-dict-mcp
Or with uv:
uv add zh-dict-mcp
Limitations
- English-language definitions (CC-CEDICT is a Chinese-English dictionary). Works well with LLMs that handle cross-lingual judgment (Claude, GPT-4+, Gemini). For monolingual Chinese consumers you'd need a translation layer.
- Sense matching is on the caller — this tool returns all senses; deciding whether the speaker's intended sense matches a returned sense is left to the LLM or human reviewer.
- Single-word / single-phrase lookup — doesn't parse full sentences. Wrap with your own NLP layer for sentence-level work.
- 9.4 MB data bundle — CC-CEDICT data is included in the wheel for offline use.
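For the sentence-level limitation, one simple wrapper is forward maximum matching: scan the sentence left to right and, at each position, take the longest span the dictionary knows. The sketch below is an assumption about how such a layer might look, not something the package ships; `segment`, `max_len`, and the toy vocabulary are hypothetical, and `is_known` stands in for whatever membership check you wire up.

```python
from typing import Callable


def segment(sentence: str, is_known: Callable[[str], bool], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position, take the longest known span."""
    tokens: list[str] = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted so the scan keeps moving.
            if size == 1 or is_known(piece):
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy vocabulary standing in for a real dictionary membership check.
toy_vocab = {"他", "看见", "滑铁卢"}
print(segment("他看见滑铁卢", toy_vocab.__contains__))
```

Greedy matching is crude compared with a dedicated segmenter, but it is enough to produce candidate phrases to pass to lookup_dictionary one at a time.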
How it fits with broader writing-quality pipelines
This tool is one piece of a larger "AI-generated text quality" framework. Typical usage flow:
LLM generates Chinese dialogue
↓
Scan for figurative expressions (metaphor / metonymy / euphemism / ...)
↓
For each: lookup_dictionary(expression)
↓
├── found + sense matches intent → pass
└── not found or sense mismatch → flag for rewrite
A reference review prompt for this flow is documented in Forgewright (the project that spawned this tool).
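The pass/flag branch of the flow can be sketched as a small loop. This is an illustration under stated assumptions: `review` is a hypothetical function, `lookup` is any callable returning a result shaped like the lookup_dictionary JSON, and `sense_matches` represents the judgment that the tool leaves to an LLM or human reviewer.

```python
from typing import Any, Callable


def review(
    expressions: list[str],
    lookup: Callable[[str], dict[str, Any]],
    sense_matches: Callable[[str, list[str]], bool],
) -> list[tuple[str, str]]:
    """Return (expression, reason) pairs that should be flagged for rewrite."""
    flagged: list[tuple[str, str]] = []
    for expr in expressions:
        result = lookup(expr)
        found = result.get("found_in_cedict") or result.get("found_in_whitelist")
        if not found:
            flagged.append((expr, "not found"))
        elif not sense_matches(expr, result.get("definitions", [])):
            flagged.append((expr, "sense mismatch"))
    return flagged


# Stand-in lookup table and sense check, for demonstration only.
table = {"滑铁卢": {"found_in_cedict": True, "definitions": ["(fig.) a defeat"]}}


def fake_lookup(word: str) -> dict[str, Any]:
    return table.get(word, {"found_in_cedict": False, "definitions": []})


def intends_defeat(expr: str, definitions: list[str]) -> bool:
    return any("defeat" in d for d in definitions)


print(review(["滑铁卢", "锁进铁盒里"], fake_lookup, intends_defeat))
```

Passing the lookup and the sense check in as callables keeps the loop independent of whether results come from the MCP tool, the Python API, or a test double.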
Project status
v0.1.0 — initial release. Validated on a 39-case test set covering 6 categories (dead metaphors / live metaphors / literal words / boundary cases / idioms / neologisms) with 100% accuracy.
Bug reports and PRs welcome.
License
- Code: MIT (see LICENSE)
- CC-CEDICT data: CC BY-SA 4.0 (see LICENSE-CC-CEDICT)