Skip to main content

A traceable Chinese-history MCP server: 4 tools over 9 classical texts, every result cited (book→chapter→paragraph). Zero runtime dependencies.

Project description

chinese-history-mcp

CI License: MIT Data: CC BY 4.0 Python 3.9+ Dependencies: zero MCP Release

A traceable Chinese-history MCP server. Four Model Context Protocol tools over 9 classical Chinese texts (pre-Qin to Wei-Jin — 史记 / 汉书 / 后汉书 / 三国志 / 左传 / 论语 / 孟子 / 吕氏春秋 / 资治通鉴). Every result carries a 【book → chapter → paragraph】 citation, and honestly reports its review_status — the server never claims per-item human review it doesn't have.

一个可溯源的中国历史故事 MCP server:按事件 / 人物 / 今地名 / 品质四轴查询 先秦-汉魏九部正史子书,每条返回都带原文出处,机器生成/机审内容如实标注。

Demo — every result is cited

  • Zero runtime dependencies — pure Python standard library. No pip install of a framework, no MCP SDK; the whole server is auditable in a few files.
  • Read-only — opens the corpus with mode=ro + PRAGMA query_only; never writes.
  • Honest by construction — machine-generated punctuation / translation and machine-adjudicated status are labeled in every response (AIGC-compliant).

Why this exists: as of mid-2026 the public MCP ecosystem has no classical Chinese / Chinese-history server. This fills that gap. Income expectation is zero; the goal is a useful public good.

Contents: The four tools · Install & run · The corpus database · Honesty · Data & provenance · Design notes


The four tools

tool input returns
search_events keyword / book / person / limit Cross-book fused historical events with per-source provenance (book · chapter · paragraph + role: primary/detailed/brief/comment/corroborating). canonical_summary is an LLM-fused machine narrative.
get_person name (given name or alias) Person profile (LLM-synthesized, draft) + others' appraisals (verbatim source quotes, each cited) + attributed qualities + events mentioning them.
query_by_place place (today's place name) / limit Ancient stories set on the land of a modern place, with citations. Same-name-different-place returns candidates for you to disambiguate — it never silently picks one. Directional/regional generic names are excluded.
query_by_quality quality (from a 55-term controlled vocabulary, e.g. 忠 loyalty, 谋略 strategy) / limit / include_draft Representative events and people for a quality, each with an original-text evidence_quote and rationale.

Each tool call returns JSON. Multi-source events, person appraisals, and place/quality edges all carry the exact 【book → chapter → paragraph】 they came from — that is the point of the server.


Install & run

Requires Python 3.7+ (standard library only). The server speaks MCP over stdio (newline-delimited JSON-RPC 2.0).

# 1. Get the corpus DB (see "The corpus database" below), then:
PYTHONPATH=src python3 -m storyextractor.mcp.server --db /path/to/corpus.db

Or pip install . to get a chinese-history-mcp console command:

pip install .
chinese-history-mcp --db /path/to/corpus.db

Configure in an MCP client

Claude Desktop (claude_desktop_config.json), Cline, Continue, etc. — add one stdio server:

{
  "mcpServers": {
    "chinese-history": {
      "command": "python3",
      "args": ["-m", "storyextractor.mcp.server", "--db", "/path/to/corpus.db"],
      "env": { "PYTHONPATH": "src" },
      "cwd": "/absolute/path/to/chinese-history-mcp"
    }
  }
}

(After pip install . you can instead use "command": "chinese-history-mcp", "args": ["--db", "/path/to/corpus.db"] and drop env/cwd.)

Try one handshake by hand

printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{}}}' \
  '{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
  '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"query_by_quality","arguments":{"quality":"忠","limit":2}}}' \
  | PYTHONPATH=src python3 -m storyextractor.mcp.server --db /path/to/corpus.db

Demo + hallucination comparison

python3 scripts/mcp_demo.py --db /path/to/corpus.db runs a scripted tour of all four tools (also a minimal MCP-client reference). See docs/MCP_DEMO.md for a side-by-side of a bare LLM (fabricated / uncitable) vs. this server (cited) on the same questions.


The corpus database

corpus.db is not in this repository (it is a ~90 MB binary). Download it from this repo's Releases and point --db at it, or set STORYEXTRACTOR_DB=/path/to/corpus.db.

The database is read-only at runtime. If you host it on a read-only medium, make sure the release artifact was produced with sqlite3 corpus.db "VACUUM INTO 'corpus_release.db'" (single file, no -wal/-shm sidecars).


Honesty (please read)

This server is designed for provenance, not to launder machine output as scholarship. Downstream clients and LLMs must not present its results as "individually human-reviewed." Every response labels what it is:

  • Events review_status='approved' — mostly machine bulk-approved credible inferences, not per-item human review.
  • Person profiles review_status='draft' — LLM-synthesized, not human-vetted.
  • Quality mappingsauto_approved = multi-LLM machine consensus, draft = pending review; evidence_quote is a real substring of the source, rationale is an LLM's reasoning.
  • Place mappings — mostly multi-LLM machine consensus (auto_approved), a few human-approved; confidence is bucketed high/medium/doubtful.
  • Text — original is public-domain 白文 with machine-generated punctuation/segmentation; vernacular translation is fully machine-generated.

The server also does not eliminate downstream hallucination: it gives you citable retrieval facts; an LLM built on top can still confabulate around them. The citations are anchors for human verification.

Scope is the 9 texts above — "not found" means "not in this corpus," not "did not happen."


Data & provenance

  • Original text: public-domain classical Chinese 白文 (unpunctuated base text from public-domain editions), with self-produced, machine-generated punctuation and segmentation (not copied from any modern annotated/collated edition).
  • Vernacular translation: machine-generated across the whole corpus.
  • Annotations (events / entities / places / qualities): machine-assisted, with human review gating on selected layers; status is reported per record.

License

  • Code (this repository): MIT — see LICENSE.
  • Corpus data (corpus.db, distributed via Releases): CC BY 4.0.

The text layer is self-produced (punctuation/segmentation) over public-domain base text, so it is distributed freely; machine-generated attributes are labeled throughout for AIGC compliance.


Design notes

  • Pure stdlib hand-written stdio JSON-RPC 2.0 (initialize / tools/list / tools/call + ping / notifications). No third-party MCP SDK.
  • Read-only DB access (src/storyextractor/mcp/db.py): mode=ro + PRAGMA query_only; the migration-running db.connect is never used at serve time.
  • Tests: python3 tests/test_mcp_server.py (read-only enforcement, protocol shapes/error codes, honest review_status, alias token-exact matching + disambiguation, LIKE-wildcard escaping) — builds a temporary fixture DB, so it runs without corpus.db.

Contributing & project meta

  • CONTRIBUTING.md — how to run tests/lint and the principles this project holds to.
  • CHANGELOG.md — release history.
  • SECURITY.md — threat surface (read-only, no network) and how to report issues.

Issues and pull requests are welcome. Please keep the constraints in mind: zero runtime dependencies, read-only, every result cited, honest review_status.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chinese_history_mcp-0.1.0.tar.gz (34.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

chinese_history_mcp-0.1.0-py3-none-any.whl (33.6 kB view details)

Uploaded Python 3

File details

Details for the file chinese_history_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: chinese_history_mcp-0.1.0.tar.gz
  • Upload date:
  • Size: 34.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for chinese_history_mcp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fd4e69e06dbcdf2226217d454c663b8fe35f4594f23483896c17e8f4ae98a5af
MD5 cc635c7c0cd9d403b12cad2c79922a4d
BLAKE2b-256 ab423e8ab0e15d1b0354c6cb4cd8041ec42b2c240d8158aed06bbb782adbec2f

See more details on using hashes here.

File details

Details for the file chinese_history_mcp-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for chinese_history_mcp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5369bc162fc9be4f5081bfd4eedb4708785c1e7d4ae841989c375c6e1138484d
MD5 552339b05c9b8bcca33e6aecf45676c0
BLAKE2b-256 4c32ffe9c08158ba3f7eaed74eaea9f172e8757ca9c52ef2457062b5a3ac24f5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page