MCP server for DAO-centred research: Zenon DAI, IAA Publications, ADAJ, plus a curated cross-platform paper-search surface (Crossref, OpenAlex, Semantic Scholar, arXiv, CORE, Zenodo, bioRxiv) — every hit pre-renders an inline_citation block for verbatim agent copy.
dao-paper-search-mcp
DAO-centred MCP server for academic research on Levant archaeology, biblical archaeology, and the Bronze/Iron Age, plus a curated cross-platform paper-search surface for the wider humanities. It started as a vertical complement to paper-search-mcp and now also reimplements the cross-platform adapters that matter for DAO/DH workflows (Crossref, OpenAlex, Semantic Scholar, arXiv, CORE, Zenodo, bioRxiv), so every hit carries the same pre-rendered inline_citation block.
The DAO-specific sources (Zenon DAI, IAA Publications, ADAJ) are the original raison d'être and remain the strongest reason to use this server for Levantine archaeology. The horizontal adapters reduce the need to run paper-search-mcp alongside.
Sources
DAO-specific (Tier 1, original scope)
| Tool | Source | Status |
|---|---|---|
| search_zenon | Zenon DAI (~1M records, multilingual DE/EN/FR/IT/HE/AR) | implemented |
| search_iaa | IAA Publications (ʿAtiqot, HA-ESI, IAA Book Series, Favissa, …) | implemented (OAI-PMH backend since v0.5.0) |
| search_adaj | DoA Publication Archive (ADAJ, SHAJ, Munjazat, JERD, Athar) | implemented |
| resolve_author | Wikidata SPARQL + local override list + GND fallback | implemented |
| resolve_site | iDAI.gazetteer (DAI's authoritative place register) | implemented |
Cross-platform (selective paper-search-mcp substitution)
| Tool | Source | Status |
|---|---|---|
| search_crossref | Crossref (~150M DOI-bearing scholarly works) | implemented (Sprint 1) |
| search_openalex | OpenAlex (~250M works, broadest open scholarly graph) | implemented (Sprint 1) |
| search_semantic_scholar | Semantic Scholar (citation graph, recommendations) | implemented (Sprint 2) |
| search_arxiv | arXiv (preprints, esp. Digital Humanities methods) | implemented (Sprint 2) |
| search_core | CORE (open-access full-text aggregator) | implemented (Sprint 3) |
| search_zenodo | Zenodo (data, software, preprints, every record gets a DOI) | implemented (Sprint 3) |
| search_biorxiv | bioRxiv + medRxiv preprints (via Europe PMC): aDNA / paleogenomic currency | implemented (Sprint 4) |
Tier 2 (DAO-specific)
| Tool | Source | Status |
|---|---|---|
| search_propylaeum | PropylaeumDOK: FID Altertumswissenschaften OA repository (UB Heidelberg); classical archaeology, ancient history, Near East, Levant; multilingual DE/EN/IT/FR | implemented |
| search_ixtheo | IxTheo: Index Theologicus (UB Tübingen); theology, biblical studies, church history | implemented |
| search_openedition | OpenEdition: ~600 French SSH journals + books incl. Syria, Semitica, Yod, Topoi (OAI-PMH, CC0); multilingual FR/EN | implemented |
| search_gnomon | Gnomon Bibliographische Datenbank (via K10plus SRU): classical studies bibliography | implemented |
| search_perse | Persée: French humanities journals (SPARQL backend) | planned |
| search_tocs_in | TOCS-IN: Toronto Classics journal tables of contents | planned |
Install / Run
Via uvx directly from GitHub:
uvx --from git+https://github.com/leiverkus/dao-paper-search-mcp \
python -m dao_paper_search_mcp.server
Or clone the repo and run from the working tree:
uvx --from git+file:///path/to/dao-paper-search-mcp \
python -m dao_paper_search_mcp.server
OpenCode integration
Add to ~/.config/opencode/opencode.jsonc under the mcp block:
"dao-paper-search": {
  "type": "local",
  "command": [
    "uvx",
    "--from", "git+https://github.com/leiverkus/dao-paper-search-mcp",
    "python", "-m", "dao_paper_search_mcp.server"
  ],
  "enabled": true,
  "environment": {
    "CORE_API_KEY": "your-core-api-key",
    "DAO_PAPER_SEARCH_CONTACT_EMAIL": "your-email@example.com",
    "DAO_PAPER_SEARCH_RATE_LIMIT_MS": "1000"
  }
}
Then update ~/.config/opencode/agent/research.md to route Levant/IAA/DoA-Jordan queries to this server first.
Environment variables
| Variable | Required | Default | Purpose |
|---|---|---|---|
| CORE_API_KEY | yes (for search_core) | (none) | Bearer token for the CORE v3 API. Register free at core.ac.uk/services/api. Without it search_core raises an error on every call. |
| DAO_PAPER_SEARCH_CONTACT_EMAIL | no | "anonymous" | Your e-mail, sent as mailto: in User-Agent headers to polite-pool APIs (OpenAlex, Crossref, CORE, arXiv). Improves rate-limit priority on those APIs. |
| SEMANTIC_SCHOLAR_API_KEY | no | (none) | API key for Semantic Scholar. Without it the adapter runs on the public bucket (~100 req/min); with it limits rise substantially. Register at semanticscholar.org/product/api. |
| DAO_PAPER_SEARCH_RATE_LIMIT_MS | no | 1000 | Minimum milliseconds between outbound requests (simple rate limiter). |
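How these variables might be read at start-up, and what a minimum-interval limiter could look like, is sketched below. Only the variable names and defaults come from the table; `Settings`, `load_settings` and `MinIntervalLimiter` are hypothetical names for illustration and are not taken from the package.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    core_api_key: Optional[str]
    contact_email: str
    semantic_scholar_api_key: Optional[str]
    rate_limit_ms: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        core_api_key=env.get("CORE_API_KEY"),
        contact_email=env.get("DAO_PAPER_SEARCH_CONTACT_EMAIL", "anonymous"),
        semantic_scholar_api_key=env.get("SEMANTIC_SCHOLAR_API_KEY"),
        rate_limit_ms=int(env.get("DAO_PAPER_SEARCH_RATE_LIMIT_MS", "1000")),
    )


class MinIntervalLimiter:
    """Keep at least `interval_ms` between consecutive outbound requests."""

    def __init__(
        self,
        interval_ms: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = interval_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `limiter.wait()` immediately before each HTTP request is enough to enforce the spacing. The clock and sleep functions are injectable so the limiter can be tested without real delays.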
Architecture principles
- Vertical scope. Only sources paper-search-mcp does not cover.
- Tool independence. No internal calls to paper-search-mcp. Citation-graph delegation is the agent's responsibility. This preserves the cross-validation property of research.md.
- Schema fidelity. All search tools return the same Pydantic model (DAOPaper).
- Structured verification notes. When uncertain, the adapter sets verification_note, never guesses.
- Stdio cleanliness. MCP stdout is reserved for JSON-RPC; all logging goes to stderr.
- Output-shape lock-in for citations. Every hit carries an inline_citation block whose markdown field is a pre-rendered Markdown link (with ⚠️-prefix when audit.warn_marker or audit.aggregator is set). The agent copies this verbatim instead of formatting citations itself, which structurally enforces the AGENTS.md inline-link rule. Two extra fields (authoritative_authors_label, authoritative_bibliography_line) carry the tool-authoritative Author-Year and reference-list strings to prevent DOI-consistent author-year hallucinations.
- Centralised DOI normalisation. All adapters run DOIs through utils.doi.normalize_doi(), which strips resolver prefixes (https://doi.org/, info:doi/, doi:…) and lower-cases the result (DOI Handbook §2.4). This produces bare, case-insensitive 10.<registrant>/… strings for reliable deduplication. The render layer always reconstructs the full link from the bare string, so case drift in upstream APIs never leaks into bibliography output.
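The DOI normalisation principle can be illustrated with a minimal sketch. This is not the package's `utils.doi.normalize_doi()`, whose source is not shown here; `normalize_doi_sketch` and `doi_link` are hypothetical names, the `http://` and `dx.doi.org` prefixes are an added assumption, and returning `None` for non-DOI input is a choice made for this example only.

```python
from typing import Optional

# Resolver prefixes named in the text, plus http:// and dx.doi.org
# variants, which are an assumption of this sketch.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "info:doi/",
    "doi:",
)


def normalize_doi_sketch(raw: Optional[str]) -> Optional[str]:
    """Return a bare, lower-cased DOI, or None if the input is not a DOI."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in _PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # A bare DOI starts with the "10." directory indicator and has a suffix.
    if not doi.startswith("10.") or "/" not in doi:
        return None
    return doi


def doi_link(bare_doi: str) -> str:
    """Rebuild the full resolver link from the bare string."""
    return f"https://doi.org/{bare_doi}"
```

Because every adapter would emit the same bare form, two records can be deduplicated by comparing the normalised strings directly, regardless of how each upstream API capitalised or prefixed the DOI.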
Inline citations
Each DAOPaper carries three blocks the agent should consume directly:
- identifiers: structured DOI / Zenon / IAA / ADAJ IDs (coexists with the legacy doi_or_id string).
- audit: primary_source, aggregator, verification_note, warn_marker; flags that drive the citation renderer.
- inline_citation: pre-rendered Markdown plus the tool-authoritative bibliography strings.
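On the consuming side, "copy verbatim" could look like the sketch below. It assumes each hit arrives as a plain dict shaped like the serialised DAOPaper, with the inline_citation fields markdown and authoritative_bibliography_line; the helper names `inline_cite` and `reference_list` are hypothetical.

```python
from typing import Iterable, List, Mapping, Optional


def inline_cite(hit: Mapping) -> str:
    """Return the pre-rendered in-text citation without reformatting it."""
    return hit["inline_citation"]["markdown"]


def reference_list(hits: Iterable[Mapping]) -> List[str]:
    """Collect the tool-authoritative bibliography lines.

    Lines that are None (incomplete metadata) and exact duplicates are
    skipped; first-seen order is preserved.
    """
    seen = set()
    lines: List[str] = []
    for hit in hits:
        line: Optional[str] = hit["inline_citation"].get(
            "authoritative_bibliography_line"
        )
        if line is None or line in seen:
            continue
        seen.add(line)
        lines.append(line)
    return lines
```

Neither helper touches authors, year, or title, so the text that reaches the prose is exactly what the server rendered.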
inline_citation fields (Schema v2.1, since v0.7.1)
| Field | Purpose |
|---|---|
| url | Canonical URL (priority: DOI > OpenAlex > Zenon > IAA > ADAJ > arXiv > Semantic Scholar > CORE > Europe PMC > open_access_url > landing_page_url). |
| markdown | Finished in-text Markdown link. Author-Year form for academic hits ([(Cohen 1979)](https://doi.org/…)), Domain-Title form for web hits ([(example.org — Title…)](url)), Domain-only as last resort, and ⚠️-prefixed for aggregator/warn-flagged hits. Print-only hits (no URL) collapse to fallback_text. Copy this verbatim in prose. |
| authoritative_authors_label | Plain-text Author-Year string ("Finkelstein 1999"); None when no author context exists. Use this when you want to render Author-Year yourself instead of copying markdown; do not reconstruct from authors/year. |
| authoritative_bibliography_line | Full reference-list line with a trailing clickable link: "Finkelstein, I. (1999). Title. *BASOR* 314, 55–70. DOI: [10.2307/1357451](https://doi.org/10.2307/1357451)". DOI takes priority over primary_url; no suffix when neither is present. None when author/year/title metadata is incomplete. Copy verbatim in the references list. |
| fallback_text | "Cohen 1979: 61–79"; used when no url exists (print-only). |
Author-label rules (inline):
- 1 author → "Cohen 1979".
- 2 authors → "Cohen & Yisrael 1995".
- 3 authors → "Boaretto, Finkelstein & Shahack-Gross 2010" (explicit, no et al.).
- ≥4 authors → "Bruins et al. 2011".
- Particle names (van der Plicht, von Daniken) stay intact in both inline and bibliography forms.
Bibliography author-rules: family + initials, Oxford comma before &, full list (no et al.). 1 author: "Cohen, R."; 2: "Cohen, R., & Yisrael, Y."; ≥3: "Boaretto, E., Finkelstein, I., & Shahack-Gross, R.".
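The inline and bibliography author rules can be written down as two small functions. This is a sketch of the stated rules rather than the server's renderer; both function names are hypothetical, and the input is assumed to be already split into family names and initials, so particle names pass through untouched.

```python
from typing import Optional, Sequence, Tuple


def inline_author_label(families: Sequence[str], year: int) -> Optional[str]:
    """Author-Year label: 1, 2 or 3 names spelled out, 'et al.' from 4 on."""
    n = len(families)
    if n == 0:
        return None  # no author context
    if n == 1:
        names = families[0]
    elif n == 2:
        names = f"{families[0]} & {families[1]}"
    elif n == 3:
        names = f"{families[0]}, {families[1]} & {families[2]}"
    else:
        names = f"{families[0]} et al."
    return f"{names} {year}"


def bibliography_authors(authors: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Full author list from (family, initials) pairs, Oxford comma before &."""
    parts = [f"{family}, {initials}" for family, initials in authors]
    if not parts:
        return None
    if len(parts) == 1:
        return parts[0]
    return ", ".join(parts[:-1]) + ", & " + parts[-1]
```

For example, `inline_author_label(["Cohen", "Yisrael"], 1995)` yields `"Cohen & Yisrael 1995"`, and `bibliography_authors([("Cohen", "R."), ("Yisrael", "Y.")])` yields `"Cohen, R., & Yisrael, Y."`.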
markdown cascade:
- print-only → fallback_text
- aggregator → ⚠️[(domain — Title…)](url)
- Author + Year → [(Author-Label Year)](url)
- URL + Title without Author-Year → [(domain — Title…)](url)
- URL only → [(domain)](url)
- audit.warn_marker prepends ⚠️ to the link form.
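Read as a decision procedure, the cascade might be sketched as follows. The function and parameter names are hypothetical, the 40-character title cut-off is a guess (the text only shows "Title…"), and the handling of an aggregator hit without a title is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse

WARN = "⚠️"


def _domain(url: str) -> str:
    """Host part of the URL, e.g. 'example.org'."""
    return urlparse(url).netloc


def _short(title: str, limit: int = 40) -> str:
    # Hypothetical cut-off: the real truncation length is not documented.
    return title if len(title) <= limit else title[:limit].rstrip() + "…"


def render_markdown(
    url: Optional[str],
    author_label: Optional[str],   # e.g. "Cohen 1979"
    title: Optional[str],
    fallback_text: Optional[str],  # e.g. "Cohen 1979: 61–79"
    aggregator: bool = False,
    warn_marker: bool = False,
) -> str:
    if not url:  # print-only hit: no link can be built
        return fallback_text or ""
    domain_title = (
        f"{_domain(url)} — {_short(title)}" if title else _domain(url)
    )
    if aggregator:  # always warn-prefixed, Domain-Title form
        return f"{WARN}[({domain_title})]({url})"
    # Author-Year wins; otherwise Domain-Title, then Domain-only.
    label = author_label or domain_title
    link = f"[({label})]({url})"
    return WARN + link if warn_marker else link
```

The order of the checks mirrors the order of the cascade, so an aggregator hit is rendered in Domain-Title form even when an Author-Year label is available.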
Authority overrides
src/dao_paper_search_mcp/data/authority_overrides.yml is the DAO-curated disambiguation list. To add an entry:
- canonical: "Steven A. Rosen"
  variants: ["S.A. Rosen", "Rosen, S.A.", "Steven Rosen"]
  q_id: "Q7613131"
  domain: "Levant archaeology, lithics, Negev Highlands Survey"
  affiliation: "Ben Gurion University"
  sites: ["Negev Highlands", "Camel Site"]
resolve_author checks the override list before consulting Wikidata. Add an entry whenever you encounter a misattribution in real research output.
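The override-first lookup order can be sketched as below. The entries are given as already-parsed Python data rather than loaded from the YAML file, the case- and whitespace-insensitive matching is an assumption, and `build_index`, `resolve_author_sketch` and the `remote_lookup` callback (standing in for the Wikidata query) are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Optional

OVERRIDES = [
    {
        "canonical": "Steven A. Rosen",
        "variants": ["S.A. Rosen", "Rosen, S.A.", "Steven Rosen"],
        "q_id": "Q7613131",
    },
]


def _key(name: str) -> str:
    """Lookup key: lower-cased, whitespace collapsed (an assumption)."""
    return " ".join(name.lower().split())


def build_index(entries: Iterable[dict]) -> Dict[str, dict]:
    """Map the canonical name and every variant to its override entry."""
    index: Dict[str, dict] = {}
    for entry in entries:
        for name in [entry["canonical"], *entry.get("variants", [])]:
            index[_key(name)] = entry
    return index


def resolve_author_sketch(
    name: str,
    index: Dict[str, dict],
    remote_lookup: Optional[Callable[[str], Optional[dict]]] = None,
) -> Optional[dict]:
    """Check the curated overrides first; only then ask the remote source."""
    hit = index.get(_key(name))
    if hit is not None:
        return {
            "canonical": hit["canonical"],
            "q_id": hit["q_id"],
            "source": "override",
        }
    if remote_lookup is not None:
        return remote_lookup(name)
    return None
```

With `index = build_index(OVERRIDES)`, a call such as `resolve_author_sketch("Rosen, S.A.", index)` resolves to the curated entry without any network request.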
Tests
uv sync --extra test
uv run pytest -v
The verification suite (tests/test_verification_suite.py) contains five frozen reference fingerprints drawn from the 2026-05-15 Negev-fortress test. One of them is a negative test — a hallucinated reference (Ben-Ami 2026 Levant 58(1):25–42) that converged across three LLM outputs. The suite asserts the server returns no result for this query; a false positive would mean the server is echoing the LLM hallucination.
Known limitations
search_iaa has no full-text search
The IAA backend (BePress/Solr) does not expose a public free-text search API. The v0.5.0 OAI-PMH-backed adapter compensates by AND-matching query tokens against title + description + subject + author fields client-side — broad enough for most archaeology queries, but not as deep as a real Solr q= would be. Recommendation: always pass at least a 5-year year_from/year_to window so the OAI listing stays manageable.
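The client-side AND-matching described above could be approximated like this. It is a sketch, not the adapter's code: the record is assumed to be a dict with title, description, subject and author keys, matching is done on whole lower-cased word tokens, and `matches` and `filter_records` are hypothetical names.

```python
import re
from typing import Iterable, Iterator, List, Mapping

FIELDS = ("title", "description", "subject", "author")


def _tokens(text: str) -> List[str]:
    """Lower-cased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def matches(record: Mapping[str, object], query: str) -> bool:
    """True if every query token occurs in at least one searched field."""
    parts: List[str] = []
    for field in FIELDS:
        value = record.get(field) or ""
        if isinstance(value, (list, tuple)):  # repeated metadata elements
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    haystack = set(_tokens(" ".join(parts)))
    return all(token in haystack for token in _tokens(query))


def filter_records(records: Iterable[Mapping[str, object]],
                   query: str) -> Iterator[Mapping[str, object]]:
    """Yield only the harvested records that satisfy the AND-match."""
    return (record for record in records if matches(record, query))
```

Since every harvested record has to be scanned, the cost grows with the size of the OAI listing, which is why a narrow year window matters.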
The reverse-engineered /do/search/results/json endpoint, pulled from the page's 2019 JS bundle, has been retired: BePress migrated the route without updating the bundle. See docs/2026-05-15-iaa-solr-probe.md for the full probe report.
Disclaimer
MIT licensed. No cloud upload, runs entirely locally. GDPR-compliant: no personal data is collected or transmitted by the server itself.