Model Context Protocol server for the UniProt protein knowledgebase.
uniprot-mcp
A Model Context Protocol server for the
UniProt protein knowledgebase with
per-query provenance verification. 41 tools across 8
families. Apache-2.0. Every successful response carries a verifiable
Provenance record — UniProt release, retrieval timestamp, resolved
URL, and a SHA-256 of the canonical response body — that the agent
(or a third party, a year later) can re-check with a single tool
call: uniprot_provenance_verify.
The wedge: **per-response SHA-256 + verify primitive + release pinning + offline replay** is, to the best of my survey of public MCPs as of 2026-04-26, absent from every other bio-MCP server I could find (BioMCP, Augmented-Nature/UniProt-MCP, biothings-mcp, gget-mcp, and others). If you are a regulated-bio-pharma user who needs to prove, years later, that a UniProt-derived claim still holds, this is the mechanism. Comparison and citations: docs/COMPETITIVE_LANDSCAPE.md.
Author: Santiago Maniches · ORCID 0009-0005-6480-1987 · TOPOLOGICA LLC
For researchers — where to start
If you are a biomedical researcher visiting this repo, the highest-signal places to look are:
| Resource | What it gives you |
|---|---|
| examples/atlas/ | 25 disease & target worked examples (TP53, BRCA1, CFTR, HTT, EGFR, BRAF, KRAS, TEM-1 β-lactamase, more), each linking the canonical UniProt accession to MONDO / OMIM / PharmGKB / ARO IDs and the relevant tool sequence. JSON-LD manifest at examples/atlas/atlas.json for machine consumption. Methodology (how compiled, what's verified, what's community-reviewable) at examples/atlas/METHODOLOGY.md. |
| examples/01..04.jsonl | Full Claude-Desktop transcripts of clinical-variant interpretation (TP53 R175H), drug-target dossier (BRCA1), provenance verification a year later, pathogen drug-discovery (TEM-1). |
| tests/benchmark/ | Pre-registered 30-prompt benchmark with SHA-256 commitments on main. The 2026-04-26 v1.1.0 run verified 30/30 against live UniProt; transcript at tests/benchmark/run-2026-04-26-v1.1.0/. |
| scripts/replicate.sh | One-command verification that the published PyPI wheel was built from this exact repo (cross-checks SHA-256 across PyPI / GitHub Release / SLSA attestation; runs --self-test; re-runs the benchmark live). POSIX + scripts/replicate.ps1 for Windows. |
| docs/COMPETITIVE_LANDSCAPE.md | Honest 14-server survey of the bio-MCP space (April 2026) and the specific differentiation this server claims. |
Issues / corrections welcome at https://github.com/smaniches/uniprot-mcp/issues. The atlas in particular is community-reviewable — see METHODOLOGY.md for what is machine-verified vs what needs human review.
What makes this different
| | uniprot-mcp | Vanilla LLM + WebFetch | A typical bio-MCP |
|---|---|---|---|
| Tool surface | 41 tools, 8 families | none (caller writes URLs) | usually 5–10 |
| Provenance on every response | release • date • URL • SHA-256 | none | sometimes URL only |
| Per-query auditability | uniprot_provenance_verify re-checks any prior response | not possible | not possible |
| Release pinning | --pin-release=YYYY_MM raises on drift | n/a | n/a |
| Pre-registered benchmark | 30 prompts, SHA-256 committed on main + reproducible verifier | n/a | n/a |
| Local provenance cache | offline replay via UNIPROT_MCP_CACHE_DIR | n/a | n/a |
| Clinical primitives | sequence chemistry / position-aware features / HGVS variant lookup / disease associations / AlphaFold pLDDT / ClinVar | none | none |
| Composition tool | uniprot_target_dossier: one call, nine sections | n/a | n/a |
| Input validation | regex + length cap before any HTTP call | none | partial |
| Error-channel safety | upstream exception text never echoed to LLM | n/a | partial |
| Cross-origin allowlist | enumerated, threat-modelled, privacy-listed | n/a | usually unaudited |
| Supply chain | SLSA build provenance + Sigstore + CycloneDX SBOM (post-flip) | n/a | rare |
| Test layers | unit + property + contract + client + integration + benchmark | n/a | usually unit only |
| Mutation testing | weekly + on-demand workflow; baseline measurement on v1.1.0; ≥ 95 % gate planned post-baseline | n/a | rare |
The provenance + verify chain is, in my 2026-04-26 survey, absent
from every other bio-MCP I could find. A regulated user can take any
prior uniprot-mcp answer and prove — without contacting the author
— that UniProt still returns the same bytes, or detect exactly how the
upstream has drifted. If you find a counter-example I missed, please
file an issue and I will update the comparison.
Tools (41)
Eight endpoint families. All read-only (readOnlyHint: true). All
but uniprot_replay_from_cache interact with at least one upstream
service (openWorldHint: true). No UniProt API key required.
Core UniProtKB (10)
| Tool | Purpose |
|---|---|
| uniprot_get_entry | Full UniProt entry (e.g. P04637 for p53). Function, gene, organism, disease, cross-refs. |
| uniprot_search | UniProt query language: gene, organism, taxon ID, reviewed flag, free text. |
| uniprot_get_sequence | FASTA. PIR-style provenance comment block above the first record (BLAST+ / biopython compatible). |
| uniprot_get_features | Domains, binding sites, PTMs, signal peptides, with an optional type filter. |
| uniprot_get_variants | Natural variants and disease mutations. |
| uniprot_get_go_terms | GO annotations grouped by aspect (F / P / C). |
| uniprot_get_cross_refs | Raw cross-references to PDB, Pfam, Ensembl, Reactome, KEGG, STRING … |
| uniprot_id_mapping | Map IDs between databases (Gene_Name → UniProtKB, PDB → UniProtKB, …). |
| uniprot_batch_entries | Up to 100 entries in one call; invalid accessions filtered client-side. |
| uniprot_taxonomy_search | Search UniProt taxonomy by organism name. |
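The client-side filtering mentioned for uniprot_batch_entries, together with the "regex + length cap before any HTTP call" row in the comparison table, might look roughly like the sketch below. It uses the accession pattern that UniProt publishes for 6- and 10-character accessions; the function name `filter_accessions` and the upper-casing step are assumptions made for illustration, not the server's actual code.

```python
import re

# UniProt's published accession pattern (6- or 10-character accessions).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
MAX_ACCESSION_LENGTH = 10  # length cap applied before the regex


def filter_accessions(candidates):
    """Split candidates into (valid, rejected) without any network call.

    Hypothetical helper: normalises case, then applies the length cap
    and the accession regex.
    """
    valid, rejected = [], []
    for raw in candidates:
        accession = raw.strip().upper()
        if len(accession) <= MAX_ACCESSION_LENGTH and ACCESSION_RE.match(accession):
            valid.append(accession)
        else:
            rejected.append(raw)
    return valid, rejected
```

Validating before the request means a malformed identifier never reaches the upstream, so the rejected list can be reported back to the agent directly.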
Controlled vocabularies (4)
| Tool | Purpose |
|---|---|
| uniprot_get_keyword | Keyword by ID (e.g. KW-0007 = Acetylation). Definition, synonyms, GO refs, hierarchy. |
| uniprot_search_keywords | Free-text keyword search. |
| uniprot_get_subcellular_location | Subcellular-location term by ID (e.g. SL-0039 = Cell membrane). |
| uniprot_search_subcellular_locations | Free-text location search. |
Sequence archives & clusters (4)
| Tool | Purpose |
|---|---|
| uniprot_get_uniref | UniRef cluster by ID (UniRef50_P04637, UniRef90_P04637, UniRef100_P04637). |
| uniprot_search_uniref | Cluster search with identity_tier filter (50 / 90 / 100). |
| uniprot_get_uniparc | Sequence-archive record by UPI (UPI000002ED67). |
| uniprot_search_uniparc | UniParc full-text search. |
Proteomes & literature (4)
| Tool | Purpose |
|---|---|
| uniprot_get_proteome | Proteome by UP ID (UP000005640 = human). Counts, BUSCO score, components. |
| uniprot_search_proteomes | Filter by organism / type / completeness. |
| uniprot_get_citation | Citation record by ID (typically a PubMed numeric ID). |
| uniprot_search_citations | Index search across UniProt citations. |
Structured cross-DB resolvers (4)
Gateway-only — no calls leave the UniProt origin. These extract the relevant cross-references from a UniProt entry and return structured records (typed lists / objects, not passthrough strings).
| Tool | Purpose |
|---|---|
| uniprot_resolve_pdb | PDB structures: id + method + resolution + chain coverage. |
| uniprot_resolve_alphafold | AlphaFold model id + EBI viewer URL (model id only; for pLDDT, call the dedicated tool below). |
| uniprot_resolve_interpro | InterPro signatures: id + entry name. |
| uniprot_resolve_chembl | ChEMBL drug-target id + EBI target-card URL. |
Biomedical features (7)
Pure-Python compositions over the entry — no extra origin. The first
four answer per-residue and per-variant questions; the last three are
the v1.1.0 expansion targeting drug discovery, therapeutic-protein
engineering, and pathogen-secretion analysis: each is a filter over the
entry's features array, with a structured grouping by feature type
and an honest empty-set advisory.
| Tool | Purpose |
|---|---|
| uniprot_compute_properties | Derived sequence chemistry from the FASTA: MW / pI / GRAVY / aromaticity / charge / ε₂₈₀. |
| uniprot_features_at_position | Every feature overlapping a residue position. Critical for variant-effect interpretation. |
| uniprot_lookup_variant | HGVS-shorthand match (R175H, V600E, R248*) against UniProt's natural-variant features. |
| uniprot_get_disease_associations | Structured disease records from DISEASE-type comments: name + acronym + UniProt disease ID + OMIM cross-ref + description. |
| uniprot_get_active_sites | Catalytic and ligand-binding residues: active sites, binding sites, sites, metal binding, DNA binding. The residue-level chemistry of the protein. |
| uniprot_get_processing_features | Maturation features: signal peptide, propeptide, transit peptide, initiator methionine, chain, peptide. Essential for therapeutic-protein engineering and pathogen-secretion analysis. |
| uniprot_get_ptms | Post-translational modifications: modified residues (phospho/acetyl/methyl), glycosylation, lipidation (GPI/prenyl/palmitoyl), disulfide bonds, cross-links. |
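To make the HGVS-shorthand input of uniprot_lookup_variant concrete, here is a minimal sketch of how strings such as R175H, V600E, or R248* could be split into reference residue, position, and alternate residue before matching against variant features. The names `ProteinChange` and `parse_shorthand` are hypothetical, and the server's own parser may accept a wider or narrower grammar.

```python
import re
from typing import NamedTuple

# The 20 standard one-letter amino-acid codes; "*" marks a stop, as in R248*.
_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_SHORTHAND_RE = re.compile(
    rf"^([{_AMINO_ACIDS}])([1-9][0-9]*)([{_AMINO_ACIDS}*])$"
)


class ProteinChange(NamedTuple):
    ref: str       # reference residue, e.g. "R"
    position: int  # 1-based residue position, e.g. 175
    alt: str       # alternate residue or "*" for a stop


def parse_shorthand(change: str) -> ProteinChange:
    """Parse a one-letter protein change such as 'R175H' (hypothetical helper)."""
    match = _SHORTHAND_RE.match(change.strip().upper())
    if match is None:
        raise ValueError(f"not a one-letter protein change: {change!r}")
    return ProteinChange(match.group(1), int(match.group(2)), match.group(3))
```

The parsed position is also what a position-overlap query such as uniprot_features_at_position would need, so one parse can feed both lookups.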
Cross-origin enrichment (3)
The only tools that consult origins outside rest.uniprot.org. Each is documented in PRIVACY.md and in the threat model.
| Tool | Origin | Purpose |
|---|---|---|
| uniprot_get_alphafold_confidence | alphafold.ebi.ac.uk | pLDDT mean + four-band distribution; lets the agent decide whether to trust the model. |
| uniprot_resolve_clinvar | eutils.ncbi.nlm.nih.gov | ClinVar significance + condition + review status by gene + optional HGVS shorthand. |
| uniprot_get_publications | rest.uniprot.org | Pure-Python over the entry's references; listed here because it complements the cross-origin enrichment. |
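The "pLDDT mean + four-band distribution" reported by uniprot_get_alphafold_confidence can be illustrated with the sketch below. The band cut-offs (90, 70, 50) follow AlphaFold's usual confidence colouring; the function name, the band labels, and the choice to count a score exactly on a cut-off in the higher band are assumptions for illustration.

```python
from statistics import fmean


def summarize_plddt(scores):
    """Return the mean pLDDT and a count of residues per confidence band.

    Hypothetical helper: `scores` is one pLDDT value (0-100) per residue.
    """
    if not scores:
        raise ValueError("no pLDDT scores supplied")
    bands = {"very_high": 0, "confident": 0, "low": 0, "very_low": 0}
    for score in scores:
        if score >= 90:
            bands["very_high"] += 1
        elif score >= 70:
            bands["confident"] += 1
        elif score >= 50:
            bands["low"] += 1
        else:
            bands["very_low"] += 1
    return {"mean": round(fmean(scores), 2), "bands": bands}
```

A protein-wide mean can hide a poorly modelled region, which is why the band counts are worth reporting alongside it.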
Composition + provenance (5)
| Tool | Purpose |
|---|---|
| uniprot_resolve_orthology | Group orthology cross-references by source DB (KEGG / OMA / OrthoDB / eggNOG / 8 more). |
| uniprot_get_evidence_summary | Aggregate ECO codes (Evidence and Conclusion Ontology) across an entry. Distinguishes wet-lab confirmed from inferred-by-similarity from automatic. |
| uniprot_target_dossier | One-call comprehensive characterisation in nine sections: identity / function / chemistry / structure / drug-target / disease / variants / functional annotations / cross-refs. |
| uniprot_provenance_verify | Re-fetch a previously recorded URL and compare release tag + canonical response SHA-256. Five verdicts (verified, release_drift, hash_drift, release_and_hash_drift, url_unreachable) each with an advice string. |
| uniprot_replay_from_cache | Read a cached UniProt response without hitting the upstream. Opt-in via UNIPROT_MCP_CACHE_DIR. |
Provenance & verification
Every successful tool response includes a footer like:
---
_Source: UniProt release 2026_01 (28-January-2026) • Retrieved 2026-04-25T17:09:00Z_
_Query: https://rest.uniprot.org/uniprotkb/P04637_
_SHA-256: 0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e_
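An auditor who stores only the rendered answer can recover the verification inputs from this footer. The sketch below parses the footer layout shown above into the fields that uniprot_provenance_verify takes; `parse_footer` and its regular expressions are illustrative assumptions based solely on this example, not a parser shipped with the package.

```python
import re

# Patterns derived from the example footer above (assumed layout).
_PATTERNS = {
    "release": re.compile(r"UniProt release (\d{4}_\d{2})"),
    "retrieved": re.compile(
        r"Retrieved (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"
    ),
    "url": re.compile(r"^_Query: (\S+?)_$", re.MULTILINE),
    "response_sha256": re.compile(r"^_SHA-256: ([0-9a-f]{64})_$", re.MULTILINE),
}


def parse_footer(text):
    """Extract release, timestamp, URL and SHA-256 from a provenance footer."""
    fields = {}
    for name, pattern in _PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            raise ValueError(f"footer is missing the {name} field")
        fields[name] = match.group(1)
    return fields
```

The returned `url`, `release`, and `response_sha256` values line up with the three arguments of the verification call.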
A year later, an auditor can call uniprot_provenance_verify with
those exact fields:
> uniprot_provenance_verify(
url="https://rest.uniprot.org/uniprotkb/P04637",
release="2026_01",
response_sha256="0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e"
)
## Provenance Verification
**Status:** verified
**URL:** https://rest.uniprot.org/uniprotkb/P04637
- ✓ URL resolves (HTTP 200)
- ✓ Release: recorded '2026_01', current '2026_01'
- ✓ Response SHA-256: recorded 0040d79bb39e2f73…, current 0040d79bb39e2f73…
**Advice:** Both checks passed. The recorded provenance is reproducible against the live UniProt API.
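The SHA-256 compared here is described as a hash of the canonical response body, but the canonicalisation rule is not spelled out in this README. As a sketch of the general idea only, the example below assumes a JSON body normalised with sorted keys and compact separators; hashes produced this way should not be expected to match the ones the server emits.

```python
import hashlib
import json


def canonical_sha256(body: bytes) -> str:
    """Hash a JSON body after normalising key order and whitespace.

    Assumed canonical form: sorted keys, compact separators, UTF-8.
    """
    parsed = json.loads(body)
    canonical = json.dumps(
        parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Hashing a canonical form rather than the raw bytes keeps the digest stable when the upstream changes only whitespace or key order, so a hash mismatch points at a real content change.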
If UniProt has moved on, the tool tells you exactly how:
| Verdict | Meaning | Advice |
|---|---|---|
| verified | Both release and hash match | The provenance is reproducible |
| release_drift | UniProt released a new version | Pin via the FTP snapshot if you need the historical answer |
| hash_drift | Same release, body changed | An in-release edit; investigate or re-fetch |
| release_and_hash_drift | Both moved on | Use a release-specific FTP snapshot |
| url_unreachable | Endpoint dropped or rate-limited | Retry or report to UniProt |
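The five verdicts reduce to two equality checks plus a reachability flag. The sketch below shows that decision logic under the assumption that the release tag and hash have already been fetched; the function name `classify` and its signature are hypothetical and stand in for whatever the server does internally.

```python
def classify(recorded_release, recorded_sha256,
             current_release, current_sha256, reachable=True):
    """Map a recorded/current provenance pair onto one of the five verdicts."""
    if not reachable:
        return "url_unreachable"
    release_ok = recorded_release == current_release
    hash_ok = recorded_sha256 == current_sha256
    if release_ok and hash_ok:
        return "verified"
    if not release_ok and not hash_ok:
        return "release_and_hash_drift"
    # Exactly one of the two checks failed.
    return "hash_drift" if release_ok else "release_drift"
```

Note that release_drift here means the release tag changed while the body hash did not, which is the case where the recorded answer still holds byte-for-byte under a newer release.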
For strict reproducibility, opt into release pinning:
export UNIPROT_PIN_RELEASE=2026_01
uniprot-mcp
# every response is checked against the pinned release;
# any drift raises `ReleaseMismatchError`, which the server surfaces
# as an agent-actionable error envelope.
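Conceptually, the pin check compares the release reported by the upstream with the pinned value and fails loudly on a mismatch. The sketch below illustrates that idea; the `ReleaseMismatchError` class defined here is a stand-in for the error named above, and `check_pin` is a hypothetical helper, not the package's API.

```python
import os


class ReleaseMismatchError(RuntimeError):
    """Illustrative stand-in for the error the server raises on drift."""


def check_pin(observed_release, environ=os.environ):
    """Raise if a pinned release is set and the upstream reports another one."""
    pinned = environ.get("UNIPROT_PIN_RELEASE")
    if pinned and observed_release != pinned:
        raise ReleaseMismatchError(
            f"pinned release {pinned}, upstream reports {observed_release}"
        )
```

With no pin set the check is a no-op, which matches pinning being opt-in.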
For offline replay (post-cache-population):
export UNIPROT_MCP_CACHE_DIR=~/.uniprot-mcp-cache
uniprot-mcp
# every successful response is mirrored to disk; later replay via
# uniprot_replay_from_cache(url) without touching the upstream.
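A content-addressed file cache of this kind can be sketched in a few lines. The example assumes one JSON file per URL, named by the SHA-256 of the URL string; the README only shows a `<sha>.json` file name, so what is actually hashed, the file layout, and the helper names `store` and `replay` are all assumptions.

```python
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir, url):
    """Derive the cache file for a URL (assumed key: SHA-256 of the URL)."""
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir).expanduser() / f"{key}.json"


def store(cache_dir, url, body):
    """Mirror a successful response body to disk."""
    path = cache_path(cache_dir, url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"url": url, "body": body}), encoding="utf-8")
    return path


def replay(cache_dir, url):
    """Return a cached body without touching the network."""
    path = cache_path(cache_dir, url)
    if not path.exists():
        raise KeyError(f"no cached response for {url}")
    return json.loads(path.read_text(encoding="utf-8"))["body"]
```

Because replay never falls back to the network, a missing entry surfaces as an explicit error rather than a silent live fetch, which is the property an air-gapped workflow depends on.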
A live end-to-end demonstration is committed at
tests/benchmark/run-2026-04-25-roundtrip/transcript.md
— real values, real verdicts, no mocks.
Pre-registered benchmark
tests/benchmark/ ships a 30-prompt evaluation (Tier A / B / C × 10)
with SHA-256-committed expected answers on main. The plaintext
expected.jsonl is held local-only until a benchmark run is
published; the cryptographic commitments mean the author cannot
rewrite "correct" answers post-hoc.
A reviewer can re-derive every Tier A and B answer from primary-source UniProt REST in two commands:
python tests/benchmark/verify_answers.py tests/benchmark/expected.jsonl
# OK: all 30 prompts verified against https://rest.uniprot.org
python tests/benchmark/verify.py tests/benchmark/expected.jsonl tests/benchmark/expected.hashes.jsonl
# OK: 30 commitments verified
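The commitment check itself is simple to picture: hash each expected-answer record and compare it with the hash published earlier. The sketch below assumes each record is a JSON object with an `id` field and that the commitment is the SHA-256 of its sorted-key JSON form; the real file formats and the logic of verify.py may differ.

```python
import hashlib
import json


def commit(record: dict) -> str:
    """Commitment for one record (assumed: SHA-256 of sorted-key JSON)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_commitments(records, published_hashes):
    """Return the ids of records that do not match their published hash."""
    return [
        record["id"]
        for record in records
        if commit(record) != published_hashes.get(record["id"])
    ]
```

An empty result means every plaintext answer matches what was committed before the run. If the answers are short or guessable, a commitment scheme would typically also mix in a secret salt per record so the hashes cannot be brute-forced before publication.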
See tests/benchmark/AUDIT.md for the
per-prompt source attribution and the formal independence statement
(uniprot-mcp was not used during answer authoring).
Install
pip install uniprot-mcp-server # PyPI distribution
# or, for a pinned, isolated install:
uvx --from uniprot-mcp-server uniprot-mcp
Why three different names? This is a standard Python packaging pattern, which arises because PyPI's namespace is global and name collisions force disambiguation:
| Concept | Value | What it is |
|---|---|---|
| GitHub repository | smaniches/uniprot-mcp | source code + issue tracker |
| PyPI distribution | uniprot-mcp-server | what you pip install (the bare uniprot-mcp name was already claimed on PyPI when this project published) |
| Python module | uniprot_mcp | what you import (PEP-8 underscore form) |
| Console script + MCP server identity | uniprot-mcp | what you run from the shell and what Claude Desktop sees |
Cross-checks that prove the wheel you installed was built from this repo: each release ships a Sigstore signature, SLSA build provenance, and a CycloneDX SBOM, all attached to the v1.1.0 GitHub Release. Run bash scripts/replicate.sh (POSIX) or pwsh scripts/replicate.ps1 (Windows) to verify the full chain end-to-end.
Common precedents for the same one-thing-three-names pattern: pillow / PIL, python-dateutil / dateutil, beautifulsoup4 / bs4, python-Levenshtein / Levenshtein.
From source:
git clone https://github.com/smaniches/uniprot-mcp.git
cd uniprot-mcp
pip install -e .
Claude Desktop
claude_desktop_config.json:
{
"mcpServers": {
"uniprot": {
"command": "uniprot-mcp"
}
}
}
For pinned, reproducibility-grade access:
{
"mcpServers": {
"uniprot": {
"command": "uniprot-mcp",
"args": ["--pin-release=2026_01"]
}
}
}
For offline replay via the local provenance cache:
{
"mcpServers": {
"uniprot": {
"command": "uniprot-mcp",
"env": {
"UNIPROT_MCP_CACHE_DIR": "/absolute/path/to/cache"
}
}
}
}
Claude Code (CLI)
claude mcp add uniprot -- uniprot-mcp
Self-test (live UniProt smoke check)
uniprot-mcp --self-test
# [tools] registered: 41/41
# [live] P04637 -> TP53 OK
# [PASS]
Example workflows
1. Clinical-variant interpretation packet for TP53 R175H.
> What's at residue 175 of P04637? Is R175H a known variant? Pull
> the UniProt and ClinVar evidence and tell me how confident the
> AlphaFold model is at that residue.
→ uniprot_features_at_position("P04637", 175)
→ uniprot_lookup_variant("P04637", "R175H")
→ uniprot_resolve_clinvar("P04637", change="R175H")
→ uniprot_get_alphafold_confidence("P04637")
2. Drug-target dossier in one call.
> Give me a complete drug-target characterisation of human BRCA1.
→ uniprot_target_dossier("P38398")
# nine sections, two upstream calls (entry + FASTA), one tool call.
3. Sequence chemistry for buffer choice / expression-system selection.
> What's the molecular weight, pI, and hydrophobicity of human insulin?
→ uniprot_compute_properties("P01308")
# MW 11,981 Da, pI 4.93, ε₂₈₀ 24,980 M⁻¹·cm⁻¹ — pure Python on the FASTA.
4. Provenance round-trip — proving an answer is reproducible.
> [later, with the provenance footer from a prior session in hand]
> Verify the recorded provenance for P04637.
→ uniprot_provenance_verify(
url="https://rest.uniprot.org/uniprotkb/P04637",
release="2026_01",
response_sha256="0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e"
)
5. Air-gapped clinical workflow with sealed cache.
# Day 1, online: cache populates as queries run.
export UNIPROT_MCP_CACHE_DIR=~/sealed-cache
# … every uniprot-mcp tool call writes to ~/sealed-cache/<sha>.json
# Day N, offline: replay any prior answer.
> uniprot_replay_from_cache("https://rest.uniprot.org/uniprotkb/P04637")
Testing
| Layer | Path | What |
|---|---|---|
| Unit | tests/unit/ | Behaviour of every public function. |
| Property | tests/property/ | Hypothesis-driven invariants on regexes + query construction. |
| Contract | tests/contract/ | Manifest / pyproject / docs / incident-policy / benchmark drift prevention. |
| Client | tests/client/ | Retry / back-off / id-mapping polling against respx-mocked HTTP. |
| Integration | tests/integration/ | Live UniProt + AlphaFold; opt-in via --integration. |
| Benchmark | tests/benchmark/ | 30 SHA-256-committed prompts + reproducible verifier. |
446 offline + 42 live integration tests, all green (real counts via pytest --collect-only on commit 01ab7a8). Mypy (strict),
ruff (check + format), bandit (0 issues at any severity), pip-audit
(--strict, no known vulnerabilities) all clean. Mutation testing
(mutmut) gate ≥ 95 % kill, populated post-billing-reset.
# Fast, offline (CI on every push):
pytest tests/unit tests/property tests/client tests/contract -v
# Live UniProt (opt-in, nightly in CI):
pytest --integration tests/integration -v
# Lint / type-check / security / SCA:
ruff check . && ruff format --check . && mypy src/uniprot_mcp
bandit -r src/uniprot_mcp && pip-audit --strict
Architecture & threat model
- docs/THREAT_MODEL.md: twelve STRIDE-shaped threats, each receipt-anchored to a code path or commit SHA, plus the cross-origin allowlist policy (§T3b).
- docs/INCIDENT_POLICY.md + docs/POSTMORTEM_TEMPLATE.md + docs/INCIDENT_LOG.md: every nightly integration breakage triggers a postmortem entry.
- AUDIT.md: pre-1.0.1 professional audit, P0/P1 remediations recorded.
- docs/MERGE_PLAN.md: five-phase merge → tag → flip operational plan with rollback policy.
- docs/PENDING_V1.md: the binary punch list to v1.0.1.
- mkdocs.yml: Material-themed docs site, deployable to gh-pages via .github/workflows/docs.yml. Build locally with pip install -e ".[docs]" && mkdocs serve.
Citation
Cite via CITATION.cff (GitHub renders a "Cite this
repository" button). Always also cite the UniProt Consortium:
The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Research (2025). doi:10.1093/nar/gkae1010
License
Apache-2.0 — see LICENSE and NOTICE.
This project is the gateway layer of the Topologica
Bio MCP suite. Multi-source
orchestration and tamper-evident provenance ledgers live in the
companion topologica-bio repository under BUSL-1.1 (Change Date
2030-04-19, auto-reverts to Apache-2.0). uniprot-mcp itself is and
will remain permissively Apache-2.0.
Copyright © 2026 Santiago Maniches. TOPOLOGICA LLC.