Model Context Protocol server for the UniProt protein knowledgebase.

uniprot-mcp

License: Apache 2.0 · Python 3.11+ · MCP compatible · Provenance: SHA-256 + verify

A Model Context Protocol server for the UniProt protein knowledgebase with per-query provenance verification. 41 tools across 8 families. Apache-2.0. Every successful response carries a verifiable Provenance record — UniProt release, retrieval timestamp, resolved URL, and a SHA-256 of the canonical response body — that the agent (or a third party, a year later) can re-check with a single tool call: uniprot_provenance_verify.

The wedge: **per-response SHA-256 + verify primitive + release pinning + offline replay** is, to the best of my survey of public MCPs as of 2026-04-26, absent from every other bio-MCP server I could find (BioMCP, Augmented-Nature/UniProt-MCP, biothings-mcp, gget-mcp, and others). If you are a regulated-bio-pharma user who needs to prove, years later, that a UniProt-derived claim still holds, this is the mechanism. Comparison and citations: docs/COMPETITIVE_LANDSCAPE.md.

Author: Santiago Maniches · ORCID 0009-0005-6480-1987 · TOPOLOGICA LLC


For researchers — where to start

If you are a biomedical researcher visiting this repo, the highest-signal places to look are:

Resource What it gives you
examples/atlas/ 25 disease & target worked examples — TP53, BRCA1, CFTR, HTT, EGFR, BRAF, KRAS, TEM-1 β-lactamase, more — each linking the canonical UniProt accession to MONDO / OMIM / PharmGKB / ARO IDs and the relevant tool sequence. JSON-LD manifest at examples/atlas/atlas.json for machine consumption. Methodology (how compiled, what's verified, what's community-reviewable) at examples/atlas/METHODOLOGY.md.
examples/01..04.jsonl Full Claude-Desktop transcripts of clinical-variant interpretation (TP53 R175H), drug-target dossier (BRCA1), provenance verification a year later, pathogen drug-discovery (TEM-1).
tests/benchmark/ Pre-registered 30-prompt benchmark with SHA-256 commitments on main. The 2026-04-26 v1.1.0 run verified 30/30 against live UniProt — transcript at tests/benchmark/run-2026-04-26-v1.1.0/.
scripts/replicate.sh One-command verification that the published PyPI wheel was built from this exact repo (cross-checks SHA-256 across PyPI / GitHub Release / SLSA attestation; runs --self-test; re-runs the benchmark live). POSIX + scripts/replicate.ps1 for Windows.
docs/COMPETITIVE_LANDSCAPE.md Honest 14-server survey of the bio-MCP space (April 2026) and the specific differentiation this server claims.

Issues / corrections welcome at https://github.com/smaniches/uniprot-mcp/issues. The atlas in particular is community-reviewable — see METHODOLOGY.md for what is machine-verified vs what needs human review.


What makes this different

| | uniprot-mcp | Vanilla LLM + WebFetch | A typical bio-MCP |
|---|---|---|---|
| Tool surface | 41 tools, 8 families | none (caller writes URLs) | usually 5–10 |
| Provenance on every response | release • date • URL • SHA-256 | none | sometimes URL only |
| Per-query auditability | uniprot_provenance_verify re-checks any prior response | not possible | not possible |
| Release pinning | --pin-release=YYYY_MM raises on drift | n/a | n/a |
| Pre-registered benchmark | 30 prompts, SHA-256 committed on main + reproducible verifier | n/a | n/a |
| Local provenance cache | offline replay via UNIPROT_MCP_CACHE_DIR | n/a | n/a |
| Clinical primitives | sequence chemistry / position-aware features / HGVS variant lookup / disease associations / AlphaFold pLDDT / ClinVar | none | none |
| Composition tool | uniprot_target_dossier: one call, nine sections | n/a | n/a |
| Input validation | regex + length cap before any HTTP call | none | partial |
| Error-channel safety | upstream exception text never echoed to LLM | n/a | partial |
| Cross-origin allowlist | enumerated, threat-modelled, privacy-listed | n/a | usually unaudited |
| Supply chain | SLSA build provenance + Sigstore + CycloneDX SBOM (post-flip) | n/a | rare |
| Test layers | unit + property + contract + client + integration + benchmark | n/a | usually unit only |
| Mutation testing | weekly + on-demand workflow; baseline measurement on v1.1.0; ≥ 95 % gate planned post-baseline | n/a | rare |

The provenance + verify chain is, in my 2026-04-26 survey, absent from every other bio-MCP I could find. A regulated user can take any prior uniprot-mcp answer and prove — without contacting the author — that UniProt still returns the same bytes, or detect exactly how the upstream has drifted. If you find a counter-example I missed, please file an issue and I will update the comparison.


Tools (41)

Eight endpoint families. All read-only (readOnlyHint: true). All but uniprot_replay_from_cache interact with at least one upstream service (openWorldHint: true). No UniProt API key required.

Core UniProtKB (10)

Tool Purpose
uniprot_get_entry Full UniProt entry (e.g. P04637 for p53). Function, gene, organism, disease, cross-refs.
uniprot_search UniProt query language — gene, organism, taxon ID, reviewed flag, free text.
uniprot_get_sequence FASTA. PIR-style provenance comment block above the first record (BLAST+ / biopython compatible).
uniprot_get_features Domains, binding sites, PTMs, signal peptides — optional type filter.
uniprot_get_variants Natural variants and disease mutations.
uniprot_get_go_terms GO annotations grouped by aspect (F / P / C).
uniprot_get_cross_refs Raw cross-references to PDB, Pfam, Ensembl, Reactome, KEGG, STRING …
uniprot_id_mapping Map IDs between databases (Gene_Name → UniProtKB, PDB → UniProtKB, …).
uniprot_batch_entries Up to 100 entries in one call; invalid accessions filtered client-side.
uniprot_taxonomy_search Search UniProt taxonomy by organism name.
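
The client-side filtering mentioned for uniprot_batch_entries can be pictured with the sketch below. It is not taken from this project's source: the pattern is the accession regular expression that UniProt publishes in its own help pages, and the function name, the upper-casing step, and the decision to raise when the cap is exceeded are all assumptions made for illustration.

```python
import re

# Accession pattern as published by UniProt (6- or 10-character accessions).
ACCESSION_RE = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)
MAX_BATCH = 100  # "Up to 100 entries in one call"


def filter_accessions(raw):
    """Hypothetical helper: split input into (valid, rejected) before any HTTP call."""
    valid, rejected = [], []
    for item in raw:
        candidate = item.strip().upper()  # assumption: input is normalised first
        if ACCESSION_RE.fullmatch(candidate):
            valid.append(candidate)
        else:
            rejected.append(item)
    if len(valid) > MAX_BATCH:
        # Assumption: over-long batches are refused rather than truncated.
        raise ValueError(f"too many accessions: {len(valid)} > {MAX_BATCH}")
    return valid, rejected
```

Calling filter_accessions(["P04637", "not-an-id"]) would keep the first item and report the second as rejected, so nothing malformed reaches the network layer.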

Controlled vocabularies (4)

Tool Purpose
uniprot_get_keyword Keyword by ID (e.g. KW-0007 = Acetylation). Definition, synonyms, GO refs, hierarchy.
uniprot_search_keywords Free-text keyword search.
uniprot_get_subcellular_location Subcellular-location term by ID (e.g. SL-0039 = Cell membrane).
uniprot_search_subcellular_locations Free-text location search.

Sequence archives & clusters (4)

Tool Purpose
uniprot_get_uniref UniRef cluster by ID (UniRef50_P04637, UniRef90_P04637, UniRef100_P04637).
uniprot_search_uniref Cluster search with identity_tier filter (50 / 90 / 100).
uniprot_get_uniparc Sequence-archive record by UPI (UPI000002ED67).
uniprot_search_uniparc UniParc full-text search.

Proteomes & literature (4)

Tool Purpose
uniprot_get_proteome Proteome by UP ID (UP000005640 = human). Counts, BUSCO score, components.
uniprot_search_proteomes Filter by organism / type / completeness.
uniprot_get_citation Citation record by ID (typically a PubMed numeric ID).
uniprot_search_citations Index search across UniProt citations.

Structured cross-DB resolvers (4)

Gateway-only — no calls leave the UniProt origin. These extract the relevant cross-references from a UniProt entry and return structured records (typed lists / objects, not passthrough strings).

Tool Purpose
uniprot_resolve_pdb PDB structures: id + method + resolution + chain coverage.
uniprot_resolve_alphafold AlphaFold model id + EBI viewer URL (model id only — for pLDDT call the dedicated tool below).
uniprot_resolve_interpro InterPro signatures: id + entry name.
uniprot_resolve_chembl ChEMBL drug-target id + EBI target-card URL.

Biomedical features (7)

Pure-Python compositions over the entry — no extra origin. The first four answer per-residue and per-variant questions; the last three are the v1.1.0 expansion targeting drug discovery, therapeutic-protein engineering, and pathogen-secretion analysis: each is a filter over the entry's features array, with a structured grouping by feature type and an honest empty-set advisory.

Tool Purpose
uniprot_compute_properties Derived sequence chemistry from the FASTA: MW / pI / GRAVY / aromaticity / charge / ε₂₈₀.
uniprot_features_at_position Every feature overlapping a residue position. Critical for variant-effect interpretation.
uniprot_lookup_variant HGVS-shorthand match (R175H, V600E, R248*) against UniProt's natural-variant features.
uniprot_get_disease_associations Structured disease records from DISEASE-type comments: name + acronym + UniProt disease ID + OMIM cross-ref + description.
uniprot_get_active_sites Catalytic and ligand-binding residues: active sites, binding sites, sites, metal binding, DNA binding. The residue-level chemistry of the protein.
uniprot_get_processing_features Maturation features: signal peptide, propeptide, transit peptide, initiator methionine, chain, peptide. Essential for therapeutic-protein engineering and pathogen-secretion analysis.
uniprot_get_ptms Post-translational modifications: modified residues (phospho/acetyl/methyl), glycosylation, lipidation (GPI/prenyl/palmitoyl), disulfide bonds, cross-links.
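
As a rough illustration of the per-residue and per-variant lookups in this family, the sketch below parses a protein-level shorthand such as R175H and selects the features that overlap a position. The Feature class, both function names, and the example coordinates used to exercise it are hypothetical; the real tools operate on the UniProt entry's features array, whose exact shape is not reproduced here.

```python
import re
from dataclasses import dataclass

# Reference residue, 1-based position, then a residue or "*" (stop codon).
VARIANT_RE = re.compile(r"([A-Z])([1-9][0-9]*)([A-Z*])")


@dataclass(frozen=True)
class Feature:
    """Hypothetical, simplified stand-in for one UniProt feature."""
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_variant(shorthand):
    """Split 'R175H' into ('R', 175, 'H'); raise on anything else."""
    match = VARIANT_RE.fullmatch(shorthand.strip().upper())
    if match is None:
        raise ValueError(f"not a variant shorthand: {shorthand!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def features_at(features, position):
    """Return every feature whose inclusive range covers the position."""
    return [f for f in features if f.start <= position <= f.end]
```

A variant lookup then reduces to parsing the shorthand and checking which variant-type features cover the parsed position.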

Cross-origin enrichment (3)

The first two tools below are the only ones that consult origins outside rest.uniprot.org. Each is documented in PRIVACY.md and in the threat model.

| Tool | Origin | Purpose |
|---|---|---|
| uniprot_get_alphafold_confidence | alphafold.ebi.ac.uk | pLDDT mean + four-band distribution; lets the agent decide whether to trust the model. |
| uniprot_resolve_clinvar | eutils.ncbi.nlm.nih.gov | ClinVar significance + condition + review status by gene + optional HGVS shorthand. |
| uniprot_get_publications | rest.uniprot.org | Pure-Python over the entry's references; listed here because it complements the cross-origin enrichment. |
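
The "pLDDT mean + four-band distribution" summary can be sketched as follows. The thresholds (90 / 70 / 50) are the ones the AlphaFold database uses for its confidence colouring; the README does not state which thresholds the tool applies, so treat them, the band names, and the handling of exact boundary values as assumptions.

```python
def plddt_summary(scores):
    """Mean pLDDT plus the fraction of residues in each confidence band."""
    if not scores:
        raise ValueError("no pLDDT scores supplied")
    counts = {"very_high": 0, "confident": 0, "low": 0, "very_low": 0}
    for score in scores:
        if score > 90:
            counts["very_high"] += 1
        elif score > 70:
            counts["confident"] += 1
        elif score > 50:
            counts["low"] += 1
        else:
            counts["very_low"] += 1
    total = len(scores)
    return {
        "mean": sum(scores) / total,
        "fractions": {band: n / total for band, n in counts.items()},
    }
```

An agent could use the very_low fraction as a quick signal that large parts of the model are likely disordered or poorly predicted.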

Composition + provenance (5)

Tool Purpose
uniprot_resolve_orthology Group orthology cross-references by source DB (KEGG / OMA / OrthoDB / eggNOG / 8 more).
uniprot_get_evidence_summary Aggregate ECO codes (Evidence and Conclusion Ontology) across an entry. Distinguishes wet-lab confirmed from inferred-by-similarity from automatic.
uniprot_target_dossier One-call comprehensive characterisation: nine sections — identity / function / chemistry / structure / drug-target / disease / variants / functional annotations / cross-refs.
uniprot_provenance_verify Re-fetch a previously recorded URL and compare release tag + canonical response SHA-256. Five verdicts (verified, release_drift, hash_drift, release_and_hash_drift, url_unreachable) each with an advice string.
uniprot_replay_from_cache Read a cached UniProt response without hitting the upstream. Opt-in via UNIPROT_MCP_CACHE_DIR.

Provenance & verification

Every successful tool response includes a footer like:

---
_Source: UniProt release 2026_01 (28-January-2026) • Retrieved 2026-04-25T17:09:00Z_
_Query: https://rest.uniprot.org/uniprotkb/P04637_
_SHA-256: 0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e_
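
For archiving, a footer in this shape can be split into its fields mechanically. The sketch below assumes the footer layout is exactly as shown above; the regular expression and the parse_footer name are illustrative and not part of the package.

```python
import re

FOOTER = """---
_Source: UniProt release 2026_01 (28-January-2026) • Retrieved 2026-04-25T17:09:00Z_
_Query: https://rest.uniprot.org/uniprotkb/P04637_
_SHA-256: 0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e_
"""

# Assumed layout: three italic lines (underscore-delimited) in a fixed order.
FOOTER_RE = re.compile(
    r"_Source: UniProt release (?P<release>\d{4}_\d{2}) \((?P<release_date>[^)]+)\)"
    r" • Retrieved (?P<retrieved>\S+?)_\s*"
    r"_Query: (?P<url>\S+?)_\s*"
    r"_SHA-256: (?P<sha256>[0-9a-f]{64})_"
)


def parse_footer(text):
    """Return the provenance fields as a dict, or raise if no footer is found."""
    match = FOOTER_RE.search(text)
    if match is None:
        raise ValueError("no provenance footer found")
    return match.groupdict()
```

The url, release, and sha256 fields correspond to the three arguments that uniprot_provenance_verify takes.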

A year later, an auditor can call uniprot_provenance_verify with those exact fields:

> uniprot_provenance_verify(
    url="https://rest.uniprot.org/uniprotkb/P04637",
    release="2026_01",
    response_sha256="0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e"
  )

## Provenance Verification

**Status:** verified

**URL:** https://rest.uniprot.org/uniprotkb/P04637
- ✓ URL resolves (HTTP 200)
- ✓ Release: recorded '2026_01', current '2026_01'
- ✓ Response SHA-256: recorded 0040d79bb39e2f73…, current 0040d79bb39e2f73…

**Advice:** Both checks passed. The recorded provenance is reproducible against the live UniProt API.

If UniProt has moved on, the tool tells you exactly how:

| Verdict | Meaning | Advice |
|---|---|---|
| verified | Both release and hash match | The provenance is reproducible |
| release_drift | UniProt released a new version | Pin via the FTP snapshot if you need the historical answer |
| hash_drift | Same release, body changed | An in-release edit; investigate or re-fetch |
| release_and_hash_drift | Both moved on | Use a release-specific FTP snapshot |
| url_unreachable | Endpoint dropped or rate-limited | Retry or report to UniProt |
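
The five verdicts follow from two comparisons plus a reachability check. The sketch below shows one way that decision could be written; the function name and signature are hypothetical, and the actual tool may order or phrase its checks differently.

```python
def classify(recorded_release, recorded_sha256,
             current_release=None, current_sha256=None, reachable=True):
    """Map the release and hash comparisons onto one of the five verdict names."""
    if not reachable or current_release is None or current_sha256 is None:
        return "url_unreachable"
    release_ok = recorded_release == current_release
    # Hex digests are compared case-insensitively (an assumption of this sketch).
    hash_ok = recorded_sha256.lower() == current_sha256.lower()
    if release_ok and hash_ok:
        return "verified"
    if release_ok:
        return "hash_drift"       # same release, different body
    if hash_ok:
        return "release_drift"    # new release, identical body
    return "release_and_hash_drift"
```

Keeping the classification separate from the HTTP fetch makes it easy to unit-test each verdict without touching the network.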

For strict reproducibility, opt into release pinning:

export UNIPROT_PIN_RELEASE=2026_01
uniprot-mcp
# every response is checked against the pinned release;
# any drift raises `ReleaseMismatchError`, which the server surfaces
# as an agent-actionable error envelope.
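
Conceptually, the pin is a single equality check applied to every response. The sketch below reuses the ReleaseMismatchError name and the UNIPROT_PIN_RELEASE variable from the text above, but the class body and the check_pin helper are stand-ins written for illustration, not the server's code.

```python
import os


class ReleaseMismatchError(RuntimeError):
    """Stand-in for the error named above; the real class may differ."""


def check_pin(observed_release, env=None):
    """Raise if a pin is configured and the observed release differs from it."""
    env = os.environ if env is None else env
    pinned = env.get("UNIPROT_PIN_RELEASE")
    if pinned and observed_release != pinned:
        raise ReleaseMismatchError(
            f"pinned release {pinned!r} but upstream reports {observed_release!r}"
        )
    return observed_release
```

With no pin set the check is a no-op, which matches pinning being opt-in.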

For offline replay (post-cache-population):

export UNIPROT_MCP_CACHE_DIR=~/.uniprot-mcp-cache
uniprot-mcp
# every successful response is mirrored to disk; later replay via
# uniprot_replay_from_cache(url) without touching the upstream.
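
A minimal sketch of the mirror-then-replay idea follows. The on-disk layout is an assumption: the README only says that responses are mirrored to the cache directory, so keying files by the SHA-256 of the URL, the JSON record shape, and the function names are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_path(cache_dir, url):
    """Assumed layout: one JSON file per URL, named by the URL's SHA-256."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.json"


def store(cache_dir, url, body):
    """Mirror a response body (bytes) to disk together with its hash."""
    path = cache_path(cache_dir, url)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "url": url,
        "sha256": hashlib.sha256(body).hexdigest(),
        "body": body.decode("utf-8"),
    }
    path.write_text(json.dumps(record), encoding="utf-8")
    return path


def replay(cache_dir, url):
    """Return the cached body without any network access, re-checking its hash."""
    path = cache_path(cache_dir, url)
    if not path.exists():
        raise KeyError(url)
    record = json.loads(path.read_text(encoding="utf-8"))
    body = record["body"].encode("utf-8")
    if hashlib.sha256(body).hexdigest() != record["sha256"]:
        raise ValueError("cached body does not match its recorded SHA-256")
    return body


def demo():
    """Round-trip a placeholder body through a temporary cache directory."""
    with tempfile.TemporaryDirectory() as tmp:
        url = "https://rest.uniprot.org/uniprotkb/P04637"
        store(tmp, url, b'{"primaryAccession": "P04637"}')
        return replay(tmp, url)
```

Re-hashing on replay means a cache file edited after the fact is detected rather than silently served.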

A live end-to-end demonstration is committed at tests/benchmark/run-2026-04-25-roundtrip/transcript.md — real values, real verdicts, no mocks.


Pre-registered benchmark

tests/benchmark/ ships a 30-prompt evaluation (Tier A / B / C × 10) with SHA-256-committed expected answers on main. The plaintext expected.jsonl is held local-only until a benchmark run is published; the cryptographic commitments mean the author cannot rewrite "correct" answers post-hoc.

A reviewer can re-derive every Tier A and B answer from primary-source UniProt REST in two commands:

python tests/benchmark/verify_answers.py tests/benchmark/expected.jsonl
# OK: all 30 prompts verified against https://rest.uniprot.org

python tests/benchmark/verify.py tests/benchmark/expected.jsonl tests/benchmark/expected.hashes.jsonl
# OK: 30 commitments verified
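
The commitment idea itself is small: publish a hash of each expected answer first, reveal the plaintext later, and let anyone recompute. The sketch below is a generic illustration and not the logic of verify.py; the record fields, the canonical-JSON encoding, and the function names are assumptions. A production scheme might also add a per-record salt so that short answers cannot be guessed from their hashes.

```python
import hashlib
import json


def commit(record):
    """Hash a record's canonical JSON form (sorted keys, compact separators)."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_commitments(records, commitments):
    """Return the ids whose recomputed hash differs from the published one."""
    return [r["id"] for r in records if commitments.get(r["id"]) != commit(r)]
```

An empty list from verify_commitments means every revealed answer matches what was committed before the run.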

See tests/benchmark/AUDIT.md for the per-prompt source attribution and the formal independence statement (uniprot-mcp was not used during answer authoring).


Install

pip install uniprot-mcp-server   # PyPI distribution
# or, for an isolated install (append ==<version> to the package name to pin it):
uvx --from uniprot-mcp-server uniprot-mcp

Why three different names? This is a common Python packaging pattern: PyPI's namespace is global, so name collisions force disambiguation.

| Concept | Value | What it is |
|---|---|---|
| GitHub repository | smaniches/uniprot-mcp | source code + issue tracker |
| PyPI distribution | uniprot-mcp-server | what you pip install (the bare uniprot-mcp name was already claimed on PyPI when this project published) |
| Python module | uniprot_mcp | what you import (PEP-8 underscore form) |
| Console script + MCP server identity | uniprot-mcp | what you run from the shell and what Claude Desktop sees |

Cross-checks that prove the wheel you installed was built from this repo: each release ships a Sigstore signature, SLSA build provenance, and a CycloneDX SBOM, all attached to the v1.1.0 GitHub Release. Run bash scripts/replicate.sh (POSIX) or pwsh scripts/replicate.ps1 (Windows) to verify the full chain end-to-end. Common precedents for the same one-thing-three-names pattern: pillow/PIL, python-dateutil/dateutil, beautifulsoup4/bs4, python-Levenshtein/Levenshtein.

From source:

git clone https://github.com/smaniches/uniprot-mcp.git
cd uniprot-mcp
pip install -e .

Claude Desktop

claude_desktop_config.json:

{
  "mcpServers": {
    "uniprot": {
      "command": "uniprot-mcp"
    }
  }
}

For pinned, reproducibility-grade access:

{
  "mcpServers": {
    "uniprot": {
      "command": "uniprot-mcp",
      "args": ["--pin-release=2026_01"]
    }
  }
}

For offline replay via the local provenance cache:

{
  "mcpServers": {
    "uniprot": {
      "command": "uniprot-mcp",
      "env": {
        "UNIPROT_MCP_CACHE_DIR": "/absolute/path/to/cache"
      }
    }
  }
}

Claude Code (CLI)

claude mcp add uniprot -- uniprot-mcp

Self-test (live UniProt smoke check)

uniprot-mcp --self-test
# [tools] registered: 41/41
# [live] P04637 -> TP53 OK
# [PASS]

Example workflows

1. Clinical-variant interpretation packet for TP53 R175H.

> What's at residue 175 of P04637? Is R175H a known variant? Pull
> the UniProt and ClinVar evidence and tell me how confident the
> AlphaFold model is at that residue.
→ uniprot_features_at_position("P04637", 175)
→ uniprot_lookup_variant("P04637", "R175H")
→ uniprot_resolve_clinvar("P04637", change="R175H")
→ uniprot_get_alphafold_confidence("P04637")

2. Drug-target dossier in one call.

> Give me a complete drug-target characterisation of human BRCA1.
→ uniprot_target_dossier("P38398")
   # nine sections, two upstream calls (entry + FASTA), one tool call.

3. Sequence chemistry for buffer choice / expression-system selection.

> What's the molecular weight, pI, and hydrophobicity of human insulin?
→ uniprot_compute_properties("P01308")
   # MW 11,981 Da, pI 4.93, ε₂₈₀ 24,980 M⁻¹·cm⁻¹ — pure Python on the FASTA.
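
Two of the derived properties are simple enough to sketch from their textbook definitions: GRAVY is the mean Kyte-Doolittle hydropathy, and ε₂₈₀ is commonly estimated as 5500 per Trp, 1490 per Tyr, and 125 per cystine. Whether uniprot_compute_properties uses exactly these conventions is not stated in this README, so the code below is an independent illustration and is not claimed to reproduce the figures quoted above.

```python
# Kyte-Doolittle hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy; raises KeyError on non-standard residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def extinction_280(seq, cystines=True):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    With cystines=True, every pair of cysteines is assumed to form a disulfide.
    """
    seq = seq.upper()
    eps = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if cystines:
        eps += 125 * (seq.count("C") // 2)
    return eps
```

Both functions work on the plain sequence string, so they need nothing beyond the FASTA body.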

4. Provenance round-trip — proving an answer is reproducible.

> [later, with the provenance footer from a prior session in hand]
> Verify the recorded provenance for P04637.
→ uniprot_provenance_verify(
    url="https://rest.uniprot.org/uniprotkb/P04637",
    release="2026_01",
    response_sha256="0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e"
  )

5. Air-gapped clinical workflow with sealed cache.

# Day 1, online: cache populates as queries run.
export UNIPROT_MCP_CACHE_DIR=~/sealed-cache
# … every uniprot-mcp tool call writes to ~/sealed-cache/<sha>.json
# Day N, offline: replay any prior answer.
> uniprot_replay_from_cache("https://rest.uniprot.org/uniprotkb/P04637")

Testing

| Layer | Path | What |
|---|---|---|
| Unit | tests/unit/ | Behaviour of every public function. |
| Property | tests/property/ | Hypothesis-driven invariants on regexes + query construction. |
| Contract | tests/contract/ | Manifest / pyproject / docs / incident-policy / benchmark drift prevention. |
| Client | tests/client/ | Retry / back-off / id-mapping polling against respx-mocked HTTP. |
| Integration | tests/integration/ | Live UniProt + AlphaFold; opt-in via --integration. |
| Benchmark | tests/benchmark/ | 30 SHA-256-committed prompts + reproducible verifier. |

446 offline + 42 live integration tests, all green (real counts via pytest --collect-only on commit 01ab7a8). Mypy (strict), ruff (check + format), bandit (0 issues at any severity), and pip-audit (--strict, no known vulnerabilities) are all clean. The mutation-testing (mutmut) gate of ≥ 95 % kill rate will be populated post-billing-reset.

# Fast, offline (CI on every push):
pytest tests/unit tests/property tests/client tests/contract -v

# Live UniProt (opt-in, nightly in CI):
pytest --integration tests/integration -v

# Lint / type-check / security / SCA:
ruff check . && ruff format --check . && mypy src/uniprot_mcp
bandit -r src/uniprot_mcp && pip-audit --strict

Architecture & threat model


Citation

Cite via CITATION.cff (GitHub renders a "Cite this repository" button). Always also cite the UniProt Consortium:

The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Research (2025). doi:10.1093/nar/gkae1010


License

Apache-2.0 — see LICENSE and NOTICE.

This project is the gateway layer of the Topologica Bio MCP suite. Multi-source orchestration and tamper-evident provenance ledgers live in the companion topologica-bio repository under BUSL-1.1 (Change Date 2030-04-19, auto-reverts to Apache-2.0). uniprot-mcp itself is and will remain permissively Apache-2.0.

Copyright © 2026 Santiago Maniches. TOPOLOGICA LLC.
