Model Context Protocol server for the UniProt protein knowledgebase.

uniprot-mcp

CI · License: Apache 2.0 · Python 3.11+ · MCP compatible · Tests · Provenance: SHA-256 + verify · ORCID

A reference-quality Model Context Protocol server for the UniProt protein knowledgebase. 41 tools across 8 families. Every successful response carries a verifiable Provenance record — UniProt release, retrieval timestamp, resolved URL, and a SHA-256 of the canonical response body — that the agent (or a third party, a year later) can re-check with a single tool call: uniprot_provenance_verify.

Author: Santiago Maniches · ORCID 0009-0005-6480-1987 · TOPOLOGICA LLC


What makes this different

| | uniprot-mcp | Vanilla LLM + WebFetch | A typical bio-MCP |
|---|---|---|---|
| Tool surface | 41 tools, 8 families | none (caller writes URLs) | usually 5–10 |
| Provenance on every response | release • date • URL • SHA-256 | none | sometimes URL only |
| Per-query auditability | uniprot_provenance_verify re-checks any prior response | not possible | not possible |
| Release pinning | --pin-release=YYYY_MM raises on drift | n/a | n/a |
| Pre-registered benchmark | 30 prompts, SHA-256 committed on main + reproducible verifier | n/a | n/a |
| Local provenance cache | offline replay via UNIPROT_MCP_CACHE_DIR | n/a | n/a |
| Clinical primitives | sequence chemistry / position-aware features / HGVS variant lookup / disease associations / AlphaFold pLDDT / ClinVar | none | none |
| Composition tool | uniprot_target_dossier: one call, nine sections | n/a | n/a |
| Input validation | regex + length cap before any HTTP call | none | partial |
| Error-channel safety | upstream exception text never echoed to LLM | n/a | partial |
| Cross-origin allowlist | enumerated, threat-modelled, privacy-listed | n/a | usually unaudited |
| Supply chain | SLSA build provenance + Sigstore + CycloneDX SBOM (post-flip) | n/a | rare |
| Test layers | unit + property + contract + client + integration + benchmark | n/a | usually unit only |
| Mutation testing | target ≥ 95 % kill (gated, not aspirational) | n/a | rare |

The provenance-and-verify chain is the one feature that no other server in the bio-MCP space currently offers. A regulated user can take any prior uniprot-mcp answer and prove, without contacting the author, that UniProt still returns the same bytes, or detect exactly how the upstream has drifted.


Tools (41)

Eight endpoint families. All read-only (readOnlyHint: true). All but uniprot_replay_from_cache interact with at least one upstream service (openWorldHint: true). No UniProt API key required.

Core UniProtKB (10)

Tool Purpose
uniprot_get_entry Full UniProt entry (e.g. P04637 for p53). Function, gene, organism, disease, cross-refs.
uniprot_search UniProt query language — gene, organism, taxon ID, reviewed flag, free text.
uniprot_get_sequence FASTA. PIR-style provenance comment block above the first record (BLAST+ / biopython compatible).
uniprot_get_features Domains, binding sites, PTMs, signal peptides — optional type filter.
uniprot_get_variants Natural variants and disease mutations.
uniprot_get_go_terms GO annotations grouped by aspect (F / P / C).
uniprot_get_cross_refs Raw cross-references to PDB, Pfam, Ensembl, Reactome, KEGG, STRING …
uniprot_id_mapping Map IDs between databases (Gene_Name → UniProtKB, PDB → UniProtKB, …).
uniprot_batch_entries Up to 100 entries in one call; invalid accessions filtered client-side.
uniprot_taxonomy_search Search UniProt taxonomy by organism name.
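
The client-side filtering mentioned for uniprot_batch_entries can be pictured with a short sketch. It uses the accession pattern published in the UniProt help pages; the function name, the case normalisation, and the decision to raise on oversized batches are assumptions for illustration, not the server's actual code.

```python
import re

# Accession pattern as published in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
MAX_BATCH = 100  # batch cap stated for uniprot_batch_entries


def split_accessions(raw: list[str]) -> tuple[list[str], list[str]]:
    """Return (valid, rejected); hypothetical helper, not the server's API."""
    valid: list[str] = []
    rejected: list[str] = []
    for item in raw:
        candidate = item.strip().upper()  # normalisation is an assumption
        if ACCESSION_RE.fullmatch(candidate):
            valid.append(candidate)
        else:
            rejected.append(item)
    if len(valid) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} accessions per call")
    return valid, rejected
```

Validating before any HTTP call keeps malformed input from ever reaching the upstream, which is the property the comparison table lists under input validation.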

Controlled vocabularies (4)

Tool Purpose
uniprot_get_keyword Keyword by ID (e.g. KW-0007 = Acetylation). Definition, synonyms, GO refs, hierarchy.
uniprot_search_keywords Free-text keyword search.
uniprot_get_subcellular_location Subcellular-location term by ID (e.g. SL-0039 = Cell membrane).
uniprot_search_subcellular_locations Free-text location search.

Sequence archives & clusters (4)

Tool Purpose
uniprot_get_uniref UniRef cluster by ID (UniRef50_P04637, UniRef90_P04637, UniRef100_P04637).
uniprot_search_uniref Cluster search with identity_tier filter (50 / 90 / 100).
uniprot_get_uniparc Sequence-archive record by UPI (UPI000002ED67).
uniprot_search_uniparc UniParc full-text search.

Proteomes & literature (4)

Tool Purpose
uniprot_get_proteome Proteome by UP ID (UP000005640 = human). Counts, BUSCO score, components.
uniprot_search_proteomes Filter by organism / type / completeness.
uniprot_get_citation Citation record by ID (typically a PubMed numeric ID).
uniprot_search_citations Index search across UniProt citations.

Structured cross-DB resolvers (4)

Gateway-only — no calls leave the UniProt origin. These extract the relevant cross-references from a UniProt entry and return structured records (typed lists / objects, not passthrough strings).

Tool Purpose
uniprot_resolve_pdb PDB structures: id + method + resolution + chain coverage.
uniprot_resolve_alphafold AlphaFold model id + EBI viewer URL (model id only — for pLDDT call the dedicated tool below).
uniprot_resolve_interpro InterPro signatures: id + entry name.
uniprot_resolve_chembl ChEMBL drug-target id + EBI target-card URL.

Biomedical features (7)

Pure-Python compositions over the entry; no extra origin is contacted. The first four answer per-residue and per-variant questions. The last three are the v1.1.0 expansion, aimed at drug discovery, therapeutic-protein engineering, and pathogen-secretion analysis: each filters the entry's features array, groups the matches by feature type, and returns an explicit advisory when nothing matches.

Tool Purpose
uniprot_compute_properties Derived sequence chemistry from the FASTA: MW / pI / GRAVY / aromaticity / charge / ε₂₈₀.
uniprot_features_at_position Every feature overlapping a residue position. Critical for variant-effect interpretation.
uniprot_lookup_variant HGVS-shorthand match (R175H, V600E, R248*) against UniProt's natural-variant features.
uniprot_get_disease_associations Structured disease records from DISEASE-type comments: name + acronym + UniProt disease ID + OMIM cross-ref + description.
uniprot_get_active_sites Catalytic and ligand-binding residues: active sites, binding sites, sites, metal binding, DNA binding. The residue-level chemistry of the protein.
uniprot_get_processing_features Maturation features: signal peptide, propeptide, transit peptide, initiator methionine, chain, peptide. Essential for therapeutic-protein engineering and pathogen-secretion analysis.
uniprot_get_ptms Post-translational modifications: modified residues (phospho/acetyl/methyl), glycosylation, lipidation (GPI/prenyl/palmitoyl), disulfide bonds, cross-links.
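
To make "derived sequence chemistry" concrete, here is a minimal sketch of three of the quantities uniprot_compute_properties reports. It uses the standard Kyte-Doolittle hydropathy scale and the Pace et al. coefficients for ε₂₈₀ (5500 per Trp, 1490 per Tyr, 125 per cystine). The function names are hypothetical, and whether the tool counts cystines or uses the same conventions is an assumption.

```python
# Kyte-Doolittle hydropathy value per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(KD[aa] for aa in seq) / len(seq)


def aromaticity(seq: str) -> float:
    """Fraction of residues that are Phe, Trp, or Tyr."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


def extinction_280(seq: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm, in M^-1 cm^-1."""
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines
```

All three need only the sequence string, which is why this family of tools can run on the FASTA without contacting another origin.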

Cross-origin enrichment (3)

uniprot_get_alphafold_confidence and uniprot_resolve_clinvar are the only tools that consult origins outside rest.uniprot.org. Each is documented in PRIVACY.md and in the threat model.

| Tool | Origin | Purpose |
|---|---|---|
| uniprot_get_alphafold_confidence | alphafold.ebi.ac.uk | pLDDT mean + four-band distribution; lets the agent decide whether to trust the model. |
| uniprot_resolve_clinvar | eutils.ncbi.nlm.nih.gov | ClinVar significance + condition + review status by gene + optional HGVS shorthand. |
| uniprot_get_publications | rest.uniprot.org | Pure-Python over the entry's references; listed here because it complements the cross-origin enrichment. |
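
The "pLDDT mean + four-band distribution" can be sketched as follows, using the conventional AlphaFold confidence bands (above 90, 70 to 90, 50 to 70, below 50). The function name, the band labels, and the handling of scores that fall exactly on a boundary are assumptions, not the tool's documented behaviour.

```python
from statistics import fmean


def summarise_plddt(scores: list[float]) -> dict:
    """Mean pLDDT plus the fraction of residues in each confidence band."""
    if not scores:
        raise ValueError("no per-residue scores")
    counts = {"very_high": 0, "confident": 0, "low": 0, "very_low": 0}
    for score in scores:
        if score > 90:
            counts["very_high"] += 1
        elif score > 70:
            counts["confident"] += 1
        elif score > 50:
            counts["low"] += 1
        else:
            counts["very_low"] += 1
    total = len(scores)
    return {
        "mean": fmean(scores),
        "fractions": {band: n / total for band, n in counts.items()},
    }
```

A high mean can hide a disordered region, so the band fractions give an agent a quick way to see how much of the model is unreliable.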

Composition + provenance (5)

Tool Purpose
uniprot_resolve_orthology Group orthology cross-references by source DB (KEGG / OMA / OrthoDB / eggNOG / 8 more).
uniprot_get_evidence_summary Aggregate ECO codes (Evidence and Conclusion Ontology) across an entry. Distinguishes wet-lab confirmed from inferred-by-similarity from automatic.
uniprot_target_dossier One-call comprehensive characterisation: nine sections — identity / function / chemistry / structure / drug-target / disease / variants / functional annotations / cross-refs.
uniprot_provenance_verify Re-fetch a previously recorded URL and compare release tag + canonical response SHA-256. Five verdicts (verified, release_drift, hash_drift, release_and_hash_drift, url_unreachable) each with an advice string.
uniprot_replay_from_cache Read a cached UniProt response without hitting the upstream. Opt-in via UNIPROT_MCP_CACHE_DIR.

Provenance & verification

Every successful tool response includes a footer like:

---
_Source: UniProt release 2026_01 (28-January-2026) • Retrieved 2026-04-25T17:09:00Z_
_Query: https://rest.uniprot.org/uniprotkb/P04637_
_SHA-256: 0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e_
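
Because the footer has a fixed shape, a downstream audit script could extract its fields mechanically. The sketch below assumes exactly the three-line layout shown above; the regular expression and the function name are illustrative, not part of uniprot-mcp.

```python
import re

# Matches the three footer lines shown above; the layout is assumed fixed.
FOOTER_RE = re.compile(
    r"_Source: UniProt release (?P<release>\d{4}_\d{2}) "
    r"\((?P<release_date>[^)]+)\) • Retrieved (?P<retrieved>\S+?)_\s*\n"
    r"_Query: (?P<url>\S+?)_\s*\n"
    r"_SHA-256: (?P<sha256>[0-9a-f]{64})_"
)


def parse_footer(text: str) -> dict[str, str]:
    """Extract release, release_date, retrieved, url, and sha256."""
    match = FOOTER_RE.search(text)
    if match is None:
        raise ValueError("no provenance footer found")
    return match.groupdict()
```

The url, release, and sha256 fields are the three values that uniprot_provenance_verify takes as arguments.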

A year later, an auditor can call uniprot_provenance_verify with those exact fields:

> uniprot_provenance_verify(
    url="https://rest.uniprot.org/uniprotkb/P04637",
    release="2026_01",
    response_sha256="0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e"
  )

## Provenance Verification

**Status:** verified

**URL:** https://rest.uniprot.org/uniprotkb/P04637
- ✓ URL resolves (HTTP 200)
- ✓ Release: recorded '2026_01', current '2026_01'
- ✓ Response SHA-256: recorded 0040d79bb39e2f73…, current 0040d79bb39e2f73…

**Advice:** Both checks passed. The recorded provenance is reproducible against the live UniProt API.

If UniProt has moved on, the tool tells you exactly how:

| Verdict | Meaning | Advice |
|---|---|---|
| verified | Both release and hash match | The provenance is reproducible |
| release_drift | UniProt released a new version | Pin via the FTP snapshot if you need the historical answer |
| hash_drift | Same release, body changed | An in-release edit; investigate or re-fetch |
| release_and_hash_drift | Both moved on | Use a release-specific FTP snapshot |
| url_unreachable | Endpoint dropped or rate-limited | Retry or report to UniProt |
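
The five verdicts follow from two comparisons plus a reachability check. A minimal sketch of that decision logic, with a hypothetical function name and signature (the real tool also re-fetches the URL and hashes the canonical body, which is omitted here):

```python
def classify(
    recorded_release: str,
    recorded_sha256: str,
    current_release: str,
    current_sha256: str,
    reachable: bool = True,
) -> str:
    """Map the two comparisons onto the verdict names listed above."""
    if not reachable:
        return "url_unreachable"
    release_ok = recorded_release == current_release
    hash_ok = recorded_sha256 == current_sha256
    if release_ok and hash_ok:
        return "verified"
    if release_ok:
        return "hash_drift"      # same release, body changed
    if hash_ok:
        return "release_drift"   # new release, body unchanged
    return "release_and_hash_drift"
```

Keeping release and hash as separate checks is what lets the tool distinguish an in-release edit from an ordinary release rollover.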

For strict reproducibility, opt into release pinning:

export UNIPROT_PIN_RELEASE=2026_01
uniprot-mcp
# every response is checked against the pinned release;
# any drift raises `ReleaseMismatchError`, which the server surfaces
# as an agent-actionable error envelope.

For offline replay (post-cache-population):

export UNIPROT_MCP_CACHE_DIR=~/.uniprot-mcp-cache
uniprot-mcp
# every successful response is mirrored to disk; later replay via
# uniprot_replay_from_cache(url) without touching the upstream.
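
A URL-keyed disk cache of this kind can be sketched in a few lines. The file naming (SHA-256 of the URL), the JSON layout, and the function names are assumptions for illustration; the actual on-disk format of uniprot-mcp may differ.

```python
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir: Path, url: str) -> Path:
    """One file per URL; keying by the URL's SHA-256 is an assumption."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"


def store(cache_dir: Path, url: str, body: str) -> Path:
    """Mirror a successful response body to disk."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_path(cache_dir, url)
    path.write_text(json.dumps({"url": url, "body": body}), encoding="utf-8")
    return path


def replay(cache_dir: Path, url: str) -> str:
    """Return the cached body without any network access."""
    path = cache_path(cache_dir, url)
    if not path.exists():
        raise KeyError(url)
    return json.loads(path.read_text(encoding="utf-8"))["body"]
```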

A live end-to-end demonstration is committed at tests/benchmark/run-2026-04-25-roundtrip/transcript.md — real values, real verdicts, no mocks.


Pre-registered benchmark

tests/benchmark/ ships a 30-prompt evaluation (Tier A / B / C × 10) with SHA-256-committed expected answers on main. The plaintext expected.jsonl is held local-only until a benchmark run is published; the cryptographic commitments mean the author cannot rewrite "correct" answers post-hoc.

A reviewer can re-derive every Tier A and B answer from primary-source UniProt REST in two commands:

python tests/benchmark/verify_answers.py tests/benchmark/expected.jsonl
# OK: all 30 prompts verified against https://rest.uniprot.org

python tests/benchmark/verify.py tests/benchmark/expected.jsonl tests/benchmark/expected.hashes.jsonl
# OK: 30 commitments verified
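
The commitment idea behind the second command can be illustrated with a small sketch: hash a canonical serialisation of each expected answer, publish only the hashes, and later check the plaintext against them. The canonicalisation (sorted-key JSON) and the function names are assumptions, not the format used by verify.py.

```python
import hashlib
import json


def commit(record: dict) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, no extra spaces)."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def mismatches(records: list[dict], commitments: list[str]) -> list[int]:
    """Indices of records whose hash differs from the published commitment."""
    if len(records) != len(commitments):
        raise ValueError("record and commitment counts differ")
    return [
        index
        for index, (record, digest) in enumerate(zip(records, commitments))
        if commit(record) != digest
    ]
```

A plain hash of a short answer can be guessed by enumeration, so a production scheme may add a secret salt; whether this benchmark does so is not stated here.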

See tests/benchmark/AUDIT.md for the per-prompt source attribution and the formal independence statement (uniprot-mcp was not used during answer authoring).


Install

pip install uniprot-mcp-server   # PyPI distribution
# or, for an isolated install (pin a version with e.g. uniprot-mcp-server==1.1.0):
uvx --from uniprot-mcp-server uniprot-mcp

From source:

git clone https://github.com/smaniches/uniprot-mcp.git
cd uniprot-mcp
pip install -e .

Claude Desktop

claude_desktop_config.json:

{
  "mcpServers": {
    "uniprot": {
      "command": "uniprot-mcp"
    }
  }
}

For pinned, reproducibility-grade access:

{
  "mcpServers": {
    "uniprot": {
      "command": "uniprot-mcp",
      "args": ["--pin-release=2026_01"]
    }
  }
}

For offline replay via the local provenance cache:

{
  "mcpServers": {
    "uniprot": {
      "command": "uniprot-mcp",
      "env": {
        "UNIPROT_MCP_CACHE_DIR": "/absolute/path/to/cache"
      }
    }
  }
}

Claude Code (CLI)

claude mcp add uniprot -- uniprot-mcp

Self-test (live UniProt smoke check)

uniprot-mcp --self-test
# [tools] registered: 41/41
# [live] P04637 -> TP53 OK
# [PASS]

Example workflows

1. Clinical-variant interpretation packet for TP53 R175H.

> What's at residue 175 of P04637? Is R175H a known variant? Pull
> the UniProt and ClinVar evidence and tell me how confident the
> AlphaFold model is at that residue.
→ uniprot_features_at_position("P04637", 175)
→ uniprot_lookup_variant("P04637", "R175H")
→ uniprot_resolve_clinvar("P04637", change="R175H")
→ uniprot_get_alphafold_confidence("P04637")

2. Drug-target dossier in one call.

> Give me a complete drug-target characterisation of human BRCA1.
→ uniprot_target_dossier("P38398")
   # nine sections, two upstream calls (entry + FASTA), one tool call.

3. Sequence chemistry for buffer choice / expression-system selection.

> What's the molecular weight, pI, and hydrophobicity of human insulin?
→ uniprot_compute_properties("P01308")
   # MW 11,981 Da, pI 4.93, ε₂₈₀ 24,980 M⁻¹·cm⁻¹ — pure Python on the FASTA.

4. Provenance round-trip — proving an answer is reproducible.

> [later, with the provenance footer from a prior session in hand]
> Verify the recorded provenance for P04637.
→ uniprot_provenance_verify(
    url="https://rest.uniprot.org/uniprotkb/P04637",
    release="2026_01",
    response_sha256="0040d79bb39e2f7386d55f81071e87858ec2e5c2cd9552e93c3633897f78345e"
  )

5. Air-gapped clinical workflow with sealed cache.

# Day 1, online: cache populates as queries run.
export UNIPROT_MCP_CACHE_DIR=~/sealed-cache
# … every uniprot-mcp tool call writes to ~/sealed-cache/<sha>.json
# Day N, offline: replay any prior answer.
> uniprot_replay_from_cache("https://rest.uniprot.org/uniprotkb/P04637")
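
Workflow 1 hinges on two small operations: parsing a shorthand such as R175H into reference residue, position, and alternate residue, and finding the features that overlap that position. A minimal sketch of both follows. The flattened feature shape (dicts with type, start, end) and the function names are hypothetical, and the server's own parsing rules may be stricter or looser.

```python
import re

AMINO = "ACDEFGHIKLMNPQRSTVWY"
# One-letter reference residue, 1-based position, alternate residue or '*' (stop).
VARIANT_RE = re.compile(rf"([{AMINO}])([1-9][0-9]*)([{AMINO}*])")


def parse_variant(text: str) -> tuple[str, int, str]:
    """Split shorthand like 'R175H' into ('R', 175, 'H')."""
    match = VARIANT_RE.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a one-letter variant shorthand: {text!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def features_at(features: list[dict], position: int) -> list[dict]:
    """Features whose inclusive [start, end] range covers the position."""
    return [f for f in features if f["start"] <= position <= f["end"]]
```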

Testing

| Layer | Path | What |
|---|---|---|
| Unit | tests/unit/ | Behaviour of every public function. |
| Property | tests/property/ | Hypothesis-driven invariants on regexes + query construction. |
| Contract | tests/contract/ | Manifest / pyproject / docs / incident-policy / benchmark drift prevention. |
| Client | tests/client/ | Retry / back-off / id-mapping polling against respx-mocked HTTP. |
| Integration | tests/integration/ | Live UniProt + AlphaFold; opt-in via --integration. |
| Benchmark | tests/benchmark/ | 30 SHA-256-committed prompts + reproducible verifier. |

402 offline + 31 live integration tests, all green. Mypy (strict), ruff (check + format), bandit (0 issues at any severity), pip-audit (--strict, no known vulnerabilities) all clean. Mutation testing (mutmut) gate ≥ 95 % kill, populated post-billing-reset.

# Fast, offline (CI on every push):
pytest tests/unit tests/property tests/client tests/contract -v

# Live UniProt (opt-in, nightly in CI):
pytest --integration tests/integration -v

# Lint / type-check / security / SCA:
ruff check . && ruff format --check . && mypy src/uniprot_mcp
bandit -r src/uniprot_mcp && pip-audit --strict

Architecture & threat model


Citation

Cite via CITATION.cff (GitHub renders a "Cite this repository" button). In every case, also cite the UniProt Consortium:

The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Research (2025). doi:10.1093/nar/gkae1010


License

Apache-2.0 — see LICENSE and NOTICE.

This project is the gateway layer of the Topologica Bio MCP suite. Multi-source orchestration and tamper-evident provenance ledgers live in the companion topologica-bio repository under BUSL-1.1 (Change Date 2030-04-19, auto-reverts to Apache-2.0). uniprot-mcp itself is and will remain permissively Apache-2.0.

Copyright © 2026 Santiago Maniches. TOPOLOGICA LLC.
