
MCP server wrapping AlphaFold DB and 13 other biomedical data sources, with a local SQLite knowledge graph.


AlphaFold Sovereign MCP

A Model Context Protocol server that wraps AlphaFold DB and 13 other public biomedical data sources behind a set of MCP tool calls, and persists each result to a local SQLite knowledge graph for later querying.

This is an unfunded, independent open-source project. It is not a service, not certified for any regulated use, and its outputs are research aids that should be reviewed by qualified humans before any clinical or regulatory use.


Status: v1.1.0-rc1 (release candidate). Engineering-grade (610 tests, 99% line and branch coverage, full legal kit), but not yet validated scientifically by independent domain experts; see STATUS.md and LIMITATIONS.md.


What this is

A Python MCP server that:

  • Wraps AlphaFold DB, UniProt, MONDO, HPO, Open Targets, ClinVar, gnomAD, DisGeNET, ChEMBL, Ensembl, InterPro, RCSB PDB, Gene Ontology, and Human Protein Atlas behind MCP tool calls. Each call is a thin orchestration over those upstreams; the server does not add scientific judgement.
  • Composes upstreams into multi-source workflows: variant cross-reference reports, disease–target landscape summaries, heuristic target-druggability scoring, drug-repurposing candidate ranking, and cross-species structural-distance computation.
  • Persists every tool result to a local SQLite knowledge graph (storage/knowledge_graph.py) so a research session accumulates a queryable, exportable database.
  • Includes a topological-data-analysis (TDA) module that computes persistent-homology fingerprints (Betti numbers β₀, β₁, β₂) over Vietoris-Rips filtrations of Cα coordinates, and a Wasserstein-distance comparator between fingerprints. The full persistent-homology features require the optional [tda] extra (gudhi).
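To make the TDA bullet concrete, here is a minimal sketch of the simplest of the three Betti numbers, β₀, at a single filtration scale. It is not the project's implementation (which relies on gudhi for full persistent homology); the function name betti0_at_scale and the toy coordinates are assumptions made for illustration. At a given distance threshold, β₀ of the Vietoris-Rips complex is the number of connected components of the graph that links every pair of points no farther apart than that threshold.

```python
import math
from itertools import combinations


def betti0_at_scale(points, radius):
    """Count connected components when points within `radius` are linked.

    This equals beta_0 of the Vietoris-Rips complex at that distance threshold.
    Hypothetical helper for illustration only.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Toy 3-D coordinates (not a real protein): two well-separated pairs.
toy_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]

for radius in (0.5, 1.5, 9.5):
    print(radius, betti0_at_scale(toy_points, radius))
```

As the threshold grows, components merge and β₀ falls; persistent homology records the scales at which those merges happen, and does the same for loops (β₁) and voids (β₂).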

It targets mcp-spec 2025-06-18 and runs on Python 3.10–3.13.

What this is not

  • It is not a hosted service or a SaaS.
  • It is not certified for any regulated use (HIPAA, GxP, 21 CFR Part 11, FedRAMP, FIPS, SOC 2). The code structures audit logging in a way that could later support such a certification, but no such audit has been performed.
  • It does not train, fine-tune, or publish AlphaFold models — it consumes AlphaFold DB's public REST API.
  • The "ACMG/AMP criteria" that generate_variant_clinical_report produces are a draft summary of the upstream evidence the server can fetch automatically. They are not a substitute for clinical-laboratory variant review.
  • The "druggability tier" that assess_target_druggability returns is a heuristic built from drug-precedent counts, Open Targets tractability labels, pLDDT, and gnomAD constraint. It is not a validated prediction.
  • "Structural distance" between proteins via TDA Wasserstein distance measures topological similarity of the Cα point cloud. It is not a sequence similarity, RMSD, or functional-equivalence measure.

For a complete, itemised list of known limitations (with module references, impact, and planned resolution), see LIMITATIONS.md. For the high-level posture — what is engineering-validated vs. what is not yet scientifically validated — see STATUS.md.


Install

No PyPI release yet. v1.1.0-rc1 is intentionally source-install only. PyPI publication is held back until the v1.2.0 validation work lands (see STATUS.md §"Roadmap to v1.2.0"). The pip install alphafold-sovereign-mcp command below will not work until then.

Install from source

git clone https://github.com/smaniches/alphafold-sovereign-mcp
cd alphafold-sovereign-mcp
uv pip install -e .
# With persistent-homology TDA (requires gudhi):
# uv pip install -e ".[tda]"

Verify the install

alphafold-sovereign --version       # → 1.1.0-rc1
alphafold-sovereign --self-test     # → PASS on the offline BRCA1 fixture

--self-test boots the server in offline mode and exercises the deterministic logic of generate_variant_clinical_report against a built-in BRCA1:c.5266dupC fixture. No network calls; returns exit code 0 on PASS, non-zero on FAIL.
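Because the self-test reports its result through the exit code, it can gate a script or CI step. The sketch below shows one way to do that from Python; run_check is a hypothetical helper, not part of the package, and it assumes only the exit-code convention stated above.

```python
import subprocess


def run_check(command):
    """Run a command and map its exit code to a status string.

    Returns "PASS" for exit code 0, "FAIL" for any other exit code, and
    "NOT FOUND" when the executable is not on PATH.
    """
    try:
        completed = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError:
        return "NOT FOUND"
    return "PASS" if completed.returncode == 0 else "FAIL"


# Usage, once the package is installed from source:
#   status = run_check(["alphafold-sovereign", "--self-test"])
```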

Configure Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "alphafold-sovereign": {
      "command": "alphafold-sovereign-mcp",
      "args": []
    }
  }
}

Restart Claude Desktop and the tools become available in conversations. See the examples/ directory for three end-to-end illustrations of what a session looks like.
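If claude_desktop_config.json already lists other servers, the new entry has to be merged in, not pasted over the file. A minimal sketch of that merge, assuming the config has already been loaded into a dict; add_server and the "other-server" entry are illustrative names, not part of this project or of Claude Desktop.

```python
import json


def add_server(config, name, command, args=None):
    """Return a copy of the config dict with one MCP server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args or [])}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
updated = add_server(existing, "alphafold-sovereign", "alphafold-sovereign-mcp")
print(json.dumps(updated, indent=2))
```

The helper copies the top-level dict and the mcpServers mapping, so the original config object is left unchanged and existing server entries are preserved.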

Offline mode

ALPHAFOLD_OFFLINE=1 alphafold-sovereign-mcp

In offline mode the server refuses all outbound HTTP and serves only from the local SQLite cache.
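The idea behind an air-gap switch of this kind can be sketched as a guard that runs before any request is built. This is an illustration under stated assumptions, not the code in clients/_base.py: OfflineModeError and ensure_online are hypothetical names, and only the ALPHAFOLD_OFFLINE=1 convention comes from this README.

```python
import os


class OfflineModeError(RuntimeError):
    """Raised when a network call is attempted while offline mode is on."""


def ensure_online(url, environ=None):
    """Return `url` unchanged, or raise if ALPHAFOLD_OFFLINE=1 is set.

    `environ` defaults to the process environment; passing a dict makes
    the guard easy to exercise in tests.
    """
    env = os.environ if environ is None else environ
    if env.get("ALPHAFOLD_OFFLINE") == "1":
        raise OfflineModeError(f"offline mode: refusing request to {url}")
    return url
```

A client would call the guard first and fall back to its local cache when the error is raised.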

Future (planned, not active)

# Once published to PyPI in v1.2.0:
pip install alphafold-sovereign-mcp
# Or via uvx (no install required):
uvx alphafold-sovereign-mcp

Tool inventory

The server exposes 29 MCP tools across four modules. Each tool's input schema is a Pydantic model; results are JSON.

Disease & ontology (tools/disease.py)

Tool | What it does
lookup_disease | MONDO record + hierarchy + ICD cross-references
search_diseases | Full-text MONDO ontology search
lookup_phenotype | HPO term + associated diseases
get_gene_phenotype_profile | HPO phenotypes + gnomAD constraint for a gene
get_disease_targets | Top drug targets for a MONDO disease (Open Targets)
get_target_diseases | Top diseases for a UniProt target (Open Targets)
get_common_disease_targets | Parallel profiling across curated MONDO diseases
triage_variant_3d | HGVS → ClinVar + gnomAD + MONDO disease context
phenotype_to_structures | HPO → diseases → OT targets → UniProt IDs
get_orphan_disease_atlas | Orphanet → MONDO → HPO + OT targets
compare_disease_target_overlap | Jaccard similarity of target sets for two diseases
resolve_icd10_to_mondo | ICD-10 code → MONDO disease record
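The Jaccard similarity that compare_disease_target_overlap reports is the size of the intersection of two target sets divided by the size of their union. A minimal sketch of the formula follows; the gene lists are placeholders chosen for the arithmetic, not output from the tool, and treating two empty sets as similarity 0.0 is an assumption.

```python
def jaccard(targets_a, targets_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of target IDs."""
    a, b = set(targets_a), set(targets_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets count as no overlap
    return len(a & b) / len(union)


# Placeholder target sets, for illustration only.
disease_a = {"EGFR", "KRAS", "TP53"}
disease_b = {"EGFR", "TP53", "BRAF", "ALK"}
print(jaccard(disease_a, disease_b))  # 2 shared / 5 total = 0.4
```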

Precision medicine (tools/precision_medicine.py)

Tool | What it does
generate_variant_clinical_report | HGVS → multi-source report + draft ACMG/AMP criteria
assess_target_druggability | UniProt → HOT/WARM/COLD/NOT_DRUGGABLE tier
synthesize_protein_dossier | UniProt → multi-source briefing
map_disease_drug_landscape | MONDO → approved drugs + pipeline + ChEMBL phase counts
classify_variant_acmg | HGVS → ACMG/AMP criteria checklist (PVS1, PM2, PP3, BS1, BP4)
find_drug_repurposing_candidates | MONDO → candidates ranked by OT evidence × ChEMBL phase

The ACMG/AMP criteria produced are a draft: they reflect the upstream evidence the server can fetch automatically, and they are not a substitute for clinical-laboratory review.

Structure intelligence (tools/structure_intelligence.py)

Tool | What it does
analyze_structural_confidence | pLDDT distribution + PAE-derived domain map
compute_topology_fingerprint | 64-dim TDA fingerprint (Betti numbers β₀ β₁ β₂)
compare_proteins_topologically | Pairwise Wasserstein distance matrix for 2–10 proteins
find_evolutionary_structural_shifts | Cross-species structural divergence (TDA + Ensembl orthologs)
score_binding_pocket_geometry | Geometric pocket detection + heuristic druggability index
detect_intrinsically_disordered | IDR map (linkers, tails, long IDRs)

Knowledge graph (tools/knowledge_graph_tools.py)

Tool | What it does
query_variant_database | Search locally stored variant triage results
query_protein_database | Search locally stored protein assessments
get_knowledge_graph_stats | Database size, entity counts, last activity
export_research_dataset | Export tables to JSON for pandas/ML pipelines
find_drug_gene_network | Traverse the accumulated drug–gene–disease graph
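Since the knowledge graph is an ordinary SQLite file, a table-to-JSON export of the kind export_research_dataset describes can be sketched with the standard library alone. The schema below (a variants table with hgvs and gene columns) and the helper export_table are assumptions for illustration; the real column layout lives in storage/knowledge_graph.py.

```python
import json
import sqlite3


def export_table(conn, table, allowed=("proteins", "variants")):
    """Dump one allow-listed table to a list of plain dicts."""
    if table not in allowed:
        # Table names cannot be bound as SQL parameters, so check them first.
        raise ValueError(f"unknown table: {table}")
    conn.row_factory = sqlite3.Row
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return [dict(row) for row in rows]


# In-memory stand-in for the knowledge graph, with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (hgvs TEXT PRIMARY KEY, gene TEXT)")
conn.execute("INSERT INTO variants VALUES (?, ?)", ("BRCA1:c.5266dupC", "BRCA1"))

records = export_table(conn, "variants")
print(json.dumps(records))
```

A list of dicts in this shape can be passed directly to pandas.DataFrame if pandas is available.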

Example usage

For three documented end-to-end illustrations of a Claude Desktop session against this server — variant triage on BRCA1 c.5266dupC, target characterisation on EGFR, and a drug-discovery walk-through on Imatinib → BCR-ABL → CML — see the examples/ directory. Each example includes the user prompt, the tool calls the model issues, the server's response shape, and the model's paraphrased reply.

Clinical variant report

generate_variant_clinical_report(hgvs="BRCA1:c.181T>G")

The server resolves the HGVS, fetches ClinVar, gnomAD, AlphaMissense (via AlphaFold DB), Open Targets disease evidence, ChEMBL drug data, and Ensembl VEP consequence annotations, and returns a single JSON record with the cross-referenced fields plus the ACMG/AMP criteria that the available evidence supports.

Drug repurposing

find_drug_repurposing_candidates(disease_mondo_id="MONDO:0007739")

Returns drugs whose Open Targets evidence connects them to the disease, ranked by a composite of OT evidence score × the maximum ChEMBL clinical phase reached against the target.
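The ranking rule is a product of two numbers, so the ordering is easy to reproduce by hand. A minimal sketch, assuming each candidate carries an evidence score and a maximum clinical phase; the field names (ot_score, max_phase) and all values are placeholders, not real Open Targets or ChEMBL data.

```python
def rank_candidates(candidates):
    """Sort candidates by evidence score times max clinical phase, highest first."""
    return sorted(
        candidates,
        key=lambda c: c["ot_score"] * c["max_phase"],
        reverse=True,
    )


# Placeholder records for illustration only.
candidates = [
    {"drug": "drug_a", "ot_score": 0.9, "max_phase": 2},  # composite 1.8
    {"drug": "drug_b", "ot_score": 0.5, "max_phase": 4},  # composite 2.0
    {"drug": "drug_c", "ot_score": 0.8, "max_phase": 1},  # composite 0.8
]
print([c["drug"] for c in rank_candidates(candidates)])
# ['drug_b', 'drug_a', 'drug_c']
```

Note that a weaker evidence score can still rank first when the drug has reached a later clinical phase, as drug_b does here.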

Cross-species structural divergence

find_evolutionary_structural_shifts(
    gene_symbol="ACE2",
    target_species=["mus_musculus", "rhinolophus_ferrumequinum"]
)

For each species, the tool fetches the ortholog (Ensembl) and its AlphaFold structure, computes the TDA fingerprint, and returns the Wasserstein distance from the human structure along with sequence identity.
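For intuition about what a Wasserstein distance measures, the sketch below computes the one-dimensional Wasserstein-1 distance between two equal-sized samples, which reduces to the mean absolute difference of the sorted values. This is a deliberately simplified stand-in: the server compares TDA fingerprints, and a TDA library would compute the distance between persistence diagrams, which this code does not do. The function name and the numbers are assumptions.

```python
def wasserstein_1d(sample_a, sample_b):
    """Wasserstein-1 distance between two equal-sized 1-D empirical samples."""
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal length")
    # Optimal transport in 1-D pairs the i-th smallest value of each sample.
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(x - y) for x, y in pairs) / len(sample_a)


# Toy values, not real fingerprints: the second sample is the first shifted by 1.
human = [0.0, 1.0, 3.0]
ortholog = [1.0, 2.0, 4.0]
print(wasserstein_1d(human, ortholog))  # 1.0
```

Identical samples give a distance of zero, and the result does not depend on the order in which values are listed.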


Data sources

Source | What we use | License
AlphaFold DB v4 (EBI/DeepMind) | Structures, pLDDT, PAE, AlphaMissense | CC BY 4.0
UniProt | Protein function, domains, GO | CC BY 4.0
MONDO (OLS4) | Disease ontology, ICD cross-refs | CC BY 4.0
HPO (JAX) | Phenotype terms, gene-disease links | hpo.jax.org
Open Targets | Disease–target evidence | Apache 2.0
ClinVar (NCBI) | Variant pathogenicity | Public domain
gnomAD v4 | Population allele frequencies | ODbL
DisGeNET | Gene–disease association scores | CC BY-NC-SA 4.0
ChEMBL v34 (EMBL-EBI) | Drug bioactivity, MoA, ADMET | CC BY-SA 3.0
Ensembl (EMBL-EBI) | VEP, orthologs, gene lookup | Apache 2.0
InterPro | Domain + family annotations | CC0
RCSB PDB | Experimental structures | CC0
Gene Ontology | Biological process, molecular function | CC BY 4.0
Human Protein Atlas | Tissue expression | CC BY-SA 3.0

See NOTICE for full attributions.


Architecture

clients/_base.py
  ├── Air-gap enforcement (refuses sockets when ALPHAFOLD_OFFLINE=1)
  ├── Token-bucket rate limiting (aiolimiter)
  ├── Exponential backoff with jitter (tenacity)
  ├── Circuit breaker (CLOSED / OPEN / HALF_OPEN)
  └── Content-addressed SHA-256 dedup of upstream responses

storage/knowledge_graph.py
  ├── SQLite WAL mode (embedded, ACID)
  ├── 6 entity tables: proteins, variants, diseases, drugs, genes, phenotypes
  ├── 4 relationship tables: protein_disease, protein_drug, variant_disease, gene_phenotype
  ├── tool_invocations audit table (SHA-256 of input + output, timestamps)
  └── Analytical views: variant_summary, drug_landscape

domain/disease.py
  └── Pure Python frozen dataclasses (PathogenicityClass, VariantReport, ...)
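
The three circuit-breaker states named in the listing above can be sketched as a small state machine. This is an illustration of the general pattern, not the code in clients/_base.py; the class name, method names, failure threshold, and reset interval are all assumptions.

```python
import time


class CircuitBreaker:
    """CLOSED until repeated failures, then OPEN, then HALF_OPEN after a cooldown."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock  # injectable so the cooldown can be tested without sleeping
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self):
        """Return True if a call to the upstream may be attempted now."""
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.reset_after:
            self.state = "HALF_OPEN"  # cooldown elapsed: allow probe requests
        return self.state != "OPEN"

    def record_success(self):
        """A successful call closes the breaker and clears the failure count."""
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        """Open on a failed probe, or once failures reach the threshold."""
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

A caller checks allow_request() before each upstream call and reports the outcome with record_success() or record_failure(), so a failing upstream is skipped during the cooldown instead of being retried on every tool call.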

See ARCHITECTURE.md for the full module map.
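
The tool_invocations audit table stores SHA-256 digests of each call's input and output. One way to produce stable digests is to hash a canonical JSON encoding, as in the sketch below. The column names, the helpers digest and record_invocation, and the mondo_id parameter name are assumptions for illustration; only the table name and the use of SHA-256 and timestamps come from the listing above.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone


def digest(payload):
    """SHA-256 hex digest of a canonical JSON encoding (sorted keys, compact)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def record_invocation(conn, tool, tool_input, tool_output):
    """Insert one audit row holding digests instead of the raw payloads."""
    conn.execute(
        "INSERT INTO tool_invocations VALUES (?, ?, ?, ?)",
        (
            tool,
            digest(tool_input),
            digest(tool_output),
            datetime.now(timezone.utc).isoformat(),
        ),
    )


# In-memory stand-in with a hypothetical column layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_invocations ("
    "tool TEXT, input_sha256 TEXT, output_sha256 TEXT, called_at TEXT)"
)
record_invocation(conn, "lookup_disease", {"mondo_id": "MONDO:0007739"}, {"ok": True})
```

Sorting keys before hashing means two inputs that differ only in key order produce the same digest, which keeps the audit trail comparable across calls.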


Testing & quality

  • 610 unit tests with respx-mocked upstreams; the full suite runs hermetically in under 15 seconds on a laptop.
  • Coverage on the shipped surface (src/alphafold_sovereign/clients, domain, storage, server, tools): 99% line + branch, with 19 of 20 modules at 100%.
  • Lint: ruff (full ruleset, no per-file ignores on the production tree). Type checking: mypy --strict on the domain, clients, and storage subtrees.
  • Security: bandit plus CodeQL security-extended.
  • Supply chain: SBOM generation in CI; reproducible-build script at scripts/replicate.sh.

The full CI matrix (Python 3.10, 3.11, 3.12, 3.13 × Ubuntu, macOS) runs on every push. Test counts and coverage percentages above are the numbers a git clone && uv run pytest produces on the current HEAD; if you find a divergence, please open an issue.


Contributing

DCO sign-off required (git commit -s). No copyright assignment. Coverage gate: ≥95% line / ≥90% branch for new modules. Full guide: CONTRIBUTING.md.


Citation

Machine-readable metadata: CITATION.cff (GitHub renders a "Cite this repository" button in the sidebar that consumes this file).

@software{maniches_alphafold_sovereign_mcp,
  author    = {Maniches, Santiago},
  title     = {AlphaFold Sovereign MCP},
  year      = {2026},
  version   = {1.1.0-rc1},
  url       = {https://github.com/smaniches/alphafold-sovereign-mcp},
  license   = {Apache-2.0},
  orcid     = {0009-0005-6480-1987}
}
% Add inside the entry when the Zenodo release is published:
% doi   = {10.5281/zenodo.XXXXXXX}

When citing results derived from this software, please also cite the upstream data sources (AlphaFold DB, UniProt, Open Targets, ChEMBL, Ensembl, ClinVar, gnomAD, MONDO, HPO, DisGeNET, RCSB PDB, InterPro, Gene Ontology, Human Protein Atlas) according to their own citation requirements.

License

Copyright 2024–2026 Santiago Maniches.

Licensed under the Apache License, Version 2.0. See LICENSE.

Patent reservation: see PATENTS.md. Trademark policy: see TRADEMARKS.md.

