An MCP server for interrogating PDB structures — search, inspect, and cross-reference across the RCSB Search, Data, and Sequence Coordinates APIs
rcsb-mcp
An MCP server for interrogating Protein Data Bank structures — discover, inspect, and cross-reference — from LLM clients (Claude Desktop, MCP Inspector, Cursor, etc.). It spans three RCSB APIs:
- Discover — find structures with the Search API (keyword, attribute, sequence, chemistry, 3D shape, motif).
- Inspect — fetch entry / entity / assembly / ligand details and annotations from the Data API.
- Relate — map sequences and positional features across PDB, UniProt, and NCBI with the Sequence Coordinates API.
Tools
Search (search.rcsb.org)
| Tool | What it does |
|---|---|
| rcsb_list_pdb_search_attributes | Discover searchable attribute paths, types, and operators. schema="structure" (default, ~677) or schema="chemical" (~57: chem_comp.*, drugbank_info.*, ...). |
| rcsb_find_go_terms | Resolve a free-text molecular function / biological process / cellular component to Gene Ontology ids (via EBI QuickGO), annotated with PDB entry counts; then search by rcsb_polymer_entity_annotation.annotation_lineage.id. |
| rcsb_find_interpro_domains | Resolve a free-text protein domain / family / fold to InterPro ids (via EBI InterPro API), annotated with PDB entry counts; then search by rcsb_polymer_entity_annotation.annotation_id. |
| rcsb_find_enzyme_classes | Resolve a free-text enzyme / reaction to Enzyme Commission (EC) numbers (via EBI Search/IntEnz), annotated with PDB entry counts; then search by rcsb_polymer_entity.rcsb_ec_lineage.id (hierarchical). |
| rcsb_find_disease_terms | Resolve a free-text disease / condition to MONDO ids (via EBI OLS), annotated with PDB entry counts; then search by rcsb_uniprot_annotation.annotation_lineage.id (hierarchical, UniProt-based). |
| rcsb_find_organisms | Resolve a free-text organism / common name / clade to NCBI Taxonomy ids (via UniProt taxonomy), annotated with PDB entry counts; then search by rcsb_entity_source_organism.taxonomy_lineage.id (hierarchical: a clade id matches every organism beneath it). |
| rcsb_search_fulltext | Free-text keyword search (e.g. "CRISPR Cas9"). |
| rcsb_search_by_attribute | Structured search on an indexed attribute (resolution, organism, release date, ...). Supports exists, negation, case_sensitive, and chemical=True (text_chem). |
| rcsb_search_combined | Combine free text + multiple attribute filters (AND/OR) in one query, with optional sort. |
| rcsb_search_count | Return only the number of matches, for "how many ..." questions. |
| rcsb_search_facets | Aggregate matches into buckets/statistics (terms, histogram, date_histogram, range, cardinality), for "distribution / breakdown / per X" questions. |
| rcsb_search_by_sequence | MMseqs2 sequence-similarity search (BLAST-like). |
| rcsb_search_by_chemical | Chemical search by SMILES/InChI descriptor (whole-molecule or substructure) or molecular formula. |
| rcsb_search_by_structure | 3D shape-similarity search against a reference PDB assembly or chain. |
| rcsb_search_by_seqmotif | Short sequence-motif search (PROSITE pattern, regex, or simple wildcards). |
| rcsb_search_strucmotif | 3D structural-motif search: structures sharing a geometric arrangement of specific residues (e.g. a catalytic triad). |
| rcsb_search_advanced | Escape hatch: run a raw Search API query body (return_all_hits, grouped results, deeply nested boolean queries, ...). |
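For orientation, a raw Search API query body of the kind the escape hatch forwards might look like the sketch below. It uses the Search API's full_text service with basic pagination options. The helper name build_fulltext_query is hypothetical, and this is not necessarily how rcsb-mcp assembles its requests internally.

```python
import json


def build_fulltext_query(text: str, rows: int = 10, start: int = 0) -> dict:
    """Build a minimal Search API body for a full-text query.

    Illustrative helper only; the name and defaults are assumptions,
    not part of rcsb-mcp.
    """
    return {
        "query": {
            "type": "terminal",        # a single leaf query node
            "service": "full_text",    # free-text keyword search
            "parameters": {"value": text},
        },
        "return_type": "entry",        # return PDB entry identifiers
        "request_options": {"paginate": {"start": start, "rows": rows}},
    }


body = build_fulltext_query("CRISPR Cas9")
print(json.dumps(body, indent=2))
```

A body like this would be POSTed as JSON to the Search endpoint; more complex queries nest several terminal nodes inside group nodes.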
The three text tools (rcsb_search_fulltext, rcsb_search_by_attribute, rcsb_search_combined)
also take group_by_identity (100/95/90/70/50/30) to return one representative
per sequence-identity cluster — i.e. non-redundant results. To search
chemical-component attributes, find the path with
rcsb_list_pdb_search_attributes(schema="chemical"), then pass chemical=True to
rcsb_search_by_attribute / rcsb_search_combined (usually with return_type="mol_definition").
The chemical catalog is generated from the live metadata schema by
scripts/generate_chemical_attributes.py.
Paging. Every search tool that returns hits accepts limit (1–100, default
10) and offset (default 0). Each response reports total_count, has_more,
and next_offset; to fetch the next page, call the tool again with the same
query and offset set to the returned next_offset.
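The paging contract above can be sketched as a small loop. This is a minimal illustration, assuming the tool is reachable through some Python callable and that hits arrive under a "hits" key (the README names only total_count, has_more, and next_offset). Both iter_all_hits and fake_search are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, Iterator


def iter_all_hits(
    search: Callable[..., Dict[str, Any]], limit: int = 100, **query: Any
) -> Iterator[Any]:
    """Yield every hit by following next_offset until has_more is false."""
    offset = 0
    while True:
        page = search(limit=limit, offset=offset, **query)
        # "hits" is an assumed key name for the list of results.
        yield from page.get("hits", [])
        if not page.get("has_more"):
            break
        offset = page["next_offset"]


def fake_search(limit: int, offset: int, **_: Any) -> Dict[str, Any]:
    """Stand-in for a real tool call: pages through 25 made-up ids."""
    data = [f"ID{i}" for i in range(25)]
    chunk = data[offset:offset + limit]
    end = offset + len(chunk)
    has_more = end < len(data)
    return {
        "hits": chunk,
        "total_count": len(data),
        "has_more": has_more,
        "next_offset": end if has_more else None,
    }


all_ids = list(iter_all_hits(fake_search, limit=10))
print(len(all_ids))  # 25
```

Keeping the query arguments fixed and changing only offset is what makes the pages consistent with one another.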
Data (data.rcsb.org/graphql)
There is one tool per Data API GraphQL root field. Each takes a list of IDs
(singular lookups = a one-element list) plus an optional fields argument to
override the curated default selection with your own GraphQL sub-selection.
Unknown IDs are reported under not_found.
| Tool | Object | Example ID |
|---|---|---|
| rcsb_get_entries | PDB entries | "4HHB" |
| rcsb_get_entry_annotations | Entry biological/functional annotations (GO, domains, disease, ...) | "4HHB" |
| rcsb_get_entry_exp_info | Entry experimental conditions / determination metadata | "4HHB" |
| rcsb_get_polymer_entities | Polymer entities (protein/NA) | "4HHB_1" |
| rcsb_get_nonpolymer_entities | Ligand/cofactor entities | "4HHB_3" |
| rcsb_get_branched_entities | Carbohydrate entities | "5FMB_2" |
| rcsb_get_polymer_entity_instances | Polymer chains | "4HHB.A" |
| rcsb_get_nonpolymer_entity_instances | Bound-ligand instances | "4HHB.E" |
| rcsb_get_branched_entity_instances | Glycan chains | "5FMB.C" |
| rcsb_get_assemblies | Biological assemblies | "4HHB-1" |
| rcsb_get_interfaces | Assembly interfaces | "1BMV-1.1" |
| rcsb_get_chem_comps | Chemical components / ligands | "HEM", "ATP" |
| rcsb_get_entry_groups | Entry groups | "G_1002266" |
| rcsb_get_polymer_entity_groups | Polymer entity groups (seq. clusters) | "85_70" |
| rcsb_get_nonpolymer_entity_groups | Non-polymer entity groups | "ATP" |
| rcsb_get_uniprot | UniProt record (single) | "P69905" |
| rcsb_get_pubmed | PubMed record (single, integer) | 6726807 |
| rcsb_get_group_provenance | Grouping provenance (single) | "provenance_sequence_identity" |
| rcsb_data_graphql | Escape hatch: run any GraphQL query against the Data API. | n/a |
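The example IDs in the table follow distinct shapes (entry, entity, instance, assembly, interface), which can be told apart with a few regular expressions. The sketch below is an illustration only: classify_pdb_id is a hypothetical helper, not part of rcsb-mcp, and the patterns assume classic four-character PDB IDs as used in the examples.

```python
import re
from typing import Optional

# Patterns assume a classic 4-character PDB ID (digit + 3 alphanumerics).
_PDB = r"[0-9][A-Za-z0-9]{3}"
_ID_PATTERNS = [
    ("interface", re.compile(rf"^{_PDB}-\d+\.\d+$")),      # 1BMV-1.1
    ("assembly", re.compile(rf"^{_PDB}-\d+$")),            # 4HHB-1
    ("entity", re.compile(rf"^{_PDB}_\d+$")),              # 4HHB_1
    ("instance", re.compile(rf"^{_PDB}\.[A-Za-z0-9]+$")),  # 4HHB.A
    ("entry", re.compile(rf"^{_PDB}$")),                   # 4HHB
]


def classify_pdb_id(identifier: str) -> Optional[str]:
    """Return the kind of PDB-derived id, or None if no pattern matches."""
    for kind, pattern in _ID_PATTERNS:
        if pattern.match(identifier):
            return kind
    return None


for example in ["4HHB", "4HHB_1", "4HHB.A", "4HHB-1", "1BMV-1.1", "HEM"]:
    print(example, "->", classify_pdb_id(example))
```

Chemical component, group, UniProt, and PubMed identifiers deliberately fall through to None here, since their formats are not PDB-entry based.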
The Search API only returns identifiers, so the search tools optionally
enrich entry hits with metadata. Enrichment and all Data API tools query
the GraphQL endpoint, batching every requested ID into one request. All 18
typed tools are generated from a single registry in
queries.py (DATA_OBJECTS), so adding a field or
endpoint is a one-line change.
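As a rough illustration of batching several IDs into one request, the sketch below builds a single GraphQL request body for the Data API's entries root field and works out which requested IDs did not come back. The helper names and the default field selection are assumptions made for this example; the project's own registry in queries.py is not reproduced here.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def build_entries_request(
    entry_ids: Iterable[str], fields: str = "rcsb_id struct { title }"
) -> Dict[str, str]:
    """Build one GraphQL request body covering every requested entry id."""
    ids = [i.upper() for i in entry_ids]
    # json.dumps renders the list as ["4HHB", "1MBN"], which is also a
    # valid GraphQL list-of-strings literal.
    query = "{ entries(entry_ids: %s) { %s } }" % (json.dumps(ids), fields)
    return {"query": query}


def find_missing(
    requested: Iterable[str], returned: List[Optional[Dict[str, Any]]]
) -> List[str]:
    """List requested ids with no matching record in the response."""
    found = {rec["rcsb_id"] for rec in returned if rec}
    return [i for i in requested if i.upper() not in found]


request = build_entries_request(["4hhb", "1MBN"])
print(request["query"])

# Made-up response records, just to exercise find_missing.
print(find_missing(["4HHB", "XXXX"], [{"rcsb_id": "4HHB"}, None]))
```

Passing a custom fields string plays the same role as the tools' fields argument: it swaps the default selection for a caller-supplied GraphQL sub-selection.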
Sequence Coordinates (sequence-coordinates.rcsb.org/graphql)
Maps alignments and positional annotations between sequence reference systems
(UNIPROT, NCBI_PROTEIN, NCBI_GENOME, PDB_ENTITY, PDB_INSTANCE). Each
tool takes an optional fields argument to override the default selection; use
rcsb_describe_seqcoord_object to discover what fields are available.
This is the only RCSB API that cross-references NCBI (RefSeq protein /
genome) — the Data API only knows UniProt. So "what NCBI proteins map to a PDB
structure?" is answered by rcsb_seqcoord_alignments, not the Data API. PDB query
ids must be entity-level (4HHB_1), not a bare entry (4HHB); for a whole
entry, query each polymer entity.
| Tool | What it does |
|---|---|
| rcsb_seqcoord_alignments | Cross-reference a sequence across PDB / UniProt / NCBI with aligned ranges (e.g. 4HHB_1 → NCBI proteins NP_000508, NP_000549). |
| rcsb_seqcoord_annotations | Positional features for one sequence, from one or more annotation sources (UNIPROT, PDB_ENTITY, PDB_INSTANCE, PDB_INTERFACE). |
| rcsb_seqcoord_group_alignments | Alignments among members of a sequence group (MATCHING_UNIPROT_ACCESSION / SEQUENCE_IDENTITY). |
| rcsb_seqcoord_group_annotations | Annotations across a group; summary=True returns a positional summary. |
| rcsb_seqcoord_graphql | Escape hatch: run any GraphQL query against the Sequence Coordinates API. |
| rcsb_describe_seqcoord_object | Introspect the live schema to discover fields available on a seqcoord object (for use with fields=). |
Install
# run the published package without installing (recommended for clients)
uvx rcsb-mcp
# or install it
pip install rcsb-mcp
rcsb-mcp is listed in the Official MCP Registry
as io.github.rcsb/rcsb-mcp, so registry-aware clients can discover it directly.
For local development, install from the project root instead:
pip install -e .
# or with uv
uv pip install -e .
Run / test
# unit tests (no network)
hatch test # or: python tests/test_queries.py
# run the server over stdio
python -m rcsb_mcp.server
# or, after install:
rcsb-mcp
# inspect interactively
npx @modelcontextprotocol/inspector python -m rcsb_mcp.server
There is also an end-to-end evaluation suite (evals/) — 10
read-only, stable questions that measure how well an LLM can drive these tools to
answer real PDB questions. See evals/README.md to run it.
Connect to Claude Desktop
Edit claude_desktop_config.json:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "rcsb-mcp": {
      "command": "uvx",
      "args": ["rcsb-mcp"]
    }
  }
}
For a local source checkout, point at the module instead:
{
  "mcpServers": {
    "rcsb-mcp": {
      "command": "python",
      "args": ["-m", "rcsb_mcp.server"],
      "cwd": "/absolute/path/to/rcsb-mcp/src"
    }
  }
}
Restart Claude Desktop. The tools appear under the connectors (plug) icon.
Example prompts
- "Find high-resolution human hemoglobin structures." → rcsb_search_by_attribute + rcsb_search_fulltext
- "Human hemoglobin structures better than 2 Å, best resolution first." → rcsb_search_combined
- "What PDB entries match this protein sequence: MTEY..." → rcsb_search_by_sequence
- "Find structures containing a ligand like this SMILES / with formula C8H9NO2." → rcsb_search_by_chemical
- "Which structures have a 3D fold similar to 4HHB?" → rcsb_search_by_structure
- "Find proteins with a zinc-finger motif." → rcsb_search_by_seqmotif
- "Structures of proteins with kinase activity / involved in DNA repair / in the mitochondrial membrane." → rcsb_find_go_terms → rcsb_search_by_attribute on rcsb_polymer_entity_annotation.annotation_lineage.id
- "Structures containing an SH2 domain / immunoglobulin fold." → rcsb_find_interpro_domains → rcsb_search_by_attribute on rcsb_polymer_entity_annotation.annotation_id
- "Alcohol dehydrogenase structures / any EC 3.4.21 serine protease." → rcsb_find_enzyme_classes → rcsb_search_by_attribute on rcsb_polymer_entity.rcsb_ec_lineage.id
- "Structures of proteins associated with cystic fibrosis / breast cancer." → rcsb_find_disease_terms → rcsb_search_by_attribute on rcsb_uniprot_annotation.annotation_lineage.id
- "Structures from mammals / from a particular organism or clade." → rcsb_find_organisms → rcsb_search_by_attribute on rcsb_entity_source_organism.taxonomy_lineage.id
- "Non-redundant human kinase structures (90% identity clusters)." → rcsb_search_fulltext / rcsb_search_combined with group_by_identity=90
- "How many human X-ray structures are there?" → rcsb_search_count
- "Break down ribosome structures by experimental method / by release year." → rcsb_search_facets
- "Find structures with the same catalytic-site geometry as residues 162/193/219 of 2MNR." → rcsb_search_strucmotif
- "Find chemical components under 150 Da." → rcsb_list_pdb_search_attributes(schema="chemical") + rcsb_search_by_attribute with chemical=True
- "Summarize PDB entries 4HHB, 1MBN and 6VXX." → rcsb_get_entries
- "What's the sequence and organism of entity 4HHB_1?" → rcsb_get_polymer_entities
- "Tell me about the ligand HEM." → rcsb_get_chem_comps
- "What's the composition of the 4HHB biological assembly?" → rcsb_get_assemblies
- "Which PDB entries does P69905 map to?" → rcsb_get_uniprot
- "Which PDB entities align to UniProt P69905, and over what ranges?" → rcsb_seqcoord_alignments
- "What NCBI proteins map to 4HHB?" → rcsb_seqcoord_alignments per entity (4HHB_1, 4HHB_2), to_ref=NCBI_PROTEIN
- "Show UniProt features mapped onto PDB entity 4HHB_1." → rcsb_seqcoord_annotations
- "Pull a field GraphQL doesn't expose by default / combine objects." → rcsb_data_graphql
Notes
- Search endpoint: https://search.rcsb.org/rcsbsearch/v2/query (POST, JSON body).
- Data endpoint: https://data.rcsb.org/graphql (POST, GraphQL). It returns HTTP 200 even for query errors, reporting them in an errors array.
- Sequence Coordinates endpoint: https://sequence-coordinates.rcsb.org/graphql (POST, GraphQL; same HTTP-200-with-errors behavior).
- The rcsb_find_* resolvers map free text to ontology ids via EBI services, which are the non-RCSB dependencies: GO via QuickGO (.../QuickGO/services/ontology/go/search), InterPro (.../interpro/api/entry/interpro/), EC via EBI Search/IntEnz (.../ebisearch/ws/rest/intenz), and disease via OLS/MONDO (.../ols4/api/search?ontology=mondo). The resolved ids then drive RCSB annotation searches (rcsb_polymer_entity_annotation.*, rcsb_polymer_entity.rcsb_ec_lineage.id, rcsb_uniprot_annotation.annotation_lineage.id).
- No API key required; the APIs are public. Be considerate with request volume.
- A full list of searchable attributes for rcsb_search_by_attribute is in the Search API attribute reference; the Data API schema is documented at data.rcsb.org/index.html#gql-api.
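Because both GraphQL endpoints answer HTTP 200 even when a query fails, a client that calls them directly cannot rely on the status code alone. The sketch below shows one way to surface the errors array from an already-decoded JSON response; unwrap_graphql and GraphQLError are hypothetical names, and this is not presented as rcsb-mcp's own error handling.

```python
from typing import Any, Dict


class GraphQLError(RuntimeError):
    """Raised when a GraphQL response carries an errors array."""


def unwrap_graphql(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the data object, or raise if the response reports errors."""
    errors = payload.get("errors")
    if errors:
        # Each error is normally an object with a "message" field.
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise GraphQLError(messages)
    return payload.get("data") or {}


# Made-up payloads, standing in for decoded HTTP 200 responses.
ok = unwrap_graphql({"data": {"entry": {"rcsb_id": "4HHB"}}})
print(ok)

try:
    unwrap_graphql({"errors": [{"message": "Validation error"}]})
except GraphQLError as exc:
    print("query failed:", exc)
```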
Instructions prompt
Use prompts/pdb-assistant.md as the instruction prompt for your project.