
Python package interface for RCSB.org API services

Project description


rcsb-api

Python interface for RCSB PDB API services at RCSB.org.

This package requires Python 3.8 or later.

Installation

Get it from PyPI:

pip install rcsb-api

Or, download it from GitHub.
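
To confirm that the install succeeded, you can ask Python which version of the distribution it sees. This is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is hypothetical and not part of rcsb-api.

```python
from importlib import metadata


def installed_version(dist_name="rcsb-api"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version if the package is installed, otherwise a short notice.
print("rcsb-api version:", installed_version() or "not installed")
```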

Getting Started

Full documentation is available at readthedocs.

The RCSB PDB Search API supports RESTful requests according to a defined schema. This package provides an rcsbapi.search module that simplifies generating complex search queries.

The RCSB PDB Data API supports requests using GraphQL, a language for API queries. This package provides an rcsbapi.data module that simplifies generating queries in GraphQL syntax.

Search API

The rcsbapi.search module supports all available Advanced Search services, as listed below. For more details on their usage, see Search Service Types.

Search service                       QueryType
Full-text                            TextQuery()
Attribute (structure or chemical)    AttributeQuery()
Sequence similarity                  SeqSimilarityQuery()
Sequence motif                       SeqMotifQuery()
Structure similarity                 StructSimilarityQuery()
Structure motif                      StructMotifQuery()
Chemical similarity                  ChemSimilarityQuery()

Search API Examples

To perform a search for all structures from humans associated with the term "Hemoglobin", you can combine a "full-text" query (TextQuery) with an "attribute" query (AttributeQuery):

from rcsbapi.search import AttributeQuery, TextQuery
from rcsbapi.search import search_attributes as attrs

# Construct a "full-text" sub-query for structures associated with the term "Hemoglobin"
q1 = TextQuery(value="Hemoglobin")

# Construct an "attribute" sub-query to search for structures from humans
q2 = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # Other operators include "contains_phrase", "exists", and more
    value="Homo sapiens"
)
# OR, do so by using Python comparison operators:
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"

# Combine the sub-queries (can sub-group using parentheses and standard operators, "&", "|", etc.)
query = q1 & q2

# Fetch the results by iterating over the query execution
for rId in query():
    print(rId)

# OR, capture them into a variable
results = list(query())

These examples are in operator syntax. You can also make queries in fluent syntax. Learn more about both syntaxes and implementation details in Query Syntax and Execution.
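
For intuition about what the combined query stands for, the sketch below builds by hand a JSON request body of the kind the Search API accepts. It does not use rcsbapi. The field names ("terminal", "group", "full_text", "text", "logical_operator", "return_type") reflect my understanding of the public Search API schema and are assumptions that should be checked against the schema documentation. The helpers terminal and group are hypothetical, and the body the package actually sends may contain additional fields.

```python
import json


def terminal(service, **parameters):
    """A leaf node: one search service plus its parameters (assumed schema)."""
    return {"type": "terminal", "service": service, "parameters": parameters}


def group(logical_operator, *nodes):
    """A group node that combines child nodes with "and" or "or" (assumed schema)."""
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


# Counterpart of TextQuery(value="Hemoglobin")
q1 = terminal("full_text", value="Hemoglobin")

# Counterpart of the AttributeQuery on the source organism
q2 = terminal(
    "text",
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",
    value="Homo sapiens",
)

# Counterpart of `q1 & q2`, asking for entry-level identifiers
request_body = {"query": group("and", q1, q2), "return_type": "entry"}
print(json.dumps(request_body, indent=2))
```

Writing such bodies by hand becomes tedious as groups nest, which is the work the operator syntax (q1 & q2) is meant to save.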

Data API

The rcsbapi.data module allows you to easily construct GraphQL queries to the RCSB.org Data API.

This is done by specifying the following input:

  • "input_type": the data hierarchy level you are starting from (e.g., "entry" or "polymer_entity"). (See full list here.)
  • "input_ids": the list of IDs for which to fetch data (corresponding to the specified "input_type")
  • "return_data_list": the list of data items ("fields") to retrieve. (Available fields can be explored here or via the GraphiQL editor's Documentation Explorer panel.)
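
To make the three inputs concrete, here is a rough sketch of how they could map onto GraphQL query text. It is an illustration, not the package's implementation: the helpers build_selection, render, and build_query are hypothetical, the argument name "entry_ids" is an assumption about the Data API schema, and the sketch expects fully spelled-out dot paths rather than the path completion the package offers.

```python
import json


def build_selection(paths):
    """Nest dot-separated paths into a dict tree, e.g. 'exptl.method' -> {'exptl': {'method': {}}}."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


def render(tree):
    """Render the dict tree as a GraphQL selection set (without the outer braces)."""
    parts = []
    for name, children in tree.items():
        parts.append(f"{name} {{ {render(children)} }}" if children else name)
    return " ".join(parts)


def build_query(root_field, id_argument, ids, paths):
    """Assemble the query text from a root field, an ID list, and the requested paths."""
    selection = render(build_selection(paths))
    return f"{{ {root_field}({id_argument}: {json.dumps(ids)}) {{ {selection} }} }}"


query_text = build_query("entries", "entry_ids", ["4HHB"], ["rcsb_id", "exptl.method"])
print(query_text)
# prints: { entries(entry_ids: ["4HHB"]) { rcsb_id exptl { method } } }
```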

Data API Examples

This is a simple query requesting the experimental method of a structure with PDB ID 4HHB (Hemoglobin).

The query must be executed using the .exec() method, which returns the JSON response and also stores it as an attribute of the DataQuery object. From the object, you can access the Data API response, get an interactive editor link, or access the arguments used to create the query. The package can automatically build queries from the "input_type" and the path segments passed into "return_data_list". If you use this package in code intended for long-term use, fully qualified paths are recommended. When autocompletion of a path is used, a WARNING message is printed as a reminder.

from rcsbapi.data import DataQuery as Query
query = Query(
    input_type="entries",
    input_ids=["4HHB"],
    return_data_list=["exptl.method"]
)
print(query.exec())

Data is returned in JSON format:

{
  "data": {
    "entries": [
      {
        "rcsb_id": "4HHB",
        "exptl": [
          {
            "method": "X-RAY DIFFRACTION"
          }
        ]
      }
    ]
  }
}
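
Once you have the response, pulling values out is ordinary dictionary and list traversal. The sketch below works on a copy of the response shown above and assumes the response is already a parsed Python dict; if you have a JSON string instead, call json.loads on it first. The helper name experimental_methods is hypothetical.

```python
# A copy of the response shown above, as a Python dict
response = {
    "data": {
        "entries": [
            {
                "rcsb_id": "4HHB",
                "exptl": [{"method": "X-RAY DIFFRACTION"}],
            }
        ]
    }
}


def experimental_methods(response):
    """Map each entry ID to the list of experimental methods reported for it."""
    return {
        entry["rcsb_id"]: [item["method"] for item in entry["exptl"]]
        for entry in response["data"]["entries"]
    }


print(experimental_methods(response))
# prints: {'4HHB': ['X-RAY DIFFRACTION']}
```

"exptl" is a list because an entry can report more than one experimental method, so the helper keeps a list per ID.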

Here is a more complex query. Note that periods can be used to further specify the requested data in return_data_list. Also note that multiple return data items and IDs can be requested in one query.

from rcsbapi.data import DataQuery as Query
query = Query(
    input_type="polymer_entities",
    input_ids=["2CPK_1", "3WHM_1", "2D5Z_1"],
    return_data_list=[
        "polymer_entities.rcsb_id",
        "rcsb_entity_source_organism.ncbi_taxonomy_id",
        "rcsb_entity_source_organism.ncbi_scientific_name",
        "cluster_id",
        "identity"
    ]
)
print(query.exec())

Jupyter Notebooks

Several Jupyter notebooks with example use cases and workflows for all package modules are provided under notebooks.

For example, a notebook that uses both the Search and Data API modules in a COVID-19 related workflow is available in notebooks/search_data_workflow.ipynb or online through Google Colab.

Citing

Please cite the rcsb-api package by URL:

https://rcsbapi.readthedocs.io

You should also cite the RCSB.org API services this package utilizes:

Yana Rose, Jose M. Duarte, Robert Lowe, Joan Segura, Chunxiao Bi, Charmi Bhikadiya, Li Chen, Alexander S. Rose, Sebastian Bittrich, Stephen K. Burley, John D. Westbrook. RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive, Journal of Molecular Biology, 2020. DOI: 10.1016/j.jmb.2020.11.003

Documentation and Support

Please refer to the readthedocs page to learn more about package usage and other available features as well as to see more examples.

If you experience any issues installing or using the package, please submit an issue on GitHub and we will try to respond in a timely manner.
