Python SDK for interacting with the QDX Rush API and modules
rush-py
Quickstart
This document walks through executing jobs on the Rush platform by demonstrating how to prepare a protein. For a comprehensive guide to the concepts and to constructing a full workflow, see the full rush-py explainer document.
First, install the following modules via pip (Python ≥ 3.9 is required):
pip install rush-py pdb-tools
0) Code Sample
See the sections below for a detailed breakdown of each step.
# Get a pdb to work with - we use the pdb-tools cli here
# but you can download directly from rcsb.org
!pdb_fetch '1b39' | pdb_selchain -A | pdb_delhetatm > '1B39_A_nohet.pdb'
# ...import the dependencies and set your configuration
import os
from pathlib import Path
import rush
# YOUR_TOKEN is a placeholder for the API token from the Rush UI
os.environ["RUSH_TOKEN"] = YOUR_TOKEN
# 1.3 Build your client
client = rush.build_blocking_provider_with_functions()
# 2.1 Prepare the protein
prepared_protein_qdxf, prepared_protein_pdb = client.prepare_protein(
    Path("1B39_A_nohet.pdb"), tags=["example_prep"]
)
# 2.3 Return run values
print(prepared_protein_qdxf.download(overwrite=True).open().read()[0:50], "...")
2024-03-21 14:31:29,560 - rush - INFO - Restoring by default via env
2024-03-21 14:31:30,540 - rush - INFO - Trying to restore job with tags: ['example_prep'] and path: github:talo/prepare_protein/947cdbc000031e192153a20a9b4a8fbb12279102#prepare_protein_tengu
2024-03-21 14:31:30,586 - rush - INFO - Restoring job from previous run with id ea02e2b4-06b1-4576-a1f9-0ecec22e537b
[{"amino_acid_insertion_codes": ["", "", "", "", " ...
1) Setup
This is where we prepare the rush client, directories, and input data we’ll be working with.
1.0) Imports
import json
import os
from pathlib import Path
from pdbtools import pdb_delhetatm, pdb_fetch, pdb_selchain
import rush
1.1) Credentials
Retrieve your API token from the Rush UI.
You can either set the RUSH_URL and RUSH_TOKEN environment variables or provide them as variables to the client directly.
To see how to set environment variables, Wikipedia has an extensive article.
os.environ["RUSH_TOKEN"] = YOUR_TOKEN
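As a small sketch of the environment-variable route, the helper below reads a variable and fails with a clear message if it is unset or empty. `require_env` is a hypothetical name for illustration and is not part of rush-py; the only things taken from this document are the variable names RUSH_TOKEN and RUSH_URL.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of rush-py.
    """
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it or pass the value to the client directly"
        )
    return value


# An explicit mapping is used here so the snippet runs without real credentials.
fake_env = {"RUSH_TOKEN": "example-token"}
token = require_env("RUSH_TOKEN", fake_env)
print(token)
```

Calling `require_env("RUSH_TOKEN")` with no mapping reads from `os.environ`, so a missing token is reported before any client is built.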
1.2) Configuration
Let's set some global variables that define our project. These are not required, but they are good practice for organizing the jobs that will be persisted under your account.
Make sure you create a unique set of tags for each run. It is good practice to include at least the experiment name and the system name as tags.
EXPERIMENT = "rush-py-quickstart"
SYSTEM = "1B39"
TAGS = ["qdx", EXPERIMENT, SYSTEM]
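One way to keep tag sets consistent across runs is to build them with a small function rather than by hand. The sketch below is only an illustration: `build_tags` and the `run-001` label are hypothetical and not part of rush-py.

```python
from typing import Iterable, List


def build_tags(experiment: str, system: str, extra: Iterable[str] = ()) -> List[str]:
    """Combine experiment and system names with optional extra tags.

    Blank entries and duplicates are dropped; the original order is kept.
    Hypothetical helper for illustration; not part of rush-py.
    """
    tags: List[str] = []
    for tag in (experiment, system, *extra):
        tag = tag.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


# "run-001" is a made-up per-run label that keeps this tag set unique.
example_tags = build_tags("rush-py-quickstart", "1B39", extra=["qdx", "run-001"])
print(example_tags)
```

Changing only the per-run label between runs keeps the experiment and system tags stable while still giving each run a unique set.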
1.3) Build your client
Get our client, which we’ll use for calling modules and generally for using the Rush API.
As mentioned earlier, url and access_token are optional if you have set the environment variables RUSH_URL and RUSH_TOKEN respectively.
batch_tags will be applied to each run that is spawned by this client.
A folder called .rush will be created in your workspace directory (defaults to the current working directory; can be overridden by passing workspace= to the provider builder).
# By using the `build_blocking_provider_with_functions` method,
# we will also build helper functions calling each module
client = rush.build_blocking_provider_with_functions(batch_tags=TAGS)
2024-03-21 14:31:32,815 - rush - INFO - Restoring by default via env
1.4) Input selection
Fetch a pdb from RCSB, stripping hetatoms and selecting a single chain to pass as input to the modules:
PROTEIN_PDB_PATH = client.workspace / f"{SYSTEM}_P.pdb"
# fetch the full structure, keep chain A only, then drop HETATM records
structure = list(pdb_fetch.fetch_structure(SYSTEM))
protein = pdb_delhetatm.remove_hetatm(pdb_selchain.select_chain(structure, "A"))
with open(PROTEIN_PDB_PATH, "w") as f:
    for line in protein:
        f.write(str(line))
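To make the filtering step concrete, here is a simplified, standard-library-only sketch of what selecting a chain and stripping hetero atoms amounts to. It is not the pdb-tools implementation. It relies only on the fixed-column PDB layout (record name in columns 1-6, chain ID in column 22), and `make_record` and `keep_chain_atoms` are hypothetical names that generate and filter illustrative records.

```python
from typing import Iterable, Iterator


def make_record(record: str, serial: int, atom: str, res: str, chain: str, resseq: int) -> str:
    """Build the first 26 columns of a PDB coordinate record (illustrative data only)."""
    return f"{record:<6}{serial:>5} {atom:^4} {res:>3} {chain}{resseq:>4}\n"


def keep_chain_atoms(lines: Iterable[str], chain: str) -> Iterator[str]:
    """Drop HETATM records and coordinate records that belong to other chains."""
    for line in lines:
        record = line[:6].strip()
        if record == "HETATM":
            continue  # hetero atoms (waters, ligands, ions) are removed
        if record in ("ATOM", "ANISOU", "TER") and line[21:22] != chain:
            continue  # column 22 (index 21) holds the chain identifier
        yield line


sample = [
    "REMARK illustrative records only\n",
    make_record("ATOM", 1, "N", "MET", "A", 1),
    make_record("ATOM", 2, "N", "MET", "B", 1),
    make_record("HETATM", 3, "O", "HOH", "A", 301),
    "END\n",
]
kept = list(keep_chain_atoms(sample, "A"))
print("".join(kept), end="")
```

Non-coordinate lines such as REMARK and END pass through unchanged in this sketch; only the chain B atom and the water are removed.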
2) Running Rush Modules
You can view which modules are available, alongside their documentation, in the API Documentation.
2.0) Prep the protein
First we will run the protein preparation routine (using pdbfixer and pdb2pqr internally) to prepare the protein for a molecular dynamics simulation.
# we can check the arguments and outputs for prepare_protein with help()
help(client.prepare_protein)
Help on function prepare_protein in module rush.provider:
prepare_protein(*args: *tuple[RushObject[bytes]], target: 'Target | None' = None, resources: 'Resources | None' = None, tags: 'list[str] | None' = None, restore: 'bool | None' = None) -> tuple[RushObject[list[Record]], RushObject[bytes]]
Prepare a PDB for downstream tasks: protonate, fill missing atoms, etc.
Module version:
`github:talo/prepare_protein/947cdbc000031e192153a20a9b4a8fbb12279102#prepare_protein_tengu`
QDX Type Description:
input_pdb: Object[@$Bytes]
->
output_qdxf: Object[[Conformer]];
output_pdb: Object[@$Bytes]
:param input_pdb: An input protein as a file; one PDB file
:return output_qdxf: An output protein a vec: one qdxf per model in pdb
:return output_pdb: An output protein as a file: one PDB file
# Here we run the function; it returns a Provider.Arg for each output, which you
# can use to fetch the results.
# We do not pass restore here; as the log below shows, restoring is enabled by
# default via the environment, so a previous run with the same path and the
# same tags is reused. Pass restore=True to request this explicitly.
prepared_protein_qdxf, prepared_protein_pdb = client.prepare_protein(
    PROTEIN_PDB_PATH,
)
# This initially only has the id of your result; we will show how to fetch the
# actual value later
prepared_protein_qdxf
2024-03-21 14:31:36,354 - rush - INFO - Trying to restore job with tags: ['qdx', 'rush-py-quickstart', '1B39'] and path: github:talo/prepare_protein/947cdbc000031e192153a20a9b4a8fbb12279102#prepare_protein_tengu
2024-03-21 14:31:36,400 - rush - INFO - Restoring job from previous run with id d746634a-8fe8-437d-b468-4bb66f5f4a12
Arg(id=127ac5f6-1227-49f4-ad2b-45a08e6c64ca, value=None)
2.1) Run statuses
This will show the status of all of your runs. You can also view run statuses on the Rush UI.
client.status()
{}
2.2) Run Values
This will return the “value” of the output from the function. For files you will receive a URL that you can download from; otherwise you will receive the values as Python types:
protein_qdxf_info = prepared_protein_qdxf.get()
protein_qdxf_info
Blocking get
'https://storage.googleapis.com/rush_store_default/af08031b-e871-45e2-a226-e8c7e1fd5719?x-goog-signature=0864503418ee439f8b34e2461b06c15c4e83be22a72b7acd977ed41c02da00915c749bd4f9145b4cec3553fed283ffee660f20b6f418df99ad4bad1c34f8edcdd1e337da2021ef8e0bfae9c8a7bc0b85729c605765e9512a2623f3dacdcaf079bf416a946881873a87f7fc17e3b54fe8651837aa2b47208ac9b9b42d5d8854d2214e2c7002f89d8b82a0ab3317da32aa5030a48590eda2e870bf23388ad4a77ce4a9c1602a1790248439ea8ceac3291824978332266fc39d548822b2f1dc93eb1ddbbcd326c312feac5bb24345cf0d4193657ea1d1e3bec3cb07fc858b924108aaea74415e12c861a355335ea8bc6507834bf42395d9e52c75846986b395ddd3&x-goog-algorithm=GOOG4-RSA-SHA256&x-goog-credential=qdx-store-user%40humming-bird-321603.iam.gserviceaccount.com%2F20240321%2Fasia-southeast1%2Fstorage%2Fgoog4_request&x-goog-date=20240321T063139Z&x-goog-expires=3600&x-goog-signedheaders=host'
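The URL returned above is a signed, time-limited link: its x-goog-date and x-goog-expires query parameters give the issue time and the lifetime in seconds (3600 here). The sketch below, using only the standard library, computes when such a link stops working. `signed_url_expiry` is a hypothetical helper, and the example URL is a shortened stand-in with a made-up bucket and object name and no signature.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a URL carrying x-goog-date / x-goog-expires expires."""
    raw = parse_qs(urlsplit(url).query)
    query = {key.lower(): values[0] for key, values in raw.items()}
    issued = datetime.strptime(query["x-goog-date"], "%Y%m%dT%H%M%SZ")
    issued = issued.replace(tzinfo=timezone.utc)
    return issued + timedelta(seconds=int(query["x-goog-expires"]))


# Stand-in URL: same date and expiry parameters as the output above.
example_url = (
    "https://storage.googleapis.com/example-bucket/example-object"
    "?x-goog-date=20240321T063139Z&x-goog-expires=3600"
)
expiry = signed_url_expiry(example_url)
print(expiry.isoformat())
```

Because the link expires, it is probably safer to download the file promptly than to store the URL for later use.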
2.3) Downloads
We provide a utility to download files into your workspace. You can either provide a filename, in which case the file will be saved in workspace/objects/[filename], or provide your own filepath, which the client will use as-is:
protein_qdxf_file = prepared_protein_qdxf.download(overwrite=True)
# qdxf files can be loaded as json
with open(protein_qdxf_file) as f:
    protein_qdxf_data = json.load(f)[0]
protein_qdxf_data["amino_acid_seq"][:10]
['MET', 'GLU', 'ASN', 'PHE', 'GLN', 'LYS', 'VAL', 'GLU', 'LYS', 'ILE']
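Since the loaded qdxf data is plain JSON, ordinary Python tools work on it. As a minimal sketch, the snippet below converts the residue names shown above into a one-letter sequence and counts them. The `sample_conformer` dictionary is a stand-in holding only the `amino_acid_seq` key seen in the output; the rest of the qdxf schema is not assumed.

```python
from collections import Counter

# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Stand-in for protein_qdxf_data, limited to the key shown in the output above.
sample_conformer = {
    "amino_acid_seq": ["MET", "GLU", "ASN", "PHE", "GLN", "LYS", "VAL", "GLU", "LYS", "ILE"],
}

residues = sample_conformer["amino_acid_seq"]
# Residue names outside the table (for example nonstandard residues) map to "X".
one_letter = "".join(THREE_TO_ONE.get(res, "X") for res in residues)
counts = Counter(residues)

print(one_letter)
print(counts.most_common(2))
```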
prepared_protein_pdb.download(filename="01_prepared_protein.pdb", overwrite=True)
PosixPath('/home/machineer/qdx/rush-py-quickstart/objects/01_prepared_protein.pdb')
# we can read our prepared protein pdb like this
with open(client.workspace / "objects" / "01_prepared_protein.pdb", "r") as f:
    print(f.readline(), "...")
REMARK 1 CREATED WITH OPENMM 8.0, 2024-02-29
...