
Python SDK for interacting with the QDX Tengu API and modules


tengu-py

Below we’ll walk through the process of building and running a drug discovery workflow using tengu!

First, install the following packages via pip (we require Python > 3.10):

pip install tengu-py pdb-tools

Then import everything used in this walkthrough:

import json
import os
import sys
import tarfile

from pdbtools import *
import requests
from datetime import datetime
from pathlib import Path

import tengu

0) Setup

# Set our token - ensure you have exported TENGU_TOKEN in your shell; or just replace the os.getenv with your token
TOKEN = os.getenv("TENGU_TOKEN")
# Define our project information
DESCRIPTION = "tengu-py demo notebook"
TAGS = ["qdx", "tengu-py", "demo"]
WORK_DIR = Path.home() / "qdx" / "tengu-py-demo"
OUT_DIR = WORK_DIR / "runs"
OUT_DIR.mkdir(parents=True, exist_ok=True)

# Set our inputs
SYSTEM_PDB_PATH = WORK_DIR / "test.pdb"
PROTEIN_PDB_PATH = WORK_DIR / "test_P.pdb"
LIGAND_SMILES_STR = "CCCc1ccccc1O"
LIGAND_PDB_PATH = WORK_DIR / "test_L.pdb"
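# Fetch the data files
# fetch_structure yields lines lazily, so materialise them once for reuse below
complex_lines = list(pdb_fetch.fetch_structure("3HTB"))
protein = pdb_delhetatm.remove_hetatm(pdb_selchain.select_chain(complex_lines, "A"))
# JZ4 is a residue name, so select it by name rather than by residue number
ligand = pdb_selresname.filter_residue_by_name(complex_lines, "JZ4")
with open(SYSTEM_PDB_PATH, "w") as f:
    f.writelines(str(line) for line in complex_lines)
with open(PROTEIN_PDB_PATH, "w") as f:
    f.writelines(str(line) for line in protein)
with open(LIGAND_PDB_PATH, "w") as f:
    f.writelines(str(line) for line in ligand)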
# Get our client, for calling modules and using the tengu API
client = tengu.Provider(access_token=TOKEN)
# Get our latest modules as a dict[module_name, module_path]
modules = client.get_latest_module_paths()

In this dict:
  • module_name is a descriptive string that indicates the “function” the module performs;
  • module_path is a versioned tengu “endpoint” for a module, accessible via the client.

Using the same module_path string across multiple runs provides reproducibility.

The following is an example of how to save and load frozen modules:

frozen_modules_filepath = client.save_module_paths(modules)
frozen_modules = client.load_module_paths(frozen_modules_filepath)
assert modules == frozen_modules

Alternatively, save the module paths once and pass that fixed filename to load_module_paths in later runs:

FROZEN_MODULES_FILEPATH = 'tengu-modules-20231006T132244.json'
frozen_modules = client.load_module_paths(FROZEN_MODULES_FILEPATH)
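
To make the idea of freezing concrete, here is a minimal sketch of a save/load round trip for a name-to-path mapping using only the standard library. The helper names (save_paths, load_paths), the file naming, and the placeholder values are hypothetical; this is not how tengu-py stores its files, only an illustration of why reloading a saved mapping gives the same module paths on every run.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def save_paths(paths: dict[str, str], directory: Path) -> Path:
    """Write a name -> path mapping to a timestamped JSON file (hypothetical helper)."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    out = directory / f"modules-{stamp}.json"
    out.write_text(json.dumps(paths, indent=2, sort_keys=True))
    return out


def load_paths(filepath: Path) -> dict[str, str]:
    """Read the mapping back exactly as it was saved."""
    return json.loads(Path(filepath).read_text())


# Placeholder values, not real module paths.
example = {"module_a": "placeholder/path/a", "module_b": "placeholder/path/b"}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_paths(example, Path(tmp))
    round_tripped = load_paths(saved)
```

Because the file pins every module_path, a later run that loads it does not depend on whatever get_latest_module_paths() returns that day.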

Below we’ll call modules using client.run2(...).

The parameters to client.run2() are as follows:
  • module_path: The endpoint of the module we’ll be running;
  • args: A list of the arguments to the module; an argument can be one of the following:
      1. A pathlib.Path or a file-like object such as BufferedReader, FileIO, or StringIO:
         Loads the data in the file as an argument.
         NOTE: The uploaded value isn’t just the string of the file, so don’t pass the string directly; pass the path or wrap it in StringIO.
      2. A tengu.ArgId:
         Uses an object already uploaded to tengu, such as an output of another run call.
         See below for more details; it’s easier to understand with an example.
      3. A parameter, i.e. a value of any other type, including None:
         Tengu modules take configs as JSON in the backend; we’ll convert for you.
         Just pass arguments directly, as per the schema for the module you’re running.
  • target: The machine we want to run on (NIX_SSH for a cluster, GADI for a supercomputer);
  • resources: The resources to use on the target;
  • tags: Tags to associate with our run, so we can easily look up our runs.
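
The three kinds of argument can be illustrated with a small sketch. It assumes the distinction is made on the Python type of each argument, as the description above suggests; the real client logic may differ. FakeArgId and arg_kind are hypothetical names used only so the example runs without the SDK.

```python
import io
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FakeArgId:
    """Stand-in for tengu.ArgId, used only so this sketch runs without the SDK."""

    value: str


def arg_kind(arg) -> str:
    """Return which of the three kinds described above an argument falls into."""
    if isinstance(arg, (Path, io.IOBase)):
        return "file"  # the file's contents are loaded as the argument
    if isinstance(arg, FakeArgId):
        return "arg_id"  # refers to an object that was already uploaded
    return "parameter"  # any other value, including None


pdb_text = "ATOM      1  N   MET A   1\n"
kinds = [
    arg_kind(Path("test_P.pdb")),      # a path
    arg_kind(io.StringIO(pdb_text)),   # in-memory text wrapped as a file-like object
    arg_kind(FakeArgId("00000000-0000-0000-0000-000000000000")),
    arg_kind(pdb_text),                # a bare string is a parameter, not a file
    arg_kind({"thoroughness": 3}),
    arg_kind(None),
]
```

The fourth entry is the case the NOTE warns about: file contents passed as a plain string fall into the parameter kind, which is why in-memory data should be wrapped in StringIO.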

The return value is a dict that contains:
  • key "module_instance_id": a ModuleInstanceId for the run itself;
  • key "output_ids": a list of ArgIds, one for each output.

Both of these ID types have the form of a UUID. An ArgId lets you manipulate the output of a module without having to:
  1. Wait for the module to finish its computation, or
  2. Download the actual value corresponding to this output.

You can pass an ArgId to subsequent modules as if it were the value itself, or you can wait on it to obtain the value itself.

A coming improvement will provide explicit naming and type info for the inputs and outputs of each module, which will improve clarity and discoverability.

1.1) Prep the protein

pdb2pqr_result = client.run2(
    modules["pdb2pqr_tengu"],
    [
        PROTEIN_PDB_PATH,
    ],
    target="NIX_SSH",
    resources={"gpus": 1, "storage": 1_024_000_000, "walltime": 15},
    tags=TAGS,
)
pdb2pqr_run_id = pdb2pqr_result["module_instance_id"]
prepped_protein_id = pdb2pqr_result["output_ids"][0]
print(f"{datetime.now().time()} | Running protein prep!")
with open(OUT_DIR / f"01-pdb2pqr-{pdb2pqr_run_id}.json", "w") as f:
    json.dump(pdb2pqr_result, f, default=str, indent=2)
client.poll_module_instance(pdb2pqr_run_id)
client.download_object(prepped_protein_id, OUT_DIR / "01-prepped-protein.pdb")
print(f"{datetime.now().time()} | Downloaded prepped protein!")
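
The submit, record, poll, download sequence used for protein prep recurs in every step of this workflow, so it may be convenient to wrap it. The sketch below is a hypothetical helper, not part of tengu-py; it only calls the client methods that appear in this walkthrough (run2, poll_module_instance, download_object) and assumes they behave as shown here. The StubClient and all of its return values are placeholders that exist solely so the example runs offline.

```python
import json
import tempfile
from pathlib import Path


def run_and_download(client, module_path, args, meta_path, out_path, **run_kwargs):
    """Submit a module, record the run metadata, wait, then download output 0."""
    result = client.run2(module_path, args, **run_kwargs)
    run_id = result["module_instance_id"]
    first_output_id = result["output_ids"][0]
    # Save the IDs before waiting, so the run can be found again if polling is interrupted.
    Path(meta_path).write_text(json.dumps(result, default=str, indent=2))
    client.poll_module_instance(run_id)
    client.download_object(first_output_id, out_path)
    return result


class StubClient:
    """Offline stand-in for the real client; every value here is a placeholder."""

    def __init__(self):
        self.calls = []

    def run2(self, module_path, args, **kwargs):
        self.calls.append("run2")
        return {"module_instance_id": "run-0", "output_ids": ["out-0"]}

    def poll_module_instance(self, run_id):
        self.calls.append("poll")

    def download_object(self, object_id, dest):
        self.calls.append("download")
        Path(dest).write_text("placeholder")


stub = StubClient()
with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "01-example-run.json"
    result = run_and_download(
        stub,
        "placeholder/module/path",
        ["some-arg"],
        meta_path,
        Path(tmp) / "01-example-output.txt",
        target="NIX_SSH",
        tags=["demo"],
    )
    recorded = json.loads(meta_path.read_text())
```

Steps that need more than the first output, or custom polling settings, would still be written out by hand as in the rest of this walkthrough.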

1.2) Prep the ligand

ligand_prep_config = {
    "source": "",
    "output_folder": "./",
    "job_manager": "multiprocessing",
    "num_processors": -1,
    "max_variants_per_compound": 1,
    "thoroughness": 3,
    "separate_output_files": True,
    "min_ph": 6.4,
    "max_ph": 8.4,
    "pka_precision": 1.0,
    "skip_optimize_geometry": True,
    "skip_alternate_ring_conformations": True,
    "skip_adding_hydrogen": False,
    "skip_making_tautomers": True,
    "skip_enumerate_chiral_mol": True,
    "skip_enumerate_double_bonds": True,
    "let_tautomers_change_chirality": False,
    "use_durrant_lab_filters": True,
}
ligand_prep_result = client.run2(
    modules["prepare_ligand_tengu"],
    [
        LIGAND_SMILES_STR,
        LIGAND_PDB_PATH,
        ligand_prep_config,
    ],
    target="NIX_SSH",
    resources={"gpus": 1, "storage": 16_000_000, "walltime": 5},
    tags=TAGS,
)
ligand_prep_run_id = ligand_prep_result["module_instance_id"]
prepped_ligand_id = ligand_prep_result["output_ids"][0]
print(f"{datetime.now().time()} | Running ligand prep!")
with open(OUT_DIR / f"01-prepare-ligand-{ligand_prep_run_id}.json", "w") as f:
    json.dump(ligand_prep_result, f, default=str, indent=2)
client.poll_module_instance(ligand_prep_run_id)
client.download_object(prepped_ligand_id, OUT_DIR / "01-prepped-ligand.pdb")
print(f"{datetime.now().time()} | Downloaded prepped ligand!")

2) Run GROMACS (module: gmx_tengu / gmx_tengu_pdb)

gmx_config = {
    "param_overrides": {
        "md": [("nsteps", "5000")],
        "em": [("nsteps", "1000")],
        "nvt": [("nsteps", "1000")],
        "npt": [("nsteps", "1000")],
        "ions": [],
    },
    "num_gpus": 4,
    "num_replicas": 1,
    "ligand_charge": None,
    "frame_sel": {
        "begin_time": 2,
        "end_time": 10,
        "delta_time": 2,
    },
}
gmx_result = client.run2(
    # TODO: Should be using qdxf conformer versions of these modules
    modules["gmx_tengu_pdb"],
    [
        prepped_protein_id,
        prepped_ligand_id,
        gmx_config,
    ],
    target="GADI",
    resources={"gpus": 4, "storage": 1_024_000_000, "cpus": 48, "walltime": 60},
    tags=TAGS,
)
gmx_run_id = gmx_result["module_instance_id"]
gmx_output_id = gmx_result["output_ids"][0]
gmx_ligand_gro_id = gmx_result["output_ids"][3]
print(f"{datetime.now().time()} | Running GROMACS simulation!")
with open(OUT_DIR / f"02-gmx-{gmx_run_id}.json", "w") as f:
    json.dump(gmx_result, f, default=str, indent=2)
client.poll_module_instance(gmx_run_id, n_retries=60, poll_rate=60)
client.download_object(gmx_output_id, OUT_DIR / "02-gmx-output.zip")
# Get the "dry" (i.e. non-solvated) frames we asked for
with tarfile.open(OUT_DIR / "02-gmx-output.zip", "r") as tf:
    selected_frame_pdbs = [
        tf.extractfile(member)
        for member in sorted(tf, key=lambda m: m.name)
        if ("dry" in member.name and "pdb" in member.name)
    ]
client.download_object(gmx_ligand_gro_id, OUT_DIR / "02-gmx-ligand.gro")
print(f"{datetime.now().time()} | Downloaded GROMACS output!")
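
The frame selection above filters archive members by name. As a standalone illustration of that technique, the sketch below builds a tiny in-memory archive and picks out the members whose names contain both "dry" and "pdb". The file names and the helper read_matching_members are made up for the demonstration. One thing to keep in mind with the standard library: file objects returned by extractfile() read from the archive itself, so this sketch reads the bytes while the archive is still open.

```python
import io
import tarfile


def read_matching_members(archive, substrings):
    """Return {member name: bytes} for regular files whose name contains every substring."""
    contents = {}
    for member in sorted(archive.getmembers(), key=lambda m: m.name):
        if member.isfile() and all(s in member.name for s in substrings):
            # Read now: the data is only reachable while the archive is open.
            contents[member.name] = archive.extractfile(member).read()
    return contents


# Build a tiny in-memory archive with made-up file names for demonstration.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tf:
    for name in ["frame2_dry.pdb", "frame1_dry.pdb", "frame1_wet.pdb", "log.txt"]:
        data = name.encode()
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tf:
    dry_frames = read_matching_members(tf, ["dry", "pdb"])

# Each value can be wrapped as a file-like object again when one is needed.
first_frame = io.BytesIO(next(iter(dry_frames.values())))
```

Whether the tengu client accepts a BytesIO wrapper is an assumption; the argument description above names BufferedReader, FileIO, and StringIO as examples of file-like objects.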

3.1) Run quantum energy calculation (modules: qp_gen_inputs, hermes_energy, qp_collate)

# We have a helper function for this, as it combines 3 modules without much need
# to inspect the intermediate results.
(_, _, qp_result) = client.run_qp(
    modules["qp_gen_inputs"],
    modules["hermes_energy"],
    modules["qp_collate"],
    pdb=selected_frame_pdbs[0],  # extractfile returns a BufferedReader, which is file-like
    gro=gmx_ligand_gro_id,
    lig=prepped_ligand_id,
    lig_type="sdf",
    lig_res_id="UNL",  # The ligand's residue code in the PDB file; this is what our prep uses
    target="GADI",
    resources={"storage": 1_024_000_000, "walltime": 600},
    tags=TAGS,
)
qp_run_id = qp_result["module_instance_id"]
qp_interaction_energy_id = qp_result["output_ids"][0]
print(f"{datetime.now().time()} | Running QP energy calculation!")
with open(OUT_DIR / f"03-qp-{qp_run_id}.json", "w") as f:
    json.dump(qp_result, f, default=str, indent=2)
client.poll_module_instance(qp_run_id)
client.download_object(qp_interaction_energy_id, OUT_DIR / "03-qp-interaction-energy.json")
print(f"{datetime.now().time()} | Downloaded qp interaction energy!")

3.2) Run MM-PBSA

mmpbsa_config = [
    401,  # start frame
    901,  # end frame
    None,  # optional argument for overriding raw GROMACS parameters
    12,  # num_cpus
]
mmpbsa_result = client.run2(
    modules["gmx_mmpbsa_tengu"],
    [
        gmx_output_id,
        *mmpbsa_config,
    ],
    target="GADI",
    resources={"storage": 1_024_000_000, "walltime": 600},
    tags=TAGS,
)
mmpbsa_run_id = mmpbsa_result["module_instance_id"]
mmpbsa_output_id = mmpbsa_result["output_ids"][0]
print(f"{datetime.now().time()} | Running GROMACS MM-PBSA calculation!")
with open(OUT_DIR / f"03-mmpbsa-{mmpbsa_run_id}.json", "w") as f:
    json.dump(mmpbsa_result, f, default=str, indent=2)
client.poll_module_instance(mmpbsa_run_id)
client.download_object(mmpbsa_output_id, OUT_DIR / "03-mmpbsa-output.zip")
print(f"{datetime.now().time()} | Downloaded MM-PBSA results!")
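
Each step above writes its run metadata to a JSON file in OUT_DIR, so the IDs of a whole workflow can be collected afterwards. The following is a minimal sketch of that, assuming the files contain the "module_instance_id" and "output_ids" keys described earlier; summarize_runs is a hypothetical helper and the demo files use placeholder IDs.

```python
import json
import tempfile
from pathlib import Path


def summarize_runs(run_dir):
    """Map each saved metadata file name to (run ID, number of outputs)."""
    summary = {}
    for meta_file in sorted(Path(run_dir).glob("*.json")):
        record = json.loads(meta_file.read_text())
        # Skip JSON files that are not run records, e.g. downloaded results.
        if not isinstance(record, dict) or "module_instance_id" not in record:
            continue
        summary[meta_file.name] = (
            record["module_instance_id"],
            len(record.get("output_ids", [])),
        )
    return summary


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "01-pdb2pqr-run-a.json").write_text(
        json.dumps({"module_instance_id": "run-a", "output_ids": ["out-1"]})
    )
    (run_dir / "03-qp-interaction-energy.json").write_text(json.dumps([1.0]))
    summary = summarize_runs(run_dir)
```

In the real workflow you would call summarize_runs(OUT_DIR) after the steps have finished.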
