pdb2reaction - Automated enzyme reaction path elucidation from PDB structures

pdb2reaction: End-to-End Reaction-Path Elucidation from PDB Structures Using Machine-Learning Interatomic Potentials

Overview

[Figure: pdb2reaction workflow overview]

pdb2reaction is a Python CLI for elucidating enzymatic reaction pathways from PDB structures using machine-learning interatomic potentials (MLIPs). Given (i) two or more PDB files (R → ... → P), (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt, it extracts an active-site cluster model, runs an MEP search, and optionally chains TS optimization → IRC → thermochemical correction → DFT single-point. Each stage is also exposed as an individual subcommand.

Test a reaction mechanism in a single command:

# Multi-PDB mode (R + P endpoints → MEP, with TS optimization + thermo)
pdb2reaction all -i R.pdb P.pdb -c 'LIG' -l 'LIG:-1' --tsopt --thermo

Inputs are not limited to full enzyme PDBs: pass a small molecule as .xyz / .gjf, or a cluster model you built yourself as a PDB, and omit --center/-c to skip extraction — the same end-to-end pipeline then runs on the structure as given.

Prerequisites: input PDBs must already contain hydrogens, and multiple PDBs must list the same atoms in the same order (only the coordinates may differ). For small-molecule .xyz / .gjf inputs, omit --center/-c and --ligand-charge/-l and set the total charge with -q.
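The two PDB prerequisites above can be checked before launching a run. The following is a minimal sketch, not part of pdb2reaction: the function names are hypothetical, it reads only the fixed-width ATOM/HETATM columns of the PDB format, and the hydrogen test falls back to the atom name when the element columns are blank (which can misjudge names such as HG for mercury).

```python
from typing import List, Optional, Tuple

# (atom name, residue name, chain ID, residue number)
AtomKey = Tuple[str, str, str, str]


def atom_keys(pdb_text: str) -> List[AtomKey]:
    """Return one identity tuple per ATOM/HETATM record, in file order."""
    keys = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed PDB columns: name 13-16, resName 18-20, chainID 22, resSeq 23-26.
            keys.append((line[12:16].strip(), line[17:20].strip(),
                         line[21:22].strip(), line[22:26].strip()))
    return keys


def has_hydrogens(pdb_text: str) -> bool:
    """True if any record looks like hydrogen (element columns 77-78, else atom name)."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip().upper()
            name = line[12:16].strip().upper()
            if element == "H":
                return True
            # Heuristic fallback for files without element columns (e.g. "1HB", "HA").
            if not element and name.lstrip("0123456789").startswith("H"):
                return True
    return False


def first_mismatch(pdb_a: str, pdb_b: str) -> Optional[int]:
    """Return the 0-based index of the first differing atom, or None if lists match."""
    a, b = atom_keys(pdb_a), atom_keys(pdb_b)
    for i, (key_a, key_b) in enumerate(zip(a, b)):
        if key_a != key_b:
            return i
    # Same prefix: a mismatch only remains if one file has extra atoms.
    return None if len(a) == len(b) else min(len(a), len(b))
```

A typical use would be to read R.pdb and P.pdb as text, call first_mismatch on the pair, and inspect the reported atom index if the result is not None.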

Related tools

| Tool | Use case |
| --- | --- |
| mlmm-toolkit | ML/MM ONIOM with the full protein environment; automates MM parameterization and ML-region assignment from a single PDB. |
| uma_pysis | Lightweight YAML-driven UMA–pysisyphus interface for quick/exploratory reaction-mechanism studies (GS / TS / IRC / ΔG). |

pdb2reaction bundles a GPU-optimized pysisyphus fork that is not compatible with upstream pysisyphus — do not install it into an environment that already has upstream pysisyphus.
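Before installing, it may help to confirm that the target environment does not already contain upstream pysisyphus. The sketch below uses only the standard library; installed_version is a hypothetical helper, and the check assumes it is run before pdb2reaction is installed (how the bundled fork registers itself afterwards is not stated here).

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pysisyphus")
    if found is None:
        print("No pysisyphus distribution found in this environment.")
    else:
        print(f"pysisyphus {found} is already installed; consider a fresh environment.")
```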

Documentation

System requirements

| Component | Requirement |
| --- | --- |
| OS / Python | Linux recommended. Python >= 3.11. |
| GPU / CUDA / VRAM | NVIDIA GPU, CUDA >= 12.6 (12.8+ recommended; required for RTX 50-series). 8 GB+ VRAM recommended. |
| RAM / Disk | 16 GB+ RAM recommended; 20 GB free disk for the conda env, UMA cache, and artifacts. |

CPU-only execution works but is 10–100× slower; not recommended for full TS / IRC / Hessian workflows. Full requirement and tuning details: docs/installation.md.

Installation

# 1. CUDA-enabled PyTorch (match your CUDA runtime)
pip install torch --index-url https://download.pytorch.org/whl/cu129

# 2. pdb2reaction (editable from a local clone, or `pip install pdb2reaction`)
pip install -e .

# 3. Authenticate Hugging Face once (only required for the default UMA backend)
#    Accept the FAIR Chemistry License v1 at https://huggingface.co/facebook/UMA, then:
hf auth login                               # interactive
# OR: export HF_TOKEN=hf_xxx && hf auth login --token "$HF_TOKEN" --add-to-git-credential   # CI / HPC

Optional extras (install only what you need):

| Extra | Adds |
| --- | --- |
| [orb] / [aimnet] | Orb / AIMNet2 MLIP backend (-b orb / -b aimnet2); not HF-gated |
| [dft] | PySCF + GPU4PySCF single-point DFT (--dft / pdb2reaction dft) |
| [mcp] | Model Context Protocol server for agent clients |

The MACE backend (-b mace) is not a pip extra. mace-torch pins e3nn==0.4.4, which conflicts with the e3nn>=0.5 required by fairchem-core (UMA), so MACE needs a dedicated environment: pip uninstall -y fairchem-core && pip install mace-torch (see docs/installation.md).
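The version constraint above can be expressed as a small decision function. This is an illustrative sketch only: compatible_backend and release_tuple are hypothetical names, the version parsing is deliberately simple (it reads just the leading numeric release part), and the only rules encoded are the two pins quoted above.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """'0.5.6' -> (0, 5, 6); anything after the numeric release part is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def compatible_backend(e3nn_version: str) -> str:
    """Map an installed e3nn version to the backend whose pin it satisfies."""
    release = release_tuple(e3nn_version)
    if release == (0, 4, 4):   # mace-torch pins e3nn==0.4.4
        return "mace"
    if release >= (0, 5):      # fairchem-core (UMA) needs e3nn>=0.5
        return "uma"
    return "neither"
```

In practice the input string could come from importlib.metadata.version("e3nn") in the environment being inspected.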

CUDA module loads, alternative-backend recipes, DMF/cyipopt setup, Plotly Chromium, and HPC job-script templates: docs/installation.md and docs/hpc-example.md.

Quick Examples

Examples use GPP C6-methyltransferase BezA (Tsutsumi et al., Angew. Chem. Int. Ed. 2022, 61, e202111217) — runnable MEP and scan commands are in examples/run.sh.

# Multi-structure MEP (R + P → MEP, with TS + thermochemistry)
pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --tsopt --thermo --out-dir result_mep

# Scan mode (single structure → staged bond scan → MEP)
pdb2reaction -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    -s '[("CS1 SAM 320","GPP 321 C7",1.60)]' --tsopt --thermo --out-dir result_scan

# TS-only validation (single TS candidate → tsopt → IRC → freq)
pdb2reaction -i TS_candidate.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' --tsopt --thermo --out-dir result_tsonly
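The -l and -s arguments in the commands above are plain strings with a regular shape, so they can be sanity-checked before a long run. The sketch below is an assumption about their shape inferred from these examples only, not pdb2reaction's own parser; both function names are hypothetical, and the third value of each scan entry is passed through as a number without interpreting it.

```python
import ast
from typing import Dict, List, Tuple


def parse_ligand_charges(spec: str) -> Dict[str, int]:
    """Parse a string such as 'SAM:1,GPP:-3' into {'SAM': 1, 'GPP': -3}."""
    charges = {}
    for item in spec.split(","):
        name, sep, value = item.strip().partition(":")
        if not sep or not name:
            raise ValueError(f"expected NAME:CHARGE, got {item!r}")
        charges[name] = int(value)  # int() raises ValueError on non-integers
    return charges


def parse_scan_list(spec: str) -> List[Tuple[str, str, float]]:
    """Parse a string such as '[("CS1 SAM 320","GPP 321 C7",1.60)]'."""
    parsed = ast.literal_eval(spec)  # evaluates literals only, never arbitrary code
    if not isinstance(parsed, (list, tuple)):
        raise ValueError("scan list must be a list of 3-tuples")
    entries = []
    for entry in parsed:
        if not isinstance(entry, (list, tuple)) or len(entry) != 3:
            raise ValueError(f"expected (atom, atom, value), got {entry!r}")
        atom_a, atom_b, value = entry
        if not (isinstance(atom_a, str) and isinstance(atom_b, str)):
            raise ValueError(f"atom selectors must be strings: {entry!r}")
        entries.append((atom_a, atom_b, float(value)))
    return entries
```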

pdb2reaction can also be used to investigate reaction mechanisms of small molecules and user-defined cluster models.

# Small molecule (gas-phase): .xyz / .gjf input — omit -c, set charge with -q
pdb2reaction -i reactant.xyz product.xyz -q 0 --tsopt --thermo --out-dir result_small

# Your own cluster model (already-trimmed PDB): omit -c to use it as-is
pdb2reaction -i cluster_R.pdb cluster_P.pdb -q 0 --tsopt --thermo --out-dir result_cluster

Per-stage walkthrough (extract, opt, path-opt, tsopt, freq, irc, dft): docs/getting-started.md and docs/quickstart-all.md.

Output

A run writes its deliverables to --out-dir (default ./result_all/):

  • segments/seg_NN/{reactant,ts,product}.* — the canonical R / TS / P structures to cite
  • mep.pdb / mep_trj.xyz — the merged reaction path
  • energy_diagram_MEP.png — barrier diagram across all segments
  • summary.log (human-readable) / summary.json (machine-readable)

Pipeline scratch lives under _work/ (safe to delete). Full layout and filename conventions: docs/output-layout.md.
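For post-processing, the per-segment structures listed above can be gathered with a few lines of standard-library code. This is a sketch under the assumption that the layout is exactly segments/seg_NN/{reactant,ts,product}.* as described; collect_segments is a hypothetical helper, and the file extensions are left open because they are not specified here.

```python
from pathlib import Path
from typing import Dict, List


def collect_segments(out_dir: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each seg_NN directory name to its reactant / ts / product file names."""
    result = {}
    seg_root = Path(out_dir) / "segments"
    for seg_dir in sorted(seg_root.glob("seg_*")):
        if not seg_dir.is_dir():
            continue
        found = {}
        for role in ("reactant", "ts", "product"):
            # Any extension is accepted; names are sorted for stable output.
            found[role] = sorted(p.name for p in seg_dir.glob(f"{role}.*"))
        result[seg_dir.name] = found
    return result


if __name__ == "__main__":
    for segment, files in collect_segments("result_all").items():
        print(segment, files)
```

An empty dictionary means no segments directory was found under the given output directory.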

CLI Subcommands

| Subcommand | Role | Doc |
| --- | --- | --- |
| all (default) | End-to-end: extract → MEP → TS → IRC → freq → DFT | all |
| extract | Build active-site cluster model | extract |
| fix-altloc | Resolve PDB alternate conformations | fix-altloc |
| add-elem-info | Repair PDB element columns (77–78) | add-elem-info |
| opt | Geometry optimization (L-BFGS / RFO) | opt |
| tsopt | TS optimization (Dimer / RS-I-RFO) | tsopt |
| path-opt | MEP via GSM or DMF | path-opt |
| path-search | Recursive MEP search with refinement | path-search |
| scan / scan2d / scan3d | 1D / 2D / 3D bond-distance scans | scan · scan2d · scan3d |
| freq | Vibrational analysis + thermochemistry | freq |
| irc | IRC (EulerPC) | irc |
| dft | Single-point DFT (GPU4PySCF / PySCF) | dft |
| sp | Single-point MLIP energy / forces / Hessian | sp |
| bond-summary | Compare structures, report bond changes | bond-summary |
| trj2fig / energy-diagram | Energy plot / R→TS→P diagram | trj2fig · energy-diagram |
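To judge whether a file needs the add-elem-info step listed above, one can look for ATOM/HETATM records whose element columns (77–78) are blank. The sketch below only detects such records and does not attempt the repair; records_missing_element is a hypothetical name and this is not the subcommand's implementation.

```python
from typing import List


def records_missing_element(pdb_text: str) -> List[int]:
    """Return 1-based line numbers of ATOM/HETATM records with blank columns 77-78."""
    missing = []
    for number, line in enumerate(pdb_text.splitlines(), start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Lines shorter than 78 characters slice to an empty string and count as blank.
        if not line[76:78].strip():
            missing.append(number)
    return missing
```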

Getting Help

pdb2reaction --help                       # top-level
pdb2reaction <subcmd> --help              # core options
pdb2reaction <subcmd> --help-advanced     # full option set

Issues: https://github.com/t-0hmura/pdb2reaction/issues.

Citation

@misc{ohmura2026pdb2reaction,
  author = {Ohmura, Takuto and Sato, Hajime and Terada, Tohru},
  title  = {pdb2reaction: End-to-End Reaction-Path Elucidation from PDB Structures Using Machine-Learning Interatomic Potentials},
  year   = {2026}, doi = {10.26434/chemrxiv.15003538}, note = {ChemRxiv preprint}
}

Agent Skills

Agent Skills for Claude Code, Codex, Cursor, and similar clients are provided in skills/. Copy them into your project's skill location (e.g. .claude/skills/) to let an agent drive pdb2reaction workflows and subcommands.

Known limitations

  • MACE + UMA cannot coexist (e3nn version conflict). Use separate conda envs.
  • DFT single-point is practical up to ~300 atoms; larger systems incur high computational cost.
  • The Orb backend sometimes converges a TS with extra soft imaginary modes; for clean single-saddle spectra, prefer UMA or MACE.
  • CPU-only execution is 10–100× slower than GPU.

Contributing

Issues and pull requests are welcome — see CONTRIBUTING.md.

License

GNU General Public License v3 (GPL-3.0).
