Skip to main content

mlmm - ML/MM toolkit for enzyme reaction analysis

Project description

ML/MM toolkit — Towards Accelerated Mechanistic Investigation of Enzymatic Reactions

Overview

Overview of ML/MM toolkit

mlmm-toolkit is an open-source CLI for ML/MM ONIOM analyses of enzymatic reactions. It replaces the QM region of conventional QM/MM with a machine-learning interatomic potential (MLIP, default: UMA) while keeping the surrounding protein under an analytical Amber force field (hessian_ff), and chains MM parametrization → ML-region selection → MEP search → TS optimization → IRC → thermochemical correction → DFT single-point in one command. A link-atom boundary handles amino-acid residues straddling the ML/MM cut, and a microiteration scheme makes TS optimization and Hessian-based methods tractable on ~10 000-atom systems.

Test a reaction mechanism in a single command:

# Multi-structure MEP (R + P endpoints → MEP, with TS optimization + thermo)
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' --tsopt --thermo

For scan-mode on a single structure and the bundled methyltransferase walk-through, see examples/. Each stage is also exposed as an individual subcommand.

Prerequisites: input PDBs must already contain hydrogens; multiple PDBs must share the same atoms in the same order (only coordinates differ). mlmm all runs mm-parm automatically. Match -l RES:CHARGE to the H count actually present (e.g. SAM with 23 H = SAM:1 cation, 22 H = SAM:0 neutral) — full input-prep checklist in docs/getting-started.md.

Related tools

Tool Use case
pdb2reaction Pure-MLIP reaction paths for cluster models and small molecules from PDB / XYZ / GJF.
uma_pysis Lightweight YAML-driven UMA–pysisyphus interface for quick/exploratory reaction-mechanism studies (GS / TS / IRC / ΔG).

mlmm-toolkit bundles a GPU-optimized pysisyphus fork that is not compatible with upstream pysisyphus — do not install it into an environment that already has upstream pysisyphus.

Documentation

System requirements

Component Requirement
OS / Python Linux recommended; native Windows unsupported (AmberTools/tleap unavailable). Python >= 3.11.
GPU / CUDA / VRAM NVIDIA GPU, CUDA >= 12.6 (12.8+ recommended; required for RTX 50-series). 8 GB+ VRAM recommended.
RAM / Disk 32 GB+ RAM recommended; 20 GB free disk for the conda env, AmberTools, UMA cache, and artifacts.

Required external tools: AmberTools (tleap) and pdbfixerconda install -c conda-forge ambertools pdbfixer -y. CPU-only execution works for setup commands but is 10–100× slower for any ML/MM dynamics or Hessian step. Full requirement and tuning details: docs/getting-started.md#installation.

Installation

# 1. New env + AmberTools + CUDA-enabled PyTorch
conda create -n mlmm-toolkit python=3.11 -y && conda activate mlmm-toolkit
conda install -c conda-forge ambertools pdbfixer -y
pip install torch --index-url https://download.pytorch.org/whl/cu129

# 2. mlmm-toolkit (editable from a local clone, or `pip install mlmm-toolkit`)
pip install -e .

# 3. Authenticate Hugging Face once (only required for the default UMA backend)
#    Accept the FAIR Chemistry License v1 at https://huggingface.co/facebook/UMA, then:
hf auth login                               # interactive
# OR: export HF_TOKEN=hf_xxx && hf auth login --token "$HF_TOKEN"   # CI / HPC

Avoid AmberTools conflicts: on clusters with a system AmberTools module loaded, run module unload amber before installing to prevent a ParmEd conflict with the conda-installed AmberTools.

Optional extras (install only what you need):

Extra Adds
[orb] / [aimnet] Orb / AIMNet2 MLIP backend — not HF-gated
[dft] PySCF + GPU4PySCF single-point DFT (--dft / mlmm dft) — practical to ~500 ML-region atoms
[mcp] Model Context Protocol server (mlmm-mcp) for agent clients
[pdbfixer] PDBFixer extra (alternative to the conda install above)

The MACE backend (-b mace) is not a pip extra: mace-torch pins e3nn==0.4.4, which conflicts with fairchem-core's e3nn>=0.5 (UMA), so it needs a dedicated environment — pip uninstall -y fairchem-core && pip install mace-torch (see docs/getting-started.md#installation).

CUDA module-load recipes, alternative-backend installs, DMF / cyipopt, Plotly Chromium, and HPC job-script templates: docs/getting-started.md and docs/device-hpc.md.

Preparing an Enzyme-Substrate System

For most systems the only hard requirement is a PDB with explicit hydrogens (at the intended protonation state). mlmm all then builds the MM topology, selects the ML region, and runs the whole pipeline in one command — see Quick Examples for the three input modes (multi-structure R → P, single-structure scan, TS-only). The preparation steps below are optional.

  1. Build a structural model of the complex. Download coordinates from the Protein Data Bank. If an experimental structure is not available, use structure-prediction programs such as AlphaFold3, Boltz2, or Chai; docking programs; or GUI software such as PyMOL. Add hydrogens at the intended protonation state (or let mm-parm --add-h --ph 7 add them). For multi-structure (R → P) runs, every PDB must share the same atoms in the same order.

  2. (Optional) Build the MM topology yourself — it is automatic by default. mlmm all (via mlmm mm-parm) generates the Amber .parm7 / .rst7 from the PDB automatically; unknown residues (ligands, cofactors) are parameterized with GAFF2 / AM1-BCC — pass formal charges with -l 'RES:CHARGE'. Build the topology by hand when it helps — a custom force field, special solvation, or a system the automatic route cannot handle — then pass it with --parm. To mimic aqueous conditions, solvate the complex and remove water molecules beyond ~6 Å (see the OpenMM cookbook / tleap).

    Note: elemental information (columns 77–78) is omitted in PDB files generated by tleap. Use mlmm add-elem-info to fix this.

  3. (Optional) Define the ML region yourself. mlmm all extracts the ML region from -c/--center and -r/--radius automatically. To define it yourself instead, build an ML-region PDB — with mlmm extract or any molecular viewer — and feed it to mlmm all (or the per-stage subcommands) with --model-pdb; this skips the automatic extraction:

    mlmm extract -i complex.pdb -c 'SAM,GPP' -r 6.0 -l 'SAM:1,GPP:-3' -o ml_region.pdb
    

    Important: atom order, residue names, and residue numbers must match between the full PDB and the ML-region PDB. (In PyMOL, tick "Original atom order" when exporting.)

Quick Examples

# Multi-structure MEP (R + P → MEP, with TS + thermo + DFT)
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' --tsopt --thermo --dft

# Scan mode (single structure → staged bond scans → MEP)
mlmm all -i R.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
    --scan-lists "[('SAM 359 CS1','GPP 360 C8',1.3)]"

# TS-only validation (existing TS candidate)
mlmm tsopt -i TS_candidate_layered.pdb --parm complex.parm7 -q 1 --opt-mode grad

For Gaussian-ONIOM / ORCA-QM/MM round-trips use oniom-export / oniom-import. Per-stage walkthrough (mm-parmextractdefine-layeroptpath-searchtsoptfreqircdft): docs/getting-started.md and docs/quickstart-all.md. Working scripts (methyltransferase + toy_system): examples/.

Output

A run writes its deliverables to --out-dir (default ./result_all/):

  • segments/seg_NN/{reactant,ts,product}.pdb — the canonical R / TS / P structures to cite
  • mep.pdb / mep_trj.xyz — the merged reaction path; energy_diagram_MEP.png — barrier diagram
  • summary.log (human-readable) / summary.json (machine-readable)
  • Reusable inputs for follow-up runs: ml_region.pdb (--model-pdb), mm_parm/*.parm7 (--parm), layered/ (B-factor-annotated full-system PDBs)

Pipeline scratch lives under _work/ (safe to delete). Full layout and filename conventions: docs/output-layout.md.

CLI Subcommands

Subcommand Role Doc
all (default) End-to-end: mm-parm → extract → MEP → TS → IRC → freq → DFT all
mm-parm Generate parm7/rst7 via AmberTools mm-parm
extract Extract active-site pocket extract
define-layer Assign 3-layer ML/MM B-factor encoding define-layer
add-elem-info / fix-altloc Repair PDB element columns / resolve altlocs add-elem-info · fix-altloc
opt / tsopt Geometry / TS optimization opt · tsopt
path-opt / path-search MEP via GSM/DMF; recursive refinement path-opt · path-search
scan / scan2d / scan3d 1D / 2D / 3D bond-distance scans scan · scan2d · scan3d
freq / irc Vibrational analysis + thermo / IRC (EulerPC) freq · irc
dft / sp Single-point DFT / single-point ML/MM ONIOM dft · sp
bond-summary Compare structures, report bond changes bond-summary
trj2fig / energy-diagram Energy plot / R→TS→P diagram trj2fig · energy-diagram
oniom-export / oniom-import Gaussian ONIOM / ORCA QM/MM round-trip oniom-export · oniom-import

3-layer system (ML / Movable-MM / Frozen-MM, B-factor encoded), link-atom treatment, units (eV·Å in core / Ha·Bohr in pysisyphus CLI): docs/concepts.md. Python API (MLMMCore, MLMMASECalculator, pysisyphus mlmm calculator): docs/python-api.md.

Getting Help

mlmm --help                       # top-level
mlmm <subcmd> --help              # core options
mlmm <subcmd> --help-advanced     # full option set

Issues: https://github.com/t-0hmura/mlmm_toolkit/issues.

Citation

@article{ohmura2025mlmm,
  author = {Ohmura, Takuto and Inoue, Sei and Terada, Tohru},
  title  = {ML/MM Toolkit -- Towards Accelerated Mechanistic Investigation of Enzymatic Reactions},
  year   = {2025}, journal = {ChemRxiv}, doi = {10.26434/chemrxiv-2025-jft1k}
}

Agent Skills

Agent Skills for Claude Code / Codex / Cursor etc. in skills/ — copy into your project's skill location (e.g. .claude/skills/) to let an agent drive mlmm-toolkit workflows and subcommands.

Known limitations

  • MACE + UMA cannot coexist (e3nn version conflict). Use separate conda envs.
  • DFT single-point is practical to ~500 ML-region atoms; larger regions incur high computational cost.
  • ORB backend sometimes converges TS with extra soft imaginary modes — prefer UMA / MACE for clean single-saddle spectra.
  • CPU-only execution is 10–100× slower than GPU; AmberTools (tleap) is required for mm-parm.

Contributing

Issues and pull requests are welcome — see CONTRIBUTING.md.

License

GNU General Public License v3 (GPL-3.0).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mlmm_toolkit-0.3.0.tar.gz (5.8 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mlmm_toolkit-0.3.0-py3-none-any.whl (1.0 MB view details)

Uploaded Python 3

File details

Details for the file mlmm_toolkit-0.3.0.tar.gz.

File metadata

  • Download URL: mlmm_toolkit-0.3.0.tar.gz
  • Upload date:
  • Size: 5.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlmm_toolkit-0.3.0.tar.gz
Algorithm Hash digest
SHA256 ea60d87b6e0561063b8cbb40e2e3645a5420966ce075b4e35e6b50148dd52f4d
MD5 bd456a56a0d30a9ab749b34075c6cfae
BLAKE2b-256 d4b924bf9555bc2ec75c2096e901f25ed1d945952bc1f01aa73b0860137c3b9c

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlmm_toolkit-0.3.0.tar.gz:

Publisher: release.yml on t-0hmura/mlmm_toolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlmm_toolkit-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: mlmm_toolkit-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlmm_toolkit-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e383cfee5d6aa305ad60beb2c2aaf123b4e63f8aad651f721f0424fd77197cea
MD5 3dcfcf7dc2d2edb662ecc6bf4156b067
BLAKE2b-256 b0fd0317edfa83e525f281fdfb36a82e16212e4d06df60bad89a45da830d8956

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlmm_toolkit-0.3.0-py3-none-any.whl:

Publisher: release.yml on t-0hmura/mlmm_toolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page