mlmm - ML/MM toolkit for enzyme reaction analysis
Project description
ML/MM toolkit — Toward Accelerated Mechanistic Investigation of Enzymatic Reactions
Overview
Quantum mechanics/molecular mechanics (QM/MM) methods have long been used to analyze enzymatic reaction mechanisms in silico. While treating the active site with QM and the remainder with MM reduces the computational cost, the inherently high cost of the QM calculation remains a major limitation. Replacing QM with Machine Learning (ML) interatomic potentials yields ML/MM approaches that further reduce computational cost while retaining accuracy.
ML/MM toolkit is an open-source command-line toolkit for ML/MM calculations. It streamlines the workflow necessary for enzymatic reaction mechanism analyses — energy minimization, transition-state (TS) search, and vibrational analysis — to calculate the reaction free energy (ΔG) and activation free energy (ΔG‡) for a given enzyme–substrate complex structure. A link atom boundary treatment is implemented to include amino-acid residues in the ML region, and full-system Hessians are available for TS searches and vibrational analyses. A microiteration scheme separates ML and MM degrees of freedom, enabling efficient TS optimization and Hessian-based methods for systems comprising around 10,000 atoms.
Each workflow step is also available as an individual subcommand (opt, scan, scan2d, path-search, tsopt, freq, irc, dft, energy-diagram, etc.), allowing fine-grained control over each stage. A single command can generate a first-pass enzymatic reaction path with ML/MM accuracy:
mlmm all -i R.pdb P.pdb -c PRE -l "PRE:-2"
The full workflow — MEP search → TS optimization → IRC → thermochemistry → single-point DFT — can be run in one command:
mlmm all -i R.pdb P.pdb -c PRE -l "PRE:-2" --tsopt --thermo --dft
Given (i) two or more PDB files (R → ... → P), or (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt, mlmm-toolkit automatically:
- extracts an active-site pocket around user-defined substrates,
- assigns ONIOM regions (ML / Movable MM / Frozen MM) via B-factor encoding,
- generates MM parameters (parm7/rst7) using AmberTools,
- explores minimum-energy paths (MEPs) with GSM,
- optionally optimizes transition states, runs vibrational analysis, IRC, and single-point DFT.
Important (prerequisites):
- Input PDB files must already contain hydrogen atoms.
- When providing multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ).
- A parm7 topology file (AmberTools) is required for MM calculations; use
mlmm mm-parmto generate one.
Related tools
| Tool | Use case | Repository |
|---|---|---|
| pdb2reaction | CLI-based reaction mechanism analysis for cluster models and small molecules | https://github.com/t-0hmura/pdb2reaction |
| UMA–Pysisyphus Interface | YAML-input-based reaction mechanism analysis for small molecules | https://github.com/t-0hmura/uma_pysis |
mlmm-toolkit additionally automates MM force field generation and ML region assignment from a single PDB input — the all workflow requires only PDB files to run the full pipeline.
Both mlmm-toolkit and pdb2reaction include a custom GPU-optimized pysisyphus fork for geometry optimization, TS search, and IRC. This bundled fork is not compatible with the upstream pysisyphus package; do not install them side by side.
Installation
Quick install
For CUDA 12.6:
conda create -n mlmm python=3.11 -y
conda activate mlmm
conda install -c conda-forge ambertools pdbfixer -y
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install mlmm-toolkit
huggingface-cli login
For CUDA 12.9 (recommended for RTX 50 series):
conda create -n mlmm python=3.11 -y
conda activate mlmm
conda install -c conda-forge ambertools pdbfixer -y
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install mlmm-toolkit
huggingface-cli login
Previous version: v0.2.0 introduced breaking changes to the CLI interface and Hessian handling. To install the previous stable release (v0.1.1) from GitHub:
pip install git+https://github.com/t-0hmura/mlmm_toolkit.git@v0.1.1
Prerequisites
| Requirement | Notes |
|---|---|
| Python >= 3.11 | 3.12 also supported |
| CUDA runtime >= 12.6 | CUDA 12.9 recommended for RTX 50 series |
| Linux / WSL 2 | — |
Full setup (conda)
conda create -n mlmm python=3.11 -y
conda activate mlmm
conda install -c conda-forge ambertools pdbfixer -y
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install mlmm-toolkit
plotly_get_chrome -y
huggingface-cli login
If you are on an HPC cluster that uses environment modules, load CUDA before installing PyTorch:
module load cuda/12.6
fairchem-coreis a core dependency.aimnet2and other ML backends (orb-models,mace-torch) are optional extras (e.g.pip install "mlmm-toolkit[aimnet2]").
DFT single-point (mlmm dft)
DFT dependencies are not installed by default. To use mlmm dft, install the [dft] extra:
pip install "mlmm-toolkit[dft]"
This installs PySCF, GPU4PySCF (x86_64 only), and related CUDA libraries. Note that DFT single-point calculations are practical only for systems up to ~500 atoms in the ML region; larger systems will require prohibitive compute time and memory.
Avoid AmberTools conflicts (optional)
If AmberTools is loaded, unload it before installing to prevent conflicts for ParmEd:
module unload amber
For DMF method
conda install -c conda-forge cyipopt -y
Authenticate for UMA (Hugging Face)
UMA model weights are on Hugging Face Hub. You need to log in once (see https://github.com/facebookresearch/fairchem):
huggingface-cli login
Supported ML potentials
| Potential | Repository | Install extra |
|---|---|---|
| UMA (default) | https://github.com/facebookresearch/fairchem | (included) |
| ORB | https://github.com/orbital-materials/orb-models | pip install "mlmm-toolkit[orb]" |
| MACE | https://github.com/ACEsuit/mace | see note below |
| AIMNet2 | https://github.com/isayevlab/aimnetcentral | pip install "mlmm-toolkit[aimnet2]" |
Note: MACE and UMA cannot coexist due to conflicting
e3nnversions (fairchem-corerequirese3nn>=0.5,mace-torchrequirese3nn==0.4.4). To use MACE, uninstallfairchem-corefirst:pip uninstall fairchem-core pip install mace-torchThis means UMA will no longer be available in that environment. We recommend using a separate conda environment for MACE.
Preparing an Enzyme–Substrate System
-
Build a structural model of the complex. Download coordinates from Protein Data Bank. If an experimental structure is not available, use structure prediction programs such as AlphaFold3, docking simulation programs, or GUI software such as PyMOL.
-
Generate parameter/topology and coordinate files. Create
.pdb,.parm7, and.rst7files of the complex (see the OpenMM tutorial). To mimic aqueous conditions, solvate the complex, then remove water molecules beyond ~6 Å.Note: elemental information (columns 77–78) is omitted in PDB files generated by tleap. Use
mlmm add-elem-infoto fix this. -
Define the ML region. Use
mlmm extractor any molecular viewer to select residues around the substrate:mlmm extract -i complex.pdb -c PRE -r 6.0 -l "PRE:-2" -o ml_region.pdb
Important: atom order, residue names, and residue numbers must match between the full PDB and the ML-region PDB. (In PyMOL, tick "Original atom order" when exporting.)
Quick Examples
Full workflow (multi-structure)
mlmm all -i R.pdb P.pdb -c PRE -l "PRE:-2" \
--tsopt --thermo --dft
Scan mode (single structure)
mlmm all -i R.pdb -c PRE -l "PRE:-2" \
--scan-lists "[('PRE 353 O1\'','PRE 353 C3',1.2)]"
TS optimization only
mlmm tsopt -i TS_candidate_layered.pdb --parm complex.parm7 \
-q 1 --opt-mode light
Step-by-step workflow
# 1. Generate MM parameters
mlmm mm-parm -i complex.pdb -l "PRE:-2"
# 2. Extract active-site pocket
mlmm extract -i complex.pdb -c '353' -l "PRE:-2" -r 6.0
# 3. Assign 3-layer ONIOM regions
mlmm define-layer -i complex.pdb --model-pdb pocket.pdb --radius-freeze 10.0
# 4. Optimize geometry
mlmm opt -i complex_layered.pdb --parm complex.parm7 -q 1 --opt-mode heavy
# 5. MEP search
mlmm path-search -i R_layered.pdb P_layered.pdb --parm complex.parm7 -q 1
# 6. TS optimization
mlmm tsopt -i hei.pdb --parm complex.parm7 -q 1 --opt-mode light
# 7. Frequency analysis
mlmm freq -i ts_optimized.pdb --parm complex.parm7 -q 1
# 8. IRC
mlmm irc -i ts_optimized.pdb --parm complex.parm7 -q 1
# 9. DFT single-point
mlmm dft -i optimized.pdb --parm complex.parm7 -q 1
Fully working example scripts are provided in the examples/ directory.
Start with the minimal examples/toy_system/ example, then explore a realistic enzyme case in examples/methyltransferase/.
CLI Subcommands
Workflow
| Subcommand | Role | Documentation |
|---|---|---|
all |
End-to-end: extraction → MEP → TS → IRC → freq → DFT | docs/all.md |
Structure Preparation
| Subcommand | Role | Documentation |
|---|---|---|
mm-parm |
Generate parm7/rst7 topology via AmberTools | docs/mm_parm.md |
extract |
Extract active-site pocket (cluster model) | docs/extract.md |
define-layer |
Assign 3-layer B-factor encoding | docs/define_layer.md |
add-elem-info |
Add/repair PDB element columns (77-78) | docs/add_elem_info.md |
fix-altloc |
Resolve alternate conformations | docs/fix_altloc.md |
Optimization & Path Search
| Subcommand | Role | Documentation |
|---|---|---|
opt |
Geometry optimization (L-BFGS or RFO) | docs/opt.md |
tsopt |
TS optimization (Dimer or RS-I-RFO) | docs/tsopt.md |
path-opt |
MEP optimization via GSM or DMF | docs/path_opt.md |
path-search |
Recursive MEP search with refinement | docs/path_search.md |
scan |
1D bond-length driven scan | docs/scan.md |
scan2d |
2D distance grid scan | docs/scan2d.md |
scan3d |
3D distance grid scan | docs/scan3d.md |
Analysis
| Subcommand | Role | Documentation |
|---|---|---|
freq |
Vibrational frequency analysis + thermochemistry | docs/freq.md |
irc |
IRC calculation (EulerPC) | docs/irc.md |
dft |
Single-point DFT (GPU4PySCF / PySCF) | docs/dft.md |
Visualization & Export
| Subcommand | Role | Documentation |
|---|---|---|
trj2fig |
Energy plot from XYZ trajectory | docs/trj2fig.md |
energy-diagram |
Energy diagram from numeric values | docs/energy_diagram.md |
oniom-export |
Generate Gaussian ONIOM / ORCA QM/MM input | docs/oniom_export.md |
oniom-import |
Import ONIOM input and reconstruct XYZ/layered PDB | docs/oniom_import.md |
3-Layer System
mlmm-toolkit uses a 3-layer system encoded in PDB B-factor columns:
| Layer | B-factor | Description | Hessian |
|---|---|---|---|
| ML | 0.0 | Active site (UMA energy/force/Hessian) | Yes |
| Movable MM | 10.0 | MM atoms allowed to move | No (by B-factor alone) |
| Frozen | 20.0 | Distant protein (coordinates fixed) | No |
Covalent bonds cut at the ML/MM boundary are capped with hydrogen link atoms. The ML region can therefore include entire amino-acid side chains when necessary.
Use mlmm define-layer to assign layers automatically based on a model PDB (the ML region).
The MM layer uses an analytical Hessian force field (hessian_ff) by default. Any force field capable of generating Amber parameters (e.g., ff19SB + OPC3 + GAFF2) can be used.
Units
| Interface | Energy | Distance | Force | Hessian |
|---|---|---|---|---|
| Core & ASE | eV | Å | eV Å-1 | eV Å-2 |
| PySisyphus (CLI) | Hartree | Bohr | Ha Bohr-1 | Ha Bohr-2 |
Unit conversions are handled automatically inside the ML/MM calculator.
Python API
See Python API Reference for full documentation.
Three API levels are available: MLMMCore (base engine), MLMMASECalculator (ASE), and mlmm (pysisyphus Calculator).
MLMMCore (Base-Level)
from mlmm import MLMMCore
core = MLMMCore(
input_pdb = "complex.pdb", # Full system PDB (protein + substrate + solvent)
real_parm7 = "complex.parm7", # Amber topology for the full system
model_pdb = "ml_region.pdb", # ML region only (trimmed PDB)
model_charge = -1, # Formal charge of the ML region including link H atoms
model_mult = 1, # Spin multiplicity of the ML region
)
from ase.io import read
atoms = read("structure.pdb")
results = core.compute(atoms.get_positions(), return_forces=True, return_hessian=True)
energy = results["energy"] # float, eV
forces = results["forces"] # ndarray (N, 3), eV/Å
hessian = results["hessian"] # torch.Tensor (3N, 3N), eV/Å^2
ASE Interface
from mlmm import MLMMCore, MLMMASECalculator
core = MLMMCore(
input_pdb="complex.pdb", real_parm7="complex.parm7",
model_pdb="ml_region.pdb", model_charge=-1, model_mult=1,
)
calc = MLMMASECalculator(core)
atoms.calc = calc
energy = atoms.get_potential_energy() # eV
pysisyphus YAML Workflows
mlmm pysis opt.yaml
See pysis documentation for YAML format and examples.
Notes
complex.pdbis the reference PDB used when the Amber parameters were generated, whereasstructure.pdbcan contain any starting geometry.- If
model_chargeis omitted, it defaults to 0. Always set it explicitly for charged systems.- v0.1.x parameter names (
real_pdb, etc.) andmlmm_ase()are still supported with deprecation warnings.
Getting Help
mlmm --help
mlmm <subcommand> --help
mlmm <subcommand> --help-advanced
--help shows core options. --help-advanced shows the full option list including advanced parameters.
Documentation
- Getting Started — Installation and first steps
- Concepts — 3-layer system, ONIOM, link atoms
- CLI Command Reference
- YAML Schema
- Troubleshooting — Common errors and fixes
- Full command index: docs/index.md
Citation
If you find this work helpful for your research, please cite:
[1] Ohmura, T., Inoue, S., Terada, T. (2025). ML/MM toolkit – Towards Accelerated Mechanistic Investigation of Enzymatic Reactions. ChemRxiv. doi:10.26434/chemrxiv-2025-jft1k
License
ML/MM toolkit is distributed under the GNU General Public License version 3 (GPL-3.0).
This software is still under development. Please use it at your own risk.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mlmm_toolkit-0.2.2.dev0.tar.gz.
File metadata
- Download URL: mlmm_toolkit-0.2.2.dev0.tar.gz
- Upload date:
- Size: 7.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ef66a4b0d803a70a2c7e2444c660d366f65c9a37914b05d11db029501a631e9b
|
|
| MD5 |
8c916db9cf4b5852b537c304ae1364ba
|
|
| BLAKE2b-256 |
7ac6e36bb9f0c7139f3c8bc5ba1f376db456d97d32bdf919a407eb8b608ecd22
|
Provenance
The following attestation bundles were made for mlmm_toolkit-0.2.2.dev0.tar.gz:
Publisher:
release.yml on t-0hmura/mlmm_toolkit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
mlmm_toolkit-0.2.2.dev0.tar.gz -
Subject digest:
ef66a4b0d803a70a2c7e2444c660d366f65c9a37914b05d11db029501a631e9b - Sigstore transparency entry: 1116884773
- Sigstore integration time:
-
Permalink:
t-0hmura/mlmm_toolkit@462b56f30264477268a275d917cc567515d72bfc -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/t-0hmura
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@462b56f30264477268a275d917cc567515d72bfc -
Trigger Event:
release
-
Statement type:
File details
Details for the file mlmm_toolkit-0.2.2.dev0-py3-none-any.whl.
File metadata
- Download URL: mlmm_toolkit-0.2.2.dev0-py3-none-any.whl
- Upload date:
- Size: 3.9 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3ba04575798566cd7ebeb86f0053a6872f11723a29a6f6b42b8b483d944bcb97
|
|
| MD5 |
1201564e23b16aa54a941954ca81e810
|
|
| BLAKE2b-256 |
c2e8043117f6d933b1b16280f08d3af9ed2729445b014f3a5bd620e40985bb8a
|
Provenance
The following attestation bundles were made for mlmm_toolkit-0.2.2.dev0-py3-none-any.whl:
Publisher:
release.yml on t-0hmura/mlmm_toolkit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
mlmm_toolkit-0.2.2.dev0-py3-none-any.whl -
Subject digest:
3ba04575798566cd7ebeb86f0053a6872f11723a29a6f6b42b8b483d944bcb97 - Sigstore transparency entry: 1116884784
- Sigstore integration time:
-
Permalink:
t-0hmura/mlmm_toolkit@462b56f30264477268a275d917cc567515d72bfc -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/t-0hmura
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@462b56f30264477268a275d917cc567515d72bfc -
Trigger Event:
release
-
Statement type: