mlmm - ML/MM toolkit for enzyme reaction analysis

ML/MM toolkit — Towards Accelerated Mechanistic Investigation of Enzymatic Reactions

Overview

mlmm-toolkit is an open-source CLI for ML/MM ONIOM analyses of enzymatic reactions. It replaces the QM region of conventional QM/MM with a machine-learning interatomic potential (MLIP, default: UMA) while keeping the surrounding protein under an analytical Amber force field (hessian_ff), and chains MM parametrization → ML-region selection → MEP search → TS optimization → IRC → thermochemical correction → DFT single-point in one command. A link-atom boundary handles amino-acid residues straddling the ML/MM cut, and a microiteration scheme makes TS optimization and Hessian-based methods tractable on ~10 000-atom systems.
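
For orientation, the energy combination that "ML/MM ONIOM" usually denotes can be sketched as follows. This is a generic illustration that assumes the standard two-layer subtractive ONIOM expression, E = E_MM(real) + E_ML(model) - E_MM(model); the function name and the numbers are hypothetical and are not taken from mlmm-toolkit's code.

```python
def oniom_energy(e_mm_real, e_ml_model, e_mm_model):
    """Two-layer subtractive ONIOM energy; all inputs must share one unit.

    e_mm_real  : MM energy of the full (real) system
    e_ml_model : MLIP energy of the ML region (model system, link atoms included)
    e_mm_model : MM energy of the same model system
    """
    return e_mm_real + e_ml_model - e_mm_model


# Hypothetical energies in eV, for illustration only.
total = oniom_energy(e_mm_real=-1250.0, e_ml_model=-830.5, e_mm_model=-42.5)
print(total)  # -2038.0
```

The MM description of the model region cancels out, so the ML region is effectively described by the MLIP while its coupling to the environment stays at the MM level.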

Test a reaction mechanism in a single command:

# Multi-structure MEP (R + P endpoints → MEP, with TS optimization + thermo)
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' --tsopt --thermo

For scan mode on a single structure and the bundled methyltransferase walk-through, see examples/. Each stage is also exposed as an individual subcommand.

Prerequisites: input PDBs must already contain hydrogens, and multiple PDBs must share the same atoms in the same order (only coordinates differ). mlmm all runs mm-parm automatically. Match -l RES:CHARGE to the H count actually present (e.g. SAM with 23 H is the cation, SAM:1; with 22 H it is neutral, SAM:0). The full input-prep checklist is in docs/getting-started.md.
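
The two prerequisites above (identical atom order across PDBs, and a hydrogen count that matches the declared charge) can be checked before launching a run. The following is a minimal standalone sketch, not part of mlmm-toolkit: the helper names are hypothetical, it reads the fixed PDB columns directly, and the toy three-atom "ligand" is made up.

```python
def pdb_line(serial, name, resname, resseq, xyz, element):
    """Format one fixed-column HETATM record (chain A), for the demo only."""
    x, y, z = xyz
    return (f"HETATM{serial:5d} {name:<4} {resname:>3} A{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}")


def atom_keys(pdb_text):
    """(atom name, residue name, chain, residue number) per ATOM/HETATM record."""
    return [
        (line[12:16].strip(), line[17:20].strip(), line[21:22], line[22:26].strip())
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


def same_atoms_same_order(pdb_a, pdb_b):
    """True when both files list identical atoms in identical order."""
    return atom_keys(pdb_a) == atom_keys(pdb_b)


def hydrogen_count(pdb_text, resname):
    """Count H atoms over all residues named `resname`.

    Reads the element from columns 77-78 and falls back to the first letter
    of the atom name when that field is blank.
    """
    count = 0
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or line[17:20].strip() != resname:
            continue
        element = line[76:78].strip().upper()
        if not element:
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        count += element == "H"
    return count


# Toy data: same atoms, shifted coordinates, then a reordered copy.
reactant = "\n".join([
    pdb_line(1, "C1", "LIG", 1, (0.0, 0.0, 0.0), "C"),
    pdb_line(2, "H1", "LIG", 1, (1.09, 0.0, 0.0), "H"),
    pdb_line(3, "H2", "LIG", 1, (0.0, 1.09, 0.0), "H"),
])
product = "\n".join([
    pdb_line(1, "C1", "LIG", 1, (0.1, 0.0, 0.0), "C"),
    pdb_line(2, "H1", "LIG", 1, (1.2, 0.0, 0.0), "H"),
    pdb_line(3, "H2", "LIG", 1, (0.0, 1.2, 0.0), "H"),
])
reordered = "\n".join(reversed(product.splitlines()))

print(same_atoms_same_order(reactant, product))    # True
print(same_atoms_same_order(reactant, reordered))  # False
print(hydrogen_count(reactant, "LIG"))             # 2
```

Note that `hydrogen_count` sums over every residue with the given name, so divide by the number of copies if the ligand appears more than once.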

Related tools

| Tool | Use case |
| --- | --- |
| pdb2reaction | Pure-MLIP reaction paths for cluster models and small molecules from PDB / XYZ / GJF. |
| uma_pysis | Lightweight YAML-driven UMA–pysisyphus interface for quick/exploratory reaction-mechanism studies (GS / TS / IRC / ΔG). |

mlmm-toolkit bundles a GPU-optimized pysisyphus fork that is not compatible with upstream pysisyphus — do not install it into an environment that already has upstream pysisyphus.

Documentation

Getting Started · Concepts · Installation · Troubleshooting
Python API · CLI Conventions · YAML Reference · JSON Output Schema
Full command index: docs/index.md

System requirements

| Component | Requirement |
| --- | --- |
| OS / Python | Linux recommended; native Windows unsupported (AmberTools/`tleap` unavailable). Python >= 3.11. |
| GPU / CUDA / VRAM | NVIDIA GPU, CUDA >= 12.6 (12.8+ recommended; required for RTX 50-series). 8 GB+ VRAM recommended. |
| RAM / Disk | 32 GB+ RAM recommended; 20 GB free disk for the conda env, AmberTools, UMA cache, and artifacts. |

Required external tools: AmberTools (tleap) and pdbfixer — conda install -c conda-forge ambertools pdbfixer -y. CPU-only execution works for setup commands but is 10–100× slower for any ML/MM dynamics or Hessian step. Full requirement and tuning details: docs/getting-started.md#installation.
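
A quick pre-flight check of the requirements listed above might look like the sketch below. It uses only the Python standard library; the function name and the report keys are hypothetical, and the import names `pdbfixer` and `torch` are assumed to be the module names those packages install.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Return a name -> bool report for the basic requirements."""
    return {
        "python>=3.11": sys.version_info >= (3, 11),
        "tleap on PATH": shutil.which("tleap") is not None,
        "pdbfixer importable": importlib.util.find_spec("pdbfixer") is not None,
        "torch importable": importlib.util.find_spec("torch") is not None,
    }


for name, ok in check_environment().items():
    print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Once PyTorch is importable, `torch.cuda.is_available()` can confirm that a CUDA device is actually visible.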

Installation

# 1. New env + AmberTools + CUDA-enabled PyTorch
conda create -n mlmm-toolkit python=3.11 -y && conda activate mlmm-toolkit
conda install -c conda-forge ambertools pdbfixer -y
pip install torch --index-url https://download.pytorch.org/whl/cu129

# 2. mlmm-toolkit (editable from a local clone, or `pip install mlmm-toolkit`)
pip install -e .

# 3. Authenticate Hugging Face once (only required for the default UMA backend)
#    Accept the FAIR Chemistry License v1 at https://huggingface.co/facebook/UMA, then:
hf auth login                               # interactive
# OR: export HF_TOKEN=hf_xxx && hf auth login --token "$HF_TOKEN"   # CI / HPC

Avoid AmberTools conflicts: on clusters with a system AmberTools module loaded, run module unload amber before installing to prevent a ParmEd conflict with the conda-installed AmberTools.

Optional extras (install only what you need):

| Extra | Adds |
| --- | --- |
| `[orb]` / `[aimnet]` | Orb / AIMNet2 MLIP backend (not HF-gated) |
| `[dft]` | PySCF + GPU4PySCF single-point DFT (`--dft` / `mlmm dft`); practical up to ~500 ML-region atoms |
| `[mcp]` | Model Context Protocol server (`mlmm-mcp`) for agent clients |
| `[pdbfixer]` | PDBFixer extra (alternative to the conda install above) |

The MACE backend (-b mace) is not a pip extra: mace-torch pins e3nn==0.4.4, which conflicts with fairchem-core's e3nn>=0.5 (UMA), so it needs a dedicated environment. In that environment, run pip uninstall -y fairchem-core && pip install mace-torch (see docs/getting-started.md#installation).
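
To see whether an environment mixes the two conflicting backends, the installed distributions can be inspected with the standard library. This is a minimal sketch; the helper names are hypothetical, and the distribution names are the ones used in the pip commands above.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def mace_uma_conflict(versions: Dict[str, Optional[str]]) -> bool:
    """True when mace-torch and fairchem-core share one environment."""
    return (versions.get("mace-torch") is not None
            and versions.get("fairchem-core") is not None)


versions = {name: installed_version(name)
            for name in ("mace-torch", "fairchem-core", "e3nn")}
print(versions)
print("conflict:", mace_uma_conflict(versions))
```

A `True` result suggests moving one of the backends into its own conda environment.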

CUDA module-load recipes, alternative-backend installs, DMF / cyipopt, Plotly Chromium, and HPC job-script templates: docs/getting-started.md and docs/device-hpc.md.

Preparing an Enzyme-Substrate System

For most systems the only hard requirement is a PDB with explicit hydrogens (at the intended protonation state). mlmm all then builds the MM topology, selects the ML region, and runs the whole pipeline in one command — see Quick Examples for the three input modes (multi-structure R → P, single-structure scan, TS-only). The preparation steps below are optional.

1. Build a structural model of the complex. Download coordinates from the Protein Data Bank. If an experimental structure is not available, use structure-prediction programs such as AlphaFold3, Boltz2, or Chai; docking programs; or GUI software such as PyMOL. Add hydrogens at the intended protonation state (or let mm-parm --add-h --ph 7 add them). For multi-structure (R → P) runs, every PDB must share the same atoms in the same order.
2. (Optional) Build the MM topology yourself; it is automatic by default. mlmm all (via mlmm mm-parm) generates the Amber .parm7 / .rst7 from the PDB. Unknown residues (ligands, cofactors) are parameterized with GAFF2 / AM1-BCC, with formal charges passed via -l 'RES:CHARGE'. Build the topology by hand when it helps (a custom force field, special solvation, or a system the automatic route cannot handle), then pass it with --parm. To mimic aqueous conditions, solvate the complex and remove water molecules beyond ~6 Å (see the OpenMM cookbook / tleap).

   Note: element information (columns 77–78) is omitted in PDB files generated by tleap. Use mlmm add-elem-info to fix this.
3. (Optional) Define the ML region yourself. mlmm all extracts the ML region from -c/--center and -r/--radius automatically. To define it yourself instead, build an ML-region PDB (with mlmm extract or any molecular viewer) and feed it to mlmm all (or the per-stage subcommands) with --model-pdb; this skips the automatic extraction:
```
mlmm extract -i complex.pdb -c 'SAM,GPP' -r 6.0 -l 'SAM:1,GPP:-3' -o ml_region.pdb
```
Important: atom order, residue names, and residue numbers must match between the full PDB and the ML-region PDB. (In PyMOL, tick "Original atom order" when exporting.)
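
Two of the pitfalls mentioned in the steps above (blank element columns after tleap, and an ML-region PDB whose atoms do not line up with the full PDB) are easy to detect by hand. The sketch below is a standalone illustration with hypothetical helper names. It assumes the ML-region PDB contains only atoms copied from the full PDB, which may not hold if your file carries extra capping atoms.

```python
def pdb_line(serial, name, resname, resseq, element):
    """Format one fixed-column HETATM record (chain A, origin), demo only."""
    return (f"HETATM{serial:5d} {name:<4} {resname:>3} A{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}")


def atom_keys(pdb_text):
    """(atom name, residue name, chain, residue number) per ATOM/HETATM record."""
    return [
        (line[12:16].strip(), line[17:20].strip(), line[21:22], line[22:26].strip())
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


def missing_element_serials(pdb_text):
    """Serial fields of records whose element columns (77-78) are blank."""
    return [
        line[6:11].strip()
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM")) and not line[76:78].strip()
    ]


def is_ordered_subset(full_pdb, region_pdb):
    """True when every region atom occurs in the full PDB, in the same relative order."""
    remaining = iter(atom_keys(full_pdb))
    # `key in remaining` consumes the iterator, which enforces the ordering.
    return all(key in remaining for key in atom_keys(region_pdb))


full = "\n".join([
    pdb_line(1, "C1", "LIG", 1, "C"),
    pdb_line(2, "N1", "LIG", 1, "N"),
    pdb_line(3, "O1", "LIG", 1, "O"),
])
lines = full.splitlines()
region = "\n".join([lines[0], lines[2]])
shuffled = "\n".join([lines[2], lines[0]])
no_elements = "\n".join(line[:66] for line in lines)  # mimics blank element columns

print(is_ordered_subset(full, region))       # True
print(is_ordered_subset(full, shuffled))     # False
print(missing_element_serials(no_elements))  # ['1', '2', '3']
```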

Quick Examples

# Multi-structure MEP (R + P → MEP, with TS + thermo + DFT)
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' --tsopt --thermo --dft

# Scan mode (single structure → staged bond scans → MEP)
mlmm all -i R.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
    --scan-lists "[('SAM 359 CS1','GPP 360 C8',1.3)]"

# TS-only validation (existing TS candidate)
mlmm tsopt -i TS_candidate_layered.pdb --parm complex.parm7 -q 1 --opt-mode grad
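
The --scan-lists value in the scan-mode example above has the shape of a Python literal: a list of (atom, atom, number) tuples, where each atom is written as 'RESNAME RESNUM ATOMNAME'. The sketch below builds and reads such a string with `ast.literal_eval`. How mlmm itself parses the option is not stated here, and reading the number as a target distance for the bond scan is an assumption; the function name is hypothetical.

```python
import ast


def parse_scan_lists(text):
    """Split "[('RES NUM ATOM','RES NUM ATOM',value), ...]" into structured tuples."""
    parsed = []
    for atom_a, atom_b, value in ast.literal_eval(text):
        parsed.append((tuple(atom_a.split()), tuple(atom_b.split()), float(value)))
    return parsed


spec = "[('SAM 359 CS1','GPP 360 C8',1.3)]"
print(parse_scan_lists(spec))
# [(('SAM', '359', 'CS1'), ('GPP', '360', 'C8'), 1.3)]

# Building the string from data avoids quoting mistakes on the command line.
pairs = [("SAM 359 CS1", "GPP 360 C8", 1.3)]
print(repr(pairs))
```

Wrapping the whole value in double quotes, as in the example above, keeps the inner single quotes intact in a POSIX shell.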

For Gaussian-ONIOM / ORCA-QM/MM round-trips use oniom-export / oniom-import. Per-stage walkthrough (mm-parm → extract → define-layer → opt → path-search → tsopt → freq → irc → dft): docs/getting-started.md and docs/quickstart-all.md. Working scripts (methyltransferase + toy_system): examples/.

Output

A run writes its deliverables to --out-dir (default ./result_all/):

- segments/seg_NN/{reactant,ts,product}.pdb: the canonical R / TS / P structures to cite
- mep.pdb / mep_trj.xyz: the merged reaction path
- energy_diagram_MEP.png: barrier diagram
- summary.log (human-readable) / summary.json (machine-readable)
- Reusable inputs for follow-up runs: ml_region.pdb (--model-pdb), mm_parm/*.parm7 (--parm), layered/ (B-factor-annotated full-system PDBs)

Pipeline scratch lives under _work/ (safe to delete). Full layout and filename conventions: docs/output-layout.md.
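
For post-processing, the layout described above can be walked with a few lines of Python. This is a minimal sketch with hypothetical helper names: it only checks which files exist and loads summary.json as generic JSON, without assuming any particular schema (see docs/output-layout.md for the real conventions). The demo runs on a throwaway directory that mimics the layout.

```python
import json
import tempfile
from pathlib import Path

ROLES = ("reactant", "ts", "product")


def collect_segments(out_dir):
    """Map each segments/seg_* directory to the R / TS / P files present in it."""
    found = {}
    for seg_dir in sorted(Path(out_dir, "segments").glob("seg_*")):
        if seg_dir.is_dir():
            found[seg_dir.name] = {
                role: (seg_dir / f"{role}.pdb").is_file() for role in ROLES
            }
    return found


def load_summary(out_dir):
    """Parsed summary.json, or None when the file is missing."""
    path = Path(out_dir, "summary.json")
    return json.loads(path.read_text()) if path.is_file() else None


# Demo: a real run would point at ./result_all/ instead.
with tempfile.TemporaryDirectory() as tmp:
    seg = Path(tmp, "segments", "seg_01")
    seg.mkdir(parents=True)
    for role in ("reactant", "ts"):
        (seg / f"{role}.pdb").write_text("END\n")
    Path(tmp, "summary.json").write_text(json.dumps({"placeholder": True}))

    print(collect_segments(tmp))
    # {'seg_01': {'reactant': True, 'ts': True, 'product': False}}
    print(sorted(load_summary(tmp)))
    # ['placeholder']
```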

CLI Subcommands

| Subcommand | Role | Doc |
| --- | --- | --- |
| `all` (default) | End-to-end: `mm-parm` → extract → MEP → TS → IRC → freq → DFT | all |
| `mm-parm` | Generate parm7/rst7 via AmberTools | mm-parm |
| `extract` | Extract active-site pocket | extract |
| `define-layer` | Assign 3-layer ML/MM B-factor encoding | define-layer |
| `add-elem-info` / `fix-altloc` | Repair PDB element columns / resolve altlocs | add-elem-info · fix-altloc |
| `opt` / `tsopt` | Geometry / TS optimization | opt · tsopt |
| `path-opt` / `path-search` | MEP via GSM/DMF; recursive refinement | path-opt · path-search |
| `scan` / `scan2d` / `scan3d` | 1D / 2D / 3D bond-distance scans | scan · scan2d · scan3d |
| `freq` / `irc` | Vibrational analysis + thermo / IRC (EulerPC) | freq · irc |
| `dft` / `sp` | Single-point DFT / single-point ML/MM ONIOM | dft · sp |
| `bond-summary` | Compare structures, report bond changes | bond-summary |
| `trj2fig` / `energy-diagram` | Energy plot / R→TS→P diagram | trj2fig · energy-diagram |
| `oniom-export` / `oniom-import` | Gaussian ONIOM / ORCA QM/MM round-trip | oniom-export · oniom-import |
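
Because define-layer stores the layer assignment in the B-factor column, a layered PDB can be sanity-checked by tallying atoms per B-factor value. The sketch below reads columns 61-66 of each record. The values 0.0, 10.0 and 20.0 in the demo are placeholders, not a statement of the toolkit's actual encoding (see docs/concepts.md), and the helper names are hypothetical.

```python
from collections import Counter


def pdb_line(serial, name, bfactor):
    """Format one fixed-column ATOM record (residue ALA A 1), demo only."""
    return (f"ATOM  {serial:5d} {name:<4} ALA A{1:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


def bfactor_counts(pdb_text):
    """Number of ATOM/HETATM records per distinct B-factor value."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            counts[float(line[60:66])] += 1
    return dict(sorted(counts.items()))


layered = "\n".join([
    pdb_line(1, "N", 0.0),
    pdb_line(2, "CA", 0.0),
    pdb_line(3, "C", 10.0),
    pdb_line(4, "O", 20.0),
])
print(bfactor_counts(layered))  # {0.0: 2, 10.0: 1, 20.0: 1}
```

More than three distinct values in a layered PDB would suggest that the B-factors were not written by define-layer.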

The 3-layer system (ML / Movable-MM / Frozen-MM, B-factor encoded), the link-atom treatment, and units (eV·Å in core, Ha·Bohr in the pysisyphus CLI) are described in docs/concepts.md. The Python API (MLMMCore, MLMMASECalculator, pysisyphus mlmm calculator) is documented in docs/python-api.md.
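
Since the core works in eV and Å while the pysisyphus side works in Hartree and Bohr, values often need converting when comparing outputs. The sketch below uses the CODATA 2018 constants; the function names are hypothetical and not part of the mlmm-toolkit API.

```python
HARTREE_IN_EV = 27.211386245988    # 1 Ha in eV (CODATA 2018)
BOHR_IN_ANGSTROM = 0.529177210903  # 1 Bohr in Å (CODATA 2018)


def energy_ev_to_hartree(energy_ev):
    """eV -> Ha."""
    return energy_ev / HARTREE_IN_EV


def gradient_ev_ang_to_hartree_bohr(gradient):
    """eV/Å -> Ha/Bohr (also applies to forces)."""
    return gradient * BOHR_IN_ANGSTROM / HARTREE_IN_EV


def hessian_ev_ang2_to_hartree_bohr2(hessian):
    """eV/Å² -> Ha/Bohr²."""
    return hessian * BOHR_IN_ANGSTROM ** 2 / HARTREE_IN_EV


print(energy_ev_to_hartree(HARTREE_IN_EV))   # 1.0
print(gradient_ev_ang_to_hartree_bohr(1.0))  # about 0.01945
```

Each extra derivative with respect to position contributes one more factor of Bohr-in-Å, which is why the Hessian conversion squares it.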

Getting Help

mlmm --help                       # top-level
mlmm <subcmd> --help              # core options
mlmm <subcmd> --help-advanced     # full option set

Issues: https://github.com/t-0hmura/mlmm_toolkit/issues.

Citation

@article{ohmura2025mlmm,
  author = {Ohmura, Takuto and Inoue, Sei and Terada, Tohru},
  title  = {ML/MM Toolkit -- Towards Accelerated Mechanistic Investigation of Enzymatic Reactions},
  year   = {2025}, journal = {ChemRxiv}, doi = {10.26434/chemrxiv-2025-jft1k}
}

Agent Skills

Agent Skills for Claude Code, Codex, Cursor, and similar clients are provided in skills/. Copy them into your project's skill location (e.g. .claude/skills/) to let an agent drive mlmm-toolkit workflows and subcommands.

Known limitations

- MACE and UMA cannot coexist in one environment (e3nn version conflict); use separate conda envs.
- DFT single-point is practical up to ~500 ML-region atoms; larger regions incur high computational cost.
- The Orb backend sometimes converges a TS with extra soft imaginary modes; prefer UMA / MACE for clean single-saddle spectra.
- CPU-only execution is 10–100× slower than GPU; AmberTools (tleap) is required for mm-parm.

Contributing

Issues and pull requests are welcome — see CONTRIBUTING.md.

License

GNU General Public License v3 (GPL-3.0).

Release history

| Version | Date |
| --- | --- |
| 0.3.0 (this version) | Jun 30, 2026 |
| 0.2.8 | Apr 13, 2026 |
| 0.2.7 | Apr 5, 2026 |
| 0.2.6 | Apr 4, 2026 |
| 0.2.5 | Mar 23, 2026 |
| 0.2.4 | Mar 18, 2026 |
| 0.2.4.dev0 (pre-release) | Mar 18, 2026 |
| 0.2.3.dev0 (pre-release) | Mar 18, 2026 |
| 0.2.2.dev0 (pre-release) | Mar 17, 2026 |

Download files

Source Distribution

mlmm_toolkit-0.3.0.tar.gz (5.8 MB)

Uploaded Jun 30, 2026 Source

Built Distribution

mlmm_toolkit-0.3.0-py3-none-any.whl (1.0 MB)

Uploaded Jun 30, 2026 Python 3

File details

Details for the file mlmm_toolkit-0.3.0.tar.gz.

File metadata

Download URL: mlmm_toolkit-0.3.0.tar.gz
Upload date: Jun 30, 2026
Size: 5.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlmm_toolkit-0.3.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `ea60d87b6e0561063b8cbb40e2e3645a5420966ce075b4e35e6b50148dd52f4d` |
| MD5 | `bd456a56a0d30a9ab749b34075c6cfae` |
| BLAKE2b-256 | `d4b924bf9555bc2ec75c2096e901f25ed1d945952bc1f01aa73b0860137c3b9c` |

Provenance

The following attestation bundles were made for mlmm_toolkit-0.3.0.tar.gz:

Publisher: release.yml on t-0hmura/mlmm_toolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlmm_toolkit-0.3.0.tar.gz
- Subject digest: ea60d87b6e0561063b8cbb40e2e3645a5420966ce075b4e35e6b50148dd52f4d
- Sigstore transparency entry: 2017548900
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: t-0hmura/mlmm_toolkit@90e303ca4c3563e8f70b3edf4c86fd327339e612
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/t-0hmura
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@90e303ca4c3563e8f70b3edf4c86fd327339e612
- Trigger Event: release

File details

Details for the file mlmm_toolkit-0.3.0-py3-none-any.whl.

File metadata

Download URL: mlmm_toolkit-0.3.0-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 1.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlmm_toolkit-0.3.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `e383cfee5d6aa305ad60beb2c2aaf123b4e63f8aad651f721f0424fd77197cea` |
| MD5 | `3dcfcf7dc2d2edb662ecc6bf4156b067` |
| BLAKE2b-256 | `b0fd0317edfa83e525f281fdfb36a82e16212e4d06df60bad89a45da830d8956` |

Provenance

The following attestation bundles were made for mlmm_toolkit-0.3.0-py3-none-any.whl:

Publisher: release.yml on t-0hmura/mlmm_toolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlmm_toolkit-0.3.0-py3-none-any.whl
- Subject digest: e383cfee5d6aa305ad60beb2c2aaf123b4e63f8aad651f721f0424fd77197cea
- Sigstore transparency entry: 2017549012
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: t-0hmura/mlmm_toolkit@90e303ca4c3563e8f70b3edf4c86fd327339e612
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/t-0hmura
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@90e303ca4c3563e8f70b3edf4c86fd327339e612
- Trigger Event: release
