
FastMDXplora: Fully Automated SysTem for Molecular Dynamics eXploration



FastMDXplora is a project-level orchestrator for end-to-end molecular dynamics studies. A single command takes a protein structure (or PDB ID) from input to a publication-quality deliverable, coordinating four phases:

  setup  →  simulation  →  analysis  →  report

FastMDXplora is the next generation of FastMDAnalysis (Aina & Kwan, J. Comput. Chem. 2026, DOI: 10.1002/jcc.70350). It keeps the same automated, reproducibility-by-design philosophy and extends it from trajectory analysis to the full molecular dynamics study: setup, simulation (including enhanced sampling), protein and protein-ligand analysis, and reporting. It is not a generic workflow engine. The workflow and the domain knowledge are built in, and the user expresses intent rather than describing a workflow graph (a DAG, or directed acyclic graph: the task-and-dependency model used by tools such as Snakemake and Nextflow).

Highlights

  • Single-command end-to-end MD — from PDB to slides in one invocation
  • Protein-ligand ready — parameterize a small-molecule ligand (OpenFF) from a feasible bound pose; ligand-aware analyses (pose RMSD, contacts, protein-ligand H-bonds) run automatically
  • Project-level orchestrator pattern — shared state, registered phases, intelligent defaults, consolidated outputs
  • Granular control when you want it — run any single phase independently
  • Self-contained — the analysis and report phases have no heavy runtime dependencies
  • Reproducibility built in — every run writes a structured manifest of parameters, software versions, and artifact paths
  • Publication-quality reporting — automated slide deck, structured Markdown report, self-contained project bundle

Installation

FastMDXplora's four phases have different dependency footprints. The analysis and report phases work from pip alone; the setup and simulation phases need PDBFixer and OpenMM, which are distributed primarily through conda-forge. There are therefore two installation routes; pick the one that matches what you need.

Full install (all four phases) — from the git repo

The setup/simulation chemistry stack (OpenMM, PDBFixer) installs most reliably from conda-forge, so the full install uses the bundled environment.yml. We recommend mamba (a faster conda solver); plain conda works too.

git clone https://github.com/aai-research-lab/FastMDXplora.git
cd FastMDXplora
mamba env create -f environment.yml || conda env create -f environment.yml
conda activate fastmdxplora
pip install .

Don't have mamba? Either install Miniforge (see below), or just use conda — the || above falls back to it automatically.

Analysis + report only — from PyPI

If you only need to analyze existing trajectories and build reports (no simulation), plain pip is enough — no conda required:

pip install fastmdxplora              # primary package
pip install fastmdx                    # alias (resolves to fastmdxplora)

This gives a fully working analysis + report pipeline, slide deck included (python-pptx is a core dependency). The setup and simulation phases emit a clear warning and skip gracefully until the chemistry stack is present. Add it via conda-forge (recommended, reliable across platforms):

conda install -c conda-forge pdbfixer openmm

or, best-effort, via the [md] pip extra (PDBFixer wheels are unavailable on some platforms, so conda is preferred):

pip install "fastmdxplora[md]"
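The "warn and skip gracefully" behaviour described above is a common optional-dependency pattern. The sketch below shows one way to implement it, assuming the phases simply check whether the import names openmm and pdbfixer can be found before running. The helper names missing_modules and run_phase_or_skip are hypothetical and are not FastMDXplora's API.

```python
import importlib.util
import warnings
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Import names of the chemistry stack needed by the setup and simulation phases.
MD_STACK = ("openmm", "pdbfixer")


def missing_modules(names: Sequence[str]) -> list[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def run_phase_or_skip(phase: str, required: Sequence[str], fn: Callable[[], T]) -> Optional[T]:
    """Run fn() if every required module is importable; otherwise warn and skip."""
    missing = missing_modules(required)
    if missing:
        warnings.warn(
            f"Skipping the {phase} phase: missing {', '.join(missing)}. "
            "Install with: conda install -c conda-forge pdbfixer openmm",
            RuntimeWarning,
            stacklevel=2,
        )
        return None
    return fn()
```

On a pip-only install, run_phase_or_skip("simulation", MD_STACK, run_simulation) (with run_simulation standing in for a hypothetical phase body) would return None with a warning, while analysis code that needs no chemistry stack runs normally.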

Development install

git clone https://github.com/aai-research-lab/FastMDXplora.git
cd FastMDXplora
mamba env create -f environment.yml || conda env create -f environment.yml
conda activate fastmdxplora
pip install -e ".[test]"               # editable, with the test dependencies

Verify

fastmdx --version
fastmdx info                           # versions + detected backends (OpenMM/PDBFixer)

Check which OpenMM platforms are available (CPU/CUDA/OpenCL):

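python - <<'PY'
import openmm as mm
# List every platform this OpenMM build can load (e.g. Reference, CPU, CUDA, OpenCL).
plats = [mm.Platform.getPlatform(i).getName() for i in range(mm.Platform.getNumPlatforms())]
print("Available platforms:", plats)
gpu = [p for p in plats if p in ("CUDA", "OpenCL")]
print("GPU platform(s):", ", ".join(gpu) if gpu else "none; simulations will run on the CPU platform")
PY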

conda-forge package (coming soon). A single-command conda install -c conda-forge fastmdxplora (pulling every dependency, all four phases working out of the box) is planned once the recipe clears review. Until then, use the git + environment.yml route above.

Mamba / Miniforge (optional)

mamba is a drop-in, faster replacement for the conda solver — helpful because solving the OpenMM/CUDA stack is exactly where the classic solver is slow. If you don't have it, the easiest source is Miniforge (conda + mamba, preconfigured for conda-forge):

# Linux (x86_64) — see https://conda-forge.org/miniforge/ for macOS/Windows/ARM
curl -L -o "$HOME/Miniforge3.sh" \
  "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
bash "$HOME/Miniforge3.sh" -b -p "$HOME/miniforge3"
source "$HOME/miniforge3/etc/profile.d/conda.sh"
conda init "$(basename "$SHELL")"

If mamba still isn't on PATH afterward, add it to the base environment:

conda install -n base -c conda-forge mamba

For other operating systems (macOS Intel/Apple Silicon, Linux ARM64, Windows), grab the matching installer from the Miniforge releases page.

Examples

Command line

Run the full pipeline (setup → simulate → analyze → report):

fastmdx explore --system protein.pdb

Pass a PDB ID instead of a file; it is auto-detected and the structure is downloaded from RCSB:

fastmdx explore --system 1L2Y
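The README does not describe how --system input is auto-detected. As an illustration only, the sketch below classifies an input as a structure file, a PDB ID, or an amino-acid sequence (the three input kinds listed in the config template further down). The suffix set, the regular expressions, and the function name classify_system_input are hypothetical heuristics, not FastMDXplora's detection logic.

```python
import re
from pathlib import Path

# Structure-file extensions recognized by this sketch (an assumption).
STRUCTURE_SUFFIXES = {".pdb", ".cif", ".mmcif", ".ent"}
# Classic 4-character PDB IDs: a digit 1-9 followed by three alphanumerics, e.g. 1L2Y.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")
# One-letter codes of the 20 standard amino acids.
SEQUENCE = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]+$", re.IGNORECASE)


def classify_system_input(value: str) -> str:
    """Guess whether a --system value is a structure file, a PDB ID, or a sequence."""
    path = Path(value)
    if path.suffix.lower() in STRUCTURE_SUFFIXES or path.is_file():
        return "file"
    if PDB_ID.match(value):
        return "pdb_id"
    if SEQUENCE.match(value):
        return "sequence"
    raise ValueError(f"Cannot interpret system input: {value!r}")
```

Checking the PDB-ID pattern before the sequence pattern is safe here because a PDB ID starts with a digit, which never appears in a one-letter sequence.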

Tune per-phase options (flags are namespaced by phase):

fastmdx explore -s protein.pdb --setup-ph 7.4 --simulate-duration-ns 100 --simulate-platform CUDA

Run only specific phases:

fastmdx explore -s protein.pdb --include setup simulation

Run a single phase (bare flags, no phase prefix):

fastmdx setup -s protein.pdb --ph 6.5
fastmdx simulate --output run_001 --duration-ns 50 --platform CUDA
fastmdx analyze --output run_001 --analyses rmsd rmsf rg

Drive a whole study from a config file (-c and -config also work):

fastmdx explore --config study.yml

Generate a commented config template to edit:

fastmdx init-config -o study.yml

The -s, -system, and --system forms are equivalent; xplore is an alias of explore.
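The namespaced flags appear to correspond to the phase.option keys used in config files and sweep: axes; for example, --simulate-duration-ns overrides simulation.duration_ns (see Configuration files). A minimal sketch of that mapping follows. The prefix table and flag_to_dotted_key are hypothetical, inferred only from the flag examples above, and are not FastMDXplora's argument parser.

```python
# Hypothetical mapping from CLI flag prefixes to config sections, inferred from
# --setup-ph and --simulate-duration-ns; the analyze/report prefixes are assumptions.
PHASE_PREFIXES = {
    "setup": "setup",
    "simulate": "simulation",
    "analyze": "analysis",
    "report": "report",
}


def flag_to_dotted_key(flag: str) -> str:
    """Map a phase-namespaced flag such as '--simulate-duration-ns' to 'simulation.duration_ns'."""
    body = flag.lstrip("-")
    prefix, sep, option = body.partition("-")
    if not sep or prefix not in PHASE_PREFIXES:
        raise ValueError(f"Not a phase-namespaced flag: {flag!r}")
    # Flags use hyphens, config keys use underscores; case is preserved, so a
    # hypothetical --simulate-temperature-K would map to simulation.temperature_K.
    return f"{PHASE_PREFIXES[prefix]}.{option.replace('-', '_')}"
```

Single-phase commands such as fastmdx simulate --duration-ns 50 would skip the prefix step, since the phase is already fixed by the subcommand.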

Python API

Run the full pipeline:

from fastmdxplora import FastMDXplora

fmdx = FastMDXplora(system="protein.pdb")
fmdx.explore()

Specify options and select phases:

fmdx = FastMDXplora(system="1L2Y")          # PDB ID, fetched from RCSB
results = fmdx.explore(
    include=["setup", "simulation", "analysis"],
    options={
        "simulation": {"duration_ns": 100, "temperature_K": 310, "platform": "CUDA"},
        "analysis":   {"include": ["rmsd", "rg", "cluster"]},
    },
)
# explore() always returns a list of runs (a single study is a list of one)
for run in results:
    print(run.run_id, run.status)
    for phase in run.phases:
        print("  ", phase.name, phase.status)

Run a config file — one system, many systems, or a parameter sweep, all the same way:

fmdx = FastMDXplora(config="study.yml")
fmdx.explore()

Preview a run without executing (CLI --dry-run, or dry_run=True):

FastMDXplora(config="campaign.yml").explore(dry_run=True)

Recommended alias: import fastmdxplora as fastmdx.

See the "Configuration files" and "Many systems and parameter sweeps" sections below for the YAML format, batches, sweeps, and parallel execution.

Configuration files

For anything beyond a quick run, capture the whole study in a single YAML file instead of a long flag list. The same file drives both the CLI and the Python API. Input is always given as a systems: list — even for a single system — so the file looks the same whether you study one protein or a dozen.

Generate a commented template to start from:

fastmdx init-config                    # writes fastmdxplora.yml (comprehensive)
fastmdx init-config --minimal -o study.yml   # short starter

A study.yml looks like:

systems:
  - id: protein1
    system: protein.pdb        # PDB/CIF path, 4-char PDB ID, or sequence

output: ./my_study
include: [setup, simulation, analysis, report]

setup:
  ph: 7.4
  ion_concentration_M: 0.15

simulation:
  duration_ns: 100.0         # production length (equilibration is separate)
  temperature_K: 310.0
  platform: CUDA

analysis:
  include: [rmsd, rmsf, rg, cluster]
  selection: "name CA"
  options:
    cluster:
      methods: [kmeans, hierarchical]
      n_clusters: 5

report:
  title: "My MD Study"

Run it from the CLI or the API:

fastmdx explore --config study.yml     # also: -c, -config

from fastmdxplora import FastMDXplora
FastMDXplora(config="study.yml").explore()

With a single system and no sweep, the output uses the familiar flat layout (my_study/setup/, my_study/simulation/, …) with the usual manifest.json and resolved_config.yml. Three things make this robust:

  • Flags override the file. fastmdx explore --config study.yml --simulate-duration-ns 50 keeps everything in the file but runs 50 ns. Precedence is: command-line flags / API kwargs > config file > built-in defaults.
  • Strict validation. A typo like pH: (wrong case) or simulaton: is rejected with a did-you-mean suggestion, so a misspelled key never silently runs with the default.
  • Reproducibility. Every run writes resolved_config.yml — the fully-merged configuration that actually ran (defaults + file + overrides). Feed it straight back to --config to reproduce the study exactly.
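The precedence rule and the strict validation described above can be sketched with a recursive dictionary merge and difflib's fuzzy matching. This is a minimal illustration, not FastMDXplora's implementation: the DEFAULTS values are made up, and validate, deep_merge, and resolve are hypothetical names. The returned dictionary plays the role of resolved_config.yml.

```python
import copy
import difflib

# Illustrative defaults only; FastMDXplora's real defaults and schema are larger.
DEFAULTS: dict = {
    "setup": {"ph": 7.0, "ion_concentration_M": 0.15},
    "simulation": {"duration_ns": 10.0, "temperature_K": 300.0, "platform": "CPU"},
}


def validate(config: dict, schema: dict, path: str = "") -> None:
    """Reject unknown keys (matching is case-sensitive) and suggest the closest valid key."""
    lowered = {key.lower(): key for key in schema}
    for key, value in config.items():
        if key not in schema:
            # Compare case-insensitively so that "pH" still suggests "ph".
            close = difflib.get_close_matches(key.lower(), list(lowered), n=1)
            hint = f" (did you mean '{path}{lowered[close[0]]}'?)" if close else ""
            raise ValueError(f"Unknown option '{path}{key}'{hint}")
        if isinstance(value, dict) and isinstance(schema[key], dict):
            validate(value, schema[key], f"{path}{key}.")


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied, recursing into nested sections."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve(file_cfg: dict, overrides: dict) -> dict:
    """Precedence: command-line flags / API kwargs > config file > built-in defaults."""
    validate(file_cfg, DEFAULTS)
    validate(overrides, DEFAULTS)
    return deep_merge(deep_merge(DEFAULTS, file_cfg), overrides)
```

For example, resolve({"simulation": {"duration_ns": 100.0}}, {"simulation": {"duration_ns": 50.0}}) keeps every other default but runs 50 ns, mirroring the --simulate-duration-ns 50 override above.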

For a quick command-line one-off, -s/--system is shorthand that builds a one-element systems list for you:

fastmdx explore -s protein.pdb --simulate-duration-ns 50

Many systems and parameter sweeps

Because input is always a systems: list, studying several systems is just adding entries. Add a sweep: block to vary parameters, and FastMDXplora runs the full cross-product — each as a complete, self-contained study.

output: ./trpcage_campaign
include: [setup, simulation, analysis, report]

systems:
  - id: trpcage1
    system: trpcage.pdb
  - id: trpcage2
    system: trpcage.pdb
    setup: { ph: 6.5 }                 # optional per-system overrides

sweep:
  simulation.temperature_K: [300, 310, 320]   # dotted phase.option → values
  simulation.pressure_bar: [1.0, 1.2]          # multiple axes → cross-product

That config produces 2 systems × 3 temperatures × 2 pressures = 12 runs. When there is more than one run, each goes in its own runs/<id>/ subdirectory, indexed by a top-level batch_manifest.json, with a cross-run comparison/ report:

trpcage_campaign/
  batch_manifest.json
  comparison/                                        (cross-run report)
  runs/
    trpcage1__temperature_K-300__pressure_bar-1.0/   (a full study)
    trpcage1__temperature_K-300__pressure_bar-1.2/
    ...

Run it exactly as any other config:

fastmdx explore --config campaign.yml

from fastmdxplora import FastMDXplora
FastMDXplora(config="campaign.yml").explore()
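The cross-product expansion and the run directory names shown in the trpcage_campaign tree can be reproduced with itertools.product. This is a minimal sketch: the naming rule (system id, then option-value pairs with the phase prefix dropped, joined by double underscores) is inferred from the example directory names, and expand_runs and run_dir are hypothetical helpers.

```python
import itertools
from pathlib import Path
from typing import Any


def expand_runs(systems: list[dict[str, Any]], sweep: dict[str, list[Any]]) -> list[tuple[str, dict[str, Any]]]:
    """Return (run_id, swept_values) for every system x sweep-point combination."""
    axes = list(sweep)  # e.g. ["simulation.temperature_K", "simulation.pressure_bar"]
    runs = []
    for system in systems:
        # With no sweep axes, product() yields one empty tuple: one run per system.
        for values in itertools.product(*(sweep[axis] for axis in axes)):
            point = dict(zip(axes, values))
            parts = [system["id"]] + [f"{axis.split('.', 1)[1]}-{value}" for axis, value in point.items()]
            runs.append(("__".join(parts), point))
    return runs


def run_dir(output: str, run_id: str, n_runs: int) -> Path:
    """One run uses the flat layout; several runs each get output/runs/<run_id>/."""
    return Path(output) if n_runs == 1 else Path(output) / "runs" / run_id
```

With the two systems and the 3 x 2 sweep above, expand_runs returns 12 entries, the first being trpcage1__temperature_K-300__pressure_bar-1.0.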

Each run has the same structure as a single study (its own manifest.json, resolved_config.yml, and phase directories), so existing analysis tooling works per run, unchanged. Within a run, option precedence is base config < per-system overrides < swept value. Misspelled sweep axes are rejected with the list of valid options, and a failed run is recorded while the others continue.
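The per-run precedence and the sweep-axis check described above can be sketched as follows. The layering order comes from the text; the function names run_config and check_axes, and the shape of the valid-options table, are hypothetical.

```python
import copy


def run_config(base: dict, system: dict, point: dict) -> dict:
    """Layer one run's options: base config < per-system overrides < swept values."""
    cfg = copy.deepcopy(base)
    # Per-system overrides are the nested sections of a systems: entry, such as
    # setup: { ph: 6.5 }; scalar fields like id and system are skipped.
    for section, overrides in system.items():
        if isinstance(overrides, dict):
            cfg.setdefault(section, {}).update(copy.deepcopy(overrides))
    # Swept values are applied last, so they win over both layers above.
    for dotted, value in point.items():
        section, option = dotted.split(".", 1)
        cfg.setdefault(section, {})[option] = value
    return cfg


def check_axes(sweep: dict, valid: dict) -> None:
    """Reject unknown sweep axes, listing the valid options for that phase."""
    for dotted in sweep:
        section, _, option = dotted.partition(".")
        options = sorted(valid.get(section, ()))
        if option not in options:
            raise ValueError(f"Unknown sweep axis '{dotted}'; valid {section} options: {options}")
```

For trpcage2 in the campaign above, run_config would apply ph: 6.5 from the systems entry and then the swept temperature and pressure for each of its six runs.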

Cross-run comparison report

After a multi-run study, FastMDXplora automatically builds a comparison/ report at the batch root that turns a directory of runs into a single analysis:

  • Overlays — every run's per-frame trace (RMSD, Rg, Q-value, total SASA) drawn on one set of axes, labelled by its swept value, so divergence across the sweep is visible at a glance.
  • Trends — each run reduced to a summary scalar (e.g. mean RMSD over the trajectory) and plotted against the swept parameter, giving a structure-property relationship.
  • comparison_summary.csv — one row per run with the summary scalars, ready for further analysis.
  • comparison_report.md — a written report tying the figures together, with a one-line quantitative takeaway per property (e.g. "across temperature_K 300 → 320, mean RMSD increases 0.21 → 0.23 nm").

It degrades gracefully (errored runs and missing analyses are skipped) and can be turned off with report: { comparison: false }.
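The trend reduction and the one-line takeaway can be sketched in a few lines. This assumes each run is available as a dict with a status, its swept value, and a per-frame trace; that input shape and the helper names summarize_runs, takeaway, and to_csv are hypothetical, not the format FastMDXplora reads from its run directories.

```python
import csv
import io
import statistics


def summarize_runs(runs: list, axis: str, prop: str) -> list:
    """One row per successful run: its swept value and the mean of a per-frame trace."""
    rows = []
    for run in runs:
        trace = run.get(prop)
        # Degrade gracefully: skip errored runs and runs missing this analysis.
        if run.get("status") != "ok" or not trace:
            continue
        rows.append({"run_id": run["run_id"], axis: run[axis], f"mean_{prop}": statistics.fmean(trace)})
    return sorted(rows, key=lambda row: row[axis])


def takeaway(rows: list, axis: str, prop: str, unit: str) -> str:
    """One-line trend comparing the lowest and highest swept value."""
    first, last = rows[0], rows[-1]
    a, b = first[f"mean_{prop}"], last[f"mean_{prop}"]
    trend = "increases" if b > a else "decreases" if b < a else "is unchanged"
    return f"across {axis} {first[axis]} → {last[axis]}, mean {prop.upper()} {trend} {a:.2f} → {b:.2f} {unit}"


def to_csv(rows: list) -> str:
    """Serialize the rows in the shape of a comparison_summary.csv file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

This takeaway only compares the endpoints of the sweep; a non-monotonic trend would need the full trend plot to interpret.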

Parallel execution

By default runs execute sequentially. An optional execution: block runs several at once:

execution:
  mode: parallel          # sequential (default) | parallel
  workers: 2              # how many runs at once
  devices: [0, 1]         # GPU indices — one run pinned per device
  continue_on_error: true

Parallelism is process-based: each run is a separate subprocess, because OpenMM contexts are not safely shared across threads and the Python GIL prevents threads from executing Python code in parallel. On GPU, the safe pattern is one run per GPU: list your devices, and each worker is pinned to a distinct index, assigned round-robin. Oversubscribing a single GPU is slower than running sequentially, so on GPU, workers should not exceed the number of listed devices. When workers is unset, it defaults to one per device on GPU, or to the CPU count capped at the number of runs on CPU.
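The worker defaults and the one-run-per-GPU rule can be illustrated with a static plan. This sketch substitutes a simple technique: runs are dealt round-robin into fixed lanes, one lane per worker, and each lane runs its queue sequentially on its own device. A real scheduler may hand out runs dynamically instead. The names default_workers and plan_lanes are hypothetical, and raising an error on oversubscription is a choice made for this sketch.

```python
import os
from typing import Optional


def default_workers(n_runs: int, devices: Optional[list] = None, cpu_count: Optional[int] = None) -> int:
    """One worker per listed GPU; otherwise the CPU count capped at the run count."""
    if devices:
        return len(devices)
    cpus = cpu_count or os.cpu_count() or 1
    return max(1, min(cpus, n_runs))


def plan_lanes(run_ids: list, workers: int, devices: list) -> list:
    """Deal runs round-robin into `workers` lanes, each pinned to its own GPU.

    Lane k gets runs k, k + workers, k + 2 * workers, ... and executes them one
    after another on devices[k], so no GPU ever hosts two runs at once.
    """
    if workers > len(devices):
        # This sketch refuses to oversubscribe a GPU rather than merely warning.
        raise ValueError(f"{workers} workers but only {len(devices)} GPU(s) listed")
    return [(devices[k], run_ids[k::workers]) for k in range(workers)]
```

With devices: [0, 1] and the 12-run campaign above, default_workers gives 2, and plan_lanes puts six runs on each GPU.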

The four phases

| Phase | Purpose | Key outputs |
|---|---|---|
| setup | System preparation (fix, protonate, solvate, ionize) | prepared.pdb, solvated.pdb, setup_parameters.json |
| simulation | Minimize, NVT, NPT, production MD | production.dcd, topology.pdb, simulation_parameters.json |
| analysis | RMSD, RMSF, Rg, H-bonds, secondary structure (SS), clustering, SASA, dimensionality reduction, Q-value, dihedrals | <analysis>/*.dat, <analysis>/*.png, analysis_manifest.json |
| report | Slides, structured report, project bundle | report.md, slides.pptx, project_bundle.zip |

Each phase writes to a dedicated subdirectory under the project output root and produces a structured parameters manifest, so every artifact is traceable to the exact options that produced it.
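One way such a per-phase parameters manifest can make artifacts traceable is to record the options, software versions, and a content hash of every file the phase wrote. The sketch below assumes a hypothetical JSON schema (the keys phase, options, python, packages, artifacts, created_utc) and a hypothetical write_phase_manifest helper; FastMDXplora's actual manifest fields are not documented here.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path
from typing import Optional


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large trajectories need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installed_version(package: str) -> Optional[str]:
    """Installed distribution version, or None when the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def write_phase_manifest(phase_dir: Path, phase: str, options: dict, packages: list) -> Path:
    """Write <phase>_parameters.json with options, versions, and artifact hashes."""
    out = phase_dir / f"{phase}_parameters.json"
    artifacts = {
        p.name: file_sha256(p)
        for p in sorted(phase_dir.iterdir())
        if p.is_file() and p != out  # never hash the manifest itself
    }
    manifest = {
        "phase": phase,
        "options": options,
        "python": platform.python_version(),
        "packages": {name: installed_version(name) for name in packages},
        "artifacts": artifacts,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return out
```

Storing hashes rather than only paths lets a later reader confirm that a prepared.pdb on disk is the exact file the recorded options produced.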

Documentation

Documentation is hosted at fastmdxplora.readthedocs.io (under development).

Citation

If you use FastMDXplora in your work, please cite the foundational FastMDAnalysis paper:

Aina, A.; Kwan, D. FastMDAnalysis: Software for Automated Analysis of Molecular Dynamics Trajectories. J. Comput. Chem. 2026, 47, e70350. DOI: 10.1002/jcc.70350

@article{aina2026fastmd,
  author  = {Aina, Adekunle and Kwan, Derrick},
  title   = {FastMDAnalysis: Software for Automated Analysis of Molecular Dynamics Trajectories},
  journal = {Journal of Computational Chemistry},
  volume  = {47},
  number  = {8},
  pages   = {e70350},
  year    = {2026},
  doi     = {10.1002/jcc.70350},
}

Contributing

Contributions are welcome. See CONTRIBUTING.md. FastMDXplora follows the Contributor Covenant.

License

MIT — see LICENSE.

Acknowledgements

FastMDXplora is developed in the AAI Research Lab at California State University Dominguez Hills. It builds on a deep ecosystem of open-source scientific Python: MDTraj, OpenMM, PDBFixer, NumPy, SciPy, scikit-learn, Matplotlib, python-pptx, and many others.

Download files


Source Distribution

fastmdxplora-2.0.0.tar.gz (229.0 kB view details)


Built Distribution


fastmdxplora-2.0.0-py3-none-any.whl (172.2 kB view details)


File details

Details for the file fastmdxplora-2.0.0.tar.gz.

File metadata

  • Download URL: fastmdxplora-2.0.0.tar.gz
  • Size: 229.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fastmdxplora-2.0.0.tar.gz
Algorithm Hash digest
SHA256 572e4a395d3863dd78285a2a3d1aceeceda35bcaa6be744c3ae6155749fb40eb
MD5 b39dd34c11ec8087fc3d1f4aaf044921
BLAKE2b-256 0023f42df706cd7e59ac5a74b0c6e73e4ac89af933e593173bd50b4be5fd9308


Provenance

The following attestation bundles were made for fastmdxplora-2.0.0.tar.gz:

Publisher: publish.yml on aai-research-lab/FastMDXplora

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fastmdxplora-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: fastmdxplora-2.0.0-py3-none-any.whl
  • Size: 172.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fastmdxplora-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 50a856dcc30efe1b9c9a9b95403d741cd5130100e5bddded9a1bcbb428aaea3b
MD5 35f25f13e940fb066f8500dae652959c
BLAKE2b-256 d28041bb19fbc7f2bfb7e1736c3190b243f23c3c299e46f4271f1886436abe92


Provenance

The following attestation bundles were made for fastmdxplora-2.0.0-py3-none-any.whl:

Publisher: publish.yml on aai-research-lab/FastMDXplora

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
