Skip to main content

Semi-QM ensemble rescoring and xTB/ALPB binding energy analysis toolkit.

Project description

scientiflow-xtbsa

ScientiFlow xTB-SA: automated selection of representative MD snapshots and semi-empirical QM/MM ONIOM single-point evaluations to estimate protein–ligand interaction energies.

Maintained by Scientiflow. For end-to-end automated protein–ligand MD workflows integrated with GROMACS and deployment at scale, see https://scientiflow.com/.

What it does (high level)

  • Input: MD topology+trajectory (e.g., GROMACS .tpr + .xtc).
  • Sampling: PCA on protein Cα coordinates; KMeans to pick N representative frames.
  • Extraction: For each selected frame, write XYZ for complex, protein, and ligand.
  • QM/MM region: Define a QM inner region around the ligand using a cutoff (Å). Optionally include whole residues within the cutoff or only atoms within the cutoff.
  • Energetics: Run xTB with ONIOM (gfn2:gfnff) and ALPB water for complex, protein, and ligand; compute a per-frame interaction energy proxy: ΔG_bind,proxy ≈ E_complex − (E_protein + E_ligand)
  • Reporting: Write scientiflow_xtbsa_report.csv (kcal/mol) and plots into the output directory.

About the equation

  • We compute an interaction/binding free-energy proxy from single-point ONIOM energies in implicit solvent: ΔG_bind,proxy ≈ E_complex − (E_protein + E_ligand)
  • Sign convention: Negative values suggest favorable binding. Positive values can occur for some frames and are expected. The meaningful metric is the ensemble average across many frames (report mean ± SEM).
  • Scope: This proxy omits vibrational/rotational/translational entropies and standard-state corrections. For absolute ΔG°, add such corrections or use rigorous alchemical methods. As implemented, this is similar in spirit to MM/PBSA, but with a QM/MM inner region.

Prerequisites (user)

Install (user)

  • Install the package: pip install scientiflow-xtbsa

  • Verify the CLI is available: scientiflow-xtbsa --help

Quick start

Basic GROMACS-first usage (defaults used where possible):

  • Select frames, build QM/MM regions, run xTB, write CSV and plots:

    scientiflow-xtbsa --tpr path/to/topology.tpr --traj path/to/trajectory.xtc --outdir frames

Customize ligand residue name (default LIG):

  • The selections are derived automatically from the ligand name.

    scientiflow-xtbsa --tpr samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc --lig-resname UNL1 --outdir frames

Disable whole-residue rule (include only atoms within cutoff): scientiflow-xtbsa --tpr samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc --no-whole-residue

Control frame count and cutoff; tune xTB threading/stack: scientiflow-xtbsa --tpr samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc -n 20 --qm-cutoff 6 --omp-threads 1 --omp-stacksize 8G --kmp-stacksize 8G

Turn off plot generation: scientiflow-xtbsa --tpr samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc --no-display-plots

Note for other MD engines:

  • Pass a structure/topology readable by MDAnalysis (e.g., Amber .prmtop + .nc/.dcd, CHARMM/NAMD .psf + .dcd, or .pdb + .dcd) via --tpr (name kept for consistency; the file itself can be any MDAnalysis-supported topology).

CLI flags and defaults

Flag Type Default Description
--tpr Path required GROMACS .tpr preferred; for other engines, pass a topology compatible with MDAnalysis (.prmtop/.psf/.pdb/.gro).
--traj Path required Trajectory file (.xtc/.trr/.dcd/.nc etc.).
--lig-resname str LIG Ligand residue name used in selections.
--nframes, -n int 15 Number of representative frames (PCA+KMeans).
--outdir, -o Path frames Output directory for frames/plots.
--qm-cutoff float 6.0 Cutoff (Å) to define QM inner region around the ligand.
--whole-residue / --no-whole-residue bool True If true, include entire residues with any atom within cutoff; else include only atoms within cutoff.
--omp-threads int 1 OMP_NUM_THREADS for xTB subprocesses.
--omp-stacksize str 8G OMP_STACKSIZE for xTB.
--kmp-stacksize str 8G KMP_STACKSIZE for xTB.
--stack-unlimited / --no-stack-unlimited bool True Attempt to raise process stack limit (POSIX), similar to ulimit -s unlimited.
--display-plots / --no-display-plots bool True Generate plots (timeseries, histogram, box) into outdir.
--force, -f bool False Overwrite outdir if it exists.
--verbose, -v bool False Stream xTB output and print progress in real time.

Derived selections (no flags needed):

  • PCA selection: "protein and name CA" (protein backbone Cα).
  • Write selection: "(protein or resname LIGAND)" where LIGAND is the provided --lig-resname (default LIG).

Outputs

  • outdir/frame_XXX_complex.xyz, frame_XXX_protein.xyz, frame_XXX_ligand.xyz: per-frame structures (numbering per file).
  • outdir/qm_region.json: ONIOM QM region index strings for each frame:
    • complex: indices in complex.xyz numbering for QM inner region (ligand + nearby protein).
    • protein: indices in protein.xyz numbering for the protein part of the QM region.
    • ligand: indices 1..n for ligand.xyz.
  • scientiflow_xtbsa_report.csv: per-frame energies (kcal/mol) and ΔG_bind,proxy.
  • outdir/xtbsa_dg_timeseries.png: per-frame ΔG.
  • outdir/xtbsa_dg_hist.png: ΔG distribution.
  • outdir/xtbsa_dg_box.png: box plot with mean (star).

Best practices

  • Use many frames (e.g., 20–100) to stabilize averages; report mean ± SEM.
  • Ensure protonation states and net charges are consistent across complex/protein/ligand systems for xTB.
  • Consider both whole-residue and atom-level cutoff schemes to assess sensitivity.
  • Negative average ΔG_bind,proxy indicates favorable binding; individual frames may be positive.

Troubleshooting

  • "No ligand atoms found for resname 'XXX' within write-selection."
    Ensure --lig-resname matches the residue name in your topology (case-sensitive). The write selection is derived as (protein or resname XXX).

  • xTB very slow or stalls:
    Use --omp-threads 1 --omp-stacksize 8G --kmp-stacksize 8G --stack-unlimited (default settings mimic recommended shell script tweaks). Run with --verbose to stream xTB output.

How it works (technical detail)

  1. MDAnalysis loads the topology+trajectory.
  2. PCA on "protein and name CA" → project frames → KMeans to N clusters → choose medoids.
  3. For each chosen frame:
    • Write complex/protein/ligand XYZ.
    • Build QM region by distance from ligand (cutoff Å), either by atoms or by whole residues.
    • Map indices carefully between complex/protein/ligand numberings and serialize to qm_region.json.
  4. For each frame, run xTB ONIOM (gfn2:gfnff, ALPB water) for complex/protein/ligand; parse total energies; compute: ΔG_bind,proxy ≈ E_complex − (E_protein + E_ligand) Convert to kcal/mol and write CSV; generate plots (optional).

Attribution: This package is maintained by Scientiflow. For integrated solutions with GROMACS and production pipelines, visit https://scientiflow.com/.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scientiflow_xtbsa-0.1.2.tar.gz (15.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

scientiflow_xtbsa-0.1.2-py3-none-any.whl (16.3 kB view details)

Uploaded Python 3

File details

Details for the file scientiflow_xtbsa-0.1.2.tar.gz.

File metadata

  • Download URL: scientiflow_xtbsa-0.1.2.tar.gz
  • Upload date:
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/6.14.0-35-generic

File hashes

Hashes for scientiflow_xtbsa-0.1.2.tar.gz
Algorithm Hash digest
SHA256 18030c6dcd4b79201d4f23ac05b772cf79ea3b967a7ed7f5be15be8023c8b7d6
MD5 96c6202ce1c734e88c82782e8a2f7283
BLAKE2b-256 ca9f9c51c96676cb6b52f3c309cf23cfcba5f4da5561b712fa9757b41df15b43

See more details on using hashes here.

File details

Details for the file scientiflow_xtbsa-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: scientiflow_xtbsa-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 16.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/6.14.0-35-generic

File hashes

Hashes for scientiflow_xtbsa-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c7f658a92d9e0a4bc2db2bee97b09705d07ecbe17377e289b40589e51fa50b58
MD5 b30b870c0bbb322a9d550f7ed9e426e8
BLAKE2b-256 205b1c705673d7ce097e3eff9536973e27f243ddee4d175699308a5c70b50edd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page