Semi-QM ensemble rescoring and xTB/ALPB binding energy analysis toolkit.

scientiflow-xtbsa

ScientiFlow xTB-SA: Automated selection of representative MD snapshots and semi-empirical QM/MM ONIOM single-point evaluations to estimate protein–ligand interaction energies.

Maintained by Scientiflow. For end-to-end automated protein–ligand MD workflows integrated with GROMACS and deployment at scale, see https://scientiflow.com/.

Overview

  • Input: MD topology+trajectory (e.g., GROMACS .tpr/.xtc).
  • Sampling: PCA on protein Cα coordinates, KMeans clustering to select N representative frames.
  • Extraction: For each selected frame, write XYZ for complex, protein, and ligand.
  • QM/MM region: Define a QM inner region around the ligand using a cutoff (Å). You can include either whole residues within the cutoff or only those atoms within the cutoff.
  • Energetics: Run xTB with ONIOM (gfn2:gfnff) and ALPB implicit solvent for complex, protein, and ligand, then compute a per-frame interaction energy proxy: ΔG_bind,proxy ≈ E_complex − (E_protein + E_ligand)
  • Reporting: Write scientiflow_xtbsa_report.csv (kcal/mol) and plots into the output directory.
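
The sampling step above can be sketched as follows. This is a minimal illustration, not the package's actual code: the function name `select_representative_frames`, the choice of two principal components, and the synthetic coordinates are all assumptions, and real Cα coordinates would first need to be aligned to a reference.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def select_representative_frames(coords, n_frames, n_components=2, seed=0):
    """Pick one frame per KMeans cluster in PCA space (hypothetical helper).

    coords: array of shape (n_total_frames, n_atoms, 3), assumed pre-aligned.
    Returns a sorted list of 0-based frame indices.
    """
    flat = coords.reshape(coords.shape[0], -1)  # (frames, 3 * atoms)
    projected = PCA(n_components=n_components).fit_transform(flat)
    km = KMeans(n_clusters=n_frames, n_init=10, random_state=seed).fit(projected)

    selected = []
    for k in range(n_frames):
        members = np.flatnonzero(km.labels_ == k)
        if members.size == 0:  # guard against an empty cluster
            continue
        # Representative = member closest to the cluster centre.
        dists = np.linalg.norm(projected[members] - km.cluster_centers_[k], axis=1)
        selected.append(int(members[np.argmin(dists)]))
    return sorted(selected)


# Synthetic stand-in for Cα coordinates: 200 frames, 50 atoms.
rng = np.random.default_rng(0)
fake_ca = rng.normal(size=(200, 50, 3))
frames = select_representative_frames(fake_ca, n_frames=15)
print(frames)
```

Taking the frame nearest each cluster centre is one common way to turn clusters into representatives; the tool may use a different rule.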

Do the equation and the explanation match?

  • What we compute is an interaction/binding free energy proxy from single-point ONIOM energies in implicit solvent: ΔG_bind,proxy ≈ E_complex − (E_protein + E_ligand)
  • Sign convention: negative values suggest favorable binding; positive values can occur for certain frames and are expected. The meaningful quantity is the ensemble average (mean over many frames) with its variation (std/SEM).
  • This proxy omits vibrational/rotational/translational entropies and standard-state corrections. If you need absolute ΔG°, consider adding these corrections. As implemented, this is closer to an MM/PBSA-style interaction energy but with QM/MM for the inner region.
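
As a worked example of the expression above, the sketch below evaluates the proxy for one frame. It assumes the three single-point energies are total energies in Hartree, which is how xTB prints them, and converts the difference to kcal/mol. The function name and the energy values are made up for illustration.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474


def interaction_proxy_kcal(e_complex, e_protein, e_ligand):
    """E_complex - (E_protein + E_ligand), converted from Hartree to kcal/mol."""
    return (e_complex - (e_protein + e_ligand)) * HARTREE_TO_KCAL_PER_MOL


# Illustrative (made-up) energies in Hartree for a single frame.
dg = interaction_proxy_kcal(e_complex=-250.130, e_protein=-200.050, e_ligand=-50.030)
print(f"{dg:.2f} kcal/mol")  # -31.38 kcal/mol
```

The difference here is −0.050 Hartree, so the proxy is negative, which the sign convention above reads as favorable for this frame.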

Key assumptions and notes

  • ALPB water is used for solvation; charge and spin must be consistent across the complex, protein, and ligand runs.
  • ONIOM partition: QM layer = atoms defined by the cutoff rule; MM layer = remainder of the write selection (see Derived selections).
  • Treating the full ligand as QM is supported and common.
  • Single-point energies are computed; no geometry optimization is performed.
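
The cutoff rule for the QM layer can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the package's implementation: `qm_region_indices` is a hypothetical helper, the ligand is always placed in the QM layer, and the toy coordinates are invented.

```python
import numpy as np


def qm_region_indices(positions, resids, is_ligand, cutoff=6.0, whole_residue=True):
    """Return sorted 1-based atom indices of the QM layer (hypothetical helper).

    positions: (n_atoms, 3) coordinates in Å
    resids:    (n_atoms,) residue id per atom
    is_ligand: (n_atoms,) boolean mask marking ligand atoms
    """
    lig_pos = positions[is_ligand]
    # Distance from every atom to its nearest ligand atom.
    diffs = positions[:, None, :] - lig_pos[None, :, :]
    min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    within = min_dist <= cutoff
    if whole_residue:
        # Promote every residue that has at least one atom inside the cutoff.
        within = np.isin(resids, np.unique(resids[within]))
    within = within | is_ligand
    return (np.flatnonzero(within) + 1).tolist()


# Toy system: residue 1 (two atoms), residue 2 (one atom), ligand = residue 3.
positions = np.array([[3.0, 0, 0], [8.0, 0, 0], [20.0, 0, 0], [0.0, 0, 0]])
resids = np.array([1, 1, 2, 3])
is_ligand = np.array([False, False, False, True])

print(qm_region_indices(positions, resids, is_ligand, whole_residue=True))   # [1, 2, 4]
print(qm_region_indices(positions, resids, is_ligand, whole_residue=False))  # [1, 4]
```

The second atom of residue 1 sits 8 Å from the ligand, so it enters the QM layer only under the whole-residue rule.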

Installation

  • Requires Python 3.9+ and the following Python packages: typer, rich, MDAnalysis, numpy, scikit-learn, matplotlib.
  • Requires the xTB executable on PATH.
  • Optional: GROMACS, needed only to generate the input trajectories.

Usage

Basic example (defaults used where possible):

    poetry run scientiflow-xtbsa --top path/to/topology.tpr --traj path/to/trajectory.xtc --outdir frames

Customize ligand residue name (default LIG):

    poetry run scientiflow-xtbsa --top samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc --lig-resname LIG --outdir frames

Disable whole-residue rule (only atoms within cutoff):

    poetry run scientiflow-xtbsa --top samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc --no-whole-residue

Choose number of frames and cutoff; control xTB threading and stack size:

    poetry run scientiflow-xtbsa --top samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc -n 20 --qm-cutoff 6 --omp-threads 1 --omp-stacksize 8G --kmp-stacksize 8G

Turn off plot generation:

    poetry run scientiflow-xtbsa --top samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc --no-display-plots

CLI flags and defaults

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --top | Path | required | Topology file (.tpr/.top/.gro/.pdb). |
| --traj | Path | required | Trajectory file (.xtc/.trr/.dcd). |
| --lig-resname | str | LIG | Ligand residue name used in selections. |
| --nframes, -n | int | 15 | Number of representative frames (PCA+KMeans). |
| --outdir, -o | Path | frames | Output directory for frames/plots. |
| --qm-cutoff | float | 6.0 | Cutoff (Å) to define QM inner region around ligand. |
| --whole-residue / --no-whole-residue | bool | True | Include entire residues if any atom is within the cutoff; if false, include only atoms within the cutoff. |
| --omp-threads | int | 1 | OMP_NUM_THREADS for xTB subprocesses. |
| --omp-stacksize | str | 8G | OMP_STACKSIZE for xTB. |
| --kmp-stacksize | str | 8G | KMP_STACKSIZE for xTB. |
| --stack-unlimited / --no-stack-unlimited | bool | True | Attempt to raise the process stack limit (POSIX). |
| --display-plots / --no-display-plots | bool | True | Generate plots (timeseries, histogram, box) into outdir. |
| --force, -f | bool | False | Overwrite outdir if it exists. |
| --verbose, -v | bool | False | Stream xTB output and print progress. |
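
To show how the threading and stack flags relate to an xTB subprocess, here is a hedged sketch that only builds a command line and an environment; it does not run xTB. The helper `build_xtb_job` is hypothetical, and the way the package actually assembles its call may differ. The option spellings (`--oniom`, `--alpb`, `--chrg`) follow the xTB command-line documentation; confirm them with `xtb --help` for your version.

```python
import os


def build_xtb_job(xyz_path, qm_atoms, charge=0, omp_threads=1,
                  omp_stacksize="8G", kmp_stacksize="8G"):
    """Return (cmd, env) for one ONIOM single point in ALPB water (hypothetical)."""
    cmd = [
        "xtb", str(xyz_path),
        "--oniom", "gfn2:gfnff", qm_atoms,  # inner-region atom list, e.g. "1-5,8"
        "--alpb", "water",
        "--chrg", str(charge),
    ]
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env.update({
        "OMP_NUM_THREADS": str(omp_threads),
        "OMP_STACKSIZE": omp_stacksize,
        "KMP_STACKSIZE": kmp_stacksize,
    })
    return cmd, env


cmd, env = build_xtb_job("frames/frame_000_complex.xyz", "1-5,8")
print(" ".join(cmd))
# The pair could then be passed to subprocess.run(cmd, env=env, ...).
```

Passing the variables through a per-process `env` keeps the settings scoped to the xTB child process rather than changing the calling shell.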

Derived selections (no flags needed)

  • PCA selection: "protein and name CA" (protein backbone Cα).
  • Write selection: "(protein or resname LIGAND)" where LIGAND is the provided --lig-resname (default LIG).

Outputs

  • outdir/frame_XXX_complex.xyz, frame_XXX_protein.xyz, frame_XXX_ligand.xyz: per-frame structures.
  • outdir/qm_region.json: atom indices for the ONIOM QM region (1-based, following the atom numbering of each respective XYZ file; the numbering rules are described in the code).
  • scientiflow_xtbsa_report.csv: per-frame energies (kcal/mol) and ΔG proxy.
  • outdir/xtbsa_dg_timeseries.png: per-frame ΔG.
  • outdir/xtbsa_dg_hist.png: distribution of ΔG across frames.
  • outdir/xtbsa_dg_box.png: box plot with mean starred.
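
Since the QM region is stored as 1-based indices, a compact range string such as `1-5,8` is a convenient way to hand the list to xTB's `--oniom` option. The helper below is a hypothetical sketch; it does not read qm_region.json, whose exact layout is not documented here.

```python
def compress_indices(indices):
    """Turn 1-based atom indices into a range string like '1-5,8'."""
    ordered = sorted(set(indices))
    if not ordered:
        return ""
    parts = []
    start = prev = ordered[0]
    for i in ordered[1:]:
        if i == prev + 1:  # still inside a consecutive run
            prev = i
            continue
        parts.append(f"{start}-{prev}" if prev > start else str(start))
        start = prev = i
    parts.append(f"{start}-{prev}" if prev > start else str(start))
    return ",".join(parts)


print(compress_indices([1, 2, 3, 4, 5, 8]))  # 1-5,8
```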

Best practices

  • Use many frames (e.g., 20–100) to stabilize the average; report mean ± SEM.
  • Ensure protonation states and net charges are consistent across complex/protein/ligand systems.
  • Consider testing both whole-residue and atom-level cutoff schemes to assess sensitivity.
  • Negative ΔG (proxy) on average indicates favorable binding; individual frames can be positive.
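
Reporting mean ± SEM over frames can be done with the standard library. The values below are made up, and the sketch weights every frame equally, which is an assumption; it uses the sample standard deviation (n − 1 in the denominator).

```python
import math
import statistics


def summarize(dg_values):
    """Return (mean, sample std, SEM) for per-frame ΔG proxies in kcal/mol."""
    mean = statistics.fmean(dg_values)
    std = statistics.stdev(dg_values)        # sample standard deviation
    sem = std / math.sqrt(len(dg_values))    # standard error of the mean
    return mean, std, sem


# Made-up per-frame values; one frame is positive, as can happen in practice.
dg_frames = [-32.1, -28.4, -35.0, 2.3, -30.2]
mean, std, sem = summarize(dg_frames)
print(f"mean = {mean:.2f}, std = {std:.2f}, SEM = {sem:.2f} kcal/mol")
```

A single positive frame shifts the mean but does not change the conclusion when the ensemble average stays clearly negative relative to the SEM.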

Limitations and extensions

  • The current ΔG is a single-point interaction proxy in ALPB; it does not include explicit entropy or standard-state corrections.
  • For absolute binding free energies, consider entropy approximations and 1 M standard-state corrections, or rigorous alchemical methods.
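
For orientation, the size of a 1 M standard-state correction can be estimated as RT ln(V_m / 1 L), where V_m is the ideal-gas molar volume at 1 atm. The sketch below assumes the entropy terms were evaluated for an ideal gas at 1 atm, which is the usual output of a rigid-rotor/harmonic-oscillator treatment; the package itself does not compute this.

```python
import math

R_KCAL = 1.987204e-3      # gas constant in kcal/(mol K)
R_L_ATM = 0.082057366     # gas constant in L atm/(mol K)


def standard_state_shift(temperature=298.15):
    """Per-species free energy shift (kcal/mol) from 1 atm ideal gas to 1 M."""
    molar_volume_l = R_L_ATM * temperature  # L/mol at 1 atm
    return R_KCAL * temperature * math.log(molar_volume_l)


shift = standard_state_shift()
print(f"{shift:.2f} kcal/mol per species")  # 1.89 kcal/mol per species
```

For a bimolecular association (two species in, one out), the net effect under these assumptions is to lower ΔG° by about this amount relative to the 1 atm reference.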
  • The method can be extended to different QM levels (e.g., gfn1/gfn2) or different MM force fields via xTB options.

Attribution

This package is maintained by Scientiflow. For integrated solutions with GROMACS and production pipelines, visit https://scientiflow.com/.
