Semi-QM ensemble rescoring and xTB/ALPB binding energy analysis toolkit.
scientiflow-xtbsa
ScientiFlow xTB-SA: automated selection of representative MD snapshots and semi-empirical QM/MM ONIOM single-point evaluations to estimate protein–ligand interaction energies.
Maintained by Scientiflow. For end-to-end automated protein–ligand MD workflows integrated with GROMACS and deployment at scale, see https://scientiflow.com/.
What it does (high level)
- Input: MD topology+trajectory (e.g., GROMACS .tpr + .xtc).
- Sampling: PCA on protein Cα coordinates; KMeans to pick N representative frames.
- Extraction: For each selected frame, write XYZ for complex, protein, and ligand.
- QM/MM region: Define a QM inner region around the ligand using a cutoff (Å). Optionally include whole residues within the cutoff or only atoms within the cutoff.
- Energetics: Run xTB with ONIOM (gfn2:gfnff) and ALPB water for complex, protein, and ligand; compute a per-frame interaction energy proxy: ΔG_bind,proxy ≈ E_complex − (E_protein + E_ligand)
- Reporting: Write scientiflow_xtbsa_report.csv (kcal/mol) and plots into the output directory.
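The QM/MM region step above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the atom records (dicts with `resname`, `resid`, `xyz`) and the function name `select_qm_region` are hypothetical, and a real run would take coordinates from the loaded trajectory frame.

```python
import math

def select_qm_region(atoms, lig_resname="LIG", cutoff=6.0, whole_residue=True):
    """Return sorted 1-based indices (complex numbering) of the QM inner region.

    atoms: list of dicts with 'resname', 'resid', 'xyz' in complex order
    (hypothetical record layout, used only for this sketch).
    """
    ligand = [a for a in atoms if a["resname"] == lig_resname]
    if not ligand:
        raise ValueError(f"No ligand atoms found for resname {lig_resname!r}")

    def near_ligand(atom):
        # True if the atom lies within the cutoff (in Å) of any ligand atom.
        return any(math.dist(atom["xyz"], l["xyz"]) <= cutoff for l in ligand)

    # Ligand atoms always belong to the inner region, plus nearby atoms.
    region = {i for i, a in enumerate(atoms, start=1)
              if a["resname"] == lig_resname or near_ligand(a)}

    if whole_residue:
        # Expand to every atom of any residue that has at least one atom selected.
        # Assumes (resname, resid) identifies a residue uniquely in this sketch.
        hit = {(atoms[i - 1]["resname"], atoms[i - 1]["resid"]) for i in region}
        region = {i for i, a in enumerate(atoms, start=1)
                  if (a["resname"], a["resid"]) in hit}
    return sorted(region)

# Toy system: one GLY with a near and a far atom, one distant ALA, one ligand atom.
atoms = [
    {"resname": "GLY", "resid": 1, "xyz": (3.0, 0.0, 0.0)},
    {"resname": "GLY", "resid": 1, "xyz": (8.0, 0.0, 0.0)},
    {"resname": "ALA", "resid": 2, "xyz": (20.0, 0.0, 0.0)},
    {"resname": "LIG", "resid": 100, "xyz": (0.0, 0.0, 0.0)},
]
atom_level = select_qm_region(atoms, whole_residue=False)
residue_level = select_qm_region(atoms, whole_residue=True)
```

With the whole-residue rule the second GLY atom is pulled in even though it lies outside the 6 Å cutoff, which is the difference between the two schemes described above.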
About the equation
- We compute an interaction/binding free-energy proxy from single-point ONIOM energies in implicit solvent: ΔG_bind,proxy ≈ E_complex − (E_protein + E_ligand)
- Sign convention: Negative values suggest favorable binding. Positive values can occur for some frames and are expected. The meaningful metric is the ensemble average across many frames (report mean ± SEM).
- Scope: This proxy omits vibrational/rotational/translational entropies and standard-state corrections. For absolute ΔG°, add such corrections or use rigorous alchemical methods. As implemented, this is similar in spirit to MM/PBSA, but with a QM/MM inner region.
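As a worked example of the equation above, the sketch below converts three single-point energies to a per-frame proxy in kcal/mol. It assumes the energies are total energies in Hartree, which is the unit xTB prints; the function name and the numeric values are made up for illustration.

```python
HARTREE_TO_KCAL_MOL = 627.509474  # 1 Eh expressed in kcal/mol

def binding_proxy_kcal(e_complex, e_protein, e_ligand):
    """Interaction-energy proxy E_complex - (E_protein + E_ligand), in kcal/mol.

    All inputs are total energies in Hartree (assumed for this sketch).
    """
    return (e_complex - (e_protein + e_ligand)) * HARTREE_TO_KCAL_MOL

# Illustrative numbers only: the complex is 0.05 Eh lower than the separated parts.
dg = binding_proxy_kcal(e_complex=-100.05, e_protein=-80.0, e_ligand=-20.0)
print(f"dG_bind,proxy = {dg:.2f} kcal/mol")
```

A negative result, as here, corresponds to the favorable-binding sign convention stated above.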
Prerequisites (user)
- xTB executable available on PATH (https://github.com/grimme-lab/xtb)
- Python 3.9+ and internet access to install Python dependencies
Install (user)
- Install the package: pip install scientiflow-xtbsa
- Verify the CLI is available: scientiflow-xtbsa --help
Quick start
Basic GROMACS-first usage (defaults used where possible):
- Select frames, build QM/MM regions, run xTB, write CSV and plots: scientiflow-xtbsa --tpr path/to/topology.tpr --traj path/to/trajectory.xtc --outdir frames
- Customize ligand residue name (default LIG); the selections are derived automatically from the ligand name: scientiflow-xtbsa --tpr samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc --lig-resname UNL1 --outdir frames
- Disable whole-residue rule (include only atoms within cutoff): scientiflow-xtbsa --tpr samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc --no-whole-residue
- Control frame count and cutoff; tune xTB threading/stack: scientiflow-xtbsa --tpr samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc -n 20 --qm-cutoff 6 --omp-threads 1 --omp-stacksize 8G --kmp-stacksize 8G
- Turn off plot generation: scientiflow-xtbsa --tpr samples/md_0_10.tpr --traj samples/md_0_10_5ns.xtc --no-display-plots
Note for other MD engines:
- Pass a structure/topology readable by MDAnalysis (e.g., Amber .prmtop + .nc/.dcd, CHARMM/NAMD .psf + .dcd, or .pdb + .dcd) via --tpr (name kept for consistency; the file itself can be any MDAnalysis-supported topology).
CLI flags and defaults
| Flag | Type | Default | Description |
|---|---|---|---|
| --tpr | Path | required | GROMACS .tpr preferred; for other engines, pass a topology compatible with MDAnalysis (.prmtop/.psf/.pdb/.gro). |
| --traj | Path | required | Trajectory file (.xtc/.trr/.dcd/.nc etc.). |
| --lig-resname | str | LIG | Ligand residue name used in selections. |
| --nframes, -n | int | 15 | Number of representative frames (PCA+KMeans). |
| --outdir, -o | Path | frames | Output directory for frames/plots. |
| --qm-cutoff | float | 6.0 | Cutoff (Å) to define QM inner region around the ligand. |
| --whole-residue / --no-whole-residue | bool | True | If true, include entire residues with any atom within cutoff; else include only atoms within cutoff. |
| --omp-threads | int | 1 | OMP_NUM_THREADS for xTB subprocesses. |
| --omp-stacksize | str | 8G | OMP_STACKSIZE for xTB. |
| --kmp-stacksize | str | 8G | KMP_STACKSIZE for xTB. |
| --stack-unlimited / --no-stack-unlimited | bool | True | Attempt to raise process stack limit (POSIX), similar to ulimit -s unlimited. |
| --display-plots / --no-display-plots | bool | True | Generate plots (timeseries, histogram, box) into outdir. |
| --force, -f | bool | False | Overwrite outdir if it exists. |
| --verbose, -v | bool | False | Stream xTB output and print progress in real time. |
Derived selections (no flags needed):
- PCA selection: "protein and name CA" (protein backbone Cα).
- Write selection: "(protein or resname LIGAND)" where LIGAND is the provided --lig-resname (default LIG).
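The derivation of the two selection strings can be written out as below. The function name `derive_selections` is hypothetical; the strings themselves are the ones listed above, in the selection language accepted by MDAnalysis `Universe.select_atoms`.

```python
def derive_selections(lig_resname="LIG"):
    """Build the PCA and write selection strings from the ligand residue name."""
    return {
        "pca": "protein and name CA",
        "write": f"(protein or resname {lig_resname})",
    }

selections = derive_selections("UNL1")
print(selections["write"])
```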
Outputs
- outdir/frame_XXX_complex.xyz, frame_XXX_protein.xyz, frame_XXX_ligand.xyz: per-frame structures (numbering per file).
- outdir/qm_region.json: ONIOM QM region index strings for each frame:
  - complex: indices in complex.xyz numbering for QM inner region (ligand + nearby protein).
  - protein: indices in protein.xyz numbering for the protein part of the QM region.
  - ligand: indices 1..n for ligand.xyz.
- scientiflow_xtbsa_report.csv: per-frame energies (kcal/mol) and ΔG_bind,proxy.
- outdir/xtbsa_dg_timeseries.png: per-frame ΔG.
- outdir/xtbsa_dg_hist.png: ΔG distribution.
- outdir/xtbsa_dg_box.png: box plot with mean (star).
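The three numberings in qm_region.json differ because each XYZ file is numbered on its own. The sketch below shows one way to map a QM selection made in complex numbering onto protein and ligand numbering and to collapse each into a compact index string. The helper names, the range-string format (`1-3,7`), and the JSON key layout are assumptions for illustration; inspect an actual qm_region.json for the real layout.

```python
import json

def to_range_string(indices):
    """Collapse 1-based indices into a compact string such as '1-3,7,9-10'."""
    parts = []
    start = prev = None
    for i in sorted(set(indices)):
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i
        else:
            parts.append(f"{start}-{prev}" if prev > start else str(start))
            start = prev = i
    if start is not None:
        parts.append(f"{start}-{prev}" if prev > start else str(start))
    return ",".join(parts)

def split_qm_indices(is_ligand, qm_complex):
    """Map QM indices from complex numbering to protein and ligand numbering.

    is_ligand: one bool per atom in complex order.
    qm_complex: 1-based QM indices in complex numbering.
    """
    prot_map, lig_map = {}, {}
    for ci, lig in enumerate(is_ligand, start=1):
        target = lig_map if lig else prot_map
        target[ci] = len(target) + 1  # position of this atom in its own file
    qm = set(qm_complex)
    return {
        "complex": to_range_string(qm),
        "protein": to_range_string(prot_map[i] for i in qm if i in prot_map),
        "ligand": to_range_string(lig_map.values()),  # whole ligand, 1..n
    }

# Six-atom toy complex with the ligand in the middle (atoms 3 and 4).
region = split_qm_indices([False, False, True, True, False, False], [2, 3, 4, 5])
print(json.dumps({"frame_000": region}, indent=2))  # frame key is hypothetical
```

Placing the ligand in the middle of the toy complex makes the renumbering visible: complex atom 5 becomes protein atom 3.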
Best practices
- Use many frames (e.g., 20–100) to stabilize averages; report mean ± SEM.
- Ensure protonation states and net charges are consistent across complex/protein/ligand systems for xTB.
- Consider both whole-residue and atom-level cutoff schemes to assess sensitivity.
- Negative average ΔG_bind,proxy indicates favorable binding; individual frames may be positive.
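The mean ± SEM recommended above can be computed with the standard library alone. The per-frame values here are invented; in practice they would be read from the ΔG column of scientiflow_xtbsa_report.csv, whose exact column name is not assumed in this sketch.

```python
import math
import statistics

def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two frames")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem

# Invented per-frame proxies in kcal/mol; note the single positive frame.
dg_frames = [-12.0, -8.0, -10.0, 2.0, -12.0]
mean, sem = mean_sem(dg_frames)
print(f"dG_bind,proxy = {mean:.1f} ± {sem:.1f} kcal/mol (n={len(dg_frames)})")
```

The SEM treats frames as independent samples; frames picked from well-separated clusters fit that assumption better than consecutive MD frames would.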
Troubleshooting
- "No ligand atoms found for resname 'XXX' within write-selection.": Ensure --lig-resname matches the residue name in your topology (case-sensitive). The write selection is derived as (protein or resname XXX).
- xTB very slow or stalls: Use --omp-threads 1 --omp-stacksize 8G --kmp-stacksize 8G --stack-unlimited (default settings mimic recommended shell script tweaks). Run with --verbose to stream xTB output.
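For debugging a single frame by hand, the threading and stack settings can be reproduced outside the toolkit. The sketch below builds an xTB command and environment and includes a POSIX-only helper that raises the stack limit. It is an assumption about what an equivalent manual run looks like, not the package's code: the function names and the example file name are hypothetical, and the `--oniom` and `--alpb` arguments should be checked against `xtb --help` for your xTB version.

```python
import os

def build_xtb_job(xyz_path, qm_indices, omp_threads=1,
                  omp_stacksize="8G", kmp_stacksize="8G"):
    """Return (command list, environment dict) for one ONIOM single point."""
    cmd = ["xtb", str(xyz_path),
           "--oniom", "gfn2:gfnff", qm_indices,  # inner region as an index string
           "--alpb", "water"]
    env = dict(os.environ)  # copy so the parent environment is left untouched
    env.update(
        OMP_NUM_THREADS=str(omp_threads),
        OMP_STACKSIZE=omp_stacksize,
        KMP_STACKSIZE=kmp_stacksize,
    )
    return cmd, env

def raise_stack_limit():
    """Try to lift the soft stack limit to the hard limit; return True on success."""
    try:
        import resource  # not available on Windows
    except ImportError:
        return False
    _soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    try:
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    except (ValueError, OSError):
        return False
    return True

# Hypothetical file name and index string.
cmd, env = build_xtb_job("frame_000_complex.xyz", "1-40")
print(" ".join(cmd))
```

The pair can be passed to `subprocess.run(cmd, env=env, capture_output=True, text=True)` once xTB is on PATH.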
How it works (technical detail)
- MDAnalysis loads the topology+trajectory.
- PCA on "protein and name CA" → project frames → KMeans to N clusters → choose medoids.
- For each chosen frame:
  - Write complex/protein/ligand XYZ.
  - Build QM region by distance from ligand (cutoff Å), either by atoms or by whole residues.
  - Map indices carefully between complex/protein/ligand numberings and serialize to qm_region.json.
- For each frame, run xTB ONIOM (gfn2:gfnff, ALPB water) for complex/protein/ligand; parse total energies; compute ΔG_bind,proxy ≈ E_complex − (E_protein + E_ligand).
- Convert to kcal/mol and write CSV; generate plots (optional).
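The "choose medoids" step can be illustrated without any third-party library. The sketch assumes the PCA projection and the KMeans labels already exist (for example from scikit-learn) and defines the medoid as the cluster member with the smallest summed distance to the other members; the package may use a different definition, such as the frame closest to the centroid. The function name is hypothetical.

```python
import math

def cluster_medoids(points, labels):
    """Return one representative frame index per cluster, sorted by frame.

    points: per-frame coordinates in PCA space (sequence of tuples).
    labels: cluster label for each frame, same length as points.
    """
    clusters = {}
    for frame, label in enumerate(labels):
        clusters.setdefault(label, []).append(frame)

    medoids = []
    for label in sorted(clusters):
        members = clusters[label]
        # Medoid: the member minimising total distance to all cluster members.
        best = min(members,
                   key=lambda i: sum(math.dist(points[i], points[j]) for j in members))
        medoids.append(best)
    return sorted(medoids)

# Two well-separated toy clusters in a 2D projection.
points = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (12, 10)]
labels = [0, 0, 0, 1, 1, 1]
frames = cluster_medoids(points, labels)
```

Unlike a centroid, a medoid is always an actual trajectory frame, so it can be written to XYZ directly.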
Attribution: This package is maintained by Scientiflow. For integrated solutions with GROMACS and production pipelines, visit https://scientiflow.com/.