Inverse virtual screening — dock one ligand against a whole protein library via GNINA.

WTFDTB — High-Throughput Inverse Virtual Screening

Target Fishing: Dock a single small-molecule ligand against a library of protein structures using ML-based pocket detection (P2Rank) and CNN-rescored docking (GNINA).

Python 3.10+ | License: MIT | Status: Alpha

What Is This?

Traditional virtual screening docks many ligands against one protein target. WTFDTB flips this: it docks one ligand against many proteins to answer the question — "What targets does this drug bind?"

This is called inverse virtual screening (or target fishing), and it's essential for:

Drug repurposing — finding new uses for existing drugs
Off-target prediction — identifying potential side effects
Polypharmacology — understanding multi-target drug activity
Natural product target deconvolution — identifying targets for bioactive compounds

WTFDTB automates the entire workflow from a raw ligand file to a ranked CSV of protein targets with interaction fingerprints — no manual intervention needed.

Pipeline Architecture

The pipeline runs in 5 sequential phases:

  ┌──────────────┐    ┌────────────────────┐    ┌──────────────────┐
  │  1. Ligand   │───▶│  2. Receptor       │───▶│  3. Pocket       │
  │     Prep     │    │     Curation       │    │     Detection    │
  │              │    │     (parallel)     │    │                  │
  │ Dimorphite-DL│    │ PDBFixer + PDB2PQR │    │     P2Rank       │
  │ RDKit + Meeko│    │ + PROPKA + Meeko   │    │     (Java ML)    │
  └──────────────┘    └────────────────────┘    └──────────────────┘
                                                         │
         ┌───────────────────────────────────────────────┘
         ▼
  ┌──────────────────┐    ┌──────────────────────┐
  │  4. Docking      │───▶│  5. Post-Docking     │
  │     (parallel)   │    │     Analysis         │
  │                  │    │                      │
  │     GNINA        │    │ ProLIF + Pandas      │
  │  (CNN-rescored)  │    │ Filter → Rank → CSV  │
  └──────────────────┘    └──────────────────────┘

Phase Details

Phase	Module	Tools	What It Does
1. Ligand Prep	`ligand_prep.py`	Dimorphite-DL, RDKit, Meeko	Enumerate protonation states at target pH, generate 3D conformer (ETKDGv3 + MMFF94), produce PDBQT with Gasteiger charges
2. Receptor Curation	`receptor_curation.py`	PDBFixer, PDB2PQR, PROPKA, pdb-tools	Download PDB from RCSB, strip HETATM/water, repair missing heavy atoms, protonate at target pH, parallelised across all targets
3. Pocket Detection	`pocket_detection.py`	P2Rank (Java)	ML-based druggable pocket prediction; template-free, returning a ranked list of candidate binding sites per protein
4. Docking	`docking.py`	GNINA (C++)	CNN-rescored molecular docking for each pocket × ligand combination, parallelised with ProcessPoolExecutor
5. Post-Docking	`post_dock.py`	ProLIF, Pandas	Apply CNNscore filter, compute interaction fingerprints (H-bond, hydrophobic, π-stacking, salt bridge), rank by Vina affinity with CNNaffinity as tie-breaker, export CSV

Installation

Option A: Conda / Mamba (Recommended)

# Create the environment, then install WTFDTB from the repository root.
# GNINA and Java are installed separately (see External Dependencies).
mamba create -n wtfdtb python=3.12
mamba activate wtfdtb
pip install -e .

Option B: From Source (Development)

git clone https://github.com/ChandraguptSharma07/WTFDTB.git
cd WTFDTB
python -m venv .venv
source .venv/bin/activate    # Linux/macOS
pip install -e ".[dev]"

External Dependencies

These binaries must be available on PATH:

Tool	Purpose	Install
GNINA	CNN-rescored docking engine	github.com/gnina/gnina or `mamba install gnina`
P2Rank	ML pocket detection	github.com/rdk/p2rank — requires Java ≥ 11
Java ≥ 11	Required by P2Rank	`mamba install openjdk`

Set PRANK_HOME to the P2Rank installation directory if it's not on your PATH:

export PRANK_HOME=/path/to/p2rank_2.4.2
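
The lookup order described above (PRANK_HOME first, then PATH) can be sketched as follows. This is only an illustration: the function name `find_prank` is hypothetical, and the launcher name `prank` is an assumption based on the script shipped in P2Rank release archives. WTFDTB's own lookup may differ.

```python
import os
import shutil
from pathlib import Path


def find_prank(env=None):
    """Return the path to the P2Rank launcher, or None if it cannot be found.

    Hypothetical helper: checks $PRANK_HOME/prank first, then searches PATH.
    """
    env = os.environ if env is None else env
    home = env.get("PRANK_HOME")
    if home:
        candidate = Path(home) / "prank"
        if candidate.is_file():
            return str(candidate)
    # Fall back to whatever "prank" resolves to on PATH (may be None).
    return shutil.which("prank", path=env.get("PATH"))
```

Calling `find_prank()` before starting a screen gives an early, readable failure instead of an error in the middle of Phase 3.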

Quick Start

Basic Usage

# Screen aspirin against three example targets
echo "1EQG
2HZI
3K5V" > targets.txt

wtfdtb screen \
  --ligand aspirin.sdf \
  --targets targets.txt \
  --output results.csv

Using PDB IDs from a Text File

# targets.txt — one PDB ID per line
wtfdtb screen \
  --ligand my_compound.smi \
  --targets targets.txt \
  --output hits.csv \
  --ph 7.4 \
  --exhaustiveness 8 \
  --workers 4

Using a Directory of PDB Files

# Directory containing .pdb files
wtfdtb screen \
  --ligand drug.sdf \
  --targets ./protein_library/ \
  --output results.csv
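
Since `--targets` accepts either a directory of `.pdb` files or a text file of PDB IDs, the dispatch between the two might look roughly like the sketch below. `resolve_targets` is a hypothetical name, and skipping `#` comment lines in the ID file is an assumption rather than documented behaviour.

```python
import re
from pathlib import Path

# PDB IDs are four characters: a digit followed by three alphanumerics.
PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def resolve_targets(targets):
    """Classify a --targets value as local .pdb files or PDB IDs to download.

    Returns (local_files, pdb_ids). Hypothetical helper for illustration.
    """
    path = Path(targets)
    if path.is_dir():
        return sorted(path.glob("*.pdb")), []
    ids = []
    for line in path.read_text().splitlines():
        token = line.strip()
        if not token or token.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        if not PDB_ID.match(token):
            raise ValueError(f"not a PDB ID: {token!r}")
        ids.append(token.upper())
    return [], ids
```

Validating IDs up front means a typo in the target list fails immediately rather than after the ligand has already been prepared.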

SMILES Input

The ligand can be a .smi file with SMILES notation:

echo "CC(=O)Oc1ccccc1C(=O)O aspirin" > aspirin.smi
wtfdtb screen --ligand aspirin.smi --targets targets.txt -o results.csv
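
The `.smi` line above follows the common "SMILES, whitespace, optional name" layout. A minimal sketch of splitting such a line is shown here; `parse_smi_line` is a hypothetical helper, not part of the WTFDTB API.

```python
def parse_smi_line(line):
    """Split one .smi line into (smiles, name); the name may be empty."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty SMILES line")
    smiles = parts[0]
    name = parts[1] if len(parts) > 1 else ""
    return smiles, name


print(parse_smi_line("CC(=O)Oc1ccccc1C(=O)O aspirin"))
```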

CLI Reference

wtfdtb screen [OPTIONS]

Flag	Type	Default	Description
`--ligand`, `-l`	Path	required	Input ligand file (`.sdf`, `.mol`, `.mol2`, `.smi`)
`--targets`, `-t`	Path	required	Protein target library — directory of `.pdb` files or text file of PDB IDs
`--output`, `-o`	Path	`results.csv`	Output CSV path for ranked docking results
`--ph`	float	`7.4`	Target pH for ligand and receptor protonation (the default is physiological pH)
`--box-size`	int	`25`	Side length (Å) of the cubic docking search box
`--cnn-model`	str	`default`	GNINA CNN model (`default`, `dense`, or path to weights)
`--cnn-score-threshold`	float	`0.5`	Minimum CNNscore (0–1) to accept a pose
`--min-interactions`	int	`1`	Minimum protein-ligand interactions to keep a pose (0 = no filter)
`--workers`, `-w`	int	CPU count	Parallel workers for receptor curation and docking
`--exhaustiveness`	int	`8`	GNINA search exhaustiveness (higher = slower, more thorough)
`--verbosity`	int	`1`	Logging: 0 = quiet, 1 = normal, 2 = debug
`--version`, `-v`	—	—	Show version and exit

Output Format

The output CSV is sorted by Vina affinity (kcal/mol, ascending, so the most negative and therefore tightest predicted binder comes first), with CNNaffinity (pKd, descending) used to break ties:

Column	Description
`rank`	Overall rank (1 = best predicted binder)
`pdb_id`	Target protein PDB ID
`pocket`	Binding pocket name (from P2Rank)
`pose_rank`	Pose rank within this pocket (from GNINA)
`cnn_score`	GNINA CNN pose score (0–1, higher = more confident that the pose is close to the true binding mode)
`cnn_affinity`	GNINA CNN-predicted binding affinity (pKd, higher = tighter)
`vina_affinity`	AutoDock Vina scoring function affinity (kcal/mol, lower = tighter)
`hbond`	Number of hydrogen bonds (donor + acceptor)
`hydrophobic`	Number of hydrophobic contacts
`pi_stacking`	Number of π-stacking / cation-π interactions
`salt_bridge`	Number of salt bridges (anionic + cationic)
`total_interactions`	Sum of all interaction types

Example output:

rank,pdb_id,pocket,pose_rank,cnn_score,cnn_affinity,vina_affinity,hbond,hydrophobic,pi_stacking,salt_bridge,total_interactions
1,1EQG,pocket3,1,0.89,7.2,-6.5,3,4,1,0,8
2,1EQG,pocket7,1,0.82,6.5,-6.1,2,2,1,0,5
3,2HZI,pocket1,2,0.76,6.8,-5.9,2,3,0,1,6
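
Because several pockets of the same protein can appear in the results, a common follow-up is to keep only the best-ranked row per target. The sketch below does this with the standard library; the rows are made-up values in the documented column layout, not real screening results, and `best_pose_per_target` is a hypothetical helper.

```python
import csv
import io

# Made-up rows in the documented column layout (not real screening results).
SAMPLE = """\
rank,pdb_id,pocket,pose_rank,cnn_score,cnn_affinity,vina_affinity,hbond,hydrophobic,pi_stacking,salt_bridge,total_interactions
1,1EQG,pocket3,1,0.89,7.2,-6.5,3,4,1,0,8
2,1EQG,pocket7,1,0.82,6.5,-6.1,2,2,1,0,5
3,2HZI,pocket1,2,0.76,6.8,-5.9,2,3,0,1,6
"""


def best_pose_per_target(handle):
    """Keep the best-ranked row for each pdb_id, returned in rank order."""
    best = {}
    for row in csv.DictReader(handle):
        current = best.get(row["pdb_id"])
        if current is None or int(row["rank"]) < int(current["rank"]):
            best[row["pdb_id"]] = row
    return sorted(best.values(), key=lambda r: int(r["rank"]))


for row in best_pose_per_target(io.StringIO(SAMPLE)):
    print(row["pdb_id"], row["pocket"], row["vina_affinity"])
```

For a real run, pass an open file handle for `results.csv` instead of the in-memory sample.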

Project Structure

WTFDTB/
├── pyproject.toml               # Package metadata, dependencies, entry point
├── recipe/
│   └── meta.yaml                # Bioconda / Conda-Forge recipe
├── src/
│   └── wtfdtb/
│       ├── __init__.py           # Version string
│       ├── cli.py                # Typer CLI — screen command + all flags
│       ├── ligand_prep.py        # Phase 1: SMILES/SDF → protonated 3D PDBQT
│       ├── receptor_curation.py  # Phase 2: PDB → cleaned, protonated receptor
│       ├── pocket_detection.py   # Phase 3: P2Rank ML pocket prediction
│       ├── docking.py            # Phase 4: GNINA CNN-rescored docking
│       ├── post_dock.py          # Phase 5: ProLIF interactions + ranking
│       ├── pipeline.py           # Orchestrator: wires Phases 1–5
│       └── utils.py              # PDB fetcher, logging, shared helpers
├── tests/
│   └── ...
└── README.md

Tech Stack

Layer	Tool	Purpose
CLI	Typer	Type-hinted CLI with auto-generated `--help`
Ligand Protonation	Dimorphite-DL	pH-dependent protonation state enumeration
Cheminformatics	RDKit	3D conformer generation (ETKDGv3), MMFF94 minimisation
PDBQT Generation	Meeko	Gasteiger charges, torsion tree for AutoDock-family
PDB Parsing	Biopython	REMARK 465 parsing for quality gating
PDB Cleaning	pdb-tools	Strip HETATM, waters, alternate conformations
Structure Repair	PDBFixer (OpenMM)	Model missing heavy atoms
Receptor Protonation	PDB2PQR + PROPKA	Rigorous pKa-based protonation
Pocket Detection	P2Rank	ML-based pocket prediction (Java)
Docking	GNINA	CNN-rescored docking (reported to improve pose ranking over AutoDock Vina scoring; McNutt et al. 2021)
Interaction Fingerprints	ProLIF	H-bond, hydrophobic, π-stacking, salt bridge detection
Data	Pandas	Filtering, ranking, CSV export
Parallelism	`concurrent.futures`	ProcessPoolExecutor for receptors + docking

How It Works (In Detail)

Phase 1: Ligand Preparation

Read input ligand (SMILES string or SDF/MOL file)
Enumerate physiological protonation states at the target pH using Dimorphite-DL
Generate 3D coordinates using RDKit's ETKDGv3 algorithm
Energy-minimise with the MMFF94 force field
Convert to PDBQT format (Gasteiger charges + torsion tree) via Meeko

Phase 2: Receptor Curation

For each protein target (downloaded from RCSB or provided as local PDB):

Strip all HETATM records and water molecules using pdb-tools
Repair missing heavy atoms using PDBFixer (OpenMM)
Assign protonation states at physiological pH using PDB2PQR with PROPKA
Write the curated receptor PDB
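
To make the first step concrete, here is a plain-Python stand-in for the HETATM and water stripping. The pipeline itself uses pdb-tools for this, so the function below is only an illustrative substitute, and the set of water residue names is an assumption.

```python
# Residue names commonly used for water (assumed list, extend as needed).
WATER_NAMES = {"HOH", "WAT", "H2O", "DOD"}


def strip_hetatm_and_water(pdb_text):
    """Drop HETATM records and any water residues written as ATOM records."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "HETATM":
            continue
        # The residue name occupies columns 18-20 of a PDB coordinate record.
        if record == "ATOM" and line[17:20].strip() in WATER_NAMES:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Note that this also removes cofactors and metal ions, which are HETATM records too; that matches the "strip all HETATM" behaviour described above but may matter for targets whose binding site depends on them.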

This phase runs in parallel across all targets using ProcessPoolExecutor.
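
A minimal sketch of this kind of per-target parallelism with `concurrent.futures` is shown below. The names `run_parallel` and `executor_cls` are hypothetical; the point of the sketch is that a failure on one target is recorded instead of aborting the whole screen, which is an assumed design choice rather than confirmed WTFDTB behaviour.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def run_parallel(worker, items, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Apply worker to every item in parallel; collect results and failures.

    Exceptions are captured per item so that one bad target does not stop
    the remaining jobs.
    """
    results, failures = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # record the failure and keep going
                failures[item] = exc
    return results, failures
```

With `ProcessPoolExecutor`, the worker must be a picklable, module-level function; passing `ThreadPoolExecutor` as `executor_cls` is convenient for quick tests.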

Phase 3: Pocket Detection

Run P2Rank on all curated receptors in batch mode
Parse P2Rank output to extract binding pocket centers (X, Y, Z coordinates)
Each pocket defines a docking search box for Phase 4

P2Rank uses machine learning (random forests on surface features) to detect druggable pockets without requiring known binding site templates.
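
Extracting pocket centers from P2Rank output might look like the sketch below. It assumes the per-structure predictions CSV has `name`, `center_x`, `center_y`, and `center_z` columns with space-padded headers and cells, which matches P2Rank output as far as known here but should be checked against the installed version. `read_pocket_centers` is a hypothetical helper and the sample numbers are made up.

```python
import csv
import io


def read_pocket_centers(handle):
    """Return [(pocket_name, (x, y, z)), ...] from a P2Rank predictions CSV."""
    reader = csv.reader(handle)
    # Headers and cells are padded with spaces, so strip both.
    header = [col.strip() for col in next(reader)]
    pockets = []
    for raw in reader:
        if not raw:
            continue  # skip blank lines
        row = dict(zip(header, (cell.strip() for cell in raw)))
        center = tuple(float(row[k]) for k in ("center_x", "center_y", "center_z"))
        pockets.append((row["name"], center))
    return pockets


# Made-up example in the assumed layout.
SAMPLE = """\
name     ,  rank,   score, probability,   center_x,   center_y,   center_z
pocket1  ,     1,   12.34,       0.712,    10.5000,    -3.2500,    22.0000
pocket2  ,     2,    4.56,       0.210,    -1.0000,     8.7500,     5.5000
"""

for name, center in read_pocket_centers(io.StringIO(SAMPLE)):
    print(name, center)
```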

Phase 4: Molecular Docking

For each (receptor, pocket) combination:

Build GNINA command-line arguments with pocket center and box size
Run GNINA with CNN rescoring enabled
Parse output SDF to extract per-pose CNNscore, CNNaffinity, and Vina affinity

This phase runs in parallel using ProcessPoolExecutor. GNINA uses convolutional neural networks trained on protein-ligand complexes to rescore docking poses, which McNutt et al. (2021) report improves pose ranking over the classical Vina scoring function.
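
The three steps above can be sketched as two small helpers: one that assembles the GNINA command line for a pocket, and one that reads scores back from the output SDF. Both function names are hypothetical. The flags used are standard GNINA options, and the SDF tag names (`minimizedAffinity`, `CNNscore`, `CNNaffinity`) are assumed to match what the installed GNINA version writes.

```python
import re


def build_gnina_args(receptor, ligand, center, out_sdf, box_size=25, exhaustiveness=8, cpu=1):
    """Assemble a GNINA argv list for one (receptor, pocket) docking job."""
    x, y, z = center
    return [
        "gnina",
        "-r", str(receptor),
        "-l", str(ligand),
        "--center_x", f"{x:.3f}",
        "--center_y", f"{y:.3f}",
        "--center_z", f"{z:.3f}",
        # A cubic search box: the same side length on every axis.
        "--size_x", str(box_size),
        "--size_y", str(box_size),
        "--size_z", str(box_size),
        "--exhaustiveness", str(exhaustiveness),
        "--cnn_scoring", "rescore",
        "--cpu", str(cpu),
        "-o", str(out_sdf),
    ]


# SDF data items look like "> <TagName>" followed by the value on the next line.
TAG = re.compile(r"^>\s*<([^>]+)>")
WANTED = {
    "minimizedAffinity": "vina_affinity",
    "CNNscore": "cnn_score",
    "CNNaffinity": "cnn_affinity",
}


def parse_sdf_scores(sdf_text):
    """Extract per-pose score tags from a multi-record SDF string."""
    poses = []
    for record in sdf_text.split("$$$$"):
        lines = record.splitlines()
        pose = {}
        for i, line in enumerate(lines[:-1]):
            match = TAG.match(line)
            if match and match.group(1) in WANTED:
                pose[WANTED[match.group(1)]] = float(lines[i + 1])
        if pose:
            poses.append(pose)
    return poses
```

The argument list can be passed directly to `subprocess.run(args, check=True)`; building it as a list rather than a shell string avoids quoting problems with unusual file names.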

Phase 5: Post-Docking Analysis

CNNscore filter: Discard poses below the threshold (default 0.5)
Interaction profiling: Use ProLIF to compute protein-ligand interaction fingerprints (H-bonds, hydrophobic contacts, π-stacking, salt bridges, cation-π)
Interaction filter: Discard poses with fewer interactions than --min-interactions
Ranking: Sort remaining poses by Vina affinity (kcal/mol, ascending) then CNN affinity (pKd, descending)
Export: Write ranked results to CSV
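
The filtering and ranking logic can be sketched in plain Python as follows. The pipeline uses Pandas for this step, so the dataclass and function below are a simplified stand-in with hypothetical names. The sketch also assumes the interaction counts are already available, whereas the pipeline computes them with ProLIF after the CNNscore filter.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    pdb_id: str
    pocket: str
    cnn_score: float
    cnn_affinity: float   # pKd, higher = tighter
    vina_affinity: float  # kcal/mol, lower = tighter
    total_interactions: int


def filter_and_rank(poses, cnn_score_threshold=0.5, min_interactions=1):
    """Apply both filters, then sort by Vina affinity with a CNN tie-breaker."""
    kept = [
        p for p in poses
        if p.cnn_score >= cnn_score_threshold
        and p.total_interactions >= min_interactions
    ]
    # Vina affinity ascending, then CNN affinity descending for ties.
    kept.sort(key=lambda p: (p.vina_affinity, -p.cnn_affinity))
    return kept
```

Negating `cnn_affinity` inside the sort key is what turns the second criterion into a descending order while the first stays ascending.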

Development

Setup

git clone https://github.com/ChandraguptSharma07/WTFDTB.git
cd WTFDTB
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Running Tests

pytest

Code Quality

ruff check src/
ruff format src/

Building the Conda Package

conda build recipe/

Supported Platforms

Platform	Status	Notes
Linux x86_64	✅ Supported	Primary platform. GNINA binary available via conda-forge.
macOS	⚠️ Partial	Python pipeline works; GNINA must be compiled from source.
Windows (WSL)	⚠️ Partial	Works through Windows Subsystem for Linux.

Citation

If you use WTFDTB in your research, please cite:

@software{wtfdtb2025,
  title  = {WTFDTB: High-Throughput Inverse Virtual Screening},
  author = {Chandragupt Sharma},
  year   = {2025},
  url    = {https://github.com/ChandraguptSharma07/WTFDTB}
}

And the key tools in the pipeline:

GNINA: McNutt et al. J. Cheminformatics 13, 43 (2021)
P2Rank: Krivák & Hoksza. J. Cheminformatics 10, 39 (2018)
ProLIF: Bouysset & Fiorucci. J. Cheminformatics 13, 72 (2021)
RDKit: rdkit.org

License

MIT — see LICENSE for details.

