
High-Throughput Cross-Docking (NxM) — cross-dock massive ligand libraries against entire protein libraries via GNINA.


WTFDTB — High-Throughput Cross-Docking

NxM Screening: Cross-dock entire small-molecule libraries against libraries of macromolecular protein structures using a modern ML/DL stack.

Python 3.10+ · MIT License · Status: Stable


What Is This?

Traditional virtual screening docks many ligands against one protein target. WTFDTB is a High-Throughput Cross-Docking (NxM) engine that generalizes this in every direction:

  • Many ligands × one protein: "Which drugs bind this protein?"
  • One ligand × many proteins: "Which targets does this drug bind?"
  • Many ligands × many proteins: full NxM cross-docking in a single run

This capability is essential for:

  • High-Throughput Screening (HTS) — discovering novel hits from massive vendor libraries
  • Drug repurposing — finding new uses for existing drugs across multiple targets
  • Off-target prediction & Tox — identifying potential side effects and cross-reactivity
  • Polypharmacology — designing or understanding multi-target drug activity

WTFDTB automates the entire workflow from a raw ligand file to a ranked CSV of protein targets with interaction fingerprints — no manual intervention needed.
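The NxM fan-out is easy to picture: every ligand is paired with every target, and each pair becomes one docking job. A minimal sketch (names are illustrative, not the package's internals):

```python
from itertools import product

# Hypothetical inputs: 3 ligands and 2 protein targets (PDB IDs).
ligands = ["imatinib", "dasatinib", "nilotinib"]
targets = ["1IEP", "1PXX"]

# NxM cross-docking enumerates every (ligand, target) pair as one docking job.
jobs = list(product(ligands, targets))
print(len(jobs))  # 3 ligands x 2 targets = 6 docking jobs
```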


Pipeline Architecture

The pipeline runs in 5 sequential phases:

  ┌──────────────┐    ┌────────────────────┐    ┌──────────────────┐
  │  1. Ligand   │───▶│  2. Receptor       │───▶│  3. Pocket       │
  │     Prep     │    │     Curation       │    │     Detection    │
  │              │    │     (parallel)     │    │                  │
  │ Dimorphite-DL│    │ PDBFixer + PDB2PQR │    │     P2Rank       │
  │ RDKit + Meeko│    │ + PROPKA + Meeko   │    │     (Java ML)    │
  └──────────────┘    └────────────────────┘    └──────────────────┘
                                                         │
         ┌───────────────────────────────────────────────┘
         ▼
  ┌──────────────────┐    ┌──────────────────────┐
  │  4. Docking      │───▶│  5. Post-Docking     │
  │     (parallel)   │    │     Analysis         │
  │                  │    │                      │
  │     GNINA        │    │ ProLIF + Pandas      │
  │  (CPU / GPU)     │    │ Filter → Rank → CSV  │
  └──────────────────┘    └──────────────────────┘

Phase Details

  1. Ligand Prep (ligand_prep.py; Dimorphite-DL, RDKit, Meeko): Parses massive multi-molecule .mol2/.sdf libraries, enumerates protonation states, generates 3D conformers, and retains a 2D SMILES template so no molecule data is lost post-docking.
  2. Receptor Curation (receptor_curation.py; PDBFixer, PDB2PQR, PROPKA, RDKit): Downloads each PDB structure, repairs missing heavy atoms, and protonates at the target pH. Includes Intelligent Cofactor Recovery, which preserves whitelisted metals and cofactors. Runs in parallel.
  3. Pocket Detection (pocket_detection.py; P2Rank, Java): ML-based cavity prediction with no template bias; detects all druggable sites on each protein.
  4. Docking (docking.py; GNINA, C++): CNN-rescored molecular docking for each pocket × ligand combination. Supports CPU and GPU acceleration. Runs in parallel.
  5. Post-Docking (post_dock.py; ProLIF, Pandas): Template-Based Molecule Reconstruction rebuilds broken topologies for accurate interaction fingerprints, then filters, ranks, and exports NxM CSV matrices.
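To illustrate what Phase 1 amounts to (this is not the package's actual code, just a sketch of the underlying RDKit workflow it names), RDKit alone can take a SMILES string to an optimized 3D conformer:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Parse a SMILES string (ethanol here), add explicit hydrogens,
# then embed a 3D conformer and relax it with a force field.
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(mol, randomSeed=42)   # ETKDG 3D embedding
AllChem.MMFFOptimizeMolecule(mol)           # MMFF94 geometry relaxation
print(mol.GetNumConformers())               # 1
```

In the real pipeline, Dimorphite-DL would first enumerate protonation states at the chosen pH and Meeko would convert the conformer into docking-ready input.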

Installation

From PyPI (Recommended)

pip install wtfdtb

Setup (External Binaries)

WTFDTB requires GNINA and P2Rank. You can set them up automatically:

wtfdtb install

This command will:

  1. Download pre-compiled Linux binaries for both GNINA-CPU and GNINA-CUDA (v1.3.2).
  2. Download and extract P2Rank.
  3. Place them in ~/.local/.
  4. Automatically configure your PATH by updating your .bashrc.

Note: P2Rank requires Java ≥ 11 (sudo apt install default-jre on Ubuntu).


Quick Start

1. Create a Target List

Create a file named targets.txt with PDB IDs or paths to .pdb files:

1IEP
1PXX

2. Run the Screen

Standard (CPU):

wtfdtb screen --ligand library.smi --targets targets.txt --output-dir my_results

High Performance (GPU):

# Requires NVIDIA GPU + CUDA 12
wtfdtb screen --ligand library.smi --targets targets.txt --output-dir my_results --gpu

CLI Reference

wtfdtb screen [OPTIONS]
  • --ligand, -l (Path, required): Input ligand file (.sdf, .mol, .mol2, .smi). Multi-molecule libraries are supported natively.
  • --targets, -t (Path, required): Directory of .pdb files, or a text file of PDB IDs.
  • --output-dir, -o (Path, default results/): Output directory for individual ranked CSVs and 2D matrix pivot tables.
  • --ph (float, default 7.4): pH used for protonation; the default is physiological.
  • --box-size (int, default 25): Side length (Å) of the cubic docking box.
  • --cnn-model (str, default "default"): GNINA CNN model (default or dense).
  • --cnn-score-threshold (float, default 0.5): Minimum CNNscore (0–1) to accept a pose.
  • --min-interactions (int, default 1): Minimum number of interactions to keep a pose.
  • --workers, -w (int, default: CPU count): Parallel workers for curation and docking.
  • --exhaustiveness (int, default 8): GNINA search thoroughness (higher is slower but more thorough).
  • --gpu (flag, default off): Enable GPU acceleration (requires the CUDA build of GNINA).
  • --keep-hetatm (str, default: none): Comma-separated three-letter codes to preserve (e.g. SAM,LIG).
  • --verbosity (int, default 1): Logging level: 0 = quiet, 1 = normal, 2 = debug.
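Putting several of the flags above together (the file names and values are illustrative):

```shell
# Dock an SDF library against a folder of PDB files at pH 6.5,
# keeping SAM cofactors and using 8 workers on the GPU build.
wtfdtb screen \
  --ligand library.sdf \
  --targets pdb_folder/ \
  --output-dir results_ph65 \
  --ph 6.5 \
  --keep-hetatm SAM \
  --workers 8 \
  --gpu
```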

Output Format

The output CSV is ranked primarily by Vina affinity (lower is better), with CNNaffinity used to break ties:

  • rank: Overall rank (1 = best binder)
  • pdb_id: Target protein PDB ID
  • pocket: Cavity name (from P2Rank)
  • pose_rank: Pose rank within the pocket (from GNINA)
  • cnn_score: Neural-network pose confidence (0–1)
  • cnn_affinity: Predicted binding affinity (pKd)
  • vina_affinity: Empirical scoring affinity (kcal/mol)
  • hbond: Number of hydrogen bonds
  • hydrophobic: Number of hydrophobic contacts
  • pi_stacking: Number of π-stacking interactions
  • salt_bridge: Number of salt bridges
  • total_interactions: Sum of all interaction types
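The ranking rule (Vina affinity ascending, CNNaffinity descending as a tiebreaker, per the description above) is easy to reproduce with pandas; the rows below are made-up illustrative values, not real results:

```python
import pandas as pd

# Toy results with a Vina-affinity tie between the first two rows.
df = pd.DataFrame({
    "pdb_id":        ["1IEP", "1PXX", "1IEP"],
    "vina_affinity": [-9.2,   -9.2,   -7.5],   # kcal/mol, lower is better
    "cnn_affinity":  [6.1,    7.3,    5.0],    # pKd, higher is better
})

# Primary key: Vina affinity ascending; tiebreak: CNNaffinity descending.
ranked = df.sort_values(
    ["vina_affinity", "cnn_affinity"], ascending=[True, False]
).reset_index(drop=True)
ranked.insert(0, "rank", range(1, len(ranked) + 1))
print(ranked["pdb_id"].tolist())  # ['1PXX', '1IEP', '1IEP']
```

The 1PXX row wins the tie because its CNNaffinity is higher.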

Python API

from pathlib import Path
from wtfdtb.pipeline import run_pipeline

results_dir = run_pipeline(
    ligand_path=Path("multi-ligand-library.smi"),
    targets_path=Path("targets.txt"),
    output_dir=Path("my_results_folder"),
    use_gpu=True,
    workers=4
)
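run_pipeline returns the output directory; assuming it holds one ranked CSV per target (as the CLI section describes), collecting them afterwards is just a glob. This sketch fakes the directory layout so it stays self-contained; the real file names are whatever the pipeline writes:

```python
import csv
import tempfile
from pathlib import Path

# Fake a results directory with one ranked CSV per target
# (hypothetical layout, for illustration only).
results_dir = Path(tempfile.mkdtemp())
for pdb_id in ("1IEP", "1PXX"):
    with open(results_dir / f"{pdb_id}_ranked.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["rank", "pdb_id", "vina_affinity"])
        writer.writerow([1, pdb_id, -9.2])

# Collect every ranked CSV produced by the run.
csv_files = sorted(results_dir.glob("*_ranked.csv"))
print([p.name for p in csv_files])  # ['1IEP_ranked.csv', '1PXX_ranked.csv']
```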

Supported Platforms

  • Linux x86_64 (✅ Supported): Primary platform; binaries are auto-installed via wtfdtb install.
  • Windows, via WSL (✅ Supported): Works through the Windows Subsystem for Linux.
  • Kaggle / Colab (✅ Supported): Verified working; T4 GPU acceleration is supported.
  • macOS (⚠️ Partial): The Python pipeline works, but GNINA must be compiled from source.

Citation

If you use WTFDTB in your research, please cite:

@software{wtfdtb2026,
  title  = {WTFDTB: High-Throughput Inverse Virtual Screening},
  author = {Chandragupt Sharma},
  year   = {2026},
  url    = {https://github.com/ChandraguptSharma07/WTFDTB}
}

License

MIT — see LICENSE for details.
