High-Throughput Cross-Docking (NxM) — cross-dock massive ligand libraries against entire protein libraries via GNINA.
WTFDTB — High-Throughput Cross-Docking
NxM Screening: cross-dock entire small-molecule libraries against libraries of macromolecular protein structures using an ML/DL stack (P2Rank pocket prediction and GNINA CNN scoring).
What Is This?
Traditional virtual screening docks many ligands against a single protein target. WTFDTB is a High-Throughput Cross-Docking (NxM) engine that handles every combination of N ligands and M proteins: many ligands against one protein ("Which drug binds to this protein?"), one ligand against many proteins ("Which targets does this drug bind?"), or a full ligand library against many targets in parallel.
This NxM High-Throughput Cross-Docking capability is essential for:
- High-Throughput Screening (HTS) — discovering novel hits from massive vendor libraries
- Drug repurposing — finding new uses for existing drugs across multiple targets
- Off-target prediction & Tox — identifying potential side effects and cross-reactivity
- Polypharmacology — designing or understanding multi-target drug activity
WTFDTB automates the entire workflow from a raw ligand file to a ranked CSV of protein targets with interaction fingerprints — no manual intervention needed.
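The NxM idea reduces to a Cartesian product: every ligand meets every target, and each target contributes one docking job per detected pocket. The sketch below is illustrative only; the function name enumerate_jobs and the pockets_by_target mapping are hypothetical and not part of the wtfdtb API.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def enumerate_jobs(
    ligands: List[str],
    targets: List[str],
    pockets_by_target: Dict[str, List[str]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield one (ligand, target, pocket) docking job per combination.

    Hypothetical helper: N ligands x M targets, expanded by each
    target's pocket list (a target with no pockets yields no jobs).
    """
    for ligand, target in product(ligands, targets):
        for pocket in pockets_by_target.get(target, []):
            yield ligand, target, pocket


if __name__ == "__main__":
    jobs = list(enumerate_jobs(
        ["ligand_1", "ligand_2"],
        ["1IEP", "1PXX"],
        {"1IEP": ["pocket1", "pocket2"], "1PXX": ["pocket1"]},
    ))
    print(len(jobs))  # 2 ligands x (2 + 1) pockets = 6 jobs
```

Because the job count grows multiplicatively with ligands, targets, and pockets, parallel workers matter far more here than in single-target screening.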
Pipeline Architecture
The pipeline runs in 5 sequential phases:
┌──────────────┐    ┌────────────────────┐    ┌──────────────────┐
│  1. Ligand   │───▶│  2. Receptor       │───▶│  3. Pocket       │
│  Prep        │    │  Curation          │    │  Detection       │
│              │    │  (parallel)        │    │                  │
│ Dimorphite-DL│    │ PDBFixer + PDB2PQR │    │  P2Rank          │
│ RDKit + Meeko│    │ + PROPKA + Meeko   │    │  (Java ML)       │
└──────────────┘    └────────────────────┘    └──────────────────┘
                                                       │
         ┌─────────────────────────────────────────────┘
         ▼
┌──────────────────┐    ┌──────────────────────┐
│  4. Docking      │───▶│  5. Post-Docking     │
│  (parallel)      │    │  Analysis            │
│                  │    │                      │
│  GNINA           │    │  ProLIF + Pandas     │
│  (CPU / GPU)     │    │  Filter → Rank → CSV │
└──────────────────┘    └──────────────────────┘
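In the diagram, phase 3 hands pocket locations to phase 4. As a sketch of that hand-off, the parser below reads the contents of a P2Rank predictions CSV into pocket centers. It assumes P2Rank's usual column names (name, rank, probability, center_x, center_y, center_z) with space-padded cells; check them against the output of your P2Rank version. The Pocket class and the min_probability filter are illustrative, not WTFDTB's own code.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pocket:
    name: str
    rank: int
    probability: float
    center: Tuple[float, float, float]


def parse_p2rank_predictions(text: str, min_probability: float = 0.0) -> List[Pocket]:
    """Parse P2Rank *_predictions.csv content into Pocket records, sorted by rank.

    Headers and values are assumed to be space-padded, so every cell is stripped.
    """
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    col = {name: i for i, name in enumerate(header)}
    pockets = []
    for raw in reader:
        if not raw:
            continue  # skip blank lines
        row = [cell.strip() for cell in raw]
        probability = float(row[col["probability"]])
        if probability < min_probability:
            continue  # drop low-confidence pockets
        pockets.append(Pocket(
            name=row[col["name"]],
            rank=int(row[col["rank"]]),
            probability=probability,
            center=(float(row[col["center_x"]]),
                    float(row[col["center_y"]]),
                    float(row[col["center_z"]])),
        ))
    return sorted(pockets, key=lambda p: p.rank)
```

Each Pocket.center can then be used as the center of the cubic docking box described under --box-size.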
Phase Details
| Phase | Module | Tools | What It Does |
|---|---|---|---|
| 1. Ligand Prep | ligand_prep.py | Dimorphite-DL, RDKit, Meeko | Parses single ligands or multi-molecule libraries (.sdf, .mol, .mol2, .smi), enumerates protonation states, generates 3D conformers, and keeps a 2D SMILES template of each ligand so its topology can be restored after docking. |
| 2. Receptor Curation | receptor_curation.py | PDBFixer, PDB2PQR, PROPKA, RDKit | Downloads structures for listed PDB IDs, repairs missing heavy atoms, and protonates at the target pH. Includes Intelligent Cofactor Recovery, which preserves whitelisted metals and cofactors. |
| 3. Pocket Detection | pocket_detection.py | P2Rank (Java) | ML-based cavity prediction with no template bias; predicts candidate binding sites across the whole protein surface rather than only known sites. |
| 4. Docking | docking.py | GNINA (C++) | CNN-rescored molecular docking for every ligand × pocket combination, on CPU or GPU. |
| 5. Post-Docking | post_dock.py | ProLIF, Pandas | Template-Based Molecule Reconstruction rebuilds topologies broken during docking, so ProLIF receives chemically valid molecules for interaction fingerprinting. Then filters, ranks, and exports NxM CSV matrices. |
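Intelligent Cofactor Recovery comes down to a whitelist check on HETATM records. The sketch below shows one plausible way to combine a built-in metal list with codes from --keep-hetatm; the METALS set and the helpers parse_keep_hetatm and filter_hetatm are hypothetical names, and WTFDTB's real whitelist may differ. It relies on the fixed-column PDB format, where the residue name occupies columns 18-20.

```python
from typing import Optional, Set

# Hypothetical example whitelist of metal-ion residue names.
METALS = {"ZN", "MG", "MN", "CA", "FE", "CU", "CO", "NI"}


def parse_keep_hetatm(value: Optional[str]) -> Set[str]:
    """Turn a --keep-hetatm style string such as 'SAM,LIG' into {'SAM', 'LIG'}."""
    if not value:
        return set()
    return {code.strip().upper() for code in value.split(",") if code.strip()}


def filter_hetatm(pdb_text: str, keep: Set[str]) -> str:
    """Keep all non-HETATM records, but only HETATM records whose residue is in keep.

    Waters (HOH) and unlisted ligands are dropped. CONECT records are passed
    through unchanged, which is a simplification.
    """
    keep = {code.upper() for code in keep}
    kept_lines = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            resname = line[17:20].strip().upper()  # PDB columns 18-20
            if resname not in keep:
                continue
        kept_lines.append(line)
    return "\n".join(kept_lines) + "\n"
```

For example, filter_hetatm(pdb_text, METALS | parse_keep_hetatm("SAM,LIG")) keeps protein atoms, zinc and other listed metals, and SAM, while removing crystallographic waters.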
Installation
From PyPI (Recommended)
pip install wtfdtb
Setup (External Binaries)
WTFDTB requires GNINA and P2Rank. You can set them up automatically:
wtfdtb install
This command will:
- Download pre-compiled Linux binaries for both GNINA-CPU and GNINA-CUDA (v1.3.2).
- Download and extract P2Rank.
- Place them in ~/.local/.
- Automatically configure your PATH by updating your .bashrc.
Note: P2Rank requires Java ≥ 11 (sudo apt install default-jre on Ubuntu).
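Because wtfdtb install edits .bashrc, the updated PATH only takes effect in a new shell (or after running source ~/.bashrc). A quick way to confirm the binaries are discoverable is Python's shutil.which. The executable names below (gnina, prank for P2Rank's launcher script, and java) are assumptions; the installer may use different names, for example for the separate CPU and CUDA builds of GNINA.

```python
import shutil
from typing import Iterable, List

# Assumed executable names; adjust to whatever `wtfdtb install` placed in ~/.local/.
REQUIRED_BINARIES = ("gnina", "prank", "java")


def missing_binaries(names: Iterable[str] = REQUIRED_BINARIES) -> List[str]:
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required binaries found.")
```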
Quick Start
1. Create a Target List
Create a file named targets.txt listing one PDB ID or path to a .pdb file per line:
1IEP
1PXX
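A targets file can mix two kinds of entries. As an illustration of how they can be told apart, the hypothetical parse_targets helper below treats classic four-character PDB IDs (a digit 1-9 followed by three letters or digits) as IDs to download and everything else as a local path; WTFDTB's actual parsing rules may differ.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Classic four-character PDB ID: a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def parse_targets(text: str) -> Tuple[List[str], List[Path]]:
    """Split targets-file content into (PDB IDs, local .pdb paths)."""
    pdb_ids: List[str] = []
    paths: List[Path] = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if PDB_ID.match(entry):
            pdb_ids.append(entry.upper())
        else:
            paths.append(Path(entry))
    return pdb_ids, paths


def read_targets(path: Path) -> Tuple[List[str], List[Path]]:
    """Read and split a targets file such as targets.txt."""
    return parse_targets(path.read_text())
```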
2. Run the Screen
Standard (CPU):
wtfdtb screen --ligand library.smi --targets targets.txt --output-dir my_results
High Performance (GPU):
# Requires NVIDIA GPU + CUDA 12
wtfdtb screen --ligand library.smi --targets targets.txt --output-dir my_results --gpu
CLI Reference
wtfdtb screen [OPTIONS]
| Flag | Type | Default | Description |
|---|---|---|---|
| --ligand, -l | Path | required | Input ligand file (.sdf, .mol, .mol2, .smi); multi-molecule libraries are supported natively. |
| --targets, -t | Path | required | Directory of .pdb files or text file of PDB IDs |
| --output-dir, -o | Path | results/ | Output directory for individual ranked CSVs and 2D matrix pivot tables |
| --ph | float | 7.4 | Target pH for protonation (the default is physiological pH) |
| --box-size | int | 25 | Side length (Å) of the cubic docking box |
| --cnn-model | str | default | GNINA CNN model (default, dense) |
| --cnn-score-threshold | float | 0.5 | Minimum CNNscore (0–1) required to accept a pose |
| --min-interactions | int | 1 | Minimum number of interactions required to keep a pose |
| --workers, -w | int | CPU count | Parallel workers for curation and docking |
| --exhaustiveness | int | 8 | GNINA search thoroughness (higher = slower) |
| --gpu | bool | False | Enable GPU acceleration (requires the GNINA-CUDA build) |
| --keep-hetatm | str | None | Comma-separated three-letter HETATM residue codes to preserve (e.g. 'SAM,LIG') |
| --verbosity | int | 1 | Logging: 0=quiet, 1=normal, 2=debug |
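Several of these flags map naturally onto GNINA's own command-line options. The sketch below shows one plausible mapping for a single ligand/pocket job, centring a cubic box of --box-size Å on a pocket center; it is not WTFDTB's docking.py. The GNINA options used (-r, -l, -o, --center_x/y/z, --size_x/y/z, --exhaustiveness, --cnn) are standard GNINA flags, while the helper name gnina_command and the binary name gnina are assumptions.

```python
from typing import List, Sequence


def gnina_command(receptor: str, ligand: str, out: str,
                  center: Sequence[float], box_size: float = 25,
                  exhaustiveness: int = 8, cnn_model: str = "default",
                  binary: str = "gnina") -> List[str]:
    """Build a GNINA argument list for one ligand docked into one pocket."""
    cx, cy, cz = center
    cmd = [
        binary, "-r", receptor, "-l", ligand, "-o", out,
        # Cubic search box centred on the pocket (side length in Å).
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(box_size), "--size_y", str(box_size), "--size_z", str(box_size),
        "--exhaustiveness", str(exhaustiveness),
    ]
    # Only pass --cnn for a non-default model; otherwise GNINA uses its built-in default.
    if cnn_model != "default":
        cmd += ["--cnn", cnn_model]
    return cmd
```

The resulting list can be passed directly to subprocess.run(cmd, check=True); building an argument list instead of a shell string avoids quoting problems with unusual file paths.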
Output Format
The output CSV is ranked primarily by Vina affinity (lower is better), with cnn_affinity used to break ties:
| Column | Description |
|---|---|
| rank | Overall rank (1 = best binder) |
| pdb_id | Target protein PDB ID |
| pocket | Cavity name (from P2Rank) |
| pose_rank | Pose rank within this pocket (from GNINA) |
| cnn_score | Neural network pose confidence (0–1) |
| cnn_affinity | Predicted binding affinity (pKd; higher is better) |
| vina_affinity | Empirical scoring affinity (kcal/mol; lower is better) |
| hbond | Number of hydrogen bonds |
| hydrophobic | Number of hydrophobic contacts |
| pi_stacking | Number of π-stacking interactions |
| salt_bridge | Number of salt bridges |
| total_interactions | Sum of all interaction types |
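The ranking rule above, combined with the --cnn-score-threshold and --min-interactions filters, can be sketched in plain Python. This is an illustration, not post_dock.py: it assumes each pose is a dict keyed by the CSV column names plus a hypothetical ligand key, treats both thresholds as inclusive, and breaks Vina ties in favour of the higher cnn_affinity. best_affinity_matrix shows one way to build an NxM ligand-by-target matrix of best scores.

```python
from typing import Dict, List


def filter_and_rank(poses: List[dict], cnn_score_threshold: float = 0.5,
                    min_interactions: int = 1) -> List[dict]:
    """Drop weak poses, then rank by Vina affinity (lower first)."""
    kept = [
        p for p in poses
        if p["cnn_score"] >= cnn_score_threshold
        and p["total_interactions"] >= min_interactions
    ]
    # Ties on vina_affinity go to the higher cnn_affinity (pK, higher = stronger).
    kept.sort(key=lambda p: (p["vina_affinity"], -p["cnn_affinity"]))
    # Return copies with a 1-based overall rank; the input dicts are not modified.
    return [{**p, "rank": i} for i, p in enumerate(kept, start=1)]


def best_affinity_matrix(poses: List[dict]) -> Dict[str, Dict[str, float]]:
    """Pivot poses into {ligand: {pdb_id: best (lowest) vina_affinity}}."""
    matrix: Dict[str, Dict[str, float]] = {}
    for p in poses:
        row = matrix.setdefault(p["ligand"], {})
        current = row.get(p["pdb_id"])
        if current is None or p["vina_affinity"] < current:
            row[p["pdb_id"]] = p["vina_affinity"]
    return matrix
```

With Pandas, the same pivot can be expressed with DataFrame.pivot_table(index="ligand", columns="pdb_id", values="vina_affinity", aggfunc="min").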
Python API
from pathlib import Path
from wtfdtb.pipeline import run_pipeline

# use_gpu=True requires an NVIDIA GPU with CUDA 12 and the GNINA-CUDA binary
results_dir = run_pipeline(
    ligand_path=Path("multi-ligand-library.smi"),
    targets_path=Path("targets.txt"),
    output_dir=Path("my_results_folder"),
    use_gpu=True,
    workers=4,
)
Supported Platforms
| Platform | Status | Notes |
|---|---|---|
| Linux x86_64 | ✅ Supported | Primary platform. Binaries auto-installed via wtfdtb install. |
| Windows (WSL) | ✅ Supported | Works via Windows Subsystem for Linux (WSL). |
| Kaggle / Colab | ✅ Supported | Verified working; GPU acceleration supported on T4 GPUs. |
| macOS | ⚠️ Partial | Python pipeline works; GNINA must be compiled from source. |
Citation
If you use WTFDTB in your research, please cite:
@software{wtfdtb2026,
title = {WTFDTB: High-Throughput Inverse Virtual Screening},
author = {Chandragupt Sharma},
year = {2026},
url = {https://github.com/ChandraguptSharma07/WTFDTB}
}
License
MIT — see LICENSE for details.
Download files
File details
Details for the file wtfdtb-0.3.1.tar.gz.
File metadata
- Download URL: wtfdtb-0.3.1.tar.gz
- Upload date:
- Size: 35.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a05fdf43867683bc62ba485fa9e9c6fa91188ef74a49c8c27a25cc8aa9206b41 |
| MD5 | 39b54c82642f05854d69a18c262e7ee2 |
| BLAKE2b-256 | 7d060b05bd96904214e5bc8b48312240c6fbda361f25c533d04a7c9a83338f50 |
File details
Details for the file wtfdtb-0.3.1-py3-none-any.whl.
File metadata
- Download URL: wtfdtb-0.3.1-py3-none-any.whl
- Upload date:
- Size: 26.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e08a437e2a62de7336eb8bf61898db17f3f50a70381a6e69c14738a36ba4096 |
| MD5 | 11dc7bc98b62ec99830ae3d06acdf51c |
| BLAKE2b-256 | 1b1ebe003f73b568ffc52c249e35a4d4893fe8a30a63954f0c4f62af0cfbdd5f |
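The digests above let you check a downloaded archive before installing it. A minimal sketch using Python's hashlib follows; it reads the file in chunks so large files such as the 35 MB source distribution are not loaded into memory at once. The helper names sha256_of and verify are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 against a published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify(Path("wtfdtb-0.3.1.tar.gz"), "<SHA256 from the table above>") should return True for an intact download. pip can also enforce digests itself through --hash entries in a requirements file combined with --require-hashes.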