A deep learning model for protein-ligand binding affinity prediction

DockTDeep

Preprint: "Data-centric training enables meaningful interaction learning in protein–ligand binding affinity prediction." ChemRxiv.

💾 Installation

[!TIP] Always use a virtual environment to manage dependencies.

python -m venv .venv
source .venv/bin/activate

Using pip

For a quick, inference-only setup, install the package directly from PyPI:

pip install docktdeep

Using Docker

For a containerized setup that requires no Python environment:

# clone the repository (needed for the model checkpoint)
git clone https://github.com/gmmsb-lncc/docktdeep.git
cd docktdeep

# build the image and install the wrapper script
./install.sh

This installs a docktdeep command system-wide. Run it from any directory containing your data files:

docktdeep --proteins protein.pdb --ligands ligand.mol2 --output-csv results.csv

# with GPU support
docktdeep --gpu --proteins protein.pdb --ligands ligand.mol2 --output-csv results.csv

[!NOTE] The wrapper automatically mounts the current working directory into the container. All input files must be in the current directory and output files will be written there.
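Because the wrapper only sees the current working directory, inputs stored elsewhere have to be copied in first. The helper below is a minimal sketch of that staging step; `stage_inputs` is a hypothetical name and is not part of DockTDeep.

```python
import shutil
from pathlib import Path


def stage_inputs(paths, workdir="."):
    """Copy input files into workdir and return their bare file names.

    Hypothetical helper (not part of DockTDeep): it only prepares files so
    that a wrapper which mounts the working directory can see them.
    """
    workdir = Path(workdir).resolve()
    staged = []
    for path in map(Path, paths):
        if path.name in staged:
            # Two inputs with the same name would overwrite each other.
            raise ValueError(f"duplicate file name: {path.name}")
        target = workdir / path.name
        # Skip the copy when the file already lives in workdir.
        if path.resolve() != target:
            shutil.copy2(path, target)
        staged.append(path.name)
    return staged
```

The returned names can then be passed to the `docktdeep` command from inside that directory.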

🚀 Quick start

Basic usage

Predict binding affinities for protein-ligand pairs (predictions are given in kcal/mol).

# single protein-ligand pair
docktdeep predict --proteins protein.pdb --ligands ligand.pdb --output-csv results.csv

# multiple pairs
docktdeep predict \
    --proteins protein1.pdb protein2.pdb \
    --ligands ligand1.pdb ligand2.pdb \
    --output-csv results.csv \
    --max-batch-size 16

# single protein with multiple ligands (protein auto-replicated)
docktdeep predict \
    --proteins protein.pdb \
    --ligands ligand1.mol2 ligand2.mol2 ligand3.mol2 \
    --output-csv results.csv

# multi-mol2 file (e.g., docking output with multiple poses)
docktdeep predict \
    --proteins protein.pdb \
    --ligands docked_poses.mol2 \
    --output-csv results.csv

# options available in help
docktdeep predict --help
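Predictions in kcal/mol can be put on the more familiar dissociation-constant scale. The sketch below assumes the reported value is a standard binding free energy ΔG related to Kd by ΔG = RT ln(Kd) at 298.15 K with a 1 M reference state; the source only states the unit, so treat this interpretation as an assumption. The function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dg_to_kd(dg_kcal_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to Kd in mol/L.

    Assumes dG = RT * ln(Kd); this is an interpretation of the output,
    not something DockTDeep itself computes.
    """
    return math.exp(dg_kcal_mol / (R_KCAL * temperature_k))


def dg_to_pkd(dg_kcal_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to pKd = -log10(Kd)."""
    return -dg_kcal_mol / (R_KCAL * temperature_k * math.log(10))
```

Under these assumptions, each pKd unit corresponds to roughly 1.36 kcal/mol at room temperature.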

[!TIP] When using a single protein with multiple ligands, the protein is automatically replicated, so there is no need to repeat the protein path. Multi-mol2 files (a common output of docking programs such as AutoDock Vina or GOLD) are automatically split into individual molecules.
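To illustrate what splitting a multi-mol2 file involves: each molecule in the Tripos mol2 format starts with an `@<TRIPOS>MOLECULE` record, so a file can be cut at those markers. This is only a sketch of the idea, not DockTDeep's own implementation, and `split_mol2` is a hypothetical name.

```python
MOLECULE_TAG = "@<TRIPOS>MOLECULE"


def split_mol2(text):
    """Split multi-mol2 text into one string per molecule.

    Illustrative only: comment lines that precede a later molecule stay
    attached to the end of the previous block.
    """
    blocks, current = [], []
    for line in text.splitlines(keepends=True):
        # A new MOLECULE record closes the block collected so far.
        if line.startswith(MOLECULE_TAG) and current:
            blocks.append("".join(current))
            current = []
        current.append(line)
    if current:
        blocks.append("".join(current))
    # Drop any preamble that appears before the first MOLECULE record.
    return [block for block in blocks if MOLECULE_TAG in block]
```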

[!TIP] Use shell globbing patterns to process multiple files efficiently.

# using glob expansion (the shell expands each pattern in sorted order)
docktdeep predict \
   --proteins path/to/proteins/*_protein.pdb \
   --ligands path/to/ligands/*_ligand.pdb

# use find for more complex patterns; sort both lists so proteins and ligands line up
docktdeep predict \
   --proteins $(find /data/complexes -name "*_protein_prep.pdb" | sort) \
   --ligands $(find /data/complexes -name "*_ligand_rnum.pdb" | sort)
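Shell expansion pairs proteins and ligands purely by position, so a single missing file shifts every pair after it. The sketch below pairs files by their shared name prefix and fails loudly on a mismatch. It assumes the `*_protein.pdb` / `*_ligand.pdb` naming used above; `pair_by_id` and `build_command` are hypothetical helpers, and only flags shown in this README are used.

```python
from pathlib import Path


def pair_by_id(protein_dir, ligand_dir,
               protein_suffix="_protein.pdb", ligand_suffix="_ligand.pdb"):
    """Return (proteins, ligands) lists aligned by shared file-name prefix."""
    def index(directory, suffix):
        # Map the prefix before the suffix (e.g. "1abc") to the file path.
        return {p.name[: -len(suffix)]: p
                for p in Path(directory).glob(f"*{suffix}")}

    proteins = index(protein_dir, protein_suffix)
    ligands = index(ligand_dir, ligand_suffix)
    unmatched = sorted(set(proteins) ^ set(ligands))
    if unmatched:
        raise ValueError(f"IDs without a partner file: {unmatched}")
    ids = sorted(proteins)
    return [proteins[i] for i in ids], [ligands[i] for i in ids]


def build_command(proteins, ligands, output_csv="results.csv"):
    """Build the argument list for `docktdeep predict` without running it."""
    return ["docktdeep", "predict",
            "--proteins", *map(str, proteins),
            "--ligands", *map(str, ligands),
            "--output-csv", output_csv]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.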

⚙️ Development setup

For development and training custom models:

# clone the repository
git clone https://github.com/gmmsb-lncc/docktdeep.git
cd docktdeep

# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# install deps
python -m pip install -r requirements.txt

# run tests to verify installation
python -m pytest tests/

Training models

Initialize a new Aim repository for tracking experiments:

aim init

# to start the aim server
aim server

To see all available training options:

python train.py --help

Train a model with optimized hyperparameters:

python train.py \
    --model Baseline \
    --experiment experiment-name \
    --depthwise-convs \
    --adaptive-pooling \
    --optim AdamW \
    --max-epochs 1500 \
    --batch-size 64 \
    --lr 0.00087469 \
    --beta1 0.25693012 \
    --eps 0.00032933 \
    --dropout 0.25348994 \
    --wdecay 0.0000169 \
    --molecular-dropout 0.06 \
    --molecular-dropout-unit complex \
    --random-rotation \
    --dataframe-path path/to/dataframe.csv \
    --root-dir path/to/data/PDBbind2020 \
    --ligand-path-pattern "{c}/{c}_ligand_rnum.pdb" \
    --protein-path-pattern "{c}/{c}_protein_prep.pdb" \
    --split-column random_split
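The path patterns are easier to check before a long training run. The sketch below assumes that `{c}` stands for the complex identifier and that patterns are resolved relative to `--root-dir`; both are inferred from the command above rather than confirmed, and `resolve_paths` and `missing_files` are hypothetical helpers.

```python
from pathlib import Path


def resolve_paths(root_dir, complex_id, ligand_pattern, protein_pattern):
    """Expand the {c} placeholder for one complex (assumed semantics)."""
    root = Path(root_dir)
    return (root / ligand_pattern.format(c=complex_id),
            root / protein_pattern.format(c=complex_id))


def missing_files(root_dir, complex_ids, ligand_pattern, protein_pattern):
    """List expected files that do not exist under root_dir."""
    missing = []
    for complex_id in complex_ids:
        for path in resolve_paths(root_dir, complex_id,
                                  ligand_pattern, protein_pattern):
            if not path.is_file():
                missing.append(path)
    return missing
```

For a hypothetical complex ID `1abc`, the ligand pattern above would expand to `1abc/1abc_ligand_rnum.pdb` under the root directory.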

📝 Citation

If you use DockTDeep in your research, please cite:

@article{dasilva2025docktdeep,
  title={Data-centric training enables meaningful interaction learning in protein--ligand binding affinity prediction},
  author={da Silva, Matheus M. P. and Vidal, Lincon and Guedes, Isabella and de Magalh{\~a}es, Camila and Cust{\'o}dio, F{\'a}bio and Dardenne, Laurent},
  journal={ChemRxiv},
  year={2025}
}

Related

  • DockTGrid: a Python package for generating deep learning-ready voxel grids of molecular complexes. GitHub.
