Adapting protein language models and contrastive learning for DTI prediction.

ConPLex

ConPLex Schematic

🚧🚧 Please note that ConPLex is currently in pre-release and under active development. For the code used to generate our PNAS results, see the Manuscript Code section below. 🚧🚧

Abstract

Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance on one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pre-trained protein language models ("PLex") and employing a novel protein-anchored contrastive co-embedding ("Con") to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with sub-nanomolar affinity, plus a novel strongly-binding EPHB1 inhibitor ($K_D = 1.3$ nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug-target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate ConPLex will facilitate novel drug discovery by making highly sensitive in-silico drug screening feasible at genome scale.
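As a rough illustration of the distance-based prediction described in the abstract, the sketch below ranks candidate compounds for one target by cosine similarity between vectors in a shared space. The toy vectors, the compound names, and the `rank_compounds` helper are all hypothetical. This is not the ConPLex implementation, where the embeddings are learned by the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_compounds(target_vec, compound_vecs):
    """Return (compound_id, score) pairs, most similar first."""
    scores = {
        cid: cosine_similarity(target_vec, vec)
        for cid, vec in compound_vecs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up embeddings (assumption: real ones come from a trained model).
target = [1.0, 0.0, 0.0]
compounds = {
    "cmpd_A": [0.9, 0.1, 0.0],
    "cmpd_B": [0.0, 1.0, 0.0],
    "cmpd_C": [-1.0, 0.0, 0.0],
}
ranking = rank_compounds(target, compounds)
```

In a setup like this, compound vectors can be computed once and reused, so scoring a new target costs one similarity computation per compound. That is the property that makes library-scale screening plausible.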

Installation

Install from PyPI

First install a version of cudatoolkit that is compatible with your system, then run:

pip install conplex-dti
conplex-dti --help

Compile from Source

git clone https://github.com/samsledje/ConPLex.git
cd ConPLex
conda create -n conplex-dti python=3.9
conda activate conplex-dti
make poetry-download
export PATH="[poetry install location]:$PATH"
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
make install
conplex-dti --help

Usage

Download benchmark data sets and pre-trained models

conplex-dti download --to datasets --benchmarks davis bindingdb biosnap biosnap_prot biosnap_mol dude
conplex-dti download --to . --models ConPLex_v1_BindingDB

Run benchmark training

conplex-dti train --run-id TestRun --config config/default_config.yaml

Make predictions with a trained model

conplex-dti predict --data-file [pair predict file].tsv --model-path ./models/ConPLex_v1_BindingDB.pt --outfile ./results.tsv

Each line of [pair predict file].tsv should be tab-separated, in the form [protein ID]\t[molecule ID]\t[protein sequence]\t[molecule SMILES].
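A pair file in that column order could be generated with a short script like the one below. This is a minimal sketch: the `write_pair_file` helper, the file name `pairs.tsv`, and the example rows are placeholders. It writes no header row, which is an assumption, since only the column order is specified above.

```python
import csv


def write_pair_file(path, pairs):
    """Write (protein_id, molecule_id, protein_sequence, smiles) rows as TSV."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for protein_id, molecule_id, sequence, smiles in pairs:
            writer.writerow([protein_id, molecule_id, sequence, smiles])


# Placeholder values for illustration only, not real data.
example_pairs = [
    ("prot_1", "mol_1", "MKTAYIAKQR", "CCO"),
    ("prot_1", "mol_2", "MKTAYIAKQR", "c1ccccc1"),
]
write_pair_file("pairs.tsv", example_pairs)
```

The resulting file can then be passed to the predict command through --data-file.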

Visualize co-embedding space

...

Reference

If you use ConPLex, please cite "Contrastive learning in protein language space predicts interactions between drugs and protein targets" by Rohit Singh*, Samuel Sledzieski*, Bryan Bryson, Lenore Cowen, and Bonnie Berger.

@article{singh2023contrastive,
  title={Contrastive learning in protein language space predicts interactions between drugs and protein targets},
  author={Singh, Rohit and Sledzieski, Samuel and Bryson, Bryan and Cowen, Lenore and Berger, Bonnie},
  journal={Proceedings of the National Academy of Sciences},
  volume={120},
  number={24},
  pages={e2220778120},
  year={2023},
  publisher={National Acad Sciences}
}

Thanks to Ava Amini, Kevin Yang, and Sevahn Vorperian from MSR New England for suggesting the triplet distance contrastive loss function without the sigmoid activation, which is now the default. To use the original formulation with the sigmoid activation, set the --use-sigmoid-cosine flag during training.
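To make the difference between the two formulations concrete, here is a minimal sketch of a triplet margin loss over cosine distance, with an optional sigmoid applied to the similarity. The function names, the `margin` value, and the exact way the sigmoid enters the distance are assumptions made for illustration. The package's own loss may differ in detail.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v, use_sigmoid=False):
    """1 - similarity, optionally squashing the similarity with a sigmoid."""
    sim = cosine_similarity(u, v)
    if use_sigmoid:
        sim = 1.0 / (1.0 + math.exp(-sim))
    return 1.0 - sim


def triplet_loss(anchor, positive, negative, margin=0.25, use_sigmoid=False):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    d_pos = cosine_distance(anchor, positive, use_sigmoid)
    d_neg = cosine_distance(anchor, negative, use_sigmoid)
    return max(0.0, d_pos - d_neg + margin)


# Toy vectors: the anchor plays the role of the protein, the other two
# the roles of a binding compound and a decoy (all values are made up).
anchor = [1.0, 0.0]
binder = [1.0, 0.0]
decoy = [0.0, 1.0]
plain = triplet_loss(anchor, binder, decoy)
squashed = triplet_loss(anchor, binder, decoy, use_sigmoid=True)
```

In this sketch the sigmoid maps cosine similarities from [-1, 1] into roughly (0.27, 0.73), so distances span a narrower range. For the toy triplet above, the margin is satisfied without the sigmoid (`plain` is 0) but not with it (`squashed` is slightly positive).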

Manuscript Code

Code used to generate the results in the manuscript can be found in the development repository.
