Multimodal Graph-Text Contrastive Learning for Drug Design

DrugCLIP: Multimodal Graph-Text Contrastive Learning for Drug Design 🧬✨

Python PyTorch RDKit

DrugCLIP is a deep-learning package that contrastively aligns 3D molecular structures with textual therapeutic and clinical descriptions. It powers AI drug-discovery pipelines (such as BioTarget) by scoring novel molecular geometries against clinical goals (binding affinity) and failure modes (toxicity).


🧩 Model Architecture

DrugCLIP maps text descriptions and 3D molecular point clouds into a shared 128-dimensional latent space. It is designed to act as a surrogate multi-objective function that scores both binding potential and drug toxicity.
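
One way to picture a surrogate multi-objective score in a shared embedding space is a weighted difference of similarities: reward closeness to a goal description and penalise closeness to a toxicity description. The sketch below is an assumption for illustration, not DrugCLIP's documented scoring rule; `surrogate_score`, `tox_weight`, and the 2-dimensional toy vectors are hypothetical (real embeddings would be 128-dimensional).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def surrogate_score(mol_emb, goal_emb, tox_emb, tox_weight=1.0):
    """Hypothetical scalarisation: goal similarity minus weighted toxicity similarity."""
    return cosine(mol_emb, goal_emb) - tox_weight * cosine(mol_emb, tox_emb)


# Toy embeddings: the goal text and the toxicity text point in orthogonal directions.
goal, tox = [1.0, 0.0], [0.0, 1.0]
good_mol, bad_mol = [0.9, 0.1], [0.1, 0.9]
ranked = sorted([("good", good_mol), ("bad", bad_mol)],
                key=lambda item: surrogate_score(item[1], goal, tox),
                reverse=True)
```

With `tox_weight=0` the score reduces to plain goal similarity; larger values trade binding potential against safety more aggressively.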

  • Graph Encoder: SchNet (3D Message Passing Neural Network), extracting features from atomic numbers (z) and 3D coordinates (pos).
  • Text Encoder: DistilBERT (distilbert-base-uncased), projecting clinical natural language queries into semantic embeddings.
  • Loss Function: InfoNCE (Contrastive Loss) matching paired batches of molecular geometries with their corresponding text records.
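
To make the loss concrete, here is a minimal, dependency-free sketch of a symmetric (CLIP-style) InfoNCE loss over a batch of paired embeddings. It is written in plain Python so the arithmetic is visible; a real training loop would use tensor operations. The symmetric form and the `temperature` value of 0.07 are assumptions, since the description above does not state them.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def _diag_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[i]
    return total / len(logits)


def info_nce(graph_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of graph_emb is paired with row i of text_emb."""
    g = [_normalize(v) for v in graph_emb]
    t = [_normalize(v) for v in text_emb]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(gi, tj)) / temperature for tj in t]
              for gi in g]
    transposed = [list(col) for col in zip(*logits)]
    # Average the graph-to-text and text-to-graph directions.
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(transposed))
```

The loss is near zero when each molecule is most similar to its own text record and grows when the pairing is scrambled, which is what pulls matched pairs together in the latent space.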

💾 Installation

To install DrugCLIP from source as an editable pip package:

git clone https://github.com/your-org/drugclip.git
cd drugclip
pip install -e .

After installation, the drugclip CLI is available on your PATH in the environment where you ran pip.
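
If the command is not found after installing, a quick check from Python shows whether the executable is visible on the current PATH. This sketch uses only the standard library's `shutil.which`; the helper name `cli_path` is illustrative.

```python
import shutil


def cli_path(name="drugclip"):
    """Return the absolute path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    path = cli_path()
    if path:
        print(f"drugclip found at {path}")
    else:
        print("drugclip not found on PATH; is the right environment active?")
```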


📊 Dataset Preparation

DrugCLIP requires supervised data that pairs molecular structures with clinical outcomes and textual descriptions. Out of the box, it supports preparing:

  1. MolTextNet (Structure-to-text mapping)
  2. TDC Tox21 (Experimental toxicity assays)
  3. TDC ClinTox (FDA clinical failure records)
  4. ChEMBL (Unlabeled chemical lookup libraries)

To download all datasets directly into data/:

drugclip data download all

⚡ High-Performance Pre-Training

DrugCLIP is optimized for HPC clusters and single-node multi-GPU setups. The PyTorch data loaders use pinned memory for asynchronous host-to-device transfers, and operations run under Automatic Mixed Precision (AMP) via torch.amp.autocast. 3D molecular geometry is computed across all available CPU cores with ProcessPoolExecutor before GPU encoding.
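
The CPU-side pattern can be sketched with the standard library alone. The worker below is a placeholder, not real chemistry: in an actual pipeline it would presumably call a conformer generator such as RDKit's `AllChem.EmbedMolecule`, which is an assumption about the implementation. The names `placeholder_geometry` and `compute_geometries` are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def placeholder_geometry(smiles):
    """Stand-in for a conformer generator. NOT chemistry: it places one
    fake 'atom' per letter of the SMILES string along the x-axis."""
    letters = [c for c in smiles if c.isalpha()]
    pos = [(1.5 * i, 0.0, 0.0) for i in range(len(letters))]
    return {"smiles": smiles, "pos": pos}


def compute_geometries(smiles_list, worker=placeholder_geometry,
                       executor_cls=ProcessPoolExecutor, max_workers=None):
    """Map `worker` over the inputs in parallel, preserving input order."""
    max_workers = max_workers or os.cpu_count() or 1
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches work items to cut inter-process overhead
        # (thread-based executors ignore it).
        return list(pool.map(worker, smiles_list, chunksize=16))
```

Passing the executor class as a parameter keeps the function easy to test with `ThreadPoolExecutor`. When using the default process pool from a script, call `compute_geometries` under an `if __name__ == "__main__":` guard so worker processes can be spawned safely on every platform.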

To train the contrastive alignment model:

# Full training run
drugclip train align

# Or validate your hardware limits with synthetic data
drugclip train align --quick-validate

Checkpoints are automatically saved to runs/align/best.ckpt, relative to the directory from which you launch the command.
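
Because the checkpoint path is relative, downstream scripts started from a different directory will not find it. A small helper like the following (hypothetical, standard library only) resolves the path against an explicit run directory and fails with a clear message otherwise.

```python
from pathlib import Path


def find_checkpoint(run_dir=None, relative="runs/align/best.ckpt"):
    """Return the checkpoint path under run_dir (default: current working directory)."""
    base = Path(run_dir) if run_dir is not None else Path.cwd()
    ckpt = base / relative
    if not ckpt.is_file():
        raise FileNotFoundError(
            f"No checkpoint at {ckpt}; was training launched from {base}?"
        )
    return ckpt
```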


🔬 Standalone Inference

You can run drug-retrieval inference on its own, directly through the DrugCLIP CLI:

drugclip infer --goal-text "A highly selective and safe kinase inhibitor" --top-n 5
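
Conceptually, retrieval in a shared embedding space amounts to ranking candidate molecules by their similarity to the embedded goal text and keeping the best few. The sketch below illustrates that idea with toy vectors; it is not the CLI's actual implementation, and `top_n` and the `library` mapping are hypothetical names.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(goal_emb, library, n=5):
    """Rank a {name: embedding} library against the goal embedding.

    Returns a list of (name, score) tuples, best match first.
    """
    scored = ((cosine(emb, goal_emb), name) for name, emb in library.items())
    return [(name, score) for score, name in heapq.nlargest(n, scored)]


# Toy 2-D embeddings standing in for encoder outputs.
library = {"mol_a": [1.0, 0.0], "mol_b": [0.0, 1.0], "mol_c": [1.0, 1.0]}
hits = top_n([1.0, 0.0], library, n=2)
```

`heapq.nlargest` avoids sorting the whole library, which matters when the candidate set is a large lookup library rather than three toy entries.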
