Multimodal Graph-Text Contrastive Learning for Drug Design

DrugCLIP: Multimodal Graph-Text Contrastive Learning for Drug Design 🧬✨

Built with Python, PyTorch, and RDKit.

DrugCLIP is a deep-learning package for contrastive alignment between 3D molecular structures and textual therapeutic or clinical descriptions. It powers AI drug-discovery pipelines such as BioTarget by scoring novel molecular geometries against clinical goals (e.g., binding affinity) and failure modes (e.g., toxicity).


🧩 Model Architecture

DrugCLIP maps text descriptions and 3D molecular point clouds into a shared 128-dimensional latent space. It is designed to act as a surrogate multi-objective function that scores both binding potential and toxicity.
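One way to read "surrogate multi-objective function" is as a difference of similarities in the shared space: reward closeness to a goal description and penalize closeness to a toxicity description. The sketch below is an assumption made for illustration, not DrugCLIP's actual scoring code. The names `cosine`, `surrogate_score`, and `tox_weight` are hypothetical, and the toy 3-dimensional vectors stand in for the 128-dimensional embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def surrogate_score(mol_emb, goal_emb, tox_emb, tox_weight=1.0):
    """Hypothetical objective: goal similarity minus weighted toxicity similarity."""
    return cosine(mol_emb, goal_emb) - tox_weight * cosine(mol_emb, tox_emb)

# Toy embeddings (assumed values, not model outputs).
goal = [1.0, 0.0, 0.0]         # e.g. an encoded "binds the target" description
tox = [0.0, 1.0, 0.0]          # e.g. an encoded "hepatotoxic" description
candidate_a = [0.9, 0.1, 0.0]  # points mostly along the goal direction
candidate_b = [0.1, 0.9, 0.0]  # points mostly along the toxicity direction

print(surrogate_score(candidate_a, goal, tox))  # positive: favoured
print(surrogate_score(candidate_b, goal, tox))  # negative: penalized
```

Raising `tox_weight` in this sketch trades binding-like similarity for a stronger toxicity penalty; how the package itself balances the two objectives is not stated here.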

  • Graph Encoder: SchNet (3D Message Passing Neural Network), extracting features from atomic numbers (z) and 3D coordinates (pos).
  • Text Encoder: DistilBERT (distilbert-base-uncased), projecting clinical natural language queries into semantic embeddings.
  • Loss Function: InfoNCE (Contrastive Loss) matching paired batches of molecular geometries with their corresponding text records.
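To make the loss concrete, here is a minimal, dependency-free sketch of a symmetric InfoNCE objective over a batch of paired graph and text embeddings. It is written in plain Python rather than PyTorch so it runs anywhere. The symmetric (graph-to-text plus text-to-graph) form, the temperature value of 0.07, and the function names are assumptions, not details confirmed by the package.

```python
import math

def _normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _log_softmax_at(logits, idx):
    """Log-probability of entry `idx` under a softmax over `logits` (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - log_sum_exp

def info_nce(graph_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: row i of each input is assumed to be a matching pair."""
    g = [_normalize(v) for v in graph_embs]
    t = [_normalize(v) for v in text_embs]
    n = len(g)
    # sim[i][j] = scaled cosine similarity between molecule i and text j.
    sim = [[sum(a * b for a, b in zip(gi, tj)) / temperature for tj in t] for gi in g]
    # Graph-to-text: each molecule must pick out its own text among the batch.
    loss_g2t = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-graph: each text must pick out its own molecule (column-wise softmax).
    loss_t2g = -sum(_log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return 0.5 * (loss_g2t + loss_t2g)

aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # correctly paired batches give the lower loss
```

When every embedding in the batch is identical, this loss equals log(batch size), which is the chance-level value a trained model should fall well below.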

💾 Installation

To install DrugCLIP from source in editable mode:

git clone https://github.com/your-org/drugclip.git
cd drugclip
pip install -e .

After installation, the drugclip CLI is available in the active Python environment.


📊 Dataset Preparation

DrugCLIP requires supervised data that pairs molecular structures with clinical outcomes and textual descriptions. Out of the box, it can prepare the following datasets:

  1. MolTextNet (Structure-to-text mapping)
  2. TDC Tox21 (Experimental toxicity assays)
  3. TDC ClinTox (FDA clinical failure records)
  4. ChEMBL (Unlabeled chemical lookup libraries)

To download all datasets directly into data/:

drugclip data download all

⚡ High-Performance Pre-Training

DrugCLIP is optimized for HPC clusters and single-node multi-GPU setups. The PyTorch data loaders use pinned memory for asynchronous host-to-device transfers, and training runs under Automatic Mixed Precision (AMP) via torch.amp.autocast. 3D molecular geometry is computed across all available CPU cores with ProcessPoolExecutor before GPU encoding.
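The CPU-side fan-out described above can be sketched with the standard library alone. This is an illustrative pattern, not the package's implementation: `center_point_cloud` is a toy stand-in for the real geometry step (which would presumably call RDKit), and `prepare_geometries` with its `executor_cls` parameter is a hypothetical helper.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def center_point_cloud(points):
    """Toy stand-in for geometry generation: shift atoms so the centroid is the origin."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - centroid[k] for k in range(3)] for p in points]

def prepare_geometries(clouds, worker=center_point_cloud,
                       executor_cls=ProcessPoolExecutor, max_workers=None):
    """Apply `worker` to every point cloud in parallel, preserving input order."""
    max_workers = max_workers or os.cpu_count()
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches items per task to cut inter-process overhead;
        # thread pools accept the argument but ignore it.
        return list(pool.map(worker, clouds, chunksize=8))

# Quick check using threads (same Executor interface, no process start-up);
# keep the default ProcessPoolExecutor for real CPU-bound work.
clouds = [
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]],
]
centered = prepare_geometries(clouds, executor_cls=ThreadPoolExecutor, max_workers=2)
print(centered[0])  # [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
```

With the default process pool, the call should sit under an `if __name__ == "__main__":` guard on platforms that start workers by spawning, and the worker must be a module-level function so it can be pickled.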

To train the contrastive alignment model:

# Full training run
drugclip train align

# Or validate your hardware limits with synthetic data
drugclip train align --quick-validate

Checkpoints are saved automatically to runs/align/best.ckpt, relative to the current working directory.


🔬 Standalone Inference

You can run isolated drug-retrieval inference directly through the DrugCLIP CLI:

drugclip infer --goal-text "A highly selective and safe kinase inhibitor" --top-n 5
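Conceptually, retrieval of this kind ranks a library of molecule embeddings by similarity to the encoded goal text and keeps the best few. The sketch below shows that ranking step under stated assumptions: embeddings are already L2-normalized (so a dot product equals cosine similarity), and the `top_n` helper, the molecule names, and the vectors are all hypothetical. It is not the CLI's internal code.

```python
import heapq

def top_n(goal_emb, library, n=5):
    """Return the n (name, score) pairs with the highest dot-product similarity.

    `library` maps a molecule name to its unit-norm embedding.
    """
    scored = (
        (sum(g * m for g, m in zip(goal_emb, emb)), name)
        for name, emb in library.items()
    )
    # nlargest avoids sorting the whole library when n is small.
    return [(name, score) for score, name in heapq.nlargest(n, scored)]

# Toy 2-dimensional, unit-norm embeddings (assumed values, not model outputs).
goal_embedding = [1.0, 0.0]
library = {
    "mol_a": [1.0, 0.0],
    "mol_b": [0.6, 0.8],
    "mol_c": [0.0, 1.0],
}

for name, score in top_n(goal_embedding, library, n=2):
    print(name, score)
```

Here `n` plays the role that `--top-n` plays on the command line: it caps how many of the best-scoring molecules are reported.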
