DrugCLIP: Multimodal Graph-Text Contrastive Learning for Drug Design 🧬✨
DrugCLIP is a deep-learning package that contrastively aligns 3D molecular structures with textual therapeutic and clinical descriptions. It powers AI drug discovery pipelines (such as BioTarget) by scoring novel molecular geometries against clinical goals (binding affinity) and failure modes (toxicity).
🧩 Model Architecture
DrugCLIP maps text descriptions and 3D molecular point clouds into a shared 128-dimensional continuous latent space. It is designed to act as a surrogate multi-objective function, scoring binding potential and drug toxicity.
- Graph Encoder: SchNet (3D Message Passing Neural Network), extracting features from atomic numbers (z) and 3D coordinates (pos).
- Text Encoder: DistilBERT (distilbert-base-uncased), projecting clinical natural language queries into semantic embeddings.
- Loss Function: InfoNCE (Contrastive Loss) matching paired batches of molecular geometries with their corresponding text records.
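To make the loss concrete, here is a minimal, dependency-free sketch of a symmetric InfoNCE objective over a batch in which molecule i is paired with text i. It is not DrugCLIP's actual code: the function names, the symmetric (CLIP-style) averaging, and the temperature of 0.07 are assumptions, and a real training loop would use PyTorch tensors rather than Python lists.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(mol_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE; mol_embs[i] and text_embs[i] form the positive pair.

    The temperature value and the symmetric averaging are assumptions.
    """
    mols = [l2_normalize(m) for m in mol_embs]
    texts = [l2_normalize(t) for t in text_embs]
    # logits[i][j]: temperature-scaled cosine similarity of molecule i and text j.
    logits = [
        [sum(a * b for a, b in zip(m, t)) / temperature for t in texts]
        for m in mols
    ]

    def diagonal_cross_entropy(rows):
        # Mean of -log softmax(row)[i], treating the diagonal as the correct class.
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)  # subtract the max for numerical stability
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    mol_to_text = diagonal_cross_entropy(logits)
    text_to_mol = diagonal_cross_entropy([list(col) for col in zip(*logits)])
    return 0.5 * (mol_to_text + text_to_mol)
```

The loss is close to zero when each molecule is most similar to its own text record and grows when the pairing is scrambled, which is the behavior the contrastive objective rewards.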
💾 Installation
To install DrugCLIP from source as an editable pip package:
git clone https://github.com/your-org/drugclip.git
cd drugclip
pip install -e .
After installation, the drugclip CLI tool is available in any terminal where the Python environment you installed into is active.
📊 Dataset Preparation
DrugCLIP trains on supervised data that pairs molecular structures with clinical outcomes and textual descriptions. Out of the box, it can prepare the following datasets:
- MolTextNet (Structure-to-text mapping)
- TDC Tox21 (Experimental toxicity assays)
- TDC ClinTox (FDA clinical failure records)
- ChEMBL (Unlabeled chemical lookup libraries)
To download all datasets directly into data/:
drugclip data download all
⚡ High-Performance Pre-Training
DrugCLIP is optimized for HPC and single-node multi-GPU setups. The PyTorch data loaders use pinned host memory for asynchronous transfers to the GPU, and model operations run under Automatic Mixed Precision (AMP) via torch.amp.autocast. Before GPU encoding, 3D molecular geometry is computed in parallel across all available CPU cores with ProcessPoolExecutor.
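The CPU-side fan-out can be sketched as follows. This is an illustration of the pattern only, not DrugCLIP's code: `center_coordinates` is a hypothetical stand-in for the real geometry step (which this page does not show), and `prepare_geometries`, `executor_cls`, and the chunk size are assumed names and values.

```python
import concurrent.futures
import os


def center_coordinates(coords):
    """Hypothetical placeholder for CPU-bound geometry work.

    Shifts a point cloud so its centroid sits at the origin.
    """
    count = len(coords)
    centroid = [sum(axis) / count for axis in zip(*coords)]
    return [tuple(c - m for c, m in zip(point, centroid)) for point in coords]


def prepare_geometries(
    molecules,
    executor_cls=concurrent.futures.ProcessPoolExecutor,
    max_workers=None,
):
    """Apply the per-molecule step across workers, preserving input order."""
    workers = max_workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        # chunksize batches work items to reduce inter-process overhead;
        # thread-based executors accept the argument and ignore it.
        return list(pool.map(center_coordinates, molecules, chunksize=8))
```

When using the default process-based executor from a script, call `prepare_geometries` under an `if __name__ == "__main__":` guard so worker processes can import the module cleanly. Passing `concurrent.futures.ThreadPoolExecutor` as `executor_cls` is a convenient way to debug the same code path without spawning processes.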
To train the contrastive alignment model:
# Full training run
drugclip train align
# Or validate your hardware limits with synthetic data
drugclip train align --quick-validate
Checkpoints are automatically saved to runs/align/best.ckpt, relative to the directory you run the command from.
🔬 Standalone Inference
You can run isolated drug-retrieval inference directly through the DrugCLIP CLI:
drugclip infer --goal-text "A highly selective and safe kinase inhibitor" --top-n 5
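Conceptually, retrieval in a shared embedding space comes down to ranking candidate molecules by their similarity to the embedded goal text. The sketch below shows that idea with plain Python lists. How `drugclip infer` ranks candidates internally is not documented here, so the cosine-similarity scoring, the optional toxicity penalty, and every name in the block (`cosine`, `top_n`, `tox_emb`, `tox_weight`) are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(goal_emb, library, n=5, tox_emb=None, tox_weight=1.0):
    """Rank library molecules by similarity to the goal embedding.

    library maps a molecule identifier to its embedding. If tox_emb is given,
    similarity to that "failure mode" embedding is subtracted (an assumed way
    of combining the two objectives).
    """
    scored = []
    for name, emb in library.items():
        score = cosine(emb, goal_emb)
        if tox_emb is not None:
            score -= tox_weight * cosine(emb, tox_emb)
        scored.append((name, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]
```

In a real pipeline, `goal_emb` would come from the text encoder and the library embeddings from the graph encoder, both projected into the same 128-dimensional space.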
File details
Details for the file drugclip-0.1.2.tar.gz.
File metadata
- Download URL: drugclip-0.1.2.tar.gz
- Upload date:
- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a78bf66aa6845d5f106179d0def3da84942d9925f04410b91b7aca6a22bfbc7 |
| MD5 | 4ba4d235ece059fea19b06edfdba3ccf |
| BLAKE2b-256 | 422a9f8c1fee36d944aef3f71de8547dc3db074d084f7b55522a95f08c7ec56a |
File details
Details for the file drugclip-0.1.2-py3-none-any.whl.
File metadata
- Download URL: drugclip-0.1.2-py3-none-any.whl
- Upload date:
- Size: 16.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7080cbbb25c06ac7db418c80fc36b45e5d7ca2d864644e4db277affaa4ecc597 |
| MD5 | be1a0092b32f47105ff5772842c4d4bf |
| BLAKE2b-256 | 51ae147f49986c1bcd49b8e8fae76f9884b6d39d9892a792fbd9b24f4eb0e523 |
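A downloaded file can be checked against the SHA256 digests listed above with Python's standard hashlib module. The following is a small sketch: the helper names `sha256_of` and `matches` are made up for illustration, and the dictionary simply repeats the two SHA256 values from the file details on this page.

```python
import hashlib

# SHA256 digests copied from the file details listed above.
EXPECTED_SHA256 = {
    "drugclip-0.1.2.tar.gz":
        "3a78bf66aa6845d5f106179d0def3da84942d9925f04410b91b7aca6a22bfbc7",
    "drugclip-0.1.2-py3-none-any.whl":
        "7080cbbb25c06ac7db418c80fc36b45e5d7ca2d864644e4db277affaa4ecc597",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("drugclip-0.1.2.tar.gz", EXPECTED_SHA256["drugclip-0.1.2.tar.gz"])` should return True for an intact copy of the source distribution.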