BioTarget: AI Drug Discovery Pipeline. Requires NVIDIA GPU and Docker for GNINA docking. Run ./scripts/install_gnina_docker.sh before use.

BioTarget: End-to-End AI Drug Discovery Pipeline 🧬💊

BioTarget is an open-source CLI pipeline for the early stages of the AI drug-discovery workflow. It links target discovery, 3D protein structure prediction, deep-learning-based contrastive molecular screening, and physics-based CNN docking into a single framework.

The pipeline leverages DrugCLIP (a dual-encoder graph-text architecture) to act as a generative filter for toxicity and therapeutic intent, and gnina for structure-aware binding affinity predictions.

After installation, run the full pipeline with a single command:

python biotarget/cli.py run full \
  --disease "Alzheimer" --top-ligands 20

For more info, visit BioTarget on GitHub.

🎯 The Pipeline Architecture

BioTarget executes a 5-stage workflow designed for rapid, iterative drug discovery:

1. Stage A: Disease $\rightarrow$ Target Ranking

Retrieves and ranks disease-relevant protein targets by querying extensive biomedical knowledge graphs.

Sources: Open Targets Platform, DisGeNET, STRING, Reactome.
Methodology: Ranks protein targets via heterogeneous Graph Neural Networks (GNN) and biological pathway evidence mapping.

2. Stage B: Protein Structure Generation

Fetches or predicts the 3D conformation of the selected target proteins.

Primary: Experimental structures (PDB).
Generator: OpenFold-3 for de novo prediction of variants, mutants, or unmapped isoforms.

3. Stage C: Generative AI & Candidate Extraction

Instead of blindly docking massive lookup libraries (like ChEMBL), BioTarget employs a highly optimized generative filtering approach.

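DrugCLIP Guidance: 3D conformers are generated for thousands of virtual compounds in parallel across the available CPU cores. DrugCLIP then encodes a textual representation of the disease and keeps the molecular structures that align best with it geometrically and semantically. The resulting pool is 10× the size of --top-ligands (for example, 100 candidates when --top-ligands is 10).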
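The pool selection described above can be sketched as a plain top-N cut over alignment scores. This is a minimal illustration, not BioTarget's implementation: the function name `select_candidate_pool`, the `pool_factor` parameter, and the score dictionary are hypothetical, and the scores stand in for whatever DrugCLIP actually returns.

```python
import heapq


def select_candidate_pool(scores, top_ligands, pool_factor=10):
    """Return the IDs of the best-scoring candidates.

    scores: dict mapping a candidate ID to its alignment score (higher is better).
    The pool holds pool_factor * top_ligands entries (assumed 10x, as in the text).
    """
    pool_size = pool_factor * top_ligands
    best = heapq.nlargest(pool_size, scores.items(), key=lambda item: item[1])
    return [candidate_id for candidate_id, _ in best]


# Illustrative data only: 3000 placeholder candidates with made-up scores.
scores = {f"mol_{i}": i / 3000 for i in range(3000)}
pool = select_candidate_pool(scores, top_ligands=10)
print(len(pool), pool[0])
```

With 3000 scored candidates and `top_ligands=10`, the pool holds 100 entries, which matches the sizes shown in the example output.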

4. Stage D: Multi-Objective Binding & Toxicity Evaluation

Evaluates candidates simultaneously for efficacy (physics/CNN docking) and safety (latent space contrastive geometry).

Binding Evaluation (gnina): Generates 3D structure-data files (.sdf) via RDKit and invokes gnina as a subprocess. gnina evaluates ligand-receptor binding affinity using Convolutional Neural Networks on voxelized binding sites.
Toxicity Penalty (DrugCLIP): Computes a semantic embedding for clinical failure and calculates the normalized cosine similarity between that embedding and the ligand's structural embedding.
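As a rough sketch of the toxicity penalty, the cosine similarity between two embedding vectors can be computed as below. The source does not say how the similarity is normalized, so clamping it to the range 0 to 1 is an assumption made here for illustration. The function names are hypothetical, and plain Python lists stand in for DrugCLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def toxicity_penalty(ligand_embedding, failure_embedding):
    """Penalty in [0, 1]; higher means closer to the 'clinical failure' embedding.

    Assumption: negative similarity is treated as zero penalty (clamping).
    """
    similarity = cosine_similarity(ligand_embedding, failure_embedding)
    return max(0.0, min(1.0, similarity))


print(toxicity_penalty([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(toxicity_penalty([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```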

5. Stage E: Ranking & Reporting

Final Ranking: $\mathcal{S}_{final} = \mathcal{S}_{binding} - (0.5 \cdot \mathcal{S}_{tox})$

Aggregates hits, flags highly toxic compounds (⚠️), and outputs a ranked manifest of candidate SMILES ready for Molecular Dynamics (MD) refinement via OpenMM.
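The ranking formula can be illustrated in a few lines of Python. This is a sketch under assumptions: the example output suggests the binding term is the normalized value shown in parentheses next to the gnina score, but the source does not state this. The cutoff for flagging a compound as highly toxic is not given either, so `tox_flag_threshold=0.5` is a placeholder, and the ligand names are invented.

```python
def final_score(binding, tox, tox_weight=0.5):
    """S_final = S_binding - tox_weight * S_tox, with the 0.5 weight from the formula."""
    return binding - tox_weight * tox


def rank_candidates(candidates, tox_flag_threshold=0.5):
    """Sort (name, binding, tox) tuples by final score, best first.

    tox_flag_threshold is a hypothetical cutoff; the real one is not documented.
    """
    ranked = [
        {
            "name": name,
            "final": final_score(binding, tox),
            "high_tox": tox >= tox_flag_threshold,
        }
        for name, binding, tox in candidates
    ]
    ranked.sort(key=lambda row: row["final"], reverse=True)
    return ranked


# Placeholder ligands with illustrative scores.
rows = rank_candidates([
    ("ligand_a", 0.87, 0.72),
    ("ligand_b", 0.99, 0.00),
    ("ligand_c", 0.91, 0.20),
])
for row in rows:
    print(row["name"], round(row["final"], 4), "HIGH" if row["high_tox"] else "OK")
```

A compound with strong binding but a large toxicity penalty drops down the list, which is the behavior the formula is meant to produce.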

🚀 Installation & Setup

BioTarget requires Python 3.9+ and leverages PyTorch for its deep learning models. Follow these steps to get a fully functioning environment.

1. Base Installation

# Clone the repository
git clone https://github.com/homerquan/biotarget.git
cd biotarget

# Create and activate a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install the base dependencies
pip install -r requirements.txt

2. Install DrugCLIP (Required)

BioTarget relies on a specialized, multi-modal package called drugclip to handle the graph-text contrastive filtering.

pip install git+https://github.com/homerquan/drugclip.git

(Note: If drugclip is not yet public, you will need the appropriate SSH keys or access tokens configured on your machine, or you must place the package locally in your PYTHONPATH)

3. External Dependencies (Required)

Due to licensing and packaging constraints for massive C++ binaries, gnina requires Docker to be installed and running on your system, and it also requires an NVIDIA GPU for high-performance physics-based molecular docking.

GNINA (Physics-Based Binding Evaluation via Docker)

For Stage D to execute high-accuracy CNN molecular docking, BioTarget automatically manages gnina using its official Docker container. You do not need to install the gnina binary manually; the pipeline will automatically pull and execute gnina/gnina:latest.

Hardware & Software Requirements:

Docker is mandatory: You must have Docker installed and running on your host machine.
NVIDIA GPU: Required for high-performance execution. The nvidia-container-toolkit must be configured for Docker.
ARM Architecture Support: If you are running BioTarget on an ARM machine (like aarch64 / Apple Silicon / Graviton), BioTarget will automatically pull the linux/amd64 Docker image and execute gnina via Docker's x86_64 emulation (QEMU). Emulation does not support GPU pass-through, so ARM systems will execute gnina on the emulated CPU, which is significantly slower but fully functional.

IMPORTANT: Pre-Installation Setup

Before running the BioTarget pipeline, you must configure the GNINA Docker container. We have provided an automated script that handles architecture detection, pulls the correct image, configures QEMU emulation for ARM devices, and verifies GPU pass-through on x86 machines.

Run the following setup script:

chmod +x scripts/install_gnina_docker.sh
./scripts/install_gnina_docker.sh

To ensure your environment is ready, you can manually verify Docker is working with GPUs (on x86_64 systems):

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

(Note: macOS fallbacks are no longer supported. You must run this pipeline on a Linux machine with an NVIDIA GPU and Docker installed.)
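For a quick local check before running the setup script, the requirements above can be probed from Python. This is a hypothetical helper and not part of BioTarget. It only looks for the `docker` and `nvidia-smi` executables on the PATH and reads the CPU architecture, so it cannot confirm that the Docker daemon is running or that nvidia-container-toolkit is configured.

```python
import platform
import shutil


def gnina_preflight():
    """Report basic host facts relevant to running gnina in Docker."""
    machine = platform.machine().lower()
    is_x86 = machine in {"x86_64", "amd64"}
    return {
        "docker_found": shutil.which("docker") is not None,
        "machine": machine,
        # Assumption: GPU pass-through is only plausible on x86_64 hosts
        # where the NVIDIA driver tools are installed.
        "gpu_passthrough_expected": is_x86 and shutil.which("nvidia-smi") is not None,
    }


report = gnina_preflight()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `docker_found` is false, the setup script and the docker run check above will fail as well.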

OpenFold-3 / AlphaFold DB (Protein Structure Prediction)

For Stage B, the pipeline attempts to fetch validated 3D structures.

By default, the pipeline automatically pulls .pdb files from the AlphaFold Protein Structure Database via its API.
If you specifically need to fold novel variants de novo, you will need OpenFold-3 weights. These fall under a strict CC-BY-NC license. Request access via AQLaboratory/OpenFold and place the .pt files in ~/.biotarget/openfold3_weights/.
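The two structure sources above can be sketched with the standard library. The following is an illustration under assumptions, not BioTarget's code. The prediction endpoint and the `pdbUrl` field reflect the public AlphaFold DB API as I understand it and may change. All function names are hypothetical. Only the weights directory path comes from the text above.

```python
import json
import urllib.request
from pathlib import Path

# Assumed public AlphaFold DB endpoint; not taken from BioTarget's source.
ALPHAFOLD_API = "https://alphafold.ebi.ac.uk/api/prediction/"
WEIGHTS_DIR = Path.home() / ".biotarget" / "openfold3_weights"


def alphafold_prediction_url(accession):
    """Build the metadata URL for a UniProt accession such as P04062."""
    return ALPHAFOLD_API + accession


def fetch_alphafold_pdb_url(accession, timeout=30):
    """Return the .pdb download URL for an accession, or None if no entry exists.

    Performs a network request; the 'pdbUrl' field name is an assumption.
    """
    with urllib.request.urlopen(alphafold_prediction_url(accession), timeout=timeout) as response:
        entries = json.load(response)
    return entries[0]["pdbUrl"] if entries else None


def find_openfold3_weights(weights_dir=WEIGHTS_DIR):
    """List the .pt weight files in the weights directory (empty list if absent)."""
    weights_dir = Path(weights_dir)
    if not weights_dir.is_dir():
        return []
    return sorted(weights_dir.glob("*.pt"))


print(alphafold_prediction_url("P04062"))
print(len(find_openfold3_weights()), "weight file(s) found")
```

An empty list from `find_openfold3_weights()` means de novo folding with OpenFold-3 is unlikely to work until the weights are in place.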

🔬 Running the BioTarget Pipeline

The pipeline is invoked via the unified biotarget/cli.py orchestrator (or via the biotarget command if installed globally).

To execute the end-to-end pipeline for a specific disease:

python biotarget/cli.py run full \
  --disease "Alzheimer" \
  --target-model hetero-gnn \
  --structure-engine openfold3 \
  --binding-engine gnina \
  --top-targets 3 \
  --top-ligands 10
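To launch the same run from a script, the command line can be assembled as an argument list. This sketch only reuses the flags shown in the command above. The helper name `build_full_run_command` and its defaults are hypothetical.

```python
import sys


def build_full_run_command(
    disease,
    top_targets=3,
    top_ligands=10,
    target_model="hetero-gnn",
    structure_engine="openfold3",
    binding_engine="gnina",
):
    """Return the argv list for a full pipeline run, using the flags shown above."""
    return [
        sys.executable, "biotarget/cli.py", "run", "full",
        "--disease", disease,
        "--target-model", target_model,
        "--structure-engine", structure_engine,
        "--binding-engine", binding_engine,
        "--top-targets", str(top_targets),
        "--top-ligands", str(top_ligands),
    ]


command = build_full_run_command("Alzheimer")
print(" ".join(command[1:]))
# To execute it from the repository root:
#   import subprocess; subprocess.run(command, check=True)
```

Passing a list rather than a single shell string avoids quoting problems when the disease name contains spaces.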

Example Output

[Stage A] Disease -> Protein Target Ranking
[*] Querying Open Targets & DisGeNET for 'Alzheimer'...
[*] Found 3 highly ranked targets.

[Stage B] Protein Structure Generation
[*] Using engine: openfold3
[*] Folding GBA (P04062) with OpenFold-3...

[Stage C] Generative AI: De Novo Candidate Generation
[*] Generating 3000 de novo molecular structures...
[*] Generating 3D conformers for the generative pool using 64 CPU cores...
[*] Using DrugCLIP to guide selection of the top 100 generated candidates...
[*] Successfully finalized 10x generative candidate pool (N=100).

[Stage D] Binding Evaluation (gnina) & Toxicity Filtering (DrugCLIP)
[*] Loaded Target Receptor: GBA from Stage B (/runs/structures/GBA_openfold3.pdb)
[*] Computing Toxicity penalties for 100 candidates via DrugCLIP...
[*] Executing 'gnina' structure-aware docking & CNN scoring on 100 candidates...

[Stage E] Reporting
=====================================================================================
BIOTARGET PIPELINE FINAL RESULTS FOR: 'Alzheimer'
=====================================================================================
Rank  | Final  | Gnina (pK_d) | Tox Penalty   | SMILES
-------------------------------------------------------------------------------------
#1    | 0.9944 | 9.4457 (0.99) | 0.0000 OK      | CCC1(C(C)(C)C)CCOC1=O...
#2    | 0.8108 | 8.9903 (0.91) | 0.2005 OK      | COc1ccccc1N=C(S)N(CCN1CCOCC1)Cc1ccc...
#3    | 0.7631 | 9.2345 (0.96) | 0.3852 OK      | CCOC(=O)C1CCCN(c2c(NCCCN(C)Cc3ccccc...
#4    | 0.5101 | 8.8713 (0.87) | 0.7225 ⚠️ HIGH | CCCC(N=C(S)NCC1CCCO1)C12CC3CC(CC(C3...

🛠 Model Extensibility (The Roadmap)

While this framework establishes the AI-driven core, it is intentionally modular to support the integration of downstream biophysics tools:

Generative Expansion: Swapping the simulated candidate subset for an active autoregressive/diffusion generative model to perform closed-loop optimization.
MD Refinement: Automated hand-off of the top $K$ hits to OpenMM for physical stability analysis and short MD relaxation.

Release history

0.1.5: Apr 14, 2026
0.1.4: Apr 14, 2026 (this version)
0.1.3: Apr 14, 2026
0.1.2: Apr 14, 2026
0.1.1: Apr 14, 2026
0.1.0: Apr 13, 2026

Download files

Source Distribution: biotarget-0.1.4.tar.gz (18.0 kB), uploaded Apr 14, 2026, Source
Built Distribution: biotarget-0.1.4-py3-none-any.whl (16.6 kB), uploaded Apr 14, 2026, Python 3

File details

Details for the file biotarget-0.1.4.tar.gz.

File metadata

Download URL: biotarget-0.1.4.tar.gz
Upload date: Apr 14, 2026
Size: 18.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for biotarget-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`ddc2ebe9dea268f3c638e000cafd3d08b499b125d41613901ae29788da0ebcb9`
MD5	`742cde20c8c78188d3a7ba750e196cc4`
BLAKE2b-256	`ee067faf3de486acfdb9fd393a46acb6b0e29dad1acf5a39458f4ba3cfcdd226`

File details

Details for the file biotarget-0.1.4-py3-none-any.whl.

File metadata

Download URL: biotarget-0.1.4-py3-none-any.whl
Upload date: Apr 14, 2026
Size: 16.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for biotarget-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e48ea0b88558fbc4b470c57b17f29c07a7697b4be571cdd6e0e72715687eed70`
MD5	`2669b68bf2a0cb3f77e4d2d0d75be85d`
BLAKE2b-256	`b619a9a6403efbe29f39cb229ddce9c197377f54b1b97fa3cf6a711abfde65e0`
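
A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; the helper names are hypothetical, and the file path in the usage comment assumes the archive is in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """True if the file's SHA256 digest equals the published value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the archive was downloaded to the current directory:
#   matches_published_hash(
#       "biotarget-0.1.4.tar.gz",
#       "ddc2ebe9dea268f3c638e000cafd3d08b499b125d41613901ae29788da0ebcb9",
#   )
```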

biotarget 0.1.4
