Python client for Boltz-2 protein structure prediction API with covalent complex and multi-endpoint support
Boltz-2 Python Client
Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
A comprehensive Python client for NVIDIA's Boltz-2 biomolecular structure prediction service. This package provides both synchronous and asynchronous interfaces, a rich CLI, and built-in 3D visualization capabilities.
🚀 Features
- ✅ Full API Coverage - Complete Boltz-2 API support
- ✅ Async & Sync Clients - Choose your preferred programming style
- ✅ Rich CLI Interface - Beautiful command-line tools with progress bars
- ✅ 3D Visualization - Built-in py3Dmol integration for structure viewing
- ✅ Flexible Endpoints - Support for both local and NVIDIA hosted services
- ✅ Type Safety - Full Pydantic model validation
- ✅ YAML Configuration - Official Boltz format support
- ✅ Affinity Prediction - Predict binding affinity (IC50) for protein-ligand complexes
- ✅ Virtual Screening - High-level API for drug discovery campaigns
- ✅ MSA Search Integration - GPU-accelerated MSA generation with NVIDIA MSA Search NIM
- ✅ A3M to Multimer MSA - Convert ColabFold A3M files to paired multimer format (NEW)
- ✅ Multi-Endpoint Load Balancing - Distribute predictions across multiple NIMs
- ✅ Comprehensive Examples - Ready-to-use code samples
📦 Installation
From PyPI (Recommended)
pip install boltz2-python-client
From TestPyPI (Latest Development)
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ boltz2-python-client
From Source
git clone https://github.com/NVIDIA/digital-biology-examples.git
cd digital-biology-examples/examples/nims/boltz-2
pip install -e .
🎯 Quick Start
Python API
import asyncio
from boltz2_client import Boltz2Client

async def quick_prediction():
    client = Boltz2Client(base_url="http://localhost:8000")
    seq = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"

    # --- BASIC (no MSA) --------------------------------------------
    basic = await client.predict_protein_structure(sequence=seq)
    print("basic confidence", basic.confidence_scores[0])

    # --- MSA-GUIDED --------------------------------------------------
    msa_path = "msa-kras-g12c_combined.a3m"  # any *.a3m/*.sto/*.fasta file
    msa = [(msa_path, "a3m")]
    msa_res = await client.predict_protein_structure(
        sequence=seq,
        msa_files=msa,  # NEW helper will auto-convert ➜ nested-dict
        sampling_steps=50,
        recycling_steps=3,
    )
    print("msa confidence", msa_res.confidence_scores[0])

if __name__ == "__main__":
    asyncio.run(quick_prediction())
CLI Usage
# Health check
boltz2 health
# Protein structure prediction
boltz2 protein "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
# Protein-ligand complex
boltz2 ligand "PROTEIN_SEQUENCE" --smiles "CC(=O)OC1=CC=CC=C1C(=O)O"
# Protein-ligand with affinity prediction
boltz2 ligand "PROTEIN_SEQUENCE" --smiles "CC(=O)OC1=CC=CC=C1C(=O)O" --predict-affinity
# Covalent complex with bond constraints
boltz2 covalent "SEQUENCE" --ccd U4U --bond A:11:SG:L:C22
# Virtual screening campaign
boltz2 screen "TARGET_SEQUENCE" compounds.csv -o screening_results/
# MSA search
boltz2 msa-search "PROTEIN_SEQUENCE" --databases Uniref30_2302 colabfold_envdb_202108 --output msa.a3m
# MSA search + structure prediction
boltz2 msa-predict "PROTEIN_SEQUENCE" --databases Uniref30_2302 --max-sequences 1000
# MSA search + ligand affinity
boltz2 msa-ligand "PROTEIN_SEQUENCE" --smiles "LIGAND_SMILES" --predict-affinity
# Convert ColabFold A3M files to paired multimer CSV
boltz2 convert-msa chain_A.a3m chain_B.a3m -c A,B -o paired.csv
# One-command multimer prediction from A3M files (NEW)
boltz2 multimer-msa chain_A.a3m chain_B.a3m -c A,B -o complex.cif
# Multi-endpoint multimer prediction
boltz2 --multi-endpoint --base-url "http://gpu1:8000,http://gpu2:8000" \
multimer-msa chain_A.a3m chain_B.a3m -c A,B -o complex.cif
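The covalent command above takes a colon-separated bond string (`A:11:SG:L:C22`). The sketch below shows one way to validate such a string before calling the CLI. The field meanings (polymer chain, residue index, polymer atom, ligand ID, ligand atom) are an assumption inferred from the example and are not confirmed by this README. `BondSpec` and `parse_bond_spec` are hypothetical helpers, not part of `boltz2_client`.

```python
from typing import NamedTuple


class BondSpec(NamedTuple):
    """Hypothetical container; field meanings are assumed, not documented."""
    polymer_id: str
    residue_index: int
    polymer_atom: str
    ligand_id: str
    ligand_atom: str


def parse_bond_spec(spec: str) -> BondSpec:
    """Split a five-field, colon-separated bond string and check its shape."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}: {spec!r}")
    polymer_id, residue, polymer_atom, ligand_id, ligand_atom = parts
    if not residue.isdigit():
        raise ValueError(f"residue index must be a positive integer: {residue!r}")
    return BondSpec(polymer_id, int(residue), polymer_atom, ligand_id, ligand_atom)


bond = parse_bond_spec("A:11:SG:L:C22")
print(bond)
```

Checking the string up front gives a clear error message locally instead of a failed request.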
Affinity Prediction
from boltz2_client import Boltz2Client, Polymer, Ligand, PredictionRequest
client = Boltz2Client()
# Create protein and ligand with affinity prediction
protein = Polymer(id="A", molecule_type="protein", sequence="YOUR_SEQUENCE")
ligand = Ligand(id="LIG", smiles="CC(=O)OC1=CC=CC=C1C(=O)O", predict_affinity=True)
request = PredictionRequest(
polymers=[protein],
ligands=[ligand],
sampling_steps_affinity=200, # Affinity-specific parameters
diffusion_samples_affinity=5
)
result = await client.predict(request)
# Access affinity results
if result.affinities and "LIG" in result.affinities:
    affinity = result.affinities["LIG"]
    print(f"pIC50: {affinity.affinity_pic50[0]:.2f}")
    print(f"IC50: {10**(-affinity.affinity_pic50[0])*1e9:.1f} nM")
    print(f"Binding probability: {affinity.affinity_probability_binary[0]:.1%}")
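The IC50 line above relies on the usual definition pIC50 = -log10(IC50 in mol/L). As a minimal sketch under that assumption, the conversion can be pulled into two small helpers. The function names are illustrative and are not part of `boltz2_client`.

```python
import math


def pic50_to_ic50_nm(pic50: float) -> float:
    """Convert pIC50 (-log10 of IC50 in mol/L) to IC50 in nanomolar."""
    return 10 ** (-pic50) * 1e9


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Inverse conversion; IC50 must be strictly positive."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


for value in (5.0, 6.0, 7.5):
    print(f"pIC50 {value:.1f} -> IC50 {pic50_to_ic50_nm(value):.1f} nM")
```

A higher pIC50 means a lower IC50, so each unit of pIC50 corresponds to a tenfold change in concentration.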
MSA Search Integration (NEW)
Integrate GPU-accelerated MSA Search NIM for enhanced protein structure predictions:
from boltz2_client import Boltz2Client
# Initialize and configure MSA Search
client = Boltz2Client()
client.configure_msa_search(
msa_endpoint_url="https://health.api.nvidia.com/v1/biology/nvidia/msa-search",
api_key="your_nvidia_api_key" # Or set NVIDIA_API_KEY env var
)
# One-step MSA search + structure prediction
result = await client.predict_with_msa_search(
sequence="MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
databases=["Uniref30_2302", "PDB70_220313"],
max_msa_sequences=1000,
e_value=0.0001
)
print(f"Confidence: {result.confidence_scores[0]:.3f}")
# Or just search MSA and save in different formats
msa_path = await client.search_msa(
sequence="YOUR_PROTEIN_SEQUENCE",
output_format="a3m", # Options: a3m, fasta, csv, sto
save_path="protein_msa.a3m"
)
See the MSA Search Guide for detailed usage and parameters.
A3M to Multimer MSA Conversion (NEW)
Convert ColabFold-generated A3M monomer MSA files to paired multimer format for Boltz2:
from boltz2_client import (
Boltz2Client,
convert_a3m_to_multimer_csv,
create_paired_msa_per_chain,
Polymer, PredictionRequest
)
# Convert A3M files to paired MSA (auto-detects pairing mode)
result = convert_a3m_to_multimer_csv(
a3m_files={'A': 'chain_A.a3m', 'B': 'chain_B.a3m'}
)
print(f"Paired {result.num_pairs} sequences")
# Create per-chain MSA structures for Boltz2
msa_per_chain = create_paired_msa_per_chain(result)
# Create polymers with paired MSA
protein_A = Polymer(id="A", molecule_type="protein",
sequence=result.query_sequences['A'],
msa=msa_per_chain['A'])
protein_B = Polymer(id="B", molecule_type="protein",
sequence=result.query_sequences['B'],
msa=msa_per_chain['B'])
# Predict complex structure
client = Boltz2Client(base_url="http://localhost:8000")
response = await client.predict(PredictionRequest(
polymers=[protein_A, protein_B],
recycling_steps=3,
sampling_steps=200
))
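For readers unfamiliar with the input format, here is a minimal sketch of reading an A3M file. It does not reproduce the pairing logic of `convert_a3m_to_multimer_csv`, which this README does not describe. It only shows the two format rules the sketch relies on: records are FASTA-like, and lowercase letters mark insertions relative to the query. `read_a3m` and `strip_insertions` are hypothetical names.

```python
from typing import List, Tuple


def read_a3m(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; '#' comment lines are skipped."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strip_insertions(sequence: str) -> str:
    """Drop lowercase insertion columns so every row matches the query length."""
    return "".join(ch for ch in sequence if not ch.islower())


example = "#4\t1\n>query\nMKTV\n>hit1\nMKaT-\n"
records = read_a3m(example)
aligned = [strip_insertions(seq) for _, seq in records]
print(records[0][0], aligned)
```

After insertions are removed, every row has the same length as the query, which is a quick sanity check before passing files to the converter.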
CLI One-Command Prediction
# Predict directly from A3M files (converts + predicts in one step)
boltz2 --base-url http://localhost:8000 multimer-msa \
chain_A.a3m chain_B.a3m \
-c A,B \
-o complex.cif
# Save all outputs: structure, paired CSVs, and confidence scores
#   --save-csv  saves the paired CSV files
#   --save-all  saves the scores JSON (confidence, pLDDT, pTM, etc.)
boltz2 --base-url http://localhost:8000 multimer-msa \
  chain_A.a3m chain_B.a3m \
  -c A,B \
  -o complex.cif \
  --save-csv \
  --save-all
# With multi-endpoint load balancing
boltz2 --multi-endpoint \
--base-url "http://gpu1:8000,http://gpu2:8000,http://gpu3:8000" \
multimer-msa chain_A.a3m chain_B.a3m -c A,B -o complex.cif --save-all
Output files with --save-all --save-csv:
output/
├── complex.cif # 3D structure (mmCIF)
├── complex.scores.json # Confidence scores, pLDDT, pTM, metrics
├── complex_chain_A.csv # Paired MSA for chain A
└── complex_chain_B.csv # Paired MSA for chain B
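The naming pattern in the listing above can be derived from the structure path and the chain IDs. The sketch below assumes the pattern is exactly as shown (`<stem>.scores.json` and `<stem>_chain_<ID>.csv`). `expected_outputs` is a hypothetical helper for post-run checks, not part of the client.

```python
from pathlib import Path
from typing import Dict, Sequence


def expected_outputs(structure_path: str, chain_ids: Sequence[str]) -> Dict[str, Path]:
    """Build the file names shown in the listing from the -o path and chain IDs."""
    out = Path(structure_path)
    stem = out.stem  # "complex" for "complex.cif"
    paths = {
        "structure": out,
        "scores": out.with_name(f"{stem}.scores.json"),
    }
    for chain_id in chain_ids:
        paths[f"csv_{chain_id}"] = out.with_name(f"{stem}_chain_{chain_id}.csv")
    return paths


outputs = expected_outputs("output/complex.cif", ["A", "B"])
for key, path in outputs.items():
    print(f"{key}: {path}")
```

A pipeline can then call `path.exists()` on each entry to confirm a run produced everything it should.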
Save All Outputs (Python API)
from pathlib import Path
from boltz2_client import save_prediction_outputs, get_prediction_summary
# Save all outputs with one function call
paths = save_prediction_outputs(
response=response,
output_dir=Path("results"),
base_name="my_complex",
save_structure=True, # Save CIF file(s)
save_scores=True, # Save scores JSON
save_csv=True, # Save paired CSVs
conversion_result=result # From convert_a3m_to_multimer_csv
)
print(paths)
# {'structure': Path('results/my_complex.cif'),
# 'scores': Path('results/my_complex.scores.json'),
# 'csv_A': Path('results/my_complex_chain_A.csv'),
# 'csv_B': Path('results/my_complex_chain_B.csv')}
# Get a quick summary of prediction quality
summary = get_prediction_summary(response)
print(f"Confidence: {summary['confidence']:.2f}")
print(f"Quality: {summary['quality_assessment']}") # Very High/High/Medium/Low
See the A3M to Multimer MSA Guide for detailed usage.
Virtual Screening
from boltz2_client import quick_screen
# Minimal virtual screening
compounds = [
{"name": "Aspirin", "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"},
{"name": "Ibuprofen", "smiles": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"}
]
result = quick_screen(
target_sequence="YOUR_PROTEIN_SEQUENCE",
compounds=compounds,
target_name="My Target",
output_dir="screening_results"
)
# Show top hits
print(result.get_top_hits(n=5))
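Compound libraries often live in a CSV file rather than in code. The sketch below builds the same list-of-dicts structure used above from CSV text. The column names `name` and `smiles` are an assumption, and the layout the `boltz2 screen` command expects is not specified in this README. `load_compounds` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_compounds(handle: TextIO) -> List[Dict[str, str]]:
    """Read rows with 'name' and 'smiles' columns into the quick_screen input shape."""
    compounds: List[Dict[str, str]] = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(csv.DictReader(handle), start=2):
        smiles = (row.get("smiles") or "").strip()
        if not smiles:
            raise ValueError(f"row {row_number}: missing SMILES")
        name = (row.get("name") or "").strip() or f"compound_{row_number - 1}"
        compounds.append({"name": name, "smiles": smiles})
    return compounds


csv_text = "name,smiles\nAspirin,CC(=O)OC1=CC=CC=C1C(=O)O\n,CC(C)CC1=CC=C(C=C1)C(C)C(=O)O\n"
library = load_compounds(io.StringIO(csv_text))
print(library)
```

With a real file, pass `open("compounds.csv", newline="")` instead of the in-memory `StringIO`.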
Multi-Endpoint Virtual Screening (NEW)
Parallelize screening across multiple Boltz-2 NIM endpoints for better throughput:
from boltz2_client import MultiEndpointClient, LoadBalanceStrategy, VirtualScreening
# Configure multiple endpoints
multi_client = MultiEndpointClient(
endpoints=[
"http://localhost:8000",
"http://localhost:8001",
"http://localhost:8002",
],
strategy=LoadBalanceStrategy.LEAST_LOADED
)
# Use with virtual screening
vs = VirtualScreening(client=multi_client)
result = await vs.screen(
target_sequence="YOUR_PROTEIN_SEQUENCE",
compound_library=compounds,
predict_affinity=True
)
# View endpoint statistics
multi_client.print_status()
See MULTI_ENDPOINT_GUIDE.md for detailed setup instructions.
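To illustrate the idea behind a least-loaded strategy, here is a self-contained sketch that tracks in-flight requests per endpoint and always picks the endpoint with the fewest. It is an illustration of the general technique only, not the implementation behind `LoadBalanceStrategy.LEAST_LOADED`. `LeastLoadedPicker` is a hypothetical class.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LeastLoadedPicker:
    """Toy selector: choose the endpoint with the fewest in-flight requests."""
    endpoints: List[str]
    in_flight: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for url in self.endpoints:
            self.in_flight.setdefault(url, 0)

    def acquire(self) -> str:
        # min() returns the first endpoint on ties, so order acts as a tiebreaker.
        url = min(self.endpoints, key=lambda u: self.in_flight[u])
        self.in_flight[url] += 1
        return url

    def release(self, url: str) -> None:
        self.in_flight[url] -= 1


picker = LeastLoadedPicker(["http://localhost:8000", "http://localhost:8001", "http://localhost:8002"])
first = picker.acquire()
second = picker.acquire()
picker.release(first)      # the first request finishes
third = picker.acquire()   # goes back to the now-idle first endpoint
print(first, second, third)
```

Compared with round-robin, this keeps slow predictions from piling up on a single endpoint when job durations vary.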
3D Visualization
import py3Dmol
from boltz2_client import Boltz2Client
client = Boltz2Client()
result = await client.predict_protein_structure(sequence="YOUR_SEQUENCE", recycling_steps=6, sampling_steps=50 )
# Create 3D visualization
view = py3Dmol.view(width=800, height=600)
view.addModel(result.structures[0].structure, 'cif')
view.setStyle({'cartoon': {'color': 'spectrum'}})
view.zoomTo()
view.show()
🔧 Configuration
Local Endpoint (Default)
client = Boltz2Client(base_url="http://localhost:8000")
NVIDIA Hosted Endpoint
client = Boltz2Client(
base_url="https://health.api.nvidia.com",
api_key="your_api_key",
endpoint_type="nvidia_hosted"
)
Environment Variables
export NVIDIA_API_KEY="your_api_key"
export BOLTZ2_BASE_URL="http://localhost:8000"
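Whether the client reads these variables automatically is not stated here, so the sketch below resolves them explicitly and falls back to the local default. `client_kwargs_from_env` is a hypothetical helper. The keyword names `base_url` and `api_key` match the constructor calls shown above.

```python
import os
from typing import Dict, Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect constructor keyword arguments from environment variables."""
    env = os.environ if env is None else env
    kwargs = {"base_url": env.get("BOLTZ2_BASE_URL", "http://localhost:8000")}
    api_key = env.get("NVIDIA_API_KEY")
    if api_key:  # only pass a key when one is actually set
        kwargs["api_key"] = api_key
    return kwargs


settings = client_kwargs_from_env()
print("base_url:", settings["base_url"])
```

The result can then be passed along as `Boltz2Client(**client_kwargs_from_env())`.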
🐳 Local Deployment Setup
To run Boltz-2 locally using NVIDIA's NIM (NVIDIA Inference Microservice) container, follow these steps:
Prerequisites
- NVIDIA GPU with sufficient VRAM (recommended: 24GB+)
- Docker with NVIDIA Container Runtime
- NGC Account with API key
Step 1: Generate NGC API Key
- Go to NGC (NVIDIA GPU Cloud)
- Sign in or create an account
- Navigate to Setup → Generate API Key
- Copy your personal API key
Step 2: Docker Login
# Login to NVIDIA Container Registry
docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>
Step 3: Set Up Environment
# Export your NGC API key
export NGC_API_KEY=<your_personal_NGC_key>
# Create local cache directory (recommended for model reuse)
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p $LOCAL_NIM_CACHE
chmod -R 777 $LOCAL_NIM_CACHE
Step 4: Run Boltz-2 NIM Container
Option A: Use All Available GPUs (Default)
docker run -it \
--runtime=nvidia \
-p 8000:8000 \
-e NGC_API_KEY \
-v "$LOCAL_NIM_CACHE":/opt/nim/.cache \
nvcr.io/nim/mit/boltz2:1.0.0
Option B: Use Specific GPU (e.g., GPU 0)
docker run -it \
--runtime=nvidia \
--gpus='"device=0"' \
-p 8000:8000 \
-e NGC_API_KEY \
-v "$LOCAL_NIM_CACHE":/opt/nim/.cache \
nvcr.io/nim/mit/boltz2:1.0.0
Step 5: Verify Installation
Once the container is running, test the service:
# Health check
curl http://localhost:8000/v1/health/live
# Or using the Python client
python -c "
import asyncio
from boltz2_client import Boltz2Client

async def test():
    client = Boltz2Client(base_url='http://localhost:8000')
    health = await client.health_check()
    print(f'Service status: {health.status}')

asyncio.run(test())
"
🚨 Important Notes
- First Run: The container automatically downloads the models (several GB) on first start, which may take some time
- Cache Directory: Using LOCAL_NIM_CACHE saves bandwidth and time on subsequent runs
- GPU Memory: Ensure sufficient GPU memory for your prediction workloads
- Port 8000: Make sure port 8000 is available and not blocked by firewall
- Network: Container needs internet access for initial model downloads
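Because the first start can take a while, scripts may want to wait for the service before submitting work. The sketch below polls the `/v1/health/live` path used in Step 5 with the standard library only. It assumes the endpoint answers with HTTP 200 once the service is live. `is_live` and `wait_until_live` are hypothetical helpers, and the timeout values are arbitrary.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_live(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the liveness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/health/live", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_live(
    base_url: str,
    max_wait: float = 1800.0,
    interval: float = 10.0,
    probe: Callable[[str], bool] = is_live,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the probe succeeds or max_wait seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A typical call is `wait_until_live("http://localhost:8000")`. The `probe` and `sleep` parameters exist so the loop can be exercised without a running container.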
🔧 Troubleshooting
Container fails to start:
# Check GPU availability
nvidia-smi
# Check Docker NVIDIA runtime
docker run --rm --runtime=nvidia nvidia/cuda:11.0-base nvidia-smi
Permission issues:
# Fix cache directory permissions
sudo chown -R $USER:$USER $LOCAL_NIM_CACHE
chmod -R 755 $LOCAL_NIM_CACHE
Memory issues:
# Monitor GPU memory usage
watch -n 1 nvidia-smi
# Use specific GPU with more memory
docker run --gpus='"device=1"' ... # Use GPU 1 instead
📚 Examples
The examples/ directory contains comprehensive examples:
- 01_basic_protein_folding.py - Simple protein structure prediction
- 02_protein_structure_prediction_with_msa.py - MSA-guided predictions with comparison
- 03_protein_ligand_complex.py - Protein-ligand complexes
- 04_covalent_bonding.py - Covalent bond constraints
- 05_dna_protein_complex.py - DNA-protein interactions
- 06_yaml_configurations.py - YAML config files
- 07_advanced_parameters.py - Advanced API parameters
- 08_affinity_prediction.py - Binding affinity prediction (IC50/pIC50)
- 15_a3m_to_multimer_csv.py - A3M to multimer MSA conversion
- 16_colabfold_a3m_to_multimer.ipynb - Interactive notebook for multimer MSA (NEW)
- A3M_TO_MULTIMER_MSA.md - Comprehensive guide for A3M conversion (NEW)
🧪 Supported Prediction Types
| Type | Description | CLI Command | Python Method |
|---|---|---|---|
| Protein | Single protein folding | protein | predict_protein_structure() |
| Ligand Complex | Protein-ligand binding | ligand | predict_protein_ligand_complex() |
| Covalent Complex | Covalent bonds | covalent | predict_covalent_complex() |
| DNA-Protein | Nucleic acid complexes | dna-protein | predict_dna_protein_complex() |
| Advanced | Custom parameters | advanced | predict_with_advanced_parameters() |
| YAML | Configuration files | yaml | predict_from_yaml_config() |
🔬 Advanced Features
Batch Processing
import asyncio
from boltz2_client import Boltz2Client

async def batch_predictions():
    client = Boltz2Client()
    sequences = ["SEQ1", "SEQ2", "SEQ3"]

    # Process multiple sequences concurrently
    tasks = [client.predict_protein_structure(sequence=seq) for seq in sequences]
    results = await asyncio.gather(*tasks)

    for i, result in enumerate(results):
        # confidence_scores is indexed per sample, as in the Quick Start example
        print(f"Sequence {i+1}: Confidence {result.confidence_scores[0]:.3f}")

asyncio.run(batch_predictions())
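Launching every prediction at once can overwhelm a single endpoint when the batch is large. One common pattern is to cap concurrency with `asyncio.Semaphore`. The sketch below demonstrates it with a stand-in coroutine so it runs without a server. `gather_bounded` and `fake_predict` are hypothetical names, and the limit of 2 is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 2,
) -> List[R]:
    """Run worker(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(run_one(item) for item in items))


async def fake_predict(sequence: str) -> int:
    """Stand-in for a prediction call; returns the sequence length."""
    await asyncio.sleep(0.01)
    return len(sequence)


results = asyncio.run(gather_bounded(["SEQ1", "SEQ22", "SEQ333"], fake_predict, limit=2))
print(results)
```

In real use, `worker` would wrap a call such as `client.predict_protein_structure(sequence=seq)`.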
MSA-Guided Predictions
# With MSA file
result = await client.predict_protein_structure(
sequence="YOUR_SEQUENCE",
msa_files=[("path/to/alignment.a3m", "a3m")]
)
Custom Parameters
result = await client.predict_with_advanced_parameters(
polymers=[{"id": "A", "sequence": "SEQUENCE"}],
recycling_steps=3,
sampling_steps=200,
diffusion_samples=1
)
🆕 Affinity Prediction
Predict binding affinity (IC50/pIC50) for protein-ligand complexes:
from boltz2_client import Boltz2Client, Polymer, Ligand, PredictionRequest
# Create protein and ligand
protein = Polymer(id="A", molecule_type="protein", sequence="YOUR_SEQUENCE")
ligand = Ligand(id="LIG", smiles="CC(=O)OC1=CC=CC=C1C(=O)O", predict_affinity=True)
# Create request with affinity parameters
request = PredictionRequest(
polymers=[protein],
ligands=[ligand],
sampling_steps_affinity=200, # Default: 200
diffusion_samples_affinity=5, # Default: 5
affinity_mw_correction=False # Default: False
)
# Predict structure and affinity
result = await client.predict(request)
# Access affinity results
if result.affinities and "LIG" in result.affinities:
    affinity = result.affinities["LIG"]
    print(f"pIC50: {affinity.affinity_pic50[0]:.3f}")
    print(f"Binding probability: {affinity.affinity_probability_binary[0]:.3f}")
🧬 MSA-Guided Affinity Prediction
Combine MSA search with affinity prediction for improved accuracy:
# Configure MSA Search
client.configure_msa_search("http://your-msa-nim:8000")
# Predict with MSA + affinity in one call
result = await client.predict_ligand_with_msa_search(
protein_sequence="YOUR_SEQUENCE",
ligand_smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
predict_affinity=True,
databases=["Uniref30_2302", "PDB70_220313"],
max_msa_sequences=1000,
sampling_steps_affinity=300
)
# Or use existing MSA file
result = await client.predict_protein_ligand_complex(
protein_sequence="YOUR_SEQUENCE",
ligand_smiles="LIGAND_SMILES",
msa_files=[("alignment.a3m", "a3m")],
predict_affinity=True
)
CLI Usage
# Basic affinity prediction
boltz2 ligand "PROTEIN_SEQUENCE" --smiles "LIGAND_SMILES" --predict-affinity
# With custom parameters
boltz2 ligand "PROTEIN_SEQUENCE" --ccd Y7W \
--predict-affinity \
--sampling-steps-affinity 100 \
--diffusion-samples-affinity 3 \
--affinity-mw-correction
Note: Only ONE ligand per request can have affinity prediction enabled.
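The one-ligand restriction can be checked on the client side before a request is sent. This is a minimal sketch using plain dictionaries with the `id` and `predict_affinity` field names from the examples above. `check_single_affinity_ligand` is a hypothetical helper. Whether the library performs a similar check itself is not stated here.

```python
from typing import Any, Dict, List


def check_single_affinity_ligand(ligands: List[Dict[str, Any]]) -> None:
    """Raise ValueError if more than one ligand requests affinity prediction."""
    flagged = [lig.get("id", "?") for lig in ligands if lig.get("predict_affinity")]
    if len(flagged) > 1:
        raise ValueError(
            "only one ligand per request may set predict_affinity; got: "
            + ", ".join(flagged)
        )


ligands = [
    {"id": "LIG", "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O", "predict_affinity": True},
    {"id": "COF", "smiles": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"},
]
check_single_affinity_ligand(ligands)  # passes: only LIG is flagged
print("ligand set is valid")
```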
🛠 Development
Setup Development Environment
git clone https://github.com/NVIDIA/digital-biology-examples.git
cd digital-biology-examples/examples/nims/boltz-2
pip install -e ".[dev]"
Run Tests
pytest tests/
Code Formatting
black boltz2_client/
isort boltz2_client/
Type Checking
mypy boltz2_client/
📋 Requirements
- Python: 3.8+
- Dependencies:
  - httpx>=0.24.0 - HTTP client
  - pydantic>=2.0.0 - Data validation
  - rich>=13.0.0 - CLI formatting
  - aiofiles>=23.0.0 - Async file operations
  - click>=8.0.0 - CLI framework
  - PyYAML>=6.0.0 - YAML support
  - py3Dmol>=2.0.0 - 3D visualization
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Merge Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
Third-party dependencies are licensed under their respective licenses - see the licenses/ directory for details.
📚 Documentation
Guides
- MSA Search Guide - GPU-accelerated MSA generation with NVIDIA MSA Search NIM
- A3M to Multimer MSA Guide - Convert ColabFold A3M files to paired multimer format (NEW)
- Affinity Prediction Guide - Comprehensive guide for binding affinity prediction
- YAML Configuration Guide - Working with YAML configuration files
- Async Programming Guide - Best practices for async operations
- Covalent Complex Guide - Predicting covalent bonds
- Multi-Endpoint Guide - Load balancing across multiple NIMs
- Parameters Guide - Detailed parameter documentation
🔗 Links
- TestPyPI: https://test.pypi.org/project/boltz2-python-client/
- NVIDIA BioNeMo: https://www.nvidia.com/en-us/clara/bionemo/
- Boltz-2 Paper: Link to Boltz-2 paper
🏆 Acknowledgments
- NVIDIA BioNeMo Team for the Boltz-2 service
- Contributors and testers
- Open source community
Disclaimer
This software is provided as-is without warranties of any kind. No guarantees are made regarding the accuracy, reliability, or fitness for any particular purpose. The underlying models and APIs are experimental and subject to change without notice. Users are responsible for validating all results and assessing suitability for their specific use cases.
Made with ❤️ for the computational biology community
File details
Details for the file boltz2_python_client-0.3.2.tar.gz.
File metadata
- Download URL: boltz2_python_client-0.3.2.tar.gz
- Upload date:
- Size: 959.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2937927d4aecf4277b0726b51a28242c960c95fbb985b3dc29ce90aa80eff7a |
| MD5 | 1de76d25692ced97b0d9b10b29288ff7 |
| BLAKE2b-256 | 2c83d4860c1d2048541452ed2178a848317535d5db64f21acecf80ea49e45707 |
File details
Details for the file boltz2_python_client-0.3.2-py3-none-any.whl.
File metadata
- Download URL: boltz2_python_client-0.3.2-py3-none-any.whl
- Upload date:
- Size: 815.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | edf3648281fb48089425dbfe0b4faf306c07ad23c81e4ee27bb5d56832de2c54 |
| MD5 | 168a4307b4beaa40ce11f3801ada9929 |
| BLAKE2b-256 | c3a0f0321fdcf3c2f216288a33bd36ee3d9bec82dfd18acd2896da436678bdf4 |