Preprocessing file generator for Boltz-2 ternary (Protein 1 + Ligand + Protein 2) binding prediction.
BoltzYML
Preprocessing-file generator for ternary (Protein 1 + Ligand + Protein 2) binding prediction with Boltz-2.
When to use BoltzYML
Use this tool only if you want to study how the interaction between two proteins changes in the presence of a ligand.
That is the one scenario it is built for: a small molecule sits in a pocket of Protein 1, and you want Boltz-2 to predict how Protein 1 and Protein 2 dock together while that ligand is bound (or absent, as a control). Typical use cases:
- ABA signaling: PYR/PYL/RCAR receptor + ABA + PP2C phosphatase
- Allosteric drug screens where a ligand is expected to enable or block partner binding
- Any ternary system where the ligand sits on Protein 1 and you care about the Protein 1 ↔ Protein 2 interface
If your problem is just protein–protein docking, just protein–ligand docking, or anything that is not a ternary Protein 1 + Ligand + Protein 2 complex, BoltzYML will not help — write the YAML directly or use the Boltz-2 examples.
What it does
You hand BoltzYML two CIF files:
| Input | Contents | Role |
|---|---|---|
| `protein1_ligand.cif` | Protein 1 bound to the ligand (docked structure, e.g. AlphaFill, Glide, AutoDock) | Source of Protein 1 sequence + ligand identity + pocket residues |
| `protein1_protein2.cif` | Protein 1 together with Protein 2 (e.g. AlphaFold3 prediction) | Source of Protein 2 sequence |
BoltzYML then runs entirely in your browser (or via the CLI) and produces a single Boltz-2 v1 YAML with:
- Protein 1 sequence on chain `A`
- The ligand on chain `B` (with its PDB CCD code)
- Protein 2 sequence on chain `C`
- A `pocket` constraint listing the Protein 1 CA atoms within a cutoff of the ligand
- An `affinity` property block targeting the ligand
That YAML is then fed straight into the Boltz-2 CLI.
Two ways to run it
1. Web app (recommended)
Open https://ayushmania2002.github.io/boltzyml/, drop in the two CIFs, click Generate YAML, download the file.
Everything happens in your browser — no uploads, no server, no telemetry. Works on any modern browser, no installation required.
2. Command-line interface
Install from PyPI:
pip install boltzyml
Then run:
boltzyml \
--ligand PYL2_ABA.cif \
--complex PYL2_PP2C30.cif \
--output PYL2_ABA_PP2C30.yaml
Pure Python 3.10+, zero runtime dependencies (no gemmi, no numpy, no pyyaml required).
You can also clone the repo and install it in editable mode:
git clone https://github.com/Ayushmania2002/boltzyml.git
cd boltzyml
pip install -e .
boltzyml --ligand A.cif --complex B.cif -o out.yaml
CLI options
| Flag | Description | Default |
|---|---|---|
| `--ligand` | CIF with Protein 1 + Ligand | required |
| `--complex` | CIF with Protein 1 + Protein 2 | required |
| `-o`, `--output` | Output YAML path | `{ligand-stem}__{complex-stem}.yaml` |
| `--job-name` | Job name used for the default output filename | derived |
| `--cutoff` | CA-to-ligand distance cutoff in Å | `6.0` |
| `--ccd` | Override the ligand CCD code (used verbatim in the YAML's `ccd:` field) | auto-detect |
| `--no-affinity` | Skip the affinity prediction block | off |
| `--no-pocket` | Skip the pocket constraints block | off |
| `--no-force` | Emit `force: false` on the pocket block | off |
| `-v`, `--verbose` | Print detected chains, ligand, and contacts | off |
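When several ligand/complex pairs need YAMLs, the CLI can be driven from a short Python script. The sketch below only uses flags from the table above; the helper names `build_command` and `run_batch` and the output directory are hypothetical, and the output filename simply mirrors the documented default pattern.

```python
import subprocess
from pathlib import Path


def build_command(ligand_cif, complex_cif, out_dir, cutoff=6.0, ccd=None):
    """Assemble a boltzyml argv list from the documented flags (hypothetical helper)."""
    ligand, complex_ = Path(ligand_cif), Path(complex_cif)
    # Mirror the documented default name, but place it in out_dir.
    output = Path(out_dir) / f"{ligand.stem}__{complex_.stem}.yaml"
    cmd = [
        "boltzyml",
        "--ligand", str(ligand),
        "--complex", str(complex_),
        "--output", str(output),
        "--cutoff", str(cutoff),
    ]
    if ccd is not None:
        # Only needed when the CIF labels the ligand generically (LIG/UNL/UNK).
        cmd += ["--ccd", ccd]
    return cmd


def run_batch(pairs, out_dir):
    """Run boltzyml once per (ligand_cif, complex_cif) pair; stops on the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for ligand_cif, complex_cif in pairs:
        subprocess.run(build_command(ligand_cif, complex_cif, out_dir), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without input files.
    print(" ".join(build_command("PYL2_ABA.cif", "PYL2_PP2C30.cif", "yamls", ccd="A8S")))
```

`run_batch` requires `boltzyml` to be installed and on `PATH`; `check=True` makes a failed run raise instead of silently continuing.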
How it works
- Parse both CIFs. The parser handles two layouts seen in the wild: single-line atom records (AlphaFold3 style, 18 fields per line) and two-line records (some AlphaFill outputs, 21 fields split 17 + 4).
- Identify the ligand. The first non-water HETATM whose comp ID is not a standard amino acid is taken as the ligand. If the CIF labels it generically (`LIG`, `UNL`, `UNK`), the tool warns and asks for a CCD override.
- Assign chains. Protein 1 = the longest protein chain in the ligand CIF. The corresponding chain in the complex CIF is matched by 5-mer overlap similarity; the remaining chain is Protein 2.
- Compute pocket contacts. CA atoms of Protein 1 within `--cutoff` Å of any ligand atom are emitted as `[A, residue_number]` entries, using the CIF's `auth_seq_id` so the numbers match Boltz-2's own residue numbering.
- Emit YAML. A deterministic Boltz-2 v1 YAML is written out, in the field order Boltz-2 expects.
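The chain-assignment step can be illustrated with a small k-mer comparison. This is a sketch, not the code in `utils.py`: the exact similarity metric is not specified here, so the example assumes the fraction of shared 5-mers relative to the smaller 5-mer set, and the function names are hypothetical.

```python
def kmers(seq: str, k: int = 5) -> set:
    """Return the set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a: str, b: str, k: int = 5) -> float:
    """Shared k-mers divided by the smaller k-mer set (assumed metric)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0  # a sequence shorter than k has no k-mers to compare
    return len(ka & kb) / min(len(ka), len(kb))


def assign_chains(protein1_seq: str, complex_chains: dict) -> tuple:
    """Return (protein1_chain_id, protein2_chain_id) for a two-chain complex."""
    if len(complex_chains) != 2:
        raise ValueError("expected exactly two protein chains in the complex CIF")
    # The chain most similar to Protein 1 is Protein 1; the other one is Protein 2.
    p1_id = max(
        complex_chains,
        key=lambda cid: kmer_similarity(protein1_seq, complex_chains[cid]),
    )
    p2_id = next(cid for cid in complex_chains if cid != p1_id)
    return p1_id, p2_id


if __name__ == "__main__":
    protein1 = "MEAHVERALREGLTEEERAALEPAVMAHHTF"
    chains = {
        "A": "GSRAARRRRMAEICCEVVAGSSSEGKGPECDT",
        "B": "EAHVERALREGLTEEERAALEPAVMAHH",  # truncated copy of Protein 1
    }
    print(assign_chains(protein1, chains))
```

Comparing k-mer sets rather than aligning residues tolerates truncated termini and chain-ID differences between the two CIFs, which is presumably why a k-mer match is used.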
Why CA-only at 6 Å? CA-to-ligand at 6 Å approximates all-atom contacts at ~4 Å, which is the right scale for the soft pocket constraint Boltz-2 applies as an inference-time potential.
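A minimal sketch of the CA-to-ligand cutoff test follows. It assumes coordinates have already been parsed into plain tuples; the function name, the input shapes, and the inclusive `<=` comparison are illustrative assumptions rather than the behaviour of `contacts.py`.

```python
import math


def pocket_contacts(ca_atoms, ligand_atoms, cutoff=6.0, chain_id="A"):
    """Return [chain_id, residue_number] entries for CA atoms near the ligand.

    ca_atoms:     iterable of (residue_number, (x, y, z)) for Protein 1 CA atoms
    ligand_atoms: iterable of (x, y, z) for every ligand atom
    """
    ligand_atoms = list(ligand_atoms)
    hits = set()
    for residue_number, ca_xyz in ca_atoms:
        # A residue counts as a contact if its CA is within the cutoff
        # of at least one ligand atom.
        if any(math.dist(ca_xyz, lig_xyz) <= cutoff for lig_xyz in ligand_atoms):
            hits.add(residue_number)
    return [[chain_id, n] for n in sorted(hits)]


if __name__ == "__main__":
    ca = [(76, (0.0, 0.0, 0.0)), (98, (5.0, 0.0, 0.0)), (150, (20.0, 0.0, 0.0))]
    ligand = [(1.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(pocket_contacts(ca, ligand))
```

The brute-force double loop is adequate here because a ligand has tens of atoms and a protein chain a few hundred CA atoms.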
Example output
version: 1
sequences:
  - protein:
      id: A
      sequence: MEAHVERALREGLTEEERAALEPAVMAHHTFPPSTTTATTAAATCTSLVTQRVAAPVRAVWPIVRSFGNPQRYKHFVRTCALAAGDGASVGSVREVTVVSGLPASTSTERLEMLDDDRHIISFRVVGGQHRLRNYRSVTSVTEFQPPAAGPAPAPPYCVVVESYVVDVPDGNTAEDTRMFTDTVVKLNLQKLAAVAEDSSSASRRRD
  - ligand:
      id: B
      ccd: A8S
  - protein:
      id: C
      sequence: MAEICCEVVAGSSSEGKGPECDTGSRAARRRR...
constraints:
  - pocket:
      binder: B
      contacts:
        - [A, 76]   # PHE76
        - [A, 98]   # VAL98
        - [A, 103]  # PRO103
        - [A, 104]  # ALA104
        - [A, 107]  # SER107
        - [A, 130]  # HIS130
        - [A, 131]  # ARG131
        - [A, 132]  # LEU132
        - [A, 181]  # THR181
        - [A, 184]  # VAL184
        - [A, 185]  # VAL185
      max_distance: 6.0
      force: true
properties:
  - affinity:
      binder: B
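Because the output has a fixed shape, it can be written with plain string formatting and no YAML library. The sketch below shows one way to do that under stated assumptions: `render_boltz_yaml` is a hypothetical name, the indentation follows the nested layout used in the Boltz-2 examples, and the residue-name comments from the example above are omitted.

```python
def render_boltz_yaml(protein1_seq, ccd, protein2_seq, contacts,
                      max_distance=6.0, force=True, affinity=True):
    """Render a ternary Boltz-2 input (chains A, B, C) as YAML text."""
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        "      id: A",
        f"      sequence: {protein1_seq}",
        "  - ligand:",
        "      id: B",
        f"      ccd: {ccd}",
        "  - protein:",
        "      id: C",
        f"      sequence: {protein2_seq}",
    ]
    if contacts:
        # Pocket constraint: the ligand (B) should sit near these chain A residues.
        lines += ["constraints:", "  - pocket:", "      binder: B", "      contacts:"]
        lines += [f"        - [A, {n}]" for n in contacts]
        lines += [
            f"      max_distance: {max_distance}",
            f"      force: {str(force).lower()}",  # YAML booleans are lowercase
        ]
    if affinity:
        lines += ["properties:", "  - affinity:", "      binder: B"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_boltz_yaml("MEAHVERAL", "A8S", "MAEICCEVV", [76, 98, 103]), end="")
```

Building the text from an ordered list of lines keeps the field order fixed, so the same inputs always give byte-identical output.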
Running Boltz-2 on the output
pip install boltz
boltz predict job.yaml \
--use_msa_server \
--use_potentials \
--diffusion_samples 3 \
--recycling_steps 3 \
--step_scale 1.638
Key result files:
| File | Contents |
|---|---|
| `*_model_0.cif` (`.pdb` with `--output_format pdb`) | Predicted ternary structure |
| `confidence_*.json` | `iptm`, `ptm`, `ligand_iptm` |
| `affinity_*.json` | `affinity_pred_value`, reported as log10(IC50) with IC50 in μM |
| `pae_*.npz` | Predicted aligned error matrix |
Sanity checks:
- `iptm > 0.6`: confident protein–protein interface
- `ligand_iptm > 0.5`: confident ligand placement
- A low off-diagonal block between chain `A` and chain `C` in the PAE map indicates a confident interaction
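The checks above can be scripted. The sketch below assumes the confidence JSON is a flat object containing the `iptm` and `ligand_iptm` keys listed in the results table; the function names are hypothetical. The PAE helper works on any 2D matrix already loaded into nested lists (reading the `.npz` itself needs NumPy and is left out), and the caller has to supply the token index ranges for chains `A` and `C`.

```python
import json


def load_confidence(path):
    """Read a Boltz-2 confidence JSON file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_confidence(conf, iptm_min=0.6, ligand_iptm_min=0.5):
    """Apply the two threshold checks from the list above."""
    return {
        "interface_ok": conf["iptm"] > iptm_min,
        "ligand_ok": conf["ligand_iptm"] > ligand_iptm_min,
    }


def mean_block(pae, rows, cols):
    """Mean PAE over one off-diagonal block, e.g. chain A rows x chain C columns."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


if __name__ == "__main__":
    example = {"iptm": 0.72, "ptm": 0.80, "ligand_iptm": 0.41}  # made-up values
    print(check_confidence(example))
```

A lower `mean_block` value for the A/C block means lower predicted error between the two proteins; what counts as "low" depends on the system, so compare against the ligand-free control run.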
Project layout
boltzyml/
├── index.html # Web app (deploy to GitHub Pages as-is)
├── logo.png # Wordmark — favicon + header logo
├── banner.png # Pipeline schematic — used in this README
│
├── pyproject.toml # PyPI packaging metadata (hatchling)
├── LICENSE # MIT
│
├── src/boltzyml/
│ ├── __init__.py # Public API re-exports
│ ├── cli.py # CLI entry point (boltzyml command)
│ ├── parser.py # CIF parser (1-line and 2-line layouts)
│ ├── contacts.py # CA-to-ligand pocket contact computation
│ ├── utils.py # Chain assignment via k-mer similarity
│ └── yaml_writer.py # Boltz-2 v1 YAML emitter
│
└── tests/
├── test_parser.py # Synthetic CIFs for both layouts
├── test_contacts.py # Cutoff filtering, missing ligand, chain remap
└── test_cli.py # End-to-end CLI smoke test
Tests
pip install -e ".[dev]"
pytest
Or run each file directly without pytest:
python tests/test_parser.py
python tests/test_contacts.py
python tests/test_cli.py
Each script exits non-zero on failure and prints `all tests passed` otherwise.
Deploying the web app to GitHub Pages
- Push the repo to GitHub.
- Repo → Settings → Pages → Source: Deploy from a branch → Branch: `main` / `/ (root)` → Save.
- The site goes live at `https://<your-username>.github.io/<repo>/` in a minute or so. `index.html`, `logo.png`, and `banner.png` are everything the page needs; there is no build step.
Gotchas
- `LIG`/`UNL`/`UNK` ligands. Docking tools often name the ligand generically. The real PDB CCD code (e.g. `A8S` for abscisic acid, `ATP`, `HEM`) belongs in the `ccd:` field; use the CCD override in the web app or `--ccd` on the CLI. Verify codes at https://www.rcsb.org/ligand/.
- Residue numbering. Boltz-2 uses `auth_seq_id` from your CIF. If your CIF starts at residue 14 (truncated structure), the pocket numbers will start at 14 too; that is correct.
- Apo vs holo Protein 1. For the ligand CIF, use the holo (ligand-bound) structure, not the apo AlphaFold prediction, so the pocket is in the right conformation.
- Tamarind Bio users. The same YAML works on https://app.tamarind.bio/boltz. Do not include a `templates:` block when submitting there; use Tamarind's UI fields for template CIFs instead.
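The generic-ligand gotcha can be checked before running a job. The sketch below follows the detection rule described under "How it works" (first non-water HETATM comp ID that is not a standard amino acid) and flags generic names. The function name and the set of water codes are assumptions; the input is a plain list of comp IDs in file order rather than a parsed CIF.

```python
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
WATER_CODES = {"HOH", "WAT", "DOD"}       # assumed list of water comp IDs
GENERIC_CODES = {"LIG", "UNL", "UNK"}     # generic names called out above


def find_ligand(hetatm_comp_ids):
    """Return (comp_id, needs_ccd_override) for the first ligand-like HETATM.

    Returns (None, False) when no ligand candidate is found.
    """
    for comp_id in hetatm_comp_ids:
        code = comp_id.strip().upper()
        if code in WATER_CODES or code in STANDARD_AMINO_ACIDS:
            continue
        return code, code in GENERIC_CODES
    return None, False


if __name__ == "__main__":
    code, needs_override = find_ligand(["HOH", "LIG", "HOH"])
    if needs_override:
        print(f"Ligand is labelled {code}; pass the real CCD code with --ccd")
```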
Citation
If BoltzYML is useful in published work, please cite Boltz-2 itself:
Passaro, S. et al. Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction. bioRxiv, 2025. https://github.com/jwohlwend/boltz
A standalone citation for BoltzYML is not necessary — a link back to this repo is appreciated.
License
MIT.