Community-maintained fork of Boltz with bug fixes, broader compatibility, and CI
boltz-community
What's different from upstream?
Compatibility:
- Apple Silicon (MPS) support: `boltz predict --accelerator mps`
- Dependency pins relaxed from `==` to `>=`
- `fairscale` dependency removed; replaced with PyTorch's built-in `torch.utils.checkpoint`
- `numpy<2.0` cap removed
- `requires-python` widened to `>=3.10` (removed `<3.13` cap)
- Compatible with PyTorch 2.6+ and Lightning 2.6+
Bug fixes:
- Fixed ROCm DDP crashes by allocating tensors directly on the target device instead of CPU-then-move (#654)
- Fixed `--write_full_pae` and `--write_full_pde` being ignored, and added matching `--no_write_full_pae` and `--no_write_full_pde` flags (#602)
- Fixed MSA CSV parsing removing paired duplicate sequences across different taxa (#584)
- Fixed MSA feature construction incorrectly including paired sequences as unpaired (#582)
- Fixed consecutive CA filter rejecting valid protein chains containing metal ions (#576)
- Fixed template alignment forcing gapless matches, breaking templates with indels (#538)
- Fixed relative MSA paths resolved from CWD instead of input file directory (#500)
- Fixed MSA authentication headers forcing `Content-Type`, which breaks some MSA servers (#488)
- Fixed CLI help/default mismatches and added option validation for MSA pairing strategy (#463)
- Fixed sign error in binding energy calculation documentation (#363)
- Fixed broken v1 attention code path in `PairformerLayer` (#602)
- Fixed SIGSEGV crash on ligands with invalid implicit valence (#649)
- Fixed `--subsample_msa` defaulting to False instead of True (#628)
- Fixed two-character elements (Ca, Fe, Br, Cl) being misidentified in PDB/mmCIF output (#458)
- Fixed atom name overflow (>4 chars) crashing large molecule processing (#494)
- Fixed null bytes in A3M files crashing MSA parsing (#509)
- Fixed bfloat16 dtype mismatch in potentials (#625)
- Fixed CCD tar re-download on every run when `mols/` already exists (#633)
- Fixed empty CIF files causing cryptic errors (#641)
- Fixed hardcoded "LIG" residue name in PDB HETATM records (#630)
- Fixed mmCIF entity deduplication for chemically distinct ligands (#630)
- Fixed chirality constraint computation missing stereo assignment (#589)
- Fixed multi-CCD ligands not dropping leaving atoms (#631)
- Fixed `--preprocessing-threads` overcommitting CPUs (#564)
- Fixed silent wrong-answer bug: inference `__getitem__` no longer substitutes a different record on failure; errors now propagate
- Fixed a potential stack overflow in training/validation data loading by bounding retries (max 10 attempts)
- Fixed `boltz predict` exiting silently with code 0 when all inputs fail validation (e.g. requesting affinity for a protein chain)
- Fixed MSA pairing keys being lost when loading cached A3M files (#627)
- Fixed affinity prediction crashing when structure prediction fails (e.g. covalent ligands, OOM) — now skips affected records with a warning (#620, #624)
- Fixed affinity prediction for repeated ligand binders: inputs that request affinity for one copy of a repeated ligand entity no longer fail, and affinity is now reported per binder copy as `affinity_<record>_<chain>.json` (#647)
- Fixed Boltz-2 checkpoint loading crash due to an extra `mse_rotational_alignment` kwarg (#644)
- Fixed direct Boltz-2 model use from checkpoint hyperparameters crashing during sampling when `steering_args` is missing (#680)
- Fixed empty checkpoint files causing cryptic `load_from_checkpoint` aborts: empty cached weights are now re-downloaded, and a clear error is raised before model load (#664)
- Fixed CPU inference producing distorted structures with wrong bond lengths: Boltz-2 was incorrectly using `bf16-mixed` precision on CPU and now forces float32 (#653)
- Fixed missing PAE summaries in `confidence_*.json`: aggregated `complex_pae`, `complex_ipae`, `chains_pae`, and `pair_chains_pae` are now included, remain available with `--no_write_full_pae`, use contact-weighted aggregation like PDE, and write undefined values as `null` (#607)
- Fixed cuequivariance triangle multiplication kernel crashes by falling back to PyTorch when kernels are unavailable or unsupported (#485, with thanks to #682)
- Fixed cuequivariance triangle attention kernel crashes by falling back to PyTorch when kernels are unavailable or unsupported — extends the v2.10.3 fix to the triangle attention path that was missed (#485)
- Fixed forced contact restraints losing all signal when union weighting underflowed for large distance violations (#621, with thanks to #682)
- Fixed incorrect contact-union gradient sign so soft-OR contact restraints apply gradient pressure with the correct magnitude across union members
- Fixed diffusion sampling ignoring `--max_parallel_samples` in divisible cases (for example, 10 samples with a parallel limit of 5), which could batch everything into one large chunk and trigger avoidable OOMs
- Fixed MSAs being discarded as "does not match input sequence" when pre-computed MSAs are aligned to a full UniProt sequence but the input uses a shorter PDB construct. Boltz now finds the construct as a contiguous subsequence within the MSA query and trims all MSA rows accordingly, instead of falling back to a dummy single-sequence MSA. Up to 5% mismatches are tolerated (selenomethionine substitutions, expression tags, minor construct mutations). Applies to both Boltz-1 and Boltz-2.
- Fixed PDB templates crashing with `IndexError: list index out of range` when Gemmi drops entity sequence metadata during PDB→mmCIF conversion. Template parsing now falls back to the observed polymer residues, and relative template `pdb`/`cif` paths are resolved from the YAML file directory instead of the current working directory (#669)
- Fixed YAML `bond` constraints for custom cross-residue covalent bonds (for example, ACE-CY3 cyclization) missing atom-level bond-length bounds for physical guidance and `_struct_conn` records in mmCIF output (#675)
- Fixed Boltz-2 fine-tuning/validation crashes when downstream methods read `self.validate_structure`; the constructor now stores the `validate_structure` argument on the model (novel-therapeutics/boltz-community#11)
- Fixed training-data preprocessing: `scripts/process/rcsb.py` was looking up clusters by `pdb_id_entity_id` while `scripts/process/cluster.py` keys its output by `hash_sequence(seq)` (proteins/RNA/short polymers) or by CCD code (ligands), so every chain silently got `cluster_id=-1` and `ClusterSampler` weighted everything uniformly. Records also had `msa_id=""` for protein chains, so training silently ran with no MSA features. Both fields are now populated correctly, and `entity_id` is propagated through to the record (#686). This does not affect inference; users who trained or fine-tuned with the documented pipeline need to re-preprocess and re-train.
- Fixed affinity prediction running 5× slower than necessary: upstream hardcoded `max_parallel_samples=1` for the affinity diffusion path, which was silently masked by upstream's buggy `chunk(multiplicity % max + 1)` math (the divisible case collapsed to `chunk(1)`, batching all samples in one pass). When our earlier diffusion fix replaced that with the correct `split(max_parallel_samples)`, the hardcoded `1` started to take effect, forcing N sequential single-sample forward passes per affinity record. The affinity path now honors the user's `--max_parallel_samples`, capped at `--diffusion_samples_affinity` so it doesn't claim more parallelism than diffusion will run.
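The diffusion chunking and affinity parallelism fixes above come down to batch-size arithmetic. The sketch below mirrors, with plain integers, how `torch.chunk` (a number of pieces, each of size `ceil(n / chunks)`) and `torch.split` (pieces of a fixed size) divide the sample dimension. The function names are hypothetical and this is not the fork's actual sampling code; it assumes at least one sample and a positive parallel limit.

```python
import math


def chunk_sizes(n: int, chunks: int) -> list[int]:
    """Batch sizes torch.chunk(x, chunks) would produce along a length-n dim."""
    if n == 0:
        return []
    size = math.ceil(n / chunks)  # torch.chunk uses ceil(n / chunks) per piece
    return [min(size, n - start) for start in range(0, n, size)]


def split_sizes(n: int, split_size: int) -> list[int]:
    """Batch sizes torch.split(x, split_size) would produce along a length-n dim."""
    return [min(split_size, n - start) for start in range(0, n, split_size)]


def upstream_batches(multiplicity: int, max_parallel: int) -> list[int]:
    # Buggy: the remainder chooses the number of chunks, not their size.
    return chunk_sizes(multiplicity, multiplicity % max_parallel + 1)


def fixed_batches(multiplicity: int, max_parallel: int) -> list[int]:
    # Fixed: no batch holds more than max_parallel samples.
    return split_sizes(multiplicity, max_parallel)


def affinity_batches(samples_affinity: int, max_parallel: int) -> list[int]:
    # Affinity path: honor the user's limit, capped at the affinity sample count.
    return fixed_batches(samples_affinity, min(max_parallel, samples_affinity))
```

With 10 samples and a limit of 5, `upstream_batches(10, 5)` yields a single batch of 10 while `fixed_batches(10, 5)` yields two batches of 5. Taking, for example, 5 affinity samples with upstream's hardcoded limit of 1, the old arithmetic returned one batch of 5 (masking the hardcoded value), whereas a correct split gives five single-sample passes; honoring the user's limit restores batched passes.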
Improvements:
- Published to PyPI as `boltz-community`: `pip install boltz-community` and `uv add boltz-community` now work without the git URL (#12). Releases are tag-driven via GitHub Actions + PyPI Trusted Publisher (OIDC, no long-lived tokens).
- Added `--skip_bad_inputs` flag: by default, `boltz predict` now aborts when any input fails processing; pass `--skip_bad_inputs` to skip bad inputs and continue with the rest
- Deferred heavy imports (torch, rdkit, pytorch-lightning) so `boltz.main` loads instantly for CLI help and input validation
- `--devices` now accepts a comma-separated list of specific GPU device IDs in addition to a device count (e.g. `--devices 0,1` targets GPUs 0 and 1; use `CUDA_VISIBLE_DEVICES=1 boltz predict ...` to target a single GPU by index)
- Added `--batch_size` for Boltz-2 structure inference so multiple inputs can be processed per prediction batch. Current limits: affinity prediction remains single-record (`batch_size=1`), and guided inference (`--use_potentials` / contact guidance) is only supported with `--batch_size 1`
Performance improvements:
- Added optional FlashAttention-2 / PyTorch SDPA acceleration for triangle attention and pair-biased attention via `--flash_attn` (off by default). This reduces the attention memory footprint and speeds up inference on Ampere+ GPUs while remaining numerically equivalent to the manual einsum path within float-precision tolerance (verified by parity tests)
- Model weights now load directly to GPU instead of CPU-then-transfer
- Cached molecule file reads and symmetry deserialization across samples
- Removed dead O(n_tokens × n_chains) loop in pocket distance computation
- Tensors across model modules now allocated directly on device instead of CPU-then-transfer (#654)
- Featurizer MSA pairing fill rewritten with vectorized numpy indexing (eliminates per-row Python loop)
- `process_atom_features` pre-allocates output arrays and fills `atom_to_token` in one slice per token (eliminates per-atom appends)
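The pre-allocation pattern listed above can be illustrated in isolation. This is a minimal sketch, assuming each token owns a contiguous run of atoms whose counts are given by a hypothetical `atoms_per_token` list; it is not the fork's featurizer code. The output array is allocated once and filled with one slice write per token instead of one append per atom.

```python
import numpy as np


def atom_to_token_map(atoms_per_token: list[int]) -> np.ndarray:
    """Map each atom index to the index of the token that owns it (hypothetical helper)."""
    n_atoms = sum(atoms_per_token)
    atom_to_token = np.empty(n_atoms, dtype=np.int64)  # allocated once, up front
    start = 0
    for token_idx, n in enumerate(atoms_per_token):
        atom_to_token[start : start + n] = token_idx  # one slice write per token
        start += n
    return atom_to_token
```

For example, `atom_to_token_map([3, 1, 2])` gives `[0, 0, 0, 1, 2, 2]`. When only this mapping is needed, `np.repeat(np.arange(len(counts)), counts)` produces the same array fully vectorized; the per-token loop is the more general shape when several per-atom arrays are filled in the same pass.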
Tests & CI:
- 283 tests in this fork vs. 5 tests in current upstream `jwohlwend/boltz`: unit tests (CPU), smoke tests (end-to-end inference), regression tests (golden output verification for Boltz-1 and Boltz-2), determinism tests, MSA trim subsequence matching (8 cases), diffusion chunking regression tests, Boltz-2 validation constructor coverage, and featurizer pre-allocation correctness
- GitHub Actions CI with CPU runners (every push/PR) and GPU T4 runners (push to main)
Contributing
Pull requests are welcome! If you have a bug fix, test improvement, or compatibility enhancement, please open a PR.
Installation
Install from PyPI:
pip install boltz-community
With CUDA kernels:
pip install "boltz-community[cuda]"
uv works the same way:
uv add boltz-community # or: uv add "boltz-community[cuda]"
If you are installing on CPU-only or non-CUDA GPU hardware, omit [cuda]. Note that the CPU version is significantly slower than the GPU version.
Installing the bleeding-edge main branch
pip install "boltz-community @ git+https://github.com/Novel-Therapeutics/boltz-community.git"
Apple Silicon (MPS)
On Macs with Apple Silicon, you can run inference on the GPU via MPS:
boltz predict input.yaml --accelerator mps --use_msa_server
MPS mode automatically uses float32 precision and single-device execution. Performance is slower than CUDA but significantly faster than CPU.
Releasing
Releases are tag-driven. Pushing a v*.*.* tag triggers .github/workflows/release.yml, which builds the sdist + wheel, validates them with twine check, and publishes to PyPI via PyPI Trusted Publisher (OIDC) — no long-lived API token is stored in this repo.
One-time setup (PyPI side)
- Register the `boltz-community` package on PyPI (create as a new project; the name is currently available).
- Open the project's Publishing settings and add a Trusted Publisher with:
  - Owner: `Novel-Therapeutics`
  - Repository: `boltz-community`
  - Workflow filename: `release.yml`
  - Environment: `pypi`
- In the GitHub repo Settings → Environments, create an environment named `pypi`. Optionally add a required-reviewer rule so a publish has to be approved by a maintainer before the OIDC exchange runs.
Per-release flow
- Land all changes on `main`.
- Bump `version` in pyproject.toml.
- Add a bullet under Bug fixes / Improvements / Performance improvements in this README. Update the test count if new tests landed.
- Commit as `Release X.Y.Z` and push `main`.
- Tag the release commit: `git tag -a vX.Y.Z -m "Release X.Y.Z" && git push origin vX.Y.Z`.
- The release workflow builds, checks, and publishes the artifacts to PyPI. Watch the run under the Actions tab.
- Create the GitHub release with the release notes (see prior releases for the Highlights / Details / Commits since structure).
Recovering from a failed publish
PyPI rejects re-uploading the same version. If the build succeeded but the publish step failed (e.g. environment approval timed out), you can re-run the Publish to PyPI job from the Actions tab without re-tagging. If a version is published but broken, bump the patch and ship a new release — never delete or yank without a follow-up that supersedes it.
Everything below is from the upstream Boltz README.
Introduction
Boltz is a family of models for biomolecular interaction prediction. Boltz-1 was the first fully open source model to approach AlphaFold3 accuracy. Our latest work Boltz-2 is a new biomolecular foundation model that goes beyond AlphaFold3 and Boltz-1 by jointly modeling complex structures and binding affinities, a critical component towards accurate molecular design. Boltz-2 is the first deep learning model to approach the accuracy of physics-based free-energy perturbation (FEP) methods, while running 1000x faster — making accurate in silico screening practical for early-stage drug discovery.
All the code and weights are provided under MIT license, making them freely available for both academic and commercial uses. For more information about the model, see the Boltz-1 and Boltz-2 technical reports. To discuss updates, tools and applications join our Slack channel.
Inference
You can run inference using Boltz with:
boltz predict input_path --use_msa_server
`input_path` should point to a YAML file, or a directory of YAML files for batched processing, describing the biomolecules you want to model and the properties you want to predict (e.g. affinity). To see all available options, run `boltz predict --help`; for more information on the input format, see our prediction instructions. By default, the `boltz` command runs the latest version of the model.
Binding Affinity Prediction
There are two main predictions in the affinity output: affinity_pred_value and affinity_probability_binary. They are trained on largely different datasets, with different supervisions, and should be used in different contexts. The affinity_probability_binary field should be used to detect binders from decoys, for example in a hit-discovery stage. Its value ranges from 0 to 1 and represents the predicted probability that the ligand is a binder. The affinity_pred_value aims to measure the specific affinity of different binders and how this changes with small modifications of the molecule. This should be used in ligand optimization stages such as hit-to-lead and lead-optimization. It reports a binding affinity value as log10(IC50), derived from an IC50 measured in μM. More details on how to run affinity predictions and parse the output can be found in our prediction instructions.
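As a worked example of the units described above, the sketch below converts `affinity_pred_value` (log10 of an IC50 in μM) into a pIC50 and an approximate binding free energy. The helper names are hypothetical, not part of Boltz, and the free-energy step makes two loudly stated assumptions: IC50 is treated as a stand-in for a dissociation constant, and the temperature is 298.15 K.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
TEMP_K = 298.15  # assumed temperature


def pic50_from_pred(affinity_pred_value: float) -> float:
    """log10(IC50 in uM) -> pIC50 = -log10(IC50 in M) = 6 - value."""
    return 6.0 - affinity_pred_value


def approx_dg_kcal(affinity_pred_value: float) -> float:
    """Approximate dG = RT ln(IC50 in M), treating IC50 as a Kd proxy.

    More negative values mean tighter binding.
    """
    ic50_molar = 10.0 ** (affinity_pred_value - 6.0)  # convert uM to M
    return R_KCAL * TEMP_K * math.log(ic50_molar)
```

For example, an `affinity_pred_value` of -1 (IC50 = 0.1 μM) gives a pIC50 of 7 and an approximate ΔG of about -9.5 kcal/mol, since RT ln 10 ≈ 1.364 kcal/mol at 298.15 K.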
Authentication to MSA Server
When using the --use_msa_server option with a server that requires authentication, you can provide credentials in one of two ways. More information is available in our prediction instructions.
Training
⚠️ Coming soon: updated training code for Boltz-2!
If you're interested in retraining the model, currently for Boltz-1 but soon for Boltz-2, see our training instructions.
Contributing
We welcome external contributions and are eager to engage with the community. Connect with us on our Slack channel to discuss advancements, share insights, and foster collaboration around Boltz-2.
On recent NVIDIA GPUs, Boltz leverages the acceleration provided by NVIDIA cuEquivariance kernels. Boltz also runs on Tenstorrent hardware thanks to a fork by Moritz Thüning.
License
Our model and code are released under MIT License, and can be freely used for both academic and commercial purposes.
Cite
If you use this code or the models in your research, please cite the following papers:
@article{passaro2025boltz2,
author = {Passaro, Saro and Corso, Gabriele and Wohlwend, Jeremy and Reveiz, Mateo and Thaler, Stephan and Somnath, Vignesh Ram and Getz, Noah and Portnoi, Tally and Roy, Julien and Stark, Hannes and Kwabi-Addo, David and Beaini, Dominique and Jaakkola, Tommi and Barzilay, Regina},
title = {Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction},
year = {2025},
doi = {10.1101/2025.06.14.659707},
journal = {bioRxiv}
}
@article{wohlwend2024boltz1,
author = {Wohlwend, Jeremy and Corso, Gabriele and Passaro, Saro and Getz, Noah and Reveiz, Mateo and Leidal, Ken and Swiderski, Wojtek and Atkinson, Liam and Portnoi, Tally and Chinn, Itamar and Silterra, Jacob and Jaakkola, Tommi and Barzilay, Regina},
title = {Boltz-1: Democratizing Biomolecular Interaction Modeling},
year = {2024},
doi = {10.1101/2024.11.19.624167},
journal = {bioRxiv}
}
In addition if you use the automatic MSA generation, please cite:
@article{mirdita2022colabfold,
title={ColabFold: making protein folding accessible to all},
author={Mirdita, Milot and Sch{\"u}tze, Konstantin and Moriwaki, Yoshitaka and Heo, Lim and Ovchinnikov, Sergey and Steinegger, Martin},
journal={Nature methods},
year={2022},
}
Download files
File details
Details for the file boltz_community-2.10.8.tar.gz.
File metadata
- Download URL: boltz_community-2.10.8.tar.gz
- Upload date:
- Size: 260.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f194fb23ee12f8a1c767d07cb29afeb37c482ff47182483793690ea9cf167653 |
| MD5 | 31817ba3139a44a0af1ba077a1bae989 |
| BLAKE2b-256 | f6bac3cc987a0f0afe59cc7f880834b315c31e0bc3e38affdfe10d9198034da0 |
Provenance
The following attestation bundles were made for boltz_community-2.10.8.tar.gz:
Publisher: release.yml on Novel-Therapeutics/boltz-community
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: boltz_community-2.10.8.tar.gz
- Subject digest: f194fb23ee12f8a1c767d07cb29afeb37c482ff47182483793690ea9cf167653
- Sigstore transparency entry: 1667294766
- Sigstore integration time:
- Permalink: Novel-Therapeutics/boltz-community@0cbe41c1a9c675d76032f41dfda1e0d391affebb
- Branch / Tag: refs/tags/v2.10.8
- Owner: https://github.com/Novel-Therapeutics
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0cbe41c1a9c675d76032f41dfda1e0d391affebb
- Trigger Event: push
File details
Details for the file boltz_community-2.10.8-py3-none-any.whl.
File metadata
- Download URL: boltz_community-2.10.8-py3-none-any.whl
- Upload date:
- Size: 292.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a79dee94e5aad5c2a8052ebf9384cbcb73c531185755caf765595ffb711a8b7c |
| MD5 | cfac4c1ebf969a21ca464aabd011fd75 |
| BLAKE2b-256 | 4efe966504c4f7a2195934b154ae5fa832ee5aa12d706802d994f096c26e28bb |
Provenance
The following attestation bundles were made for boltz_community-2.10.8-py3-none-any.whl:
Publisher: release.yml on Novel-Therapeutics/boltz-community
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: boltz_community-2.10.8-py3-none-any.whl
- Subject digest: a79dee94e5aad5c2a8052ebf9384cbcb73c531185755caf765595ffb711a8b7c
- Sigstore transparency entry: 1667294861
- Sigstore integration time:
- Permalink: Novel-Therapeutics/boltz-community@0cbe41c1a9c675d76032f41dfda1e0d391affebb
- Branch / Tag: refs/tags/v2.10.8
- Owner: https://github.com/Novel-Therapeutics
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0cbe41c1a9c675d76032f41dfda1e0d391affebb
- Trigger Event: push