Bayesian flow network framework for Chemistry
ChemBFN: Bayesian Flow Network for Chemistry
This repository contains the PyTorch implementation of the ChemBFN model.
Features
ChemBFN provides state-of-the-art functionality for
- SMILES- or SELFIES-based de novo molecule generation
- Protein sequence de novo generation
- Template optimisation (mol2mol)
- Classifier-free guidance conditional generation (single or multi-objective optimisation)
- Context-guided conditional generation (inpaint)
- Outstanding out-of-distribution chemical space sampling
- Fast sampling via ODE solver
- Molecular property and activity prediction finetuning
- Reaction yield prediction finetuning
in an all-in-one-model style.
News
- [09/10/2025] A web app, chembfn_webui, for hosting ChemBFN models is available on PyPI.
- [30/01/2025] The package bayesianflow_for_chem is available on PyPI.
- [21/01/2025] Our first paper has been accepted by JCIM.
- [17/12/2024] Our second paper, on out-of-distribution generation, is available on arxiv.org.
- [31/07/2024] Paper is available on arxiv.org.
- [21/07/2024] Paper was submitted to arXiv.
Install
$ pip install -U bayesianflow_for_chem
Usage
You can find example scripts in the 📁example folder.
Pre-trained Model
You can find pretrained models on our 🤗Hugging Face model page.
Dataset Handling
We provide a Python class, CSVData, for handling data stored in CSV or a similar format whose header row names each column. The following is a quickstart.
- Download your dataset file (e.g., ESOL from MoleculeNet) and split the file:
>>> from bayesianflow_for_chem.tool import split_data
>>> split_data("delaney-processed.csv", method="scaffold")
- Load the split data:
>>> from bayesianflow_for_chem.data import smiles2token, collate, CSVData
>>> dataset = CSVData("delaney-processed_train.csv")
>>> dataset[0]
{'Compound ID': ['Thiophene'],
'ESOL predicted log solubility in mols per litre': ['-2.2319999999999998'],
'Minimum Degree': ['2'],
'Molecular Weight': ['84.14299999999999'],
'Number of H-Bond Donors': ['0'],
'Number of Rings': ['1'],
'Number of Rotatable Bonds': ['0'],
'Polar Surface Area': ['0.0'],
'measured log solubility in mols per litre': ['-1.33'],
'smiles': ['c1ccsc1']}
- Create a mapping function to tokenise the dataset and select values:
>>> import torch
>>> def encode(x):
...     smiles = x["smiles"][0]
...     value = [float(i) for i in x["measured log solubility in mols per litre"]]
...     return {"token": smiles2token(smiles), "value": torch.tensor(value)}
>>> dataset.map(encode)
>>> dataset[0]
{'token': tensor([ 1, 151, 23, 151, 151, 154, 151, 23, 2]),
'value': tensor([-1.3300])}
- Wrap the dataset in torch.utils.data.DataLoader:
>>> dataloader = torch.utils.data.DataLoader(dataset, 32, collate_fn=collate)
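To make the record layout and the role of a collate function more concrete, here is a minimal standard-library sketch that does not use the package at all. It mimics the header-keyed records shown above (each column name maps to a list of strings), applies an encode-style mapping, and right-pads token lists of different lengths so they can be stacked into one batch. The helper names load_records, encode_record and pad_batch, the padding value 0, and the toy token lists are all assumptions made for illustration; they are not the implementation of CSVData or collate.

```python
import csv
import io

# One row taken from the quickstart output above, stored as in-memory CSV text.
CSV_TEXT = "\n".join([
    "smiles,measured log solubility in mols per litre",
    "c1ccsc1,-1.33",
])


def load_records(text):
    """Read header-keyed rows; each value is wrapped in a list of strings."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: [val] for key, val in row.items()} for row in reader]


def encode_record(record):
    """Select the SMILES string and convert the target column to floats."""
    target = record["measured log solubility in mols per litre"]
    return {"smiles": record["smiles"][0], "value": [float(v) for v in target]}


def pad_batch(token_lists, pad_id=0):
    """Right-pad integer token lists to the longest length in the batch.

    pad_id=0 is an assumption for this sketch, not a documented constant.
    """
    max_len = max(len(tokens) for tokens in token_lists)
    return [tokens + [pad_id] * (max_len - len(tokens)) for tokens in token_lists]


records = load_records(CSV_TEXT)
encoded = [encode_record(r) for r in records]

# Toy token ids (made up) of unequal length, padded to a rectangular batch.
padded = pad_batch([[1, 5, 2], [1, 5, 6, 7, 2]])

print(records[0])
print(encoded[0])
print(padded)
```

Tokenised molecules generally have different lengths, which is why a custom collate_fn is passed to the DataLoader instead of relying on the default stacking behaviour.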
Cite This Work
@article{2025chembfn,
title={Bayesian Flow Network Framework for Chemistry Tasks},
author={Tao, Nianze and Abe, Minori},
journal={Journal of Chemical Information and Modeling},
volume={65},
number={3},
pages={1178-1187},
year={2025},
doi={10.1021/acs.jcim.4c01792},
}
Out-of-distribution generation:
@misc{2024chembfn_ood,
title={Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces},
author={Nianze Tao},
year={2024},
eprint={2412.11439},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.11439},
}