Bayesian flow network framework for Chemistry

ChemBFN: Bayesian Flow Network for Chemistry

This repository contains the PyTorch implementation of the ChemBFN model.

Features

ChemBFN provides state-of-the-art functionality for

  • SMILES or SELFIES-based de novo molecule generation
  • Protein sequence de novo generation
  • Template optimisation (mol2mol)
  • Classifier-free guidance conditional generation (single or multi-objective optimisation)
  • Context-guided conditional generation (inpaint)
  • Outstanding out-of-distribution chemical space sampling
  • Fast sampling via ODE solver
  • Molecular property and activity prediction finetuning
  • Reaction yield prediction finetuning

in an all-in-one-model style.

News

  • [26/12/2025] We were invited to submit a short report about ChemBFN to the CICSJ Bulletin.
  • [09/10/2025] A web app chembfn_webui for hosting ChemBFN models is available on PyPI.
  • [30/01/2025] The package bayesianflow_for_chem is available on PyPI.
  • [21/01/2025] Our first paper has been accepted by JCIM.
  • [17/12/2024] The second paper, on out-of-distribution generation, is available on arxiv.org.
  • [31/07/2024] Paper is available on arxiv.org.
  • [21/07/2024] Paper was submitted to arXiv.

Install

$ pip install -U bayesianflow_for_chem
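
To confirm that the installation succeeded, you can query the installed version from Python. The following is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper written for this example and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("bayesianflow_for_chem"))
```

A result of `None` means that pip installed the package into a different environment from the interpreter you are running.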

Usage

You can find example scripts in the 📁example folder.

Pre-trained Model

You can find pretrained models (linked to pretraining datasets) on our 🤗Hugging Face model page.

Dataset Handling

We provide a Python class, CSVData, for handling data stored in CSV or a similar format with a header row that identifies the entities. The following is a quickstart.

  1. Download your dataset file (e.g., ESOL from MoleculeNet) and split the file:
>>> from bayesianflow_for_chem.tool import split_data

>>> split_data("delaney-processed.csv", method="scaffold")
  2. Load the split data:
>>> from bayesianflow_for_chem.data import smiles2token, collate, CSVData

>>> dataset = CSVData("delaney-processed_train.csv")
>>> dataset[0]
{'Compound ID': ['Thiophene'],
'ESOL predicted log solubility in mols per litre': ['-2.2319999999999998'],
'Minimum Degree': ['2'],
'Molecular Weight': ['84.14299999999999'],
'Number of H-Bond Donors': ['0'],
'Number of Rings': ['1'],
'Number of Rotatable Bonds': ['0'],
'Polar Surface Area': ['0.0'],
'measured log solubility in mols per litre': ['-1.33'],
'smiles': ['c1ccsc1']}
  3. Create a mapping function to tokenise the dataset and select values:
>>> import torch

>>> def encode(x):
...   smiles = x["smiles"][0]
...   value = [float(i) for i in x["measured log solubility in mols per litre"]]
...   return {"token": smiles2token(smiles), "value": torch.tensor(value)}

>>> dataset.map(encode)
>>> dataset[0]
{'token': tensor([  1, 151,  23, 151, 151, 154, 151,  23,   2]),
'value': tensor([-1.3300])}
  4. Wrap the dataset in torch.utils.data.DataLoader:
>>> dataloader = torch.utils.data.DataLoader(dataset, 32, collate_fn=collate)
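
To make the record shape used in the quickstart concrete, here is a standard-library sketch that mimics it: each row becomes a dict keyed by the CSV header, every value is a list of strings, and a mapping function then selects and converts the columns of interest. This only illustrates the data flow and is not how CSVData is implemented; `read_rows` and `select` are hypothetical helpers, and the single data row is the thiophene entry shown above, reduced to two columns.

```python
import csv
import io

# One-row excerpt in the same shape as the ESOL file (only two columns kept).
RAW = """smiles,measured log solubility in mols per litre
c1ccsc1,-1.33
"""


def read_rows(text):
    """Yield one dict per CSV row, keyed by header, with each value wrapped in a list."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {key: [val] for key, val in row.items()}


def select(row):
    """Pick the SMILES string and convert the target column from str to float."""
    return {
        "smiles": row["smiles"][0],
        "value": [float(v) for v in row["measured log solubility in mols per litre"]],
    }


rows = list(read_rows(RAW))
print(rows[0])
# {'smiles': ['c1ccsc1'], 'measured log solubility in mols per litre': ['-1.33']}
print(select(rows[0]))
# {'smiles': 'c1ccsc1', 'value': [-1.33]}
```

The `encode` function in step 3 plays the role of `select` here, with the additional step of tokenising the SMILES string via `smiles2token` and wrapping the values in tensors.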

Cite This Work

@article{2025chembfn,
    title={Bayesian Flow Network Framework for Chemistry Tasks},
    author={Tao, Nianze and Abe, Minori},
    journal={Journal of Chemical Information and Modeling},
    volume={65},
    number={3},
    pages={1178-1187},
    year={2025},
    doi={10.1021/acs.jcim.4c01792},
}
@article{2025chembfn_report,
    title={Molecular Structure Design via Bayesian Flow Network},
    author={Tao, Nianze and Nagai, Touma and Abe, Minori},
    journal={CICSJ Bulletin},
    volume={43},
    number={1},
    pages={10-14},
    year={2025},
    doi={10.11546/cicsj.43.10},
}

Out-of-distribution generation and fast sampling:

@misc{2024chembfn_ood,
    title={Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces}, 
    author={Nianze Tao},
    year={2024},
    eprint={2412.11439},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2412.11439}, 
}
