
A rich molecule dataset for Blood-Brain Barrier (BBB) permeability.


About B3DB

In this repo, we present a large benchmark dataset, the Blood-Brain Barrier Database (B3DB), compiled from 50 published resources (summarized in raw_data/raw_data_summary.tsv) and categorized based on the consistency between different experimental references/measurements. This dataset was published in Scientific Data, and this repository is occasionally updated with new experimental data. Scientists who would like to contribute data should contact the database's maintainers (e.g., by creating a new Issue in this repository).

A subset of the molecules in B3DB has numerical logBB values (1058 compounds), while the whole dataset has categorical (BBB+ or BBB-) BBB permeability labels (7807 compounds prior to v1.0.0 and 7982 compounds after). Some physicochemical properties of the molecules are also provided.

Citation

Please use the following citations in any publication using our B3DB dataset:

@article{Meng_A_curated_diverse_2021,
author = {Meng, Fanwang and Xi, Yang and Huang, Jinfeng and Ayers, Paul W.},
doi = {10.1038/s41597-021-01069-5},
journal = {Scientific Data},
number = {289},
title = {A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors},
volume = {8},
year = {2021},
url = {https://www.nature.com/articles/s41597-021-01069-5},
publisher = {Springer Nature}
}

@article{Meng_B3clf_2025,
author = {Meng, Fanwang and Chen, Jitian and Collins-Ramirez, Juan Samuel and Ayers, Paul W.},
doi = {10.26434/chemrxiv-2025-xschc},
journal = {ChemRxiv},
number = {to be updated pending peer-reviewed publication},
title = {B3clf: A Resampling-Integrated Machine Learning Framework to Predict Blood-Brain Barrier Permeability},
volume = {to be updated pending peer-reviewed publication},
year = {to be updated pending peer-reviewed publication},
url = {to be updated pending peer-reviewed publication},
publisher = {to be updated pending peer-reviewed publication}
}

Features of B3DB

  1. The largest dataset with numerical and categorical values for Blood-Brain Barrier small molecules (to the best of our knowledge, as of February 25, 2021).

  2. Inclusion of stereochemistry information: isomeric SMILES with chiral specifications are used when available; otherwise, canonical SMILES are used.

  3. Characterization of uncertainty of experimental measurements by grouping the collected molecular data records.

  4. Extended datasets for numerical and categorical data with precomputed physicochemical properties using mordred.
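
As a rough illustration of the grouping mentioned in feature 3, the sketch below counts records per value of the group column. It assumes each record is available as a dictionary with a "group" key (the column name shown in the data preview of this README); the function name count_by_group and the sample records are hypothetical and are not taken from B3DB.

```python
from collections import Counter


def count_by_group(records, column="group"):
    """Count how many records carry each value of the grouping column."""
    return Counter(record[column] for record in records)


# Made-up records for illustration only; these are not B3DB entries.
sample_records = [
    {"compound_name": "compound_1", "group": "A"},
    {"compound_name": "compound_2", "group": "A"},
    {"compound_name": "compound_3", "group": "B"},
]

group_counts = count_by_group(sample_records)
print(group_counts)
```

The same function could be applied to rows read from the regression or classification tables, provided they expose a "group" column.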

Usage

Via PyPI

The B3DB dataset is available on PyPI. One can install it using pip:

pip install qc-B3DB

Then load the data (a dictionary of pandas DataFrames) with the following code snippet:

from B3DB import B3DB_DATA_DICT

# access the data via the following dictionary keys:
# "B3DB_regression"
# "B3DB_regression_extended"
# "B3DB_classification"
# "B3DB_classification_extended"
# "B3DB_classification_external"
df_b3db_reg = B3DB_DATA_DICT["B3DB_regression"]
df_b3db_reg.head()
#    NO.                                      compound_name  ... group comments
# 0    1                                         moxalactam  ...     A      NaN
# 1    2                                      schembl614298  ...     A      NaN
# 2    3                             morphine-6-glucuronide  ...     A      NaN
# 3    4  2-[4-(5-bromo-3-methylpyridin-2-yl)butylamino]...  ...     A      NaN
# 4    5                                                NaN  ...     A      NaN

# [5 rows x 10 columns]
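
To get a quick overview of every table in the dictionary, one possible approach is sketched below. The helper summarize_shapes is hypothetical (it is not part of the qc-B3DB package) and only assumes that each value in the mapping has a .shape attribute, as pandas DataFrames do.

```python
def summarize_shapes(data_dict):
    """Return {name: (n_rows, n_columns)} for every table in the mapping."""
    return {name: tuple(table.shape) for name, table in data_dict.items()}


# Possible usage once the package is installed:
# from B3DB import B3DB_DATA_DICT
# for name, shape in summarize_shapes(B3DB_DATA_DICT).items():
#     print(name, shape)
```

The usage lines are left as comments so that the helper can be defined without the package being installed.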

Manually Download the Data

There are two types of datasets in B3DB, regression data and classification data, and both can be loaded with pandas. For example:

import pandas as pd

# load regression dataset
regression_data = pd.read_csv("B3DB/B3DB_regression.tsv",
                              sep="\t")

# load classification dataset
classification_data = pd.read_csv("B3DB/B3DB_classification.tsv",
                                  sep="\t")

# load extended regression dataset
regression_data_extended = pd.read_csv("B3DB/B3DB_regression_extended.tsv.gz",
                                       sep="\t", compression="gzip")

# load extended classification dataset
classification_data_extended = pd.read_csv("B3DB/B3DB_classification_extended.tsv.gz",
                                           sep="\t", compression="gzip")
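
If pandas is not available, the same tab-separated files can in principle be read with the Python standard library. The sketch below is one way to do this; the helper name read_tsv is hypothetical, and it assumes the files are UTF-8 encoded and that gzip-compressed files end in ".gz".

```python
import csv
import gzip


def read_tsv(path):
    """Read a tab-separated file (optionally gzip-compressed) into a list of dicts."""
    # Assumption: a ".gz" suffix marks a gzip-compressed file.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Possible usage, with the paths from the pandas example above:
# regression_rows = read_tsv("B3DB/B3DB_regression.tsv")
# extended_rows = read_tsv("B3DB/B3DB_regression_extended.tsv.gz")
```

Unlike pandas, this reader returns every value as a string, so numerical columns would need to be converted explicitly.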

Examples in Jupyter Notebooks

We also provide three example notebooks that show how to use the dataset: numerical_data_analysis.ipynb, PCA_projection_fingerprint.ipynb, and PCA_projection_descriptors.ipynb. PCA_projection_descriptors.ipynb uses precomputed chemical descriptors to visualize the chemical space of B3DB and can be run directly in MyBinder. Because RDKit is difficult to install in MyBinder, only PCA_projection_descriptors.ipynb is set up there.

Data Curation

Detailed procedures for data curation can be found in the data curation section of this repository.

The materials and data under this repo are distributed under the CC0 Licence.

ChangeLog

  • 2025Aug16, the B3DB dataset is available via PyPI.
  • 2025Aug16, we have added a new set of 171 BBB+ and 4 BBB- compounds to the dataset since version 1.1.0.

