Reaction Database for benchmarking

SynRXN

SynRXN is an open reaction benchmark repository for reproducible reaction-informatics evaluation.

SynRXN collects curated reaction datasets, canonical task folders, versioned data releases, and lightweight loading utilities for benchmarking atom-atom mapping, reaction classification, property prediction, reaction balancing, and synthesis/retrosynthesis workflows.

Highlights

  • Five task families: aam, classification, property, rbl, and synthesis.
  • Consistent tabular format: each dataset is a compressed CSV under Data/<task>/<name>.csv.gz.
  • Stable identifiers: most curated rows use r_id; task-specific columns store reactions, labels, targets, splits, mappings, or references.
  • Version-aware access: load data from Zenodo releases, GitHub tags, or exact Git commits.
  • Reproducible benchmarking: use published splits when present, or generate deterministic repeated k-fold splits through synrxn.split.
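The Data/<task>/<name>.csv.gz layout can also be read without SynRXN itself. The following is a minimal sketch using only the standard library; the helper names (dataset_path, read_rows, write_toy_dataset), the toy rows, and the rxn and label columns are assumptions made for illustration and are not part of the SynRXN API or its data.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def dataset_path(root: Path, task: str, name: str) -> Path:
    """Build the Data/<task>/<name>.csv.gz path (hypothetical helper)."""
    return root / "Data" / task / f"{name}.csv.gz"


def read_rows(path: Path) -> list[dict[str, str]]:
    """Read a gzip-compressed CSV into a list of row dictionaries."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_toy_dataset(path: Path) -> None:
    """Write a two-row stand-in dataset; the columns are illustrative only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, mode="wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["r_id", "rxn", "label"])
        writer.writeheader()
        writer.writerow({"r_id": "toy_1", "rxn": "CCO>>CC=O", "label": "0"})
        writer.writerow({"r_id": "toy_2", "rxn": "CC=O>>CCO", "label": "1"})


# Round-trip a toy file through a temporary directory, so nothing is downloaded.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = dataset_path(Path(tmp), "classification", "toy")
    write_toy_dataset(toy_path)
    rows = read_rows(toy_path)

print(toy_path.name)
print([row["r_id"] for row in rows])
```

All values come back as strings with this approach; numeric targets would need an explicit conversion.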

Installation

SynRXN requires Python 3.11 or later.

pip install synrxn

Install optional dependencies when you need the broader tooling stack:

pip install "synrxn[all]"

For development:

git clone https://github.com/TieuLongPhan/SynRXN.git
cd SynRXN
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Quick Start

from pathlib import Path
from synrxn.data import DataLoader

dl = DataLoader(
    task="classification",
    source="zenodo",
    version="1.0.0",
    cache_dir=Path("~/.cache/synrxn").expanduser(),
)

print(dl.available_names())
df = dl.load("schneider_b")
print(df.shape)
print(df.columns.tolist())

Use an exact commit for development snapshots you want to reproduce later:

from pathlib import Path
from synrxn.data import DataLoader

dl = DataLoader(
    task="property",
    source="commit",
    version="3e1612e2199e8b0e369fce3ed9aff3dda68e4c32",
    cache_dir=Path("~/.cache/synrxn").expanduser(),
    gh_enable=True,
)

df = dl.load("b97xd3")
print(df[["r_id", "ea", "dh"]].head())

Data Concept

The public data lives in Data/ and is grouped by benchmark task:

| Folder | Purpose | Example datasets | Core columns |
| --- | --- | --- | --- |
| Data/aam/ | Atom-atom mapping comparison | uspto_3k, golden, ecoli | ground_truth, mapper outputs, rxn |
| Data/classification/ | Reaction class, template, and enzyme classification | uspto_50k_b, tpl_u, ecreact | rxn, labels, optional split |
| Data/property/ | Reaction property prediction | b97xd3, rgd1, sn2 | aam or rxn, target values, optional split |
| Data/rbl/ | Reaction balancing and rebalancing | mos, mnc, mbs, complex | unbalanced rxn, balanced ground_truth |
| Data/synthesis/ | Synthesis and retrosynthesis datasets | uspto_mit, uspto_50k, da | reactions, split/source metadata, optional reagents |
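
Before benchmarking, it can help to check that a loaded table carries the columns a task needs. The sketch below is an assumption-heavy illustration: EXPECTED_COLUMNS and missing_columns are hypothetical names, and the column sets are a reading of the folder listing above rather than a schema published by SynRXN.

```python
# Hypothetical minimum columns per task, read off the folder listing above.
# Real datasets may carry additional or differently named columns.
EXPECTED_COLUMNS = {
    "aam": {"rxn", "ground_truth"},
    "rbl": {"rxn", "ground_truth"},
    "classification": {"rxn"},
}


def missing_columns(task: str, columns: list[str]) -> set[str]:
    """Return the expected columns for `task` that are absent from `columns`."""
    return EXPECTED_COLUMNS.get(task, set()) - set(columns)


complete = missing_columns("aam", ["r_id", "rxn", "ground_truth"])
incomplete = missing_columns("rbl", ["r_id", "rxn"])
print(complete)
print(incomplete)
```

With a pandas DataFrame, the second argument would be df.columns.tolist().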

Reproducible Splits

from pathlib import Path
from synrxn.data import DataLoader
from synrxn.split.repeated_kfold import RepeatedKFoldsSplitter

dl = DataLoader(
    task="property",
    source="zenodo",
    version="1.0.0",
    cache_dir=Path("~/.cache/synrxn").expanduser(),
)
df = dl.load("b97xd3")

splitter = RepeatedKFoldsSplitter(
    n_splits=5,
    n_repeats=2,
    ratio=(8, 1, 1),
    shuffle=True,
    random_state=1,
)
splitter.prepare_splits(df, stratify=None)
train_df, val_df, test_df = splitter.get_split(0, 0, as_frame=True)
print(len(train_df), len(val_df), len(test_df))
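
To illustrate why a fixed random_state makes repeated k-fold splits reproducible, here is a minimal standard-library sketch. It is not the RepeatedKFoldsSplitter implementation: repeated_kfold_indices is a hypothetical function, the per-repeat seeding scheme is an assumption, and the sketch yields train/test indices only, without the validation portion implied by ratio=(8, 1, 1).

```python
import random


def repeated_kfold_indices(n_rows, n_splits=5, n_repeats=2, random_state=1):
    """Yield (repeat, fold, train_idx, test_idx) for seeded repeated k-fold."""
    for repeat in range(n_repeats):
        # Each repeat derives its own seed from random_state, so shuffles
        # differ between repeats but are identical from run to run.
        rng = random.Random(random_state + repeat)
        order = list(range(n_rows))
        rng.shuffle(order)
        for fold in range(n_splits):
            # Take every n_splits-th shuffled row as this fold's test set.
            test_idx = order[fold::n_splits]
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            yield repeat, fold, train_idx, test_idx


first = list(repeated_kfold_indices(20))
second = list(repeated_kfold_indices(20))
print(first == second)  # same seed, same splits
print(len(first))       # n_splits * n_repeats combinations
```

Within each repeat the test folds are disjoint and together cover every row once, which is the property that keeps fold-level metrics comparable across runs.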

Documentation

Citation

If you use SynRXN in your research, please cite:

Tieu-Long Phan, Nhu-Ngoc Nguyen Song, and Peter F. Stadler. SynRXN: An Open Benchmark and Curated Dataset for Computational Reaction Modeling. Scientific Data 13, 625 (2026). https://doi.org/10.1038/s41597-026-07260-w

@article{phan2026synrxn,
  title = {SynRXN: An Open Benchmark and Curated Dataset for Computational Reaction Modeling},
  author = {Phan, Tieu-Long and Nguyen Song, Nhu-Ngoc and Stadler, Peter F.},
  journal = {Scientific Data},
  volume = {13},
  pages = {625},
  year = {2026},
  doi = {10.1038/s41597-026-07260-w},
  url = {https://www.nature.com/articles/s41597-026-07260-w}
}

License

This project is licensed under the MIT License. Dataset-specific terms are summarized in Data/LICENSE when applicable.

Acknowledgments

This project has received funding from the European Union's Horizon Europe Doctoral Network programme under the Marie Skłodowska-Curie grant agreement No. 101072930 (TACsy).
