Reaction Database for benchmarking
# SynRXN
SynRXN is an open reaction benchmark repository for reproducible reaction-informatics evaluation.
SynRXN collects curated reaction datasets, canonical task folders, versioned data releases, and lightweight loading utilities for benchmarking atom-atom mapping, reaction classification, property prediction, reaction balancing, and synthesis/retrosynthesis workflows.
## Highlights

- Five task families: `aam`, `classification`, `property`, `rbl`, and `synthesis`.
- Consistent tabular format: each dataset is a compressed CSV under `Data/<task>/<name>.csv.gz`.
- Stable identifiers: most curated rows use `r_id`; task-specific columns store reactions, labels, targets, splits, mappings, or references.
- Version-aware access: load data from Zenodo releases, GitHub tags, or exact Git commits.
- Reproducible benchmarking: use published splits when present, or generate deterministic repeated k-fold splits through `synrxn.split`.
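Because each dataset is a plain gzip-compressed CSV, a local clone can also be read without the loader. The following is a minimal standard-library sketch, assuming a clone whose `Data/` folder follows the `Data/<task>/<name>.csv.gz` layout described above. `dataset_path` and `read_dataset` are hypothetical helpers, not part of `synrxn`, and every value comes back as a string (no type conversion is attempted).

```python
import csv
import gzip
from pathlib import Path


def dataset_path(root: Path, task: str, name: str) -> Path:
    """Return root/Data/<task>/<name>.csv.gz (hypothetical helper, not a synrxn API)."""
    return root / "Data" / task / f"{name}.csv.gz"


def read_dataset(root: Path, task: str, name: str) -> list[dict[str, str]]:
    """Decompress one dataset and parse it into row dicts keyed by column name."""
    path = dataset_path(root, task, name)
    # "rt" opens the gzip stream in text mode; newline="" is what the csv module expects
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh))
```

For example, `read_dataset(Path("SynRXN"), "classification", "schneider_b")` would parse `Data/classification/schneider_b.csv.gz`, assuming that file is present in your checkout. If you prefer a DataFrame, `pandas.read_csv` infers gzip compression from the `.gz` suffix.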
## Installation

SynRXN requires Python 3.11 or later.

```bash
pip install synrxn
```

Install optional dependencies when you need the broader tooling stack:

```bash
pip install "synrxn[all]"
```

For development:

```bash
git clone https://github.com/TieuLongPhan/SynRXN.git
cd SynRXN
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```
## Quick Start

```python
from pathlib import Path

from synrxn.data import DataLoader

dl = DataLoader(
    task="classification",
    source="zenodo",
    version="1.0.0",
    cache_dir=Path("~/.cache/synrxn").expanduser(),
)

print(dl.available_names())

df = dl.load("schneider_b")
print(df.shape)
print(df.columns.tolist())
```
Use an exact commit for development snapshots you want to reproduce later:

```python
from pathlib import Path

from synrxn.data import DataLoader

dl = DataLoader(
    task="property",
    source="commit",
    version="3e1612e2199e8b0e369fce3ed9aff3dda68e4c32",
    cache_dir=Path("~/.cache/synrxn").expanduser(),
    gh_enable=True,
)

df = dl.load("b97xd3")
print(df[["r_id", "ea", "dh"]].head())
```
## Data Concept

The public data lives in `Data/` and is grouped by benchmark task:

| Folder | Purpose | Example datasets | Core columns |
|---|---|---|---|
| `Data/aam/` | Atom-atom mapping comparison | `uspto_3k`, `golden`, `ecoli` | `ground_truth`, mapper outputs, `rxn` |
| `Data/classification/` | Reaction class, template, and enzyme classification | `uspto_50k_b`, `tpl_u`, `ecreact` | `rxn`, labels, optional `split` |
| `Data/property/` | Reaction property prediction | `b97xd3`, `rgd1`, `sn2` | `aam` or `rxn`, target values, optional `split` |
| `Data/rbl/` | Reaction balancing and rebalancing | `mos`, `mnc`, `mbs`, `complex` | unbalanced `rxn`, balanced `ground_truth` |
| `Data/synthesis/` | Synthesis and retrosynthesis datasets | `uspto_mit`, `uspto_50k`, `da` | reactions, split/source metadata, optional reagents |
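For an offline view of what a clone contains, the sketch below lists dataset names per task folder, roughly an offline analogue of `dl.available_names()`. It assumes the `Data/<task>/<name>.csv.gz` layout and the five task folder names in the table above; `list_local_datasets` is a hypothetical helper and may not match how the loader itself discovers names.

```python
from pathlib import Path

# Task folder names as listed in the table above (assumed to be exhaustive)
TASKS = ("aam", "classification", "property", "rbl", "synthesis")
SUFFIX = ".csv.gz"


def list_local_datasets(root: Path) -> dict[str, list[str]]:
    """Map each task folder present under root/Data to its sorted dataset names."""
    inventory: dict[str, list[str]] = {}
    for task in TASKS:
        folder = root / "Data" / task
        if not folder.is_dir():
            continue  # skip task folders missing from this checkout
        # Path.stem would only strip ".gz", so cut the full two-part suffix instead
        inventory[task] = sorted(p.name[: -len(SUFFIX)] for p in folder.glob(f"*{SUFFIX}"))
    return inventory
```

Files without the `.csv.gz` suffix (for example a license or notes file) are ignored by the glob pattern.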
## Reproducible Splits

When a dataset has no published split, generate deterministic repeated k-fold splits with a fixed `random_state`:

```python
from pathlib import Path

from synrxn.data import DataLoader
from synrxn.split.repeated_kfold import RepeatedKFoldsSplitter

dl = DataLoader(
    task="property",
    source="zenodo",
    version="1.0.0",
    cache_dir=Path("~/.cache/synrxn").expanduser(),
)
df = dl.load("b97xd3")

splitter = RepeatedKFoldsSplitter(
    n_splits=5,
    n_repeats=2,
    ratio=(8, 1, 1),
    shuffle=True,
    random_state=1,
)
splitter.prepare_splits(df, stratify=None)

train_df, val_df, test_df = splitter.get_split(0, 0, as_frame=True)
print(len(train_df), len(val_df), len(test_df))
```
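The point of a fixed `random_state` is that the same seed always reproduces the same partition. The sketch below illustrates that idea with a generic seeded repeated k-fold in plain Python. It is not how `RepeatedKFoldsSplitter` works internally: it ignores `ratio` and stratification, and `repeated_kfold_indices` is a hypothetical helper whose choice of the next fold as validation is an assumption made here for illustration.

```python
import random
from collections.abc import Iterator


def repeated_kfold_indices(
    n_rows: int, n_splits: int = 5, n_repeats: int = 2, seed: int = 1
) -> Iterator[tuple[int, int, list[int], list[int], list[int]]]:
    """Yield (repeat, fold, train, val, test) row-index lists for a seeded repeated k-fold."""
    if n_splits < 3:
        raise ValueError("need at least 3 folds to keep train, val and test disjoint")
    rng = random.Random(seed)  # one seeded generator: same seed -> same sequence of shuffles
    for repeat in range(n_repeats):
        order = list(range(n_rows))
        rng.shuffle(order)  # a fresh permutation for every repeat
        # Round-robin assignment keeps fold sizes within one row of each other
        folds = [order[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            v = (k + 1) % n_splits  # assumption: the next fold serves as validation
            test = sorted(folds[k])
            val = sorted(folds[v])
            train = sorted(i for j, fold in enumerate(folds) if j not in (k, v) for i in fold)
            yield repeat, k, train, val, test
```

With `n_splits=5` this scheme yields roughly a 3:1:1 train/validation/test ratio per fold, which differs from the `(8, 1, 1)` used with the real splitter above; that difference is one reason not to treat this sketch as a stand-in for the library.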
## Documentation
- Documentation: https://synrxn.readthedocs.io/en/latest/
- Data release: https://doi.org/10.5281/zenodo.17297258
- Source code: https://github.com/TieuLongPhan/SynRXN
- Issues: https://github.com/TieuLongPhan/SynRXN/issues
## Citation

If you use SynRXN in your research, please cite:

Tieu-Long Phan, Nhu-Ngoc Nguyen Song, and Peter F. Stadler. SynRXN: An Open Benchmark and Curated Dataset for Computational Reaction Modeling. Scientific Data 13, 625 (2026). https://doi.org/10.1038/s41597-026-07260-w

```bibtex
@article{phan2026synrxn,
  title   = {SynRXN: An Open Benchmark and Curated Dataset for Computational Reaction Modeling},
  author  = {Phan, Tieu-Long and Nguyen Song, Nhu-Ngoc and Stadler, Peter F.},
  journal = {Scientific Data},
  volume  = {13},
  pages   = {625},
  year    = {2026},
  doi     = {10.1038/s41597-026-07260-w},
  url     = {https://www.nature.com/articles/s41597-026-07260-w}
}
```
## License

This project is licensed under the MIT License. Dataset-specific terms are summarized in `Data/LICENSE` when applicable.
## Acknowledgments
This project has received funding from the European Union's Horizon Europe Doctoral Network programme under the Marie Sklodowska-Curie grant agreement No. 101072930 (TACsy).
## Download files
### File details

Details for the file `synrxn-1.0.0.tar.gz`.

#### File metadata

- Download URL: `synrxn-1.0.0.tar.gz`
- Size: 91.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `12b4d78fbc1afc5ccf162982529f3443c7f9e4db755fe39be910f9775c414245` |
| MD5 | `44be078c1ebee68585813457a8265221` |
| BLAKE2b-256 | `44105256e2bc97f19434e03639646ebfe01ae1fd4b9ea8710356a2eae014a5b2` |
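To check that a downloaded archive matches the digests listed here, hash it locally and compare. This is a minimal sketch using `hashlib`; `sha256_of` and `matches_published_digest` are hypothetical helper names, and the expected value is the SHA256 digest of `synrxn-1.0.0.tar.gz` from the table above.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for synrxn-1.0.0.tar.gz
EXPECTED_SDIST_SHA256 = "12b4d78fbc1afc5ccf162982529f3443c7f9e4db755fe39be910f9775c414245"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare against a published hex digest, tolerating case and stray whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest(Path("synrxn-1.0.0.tar.gz"), EXPECTED_SDIST_SHA256)` should return `True` for an intact download. pip can also enforce digests itself through its hash-checking mode (`--require-hashes` together with `--hash=sha256:<digest>` entries in a requirements file).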
#### Provenance

The following attestation bundles were made for `synrxn-1.0.0.tar.gz`:

Publisher: `publish-package.yml` on TieuLongPhan/SynRXN

Statement:

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `synrxn-1.0.0.tar.gz`
- Subject digest: `12b4d78fbc1afc5ccf162982529f3443c7f9e4db755fe39be910f9775c414245`
- Sigstore transparency entry: 1539635884
- Permalink: `TieuLongPhan/SynRXN@c74187e840c53aa86d3be9218ab70116f8649f7a`
- Branch / Tag: `refs/tags/v1.0.0`
- Owner: https://github.com/TieuLongPhan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `publish-package.yml@c74187e840c53aa86d3be9218ab70116f8649f7a`
- Trigger Event: release
### File details

Details for the file `synrxn-1.0.0-py3-none-any.whl`.

#### File metadata

- Download URL: `synrxn-1.0.0-py3-none-any.whl`
- Size: 108.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `693cea2aec7bb5c4d75cbbfcdeb0ec8884292a159b30568a5bc0a0c69c3a89c5` |
| MD5 | `35bfcae68a02df30cc6bd4893666a268` |
| BLAKE2b-256 | `b58c2f7196f876aaae15f20c5dfc6f1296dab7b8d83f4b82f4f5d2deefc5c5d6` |
#### Provenance

The following attestation bundles were made for `synrxn-1.0.0-py3-none-any.whl`:

Publisher: `publish-package.yml` on TieuLongPhan/SynRXN

Statement:

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `synrxn-1.0.0-py3-none-any.whl`
- Subject digest: `693cea2aec7bb5c4d75cbbfcdeb0ec8884292a159b30568a5bc0a0c69c3a89c5`
- Sigstore transparency entry: 1539636077
- Permalink: `TieuLongPhan/SynRXN@c74187e840c53aa86d3be9218ab70116f8649f7a`
- Branch / Tag: `refs/tags/v1.0.0`
- Owner: https://github.com/TieuLongPhan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `publish-package.yml@c74187e840c53aa86d3be9218ab70116f8649f7a`
- Trigger Event: release