Package for extracting chemical reactions, serialized with Google Protocol Buffers in the Open Reaction Database (ORD) schema, into a relational database (RDB) and the Resource Description Framework (RDF).

Project Description for ord_rxn_converter

Introduction

ord_rxn_converter is a Python package designed to streamline the transformation of chemical reaction data from the Open Reaction Database (ORD), stored in Google Protocol Buffers format, into structured datasets suitable for downstream machine learning and data analysis tasks. It provides modular tools for parsing, extracting, and converting the nested reaction schema into interpretable tables, lists, and dictionaries that can be easily ingested by models or used in exploratory chemical data analysis.

The library is organized into specialized modules that handle different components of the reaction schema (identifiers, inputs, conditions, setup, workups, outcomes, and notes/observations), as well as utility functions for key operations and dataset generation. The package is structured for clarity and extensibility, enabling researchers to adapt it to varying needs in computational chemistry or cheminformatics pipelines.

The codebase is written in Python 3 and supports integration into Jupyter notebooks, standalone scripts, or larger ML pipelines for tasks such as property prediction, reaction classification, or synthesis planning.

Motivation

Chemical reaction data is often stored in highly nested or semi-structured formats that are difficult to work with directly in data science workflows. The Open Reaction Database provides a valuable standardized format, but researchers and developers often require a flat, structured format with clean fields to build models or perform analysis.

ord_rxn_converter was developed to automate and standardize this transformation. It lets users systematically convert the nested data in ORD protobuf files into simpler Python structures (lists, dictionaries, pandas DataFrames), reducing time spent on preprocessing and improving reproducibility in ML workflows. Because the conversion is split into modules, each part of the schema can be extracted, inspected, and debugged on its own.
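To make the nested-to-flat idea concrete, here is a minimal sketch using only the standard library. It does not use the ord_rxn_converter API: the `reaction` dictionary, its field names, and the `flatten_inputs` helper are hypothetical stand-ins loosely modeled on how an ORD reaction groups components under named inputs.

```python
from typing import Any

# Hypothetical nested record, loosely modeled on an ORD reaction.
# Real ORD data is a protobuf message, not a plain dict.
reaction = {
    "reaction_id": "ord-example-0001",
    "inputs": {
        "aryl_halide": {
            "components": [
                {"role": "REACTANT", "identifiers": {"SMILES": "Brc1ccccc1"}},
            ]
        },
        "solvent": {
            "components": [
                {"role": "SOLVENT", "identifiers": {"SMILES": "C1CCOC1"}},
            ]
        },
    },
}


def flatten_inputs(rxn: dict[str, Any]) -> list[dict[str, str]]:
    """Return one flat row per input component (illustrative helper)."""
    rows = []
    for input_key, rxn_input in rxn["inputs"].items():
        for component in rxn_input["components"]:
            rows.append(
                {
                    "reaction_id": rxn["reaction_id"],
                    "input_key": input_key,
                    "role": component["role"],
                    "smiles": component["identifiers"]["SMILES"],
                }
            )
    return rows


rows = flatten_inputs(reaction)
for row in rows:
    print(row)
```

Each row repeats the reaction identifier, so the flat table can later be joined back to other tables built from the same reaction.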

The project originated as part of a broader effort to accelerate machine learning-driven synthesis planning by improving the usability of publicly available chemical data.

Limitations

  • The package currently assumes that input ORD data conforms closely to the expected schema. It may require modification or additional error handling for incomplete or non-standard records.

  • Complex reaction pathways involving multi-step synthesis or overlapping outcomes may not be fully supported in this version.

  • The current modules focus primarily on extraction rather than validation or correction of chemical information. Users are advised to preprocess or sanitize their data before applying the conversion tools if needed.

  • While the package is modular, it is not yet fully abstracted for plug-and-play use in non-ORD schemas. Adapting it to other chemical data formats (e.g., USPTO, Reaxys) would require extension.

  • The project is in active development, and interface or function-level changes may occur in future versions.
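The first limitation above suggests wrapping extraction in some defensive handling when records are incomplete. One possible approach, sketched here with the standard library only, is to read nested fields through a tolerant getter and report which fields were missing. The `get_path` and `extract_conditions` helpers and all field names are assumptions made for illustration; they are not part of the package.

```python
from typing import Any


def get_path(record: dict[str, Any], path: tuple[str, ...], default: Any = None) -> Any:
    """Walk nested dicts along `path`; return `default` if any key is absent."""
    current: Any = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def extract_conditions(rxn: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Build one flat row and list the columns that could not be filled."""
    row = {
        "reaction_id": get_path(rxn, ("reaction_id",)),
        "temperature_c": get_path(rxn, ("conditions", "temperature", "value")),
        "stirring": get_path(rxn, ("conditions", "stirring", "type")),
    }
    missing = [name for name, value in row.items() if value is None]
    return row, missing


# Hypothetical records: one complete, one with an empty conditions block.
complete = {
    "reaction_id": "rxn-1",
    "conditions": {
        "temperature": {"value": 80.0},
        "stirring": {"type": "STIR_BAR"},
    },
}
incomplete = {"reaction_id": "rxn-2", "conditions": {}}

row_ok, missing_ok = extract_conditions(complete)
row_bad, missing_bad = extract_conditions(incomplete)
print(missing_ok, missing_bad)
```

Collecting the missing column names, rather than raising on the first gap, makes it easier to decide afterwards whether a record should be dropped, repaired, or kept with empty cells.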

Affiliations

Materials Data Science for Stockpile Stewardship Center of Excellence (MDS3-COE), Solar Durability and Lifetime Extension (SDLE) Research Center, Materials Science and Engineering, Case Western Reserve University, Cleveland, OH 44106, USA

Package Usage

The package converts an ORD dataset (typically hundreds to thousands of reactions, serialized in Google Protocol Buffers format) into a dictionary of pandas DataFrames, one per reaction portion: reaction identifiers, reaction inputs, reaction conditions, reaction setup, reaction outcomes, and reaction notes and observations.
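The shape of that result can be illustrated with a short standard-library sketch. This is not the package's implementation: the `PORTIONS` keys, the toy `dataset`, and the `build_tables` helper are hypothetical, and the actual dictionary keys and column names are defined in the package documentation linked below.

```python
from typing import Any

# Hypothetical portion names; the package's real keys may differ.
PORTIONS = (
    "identifiers",
    "inputs",
    "conditions",
    "setup",
    "outcomes",
    "notes_observations",
)

# Toy dataset of already-simplified reactions (assumed structure).
dataset = [
    {
        "reaction_id": "rxn-1",
        "identifiers": {"name": "Suzuki coupling"},
        "conditions": {"temperature_c": 80.0},
    },
    {
        "reaction_id": "rxn-2",
        "identifiers": {"name": "amide coupling"},
        "conditions": {"temperature_c": 25.0},
        "notes_observations": {"is_heterogeneous": False},
    },
]


def build_tables(reactions: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group per-reaction fields into one list of rows per portion."""
    tables: dict[str, list[dict[str, Any]]] = {portion: [] for portion in PORTIONS}
    for rxn in reactions:
        for portion in PORTIONS:
            fields = rxn.get(portion)
            if fields is None:
                continue  # this reaction has no data for the portion
            # reaction_id is repeated in every table so rows can be joined.
            tables[portion].append({"reaction_id": rxn["reaction_id"], **fields})
    return tables


tables = build_tables(dataset)
for portion, rows in tables.items():
    print(portion, len(rows))
```

With pandas installed, each list of rows could be wrapped as `pandas.DataFrame(rows)` to obtain a dictionary of DataFrames keyed by portion.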

Python package documentation

https://ord-rxn-converter.readthedocs.io/en/latest/

Acknowledgements

This work was supported by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under Solar Energy Technologies Office (SETO) Agreement Numbers DE-EE0009353 and DE-EE0009347, by the U.S. Department of Energy (National Nuclear Security Administration) under Award Number DE-NA0004104 and Contract Number B647887, and by the U.S. National Science Foundation under Award Number 2133576.

Download files

Source Distribution

ord_rxn_converter-0.1.4.tar.gz (25.0 kB)

Uploaded Source

Built Distribution

ord_rxn_converter-0.1.4-py3-none-any.whl (27.0 kB)

Uploaded Python 3

File details

Details for the file ord_rxn_converter-0.1.4.tar.gz.

File metadata

  • Download URL: ord_rxn_converter-0.1.4.tar.gz
  • Size: 25.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ord_rxn_converter-0.1.4.tar.gz
Algorithm Hash digest
SHA256 d5b97674ce20bb583c667bdf6d29b08bdd0c7503499109048525fd01c7fda55b
MD5 849894c483975f3b34fdff402ed49c3f
BLAKE2b-256 bfebf6fd6ffb85b08f2d351162189504e6508a75cc6632d07da57f95a1149a48
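A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches_expected` are illustrative, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for ord_rxn_converter-0.1.4.tar.gz.
EXPECTED_SHA256 = "d5b97674ce20bb583c667bdf6d29b08bdd0c7503499109048525fd01c7fda55b"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the archive was saved in the current directory):
# print(matches_expected(Path("ord_rxn_converter-0.1.4.tar.gz")))
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 25 kB archive but makes the helper reusable for larger files.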

Provenance

The following attestation bundles were made for ord_rxn_converter-0.1.4.tar.gz:

Publisher: python-publish.yml on cwru-sdle/ord_rxn_converter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ord_rxn_converter-0.1.4-py3-none-any.whl.

File hashes

Hashes for ord_rxn_converter-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 84dbf3b1abd03f9759e59da56213e7138d0c245bd1a54b2b8cdc18673b787feb
MD5 3a813bc0650ce25f97148effe06232ca
BLAKE2b-256 09e481a8dd5b04147a77118d793090091c14d4d2d394cd74b4208e5a56f69672

Provenance

The following attestation bundles were made for ord_rxn_converter-0.1.4-py3-none-any.whl:

Publisher: python-publish.yml on cwru-sdle/ord_rxn_converter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
