Package for converting chemical reaction data, serialized with Google Protocol Buffers in the Open Reaction Database (ORD) schema, into Resource Description Framework (RDF) triples in JSON-LD and/or Turtle format.

Reaction Knowledge Graph Processor: rxn_rdf_converter

Overview

The rxn_rdf_converter is a Python package that converts reaction data stored in Google Protocol Buffers with the Open Reaction Database (ORD) schema into a knowledge graph of Resource Description Framework (RDF) triples, using MDS-Onto (a domain ontology for Materials Data Science) as the semantic model.

This package facilitates the transformation of raw experimental data into structured RDF triples (Turtle or JSON-LD format), making the data semantically searchable, linkable, and machine-readable for advanced data analysis and machine learning applications in chemistry and materials science.


Authors

  • Quynh D. Tran
  • Holly Schreiber
  • Brandon Lee
  • Owen Schessler
  • Laura S. Bruckman
  • Roger H. French

Motivation

We built this package to streamline the integration of reaction and synthesis data into one centralized database alongside formulation, manufacturing, and degradation data.

Mapping reaction data to an ontology by hand is tedious and error-prone, which makes it a bottleneck in data integration. This package aims to automate that mapping and reduce the time needed to integrate data.


Installation

Prerequisites

  • Python 3.11+
  • The ord-schema library for reading data in the ORD schema.
  • The owlready2 library for ontology handling.
  • The rdflib library for RDF graph generation.
  • RDKit for chemical identifier normalization (InChIKey, SMILES conversion).
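
Before installing or running the converter, it can help to confirm that the prerequisites above are present. The following is a minimal sketch using only the standard library. The import names (ord_schema, owlready2, rdflib, rdkit) are assumed to correspond to the listed libraries, and the helper functions are illustrative and not part of rxn_rdf_converter.

```python
import importlib.util
import sys

# Import names assumed to correspond to the prerequisites listed above;
# adjust them if your environment uses different distributions.
REQUIRED_MODULES = ("ord_schema", "owlready2", "rdflib", "rdkit")
MIN_PYTHON = (3, 11)


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.11 or newer is required.")
    absent = missing_modules()
    if absent:
        print("Missing prerequisites:", ", ".join(absent))
    else:
        print("All prerequisites found.")
```

The check only looks up whether each module can be found; it does not import the libraries or verify their versions.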

Setup

Install the package and its required dependencies using pip, for example: pip install rxn_rdf_converter


Package Usage

The package converts a dataset (containing hundreds to thousands of reactions) stored in the ORD schema as Google Protocol Buffers into Resource Description Framework (RDF) triples in JSON-LD or Turtle format, using MDS-Onto as the semantic model.

The core workflow involves initializing a DatasetProcessor to manage logging and file paths, and then iterating over the dataset's reactions with the ReactionKG class to build a knowledge graph for each reaction, serialized as Turtle or JSON-LD.

The package can process a single dataset or batch process multiple datasets. In addition, the ReactionKG class can be used on its own to generate the knowledge graph for one reaction at a time.


Limitations

This package currently works only with reaction data in the ORD schema, with MDS-Onto passed as the ontology argument. It will not work with data in other Google Protocol Buffers schemas or with other ontologies.

When more than 50 datasets (each containing hundreds to thousands of reactions) are processed in a single multiple-dataset run, the package will crash because it runs out of memory. The multiple-dataset batch processing was designed to run on distributed, parallel computing infrastructure with the Hadoop ecosystem.
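
On a single machine, one common way to bound memory use is to convert datasets one at a time and write each result to disk before moving on. The sketch below illustrates that pattern with the standard library only. It is an assumption about how a user might organize their own driver script, not a feature of rxn_rdf_converter, and the convert callable is a hypothetical stand-in for whatever per-dataset conversion routine is used.

```python
from pathlib import Path


def process_sequentially(dataset_paths, convert, output_dir):
    """Convert datasets one at a time, yielding each output path.

    `convert` is a hypothetical callable (Path -> str of serialized RDF)
    standing in for a per-dataset conversion routine.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for path in dataset_paths:
        path = Path(path)
        serialized = convert(path)
        target = output_dir / (path.stem + ".ttl")
        target.write_text(serialized, encoding="utf-8")
        # `serialized` is rebound on the next iteration, so only one
        # dataset's output is referenced by this function at a time.
        yield target


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        fake_convert = lambda p: f"# placeholder output for {p.name}\n"
        for out in process_sequentially(["a.pb", "b.pb"], fake_convert, tmp):
            print("wrote", out.name)
```

Because the function is a generator, nothing is converted until the caller iterates, and each dataset is finished and written before the next one is loaded.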


Affiliations

Materials Data Science for Stockpile Stewardship Center of Excellence (MDS3-COE), Solar Durability and Lifetime Extension (SDLE) Research Center, Materials Science and Engineering, Case Western Reserve University, Cleveland, OH 44106, USA


Python package documentation

https://rxn-rdf-converter.readthedocs.io/en/latest/


Acknowledgements

This work was supported by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under Solar Energy Technologies Office (SETO) Agreement Numbers DE-EE0009353 and DE-EE0009347, the Department of Energy (National Nuclear Security Administration) under Award Number DE-NA0004104 and Contract Number B647887, and the U.S. National Science Foundation under Award Number 2133576.

Download files

Source Distribution

rxn_rdf_converter-0.1.3.tar.gz (11.1 kB)

Uploaded Source

Built Distribution

rxn_rdf_converter-0.1.3-py3-none-any.whl (4.4 kB)

Uploaded Python 3

File details

Details for the file rxn_rdf_converter-0.1.3.tar.gz.

File metadata

  • Download URL: rxn_rdf_converter-0.1.3.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rxn_rdf_converter-0.1.3.tar.gz
Algorithm Hash digest
SHA256 0dfff429ec738833745d114d063d5a28f9e856ffa5d9aef0d70c1bee511fe6b3
MD5 89a268775ee50192eb430f17d70e7b25
BLAKE2b-256 a90cabaf3abb1094dd112f21592b34aa7a0f8178afdf5406d67eaae2da9286c7

Provenance

The following attestation bundles were made for rxn_rdf_converter-0.1.3.tar.gz:

Publisher: python-publish.yml on cwru-sdle/rxn-rdf-converter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rxn_rdf_converter-0.1.3-py3-none-any.whl.

File hashes

Hashes for rxn_rdf_converter-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 31b6b893709bf5aad82ff33f4d72cdcf523f90651edf5cebcd0c554caf63198b
MD5 844b622a9e232a9d630cba480c5ec3f1
BLAKE2b-256 b99f1d2761c644c6a81c797837a01a6d83faa9943da77bf01fdd090e7f4decae

Provenance

The following attestation bundles were made for rxn_rdf_converter-0.1.3-py3-none-any.whl:

Publisher: python-publish.yml on cwru-sdle/rxn-rdf-converter
