Package for converting chemical reactions serialized with Google Protocol Buffers in the Open Reaction Database (ORD) schema into Resource Description Framework (RDF) triples in JSON-LD and/or Turtle format.
Reaction Knowledge Graph Processor: rxn_rdf_converter
Overview
rxn_rdf_converter is a Python package that converts reaction data stored in Google Protocol Buffers under the Open Reaction Database (ORD) schema into a knowledge graph of Resource Description Framework (RDF) triples, using MDS-Onto (a domain ontology for Materials Data Science) as the semantic model.
This package facilitates the transformation of raw experimental data into structured RDF triples (Turtle or JSON-LD format), making the data semantically searchable, linkable, and machine-readable for advanced data analysis and machine learning applications in chemistry and materials science.
Authors
- Quynh D. Tran
- Holly Schreiber
- Brandon Lee
- Owen Schessler
- Laura S. Bruckman
- Roger H. French
Motivation
We built this package to streamline the integration of reaction/synthesis data into one centralized database alongside formulation, manufacturing, and degradation data.
Mapping reaction data to an ontology by hand is tedious and error-prone, and it is a bottleneck in that integration. This package provides an automated tool intended to reduce the time needed to integrate data.
Installation
Prerequisites
- Python 3.11+
- The ord-schema library.
- The owlready2 library for ontology handling.
- The rdflib library for RDF graph generation.
- RDKit for chemical identifier normalization (InChIKey, SMILES conversion).
Setup
Install the required dependencies using pip
Package Usage
The package converts a dataset (containing hundreds to thousands of reactions) stored in Google Protocol Buffers format under the ORD schema into Resource Description Framework (RDF) triples in JSON-LD or Turtle format, using MDS-Onto as the semantic model.
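To make the two output formats concrete, here is a minimal stdlib-only sketch that expresses one reaction as triples and prints them both as N-Triples lines (every N-Triples document is also valid Turtle) and as a JSON-LD document. The `http://example.org/rxn/` namespace, the property names, and the reaction record are hypothetical placeholders; they are not MDS-Onto terms or ORD fields, and this hand-built serialization only illustrates the shape of the output, not how the package produces it.

```python
import json

EX = "http://example.org/rxn/"  # hypothetical namespace, not MDS-Onto
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# A made-up record standing in for one ORD reaction message.
reaction = {"id": "rxn-001", "solvent": "water", "temperature_celsius": 78.5}


def to_triples(rec):
    """Return (subject, predicate, object) tuples for one record."""
    subject = EX + rec["id"]
    return [(subject, EX + key, value) for key, value in rec.items() if key != "id"]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, which are also valid Turtle."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, float):
            literal = f'"{o}"^^<{XSD_DOUBLE}>'
        else:
            literal = json.dumps(str(o))  # quoted, escaped string literal
        lines.append(f"<{s}> <{p}> {literal} .")
    return "\n".join(lines)


def to_jsonld(rec):
    """Build a JSON-LD document; @vocab maps bare keys onto the namespace."""
    doc = {"@context": {"@vocab": EX}, "@id": EX + rec["id"]}
    doc.update({k: v for k, v in rec.items() if k != "id"})
    return doc


print(to_ntriples(to_triples(reaction)))
print(json.dumps(to_jsonld(reaction), indent=2))
```

Both outputs describe the same two facts about the reaction. Turtle is compact and easy to read or diff, while JSON-LD is convenient when the consumer already works with JSON.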
The core workflow involves initializing a DatasetProcessor to manage logging and file paths, and then iterating over its reactions using the ReactionKG class to build the individual knowledge graphs (Turtle or JSON-LD serialization format).
The package can batch-process multiple datasets or process a single dataset. In addition, a user can use the ReactionKG class directly to generate the knowledge graph for one reaction at a time.
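The dataset-level loop described above can be pictured with the following stdlib-only sketch. It is not the package's API: `convert_dataset` and `reaction_to_lines` are hypothetical stand-ins for the roles played by DatasetProcessor and ReactionKG, and the reactions are plain dictionaries rather than ORD messages. The point is only the shape of the workflow: set up logging and an output directory once, write one graph file per reaction, and keep going when a single reaction is malformed.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("rxn_sketch")


def reaction_to_lines(reaction):
    """Hypothetical per-reaction step: turn one record into triple lines.

    Assumes values contain no quotes or backslashes (no escaping is done).
    """
    subject = f"<http://example.org/rxn/{reaction['id']}>"
    return [
        f'{subject} <http://example.org/rxn/{key}> "{value}" .'
        for key, value in reaction.items()
        if key != "id"
    ]


def convert_dataset(reactions, out_dir):
    """Hypothetical dataset-level step: one output file per reaction.

    Returns (written_paths, failed_positions) so a caller can report on the run.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, failed = [], []
    for index, reaction in enumerate(reactions):
        try:
            lines = reaction_to_lines(reaction)
        except KeyError:
            # A record without an id is logged and skipped, not fatal.
            logger.warning("skipping reaction at position %d: no id", index)
            failed.append(index)
            continue
        path = out_dir / f"{reaction['id']}.ttl"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written, failed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sample = [{"id": "rxn-001", "solvent": "water"}, {"solvent": "ethanol"}]
    with tempfile.TemporaryDirectory() as tmp:
        written, failed = convert_dataset(sample, tmp)
        print(len(written), "written;", len(failed), "skipped")
```

Calling `reaction_to_lines` on a single record mirrors the one-reaction-at-a-time mode, while `convert_dataset` mirrors processing a whole dataset.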
Limitation
This package currently works only for reaction data in the ORD schema, with MDS-Onto passed as the ontology argument. It will not work with data in other Google Protocol Buffers schemas or with other ontologies.
If a user has more than 50 datasets (each containing hundreds to thousands of reactions), processing them all in one multiple-dataset run can exhaust available memory and crash. The multiple-dataset batch processing was designed to run on distributed, parallel computing infrastructure with the Hadoop ecosystem.
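On a single machine, one common way to keep memory bounded is to handle datasets in small groups and let each group's results go out of scope before the next group is loaded. The sketch below shows only that batching pattern using the standard library; `process_one` is a hypothetical callback standing in for a per-dataset conversion that writes its own output to disk, and the default batch size of 10 is an arbitrary illustration, not a tested threshold for this package.

```python
from itertools import islice


def batched(items, size):
    """Yield lists of at most `size` items without materializing the input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def process_in_batches(dataset_paths, process_one, batch_size=10):
    """Run `process_one` on every path, one bounded batch at a time.

    Only a running count is kept, so nothing from earlier batches is
    retained by this function once those batches are finished.
    """
    done = 0
    for batch in batched(dataset_paths, batch_size):
        for path in batch:
            process_one(path)  # expected to write its own output to disk
        done += len(batch)
    return done
```

Each batch is also a natural unit of work to hand to a separate job or worker when parallel infrastructure is available.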
Affiliations:
Materials Data Science for Stockpile Stewardship Center of Excellence (MDS3-COE), Solar Durability and Lifetime Extension (SDLE) Research Center, Materials Science and Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
Python package documentation
https://rxn-rdf-converter.readthedocs.io/en/latest/
Acknowledgements:
This work was supported by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under Solar Energy Technologies Office (SETO) Agreement Numbers DE-EE0009353 and DE-EE0009347, the Department of Energy (National Nuclear Security Administration) under Award Number DE-NA0004104 and Contract Number B647887, and the U.S. National Science Foundation under Award Number 2133576.