Chemical RDF file fixer for exports from Reaxys, SciFinder, Deepmatter, etc.
Chemistry RDF fixer / converter
Converts chemistry-containing RDF files exported from SciFinder or Reaxys. Support for Infochem's ICsynth RDFs is a recent addition.
It handles missing molecule blocks by removing the affected entries entirely, and fixes some potential minor errors (e.g. removing certain empty lines or uppercasing certain tags).
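The README does not describe how these fixes work internally. As an illustration only, here is a minimal sketch of the two fixes described above. It is not rdf_fixer's actual code, and `fix_rdf_text`, `split_records`, `has_all_structures` and `uppercase_tag` are hypothetical names. It assumes that records start with `$RFMT`/`$MFMT`, that each molecule inside a reaction is introduced by a `$MOL` line, and that a complete molfile ends with `M  END`. Only reaction records are checked for missing structures here.

```python
from __future__ import annotations

# Hypothetical sketch, NOT rdf_fixer's implementation.
# Assumptions: records start with "$RFMT" or "$MFMT", every molecule in a
# reaction record is introduced by "$MOL", and a complete molfile ends in "M  END".

RECORD_START = ("$RFMT", "$MFMT")
KNOWN_TAGS = {
    "$RDFILE", "$DATM", "$RFMT", "$MFMT", "$RIREG", "$REREG", "$MIREG", "$MEREG",
    "$RXN", "$MOL", "$DTYPE", "$DATUM",
}


def uppercase_tag(line: str) -> str:
    """Uppercase a known RDF tag written in lowercase, e.g. "$dtype" -> "$DTYPE"."""
    tag, sep, rest = line.partition(" ")
    if tag.upper() in KNOWN_TAGS:
        return tag.upper() + sep + rest
    return line


def split_records(lines: list[str]) -> tuple[list[str], list[list[str]]]:
    """Split RDF lines into the file header and a list of records (lists of lines)."""
    header: list[str] = []
    records: list[list[str]] = []
    for line in lines:
        if line.startswith(RECORD_START):
            records.append([line])
        elif records:
            records[-1].append(line)
        else:
            header.append(line)
    return header, records


def has_all_structures(record: list[str]) -> bool:
    """True if every "$MOL" block in the record is closed by an "M  END" line."""
    mol_blocks = sum(1 for line in record if line.startswith("$MOL"))
    mol_ends = sum(1 for line in record if line.strip() == "M  END")
    return mol_ends >= mol_blocks


def fix_rdf_text(text: str) -> str:
    """Normalise tag case, then drop records whose structures are incomplete."""
    lines = [uppercase_tag(line) for line in text.splitlines()]
    header, records = split_records(lines)
    kept = [rec for rec in records if has_all_structures(rec)]
    return "\n".join(header + [line for rec in kept for line in rec]) + "\n"
```

Uppercasing runs before splitting, so a lowercase `$rfmt` still starts a new record.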
The fixed RDF file is saved and, in addition, converted to a tab-separated CSV file.
Structures in the CSV are stored as SMILES.
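The README does not show how the CSV export is built. As a hedged illustration, the sketch below assumes that each record's data fields are stored as `$DTYPE name` / `$DATUM value` line pairs. It collects those fields into a row and writes a tab-separated file with Python's `csv` module. The SMILES string is assumed to have been generated beforehand, e.g. with a cheminformatics toolkit such as RDKit. `parse_data_fields` and `write_tsv` are hypothetical names and are not part of rdf_fixer.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def parse_data_fields(record_lines: list[str]) -> dict[str, str]:
    """Collect "$DTYPE name" / "$DATUM value" pairs from one RDF record.

    Lines after a $DATUM line that do not start with "$" are treated as
    continuation lines and joined with a space (an assumption of this sketch).
    """
    fields: dict[str, str] = {}
    name = None
    in_datum = False
    for line in record_lines:
        if line.startswith("$DTYPE"):
            name = line[len("$DTYPE"):].strip()
            in_datum = False
        elif line.startswith("$DATUM") and name is not None:
            fields[name] = line[len("$DATUM"):].strip()
            in_datum = True
        elif in_datum and not line.startswith("$"):
            fields[name] += " " + line.strip()
        else:
            in_datum = False
    return fields


def write_tsv(rows: list[dict[str, str]], out: TextIO) -> None:
    """Write rows as tab-separated text; the columns are the union of all keys."""
    columns: list[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row become empty cells


if __name__ == "__main__":
    record = ["$RFMT", "$RXN", "M  END", "$DTYPE RXN:ID", "$DATUM 42"]
    # "CCO>>CC=O" is a made-up reaction SMILES, used only for illustration.
    row = {"SMILES": "CCO>>CC=O", **parse_data_fields(record)}
    buffer = io.StringIO()
    write_tsv([row], buffer)
    print(buffer.getvalue(), end="")
```

To write to disk instead of a buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`, following the `csv` module's recommendation for `newline=""`.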
Other sources, e.g. MarvinSketch or ChemDraw, should also work with these files, but have not been tested thoroughly.
Why would you need this?
RDF files that contain a missing structure can throw errors in certain programs or even make them crash.
Examples are MarvinSketch and MarvinView, which sometimes handle missing reaction structures and sometimes do not.
In KNIME, the Erlwood extension's "Chemical Reaction File Reader" node does not work with such files at all (not re-tested in 2026 to see whether this has been fixed).
Requirements:
Python >= 3.8.
Windows or Linux (macOS not tested).
Type of installation shouldn't matter (Vanilla/Conda/Mamba/venv).
Installation
Option 1: Install from PyPI (simplest)
Install to your existing environment:
pip install rdf-fixer
Option 2: Install from source (for development)
If you downloaded/cloned the code for development:
# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install the package in development mode
pip install -e .
Or simply:
pip install .
Optional: Jupyter notebook
This is only if you want to run the .ipynb file from your browser:
pip install notebook
Note: The setup.py is used for package installation (PyPI releases or local pip install .). For setting up a development environment with dependencies, use requirements.txt.
Importing the module:
from rdfmodule import rdf_fixer
Usage:
To fix a single RDF file, or a whole folder containing multiple RDF files:
rdf_fixer.fix("file or directory name")
An optional flag (True/False, default True) controls whether CSV files are created as well. To skip CSV creation, set flag to False.
rdf_fixer.fix("file or directory name", flag=False)
Alternatively, use the enclosed example script or Jupyter notebook, e.g.:
convert_example.py "./filename.rdf" for a single file (with or without quotes)
convert_example.py /directory/ for all RDF files in a directory, including all subdirectories
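rdf_fixer.fix() walks directories itself. If you want to preview which files a directory run would cover, a small helper like the one below aims to mirror the described behaviour: a single file, or every RDF file in a directory and all its subdirectories. `collect_rdf_files` is a hypothetical name and is not part of the package. Matching the `.rdf` suffix case-insensitively is an assumption about which files count as RDF.

```python
from __future__ import annotations

import sys
from pathlib import Path


def collect_rdf_files(target: str) -> list[Path]:
    """Return the RDF file(s) for a file or directory argument (hypothetical helper).

    Directories are searched recursively via Path.rglob, including all
    subdirectories; the ".rdf" suffix is matched case-insensitively (assumption).
    """
    path = Path(target)
    if path.is_file():
        return [path]
    if path.is_dir():
        return sorted(p for p in path.rglob("*") if p.is_file() and p.suffix.lower() == ".rdf")
    raise FileNotFoundError(target)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python preview_rdf.py /directory/  (script name is illustrative)
    for rdf_path in collect_rdf_files(sys.argv[1]):
        print(rdf_path)
```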
Testing
The testfiles folder contains three RDF files for a quick test; the SciFinder one, for example, contains an erroneous (i.e. missing) structure.
Please note that copyright for the enclosed test data lies with the respective companies (see also License section).
Notes:
The parsing is by no means perfect, though a best effort was made. Suggestions for changes are welcome; please submit an issue or create your own fork.
Converting the current function(s) into a class has been abandoned; there is little point, since no persistent state is needed the way the tool is applied here.
Update history
See the "VERSIONS.md" readme file.
License
Independent of the code license, the provided test files may not be redistributed; they are intended only for one's own initial testing.
The copyright for the data for these files lies with the providers (Deepmatter/Infochem, ACS, Elsevier Life Sciences IP Limited) and not with the author or anyone reusing/changing this code.
For the code section: Copyright (c) 2021-2024 DocMinus, MIT License (see also LICENSE file).
If you add a shout-out to your code, I don't mind!
File details
Details for the file rdf_fixer-3.2.0.tar.gz.
File metadata
- Download URL: rdf_fixer-3.2.0.tar.gz
- Upload date:
- Size: 9.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5faf7d8463e2ad1129d8e2ce8aa742eb690f9d0c36bab688fd5764d0d1c2a4fe |
| MD5 | be3482480bc9519c11fd7775949f592f |
| BLAKE2b-256 | 58a3e40e98c81e7e93439df73b3f21a70297e762b2d1fe22e3d22b34404ed505 |
File details
Details for the file rdf_fixer-3.2.0-py3-none-any.whl.
File metadata
- Download URL: rdf_fixer-3.2.0-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2aa6400143f4e89718652c1e5d2618a9f5ffc3e91760bfbd86e0504ffeebd56 |
| MD5 | 8b32d401de50a4f120cebf078f09bd56 |
| BLAKE2b-256 | 6e18366054fe3275371c05069bf45598781a76866fafde2132a764408496dae3 |