
Chemical RDF file fixer for exports from Reaxys, SciFinder, Deepmatter, etc.



Chemistry RDF fixer / converter

Converts RDF files containing chemistry data exported from SciFinder or Reaxys. Support for Infochem's ICsynth RDFs is a recent addition.
It fixes entries with missing molecule blocks by removing those entries entirely, and corrects some small potential errors (removing certain empty lines, or converting certain tags to uppercase).
The fixed RDF file is saved and also converted to a tab-separated CSV file.
Structures in the CSV are in SMILES format.
Other sources, e.g. MarvinSketch or ChemDraw, should work with these converted files but have not been tested thoroughly enough.
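
To make the idea of "removing entries with a missing molecule block" concrete, here is a minimal sketch. It is not the package's actual implementation. It assumes records start with a $RFMT line and use the V2000 $RXN layout, where the fifth line of the reaction header holds the reactant and product counts in two 3-character fields. The function names are hypothetical.

```python
import re


def split_records(rdf_text):
    """Split RDF text into (file header, list of records starting at $RFMT)."""
    parts = re.split(r"(?m)^(?=\$RFMT)", rdf_text)
    return parts[0], parts[1:]


def is_complete(record):
    """True if the number of $MOL blocks matches the count line after $RXN."""
    lines = record.splitlines()
    try:
        rxn_at = next(i for i, line in enumerate(lines) if line.startswith("$RXN"))
        counts = lines[rxn_at + 4]  # assumed V2000 layout: counts sit 4 lines below $RXN
        declared = int(counts[0:3]) + int(counts[3:6])
    except (StopIteration, IndexError, ValueError):
        return False  # unreadable reaction header: treat the record as broken
    found = sum(1 for line in lines if line.strip() == "$MOL")
    return declared == found


def drop_incomplete(rdf_text):
    """Return (text without incomplete records, number of records removed)."""
    header, records = split_records(rdf_text)
    kept = [record for record in records if is_complete(record)]
    return header + "".join(kept), len(records) - len(kept)
```

Records in other layouts (for example V3000) would be treated as broken by this sketch, so it is only meant to illustrate the principle.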

Why would you need this?

RDF files that contain a missing structure can throw errors in certain programs or even crash them.
Examples are MarvinSketch and MarvinView, which can sometimes handle missing reaction structures and sometimes cannot.
In KNIME, the Erlwood extension "Chemical Reaction File Reader" won't work at all with such files (it has not been re-tested in 2026 whether this has been fixed).

Requirements:

Python >= 3.8.
Windows or Linux. macOS not tested.
Type of installation shouldn't matter (Vanilla/Conda/Mamba/venv).

Installation

Option 1: Install from PyPI (simplest)

Install to your existing environment:

pip install rdf-fixer
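
To confirm that the install landed in the environment you expect, a small check with the standard library can help. This is a sketch: the helper name is hypothetical, and the distribution name "rdf-fixer" is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("rdf-fixer"))
```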

Option 2: Install from source (for development)

If you downloaded/cloned the code for development:

# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install the package in development mode
pip install -e .

Or simply:

pip install .

Optional: Jupyter notebook

This is only if you want to run the .ipynb file from your browser:

pip install notebook

Note: The setup.py is used for package installation (PyPI releases or local pip install .). For setting up a development environment with dependencies, use requirements.txt.

Importing the module:

from rdfmodule import rdf_fixer

Usage:

To fix a single RDF file, or a whole folder containing multiple RDF files:

rdf_fixer.fix("file or directory name")

An optional flag (True/False, default True) controls whether CSV files are created as well. To skip CSV creation, set flag to False.

rdf_fixer.fix("file or directory name", flag=False)
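
When CSV creation is left on, the tab-separated output can be read back with Python's csv module. The sketch below uses an in-memory sample; the column names "reactant" and "product" are hypothetical, since the real header and layout depend on the source RDF.

```python
import csv
import io


def read_tab_separated(handle):
    """Return (header, rows) from an open tab-separated text file."""
    rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Hypothetical two-column sample; real column names and layout may differ.
sample = io.StringIO("reactant\tproduct\nCCO\tCC=O\n")
header, rows = read_tab_separated(sample)
print(header, rows)
```

For a file on disk, pass a handle opened with newline="" so the csv module handles line endings itself; the appropriate encoding is an assumption to verify against your own export.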

For example, use the enclosed example script or Jupyter notebook:

convert_example.py "./filename.rdf" for single file usage (with or without quotes)
convert_example.py /directory/ for RDF files in directory including all subdirectories
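
Before running on a large directory tree, it can be useful to preview which files a recursive run might cover. The following is a sketch with a hypothetical helper. Matching the ".rdf" extension case-insensitively is an assumption and may not mirror how the package selects files.

```python
from pathlib import Path


def find_rdf_files(target):
    """Return RDF files under `target` (a file or a directory), sorted by path."""
    path = Path(target)
    if path.is_file():
        return [path] if path.suffix.lower() == ".rdf" else []
    # rglob walks the directory and all of its subdirectories
    return sorted(
        p for p in path.rglob("*") if p.is_file() and p.suffix.lower() == ".rdf"
    )
```

Calling find_rdf_files("/directory/") returns a list of Path objects that can be inspected or counted before handing the same directory to the fixer.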

Testing

The testfiles folder contains three RDF files for a quick test; the SciFinder one, for example, contains an erroneous (i.e. missing) structure. Please note that copyright for the enclosed test data lies with the respective companies (see also the License section).

Notes:

The parsing is by no means perfect, though a best effort was made. Suggestions for changes are welcome; please submit an issue or create your own fork.
Converting the current function(s) into a class has been abandoned: there is little point, since nothing needs to persist in the way the code is applied here.

Update history

See the "VERSIONS.md" readme file.

License

Independent of the code and its license, the test files provided must not be redistributed beyond one's own initial testing.
The copyright for the data in these files lies with the providers (Deepmatter/Infochem, ACS, Elsevier Life Sciences IP Limited) and not with the author or anyone reusing or changing this code.
For the code: Copyright (c) 2021-2024 DocMinus, MIT License (see also the LICENSE file).
If you add a shout-out to your code, I don't mind!
