Chemical RDF file fixer for exports from Reaxys, SciFinder, Deepmatter, etc.

Chemistry RDF fixer / converter

Fixes and converts RDF files containing chemical reactions exported from SciFinder or Reaxys. Support for Infochem's ICsynth RDFs is a recent addition.
Entries with a missing molecule block are removed entirely, and some potential small errors are fixed (certain empty lines are removed, and certain tags are converted to uppercase).
The fixed RDF file is saved and is also converted to a tab-separated CSV file.
Structures in the CSV file are in SMILES format.
Files from other sources, e.g. MarvinSketch or ChemDraw, should also work, but this has not been tested thoroughly.
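
To illustrate the idea of dropping entries with a missing molecule block, here is a minimal sketch. It is not the package's actual implementation. It assumes V2000-style reaction records, where each record starts with a `$RFMT` line and the fifth line of the `$RXN` block holds the reactant and product counts, and it assumes that a "missing" structure shows up as fewer `$MOL` sections than those counts declare. The function names and the sample data are hypothetical.

```python
import re


def split_records(rdf_text):
    """Split RDF text into (file header, list of records starting at $RFMT)."""
    parts = re.split(r"(?m)^(?=\$RFMT)", rdf_text)
    return parts[0], parts[1:]


def has_all_molecules(record):
    """Return True if the number of $MOL sections matches the RXN counts line."""
    lines = record.splitlines()
    try:
        rxn_at = next(i for i, ln in enumerate(lines) if ln.startswith("$RXN"))
        counts = lines[rxn_at + 4]  # "rrrppp": reactant and product counts
        expected = int(counts[0:3]) + int(counts[3:6])
    except (StopIteration, IndexError, ValueError):
        return False  # no usable reaction header: treat the record as broken
    found = sum(1 for ln in lines if ln.startswith("$MOL"))
    return found == expected


# Hypothetical, heavily shortened sample: the second record lacks its product.
SAMPLE = "\n".join([
    "$RDFILE 1",
    "$DATM 01/01/24 00:00",
    "$RFMT $RIREG 1",
    "$RXN", "", "  demo", "", "  1  1",
    "$MOL", "(reactant molblock)",
    "$MOL", "(product molblock)",
    "$RFMT $RIREG 2",
    "$RXN", "", "  demo", "", "  1  1",
    "$MOL", "(reactant molblock only)",
    "",
])

header, records = split_records(SAMPLE)
kept = [r for r in records if has_all_molecules(r)]
fixed = header + "".join(kept)
print(f"kept {len(kept)} of {len(records)} records")
```

Removing the whole record, rather than trying to repair it, keeps the remaining file consistent for readers that expect every reaction to be complete.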

Why would you need this?

RDF files with a missing structure can cause errors in certain programs or even make them crash.
Examples are MarvinSketch and MarvinView, which are sometimes able to handle missing reaction structures and sometimes not.
In KNIME, the Erlwood extension "Chemical Reaction File Reader" won't work at all.

Requirements:

Python >= 3.8.
Windows or Linux; macOS has not been tested.
The type of installation shouldn't matter (vanilla/Conda/Mamba/venv).

Installation

Simplest: pip install (from PyPI)

pip install rdf-fixer

"Manually"

If you downloaded/cloned the code:

pip install .

Yet another way, directly from the repository:

python -m pip install git+https://github.com/DocMinus/chem-rdf-fixer.git

Optional: Jupyter notebook

This is only needed if you want to run the .ipynb file in your browser:

conda install -c anaconda jupyter

Importing the module:

from rdfmodule import rdf_fixer

Usage:

To fix a single RDF file, or a whole folder containing multiple RDF files:

rdf_fixer.fix("file or directory name")

An optional flag (True/False, default True) controls whether CSV files are created as well. To skip CSV creation, set the flag to False.

rdf_fixer.fix("file or directory name", flag=False)
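
When CSV creation is left enabled, the tab-separated output can be read with Python's standard csv module. The following is a minimal sketch: the column names and the reaction SMILES in the sample are hypothetical stand-ins, and the real files may be laid out differently.

```python
import csv
import io

# Hypothetical stand-in for a generated CSV file; real column names may differ.
# With a real file, use open(path, newline="") instead of io.StringIO.
sample = io.StringIO("id\tsmiles\n1\tCCO>>CC=O\n2\tc1ccccc1>>C1CCCCC1\n")

# The delimiter must be a tab, since the output is tab-separated.
rows = list(csv.reader(sample, delimiter="\t"))
header, records = rows[0], rows[1:]

for record in records:
    print(dict(zip(header, record)))
```

Passing the tab delimiter explicitly matters because SMILES strings can themselves contain commas in some notations, which a comma-based reader would split incorrectly.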

For example, use the enclosed example script or Jupyter notebook:

convert_example.py "./filename.rdf" for single file usage (with or without quotes)
convert_example.py /directory/ for RDF files in directory including all subdirectories
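
To preview which files a directory run would cover, a small helper like the one below can list them first. This is a sketch and not part of the package: the function name is hypothetical, and it assumes that files are recognized by a case-insensitive `.rdf` extension, which may not match how the package selects files.

```python
from pathlib import Path


def collect_rdf_files(target):
    """Return a single file as a one-item list, or all .rdf files below a directory."""
    path = Path(target)
    if path.is_file():
        return [path]
    # rglob walks the directory and all of its subdirectories.
    return sorted(
        p for p in path.rglob("*")
        if p.is_file() and p.suffix.lower() == ".rdf"
    )
```

For example, `collect_rdf_files("./testfiles")` would return the RDF files found in that folder and below it.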

Testing

The testfiles folder contains three RDF files for a quick test; the SciFinder one, for example, contains an erroneous (i.e. missing) structure. Please note that copyright for the enclosed test data lies with the respective companies (see also License section).

Notes:

The parsing is by no means perfect, though a best effort was made. Suggestions for changes are welcome; please submit an issue or create your own fork.
Converting the current function(s) into a class has been abandoned. There is no real point, since nothing needs to persist between calls the way the code is applied here.

Update history

See the "VERSIONS.md" readme file.

License

Independent of the code and its license, the test files provided are not to be distributed further; they are intended only for one's own initial testing.
The copyright for the data for these files lies with the providers (Deepmatter/Infochem, ACS, Elsevier Life Sciences IP Limited) and not with the author or anyone reusing/changing this code.
For the code section: Copyright (c) 2021-2024 DocMinus, MIT License (see also LICENSE file).
If you add a shout-out to your code, I don't mind!
