Chemical rdf file fixer for exports from Reaxys, SciFinder, Deepmatter, etc.

Project description

Chemistry RDF fixer / converter

Converts RDF files containing chemistry data that stem from SciFinder or Reaxys. A new addition is the support for Infochem's ICsynth RDFs.
It fixes missing molecule blocks by removing the corresponding entries entirely, and it corrects some potential small errors (removing certain empty lines, or using uppercase for certain tags).
The resulting fixed RDF file is saved and is also converted to a tab-separated CSV file.
Structures in the CSV are in SMILES format.
Other sources, e.g. MarvinSketch or ChemDraw, should work with these converted files but have not been tested thoroughly enough.
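
To illustrate what "removing entries with a missing molecule block" can mean, here is a minimal sketch. It is not the package's actual logic. It assumes that each record starts at a `$RFMT` line, that the reactant and product counts sit on the fifth line of the `$RXN` block (as in the MDL RXN layout), and that a missing structure shows up as fewer `$MOL` blocks than declared. Real exports may signal a missing structure differently. The names `split_records`, `has_all_molecules` and `SAMPLE` are hypothetical.

```python
from typing import List


def split_records(rdf_text: str) -> List[List[str]]:
    """Split RDF text into records, each starting at a $RFMT line."""
    records: List[List[str]] = []
    current = None
    for line in rdf_text.splitlines():
        if line.startswith("$RFMT"):
            current = [line]
            records.append(current)
        elif current is not None:
            # Lines before the first $RFMT (file header) are skipped.
            current.append(line)
    return records


def has_all_molecules(record: List[str]) -> bool:
    """Compare the declared reactant + product count with the $MOL blocks found."""
    try:
        rxn_index = next(i for i, l in enumerate(record) if l.startswith("$RXN"))
        counts_line = record[rxn_index + 4]  # assumed position of the counts line
        declared = int(counts_line[0:3]) + int(counts_line[3:6])
    except (StopIteration, IndexError, ValueError):
        return False  # treat unparsable records as broken
    found = sum(1 for l in record if l.startswith("$MOL"))
    return found == declared


# Tiny made-up sample: record 2 declares two molecules but contains only one.
SAMPLE = "\n".join([
    "$RDFILE 1",
    "$DATM 01/01/24 12:00",
    "$RFMT $RIREG 1",
    "$RXN", "", "  sketch", "", "  1  1",
    "$MOL", "(reactant molblock omitted)",
    "$MOL", "(product molblock omitted)",
    "$RFMT $RIREG 2",
    "$RXN", "", "  sketch", "", "  1  1",
    "$MOL", "(reactant molblock omitted)",
])

records = split_records(SAMPLE)
kept = [r for r in records if has_all_molecules(r)]
print(f"{len(kept)} of {len(records)} records kept")
```

Dropping the whole record, rather than trying to repair it, mirrors the approach described above: downstream readers then only see reactions with complete structures.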

Why would you need this?

Because RDF files that contain a missing structure might throw errors in certain programs or even make them crash.
Examples are MarvinSketch and MarvinView: sometimes they can handle missing reaction structures, sometimes not.
In KNIME, the Erlwood extension "Chemical Reaction File Reader" won't work at all with such files.

Requirements:

Python >= 3.8.
Windows or Linux. macOS not tested.
Type of installation shouldn't matter (Vanilla/Conda/Mamba/venv).

Installation

Simplest: pip install (from PyPI)

pip install rdf-fixer

"Manually"

If you downloaded/cloned the code:

pip install .

Yet another way, directly from the repository:

python -m pip install git+https://github.com/DocMinus/chem-rdf-fixer.git

Optional: Jupyter notebook

This is only if you want to run the .ipynb file from your browser:

conda install -c anaconda jupyter

Importing the module:

from rdfmodule import rdf_fixer

Usage:

To fix a single RDF file, or a whole folder containing multiple RDF files:

rdf_fixer.fix("file or directory name")

An optional flag (True/False, default True) controls whether CSV files are created as well. To skip CSV creation, set the flag to False.

rdf_fixer.fix("file or directory name", flag=False)
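
When CSV creation is left on, the tab-separated output can be read with Python's standard `csv` module. The sketch below uses an in-memory sample instead of a real output file. The column names (`ReactionID`, `ReactantSMILES`, `ProductSMILES`) are assumptions for illustration only; check the header of the file you actually get.

```python
import csv
import io

# Made-up stand-in for a converted file; real column names may differ.
SAMPLE_TSV = (
    "ReactionID\tReactantSMILES\tProductSMILES\n"
    "1\tCCO\tCC=O\n"
    "2\tc1ccccc1Br\tc1ccccc1O\n"
)


def read_reactions(handle):
    """Read tab-separated rows into a list of dicts keyed by the header line."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_reactions(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["ReactionID"], row["ReactantSMILES"], ">>", row["ProductSMILES"])
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` object.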

Implement e.g. via the enclosed example script or Jupyter Notebook:

convert_example.py "./filename.rdf" for single file usage (with or without quotes)
convert_example.py /directory/ for RDF files in directory including all subdirectories
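
If you want to see which files a directory run would touch before fixing anything, a recursive search can be sketched as follows. This is an assumption about how such a search might look, not the code of `convert_example.py`; the helper name `collect_rdf_files` is hypothetical, and the case-insensitive match on the `.rdf` suffix is a design choice of this sketch.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_rdf_files(target: str) -> List[Path]:
    """Return the file itself, or all .rdf files below a directory (recursively)."""
    path = Path(target)
    if path.is_file():
        return [path]
    if path.is_dir():
        return sorted(
            p for p in path.rglob("*")
            if p.is_file() and p.suffix.lower() == ".rdf"
        )
    raise FileNotFoundError(f"No such file or directory: {target}")


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.rdf").write_text("$RDFILE 1\n")
    (root / "sub" / "b.RDF").write_text("$RDFILE 1\n")
    (root / "notes.txt").write_text("not an RDF file\n")

    found_names = [p.name for p in collect_rdf_files(tmp)]
    single_names = [p.name for p in collect_rdf_files(str(root / "a.rdf"))]

print(found_names)
print(single_names)
```

Each path returned this way could then be passed on to `rdf_fixer.fix`, although `fix` already accepts a directory name directly.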

Testing

The testfiles folder contains three RDF files for a quick test; the SciFinder one, for example, contains an erroneous (i.e. missing) structure. Please note that copyright for the enclosed test data lies with the respective companies (see also License section).

Notes:

The parsing is by no means perfect, though a best effort was made. Suggestions for changes are welcome; please submit an issue or fork the repository.
Converting the current function(s) into a class has been abandoned. There is little point, since nothing has to be persistent the way the module is applied here.

Update history

See the "VERSIONS.md" readme file.

License

Regardless of the code and its license, the provided test files are not to be distributed further; they are intended only for one's initial testing.
The copyright for the data for these files lies with the providers (Deepmatter/Infochem, ACS, Elsevier Life Sciences IP Limited) and not with the author or anyone reusing/changing this code.
For the code section: Copyright (c) 2021-2024 DocMinus, MIT License (see also LICENSE file).
If you add a shout-out to your code, I don't mind!
