
Chemistry-related utilities


RXN chemistry utilities package


This repository contains various chemistry-related Python utilities used in the RXN universe. For general utilities not related to chemistry, see our other repository rxn-utilities.


System Requirements

This package is supported on all operating systems. It has been tested on the following systems:

  • macOS: Big Sur (11.1)
  • Linux: Ubuntu 18.04.4

A Python version of 3.7 or greater is recommended.

Installation guide

The package can be installed from PyPI:

pip install rxn-chem-utils

For local development, the package can be installed with:

pip install -e .[dev]

The RDKit dependency is not installed automatically and can be installed via Conda or PyPI:

# Install RDKit from Conda
conda install -c conda-forge rdkit

# Install RDKit from PyPI
pip install rdkit
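
Because RDKit is not pulled in automatically, it can help to check for it before calling the conversion functions. The helpers below are a minimal, hypothetical sketch (rdkit_available and require_rdkit are not part of rxn-chem-utils); they only use the standard library to look up whether the rdkit module is importable.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the top-level ``rdkit`` package can be found on the import path."""
    # For a top-level name, find_spec returns None when the module is missing,
    # without executing the package itself.
    return importlib.util.find_spec("rdkit") is not None


def require_rdkit() -> None:
    """Raise an ImportError with installation hints if RDKit is missing.

    Hypothetical helper, not part of rxn-chem-utils.
    """
    if not rdkit_available():
        raise ImportError(
            "RDKit is required but not installed; install it with "
            "'conda install -c conda-forge rdkit' or 'pip install rdkit'."
        )
```

Calling require_rdkit() once at the start of a script turns a later, less obvious import failure into an early error that already names both installation routes.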

Package highlights

Convert between compound representations

The package provides functions to convert between SMILES, RDKit Mol objects, MDL, InChI, etc. All of them work in a similar way:

>>> from rxn.chemutils.conversion import smiles_to_mol, mol_to_smiles
>>> mol = smiles_to_mol("CO(C)")
>>> mol_to_smiles(mol)
'COC'

The functions raise exceptions on failure, and can also be used without sanitization:

>>> mol = smiles_to_mol("CFC")
Traceback (most recent call last):
[...]
rxn.chemutils.exceptions.InvalidSmiles: "CFC" is not a valid SMILES string
>>> mol = smiles_to_mol("CFC", sanitize=False)
>>> mol_to_smiles(mol)
'CFC'
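
When a pipeline should keep going even for SMILES that fail sanitization, the two calls above can be combined. A minimal sketch, assuming only what the examples show: a converter that accepts a sanitize keyword and raises a given exception type on failure (in this package, smiles_to_mol and InvalidSmiles). The helper name convert_with_fallback is hypothetical.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def convert_with_fallback(
    smiles: str,
    converter: Callable[..., T],
    error_type: Type[Exception],
) -> Tuple[T, bool]:
    """Convert with sanitization first; on failure, retry without it.

    Returns the converted object and whether the sanitized conversion succeeded.
    If the unsanitized call also fails (e.g. unparsable SMILES), its error propagates.
    """
    try:
        return converter(smiles, sanitize=True), True
    except error_type:
        # Sanitization (e.g. the valence check) failed: retry without it.
        return converter(smiles, sanitize=False), False
```

Following the examples above, convert_with_fallback("CFC", smiles_to_mol, InvalidSmiles) would be expected to return an unsanitized molecule together with False. Passing the converter and exception type as arguments keeps the helper independent of RDKit, so it can be tested with a stand-in converter.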

Reaction SMILES

The package supports different kinds of reaction SMILES, which are stored internally as ReactionEquation objects.

A few functions are provided to convert to and from ReactionEquation, for example:

>>> from rxn.chemutils.reaction_smiles import ReactionFormat, determine_format, parse_reaction_smiles, to_reaction_smiles, parse_any_reaction_smiles
>>> rxn_smiles = "CC.O.[Na+]~[Cl-]>>CCO"
>>> determine_format(rxn_smiles)
<ReactionFormat.STANDARD_WITH_TILDE: 3>
>>> parse_reaction_smiles(rxn_smiles, ReactionFormat.STANDARD_WITH_TILDE)
ReactionEquation(reactants=['CC', 'O', '[Na+].[Cl-]'], agents=[], products=['CCO'])
>>> parse_any_reaction_smiles(rxn_smiles)
ReactionEquation(reactants=['CC', 'O', '[Na+].[Cl-]'], agents=[], products=['CCO'])
>>> to_reaction_smiles(parse_any_reaction_smiles(rxn_smiles), ReactionFormat.EXTENDED)
'CC.O.[Na+].[Cl-]>>CCO |f:2.3|'
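
In the extended format, the |f:2.3| suffix is a ChemAxon Extended SMILES (CXSMILES) fragment grouping: fragments are numbered from 0 in the order they appear in the whole reaction, and each dot-separated group of indices marks fragments that form one compound (here [Na+] and [Cl-]). The stdlib-only sketch below shows how such a string can be assembled; to_extended_reaction_smiles is a hypothetical name, not the package's implementation, and it assumes multi-fragment compounds are given with "." between their fragments, as in the ReactionEquation above.

```python
from typing import List, Sequence


def to_extended_reaction_smiles(
    reactants: Sequence[str], agents: Sequence[str], products: Sequence[str]
) -> str:
    """Toy sketch: build an extended reaction SMILES with a |f:...| fragment block.

    Fragments are numbered from 0 across reactants, agents and products; every
    compound made of several dot-separated fragments becomes one index group.
    """
    groups: List[str] = []
    sides: List[str] = []
    index = 0  # running fragment index over the whole reaction
    for compounds in (reactants, agents, products):
        side_fragments: List[str] = []
        for compound in compounds:
            fragments = compound.split(".")
            if len(fragments) > 1:
                groups.append(".".join(str(index + i) for i in range(len(fragments))))
            side_fragments.extend(fragments)
            index += len(fragments)
        sides.append(".".join(side_fragments))
    smiles = ">".join(sides)
    if groups:
        smiles += " |f:" + ",".join(groups) + "|"
    return smiles
```

Calling to_extended_reaction_smiles(["CC", "O", "[Na+].[Cl-]"], [], ["CCO"]) returns 'CC.O.[Na+].[Cl-]>>CCO |f:2.3|', the same string as the to_reaction_smiles example above.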

Multicomponent SMILES

Sometimes, it is necessary to represent multiple compounds as one single SMILES string. For fragments / ions, it then becomes necessary to distinguish which parts belong together as one compound and which are different compounds. In such "multicomponent SMILES", we typically use tildes, ~, to indicate that different SMILES fragments belong to the same compound.

>>> from rxn.chemutils.multicomponent_smiles import multicomponent_smiles_to_list, list_to_multicomponent_smiles
>>> list_to_multicomponent_smiles(["CC", "[Na+].[Cl-]"], fragment_bond="~")
'CC.[Na+]~[Cl-]'
>>> multicomponent_smiles_to_list('CC.[Na+]~[Cl-]', fragment_bond="~")
['CC', '[Na+].[Cl-]']
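
The round trip above only needs string operations: within a compound, the dots between fragments are replaced by the fragment bond, and compounds are then joined with ordinary dots. A toy sketch of that idea (the toy_ function names are hypothetical, and the package functions may handle more edge cases); it assumes the fragment bond character never occurs inside a compound's SMILES.

```python
from typing import List


def toy_list_to_multicomponent_smiles(compounds: List[str], fragment_bond: str = "~") -> str:
    """Join compounds with '.', writing the dots inside each compound as the fragment bond."""
    return ".".join(compound.replace(".", fragment_bond) for compound in compounds)


def toy_multicomponent_smiles_to_list(smiles: str, fragment_bond: str = "~") -> List[str]:
    """Split on '.', then turn fragment bonds back into ordinary dots."""
    return [part.replace(fragment_bond, ".") for part in smiles.split(".")]
```

The two functions are inverses of each other under that assumption, which is why the tilde is a convenient choice: it does not appear in ordinary compound SMILES.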

Canonicalization

Canonicalization of compounds, with the option to disable the valence check:

>>> from rxn.chemutils.conversion import canonicalize_smiles
>>> canonicalize_smiles("CC(O)")
'CCO'
>>> canonicalize_smiles("ABCD")  # Invalid SMILES
Traceback (most recent call last):
[...]
rxn.chemutils.exceptions.InvalidSmiles: "ABCD" is not a valid SMILES string
>>> canonicalize_smiles("CF(C)")  # Invalid valence, fails by default
Traceback (most recent call last):
[...]
rxn.chemutils.exceptions.InvalidSmiles: "CF(C)" is not a valid SMILES string
>>> canonicalize_smiles("CF(C)", check_valence=False)  # Invalid valence, does not fail
'CFC'
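
Because equivalent inputs such as "CC(O)" and "OCC" map to the same canonical string, canonicalization is a convenient key for removing duplicate compounds from a dataset. A minimal sketch, where deduplicate_compounds is a hypothetical helper and the canonicalizer is passed in (for example canonicalize_smiles, or functools.partial(canonicalize_smiles, check_valence=False) to tolerate invalid valences):

```python
from typing import Callable, Iterable, List, Set


def deduplicate_compounds(
    smiles_list: Iterable[str], canonicalize: Callable[[str], str]
) -> List[str]:
    """Return the canonical SMILES of unique compounds, in order of first appearance."""
    seen: Set[str] = set()
    unique: List[str] = []
    for smiles in smiles_list:
        # The canonicalizer may raise for invalid input, as canonicalize_smiles does.
        key = canonicalize(smiles)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

With canonicalize_smiles as the canonicalizer, ["CC(O)", "OCC", "CO"] would reduce to ['CCO', 'CO'].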

Canonicalization of any kind of SMILES (single compounds, multicomponent SMILES, reaction SMILES, etc.), again with the option to disable the valence check. Note that the result is returned in the same format as the input.

>>> from rxn.chemutils.miscellaneous import canonicalize_any
>>> canonicalize_any("[Na+].[Cl-]")
'[Cl-].[Na+]'
>>> canonicalize_any("OC.C(O)~CF(C)", check_valence=False)
'CO.CFC~CO'
>>> canonicalize_any("CC(C)>C(O)>C(O)")
'CCC>CO>CO'
>>> canonicalize_any("CO.O.C>>C(O) |f:1.2|")
'CO.C.O>>CO |f:1.2|'
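
For plain reaction SMILES, the behaviour illustrated above (each molecule canonicalized, with molecule order and the reactants>agents>products layout kept) can be approximated in a few lines. This is a simplified sketch, not how canonicalize_any is implemented: canonicalize_reaction_parts is a hypothetical name, the compound canonicalizer is injected (e.g. canonicalize_smiles), and "~" fragment bonds and |f:...| groupings are ignored.

```python
from typing import Callable


def canonicalize_reaction_parts(rxn_smiles: str, canonicalize: Callable[[str], str]) -> str:
    """Canonicalize each molecule of a standard "reactants>agents>products" SMILES.

    Molecule order and the three-part layout are preserved. Every dot-separated
    fragment is treated as its own molecule.
    """
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"Expected 'reactants>agents>products', got {rxn_smiles!r}")
    canonical_parts = [
        # An empty part (e.g. no agents) stays empty.
        ".".join(canonicalize(molecule) for molecule in part.split(".")) if part else ""
        for part in parts
    ]
    return ">".join(canonical_parts)
```

With canonicalize_smiles as the canonicalizer, "CC(C)>C(O)>C(O)" should give 'CCC>CO>CO', matching the canonicalize_any example above.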

The executable rxn-canonicalize, installed with the package, works either on files or on stdin:

rxn-canonicalize --help

Augmentation

See smiles_randomization.py and smiles_augmenter.py for the augmentation of compound SMILES and reaction SMILES strings.
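
One simple, chemistry-agnostic way to augment reaction data is to permute the order of molecules within each part of a reaction SMILES, since that order does not change the reaction. The sketch below illustrates the idea with the standard library only; shuffle_reaction_molecules is a hypothetical name and is not taken from smiles_augmenter.py.

```python
import random
from typing import List


def shuffle_reaction_molecules(rxn_smiles: str, rng: random.Random) -> str:
    """Randomly permute the molecules inside reactants, agents and products.

    Only the order of dot-separated entries changes; the molecules themselves are
    untouched. Multi-fragment compounds should be written with '~' (tilde format)
    so that their fragments move together.
    """
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"Expected 'reactants>agents>products', got {rxn_smiles!r}")
    shuffled_parts: List[str] = []
    for part in parts:
        molecules = part.split(".") if part else []
        rng.shuffle(molecules)  # in-place permutation driven by the given RNG
        shuffled_parts.append(".".join(molecules))
    return ">".join(shuffled_parts)
```

Passing an explicit random.Random instance makes the augmentation reproducible: the same seed always yields the same permutation.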

Others

Without going into details, the package also does the following:

  • Tokenization and detokenization of SMILES strings in tokenization.py, and the executables rxn-tokenize and rxn-detokenize.
  • Easy combination of precursor SMILES and product SMILES into a reaction SMILES with the ReactionCombiner, and the executable rxn-combine-reaction.
  • Parsing of RDF files into reaction SMILES, in several modules and with the executable rxn-rdf-to-smiles.
  • ... and many others.
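
SMILES tokenization is commonly done with a regular expression that keeps bracket atoms, two-letter halogens (Br, Cl), ring-closure labels and bond symbols as single tokens. The following sketch uses such a regex, which may differ from the one in tokenization.py; tokenize_smiles and detokenize_smiles are hypothetical names, and the tokenized form is assumed to be space-separated.

```python
import re
from typing import List

# One capturing group, so findall returns the matched tokens themselves.
SMILES_TOKEN_REGEX = re.compile(
    r"(\%\([0-9]{3}\)|\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\||\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a (reaction) SMILES into tokens; bracket atoms such as [Na+] stay whole."""
    tokens = SMILES_TOKEN_REGEX.findall(smiles)
    # findall silently skips unmatched characters, so check that nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize {smiles!r} losslessly")
    return tokens


def detokenize_smiles(tokenized: str) -> str:
    """Undo a space-separated tokenization by removing the whitespace."""
    return "".join(tokenized.split())
```

The lossless check matters in practice: without it, an unsupported character would simply disappear from the token list instead of raising an error.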
