RXN insight package

Rxn-INSIGHT: Fast Chemical Reaction Analysis Using Bond-Electron Matrices

Rxn-INSIGHT is an open-source algorithm, written in Python, that classifies and names chemical reactions and suggests reaction conditions based on similarity and popularity.

1. Installation

Rxn-INSIGHT relies on NumPy, Pandas, RDKit, RDChiral, and RXNMapper.

A virtual environment can be created with Conda as follows:

conda create -n rxn-insight python=3.10
conda activate rxn-insight

Option 1: Installing via PyPI:

pip install rxn-insight

Option 2: Installing directly from source:

git clone https://github.com/mrodobbe/Rxn-INSIGHT.git
cd Rxn-INSIGHT
pip install .

For development, install in editable mode with the optional dependencies, which are required to run the tests and build the docs:

pip install -e ".[test,doc]"

All test environments can be run with the command tox from the top-level directory. Individual test environments can be run with the -e flag, as in tox -e env-name. The following commands run the tests, the tests with a coverage report, the style checks, and the docs build, respectively:

tox -e py3
tox -e py3-coverage
tox -e style
tox -e docs

2. Usage

Basic Usage

from rxn_insight.reaction import Reaction
r = "c1ccccc1I.C=CC(=O)OC>>COC(=O)/C=C/c1ccccc1"  # Define a Reaction SMILES identifier
rxn = Reaction(r)
ri = rxn.get_reaction_info()

get_reaction_info() returns a dictionary with most of the analysis results:

{'REACTION': 'C=CC(=O)OC.Ic1ccccc1>>COC(=O)/C=C/c1ccccc1', 
 'MAPPED_REACTION': '[CH3:1][O:2][C:3](=[O:4])[CH:5]=[CH2:6].I[c:7]1[cH:8][cH:9][cH:10][cH:11][cH:12]1>>[CH3:1][O:2][C:3](=[O:4])/[CH:5]=[CH:6]/[c:7]1[cH:8][cH:9][cH:10][cH:11][cH:12]1', 
 'N_REACTANTS': 2, 
 'N_PRODUCTS': 1, 
 'FG_REACTANTS': ('Aromatic halide', 'Vinyl'), 
 'FG_PRODUCTS': (), 
 'PARTICIPATING_RINGS_REACTANTS': ('c1ccccc1',), 
 'PARTICIPATING_RINGS_PRODUCTS': ('c1ccccc1',), 
 'ALL_RINGS_PRODUCTS': ('c1ccccc1',), 
 'BY-PRODUCTS': ('HI',), 
 'CLASS': 'C-C Coupling', 
 'TAG': '55becfded1a3842d5a03bbf3e1610411c659aff0806930400c4db2ef61f9c87f', 
 'SOLVENT': ('',), 
 'REAGENT': ('',), 
 'CATALYST': ('',), 
 'REF': '', 
 'NAME': 'Heck terminal vinyl', 
 'SCAFFOLD': 'c1ccccc1'}
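
The returned dictionary can be read like any other Python dict. The following sketch uses a hand-copied subset of the output above instead of calling the library, so it runs without Rxn-INSIGHT installed. The helper summarize is hypothetical and not part of the package.

```python
# Subset of the dictionary shown above, copied by hand for illustration.
ri = {
    "REACTION": "C=CC(=O)OC.Ic1ccccc1>>COC(=O)/C=C/c1ccccc1",
    "N_REACTANTS": 2,
    "N_PRODUCTS": 1,
    "FG_REACTANTS": ("Aromatic halide", "Vinyl"),
    "BY-PRODUCTS": ("HI",),
    "CLASS": "C-C Coupling",
    "NAME": "Heck terminal vinyl",
}


def summarize(info: dict) -> str:
    """Hypothetical helper: build a one-line summary from a reaction info dict."""
    # Empty tuples (as for FG_PRODUCTS above) are reported as "none".
    groups = ", ".join(info["FG_REACTANTS"]) or "none"
    byproducts = ", ".join(info["BY-PRODUCTS"]) or "none"
    return (
        f"{info['NAME']} ({info['CLASS']}): "
        f"{info['N_REACTANTS']} reactant(s) -> {info['N_PRODUCTS']} product(s); "
        f"functional groups in reactants: {groups}; by-products: {byproducts}"
    )


summary = summarize(ri)
print(summary)
```

In real use, ri would come from rxn.get_reaction_info() as in the Basic Usage snippet.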

Similarity Search

A similarity search can be performed when a database of reactions is provided as a Pandas DataFrame (df in this case). The neighbors are returned as another Pandas DataFrame.

df_nbs = rxn.find_neighbors(df, fp="MACCS", concatenate=True, threshold=0.5, broaden=True, full_search=False)
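
The threshold argument sets a minimum similarity for a database reaction to count as a neighbor. The sketch below illustrates that idea with the Tanimoto coefficient on sets of switched-on fingerprint bits. It is an assumption that this metric matches what find_neighbors computes internally. The bit sets, the reaction labels, and the helpers tanimoto and filter_neighbors are invented for illustration and are not part of Rxn-INSIGHT.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by all distinct on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_neighbors(
    query: set[int], database: dict[str, set[int]], threshold: float
) -> list[tuple[str, float]]:
    """Keep entries whose similarity to the query is at least `threshold`,
    sorted from most to least similar."""
    scored = [(name, tanimoto(query, bits)) for name, bits in database.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy fingerprints: each set holds the indices of the bits that are on.
query_bits = {1, 4, 7, 9}
toy_database = {
    "rxn_a": {1, 4, 7, 9, 12},  # 4 shared / 5 distinct = 0.80
    "rxn_b": {1, 4, 20, 21},    # 2 shared / 6 distinct = 0.33
    "rxn_c": {4, 7, 9},         # 3 shared / 4 distinct = 0.75
}

neighbors = filter_neighbors(query_bits, toy_database, threshold=0.5)
print(neighbors)
```

With threshold=0.5, rxn_a and rxn_c are kept and rxn_b is dropped. Raising the threshold narrows the result to closer matches.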

Condition Suggestion

Reaction conditions can be suggested when a Pandas DataFrame is provided. The suggestions are then available as attributes of the Reaction object.

rxn.suggest_conditions(df)
suggested_solvents = rxn.suggested_solvent
suggested_catalysts = rxn.suggested_catalyst
suggested_reagents = rxn.suggested_reagent
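
As a rough illustration of what ranking by popularity can mean, the sketch below counts how often each solvent occurs among a set of neighboring reactions. This is not the library's implementation: the helper rank_by_popularity and the solvent entries are invented, and the actual scoring in suggest_conditions may combine similarity and popularity differently.

```python
from collections import Counter


def rank_by_popularity(values: list[str]) -> list[tuple[str, int]]:
    """Count each non-empty condition and return them most frequent first."""
    counts = Counter(v for v in values if v)
    return counts.most_common()


# Invented solvent entries for a handful of neighboring reactions.
# The empty string stands for a reaction with no solvent recorded.
neighbor_solvents = ["DMF", "DMF", "MeCN", "", "DMF", "MeCN", "toluene"]

ranking = rank_by_popularity(neighbor_solvents)
print(ranking)
```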

Creating a Rxn-INSIGHT-Compatible Database

To use similarity search or condition suggestion with your own data, you need to create a database that is compatible with Rxn-INSIGHT. The Database module can build one from either a CSV file or a Pandas DataFrame, as in the following example:

from rxn_insight.database import Database
from rxn_insight.reaction import Reaction

db = Database()
db.create_database_from_csv(fname="test.csv", 
                            reaction_column="RXN", 
                            solvent_column="SOLVENTS")
r = Reaction("CCO>>CC=O")
sug_conds = r.suggest_conditions(db.df)

Two arguments are required: fname (the location of your CSV file) and reaction_column (the name of the column containing the reactions). The optional arguments are solvent_column, reagent_column, catalyst_column, yield_column, and ref_column. By default, or when these columns are not available, the column names are set to SOLVENT, REAGENT, CATALYST, YIELD, and REF, respectively.
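
For orientation, here is a minimal sketch of a CSV file whose header matches the reaction_column and solvent_column values used in the example above. It relies only on the standard library. The rows are invented, and the solvent entries are placeholders, not recommended conditions or a statement about the format Rxn-INSIGHT expects for solvents.

```python
import csv
import os
import tempfile

# Invented rows; the column names match the arguments in the example above.
rows = [
    {"RXN": "c1ccccc1I.C=CC(=O)OC>>COC(=O)/C=C/c1ccccc1", "SOLVENTS": "DMF"},
    {"RXN": "CCO>>CC=O", "SOLVENTS": "DCM"},
]

# Write the file into a temporary directory to avoid touching the project.
csv_path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(csv_path, "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["RXN", "SOLVENTS"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm the header that reaction_column and
# solvent_column have to name, and the number of data rows.
with open(csv_path, newline="") as handle:
    reader = csv.DictReader(handle)
    header = reader.fieldnames
    n_rows = sum(1 for _ in reader)

print(header, n_rows)
# With Rxn-INSIGHT installed, the file could then be passed as:
# db.create_database_from_csv(fname=csv_path, reaction_column="RXN",
#                             solvent_column="SOLVENTS")
```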

3. Datasets

The complete USPTO dataset that is analyzed by Rxn-INSIGHT, as described in the manuscript, can be found on Zenodo: https://doi.org/10.5281/zenodo.10171745. The gzip file should be downloaded and placed in the folder data/.

4. Reference

When using Rxn-INSIGHT in your own work, please cite the original publication:

M. R. Dobbelaere, I. Lengyel, C. V. Stevens, and K. M. Van Geem, ‘Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices’, J. Cheminform., vol. 16, no. 1, Mar. 2024.

@ARTICLE{Dobbelaere2024-es,
  title     = "{Rxn-INSIGHT}: fast chemical reaction analysis using
               bond-electron matrices",
  author    = "Dobbelaere, Maarten R and Lengyel, Istv{\'a}n and Stevens,
               Christian V and Van Geem, Kevin M",
  journal   = "J. Cheminform.",
  publisher = "Springer Science and Business Media LLC",
  volume    =  16,
  number    =  1,
  month     =  mar,
  year      =  2024,
  copyright = "https://creativecommons.org/licenses/by/4.0",
  language  = "en"
}
