Reaction preprocessing tools

RXN reaction preprocessing

This repository is devoted to preprocessing chemical reactions: standardization, filtering, etc. It also includes code for stable train/test/validation splits and data augmentation.
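One common way to keep a split stable as a dataset grows is to derive each reaction's split from a hash of the reaction string. The sketch below illustrates that idea only. It is an assumption about what "stable" means here, and the helper name assign_split, the seed, and the way the ratio is applied are hypothetical rather than this package's implementation.

```python
import hashlib


def assign_split(reaction: str, split_ratio: float = 0.05, seed: str = "42") -> str:
    """Deterministically map a reaction string to train/validation/test.

    Hypothetical helper: the hash decides the split, so the same reaction
    always lands in the same split regardless of row order or dataset size.
    """
    digest = hashlib.sha256(f"{seed}:{reaction}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    if value < split_ratio:
        return "validation"
    if value < 2 * split_ratio:
        return "test"
    return "train"


reactions = [
    "CCO>>CC=O",
    "CC(=O)O.CO>>CC(=O)OC",
    "c1ccccc1Br.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1",
]
splits = {rxn: assign_split(rxn) for rxn in reactions}
# Processing the reactions in a different order gives the same assignment.
assert splits == {rxn: assign_split(rxn) for rxn in reversed(reactions)}
```

Because the assignment depends only on the reaction string and the seed, adding new reactions later does not move existing ones between splits.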

System Requirements

This package is supported on all operating systems. It has been tested on the following systems:

macOS: Big Sur (11.1)
Linux: Ubuntu 18.04.4

A Python version of 3.7 or greater is recommended.

Installation guide

The package can be installed from PyPI:

pip install rxn-reaction-preprocessing[rdkit]

You can leave out [rdkit] if you prefer to install RDKit manually (via Conda or PyPI).

For local development, the package can be installed with:

pip install -e ".[dev]"
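To confirm that the installation worked, a short check along the following lines may help. It relies only on the standard library (importlib.metadata, available from Python 3.8). The helper name installed_version is hypothetical, and the import name rdkit is assumed for the optional RDKit dependency.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("rxn-reaction-preprocessing")
# Assumption: RDKit is importable under the top-level module name "rdkit".
has_rdkit = util.find_spec("rdkit") is not None

print(f"rxn-reaction-preprocessing: {version or 'not installed'}")
print(f"rdkit importable: {has_rdkit}")
```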

Usage

The following command line scripts are installed with the package.

rxn-data-pipeline

Wrapper for all other scripts. Allows constructing flexible data pipelines. Entrypoint for Hydra structured configuration.

For an overview of all available configuration parameters and default values, run: rxn-data-pipeline --cfg job.

Configuration using YAML, here in a file named example_config.yaml (see the file config.py for more options and their meaning):

defaults:
  - base_config

data:
  path: /tmp/inference/input.csv
  proc_dir: /tmp/rxn-preproc/exp
common:
  sequence:
    # Define which steps and in which order to execute:
    - IMPORT
    - STANDARDIZE
    - PREPROCESS
    - SPLIT
    - TOKENIZE
  fragment_bond: TILDE
preprocess:
  min_products: 0
split:
  split_ratio: 0.05
tokenize:
  input_output_pairs:
    - inp: ${data.proc_dir}/${data.name}.processed.train.csv
      out: ${data.proc_dir}/${data.name}.processed.train
    - inp: ${data.proc_dir}/${data.name}.processed.validation.csv
      out: ${data.proc_dir}/${data.name}.processed.validation
    - inp: ${data.proc_dir}/${data.name}.processed.test.csv
      out: ${data.proc_dir}/${data.name}.processed.test

rxn-data-pipeline --config-dir . --config-name example_config
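The fragment_bond: TILDE setting in the configuration above concerns how multi-fragment compounds are written in reaction SMILES. The sketch below assumes that TILDE selects the character "~" to join the fragments of one compound (for example a salt), while "." separates distinct compounds. The functions split_compounds and parse_reaction are hypothetical helpers for illustration and are not part of this package.

```python
from typing import List, Tuple

FRAGMENT_BOND = "~"  # assumption: the character that fragment_bond: TILDE selects

Compounds = List[List[str]]


def split_compounds(side: str) -> Compounds:
    """Split one side of a reaction into compounds, each a list of fragments."""
    if not side:
        return []
    return [compound.split(FRAGMENT_BOND) for compound in side.split(".")]


def parse_reaction(rxn: str) -> Tuple[Compounds, Compounds, Compounds]:
    """Parse 'reactants>agents>products' into three lists of compounds."""
    reactants, agents, products = rxn.split(">")
    return split_compounds(reactants), split_compounds(agents), split_compounds(products)


reactants, agents, products = parse_reaction(
    "CC(=O)O.[Na+]~[OH-]>>CC(=O)[O-]~[Na+].O"
)
print(reactants)  # [['CC(=O)O'], ['[Na+]', '[OH-]']]
print(agents)     # []
print(products)   # [['CC(=O)[O-]', '[Na+]'], ['O']]
```

With a plain "." between the ions, sodium hydroxide would be indistinguishable from two separate compounds, which is the ambiguity a dedicated fragment bond avoids.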

Configuration using command line arguments (example):

rxn-data-pipeline \
  data.path=/path/to/data/rxns-small.csv \
  data.proc_dir=/path/to/proc/dir \
  common.fragment_bond=TILDE \
  rxn_import.data_format=TXT \
  tokenize.input_output_pairs.0.out=train.txt \
  tokenize.input_output_pairs.1.out=validation.txt \
  tokenize.input_output_pairs.2.out=test.txt
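The dotted keys in the command above address nested configuration values, and a numeric component such as the 0 in tokenize.input_output_pairs.0.out selects a list element. The sketch below mimics that mapping on plain dictionaries and lists. It is an illustration, not Hydra's implementation: apply_override is a hypothetical helper, values are kept as strings, and the None entries are placeholders rather than the package's defaults.

```python
from typing import Any, Dict


def apply_override(config: Dict[str, Any], override: str) -> None:
    """Apply one 'dotted.key=value' override to a nested config in place.

    Illustration only: every key up to the last component must already exist.
    """
    dotted_key, value = override.split("=", 1)
    *parents, last = dotted_key.split(".")
    node: Any = config
    for part in parents:
        # Numeric components index into lists, all others into dicts.
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


# Placeholder structure; None does not reflect any real default value.
config: Dict[str, Any] = {
    "data": {"path": None, "proc_dir": None},
    "common": {"fragment_bond": None},
    "rxn_import": {"data_format": None},
    "tokenize": {"input_output_pairs": [{"out": None}, {"out": None}, {"out": None}]},
}

overrides = [
    "data.path=/path/to/data/rxns-small.csv",
    "data.proc_dir=/path/to/proc/dir",
    "common.fragment_bond=TILDE",
    "rxn_import.data_format=TXT",
    "tokenize.input_output_pairs.0.out=train.txt",
    "tokenize.input_output_pairs.1.out=validation.txt",
    "tokenize.input_output_pairs.2.out=test.txt",
]
for override in overrides:
    apply_override(config, override)

print(config["tokenize"]["input_output_pairs"])
# [{'out': 'train.txt'}, {'out': 'validation.txt'}, {'out': 'test.txt'}]
```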

Note about reading CSV files

Pandas cannot always re-read a CSV file it has written if the data contains Windows carriage returns. For the scripts to work despite this, every pd.read_csv call should include the argument lineterminator='\n'.
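A small sketch of the recommended call is shown below. The CSV content is made up for illustration: a single data row whose last field contains a bare carriage return. Passing lineterminator='\n' tells the parser to treat only '\n' as the end of a row, so the carriage return stays inside the field.

```python
import io

import pandas as pd

# Hypothetical data: one row, with a carriage return inside the last field.
csv_text = "rxn,comment\nCCO>>CC=O,first\rsecond\n"

# Only '\n' ends a row, as the note above recommends.
df = pd.read_csv(io.StringIO(csv_text), lineterminator="\n")

print(df.shape)                    # (1, 2)
print(repr(df.loc[0, "comment"]))  # 'first\rsecond'
```

The same argument applies when reading from a file path instead of an in-memory buffer.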

Examples

A pipeline supporting augmentation

A config called train-augmentation-config.yaml that augments the training split:

defaults:
  - base_config

data:
  name: pipeline-with-augmentation
  path: /tmp/file-with-reactions.txt
  proc_dir: /tmp/rxn-preprocessing/experiment
common:
  sequence:
    # Define which steps and in which order to execute:
    - IMPORT
    - STANDARDIZE
    - PREPROCESS
    - SPLIT
    - AUGMENT
    - TOKENIZE
  fragment_bond: TILDE
rxn_import:
  data_format: TXT
preprocess:
  min_products: 1
split:
  input_file_path: ${preprocess.output_file_path}
  split_ratio: 0.05
augment:
  input_file_path: ${data.proc_dir}/${data.name}.processed.train.csv
  output_file_path: ${data.proc_dir}/${data.name}.augmented.train.csv
  permutations: 10
  tokenize: false
  random_type: rotated
tokenize:
  input_output_pairs:
    - inp: ${data.proc_dir}/${data.name}.augmented.train.csv
      out: ${data.proc_dir}/${data.name}.augmented.train
      reaction_column_name: rxn_rotated
    - inp: ${data.proc_dir}/${data.name}.processed.validation.csv
      out: ${data.proc_dir}/${data.name}.processed.validation
    - inp: ${data.proc_dir}/${data.name}.processed.test.csv
      out: ${data.proc_dir}/${data.name}.processed.test

rxn-data-pipeline --config-dir . --config-name train-augmentation-config
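The ${...} references in the configuration above are interpolations that point to other configuration values. To make the resulting file names concrete, the sketch below resolves two of them with a few lines of standard-library Python. It only imitates the basic ${a.b} lookup: the helpers lookup and resolve are hypothetical, and the actual resolution is done by the configuration library, which supports more than this.

```python
import re
from typing import Any, Dict

INTERPOLATION = re.compile(r"\$\{([^}]+)\}")


def lookup(config: Dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as 'data.name' through nested dicts."""
    node: Any = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(config: Dict[str, Any], value: str) -> str:
    """Replace ${a.b} references until none are left (no cycle detection)."""
    while INTERPOLATION.search(value):
        value = INTERPOLATION.sub(lambda m: str(lookup(config, m.group(1))), value)
    return value


# Values copied from the YAML above; only the keys needed here are included.
config = {
    "data": {
        "name": "pipeline-with-augmentation",
        "proc_dir": "/tmp/rxn-preprocessing/experiment",
    },
    "augment": {
        "input_file_path": "${data.proc_dir}/${data.name}.processed.train.csv",
        "output_file_path": "${data.proc_dir}/${data.name}.augmented.train.csv",
    },
}

train_in = resolve(config, config["augment"]["input_file_path"])
train_out = resolve(config, config["augment"]["output_file_path"])
print(train_in)
# /tmp/rxn-preprocessing/experiment/pipeline-with-augmentation.processed.train.csv
print(train_out)
# /tmp/rxn-preprocessing/experiment/pipeline-with-augmentation.augmented.train.csv
```

The augmented training file written by the AUGMENT step is thus the same file that the first tokenize entry reads.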

Release history

2.6.0 (this version): Sep 10, 2025
2.5.0: Aug 13, 2025
2.4.0: Sep 18, 2023
2.3.0: Sep 9, 2023
2.2.0: Aug 17, 2023
2.1.0: Jun 12, 2023
2.0.3: May 3, 2023
2.0.2: Sep 30, 2022
