Training of OpenNMT-based RXN models

RXN package for OpenNMT-based models

This repository contains a Python package and associated scripts for training reaction models based on the OpenNMT library. It builds on other RXN packages; see our repositories rxn-utilities, rxn-chemutils, and rxn-onmt-utils.

For the evaluation of trained models, see the rxn-metrics repository.

The documentation can be found here.

This repository was produced through a collaborative project involving IBM Research Europe and Syngenta.

System Requirements

This package is supported on all operating systems. It has been tested on the following systems:

  • macOS: Big Sur (11.1)
  • Linux: Ubuntu 18.04.4

Python 3.6, 3.7, or 3.8 is recommended. Python 3.9 and above are not expected to work, due to incompatibilities with the selected version of OpenNMT.
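
Before installing, the interpreter version can be checked against the recommended range. The following is a minimal sketch assuming a POSIX-compatible shell; `python_version_supported` is a hypothetical helper, not something provided by the package, and it only encodes the 3.6 to 3.8 range stated above.

```bash
# Hypothetical helper (not part of rxn-onmt-models): succeeds if the given
# "MAJOR.MINOR" version string lies in the recommended 3.6-3.8 range.
python_version_supported() {
    local version="$1"
    local major="${version%%.*}"   # text before the first dot
    local rest="${version#*.}"     # text after the first dot
    local minor="${rest%%.*}"      # drop any patch component
    [ "$major" -eq 3 ] && [ "$minor" -ge 6 ] && [ "$minor" -le 8 ]
}

# Check the python3 on the PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
    current="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    if python_version_supported "$current"; then
        echo "Python $current is in the recommended range"
    else
        echo "Python $current is outside the recommended 3.6-3.8 range" >&2
    fi
fi
```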

Installation guide

The package can be installed from PyPI:

pip install "rxn-onmt-models[rdkit]"

You can leave out [rdkit] if RDKit is already available in your environment.

For local development, the package can be installed with:

pip install -e ".[dev,rdkit]"

Training models

Example of usage for training RXN models

The easy way

Simply execute the interactive program rxn-plan-training in your terminal and follow the instructions.

The complicated way

  1. Optional: set shell variables for use in the later commands.
MODEL_TASK="forward"

# Existing TXT files
DATA_1="/path/to/data_1.txt"
DATA_2="/path/to/data_2.txt"
DATA_3="/path/to/data_3.txt"

# Where to put the processed data
DATA_DIR_1="/path/to/processed_data_1"
DATA_DIR_2="/path/to/processed_data_2"
DATA_DIR_3="/path/to/processed_data_3"

# Where to save the ONMT-preprocessed data
PREPROCESSED="/path/to/onmt-preprocessed"

# Where to save the models
MODELS="/path/to/models"
MODELS_FINETUNED="/path/to/models_finetuned"
  2. Prepare the data (standardization, filtering, etc.)
rxn-prepare-data --input_data $DATA_1 --output_dir $DATA_DIR_1
  3. Preprocess the data with OpenNMT
rxn-onmt-preprocess --input_dir $DATA_DIR_1 --output_dir $PREPROCESSED --model_task $MODEL_TASK
  4. Train the model (here with small parameter values, to make it fast on CPU for testing).
rxn-onmt-train --model_output_dir $MODELS --preprocess_dir $PREPROCESSED --train_num_steps 10 --batch_size 4 --heads 2 --layers 2 --transformer_ff 512 --no_gpu
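
Because a misspelled or unset variable silently expands to an empty string in the shell, it can be worth checking the inputs from step 1 before running the later steps. A minimal sketch follows; `check_inputs` and `require_set` are hypothetical helpers, not commands of this package.

```bash
# Hypothetical helper: fail if any of the given input files does not exist.
check_inputs() {
    local f missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "Missing input file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: $1 is a variable name (used only in the message),
# $2 is its value; fail if the value is empty.
require_set() {
    if [ -z "$2" ]; then
        echo "Variable $1 is empty or unset" >&2
        return 1
    fi
}

# Example usage with the variables from step 1:
# check_inputs "$DATA_1" &&
#     require_set DATA_DIR_1 "$DATA_DIR_1" &&
#     require_set PREPROCESSED "$PREPROCESSED" &&
#     require_set MODELS "$MODELS"
```

Alternatively, running `set -u` makes bash abort with an error as soon as an unset variable is expanded.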

Multi-task training

For multi-task training, the process is similar. The additional data sets must also be prepared, and the OpenNMT preprocessing and training take extra arguments. To sum up:

rxn-prepare-data --input_data $DATA_1 --output_dir $DATA_DIR_1
rxn-prepare-data --input_data $DATA_2 --output_dir $DATA_DIR_2
rxn-prepare-data --input_data $DATA_3 --output_dir $DATA_DIR_3
rxn-onmt-preprocess --input_dir $DATA_DIR_1 --output_dir $PREPROCESSED --model_task $MODEL_TASK \
  --additional_data $DATA_DIR_2 --additional_data $DATA_DIR_3
rxn-onmt-train --model_output_dir $MODELS --preprocess_dir $PREPROCESSED --train_num_steps 30 --batch_size 4 --heads 2 --layers 2 --transformer_ff 256 --no_gpu \
  --data_weights 1 --data_weights 3 --data_weights 4
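
To see how the weights in the command above would split the training data, here is a small sketch. It assumes, without confirmation from this README, that `--data_weights` act as relative sampling proportions given in the order: main data set first, then each `--additional_data` directory in turn. `data_shares` is a hypothetical helper.

```bash
# Hypothetical helper: print the share of training examples each data set
# would contribute, assuming the weights are relative sampling proportions.
data_shares() {
    local total=0 w i=1
    for w in "$@"; do
        total=$((total + w))
    done
    [ "$total" -gt 0 ] || return 1   # avoid division by zero
    for w in "$@"; do
        # awk performs the floating-point division that shell arithmetic lacks.
        awk -v i="$i" -v w="$w" -v t="$total" \
            'BEGIN { printf "data set %d: %d/%d = %.1f%%\n", i, w, t, 100 * w / t }'
        i=$((i + 1))
    done
}

data_shares 1 3 4
```

With the weights 1, 3, and 4, this prints 12.5%, 37.5%, and 50.0% for the three data sets.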

Continuing the training

Training can be continued for both single-task and multi-task models; this requires fewer arguments:

rxn-onmt-continue-training --model_output_dir $MODELS --preprocess_dir $PREPROCESSED --train_num_steps 30 --batch_size 4 --no_gpu \
  --data_weights 1 --data_weights 3 --data_weights 4

Fine-tuning

Fine-tuning is in principle similar to continuing the training. The main differences are that the new data may contain tokens not seen before, and that the optimizer is reset. There is a dedicated command for fine-tuning, for example:

rxn-onmt-finetune --model_output_dir $MODELS_FINETUNED --preprocess_dir $PREPROCESSED --train_num_steps 20 --batch_size 4 --no_gpu \
  --train_from $MODELS/model_step_30.pt

The syntax is very similar to that of rxn-onmt-train and rxn-onmt-continue-training, and fine-tuning works for both single-task and multi-task training.
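
When fine-tuning from the most recent checkpoint, the file can be picked automatically. The sketch below assumes that checkpoints follow the `model_step_<N>.pt` naming seen in `$MODELS/model_step_30.pt` above; `latest_checkpoint` is a hypothetical helper.

```bash
# Hypothetical helper (not part of rxn-onmt-models): print the checkpoint in a
# directory with the highest step number, assuming "model_step_<N>.pt" names.
latest_checkpoint() {
    local dir="$1" best="" best_step=-1 f step
    for f in "$dir"/model_step_*.pt; do
        [ -e "$f" ] || continue          # the glob matched nothing
        step="${f##*_step_}"             # strip everything up to "_step_"
        step="${step%.pt}"               # strip the extension
        case "$step" in
            ''|*[!0-9]*) continue ;;     # skip names without a numeric step
        esac
        if [ "$step" -gt "$best_step" ]; then
            best_step="$step"
            best="$f"
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Usage sketch:
# rxn-onmt-finetune --model_output_dir "$MODELS_FINETUNED" --preprocess_dir "$PREPROCESSED" \
#   --train_num_steps 20 --batch_size 4 --no_gpu --train_from "$(latest_checkpoint "$MODELS")"
```

It compares step numbers numerically rather than sorting file names, since a lexical sort would place model_step_100.pt before model_step_30.pt.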

Download files

Source Distribution

rxn-onmt-models-1.0.0.tar.gz (24.2 kB)

Built Distribution

rxn_onmt_models-1.0.0-py3-none-any.whl (27.9 kB, Python 3)

File details

Details for the file rxn-onmt-models-1.0.0.tar.gz.

File metadata

  • Download URL: rxn-onmt-models-1.0.0.tar.gz
  • Upload date:
  • Size: 24.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for rxn-onmt-models-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        6eb563652020de19ffb6b3d7658a7df6282f9a22a67af6d688c45e8e79ba9fe7
MD5           36290318c6aea51bb23631249ca5c223
BLAKE2b-256   86fafa7dc9dfc9c54468b2921d2623cf7541345ce65f62ae6cbde942323adde8

File details

Details for the file rxn_onmt_models-1.0.0-py3-none-any.whl.

File hashes

Hashes for rxn_onmt_models-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        a19fa258a616bdac4f0da4fae7b7ca292733a0c9997553a72ce8d6c762b2a740
MD5           c52aa6f52bd60f5fb03573d1e8cdc05b
BLAKE2b-256   15a204aa5fee9f18cfe8867f77f07c25a99a09c74c3d5a163864f9609f887093
