
Synthesis Rebalancing Framework for Computational Chemistry


SynRBL: Synthesis Rebalancing Framework


SynRBL is a toolkit for computational chemistry that corrects imbalances in chemical reactions. It employs a dual strategy: a rule-based method for adjusting non-carbon elements and an MCS-based (maximum common substructure) technique for adjusting carbon elements.
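To make the notion of an imbalance concrete, here is a rough sketch (not part of SynRBL) that counts heavy atoms on each side of a reaction SMILES and separates the carbon difference from the non-carbon difference, mirroring the split between the two strategies. The helper names `heavy_atom_counts` and `imbalance` are hypothetical, the regex tokenizer only covers common SMILES syntax, and hydrogens (implicit or explicit) are ignored.

```python
import re
from collections import Counter

# Bracket atoms first, then two-letter halogens, then one-letter organic-subset
# atoms; aromatic atoms are written in lowercase in SMILES.
_ATOM = re.compile(r"\[([^\]]+)\]|(Cl|Br)|([BCNOPSFI])|([bcnops])")
# Inside brackets: optional isotope digits followed by the element symbol.
_BRACKET_SYMBOL = re.compile(r"\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms per element in a (multi-component) SMILES."""
    counts = Counter()
    for match in _ATOM.finditer(smiles):
        bracket, halogen, organic, aromatic = match.groups()
        if bracket is not None:
            symbol_match = _BRACKET_SYMBOL.match(bracket)
            if symbol_match is None:
                continue  # e.g. wildcard atoms such as [*]
            symbol = symbol_match.group(1).capitalize()
        else:
            symbol = (halogen or organic or aromatic).capitalize()
        if symbol != "H":
            counts[symbol] += 1
    return counts


def imbalance(reaction: str) -> dict:
    """Return {element: reactant count - product count} for unbalanced elements."""
    reactants, products = reaction.split(">>")
    left, right = heavy_atom_counts(reactants), heavy_atom_counts(products)
    elements = sorted(set(left) | set(right))
    return {el: left[el] - right[el] for el in elements if left[el] != right[el]}


diff = imbalance("CC(=O)OCC>>CC(=O)O")
carbon_diff = diff.get("C", 0)                                   # 2
non_carbon_diff = {el: n for el, n in diff.items() if el != "C"}  # {}
print(carbon_diff, non_carbon_diff)
```

In this example two carbon atoms are missing on the product side, so the carbon part of the imbalance is the one that needs attention.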



Installation

The easiest way to use SynRBL is by installing the PyPI package synrbl.

Follow these steps to set up a working environment. Please ensure you have Python 3.11 or later installed on your system.

Prerequisites

The requirements are automatically installed with the pip package.

  • Python >= 3.11
  • rdkit >= 2023.9.4
  • joblib >= 1.3.2
  • seaborn >= 0.13.2
  • xgboost >= 2.0.3
  • scikit_learn == 1.4.0
  • imbalanced_learn >= 0.12.0
  • reportlab >= 4.1.0
  • fgutils >= 0.1.3
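
If you want to see which of these prerequisites are already present in an environment, a small standard-library check like the following may help. It is only a sketch: the function names are hypothetical, the distribution names are assumed to match the list above (with underscores written as hyphens), and it reports installed versions without comparing them against the minimum versions.

```python
import sys
from importlib import metadata

# Distribution names taken from the prerequisites list (assumed PyPI spelling).
REQUIRED = [
    "rdkit", "joblib", "seaborn", "xgboost",
    "scikit-learn", "imbalanced-learn", "reportlab", "fgutils",
]


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(names=REQUIRED) -> dict:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


print("Python >= 3.11:", sys.version_info >= (3, 11))
for name, version in environment_report().items():
    print(f"{name}: {version or 'not installed'}")
```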

Step-by-Step Installation Guide

  1. Python Installation: Ensure that Python 3.11 or later is installed on your system. You can download it from python.org.

  2. Creating a Virtual Environment (Optional but Recommended): It's recommended to use a virtual environment to avoid conflicts with other projects or system-wide packages. Use the following commands to create and activate a virtual environment:

python -m venv synrbl-env
source synrbl-env/bin/activate  # On Windows use `synrbl-env\Scripts\activate`

Or, with Conda:

conda create --name synrbl-env python=3.11
conda activate synrbl-env

  3. Install with pip:

pip install synrbl

  4. Verify Installation: After installation, you can verify that SynRBL is correctly installed by running a simple test:

python -c "from synrbl import Balancer; bal = Balancer(n_jobs=1); print(bal.rebalance('CC(=O)OCC>>CC(=O)O'))"

Usage

Use in script

from synrbl import Balancer

smiles = (
  "COC(=O)[C@H](CCCCNC(=O)OCc1ccccc1)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O>>"
  + "COC(=O)[C@H](CCCCN)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O"
)
synrbl = Balancer()

results = synrbl.rebalance(smiles, output_dict=True)
>> [{
      "reaction": "COC(=O)[C@H](CCCCNC(=O)OCc1ccccc1)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O.O>>"
      + "COC(=O)[C@H](CCCCN)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O.O=C(O)OCc1ccccc1",
      "solved": True,
      "input_reaction": "COC(=O)[C@H](CCCCNC(=O)OCc1ccccc1)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O>>"
      + "COC(=O)[C@H](CCCCN)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O",
      "issue": "",
      "rules": ["append O when next to O or N", "default single bond"],
      "solved_by": "mcs-based",
      "confidence": 0.999,
  }]
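
Each result dictionary carries a "solved" flag and a "confidence" score, so a typical next step is to keep only the reactions you trust. The sketch below shows one way to do that. The `accept` helper and the 0.9 threshold are illustrative assumptions, and the sample records are hand-written stand-ins shaped like the output above, not real SynRBL output.

```python
def accept(result: dict, min_confidence: float = 0.9) -> bool:
    """Keep a result only if it was solved with at least the given confidence."""
    confidence = result.get("confidence") or 0.0  # treat a missing score as 0
    return bool(result.get("solved")) and confidence >= min_confidence


# Hand-written sample records using the keys shown in the output above.
sample_results = [
    {"reaction": "CC(=O)OCC.O>>CC(=O)O.CCO", "solved": True, "confidence": 0.999},
    {"reaction": "CC(=O)O>>CCO", "solved": False, "confidence": 0.0},
    {"reaction": "CCO>>CC=O", "solved": True, "confidence": 0.42},
]

trusted = [r["reaction"] for r in sample_results if accept(r)]
print(trusted)
```

The right threshold depends on how conservative the curated dataset needs to be.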

New config

from synrbl import Balancer

smiles = 'CC(=O)O>>CCO'
synrbl = Balancer(use_default_reduction=False) # we try to match correct reduction agent
results = synrbl.rebalance(smiles, output_dict=True)
>> 'CC(=O)O.[AlH4-].[Li+].[H+].[AlH4-].[Li+].[H+]>>CCO.O.[AlH3].[Li+].[AlH3].[Li+]'

synrbl = Balancer(use_default_reduction=True) # leave hydrogen
results = synrbl.rebalance(smiles, output_dict=True)
>> 'CC(=O)O.[H][H].[H][H]>>CCO.O'

Batch Process

from synrbl import ReactionRebalancer, RebalanceConfig

data = [{'id':1, 'rxn':'CC(=O)O>>CCO'},
        {'id':2, 'rxn':('COC(=O)[C@H](CCCCNC(=O)OCc1ccccc1)NC(=O)Nc1cc(OC)cc(C(C)'
        +'(C)C)c1O>>COC(=O)[C@H](CCCCN)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O')}]

config = RebalanceConfig(reaction_col="rxn", id_col="id", n_jobs=2, batch_size=500,
                         enable_logging=False, use_default_reduction=True)
rebalancer = ReactionRebalancer(config=config, user_logger=None)
result = rebalancer.rebalance(data, keep_extra=False)
result
>> [{'id': 2,
    'rxn': 'COC(=O)C(CCCCNC(=O)OCc1ccccc1)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O.O>>COC(=O)C(CCCCN)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O.O=C(O)OCc1ccccc1'},
    {'id': 1, 'rxn': 'CC(=O)O.[H][H].[H][H]>>CCO.O'}]

If you want to keep metadata, use keep_extra=True.
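
If your reactions live in a CSV file, they first need to be turned into the list-of-dictionaries shape used above. The following is a minimal sketch using only the standard library. `load_reactions` is a hypothetical helper, the column names follow the `id_col` and `reaction_col` values from the example config, and the extra `source` column stands in for whatever metadata you might carry along.

```python
import csv
import io


def load_reactions(handle, id_col="id", reaction_col="rxn"):
    """Read CSV rows into a list of dicts, checking the required columns exist."""
    reader = csv.DictReader(handle)
    missing = {id_col, reaction_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Keep every column so that metadata is still available downstream.
    return [dict(row) for row in reader]


# An in-memory file keeps the example self-contained; use open(path, newline="")
# for a real file.
sample = io.StringIO("id,rxn,source\n1,CC(=O)O>>CCO,demo\n")
data = load_reactions(sample)
print(data)
```

Note that `csv.DictReader` returns every value as a string, so the ids here are "1" rather than 1.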

Use in command line

printf 'id,reaction\n0,CC(=O)OCC>>CC(=O)O\n' > unbalanced.csv
python -m synrbl run -o balanced.csv unbalanced.csv

Benchmark your own dataset

Prepare your dataset as a CSV file (<datafile>) with a column (<reaction>) of unbalanced reaction SMILES and a column (<expected_reaction>) containing the expected balanced reactions.

Rebalance the reactions and forward the expected reactions column to the output.

python -m synrbl run -o balanced.csv --col <reaction> --out-columns <expected_reaction> <datafile>

After rebalancing, you can use the benchmark command to compute the success and accuracy rates for your dataset. Keep in mind that an exact comparison between the rebalanced and the expected reaction is a highly conservative evaluation: an unbalanced reaction might have multiple equally viable balanced solutions. Besides the exact comparison (the default), the benchmark command supports a few similarity measures, such as ECFP and pathway fingerprints, for comparing the rebalanced reaction with the expected balanced reaction.
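
To illustrate why exact comparison is conservative, here is a simplified sketch of the two rates. It is not the benchmark command's implementation: it assumes that the success rate is the fraction of reactions marked as solved, that accuracy is the fraction of solved reactions matching the expected one, and that each row has a boolean `solved` field. The comparison ignores component order but does not canonicalize SMILES, which a real evaluation would do with a cheminformatics toolkit.

```python
from collections import Counter


def normalize(reaction: str):
    """Split a reaction SMILES into order-insensitive reactant and product multisets."""
    reactants, products = reaction.split(">>")
    return (
        Counter(part for part in reactants.split(".") if part),
        Counter(part for part in products.split(".") if part),
    )


def exact_match(rebalanced: str, expected: str) -> bool:
    return normalize(rebalanced) == normalize(expected)


def rates(rows):
    """Return (success_rate, accuracy) under the assumptions stated above."""
    solved = [r for r in rows if r["solved"]]
    correct = [r for r in solved if exact_match(r["reaction"], r["expected_reaction"])]
    success_rate = len(solved) / len(rows) if rows else 0.0
    accuracy = len(correct) / len(solved) if solved else 0.0
    return success_rate, accuracy


# Hand-written rows for illustration only.
rows = [
    {"reaction": "CC(=O)OCC.O>>CC(=O)O.CCO",
     "expected_reaction": "O.CC(=O)OCC>>CCO.CC(=O)O", "solved": True},
    {"reaction": "CC(=O)O.[H][H].[H][H]>>CCO.O",
     "expected_reaction": "CC(=O)O.[H][H]>>CC=O.O", "solved": True},
    {"reaction": "CC>>CCO", "expected_reaction": "CC.O>>CCO", "solved": False},
    {"reaction": "CCO>>CC", "expected_reaction": "CCO>>CC.O", "solved": False},
]
print(rates(rows))  # (0.5, 0.5)
```

The second row counts as wrong even though both sides are balanced, which is exactly the situation where a similarity measure gives a fairer picture than exact matching.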

python -m synrbl benchmark --col <reaction> --target-col <expected_reaction> balanced.csv

Reproduce benchmark results from validation set

To test SynRBL on the provided validation set use the following commands. Run these commands from the root of the cloned repository.

Rebalance the dataset

python -m synrbl run -o validation_set_balanced.csv --out-columns expected_reaction ./Data/Validation_set/validation_set.csv

and compute the benchmark results

python -m synrbl benchmark validation_set_balanced.csv

Contributing

License

This project is licensed under the MIT License - see the License file for details.

Publication

Reaction rebalancing: a novel approach to curating reaction databases

Citation

@Article{Phan2024,
  author={Phan, Tieu-Long and Weinbauer, Klaus and G{\"a}rtner, Thomas and Merkle, 
    Daniel and Andersen, Jakob L. and Fagerberg, Rolf and Stadler, Peter F.},
  title={Reaction rebalancing: a novel approach to curating reaction databases},
  journal={Journal of Cheminformatics},
  year={2024},
  month={Jul},
  day={19},
  volume={16},
  number={1},
  pages={82},
  issn={1758-2946},
  doi={10.1186/s13321-024-00875-4},
  url={https://doi.org/10.1186/s13321-024-00875-4}
}

Acknowledgments

This project has received funding from the European Union's Horizon Europe Doctoral Network programme under the Marie Skłodowska-Curie grant agreement No 101072930 (TACsy -- Training Alliance for Computational)

