In-silico MeOX/TMS derivatization of chemical compounds

In silico derivatization

Overview

This package performs in-silico MeOX + TMS derivatization (as described e.g. in https://doi.org/10.1021/acs.analchem.7b01010):

  • Methoximation: the carbonyl groups of ketones R(C=O)R' and aldehydes (-HC=O) are substituted with -C=NOCH3
  • Trimethylsilylation: the acidic hydrogen in -OH, -SH, -COOH, -NH2, -NHR, and =NH groups is substituted with -Si(CH3)3

The substitution does not always happen; the probabilities are currently hardcoded in the package. Typically, multiple substitution attempts are run on each input molecule, and all distinct results are gathered.

A known limitation is methoximation on cycles, which should be broken (opened) in the process. This is not implemented yet.
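The repeated-attempt idea from the overview can be illustrated with a toy model. The following is only a sketch, not the package's code: each reactive site is reduced to a made-up substitution probability (the real values are hardcoded in the package and are not reproduced here), and the helper names derivatize_once and gather_derivatives are hypothetical.

```python
import random


def derivatize_once(site_probabilities, rng):
    """One attempt: decide independently for each site whether it is substituted."""
    return tuple(rng.random() < p for p in site_probabilities)


def gather_derivatives(site_probabilities, repeats, seed=0):
    """Run several attempts and keep only the distinct substitution patterns."""
    rng = random.Random(seed)
    return {derivatize_once(site_probabilities, rng) for _ in range(repeats)}


# Hypothetical molecule with two reactive sites; the probabilities are illustrative only.
patterns = gather_derivatives((0.9, 0.5), repeats=100)
for pattern in sorted(patterns):
    print(pattern)
```

With n independent sites there are at most 2**n distinct patterns, which is why a larger number of repeats is needed for molecules with many reactive groups.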

Installation

There are a few ways to install gc-meox-tms:

  1. Install in a new conda environment (recommended):
$ conda create -n gc-meox-tms -c bioconda gc-meox-tms
$ conda activate gc-meox-tms
  2. Install from source by cloning the repository and installing the package with pip:
$ git clone https://github.com/RECETOX/gc-meox-tms.git
$ cd gc-meox-tms

# install the package:
$ python -m pip install .

# if you want to run examples in the Jupyter notebook, install with this command:
$ python -m pip install ".[eda]"
  3. Install via conda without specifying a channel (this relies on bioconda already being among your configured channels):
$ conda create --name gc-meox-tms gc-meox-tms
$ conda activate gc-meox-tms

Usage

Command-Line Tool

gc-meox-tms can be used as a command-line tool to produce all MeOX/TMS derivatives of the given compounds. You will need one or more txt files containing chemical compounds represented as SMILES (one SMILES per line). The tool can write results in a flat txt format (one compound per line) or a tab-separated tsv format (all derivatives of a given molecule on one line).

$ python -m gc_meox_tms \
-f <path to write flat txt result> \
-t <path to write tab separated result> \
<paths to input txt files>

More parameters can be specified, such as the number of cores or repeats. For more information, run:

$ python -m gc_meox_tms --help
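If you prefer to drive the command-line tool from a Python script, the sketch below prepares an input file and assembles the same command as shown above. It uses only the -f and -t options documented here; the helper names write_smiles_file and build_command and the file names are hypothetical, and the command is only built, not executed.

```python
import sys
import tempfile
from pathlib import Path


def write_smiles_file(path, smiles_list):
    """Write one SMILES string per line, the input format the tool expects."""
    Path(path).write_text("\n".join(smiles_list) + "\n")


def build_command(input_paths, flat_out=None, tsv_out=None):
    """Assemble the argument list for `python -m gc_meox_tms`."""
    command = [sys.executable, "-m", "gc_meox_tms"]
    if flat_out is not None:
        command += ["-f", str(flat_out)]
    if tsv_out is not None:
        command += ["-t", str(tsv_out)]
    command += [str(path) for path in input_paths]
    return command


with tempfile.TemporaryDirectory() as workdir:
    input_file = Path(workdir) / "compounds.txt"
    write_smiles_file(input_file, ["CC=O", "CCO"])
    command = build_command([input_file], tsv_out=Path(workdir) / "derivatives.tsv")
    print(" ".join(command))
    # With the package installed, the command could be executed with:
    # subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.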

Python Package

The package provides the following functions:

  • is_derivatized() checks whether the molecule contains MeOX or TMS groups that are likely to be the result of derivatization
  • remove_derivatization_groups() removes the suspected groups, reconstructing the original molecule
  • add_derivatization_groups() performs the substitution described above

from gc_meox_tms import add_derivatization_groups, is_derivatized, remove_derivatization_groups
from rdkit.Chem import MolToSmiles

# Example compounds in SMILES format
compounds = ["CC=O", "CC=NOC", "CCO[Si](C)(C)C"]

# Check derivatization
print([is_derivatized(smiles=smiles) for smiles in compounds])
# prints: [False, True, True]

# Remove derivatization groups from derivatized molecules
underivatized = [remove_derivatization_groups(smiles=smiles) for smiles in compounds[1:]]
print([MolToSmiles(mol) for mol in underivatized])
# prints: ['CC=O', 'CCO']

# Convert molecules back to derivatized forms
rederivatized = [add_derivatization_groups(mol=mol) for mol in underivatized]
print([MolToSmiles(mol) for mol in rederivatized])
# may print: ['CC=NOC', 'CCO[Si](C)(C)C']

Note that your results may differ from those shown, since add_derivatization_groups is not deterministic. If you rerun the function enough times, you will obtain all possible derivatives. The number of reruns needed differs for each compound (it depends on the possible conversion degrees, among other things).
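One way to decide when to stop rerunning is to wait until several consecutive attempts produce nothing new. The helper below is a sketch of that heuristic and is not part of the package: collect_until_stable, patience and max_attempts are hypothetical names, and a deterministic stand-in replaces the real function so the example runs without RDKit. The heuristic does not guarantee that every derivative has been found.

```python
import itertools


def collect_until_stable(attempt, patience=50, max_attempts=10_000):
    """Call attempt() repeatedly and gather distinct results.

    Stops once `patience` consecutive calls return only already-seen
    results, or after `max_attempts` calls in total.
    """
    seen = set()
    misses = 0
    for _ in range(max_attempts):
        result = attempt()
        if result in seen:
            misses += 1
            if misses >= patience:
                break
        else:
            seen.add(result)
            misses = 0
    return seen


# Stand-in for a non-deterministic derivatization call (cycles through fixed values).
fake_results = itertools.cycle(["CC=NOC", "CC=O", "CC=NOC"])
derivatives = collect_until_stable(lambda: next(fake_results), patience=5)
print(sorted(derivatives))

# With the package installed, the attempt could instead be, for a given RDKit mol:
# attempt = lambda: MolToSmiles(add_derivatization_groups(mol=mol))
```

Comparing SMILES strings rather than molecule objects keeps the set of results free of duplicates, because two separately created molecule objects would otherwise count as distinct.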

See also the Jupyter notebook in the example/ directory for more examples.

Developer documentation


Installation

Create a virtual environment of your choice (e.g., conda or venv). The development version can be installed with conda or pip as follows:

# 1. Fork and clone the repository
$ git clone https://github.com/<YOUR_GITHUB_USERNAME>/gc-meox-tms.git
$ cd gc-meox-tms

# 2a. To create a conda env run from the package directory:
$ conda env create -f conda/environment-dev.yaml
$ conda activate gc-meox-tms-dev

# 2b. Alternatively, install using python venv:
$ python3 -m venv gc-meox-tms-dev
$ source gc-meox-tms-dev/bin/activate
$ pip install -e ".[dev]"

Contributing

Before opening a PR, make sure all tests pass by running pytest from within the package directory:

$ pytest

Tests that depend on probabilistic logic may occasionally fail. If that occurs, try rerunning the tests. Usually one rerun is enough.

We strongly advise you to add new tests for the functionality that you want to contribute. To check whether your changes are covered by tests, run pytest --cov and examine the output to see which parts may need better test coverage.

Run the linter to make sure the code is properly formatted:

$ flake8

# if you use venv, exclude venv directory from linting
$ flake8 --exclude 'gc-meox-tms-dev'

Lastly, make sure the Python imports are properly ordered:

$ isort gc_meox_tms
