In-silico MeOX/TMS derivatization of chemical compounds
In silico derivatization
Overview
This package performs in-silico MeOX + TMS derivatization (as described e.g. in https://doi.org/10.1021/acs.analchem.7b01010):
- Methoximation: ketone R(C=O)R' and aldehyde (-HC=O) carbonyl groups are substituted with -C=NOCH3
- Trimethylsilylation: the acidic hydrogen in -OH, -SH, -COOH, -NH2, -NHR, and =NH groups is substituted with -Si(CH3)3
The substitution does not always happen; the probabilities are currently hardcoded in the package. Typically, multiple substitution attempts are run on each input molecule, and all distinct results are gathered.
A known limitation is methoximation on cycles, which should be broken; this is not implemented yet.
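The idea of running several probabilistic substitution attempts and gathering the distinct results can be illustrated with a small toy model. This is only a sketch: the site names, the probabilities in `SITE_PROBABILITY`, and the functions `attempt` and `gather` are hypothetical and do not reproduce the values or code used inside the package.

```python
import random

# Hypothetical per-site substitution probabilities; the real values are
# hardcoded inside the package and are NOT reproduced here.
SITE_PROBABILITY = {"OH": 0.95, "NH2": 0.5}


def attempt(sites, rng):
    """One substitution attempt: each site is derivatized independently."""
    return tuple(rng.random() < SITE_PROBABILITY[site] for site in sites)


def gather(sites, repeats, seed=0):
    """Run several attempts and keep every distinct outcome."""
    rng = random.Random(seed)
    return {attempt(sites, rng) for _ in range(repeats)}


# A toy molecule with one hydroxyl site and one primary amine site.
outcomes = gather(["OH", "NH2"], repeats=200)
for outcome in sorted(outcomes):
    print(outcome)
```

Each tuple records which sites were substituted in one attempt, so the set holds the distinct derivatization patterns seen over all repeats.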
Installation
There are a few ways to install gc-meox-tms:
- Install in a new conda environment (recommended):
$ conda create -n gc-meox-tms -c bioconda gc-meox-tms
$ conda activate gc-meox-tms
- From source by cloning the repository and installing the package with pip as follows:
$ git clone https://github.com/RECETOX/gc-meox-tms.git
$ cd gc-meox-tms
# install the package from the cloned source:
$ python -m pip install .
# if you want to run examples in the Jupyter notebook, install with this command:
$ python -m pip install ".[eda]"
- Install via Conda, if the bioconda channel is already configured in your Conda setup:
$ conda create --name gc-meox-tms gc-meox-tms
$ conda activate gc-meox-tms
Usage
Command-Line Tool
gc-meox-tms can be used as a command line tool to produce all MeOX/TMS derivatives of given compounds. To use it via the command line you will need one or more txt files with chemical compounds represented as SMILES (one SMILES per line). The tool can output results in flat txt format (one compound per line) or tab separated tsv format (all derivatives of a given molecule per line).
$ python -m gc_meox_tms \
-f <path to write flat txt result> \
-t <path to write tab separated result> \
<paths to input txt files>
More parameters can be specified, such as number of cores or repeats. For more information run:
$ python -m gc_meox_tms --help
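The input and output layouts described above can be handled with a few lines of standard-library Python. The sketch below writes an input file with one SMILES per line and reads a tab separated file back. The helper names are hypothetical, the tsv content is a hand-written stand-in rather than real tool output, and the assumption that every tsv field is a SMILES string is not confirmed by the tool's documentation.

```python
from pathlib import Path
import tempfile


def write_smiles_file(path, smiles_list):
    """Write one SMILES string per line, the input layout the tool expects."""
    Path(path).write_text("\n".join(smiles_list) + "\n", encoding="utf-8")


def read_tsv_derivatives(path):
    """Read a tab separated file into one list of fields per line.

    Assumption: every field is a SMILES string; the exact column layout of
    the tool's tsv output is not specified here.
    """
    rows = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            rows.append(line.split("\t"))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "compounds.txt"
    write_smiles_file(input_path, ["CC=O", "CCO"])

    # Hand-written stand-in for a tsv result, NOT real tool output.
    tsv_path = Path(tmp) / "derivatives.tsv"
    tsv_path.write_text("CC=O\tCC=NOC\nCCO\tCCO[Si](C)(C)C\n", encoding="utf-8")

    input_lines = input_path.read_text(encoding="utf-8").splitlines()
    rows = read_tsv_derivatives(tsv_path)
```

A file written by `write_smiles_file` can then be passed as one of the input paths on the command line.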
Python Package
The package provides the following functions:
- is_derivatized() checks whether the molecule contains MeOX or TMS groups that are likely to be the result of derivatization
- remove_derivatization_groups() removes the suspected groups, reconstructing the original molecule
- add_derivatization_groups() does the substitution described above
from gc_meox_tms import add_derivatization_groups, is_derivatized, remove_derivatization_groups
from rdkit.Chem import MolToSmiles
# Example compounds in SMILES format
compounds = ["CC=O", "CC=NOC", "CCO[Si](C)(C)C"]
# Check derivatization
[is_derivatized(smiles=smiles) for smiles in compounds]
>>> [False, True, True]
# Remove derivatization groups from derivatized molecules
underivatized = [remove_derivatization_groups(smiles=smiles) for smiles in compounds[1:]]
print([MolToSmiles(mol) for mol in underivatized])
>>> ["CC=O", "CCO"]
# Convert molecules back to derivatized forms
rederivatized = [add_derivatization_groups(mol=mol) for mol in underivatized]
print([MolToSmiles(mol) for mol in rederivatized])
>>> ['CC=NOC', 'CCO[Si](C)(C)C']
Note that your results may differ from those shown here because add_derivatization_groups is not deterministic. If you rerun the function enough times, you will get all possible derivatizations. The number of reruns needed to obtain all possible derivatives differs for each compound (it depends on the possible conversion degrees etc.).
See also the Jupyter notebook in the example/ directory for more examples.
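Because the number of reruns needed varies per compound, one possible approach is to keep calling the function until no new result has appeared for a while. The sketch below shows that heuristic with a stand-in callable; `collect_until_stable`, `patience`, and `max_runs` are hypothetical names, and the stopping rule cannot guarantee that every derivative has been found.

```python
import random


def collect_until_stable(derivatize_once, patience=20, max_runs=1000):
    """Call derivatize_once() repeatedly and gather distinct results.

    Stops after `patience` consecutive calls that produced nothing new, or
    after `max_runs` calls. This is a heuristic: it cannot prove that every
    possible derivative has been seen.
    """
    seen = set()
    stale = 0
    for _ in range(max_runs):
        result = derivatize_once()
        if result in seen:
            stale += 1
            if stale >= patience:
                break
        else:
            seen.add(result)
            stale = 0
    return seen


# Stand-in for a non-deterministic derivatization call (hypothetical data).
rng = random.Random(42)
toy_results = ["CCO", "CCO[Si](C)(C)C"]
found = collect_until_stable(lambda: rng.choice(toy_results))
print(sorted(found))
```

With the package installed, `derivatize_once` could be a callable such as `lambda: MolToSmiles(add_derivatization_groups(mol=mol))`, so that results are compared as SMILES strings.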
Developer documentation
Installation
Create a virtual environment of your choice (e.g., conda or venv). The development version can be installed with conda or pip as follows:
# 1. Fork and clone the repository
$ git clone https://github.com/<YOUR_GITHUB_USERNAME>/gc-meox-tms.git
$ cd gc-meox-tms
# 2a. To create a conda env run from the package directory:
$ conda env create -f conda/environment-dev.yaml
$ conda activate gc-meox-tms-dev
# 2b. Alternatively, install using python venv:
$ python3 -m venv gc-meox-tms-dev
$ source gc-meox-tms-dev/bin/activate
$ pip install -e .[dev]
Contributing
Before opening a PR, make sure all the tests are passing by running pytest from within the package directory:
$ pytest
Some tests depend on probabilistic logic and may occasionally fail. If that occurs, try rerunning the tests. Usually one rerun is enough.
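A short calculation shows why a rerun usually helps, under the assumption that test runs fail independently of each other. The 5% failure rate below is a hypothetical figure, not a measurement of this test suite, and `failure_probability` is an illustrative helper.

```python
def failure_probability(single_run_failure, runs):
    """Probability that `runs` independent runs all fail."""
    if not 0.0 <= single_run_failure <= 1.0:
        raise ValueError("single_run_failure must be between 0 and 1")
    return single_run_failure ** runs


# Hypothetical failure rate of 5% per run; not measured on this test suite.
p_fail = 0.05
two_runs = failure_probability(p_fail, 2)
print(two_runs)
```

Under that assumption, the chance of two consecutive spurious failures is the square of the single-run failure rate, so a repeated failure is more likely to point to a real problem.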
We strongly advise you to add new tests for the functionality that you want to contribute. If you want to check whether your changes are covered by tests, run $ pytest --cov and examine the output to see which parts may need better test coverage.
Run the linter to make sure everything is nicely formatted:
$ flake8
# if you use venv, exclude venv directory from linting
$ flake8 --exclude 'gc-meox-tms-dev'
Lastly, make sure the Python imports are in the proper order:
$ isort gc_meox_tms
Built Distribution
Hashes for gc_meox_tms-1.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0eb29c82a9ddf25e5098a36ce04781de596d3625e6c1dd89457d5ec08900a505
MD5 | cf84caa1eaacc2170f0d02fe34fb8598
BLAKE2b-256 | 1e807fe0ae0ef032ad39cf35d946f88812eeea7135b8693bf89e1852855ee39b