CYPstrate: Prediction of Cytochrome P450 substrates

CYPstrate

CYPstrate consists of a collection of machine learning classifiers (random forest and support vector machines) for the prediction of substrates and non-substrates of the nine most important human CYP isozymes in the metabolism of xenobiotics (i.e. CYPs 1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1 and 3A4). The models are trained on a high-quality data set of 1831 substrates and non-substrates compiled from public sources.

Installation

# requires Python 3.8
pip install -U cypstrate

Usage

CYPstrate can be called from the command line. Examples:

# input in SMILES format
cypstrate "CCOC(=O)N1CCN(CC1)C2=C(C(=O)C2=O)N3CCN(CC3)C4=CC=C(C=C4)OC"

# prediction mode is one of "best_performance" (default) or "full_coverage"
cypstrate --prediction-mode full_coverage "CCN(C)C(=O)OC1=CC=CC(=C1)C(C)N(C)C"

# input can be a file
cypstrate molecules.sdf > result.csv

# output format can be specified
cypstrate --output sdf molecules.smiles > result.sdf

# more information via --help
cypstrate --help
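
For scripted batch runs, the command-line calls above can also be assembled from Python. The following is a minimal sketch, assuming the `cypstrate` executable is on the PATH and writes its results to standard output, as the redirection examples suggest. The helper names `build_command` and `run_to_file` are hypothetical and not part of the package; only the `--prediction-mode` and `--output` options shown above are used.

```python
import subprocess
from typing import List, Optional

# The two prediction modes named in the examples above.
PREDICTION_MODES = ("best_performance", "full_coverage")


def build_command(source: str,
                  prediction_mode: str = "best_performance",
                  output: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one cypstrate call (hypothetical helper).

    `source` is a SMILES string or a file path, as in the examples above.
    """
    if prediction_mode not in PREDICTION_MODES:
        raise ValueError(f"unknown prediction mode: {prediction_mode!r}")
    command = ["cypstrate", "--prediction-mode", prediction_mode]
    if output is not None:
        # e.g. "sdf"; when left unset, the tool chooses its default format
        command += ["--output", output]
    command.append(source)
    return command


def run_to_file(command: List[str], result_path: str) -> None:
    """Run the command and write its standard output to result_path."""
    with open(result_path, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)
```

Calling `run_to_file(build_command("molecules.smiles", output="sdf"), "result.sdf")` would correspond to the `--output sdf` example. Passing the arguments as a list avoids shell quoting issues with the parentheses and `=` characters that occur in SMILES strings.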

The model can also be used from Python. The predict method of the CypstrateModel class returns a pandas DataFrame containing the prediction results for each input molecule.

from cypstrate import CypstrateModel

model = CypstrateModel()

# "predict" method accepts a list of SMILES representations
df_predictions = model.predict(['CCN(C)C(=O)OC1=CC=CC(=C1)C(C)N(C)C'])

# ... or a list of file paths
df_predictions = model.predict(['part1.sdf', 'part2.sdf'])

The result DataFrame contains the columns:

  • mol_id: unique number identifying the input molecule
  • input: the raw representation provided as input (e.g. OCCCCC)
  • input_type: the representation type of the input (e.g. smiles)
  • source: the input source (e.g. my_molecules.sdf)
  • name: the name of the input molecule (if provided in the input)
  • input_mol: the RDKit molecule parsed from the input representation
  • preprocessed_mol: the RDKit molecule after preprocessing
  • errors: a list of errors that occurred while reading or preprocessing the input
  • prediction_1a2, prediction_2a6, prediction_2b6, prediction_2c8, prediction_2c9, prediction_2c19, prediction_2d6, prediction_2e1, prediction_3a4: probability (between 0 and 1) of being a substrate of the given CYP isozyme
  • neighbor_1a2, neighbor_2a6, neighbor_2b6, neighbor_2c8, neighbor_2c9, neighbor_2c19, neighbor_2d6, neighbor_2e1, neighbor_3a4: similarity to the most similar molecule in the corresponding training set
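
The prediction and neighbor columns listed above can be combined to shortlist isozymes for a molecule. The following is a minimal sketch that works on a single result row given as a plain mapping. The function name `likely_substrate_of` and both cutoffs are assumptions chosen for illustration and are not recommended by CYPstrate; the example values are made up and are not real model output.

```python
from typing import Dict, List, Mapping

# Isozyme suffixes as they appear in the result column names.
ISOZYMES = ["1a2", "2a6", "2b6", "2c8", "2c9", "2c19", "2d6", "2e1", "3a4"]


def likely_substrate_of(row: Mapping[str, float],
                        min_probability: float = 0.5,
                        min_similarity: float = 0.0) -> List[str]:
    """List isozymes whose prediction and neighbor values reach the cutoffs.

    Both cutoffs are illustrative assumptions, not values from CYPstrate.
    """
    hits = []
    for isozyme in ISOZYMES:
        probability = row[f"prediction_{isozyme}"]
        similarity = row[f"neighbor_{isozyme}"]
        if probability >= min_probability and similarity >= min_similarity:
            hits.append(isozyme)
    return hits


# Hypothetical values for illustration only, not real model output.
example_row: Dict[str, float] = {}
for isozyme in ISOZYMES:
    example_row[f"prediction_{isozyme}"] = 0.1
    example_row[f"neighbor_{isozyme}"] = 0.3
example_row["prediction_3a4"] = 0.9
example_row["neighbor_3a4"] = 0.8
example_row["prediction_2d6"] = 0.7

print(likely_substrate_of(example_row))                      # ['2d6', '3a4']
print(likely_substrate_of(example_row, min_similarity=0.5))  # ['3a4']
```

With a result DataFrame, `df_predictions.to_dict(orient="records")` yields one such mapping per input molecule. Requiring a minimum neighbor similarity is one way to set aside predictions for molecules that are far from the training set.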

Contribute

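# create and activate the development environment
conda env create -f environment.yml
conda activate cypstrate
# install the package in editable mode with the dev and test extras
# (quoted so that shells such as zsh do not expand the brackets)
pip install -e ".[dev,test]"
# run the tests in watch mode
ptw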

Contributors

  • Malte Holmer
  • Steffen Hirte
  • Axinya Tokareva
