Predicting enzyme catalytic optimum temperature with ML

TOMER: Temperature Optima for Enzymes with Resampling

TOMER is a Python package for predicting the catalytic optimum temperature (Topt) of enzymes with machine learning. TOMER was trained with a bagging ensemble on a dataset of 2,917 proteins. Because the dataset has an imbalanced distribution, resampling strategies were applied to reduce large errors when predicting higher temperature values. Code for design of TOMER can be found here.
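The combination of resampling and bagging described above can be illustrated with a small, self-contained sketch. This is not TOMER's implementation: the base learner (a one-variable least-squares line), the oversampling rule, the threshold, and all function names are assumptions chosen for illustration, and the toy data are made up. The sketch oversamples rare high-temperature samples, trains each base learner on a bootstrap sample, and reports the mean and spread of the base-learner predictions.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit y = a*x + b for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:  # degenerate sample (all x equal): fall back to a constant
        return 0.0, y_mean
    a = sum((x - x_mean) * (y - y_mean) for x, y in points) / sxx
    return a, y_mean - a * x_mean


def oversample(points, threshold, rng):
    """Randomly duplicate rare samples (y >= threshold) until they match
    the number of common samples (y < threshold)."""
    high = [p for p in points if p[1] >= threshold]
    low = [p for p in points if p[1] < threshold]
    if not high or len(high) >= len(low):
        return list(points)
    extra = [rng.choice(high) for _ in range(len(low) - len(high))]
    return list(points) + extra


def bagged_predict(points, x_new, n_learners=100, threshold=70.0, seed=0):
    """Return (mean, sample standard deviation) of base-learner predictions."""
    rng = random.Random(seed)
    balanced = oversample(points, threshold, rng)
    preds = []
    for _ in range(n_learners):
        # Bootstrap sample: draw with replacement, same size as the data.
        boot = [rng.choice(balanced) for _ in balanced]
        a, b = fit_line(boot)
        preds.append(a * x_new + b)
    return statistics.mean(preds), statistics.stdev(preds)


# Made-up toy data: (feature, target) pairs, not real enzyme measurements.
toy = [(float(x), float(x) + 5.0) for x in range(20, 100, 5)]
mean_pred, spread = bagged_predict(toy, 60.0, threshold=80.0)
print(mean_pred, spread)
```

The spread here is the sample standard deviation across base learners. Whether the error TOMER reports is computed this way is not stated in this description.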

Citation

If you find TOMER useful, please cite:

  • Gado, J.E., Beckham, G.T., and Payne, C.M. (2020). Improving enzyme optimum temperature prediction with resampling strategies and ensemble learning. J. Chem. Inf. Model. 60(8), 4098-4107.

Installation

Install with pip

pip install tomer

Or from source (preferred).

git clone https://github.com/jafetgado/tomer.git
cd tomer
pip install -r requirements.txt
python setup.py install

Prerequisites

(version used in this work)

  1. Python 3 (3.6.6)

  2. scikit-learn (0.21.2)

  3. numpy (1.19.5)

  4. pandas (0.24.1)
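To compare a local environment against the versions listed above, a short check such as the following may help. It is a sketch that assumes Python 3.8 or later (for importlib.metadata); the helper name installed_versions is hypothetical and is not part of TOMER.

```python
from importlib import metadata

# Versions listed in the prerequisites above.
REFERENCE_VERSIONS = {
    "scikit-learn": "0.21.2",
    "numpy": "1.19.5",
    "pandas": "0.24.1",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(REFERENCE_VERSIONS).items():
    print(f"{name}: installed={version}, reference={REFERENCE_VERSIONS[name]}")
```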

Usage

There are two main functions in TOMER for predicting the enzyme optimum temperature: pred_seq_topt, which predicts the optimum temperature of a single protein sequence (passed as a string), and pred_fasta_topt, which predicts the optimum temperatures of the protein sequences in a FASTA file. Both functions require the optimal growth temperature (OGT) of the protein's source organism. If the OGT is not known, a predicted value may be obtained using TOME.

Examples

import tomer

# Predict optimum temperature of a single sequence.
sequence = '''MKKQVVEVLVEGGKATPGPPLGPAIGPLGLNVKQVVDKINEATKEFAGMQVPVKIIV
              DPVTKQFEIEVGVPPTSQLIKKELGLEKGSGEPKHNIVGNLTMEQVIKIAKMKRSQML
              ALTLKAAAKEVIGTALSMGVTVEGKDPRIVQREIDEGVYDELFEKAEKE'''
ogt = 95
y_pred, y_err = tomer.pred_seq_topt(sequence, ogt)

print(y_pred)   # predicted optimum temperature
# 84.4

print(y_err)    # standard error of prediction (from 100 base learners in ensemble)
# 1.929

# Predict optimum temperatures of sequences in a FASTA file.
fasta_file = 'test/sequences.fasta'
ogt_file = 'test/ogts.txt'
df = tomer.pred_fasta_topt(fasta_file, ogt_file) # returns Pandas dataframe

print(df)
#       ID     Topt    Std err
# 0   P43408  79.345   1.53561
# 1   Q97X08  81.705  0.442442
# 2   F8A9V0   76.37   1.16195
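The single-sequence example above passes a triple-quoted string that contains line breaks and indentation. Whether pred_seq_topt strips such whitespace itself is not stated here, so a cautious option is to normalize the sequence before calling it. The helpers below are hypothetical and not part of TOMER; they use only the Python standard library.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def clean_sequence(raw):
    """Remove all whitespace from a pasted sequence and upper-case it."""
    return "".join(raw.split()).upper()


def has_only_standard_residues(seq):
    """True if seq is non-empty and uses only the 20 standard residue letters."""
    return bool(seq) and set(seq) <= STANDARD_RESIDUES


raw = '''MKKQVVEVLVEGGKATPGPPLGPAIGPLGLNVKQVVDKINEATKEFAGMQVPVKIIV
         DPVTKQFEIEVGVPPTSQLIKKELGLEKGSGEPKHNIVGNLTMEQVIKIAKMKRSQML'''
sequence = clean_sequence(raw)
print(len(sequence), has_only_standard_residues(sequence))
```

The cleaned string can then be passed to tomer.pred_seq_topt(sequence, ogt) as in the example above.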
