Tool for predicting a PTS1 (peroxisomal targeting signal 1) in an amino acid sequence.
pts1-prediction-tool
This project provides a classification algorithm that predicts the peroxisomal targeting signal 1 (PTS1) in a given amino acid sequence (the primary structure of a protein).
Installation
pip install pts1-prediction-tool==1.0.0
User guide:
Simple example for using the pts1-prediction-tool in your application.
from pts1_prediction_tool.pts1_prediction import PTS1_Predictor
# Instantiate the SVM and create the prediction model
predictor = PTS1_Predictor()
aminoacid_sequence = "MMMMMKLSKMLLLSLSKLSKLSKLSKL"
# Check an amino acid sequence for an existing PTS1
result = predictor.check_for_pts1(aminoacid_sequence)
print(result.isPeroxisomal)
Run tests with:
pipenv shell
python -m unittest tests/pts1_prediction_tests.py
Algorithm:
The classification algorithm is a support vector machine (SVM, sklearn.svm.SVC) from https://scikit-learn.org/.
It was trained to predict the PTS1 in an amino acid sequence (aa_sequence) on a dataset of
514 PTS1/peroxisomal and 11,337 non-peroxisomal aa_sequences.
The peroxisomal dataset was derived from 2324 peroxisomal aa_sequences, which were filtered for the C-terminal PTS1 tripeptide
(S, A, C, P, H, T, N, Q, E, G, V) / (K, R, H, Q, D, N, S, M) / (L, F, I, M, Y)*
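The tripeptide filter above can be written as a regular expression anchored at the end of the sequence. The following is a minimal sketch, not the package's own code; the names PTS1_TRIPEPTIDE and has_pts1_tripeptide are hypothetical and chosen only for illustration.

```python
import re

# One character class per position of the C-terminal tripeptide pattern above.
# PTS1_TRIPEPTIDE and has_pts1_tripeptide are illustrative names, not part of the package.
PTS1_TRIPEPTIDE = re.compile(r"[SACPHTNQEGV][KRHQDNSM][LFIMY]$")


def has_pts1_tripeptide(aa_sequence: str) -> bool:
    """Return True if the last three residues match the PTS1 tripeptide pattern."""
    return PTS1_TRIPEPTIDE.search(aa_sequence.strip().upper()) is not None


print(has_pts1_tripeptide("MMMMMKLSKMLLLSLSKLSKLSKLSKL"))  # True, the sequence ends in SKL
print(has_pts1_tripeptide("MMMMMKLSKMLLLSLSKLSKLSKLSKW"))  # False, W is not allowed at the last position
```

Note that a matching tripeptide only means a sequence passes this filter; the trained classifier described below is what makes the actual prediction.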
For training the SVM, the last 14 C-terminal amino acids of each sequence are used.
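Extracting that C-terminal window could look roughly like this. It is a sketch under assumptions: the function name c_terminal_window is hypothetical, and rejecting sequences shorter than 14 residues is a choice made here, not necessarily what the package does.

```python
C_TERMINAL_LENGTH = 14  # window length stated in the text above


def c_terminal_window(aa_sequence: str, length: int = C_TERMINAL_LENGTH) -> str:
    """Return the last `length` residues of an amino acid sequence.

    Raising on shorter sequences is an assumption of this sketch;
    the package may handle such input differently.
    """
    sequence = aa_sequence.strip().upper()
    if len(sequence) < length:
        raise ValueError(f"sequence has fewer than {length} residues")
    return sequence[-length:]


window = c_terminal_window("MMMMMKLSKMLLLSLSKLSKLSKLSKL")
print(window, len(window))
```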
The optimal parameters for the SVM and the C-terminal length were determined
by 5-fold cross-validation. For this, the peroxisomal and non-peroxisomal training sets were merged and
split into 80 % training sets and 20 % validation sets.
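To illustrate why 5 folds give an 80/20 split, here is a small, dependency-free sketch of the index bookkeeping. It is not the project's training code: the helper k_fold_indices is hypothetical, and plain shuffled (non-stratified) folds are an assumption, since the text does not say how the folds were drawn. In practice, scikit-learn's sklearn.model_selection module provides KFold and StratifiedKFold for this purpose.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Every fold except the current one goes into the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 514 peroxisomal + 11,337 non-peroxisomal sequences, as stated above.
n_total = 514 + 11337
for fold_number, (train, validation) in enumerate(k_fold_indices(n_total), start=1):
    print(fold_number, len(train), len(validation))
```

Each sample lands in the validation set exactly once, so every fold trains on roughly 80 % of the merged data and validates on the remaining 20 %.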
The final SVM has the following average statistical measures:
- Specificity = 1.0
- Sensitivity = 0.86
- Precision = 0.98
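For reference, the three measures above are conventionally computed from the confusion matrix counts as shown in this sketch. The function name classification_metrics is hypothetical, and the counts in the example are made up for illustration; they are not the project's results.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute specificity, sensitivity and precision from confusion matrix counts."""
    return {
        "specificity": tn / (tn + fp),  # share of non-peroxisomal sequences correctly rejected
        "sensitivity": tp / (tp + fn),  # share of peroxisomal sequences correctly detected
        "precision": tp / (tp + fp),    # share of positive predictions that are correct
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in metrics.items():
    print(f"{name} = {value:.2f}")
```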
Learning data for the SVM
The aa_sequences for the learning sets were downloaded from UniProt (https://www.uniprot.org/) on 23.07.2020.
An example application will be published on my website, https://olis-lab.de/
@Copyright 2021, Oliver Koch
Sources:
- Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
- https://scikit-learn.org/, 11.04.2021
- https://www.uniprot.org/, 23.07.2020
- https://biopython.org/, 11.04.2021