HemoPI2.0: A tool to predict hemolytic activity of peptides.
HemoPI2.0
A method for predicting hemolytic activity of the peptides
Introduction
HemoPI2.0 provides classification and regression methods for identifying hemolytic peptides and quantifying their hemolytic activity as a hemolytic concentration (HC50 value), specifically against mammalian red blood cells (RBCs). It uses various composition-based features to predict hemolytic activity. The final model also deploys a motif-based module implemented using MERCI. More information on HemoPI2.0 is available from its web server http://webs.iiitd.edu.in/raghava/hemopi2. Please read and cite the material on HemoPI2.0 for complete information, including the algorithm behind the approach.
PIP Installation
A PIP version is also available for easy installation and use of this tool. Install the package with the following command:
pip install hemopi2
To see the available options for the pip package, type the following commands:
hemopi2_regression -h
hemopi2_classification -h
Standalone
The standalone version of HemoPI2.0 is written in Python 3 and requires the following libraries:
- scikit-learn
pip install scikit-learn==1.3.1
- Pandas
- Numpy
- PyTorch: PyTorch is an open-source machine learning library. You can install it using pip (Python’s package installer). Open your terminal and type:
pip install torch
- Transformers: The Transformers library provides state-of-the-art machine learning models like ESM. Install it with:
pip install transformers
- ESM: ESM (Evolutionary Scale Modeling) is a library for protein sequence modeling. Install it with:
pip install git+https://github.com/facebookresearch/esm.git
Regression
Predicts the hemolytic concentration (HC50), also expressed as the half maximal effective concentration (EC50), in μM. This is the concentration at which 50% of red blood cells (RBCs) are lysed. The model is based on the Random Forest Regressor (RFR) algorithm.
Minimum Usage
To see the available options for the standalone version, type the following command:
hemopi2_regression -h
To run the example, type the following command:
hemopi2_regression -i peptide.fa
Full Usage:
The complete list of options is shown below:
usage: hemopi2_regression [-h]
[-i INPUT]
[-o OUTPUT]
[-j {1,2,3,4}]
[-d {1,2}]
[-wd Working Directory]
Please provide following arguments
options:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input: protein or peptide sequence(s) in FASTA format or single sequence per line in single letter code
-o OUTPUT, --output OUTPUT
Output: File for saving results by default outfile.csv
-j {1,2,3,4}, --job {1,2,3,4}
Job Type: 1: Predict, 2: Protein Scanning, 3: Design, 4: Design all possible mutants,by default 1
-p POSITION, --Position POSITION
Position of mutation (1-indexed)
-r RESIDUES, --Residues RESIDUES
Mutated residues (one or two of the 20 essential amino acids in upper case)
-w {8,9,10,11,12,13,14,15,16,17,18,19,20}, --winleng {8,9,10,11,12,13,14,15,16,17,18,19,20}
Window Length: 8 to 20 (scan mode only), by default 8
-d {1,2}, --display {1,2}
Display: 1: Hemolytic, 2: All peptides, by default 2
-wd WORKING, --working WORKING
Working Directory: Location for writing results
Input File: Users can provide input in two formats: i) FASTA format (standard) (e.g. peptide.fa) and ii) simple format. In simple format, the file should have one peptide sequence per line in single-letter code (e.g. peptide.seq).
Output File: The program saves results in CSV format. If the user does not provide an output file name, results are stored in outfile.csv.
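The two input formats described above can be illustrated with a small reader. This is only a sketch of how such input might be parsed, not the parser HemoPI2.0 itself uses; the function name `read_peptides` and the generated names `seq_1`, `seq_2`, ... are hypothetical.

```python
def read_peptides(text):
    """Return (name, sequence) pairs from FASTA text or one-sequence-per-line text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    if any(ln.startswith(">") for ln in lines):
        # FASTA: a header line starting with '>' followed by one or more sequence lines.
        name, chunks = None, []
        for ln in lines:
            if ln.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks).upper()))
                # Hypothetical fallback name for an empty header.
                name = ln[1:].strip() or f"seq_{len(records) + 1}"
                chunks = []
            elif name is not None:
                chunks.append(ln)
        if name is not None:
            records.append((name, "".join(chunks).upper()))
    else:
        # Simple format: every non-empty line is one peptide.
        records = [(f"seq_{i}", ln.upper()) for i, ln in enumerate(lines, 1)]
    return records


fasta_text = ">pep1\nGIGAVLKV\nLTTGL\n>pep2\nKWKLFKKI\n"
simple_text = "GIGAVLKVLTTGL\nKWKLFKKI\n"
print(read_peptides(fasta_text))
print(read_peptides(simple_text))
```

Both calls yield the same two sequences; only the names differ, because the simple format carries no identifiers.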
Jobs: This program offers four job types:
- Prediction: predicts whether a given input peptide sequence is hemolytic or non-hemolytic.
- Protein Scanning: predicts hemolytic regions in a protein sequence.
- Design: generates mutant peptides with a single amino acid or dipeptide at a particular position provided by the user and predicts their hemolytic activity. Provide the residue (-r) and position (-p) when using this job.
- Design all possible mutants: generates all possible mutants and predicts their hemolytic activity.
Position: The position at which the user wants to insert a single amino acid or dipeptide to create a mutation. This option is available only for the Design module.
Residue: Mutated residues (one or two of the 20 standard amino acids in upper case), e.g., A for Alanine.
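To make the Design jobs concrete, the sketch below generates mutants in plain Python. It assumes the mutation is a substitution starting at the 1-indexed position and that "all possible mutants" means every single-residue substitution; the text above does not spell out either detail, so the tool may behave differently. The function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mutate(seq, position, residues):
    """Substitute one or two residues starting at a 1-indexed position (assumed behavior)."""
    if not 1 <= len(residues) <= 2 or any(r not in AMINO_ACIDS for r in residues):
        raise ValueError("residues must be one or two standard amino acids in upper case")
    start = position - 1
    if start < 0 or start + len(residues) > len(seq):
        raise ValueError("position is outside the sequence")
    return seq[:start] + residues + seq[start + len(residues):]


def all_single_mutants(seq):
    """Every sequence that differs from seq by exactly one substituted residue."""
    mutants = []
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                mutants.append(seq[:i] + aa + seq[i + 1:])
    return mutants


print(mutate("GIGAVLKV", 3, "A"))            # single amino acid at position 3
print(mutate("GIGAVLKV", 7, "RR"))           # dipeptide at positions 7 and 8
print(len(all_single_mutants("GIGAVLKV")))   # 8 positions x 19 alternatives
```

Each mutant would then be scored like any other input peptide.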
Window length: The user can choose any window length between 8 and 20 for scanning long sequences. This option is available only for the protein scan module.
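Protein scanning can be pictured as sliding a fixed-length window along the sequence and scoring each fragment. The sketch below only produces the fragments; the step size of one residue and the name `scan_windows` are assumptions, not details confirmed by the tool.

```python
def scan_windows(seq, window=8):
    """Return (1-indexed start, fragment) pairs for every full-length window."""
    if not 8 <= window <= 20:
        raise ValueError("window length must be between 8 and 20")
    # A sequence shorter than the window yields an empty list.
    return [(i + 1, seq[i:i + window]) for i in range(len(seq) - window + 1)]


for start, fragment in scan_windows("GIGAVLKVLT", window=8):
    print(start, fragment)
```

A sequence of length n yields n - w + 1 fragments for window length w, so longer windows produce fewer fragments to score.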
Working Directory: Location for writing results
Classification
Determines whether peptides are hemolytic or non-hemolytic based on their primary sequence, using machine learning models and protein language models. The available options are the RF and ESM2-t6 models, as well as their hybrids with MERCI, and you can select your preferred model for prediction. By default, the tool uses the Hybrid2 (ESM2-t6+MERCI) approach, which performed best in our evaluation on an independent dataset and is also efficient at runtime.
Minimum Usage
To see the available options for the standalone version, type the following command:
hemopi2_classification -h
To run the example, type the following command:
hemopi2_classification -i peptide.fa
Full Usage:
The complete list of options is shown below:
usage: hemopi2_classification [-h]
[-i INPUT]
[-o OUTPUT]
[-t THRESHOLD]
[-j {1,2,3,4,5}]
[-m {1,2,3,4}]
[-d {1,2}]
[-wd Working Directory]
Please provide following arguments
options:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input: protein or peptide sequence(s) in FASTA format or single sequence per line in single letter code
-o OUTPUT, --output OUTPUT
Output: File for saving results by default outfile.csv
-j {1,2,3,4,5}, --job {1,2,3,4,5}
Job Type: 1: Predict, 2: Protein Scanning, 3: Design, 4: Design all possible mutants, 5: Motif Scanning, by default 1
-m {1,2,3,4}, --model {1,2,3,4}
Model: 1: Random Forest, 2: Hybrid1 (RF+MERCI), 3: ESM2-t6, 4: Hybrid2 (ESM+MERCI) by default 4
-t THRESHOLD, --threshold THRESHOLD
Threshold: Value between 0 to 1 by default 0.46 (For RF and Hybrid1) and 0.55 (For ESM and Hybrid2)
-p POSITION, --Position POSITION
Position of mutation (1-indexed)
-r RESIDUES, --Residues RESIDUES
Mutated residues (one or two of the 20 essential amino acids in upper case)
-w {8,9,10,11,12,13,14,15,16,17,18,19,20}, --winleng {8,9,10,11,12,13,14,15,16,17,18,19,20}
Window Length: 8 to 20 (scan mode only), by default 8
-wd WORKING, --working WORKING
Working Directory: Location for writing results
-d {1,2}, --display {1,2}
Display: 1: Hemolytic, 2: All peptides, by default 2
Input File: Users can provide input in two formats: i) FASTA format (standard) (e.g. peptide.fa) and ii) simple format. In simple format, the file should have one peptide sequence per line in single-letter code (e.g. peptide.seq).
Output File: The program saves results in CSV format. If the user does not provide an output file name, results are stored in outfile.csv.
Threshold: The user should provide a threshold between 0 and 1. Note that the score is proportional to the hemolytic potential of the peptide.
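The role of the threshold can be shown with a few lines of Python. This sketch assumes a peptide is called hemolytic when its score is greater than or equal to the threshold; whether the tool uses a strict or inclusive comparison is not stated here. The default values come from the option description above, and the function name and label strings are hypothetical.

```python
DEFAULT_THRESHOLDS = {
    "RF": 0.46,
    "Hybrid1": 0.46,
    "ESM": 0.55,
    "Hybrid2": 0.55,
}


def label(score, threshold=DEFAULT_THRESHOLDS["Hybrid2"]):
    """Turn a score in [0, 1] into a class label (inclusive comparison is an assumption)."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    return "Hemolytic" if score >= threshold else "Non-Hemolytic"


print(label(0.60))                             # above the 0.55 default
print(label(0.50))                             # below the 0.55 default
print(label(0.50, DEFAULT_THRESHOLDS["RF"]))   # same score, lower RF threshold
```

Lowering the threshold labels more peptides as hemolytic, and raising it labels fewer.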
Jobs: This program offers five job types:
- Prediction: predicts whether a given input peptide sequence is hemolytic or non-hemolytic.
- Protein Scanning: predicts hemolytic regions in a protein sequence.
- Design: generates mutant peptides with a single amino acid or dipeptide at a particular position provided by the user and predicts their hemolytic activity. Provide the residue (-r) and position (-p) when using this job.
- Design all possible mutants: generates all possible mutants and predicts their hemolytic activity.
- Motif Scanning: scans or maps hemolytic motifs within the query sequence using MERCI.
Models: Four models have been incorporated in this program:
i) Model1 predicts whether a given input peptide sequence is hemolytic or non-hemolytic using the Random Forest (RF) algorithm, based on various composition-based features of the peptide computed with the Pfeature tool.
ii) Model3 predicts whether a given input peptide sequence is hemolytic or non-hemolytic using the protein language model ESM2-t6.
iii) Model2 and Model4 predict whether a given input peptide sequence is hemolytic or non-hemolytic using a hybrid approach: Model2 (Hybrid1) combines RF with MERCI, and Model4 (Hybrid2) combines ESM2-t6 with MERCI. Each hybrid combines the score from the machine learning model (RF) or the protein language model (ESM2-t6) with the MERCI motif score into a Hybrid Score, and the prediction is based on that Hybrid Score.
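One way to picture a Hybrid Score is as a model score nudged up or down by motif hits. The sketch below is purely illustrative: the additive form, the 0.5 adjustment, and the clipping to [0, 1] are assumptions made for this example and are not taken from the HemoPI2.0 description, which states only that the scores are combined.

```python
def hybrid_score(model_score, has_hemolytic_motif, has_non_hemolytic_motif, adjustment=0.5):
    """Combine a model score with motif evidence (hypothetical additive scheme)."""
    delta = 0.0
    if has_hemolytic_motif:
        delta += adjustment   # a hemolytic motif raises the score
    if has_non_hemolytic_motif:
        delta -= adjustment   # a non-hemolytic motif lowers it
    # Keep the result in [0, 1] so it stays comparable with the threshold.
    return min(1.0, max(0.0, model_score + delta))


print(hybrid_score(0.40, True, False))
print(hybrid_score(0.80, True, False))
print(hybrid_score(0.30, False, True))
print(hybrid_score(0.60, False, False))
```

With no motif hit the Hybrid Score equals the model score, so the motif module only changes predictions for peptides that contain a known motif.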
Position: The position at which the user wants to insert a single amino acid or dipeptide to create a mutation. This option is available only for the Design module.
Residue: Mutated residues (one or two of the 20 standard amino acids in upper case), e.g., A for Alanine.
Window length: The user can choose any window length between 8 and 20 for scanning long sequences. This option is available only for the protein scan module.
Working Directory: Location for writing results
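When running many predictions from a script, it can help to assemble the command line programmatically. The sketch below builds an argument list from the flags documented above (-i, -o, -j, -m, -t, -d) and prints it; the helper name `build_command` is hypothetical, and the command is not executed here.

```python
import shlex


def build_command(input_file, output_file="outfile.csv", job=1, model=4,
                  threshold=None, display=2):
    """Build an argument list for hemopi2_classification from the documented options."""
    if job not in (1, 2, 3, 4, 5):
        raise ValueError("job must be 1-5")
    if model not in (1, 2, 3, 4):
        raise ValueError("model must be 1-4")
    cmd = ["hemopi2_classification", "-i", input_file, "-o", output_file,
           "-j", str(job), "-m", str(model), "-d", str(display)]
    if threshold is not None:
        if not 0 <= threshold <= 1:
            raise ValueError("threshold must be between 0 and 1")
        cmd += ["-t", str(threshold)]
    return cmd


# Model 2 (Hybrid1, RF+MERCI) with its documented default threshold of 0.46.
cmd = build_command("peptide.fa", model=2, threshold=0.46)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the package is installed and the command is on the PATH.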
HemoPI2.0 Package Files
The package contains the following files; a brief description of each is given below:
INSTALLATION : Installation instructions
LICENSE : License information
merci : This folder contains the program to run MERCI
README.md : This file provides information about this package
hemopi2_regression : Python program for regression
hemopi2_classification : Python program for classification
peptide.fa : Example file containing peptide sequences in FASTA format
peptide.seq : Example file containing peptide sequences in simple format
Installation via PIP
HemoPI2.0 can also be installed via PIP:
pip install hemopi2