Skip to main content

HemoPI2.0: A tool to predict hemolytic activity of peptides.

Project description

HemoPI2.0

A method for predicting hemolytic activity of the peptides

Introduction

HemoPI2.0 is developed for identification (Classification) as well as quantification (regression) methods for predicting hemolytic activity peptides with their hemolytic concentration (HC50 value), especially targets for mammalian red blood cells (RBCs). It uses various composition based features for predicting hemolytic activity peptides. The final model also deploys a motif-based module which has been implemented using MERCI. More information on HemoPI2.0 is available from its web server http://webs.iiitd.edu.in/raghava/hemopi2. Please read/cite the content about HemoPI2.0 for complete information including algorithm behind the approach.

PIP Installation

PIP version is also available for easy installation and usage of this tool. The following command is required to install the package

pip install hemopi2

To know about the available option for the pip package, type the following command:

hemopi2_regression -h
hemopi2_classification -h

Standalone

Standalone version of HemoPI2.0 is written in python3 and the following libraries are necessary for a successful run:

  • scikit-learn
  pip install scikit-learn==1.3.1
  • Pandas
  • Numpy
  • PyTorch: PyTorch is an open-source machine learning library. You can install it using pip (Python’s package installer). Open your terminal and type:
!pip install torch
  • Transformers: The Transformers library provides state-of-the-art machine learning models like ESM. Install it with:
!pip install transformers
  • ESM: ESM (Evolutionary Scale Modeling) is a library for protein sequence modeling.
!pip install git+https://github.com/facebookresearch/esm.git

Regression

Predicts the Hazardous Concentration (HC50) or Half Maximum Effective Concentration (EC50) in μM. This indicates the concentration at which 50% of red blood cells (RBCs) are lysed. This model operates on the Random Forest Regressor (RFR) algorithm.

Minimum USAGE To know about the available option for the standalone, type the following command:

hemopi2_regrssion -h

To run the example, type the following command:

hemopi2_regrssion -i peptide.fa

Full Usage:

Following is complete list of all options, you may get these options
usage: hemopi2_regrssion [-h] 
                     [-i INPUT]
                     [-o OUTPUT]
                     [-j {1,2,3,4}] 
                     [-d {1,2}]
                     [-wd Working Directory]
Please provide following arguments

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input: protein or peptide sequence(s) in FASTA format or single sequence per line in single letter code
  -o OUTPUT, --output OUTPUT
                        Output: File for saving results by default outfile.csv
  -j {1,2,3,4}, --job {1,2,3,4}
                        Job Type: 1: Predict, 2: Protein Scanning, 3: Design, 4: Design all possible mutants,by default 1
  -p POSITION, --Position POSITION
                        Position of mutation (1-indexed)
  -r RESIDUES, --Residues RESIDUES
                        Mutated residues (one or two of the 20 essential amino acids in upper case)
  -w {8,9,10,11,12,13,14,15,16,17,18,19,20}, --winleng {8,9,10,11,12,13,14,15,16,17,18,19,20}
                        Window Length: 8 to 20 (scan mode only), by default 8
  -d {1,2}, --display {1,2}
                        Display: 1: Hemolytic, 2: All peptides, by default 2
  -wd WORKING, --working WORKING
                        Working Directory: Location for writing results

Input File: It allow users to provide input in two format; i) FASTA format (standard) (e.g. peptide.fa) and ii) Simple Format. In case of simple format, file should have one peptide sequence in a single line in single letter code (eg. peptide.seq).

Output File: Program will save result in CSV format, in case user do not provide output file name, it will be stored in outfile.csv.

Jobs: In this program, two models have been incorporated;

  1. Prediction: Prediction for predicting given input peptide sequence as hemolytic and non-hemolytic peptide.
  2. Protein Scanning: for the prediction of hemolytic regions in a protein sequence.
  3. Design: generates mutant peptides with a single amino acid or dipeptide at particulal position provided by user and predict their hemolytic activity. Provide residue (-r) and position (-p) while using this job.
  4. Design all possible mutants: Design all possible mutants predict their hemolytic activity.

Position: User can provide position at which he/she wants insert any single amino acid or dipeptide for creating mutation. This option is available for only Design module.

Residue: Mutated residues (one or two of the 20 essential amino acids in upper case) (e.g., A for Alanine)

Window length: User can choose any pattern length between 8 and 20 in long sequences. This option is available for only protein scan module.

Working Directory: Location for writing results

Classification

Determines whether peptides are hemolytic or non-hemolytic based on their primary sequence. We have employed machine learning models and protein language models. The provided options include RF and ESM2-t6 models, as well as their hybrids with MERCI. You can select your preferred model for prediction. By default, this use the Hybrid1 (ESM2-t6+MERCI) approach, which has demonstrated best performance on our evaluation on independent dataset as well as runtime efficient.

Minimum USAGE To know about the available option for the standalone, type the following command:

hemopi2_classification -h

To run the example, type the following command:

hemopi2_classification -i peptide.fa

Full Usage:

Following is complete list of all options, you may get these options
usage: toxinpred3 [-h] 
                     [-i INPUT]
                     [-o OUTPUT]
                     [-t THRESHOLD]
                     [-j {1,2,3,4}]
                     [-m {1,2,3,4}] 
                     [-d {1,2}]
                     [-wd Working Directory]
Please provide following arguments

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input: protein or peptide sequence(s) in FASTA format or single sequence per line in single letter code
  -o OUTPUT, --output OUTPUT
                        Output: File for saving results by default outfile.csv
  -j {1,2,3,4,5}, --job {1,2,3,4,5}
                        Job Type: 1: Predict, 2: Protein Scanning, 3: Design, 4: Design all possible mutants, 5: Motif Scanning, by default 1
  -m {1,2,3,4}, --model {1,2,3,4}
                        Model: 1: Random Forest, 2: Hybrid1 (RF+MERCI), 3: ESM2-t6, 4: Hybrid2 (ESM+MERCI) by default 4
  -t THRESHOLD, --threshold THRESHOLD
                        Threshold: Value between 0 to 1 by default 0.46 (For RF and Hybrid1) and 0.55 (For ESM and Hybrid2)
  -p POSITION, --Position POSITION
                        Position of mutation (1-indexed)
  -r RESIDUES, --Residues RESIDUES
                        Mutated residues (one or two of the 20 essential amino acids in upper case)
  -w {8,9,10,11,12,13,14,15,16,17,18,19,20}, --winleng {8,9,10,11,12,13,14,15,16,17,18,19,20}
                        Window Length: 8 to 20 (scan mode only), by default 8
  -wd WORKING, --working WORKING
                        Working Directory: Location for writing results
  -d {1,2}, --display {1,2}
                        Display: 1: Hemolytic, 2: All peptides, by default 2

Input File: It allow users to provide input in two format; i) FASTA format (standard) (e.g. peptide.fa) and ii) Simple Format. In case of simple format, file should have one peptide sequence in a single line in single letter code (eg. peptide.seq).

Output File: Program will save result in CSV format, in case user do not provide output file name, it will be stored in outfile.csv.

Threshold: User should provide threshold between 0 and 1, please note score is proportional to hemolytic potential of peptide.

Jobs: In this program, two models have been incorporated;

  1. Prediction: Prediction for predicting given input peptide sequence as hemolytic and non-hemolytic peptide.
  2. Protein Scanning: for the prediction of hemolytic regions in a protein sequence.
  3. Design: generates mutant peptides with a single amino acid or dipeptide at particulal position provided by user and predict their hemolytic activity. Provide residue (-r) and position (-p) while using this job.
  4. Design all possible mutants: Design all possible mutants predict their hemolytic activity.
  5. Motif Scanning: This job facilitates users in scanning or mapping hemolytic motifs within the query sequence using MERCI.

Models: In this program, four models have been incorporated;
i) Model1 for predicting given input peptide sequence as hemolytic and non-hemolytic peptide using Random Forest (RF) algorithm based on various composition based features using Pfeature tool of the peptide;

ii) Model3 Model1 for predicting given input peptide sequence as hemolytic and non-hemolytic peptide using protein language model ESM2-t6.

iii) Model2 & Model4 for predicting given input peptide sequence as hemolytic and non-hemolytic peptide using Hybrid approach, the first ensemble is ESM2-t6 and MERCI second is RF and MERCI. It combines the scores generated from machine learning (ET) and protein language model (ESM2-t6), and MERCI as Hybrid Score, and the prediction is based on Hybrid Score.

Position: User can provide position at which he/she wants insert any single amino acid or dipeptide for creating mutation. This option is available for only Design module.

Residue: Mutated residues (one or two of the 20 essential amino acids in upper case) (e.g., A for Alanine)

Window length: User can choose any pattern length between 8 and 20 in long sequences. This option is available for only protein scan module.

Working Directory: Location for writing results

HemoPI2.0 Package Files

It contain following files, brief description of these files given below

INSTALLATION : Installation instructions

LICENSE : License information

merci : This folder contains the program to run MERCI

README.md : This file provide information about this package

hemopi2_regrssion : Python program for regrssion

hemopi2_classification : Python program for classification

peptide.fa : Example file contain peptide sequences in FASTA format

peptide.seq : Example file contain peptide sequences in simple format

Installation via PIP

User can install Hemopi2.0 via PIP also

pip install hemopi2

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hemopi2-1.3.tar.gz (84.3 MB view details)

Uploaded Source

Built Distribution

hemopi2-1.3-py3-none-any.whl (85.7 MB view details)

Uploaded Python 3

File details

Details for the file hemopi2-1.3.tar.gz.

File metadata

  • Download URL: hemopi2-1.3.tar.gz
  • Upload date:
  • Size: 84.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.7

File hashes

Hashes for hemopi2-1.3.tar.gz
Algorithm Hash digest
SHA256 4fb1b0ffa4454863d598c1b84c0d3d45829fd541cbd5e6338439e9173cdd968d
MD5 4f4d6914622907a2d08cbef11a267e50
BLAKE2b-256 72b833ca8c247b59f6e7aedc6ff7f6b293bc6e8e9913b4e0019719b26123fb27

See more details on using hashes here.

File details

Details for the file hemopi2-1.3-py3-none-any.whl.

File metadata

  • Download URL: hemopi2-1.3-py3-none-any.whl
  • Upload date:
  • Size: 85.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.7

File hashes

Hashes for hemopi2-1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 53ee834b39ffa26b271674412d2b43ebaadcb27c5eedba376e5d8f4ab6feb34b
MD5 5c2947f34dc0053e19b7a48f7ecd87a3
BLAKE2b-256 b58ddb9100d4184e93144b961249bcefb0c1cf235a29e632921aac758b21ff31

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page