
A tool to compute the features of protein and peptide sequences

Pfeature

Introduction

Pfeature computes a wide range of features of proteins and peptides from their amino acid sequences. More information on Pfeature is available from its web server, https://webs.iiitd.edu.in/raghava/pfeature. This page describes the standalone version of Pfeature. The standalone contains three scripts, described as follows:

#############################################################################################################################################################################################################################

1: Standalone for calculating composition based features:

Important: To run this script, the 'Data' folder must be in the same directory.

Minimum usage: The minimum usage is "pfeature_comp -i protein.fa", where protein.fa is an input FASTA file. This calculates the amino acid composition of the sequences in the FASTA file, using default values for all other parameters. The output is saved in "pfeature_result.csv" in CSV (comma-separated values) format.

#Full Usage: The complete list of options follows; you can display it with "pfeature_comp.py -h"
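As a rough illustration of what the default AAC job reports, the sketch below computes an amino acid composition for a single sequence. It assumes the composition is expressed as a percentage over the 20 standard residues; the function name and constant are hypothetical and are not part of Pfeature.

```python
from collections import Counter

# The 20 standard amino acids in single-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percentage of each standard residue in `sequence`.

    Assumption: composition = 100 * count / sequence length.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    counts = Counter(sequence)
    return {aa: 100.0 * counts[aa] / len(sequence) for aa in AMINO_ACIDS}


composition = amino_acid_composition("ACDA")
print(composition["A"], composition["C"], composition["W"])
```

Each sequence yields one row of 20 values, which is the shape one would expect for a per-sequence row in the output CSV.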

usage: pfeature_comp [-h] -i INPUT [-o OUTPUT]
                        [-j {AAC,DPC,TPC,ATC,BTC,PCP,AAI,RRI,PRI,DDR,SEP,SER,SPC,ACR,CTC,CeTD,PAAC,APAAC,QSO,SOC,ALLCOMP}]
                        [-n N_TERMINAL] [-c C_TERMINAL] [-nct NC_TERMINAL]
                        [-rn REST_N] [-rc REST_C] [-s SPLIT] [-d LAG]
                        [-w WEIGHT] [-t PWEIGHT]
optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input: protein or protein.sequence in FASTA format or single sequence per line in single letter code
  -o OUTPUT, --output OUTPUT
                        Output: File for saving results by default pfeature_result.csv
  -j {AAC,DPC,TPC,ATC,BTC,PCP,AAI,RRI,PRI,DDR,SEP,SER,SPC,ACR,CTC,CeTD,PAAC,APAAC,QSO,SOC,ALLCOMP}, --job {AAC,DPC,TPC,ATC,BTC,PCP,AAI,RRI,PRI,DDR,SEP,SER,SPC,ACR,CTC,CeTD,PAAC,APAAC,QSO,SOC,ALLCOMP}
                        Job Type:
                        AAC: Amino acid composition
                        DPC: Dipeptide composition
                        TPC: Tripeptide composition
                        ATC: Atomic composition
                        BTC: Bond composition
                        PCP: Physico-chemical properties composition
                        AAI: Amino-acid indices composition
                        RRI: Residue repeat information
                        PRI: Physico-chemical properties repeat information
                        DDR: Distance distribution of residues
                        SEP: Shannon entropy of protein
                        SER: Shannon entropy of residues
                        SPC: Shannon entropy of physico-chemical properties
                        ACR: Autocorrelation descriptors
                        CTC: Conjoint triad descriptors
                        CeTD: Composition enhanced transition distribution
                        PAAC: Pseudo amino acid composition
                        APAAC: Amphiphilic pseudo amino acid composition
                        QSO: Quasi sequence order
                        SOC: Sequence order coupling number
                        ALLCOMP:All composition features together except ACR and AAI
                        by default AAC
  -n N_TERMINAL, --n_terminal N_TERMINAL
                        Window Length from N-terminal: by default 0
  -c C_TERMINAL, --c_terminal C_TERMINAL
                        Window Length from C-terminal: by default 0
  -nct NC_TERMINAL, --nc_terminal NC_TERMINAL
                        Residues from N- and C-terminal: by default 0
  -rn REST_N, --rest_n REST_N
                        Number of residues removed from N-terminal, by default 0
  -rc REST_C, --rest_c REST_C
                        Number of residues removed from C-terminal, by default 0
  -s SPLIT, --split SPLIT
                        Number of splits a sequence divided into, by default 0
  -d LAG, --lag LAG     This represents the order of gap, lag or dipeptide, by default 1
  -w WEIGHT, --weight WEIGHT
                        Weighting Factor for QSO: Value between 0 to 1, by default 0.1
  -t PWEIGHT, --pweight PWEIGHT
                        Weighting factor for pseudo and amphiphlic pseudo amino acid composition: Value between 0 to 1, by default 0.05

#Parameters Description:

Input File: Users can provide input in two formats: i) FASTA format (standard) (e.g. protein.fa); ii) simple format, in which the file has one sequence per line in single-letter code (e.g. protein.seq).
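The two input formats can be told apart by the presence of FASTA header lines. The following is a minimal sketch of such a reader, not Pfeature's own parser; the function name and the generated names (seq_1, seq_2, ...) for the simple format are assumptions made for illustration.

```python
def read_sequences(lines):
    """Return (name, sequence) pairs from FASTA or one-sequence-per-line text."""
    lines = [ln.strip() for ln in lines if ln.strip()]

    # Simple format: no '>' header anywhere, so each line is one sequence.
    if not any(ln.startswith(">") for ln in lines):
        return [(f"seq_{i}", ln.upper()) for i, ln in enumerate(lines, 1)]

    # FASTA format: a '>' header followed by one or more sequence lines.
    records, name, chunks = [], None, []
    for ln in lines:
        if ln.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = ln[1:].strip(), []
        elif name is not None:
            chunks.append(ln)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


print(read_sequences([">p1", "ACD", "EF", ">p2", "GG"]))
print(read_sequences(["acd", "EF"]))
```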

Output File: The program saves results in CSV format under the provided filename. If no output filename is given, results are stored in pfeature_result.csv. To calculate all features except AAI and ACR, use the job name 'ALLCOMP'. AAI and ACR are left out because they take a long time to calculate for longer sequences.

Job name: Selects the type of composition to calculate, such as AAC, which stands for amino acid composition. If no job name is given, AAC is used by default.

N-terminal: Takes the specified number of residues from the N-terminal of each sequence.

C-terminal: Takes the specified number of residues from the C-terminal of each sequence.

NCT-terminal: Takes the specified number of residues from both the N- and C-terminals of each sequence and joins them.

Rest_N : Drops the specified number of residues from the N-terminal and performs operations on the rest.

Rest_C : Drops the specified number of residues from the C-terminal and performs operations on the rest.

Split: Divides each sequence into the specified number of parts.

Lag : Defines the order of lag, lambda, gap or dipeptide used to calculate certain features.

Weight: Defines the weight factor used to calculate the quasi-sequence order; by default it is set to 0.1.

Pweight: Defines the weight factor used to calculate the pseudo and amphiphilic pseudo amino acid composition; by default it is set to 0.05.
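The terminal and rest options amount to slicing the sequence before features are computed. The sketch below shows one plausible reading of them. It assumes that a value of 0 leaves the sequence unchanged and that the window options take precedence over the rest options; the function name and this precedence are hypothetical, not taken from Pfeature. Split is omitted because the description does not say how sequences of uneven length are divided.

```python
def select_region(seq, n_terminal=0, c_terminal=0, nc_terminal=0,
                  rest_n=0, rest_c=0):
    """Return the part of `seq` selected by the window or rest options."""
    if n_terminal > 0:
        return seq[:n_terminal]            # first n residues
    if c_terminal > 0:
        return seq[-c_terminal:]           # last n residues
    if nc_terminal > 0:
        # first n and last n residues, joined together
        return seq[:nc_terminal] + seq[-nc_terminal:]
    # drop rest_n residues from the start and rest_c from the end
    return seq[rest_n:len(seq) - rest_c]


peptide = "ABCDEFGHIJ"
print(select_region(peptide, n_terminal=3))
print(select_region(peptide, nc_terminal=2))
print(select_region(peptide, rest_n=2, rest_c=2))
```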

#############################################################################################################################################################################################################################

2: Standalone for calculating binary profiles based features:

Important: To run this script, the 'Data' folder must be in the same directory.

Minimum usage: The minimum usage is "pfeature_bin.py -i protein.fa", where protein.fa is an input FASTA file. This calculates the amino acid binary profile of the sequences in the FASTA file, using default values for all other parameters. The output is saved in "pfeature_result.csv" in CSV (comma-separated values) format.
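An amino acid binary profile is commonly understood as a one-hot encoding: each residue becomes a vector of 20 values with a single 1 at the position of that residue. The sketch below illustrates this reading. The residue order, the function name, and the all-zero vector for non-standard residues are assumptions and may differ from what the AAB job actually writes.

```python
# Assumed residue order for the 20 positions of each binary vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(sequence):
    """Return a flat one-hot encoding with 20 values per residue."""
    profile = []
    for residue in sequence.upper():
        # A residue outside the 20 standard letters gives 20 zeros.
        profile.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return profile


profile = binary_profile("AC")
print(len(profile), profile[:3], profile[20:23])
```

The profile length grows with the sequence (20 values per residue), so sequences of different lengths give rows of different widths.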

usage: pfeature_bin [-h] -i INPUT [-o OUTPUT]
                       [-j {AAB,DPB,ATB,BTB,PCB,AIB,ALLBIN}] [-n N_TERMINAL]
                       [-c C_TERMINAL] [-nct NC_TERMINAL] [-rn REST_N]
                       [-rc REST_C] [-s SPLIT] [-d LAG]
Please provide following arguments

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input: protein or protein.sequence in FASTA format or single sequence per line in single letter code
  -o OUTPUT, --output OUTPUT
                        Output: File for saving results by default pfeature_result.csv
  -j {AAB,DPB,ATB,BTB,PCB,AIB,ALLBIN}, --job {AAB,DPB,ATB,BTB,PCB,AIB,ALLBIN}
                        Job Type:
                        AAB: Amino acid based binary profile
                        DPB: Dipeptide based binary profile
                        ATB: Atom based binary profile
                        BTB: Bond based binary profile
                        PCB: Physico-chemical properties based binary profile
                        AIB: Amino-acid indices based binary profile
                        ALLBIN:All binary profiles together except ATB and BTB
                        by default AAB
  -n N_TERMINAL, --n_terminal N_TERMINAL
                        Window Length from N-terminal: by default 0
  -c C_TERMINAL, --c_terminal C_TERMINAL
                        Window Length from C-terminal: by default 0
  -nct NC_TERMINAL, --nc_terminal NC_TERMINAL
                        Residues from N- and C-terminal: by default 0
  -rn REST_N, --rest_n REST_N
                        Number of residues removed from N-terminal, by default 0
  -rc REST_C, --rest_c REST_C
                        Number of residues removed from C-terminal, by default 0
  -s SPLIT, --split SPLIT
                        Number of splits a sequence divided into, by default 0
  -d LAG, --lag LAG     This represents the order of gap, lag or dipeptide, by default 1

#Parameters Description:

Input File: Users can provide input in two formats: i) FASTA format (standard) (e.g. protein.fa); ii) simple format, in which the file has one sequence per line in single-letter code (e.g. protein.seq).

Output File: The program saves results in CSV format under the provided filename. If no output filename is given, results are stored in pfeature_result.csv. To calculate all binary profiles except ATB and BTB, use the job name 'ALLBIN'. ATB and BTB are left out because the numbers of atoms and bonds are not equal across amino acid residues.

Job name: Selects the type of binary profile to calculate, such as AAB, which stands for amino acid based binary profile. If no job name is given, AAB is used by default.

N-terminal: Takes the specified number of residues from the N-terminal of each sequence.

C-terminal: Takes the specified number of residues from the C-terminal of each sequence.

NCT-terminal: Takes the specified number of residues from both the N- and C-terminals of each sequence and joins them.

Rest_N : Drops the specified number of residues from the N-terminal and performs operations on the rest.

Rest_C : Drops the specified number of residues from the C-terminal and performs operations on the rest.

Split: Divides each sequence into the specified number of parts.

Lag : Defines the order of dipeptide used to calculate the dipeptide based binary profiles.
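One way to read the order of a dipeptide is as the distance between the two paired residues: with lag 1 the pair is made of adjacent residues, and with lag 2 one residue is skipped. The sketch below lists the residue pairs under that assumption. The function name is hypothetical, and Pfeature's exact definition of lag may differ.

```python
def lagged_pairs(sequence, lag=1):
    """Return residue pairs (i, i + lag) along the sequence, in order."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    # A sequence no longer than `lag` produces no pairs.
    return [sequence[i] + sequence[i + lag]
            for i in range(len(sequence) - lag)]


print(lagged_pairs("ACDE", lag=1))
print(lagged_pairs("ACDE", lag=2))
```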

#############################################################################################################################################################################################################################

