
A Python tool for numerically encoding protein sequences based on their physicochemical properties

Project description

ifeatpro (Physicochemical Feature Encoder for Protein Sequences)

A Python package that generates 21 numerical feature representations of protein sequences based on their physicochemical properties.

Note: ifeatpro is based on iFeature, a Python-based toolkit available at link. We have packaged 21 of the alignment-free feature encoding functions available in iFeature into a pip-installable module, making this protein feature encoding tool easier to use and more accessible.

ifeatpro installation

pip install ifeatpro

ifeatpro usage

from ifeatpro.features import get_feature, get_all_features

Generating some random protein sequences and storing them in FASTA format

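import os
import random

AA = "ACDEFGHIKLMNPQRSTVWY"

# Five random 150-residue sequences drawn from the 20 standard amino acids
sequences = ["".join(random.choice(AA) for _ in range(150)) for _ in range(5)]

# Create the output directory ("!mkdir -p" only works inside Jupyter/IPython)
os.makedirs("ifeatpro_data", exist_ok=True)

fasta_file = "ifeatpro_data/seq.fa"
with open(fasta_file, "w") as f:
    for i, seq in enumerate(sequences):
        # Each record is a header line (">enz_0", ...) followed by its sequence
        f.write(f">enz_{i}\n")
        f.write(f"{seq}\n")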
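Most of the encodings listed in the next section are defined over the 20 standard amino acids, so before encoding real (non-random) sequences it may be worth checking a FASTA file for other characters such as X or B. The sketch below is a minimal stand-alone FASTA parser and checker; the function names parse_fasta, read_fasta and nonstandard_residues are hypothetical helpers and are not part of ifeatpro.

```python
from typing import Dict, Iterable, Set

# The 20 standard amino acids (same alphabet as AA above)
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].strip()
            # Use the first whitespace-separated token of the header as the ID
            current_id = header.split()[0] if header else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def read_fasta(path: str) -> Dict[str, str]:
    """Read and parse a FASTA file from disk."""
    with open(path) as handle:
        return parse_fasta(handle)


def nonstandard_residues(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each record containing non-standard residues to the set of those residues."""
    result = {}
    for rid, seq in records.items():
        extra = set(seq) - STANDARD_AA
        if extra:
            result[rid] = extra
    return result
```

For the file written above, nonstandard_residues(read_fasta(fasta_file)) should return an empty dict, because every sequence was drawn from AA.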

Getting all 21 feature encodings from protein sequences using ifeatpro

ifeatpro provides 21 feature encodings that numerically represent protein sequences based on their physicochemical properties. They are:

  1. aac
  2. apaac
  3. cksaagp
  4. cksaap
  5. ctdc
  6. ctdd
  7. ctdt
  8. ctriad
  9. dde
  10. dpc
  11. gaac
  12. gdpc
  13. geary
  14. gtpc
  15. ksctriad
  16. moran
  17. nmbroto
  18. paac
  19. qsorder
  20. socnumber
  21. tpc
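To give a feel for what these encodings compute, here is a minimal sketch of the simplest one, aac (amino acid composition), assuming the usual definition: the fraction of the sequence made up by each of the 20 standard amino acids, giving a 20-dimensional vector. This is an independent illustration, not ifeatpro's own code; the residue ordering in AA_ORDER is an assumption and may differ from the column order in ifeatpro's output.

```python
from collections import Counter
from typing import List

# Assumed column order: the 20 standard amino acids in alphabetical one-letter order
AA_ORDER = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> List[float]:
    """Amino acid composition: count of each standard residue divided by sequence length."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AA_ORDER]
```

For example, aac("AACW") gives 0.5 for A, 0.25 for C and W, and 0.0 for the remaining 17 residues.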

The get_all_features function lets a user generate all 21 physicochemical feature encodings provided by ifeatpro in a single call. Its first argument is the FASTA file containing the protein sequences, and its second argument is the output directory where the encodings are saved as CSV files.

get_all_features(fasta_file, "./ifeatpro_data/")
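After get_all_features finishes, the output directory should contain the CSV files. The sketch below reads every CSV in a directory into memory for inspection. It assumes one CSV per encoding but does not rely on specific file names or on whether a header row is present (neither is documented here), so it returns raw string rows keyed by file name; load_feature_csvs is a hypothetical helper, not an ifeatpro function.

```python
import csv
import glob
import os
from typing import Dict, List


def load_feature_csvs(out_dir: str) -> Dict[str, List[List[str]]]:
    """Read every .csv file in out_dir into {file_stem: rows}.

    Assumption: each encoding is its own CSV file. Column layout is not
    assumed, so each row is returned as a list of raw strings.
    """
    tables: Dict[str, List[List[str]]] = {}
    for path in sorted(glob.glob(os.path.join(out_dir, "*.csv"))):
        stem = os.path.splitext(os.path.basename(path))[0]
        with open(path, newline="") as handle:
            # Skip completely empty rows
            tables[stem] = [row for row in csv.reader(handle) if row]
    return tables
```

Usage: tables = load_feature_csvs("./ifeatpro_data/"), then, for example, print each file name with len(tables[name]) to check the row counts against the number of input sequences.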

Creating a single feature encoding using ifeatpro

A user can also generate any one of the 21 feature encodings with the get_feature function. It takes the FASTA file as its first argument, the feature encoding type as its second, and the output directory where the file will be stored as its third. For example, to create the aac encoding from the fasta_file created above and store it in the ifeatpro_data directory, run:

get_feature(fasta_file, "aac", "ifeatpro_data/")
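If only a handful of encodings are needed, get_feature can be called once per feature type. The sketch below is a hypothetical wrapper, encode_subset, that checks every requested name against the 21 types listed in this README before running anything, so a typo fails fast instead of after some files have already been written. The encoder is passed in as an argument (assumed to take get_feature's argument order: FASTA file, feature type, output directory), which also keeps the sketch runnable without ifeatpro installed.

```python
from typing import Callable, Iterable, List

# The 21 encoding names listed in this README
FEATURE_TYPES = (
    "aac", "apaac", "cksaagp", "cksaap", "ctdc", "ctdd", "ctdt",
    "ctriad", "dde", "dpc", "gaac", "gdpc", "geary", "gtpc",
    "ksctriad", "moran", "nmbroto", "paac", "qsorder", "socnumber", "tpc",
)


def encode_subset(
    fasta_file: str,
    feature_types: Iterable[str],
    out_dir: str,
    encoder: Callable[[str, str, str], object],
) -> List[str]:
    """Validate all requested feature names, then call encoder once per name."""
    requested = list(feature_types)
    unknown = [ft for ft in requested if ft not in FEATURE_TYPES]
    if unknown:
        raise ValueError(f"unknown feature type(s): {unknown}")
    for ft in requested:
        encoder(fasta_file, ft, out_dir)
    return requested
```

With ifeatpro installed, this would be used as encode_subset(fasta_file, ["aac", "dpc", "ctdc"], "ifeatpro_data/", get_feature).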

Feature extraction technique descriptions

For a detailed description of the feature extraction techniques used in ifeatpro, please refer to the Supplementary Document of the paper (link to be added soon).
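Until that document is available, the sketch below illustrates one more encoding from the list, dpc (dipeptide composition), under the commonly used definition: count each of the 400 ordered pairs of standard amino acids over overlapping length-2 windows and divide by the total number of pairs counted. It is an independent illustration, not ifeatpro's implementation; the dipeptide ordering and the choice to skip pairs containing non-standard residues are assumptions.

```python
from itertools import product
from typing import List

AA_ORDER = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides in assumed order: "AA", "AC", ..., "YY"
DIPEPTIDES = ["".join(p) for p in product(AA_ORDER, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dpc(sequence: str) -> List[float]:
    """Dipeptide composition over overlapping windows of length 2."""
    sequence = sequence.upper()
    counts = [0.0] * len(DIPEPTIDES)
    for i in range(len(sequence) - 1):
        idx = INDEX.get(sequence[i:i + 2])
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
    total = sum(counts)
    # Normalise to frequencies; sequences shorter than 2 give an all-zero vector
    return [c / total for c in counts] if total else counts
```

For example, dpc("ACA") contains the pairs AC and CA, so both entries are 0.5 and the other 398 are 0.0.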

Similar modules for encoding protein sequences

Other modules that can be used to generate numerical encodings of protein sequences are:

  1. ngrampro link
  2. pssmpro link

Download files


Source Distribution

ifeatpro-0.0.3.tar.gz (35.6 kB)

Uploaded Source

Built Distribution


ifeatpro-0.0.3-py3-none-any.whl (34.6 kB)

Uploaded Python 3

File details

Details for the file ifeatpro-0.0.3.tar.gz.

File metadata

  • Download URL: ifeatpro-0.0.3.tar.gz
  • Upload date:
  • Size: 35.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for ifeatpro-0.0.3.tar.gz
Algorithm Hash digest
SHA256 7d8d9d589a5e7415dbfa705236891e272ec328f27130b5633ae681f8e403fdb0
MD5 cae28f5b6e0e1fd211bffd53e1977cf7
BLAKE2b-256 49e48cd6e1563210a74477a48cb204ace5e85bb984db1d85116cfaa9461d5687


File details

Details for the file ifeatpro-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: ifeatpro-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 34.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for ifeatpro-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 5510a94241622ef3002e7fe3d061036fbf230a15ec2bcf5fb36912b9ed453748
MD5 7c5b7d5ddb7dff5d4f0c48a7fd2a7e55
BLAKE2b-256 fec8d283e8c0246909182ba74460caa6f46b2dbbf566e47ae7a9009c0cce50da

