
SENSE-PPI: Sequence-based EvolutioNary ScalE Protein-Protein Interaction prediction


SENSE-PPI

DOI: 10.1101/2023.09.19.558413. License: CeCILL.

SENSE-PPI is a deep learning model that predicts physical protein-protein interactions from amino acid sequences. It takes embeddings generated by ESM2 as input and uses a Siamese RNN architecture to perform binary classification.
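To make the Siamese idea concrete, here is a toy sketch in plain Python. It is not the SENSE-PPI implementation: the tiny tanh RNN, the element-wise product used to merge the two encodings, the dimensions, and the random vectors standing in for ESM2 embeddings are all assumptions chosen for illustration. The only point it demonstrates is that one set of weights encodes both proteins, and that a symmetric merge makes the score independent of the order of the pair.

```python
import math
import random


def make_params(input_dim, hidden_dim, seed=0):
    """Create random weights (hypothetical sizes, illustration only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(hidden_dim, input_dim),
        "W_rec": mat(hidden_dim, hidden_dim),
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)],
    }


def encode(embeddings, params):
    """Run a tiny tanh RNN over per-residue vectors; return the final hidden state."""
    hidden = [0.0] * len(params["W_rec"])
    for x in embeddings:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in_row, x))
                + sum(w * hi for w, hi in zip(w_rec_row, hidden))
            )
            for w_in_row, w_rec_row in zip(params["W_in"], params["W_rec"])
        ]
    return hidden


def interaction_score(emb_a, emb_b, params):
    """Encode both proteins with the SAME weights, merge, and squash to (0, 1)."""
    h_a = encode(emb_a, params)
    h_b = encode(emb_b, params)
    # Element-wise product is symmetric, so score(a, b) == score(b, a).
    logit = sum(w * (a * b) for w, a, b in zip(params["w_out"], h_a, h_b))
    return 1.0 / (1.0 + math.exp(-logit))


def fake_embeddings(length, dim, seed):
    """Random vectors standing in for real ESM2 per-residue embeddings."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(length)]


params = make_params(input_dim=4, hidden_dim=3)
protein_a = fake_embeddings(length=6, dim=4, seed=1)
protein_b = fake_embeddings(length=9, dim=4, seed=2)
score = interaction_score(protein_a, protein_b, params)
print(f"toy interaction score: {score:.3f}")
```

Because the weights are untrained, the printed score carries no biological meaning; the sketch only shows the data flow of a shared encoder followed by a binary output.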

Installation

SENSE-PPI requires Python 3.9 or 3.10. To install the package, run:

pip install senseppi

N.B.: If you intend to use the create_dataset command to generate new datasets from STRING, you also need to install the MMseqs2 software (instructions can be found at: https://github.com/soedinglab/MMseqs2). The mmseqs command must be available in your PATH.
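The two prerequisites above can be checked before running anything. The snippet below is a small sketch using only the Python standard library; the helper names python_is_supported and find_mmseqs are made up for this example and are not part of the senseppi package.

```python
import shutil
import sys

# Versions stated in the installation notes above.
SUPPORTED_PYTHON = {(3, 9), (3, 10)}


def python_is_supported(version_info=None):
    """Return True if the (major, minor) version is 3.9 or 3.10."""
    version_info = sys.version_info if version_info is None else version_info
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


def find_mmseqs():
    """Return the full path of the mmseqs executable, or None if it is not in PATH."""
    return shutil.which("mmseqs")


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    mmseqs_path = find_mmseqs()
    if mmseqs_path is None:
        print("mmseqs not found in PATH (only needed for create_dataset)")
    else:
        print("mmseqs found at:", mmseqs_path)
```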

Usage

There are 5 commands available in the package:

  • train: trains SENSE-PPI on a given dataset
  • test: computes test metrics (AUROC, AUPRC, F1, MCC, Precision, Recall, Accuracy) on a given dataset
  • predict: predicts interactions for a given dataset
  • predict_string: predicts interactions among proteins retrieved from the STRING database (based on seed proteins) and compares the predictions with the interactions recorded in STRING. Optionally, the corresponding graphs can be constructed.
  • create_dataset: creates a dataset from the STRING database based on the taxonomic ID of the organism.
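For reference, the threshold-based metrics listed for the test command (Precision, Recall, F1, Accuracy, MCC) follow standard definitions over the confusion matrix. The sketch below computes them from binary labels and binary predictions; it is an independent illustration, not the code used by the package, and binary_metrics is a hypothetical helper name. AUROC and AUPRC are left out because they require the continuous prediction scores rather than thresholded labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred):
    """Return standard binary classification metrics as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up example: 4 interacting pairs, 6 non-interacting pairs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

MCC is often the more informative number on PPI data, where non-interacting pairs usually far outnumber interacting ones and accuracy alone can look deceptively high.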

The package comes with one pretrained model, fly_worm_human_chiken.ckpt (a checkpoint with weights), which is used by default if no model path is specified. This model was trained on a dataset that combines PPIs from D. melanogaster, C. elegans, H. sapiens and G. gallus, and it provides the best performance compared with the other pretrained models.

The original SENSE-PPI repository also contains two models pretrained on human PPIs: senseppi.ckpt and dscript.ckpt, trained on the SENSE-PPI and DSCRIPT human datasets respectively.

  • senseppi.ckpt: Download from here
  • dscript.ckpt: Download from here

For information about the other models that can be found in the pretrained_models folder, please refer to the original article.

N.B.: All pretrained models were designed for proteins in the range of 50-800 amino acids.

  1. When you run the predict command, the model automatically uses 1 as the minimum length and the length of the longest protein in the dataset as the maximum length. However, for the best performance it is strongly recommended to use proteins in the range of 50-800 amino acids.
  2. If you use the --min_len and --max_len arguments, your FASTA file will be filtered automatically, so make sure you have a backup.
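If you would rather leave the original FASTA file untouched, one option is to filter it into a separate file yourself before running a command. The sketch below does this with the standard library only. It is not part of senseppi; the function names are hypothetical, and treating both bounds as inclusive is an assumption made for this example.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta_by_length(src, dst, min_len=50, max_len=800):
    """Copy records with min_len <= length <= max_len from src to dst.

    The source file is only read, never modified. Returns the number of
    records written to dst.
    """
    kept = 0
    with open(dst, "w") as out:
        for header, seq in read_fasta(src):
            if min_len <= len(seq) <= max_len:
                out.write(f">{header}\n{seq}\n")
                kept += 1
    return kept
```

For example, filter_fasta_by_length("proteins.fasta", "proteins_50_800.fasta") writes the filtered copy next to the original; both file names here are placeholders.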

To cite the original SENSE-PPI paper, please use the following link: https://doi.org/10.1016/j.isci.2024.110371

The documentation for the package can be found here.
