
SENSE-PPI: Sequence-based EvolutioNary ScalE Protein-Protein Interaction prediction

SENSE-PPI

DOI: 10.1101/2023.09.19.558413. License: CeCILL.

SENSE-PPI is a deep learning model for predicting physical protein-protein interactions from amino acid sequences. It is based on embeddings generated by ESM2 and uses a Siamese RNN architecture to perform binary classification.

Installation

SENSE-PPI requires Python 3.9 or higher. To install the package, run:

pip install senseppi

N.B.: If you intend to use the create_dataset command to generate new datasets from STRING, you must also install the MMseqs2 software (instructions can be found at: https://github.com/soedinglab/MMseqs2). The mmseqs command must be available in your PATH.
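
Before running create_dataset, it can help to confirm that the mmseqs executable is actually reachable. The snippet below is a minimal sketch using only the Python standard library; the helper name find_mmseqs is hypothetical and is not part of the senseppi package.

```python
import shutil


def find_mmseqs(executable="mmseqs"):
    """Return the full path of the executable if it is on PATH, else None.

    Hypothetical helper for illustration; not part of senseppi.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_mmseqs()
    if path is None:
        print("mmseqs not found on PATH; install MMseqs2 before running create_dataset")
    else:
        print(f"mmseqs found at {path}")
```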

Usage

There are 5 commands available in the package:

  • train: trains SENSE-PPI on a given dataset
  • test: computes test metrics (AUROC, AUPRC, F1, MCC, Precision, Recall, Accuracy) on a given dataset
  • predict: predicts interactions for a given dataset
  • predict_string: predicts interactions using the STRING database: the interactions are retrieved from STRING based on a set of seed proteins, and the predictions are compared with the STRING annotations. Optionally, interaction graphs can be constructed.
  • create_dataset: creates a dataset from the STRING database based on the taxonomic ID of the organism.
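
For readers less familiar with the metrics listed under test, the threshold-based ones (Precision, Recall, F1, MCC, Accuracy) can all be derived from the confusion matrix of binary labels and binary predictions. The following is an illustrative sketch in plain Python, not the package's own implementation; the function name binary_metrics is hypothetical. AUROC and AUPRC are omitted because they require continuous scores rather than thresholded predictions.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute threshold-based metrics from 0/1 labels and 0/1 predictions.

    Illustrative only; not the implementation used by senseppi.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

MCC is often the most informative of these for PPI data, because it stays meaningful when non-interacting pairs heavily outnumber interacting ones.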

The package comes with one pretrained version of the model, fly_worm_human_chiken.ckpt (a checkpoint with weights), which is used by default if no model path is specified. This model was trained on a dataset that combines PPIs from D. melanogaster, C. elegans, H. sapiens and G. gallus, and it provides the best performance among the pretrained models.

The original SENSE-PPI repository also contains two models pretrained on human PPIs: senseppi.ckpt and dscript.ckpt, trained on the SENSE-PPI and DSCRIPT human datasets respectively.

  • senseppi.ckpt: Download from here
  • dscript.ckpt: Download from here

For information about the other models that can be found in the pretrained_models folder, please refer to the original article.

N.B.: All pretrained models were designed to work with proteins of 50-800 amino acids.

  1. By default, the predict command uses 1 as the minimum length and the length of the longest protein in the dataset as the maximum length. However, for best performance it is strongly recommended to use proteins of 50-800 amino acids.
  2. If you use the --min_len and --max_len arguments, your FASTA file will be filtered automatically, so make sure you have a backup.
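
Since the note above implies that the input file may be modified, one cautious option is to do the length filtering yourself and write the result to a separate file, leaving the original untouched. The sketch below uses only the standard library. The function names are hypothetical, and it assumes the length bounds are inclusive, which may differ from how the package applies --min_len and --max_len.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta_by_length(src, dst, min_len=50, max_len=800):
    """Copy records with min_len <= length <= max_len from src to dst.

    Assumption: bounds are inclusive. Returns the number of records kept.
    The source file is only read, never modified.
    """
    kept = 0
    with open(dst, "w") as out:
        for header, seq in read_fasta(src):
            if min_len <= len(seq) <= max_len:
                out.write(f">{header}\n{seq}\n")
                kept += 1
    return kept
```

For example, filter_fasta_by_length("proteins.fasta", "proteins_50_800.fasta") would write the filtered copy under a new name (both file names are placeholders), and the filtered file can then be passed to the package.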

To cite the original SENSE-PPI paper, please use the following link: https://doi.org/10.1101/2023.09.19.558413

The documentation for the package can be found here.
