Cross-Reactive-Epitope-Search-using-Structural-Properties-of-proteins (CRESSP)

A program to find cross-reactive epitopes with structural information from known protein structures.

Introduction

Our novel pipeline, called Cross-Reactive-Epitope-Search-using-Structural-Properties-of-proteins (CRESSP), uses structural information from the RCSB PDB database to search for potential cross-reactive B-cell epitopes shared between human and pathogen proteins.

First, protein sequences of interest (provided by the user) are searched with BLASTP, either alone or in combination with HMMER3, using an HMM profile database built from ~5000 pan-proteomes from the UniProt database.

Second, using pre-computed experimental and predicted relative-surface-availability (RSA) values of human protein residues, the alignments between human and pathogen proteins are further analyzed to identify potential cross-reactive B-cell epitopes. RSA-weighted BLOSUM62 scores are calculated over a set of sliding windows (sizes provided by the user) to estimate possible cross-reactivity between two proteins.
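The exact scoring formula is not spelled out here, so the following is only a minimal sketch of one plausible reading: each aligned position's BLOSUM62 score is multiplied by the RSA (0 to 1) of the human residue, and the products are averaged over each sliding window. The function names, the gapless-alignment assumption, and the small `BLOSUM62_EXCERPT` table (a handful of entries copied from the standard matrix) are illustrative assumptions, not CRESSP's actual code.

```python
from typing import Dict, List, Sequence, Tuple

# A few entries from the standard BLOSUM62 matrix, enough for the toy example.
# A real analysis would load the full matrix instead of this excerpt.
BLOSUM62_EXCERPT: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4, ("K", "K"): 5, ("K", "R"): 2,
    ("D", "D"): 6, ("D", "E"): 2, ("L", "L"): 4,
    ("S", "S"): 4, ("S", "T"): 1,
}


def pair_score(a: str, b: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Look up a symmetric substitution score; raises KeyError if the pair is absent."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def rsa_weighted_window_scores(
    human: str,
    pathogen: str,
    rsa: Sequence[float],
    window: int,
    matrix: Dict[Tuple[str, str], int],
) -> List[Tuple[int, float]]:
    """Return (window_start, average RSA-weighted score) for every full window.

    Assumes `human` and `pathogen` are equal-length, gapless aligned segments
    and that `rsa[i]` is the RSA of `human[i]` (hypothetical input layout).
    """
    if not (len(human) == len(pathogen) == len(rsa)):
        raise ValueError("aligned sequences and RSA values must have equal length")
    if window < 1:
        raise ValueError("window must be a positive integer")
    weighted = [r * pair_score(h, p, matrix) for h, p, r in zip(human, pathogen, rsa)]
    return [
        (start, sum(weighted[start:start + window]) / window)
        for start in range(len(weighted) - window + 1)
    ]


if __name__ == "__main__":
    # Buried positions (RSA = 0) contribute nothing, however similar the residues are.
    for size in (3, 5):
        print(size, rsa_weighted_window_scores(
            "KDLSA", "RELTA", [1.0, 0.5, 0.0, 0.5, 1.0], size, BLOSUM62_EXCERPT))
```

Under this reading, windows whose average falls below the `-s` threshold would be filtered out; whether CRESSP normalizes the score further is not stated here.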

Lastly, the output file from our pipeline can be visualized interactively with a web-browser-based application (similar to the interactive interface in our web application for SARS-CoV-2, http://ahs2202.github.io/3M/).

Currently, we are also implementing a neural-network-based (bi-directional stacked RNN) surface-availability prediction module to efficiently predict RSA values for any protein sequence of interest, so that CRESSP can predict B-cell cross-reactivity between any proteomes of interest, including proteins from metagenome-assembled genomes.

Installation

CRESSP requires BLASTP from BLAST+ (v2.10.1+) and HMMER3, which can be installed through conda with the following commands:

    conda install -c bioconda blast
    conda install -c bioconda hmmer

BLAST+ binaries can also be downloaded directly from NCBI.

CRESSP can be installed from PyPI (https://pypi.org/project/cressp/):

    pip install cressp

CRESSP uses TensorFlow (>2.3.0) to predict relative surface area (RSA) and to classify secondary structure. If a CUDA-enabled GPU is available, CRESSP will automatically use it to predict the structural properties of input proteins.
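To see in advance whether TensorFlow can detect a GPU on a given machine, a quick check along the following lines may help. This is a generic TensorFlow query, not part of CRESSP; the helper name `gpu_available` is made up for illustration.

```python
def gpu_available() -> bool:
    """Return True if TensorFlow is importable and reports at least one GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return False
    # list_physical_devices returns the devices TensorFlow can see at runtime.
    return len(tf.config.list_physical_devices("GPU")) > 0


if __name__ == "__main__":
    print("GPU visible to TensorFlow:", gpu_available())
```

If this reports `False`, TensorFlow runs on the CPU and prediction will be slower.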

Usage

usage: cressp [-h] [-t DIR_FILE_PROTEIN_TARGET] [-q DIR_FILE_PROTEIN_QUERY] [-o DIR_FOLDER_OUTPUT] [-c CPU] [-w WINDOW_SIZE] [-s FLOAT_THRES_AVG_SCORE_BLOSUM_WEIGHTED] [-e FLOAT_THRES_E_VALUE] [-H]
              [-d DIR_FILE_QUERY_HMMDB] [-Q]


arguments:
  -h, --help            show this help message and exit
  -t DIR_FILE_PROTEIN_TARGET, --dir_file_protein_target DIR_FILE_PROTEIN_TARGET
                        (Required) an input FASTA file containing target protein sequences.
  -q DIR_FILE_PROTEIN_QUERY, --dir_file_protein_query DIR_FILE_PROTEIN_QUERY
                        (Default: UniProt human proteins) an input FASTA file containing query protein sequences.
  -o DIR_FOLDER_OUTPUT, --dir_folder_output DIR_FOLDER_OUTPUT
                        (Default: a subdirectory of the current directory) an output directory
  -c CPU, --cpu CPU     (Default: 1) Number of logical CPUs (threads) to use in the current compute node.
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        (Default: 30) list of window sizes separated by comma. Example: 15,30,45
  -s FLOAT_THRES_AVG_SCORE_BLOSUM_WEIGHTED, --float_thres_avg_score_blosum_weighted FLOAT_THRES_AVG_SCORE_BLOSUM_WEIGHTED
                        (Default: 0.15) threshold for average weighted BLOSUM62 alignment score for filtering aligned sequences
  -e FLOAT_THRES_E_VALUE, --float_thres_e_value FLOAT_THRES_E_VALUE
                        (Default: 1e-20) threshold for the global alignment e-value in scientific notation. Example: 1e-3
  -H, --flag_use_HMM_search
                        (Default: False) Set this flag to perform HMM search in addition to BLASTP search. HMM profile search is performed with HMMER3. The search usually takes several hours for
                        metagenome-assembled genomes
  -d DIR_FILE_QUERY_HMMDB, --dir_file_query_hmmdb DIR_FILE_QUERY_HMMDB
                        (Default: a HMM profile database of 1012 human proteins searched against UniProt Pan Proteomes. These proteins consist of experimentally validated human autoantigens) a file
                        containing HMM DB of query proteins aligned against pan-proteomes
  -Q, --flag_skip_struc_prop_for_protein_target
                        (Default: False) Set this flag to skip the estimation of structural properties of target proteins. Only structural properties of query proteins will be used to calculate
                        accessibility-weighted-similarity scores
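When CRESSP is driven from a script, the options above can be assembled programmatically. The sketch below only builds the argument list from the long flag names listed in the help text. The helper `build_cressp_command` and its parameter names are hypothetical and are not part of the `cressp` package.

```python
import shlex
from typing import List, Optional, Sequence


def build_cressp_command(
    target_fasta: str,
    query_fasta: Optional[str] = None,
    output_dir: Optional[str] = None,
    cpu: int = 1,
    window_sizes: Sequence[int] = (30,),
    score_threshold: float = 0.15,
    e_value: str = "1e-20",  # kept as text so the notation is passed through unchanged
    use_hmm: bool = False,
) -> List[str]:
    """Assemble a `cressp` argument list using only the documented flags."""
    cmd = ["cressp", "--dir_file_protein_target", target_fasta]
    if query_fasta is not None:
        cmd += ["--dir_file_protein_query", query_fasta]
    if output_dir is not None:
        cmd += ["--dir_folder_output", output_dir]
    cmd += [
        "--cpu", str(cpu),
        # Window sizes are passed as one comma-separated value, e.g. 15,30,45.
        "--window_size", ",".join(str(w) for w in window_sizes),
        "--float_thres_avg_score_blosum_weighted", str(score_threshold),
        "--float_thres_e_value", e_value,
    ]
    if use_hmm:
        cmd.append("--flag_use_HMM_search")
    return cmd


if __name__ == "__main__":
    command = build_cressp_command(
        "UP000464024_2697049.fasta.gz", cpu=2, e_value="5e-2", use_hmm=True
    )
    # Print the shell-quoted command; subprocess.run(command, check=True) would run it.
    print(shlex.join(command))
```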

Tutorial

Download the proteome of SARS-CoV-2 from UniProt (UP000464024) as a FASTA file.

Run the following command to:

  1. Search SARS-CoV-2 protein sequences against human protein sequences with BLASTP and HMMER3, and
  2. Calculate similarity scores based on the relative-surface-availability of residues of human proteins

    cressp -t UP000464024_2697049.fasta.gz --flag_use_HMM_search --float_thres_e_value 5e-2 --cpu 2
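Before starting a long search, it can be worth confirming that the downloaded file is a readable FASTA file. The helper below is a small standalone sketch that counts header lines in a plain or gzip-compressed FASTA file. The function name `count_fasta_records` is illustrative and unrelated to CRESSP's internals.

```python
import gzip
import os
import sys


def count_fasta_records(path: str) -> int:
    """Count sequences in a FASTA file by counting lines that start with '>'."""
    # Pick the opener based on the file extension; both are read in text mode.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    fasta_path = sys.argv[1] if len(sys.argv) > 1 else "UP000464024_2697049.fasta.gz"
    if os.path.exists(fasta_path):
        print(fasta_path, "contains", count_fasta_records(fasta_path), "sequences")
    else:
        print("File not found:", fasta_path)
```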
