bio2Byte software suite to predict protein biophysical properties from their amino-acid sequences

Bio2Byte Tools

This package provides structural predictions for protein sequences, using predictors developed by the Bio2Byte group.

       

🧪 List of available predictors

Predictor | Usage
Dynamine | Fast predictor of protein backbone dynamics using only sequence information as input. The version here also predicts side-chain dynamics and secondary structure propensities using the same principle.
Disomine | Predicts protein disorder with recurrent neural networks, not directly from the amino acid sequence but from more generic predictions of key biophysical properties: protein dynamics, secondary structure and early folding.
EfoldMine | Predicts, from the primary amino acid sequence of a protein, which amino acids are likely to be involved in early folding events.
AgMata | Single-sequence based predictor of protein regions that are likely to cause beta-aggregation.

🔗 Related link: The tools listed here, and others, are described in the Tools section of the Bio2Byte website.

⚡️Quick start

First of all, download and install the package:

$ pip install b2bTools

Use this example as an entry point:

# matplotlib is only needed for the plot at the end (pip install matplotlib)
import matplotlib.pyplot as plt
from b2bTools import SingleSeq

# Load the sequences from a FASTA file
single_seq = SingleSeq("/path/to/example.fasta")
# Run the selected predictors
single_seq.predict(tools=['dynamine', 'agmata'])
# Retrieve the predictions for the sequence identified as SEQ001
predictions = single_seq.get_all_predictions('SEQ001')

backbone_pred = predictions['SEQ001']['backbone']
sidechain_pred = predictions['SEQ001']['sidechain']
agmata_pred = predictions['SEQ001']['agmata']

# One x value per amino acid position
positions = range(len(backbone_pred))
plt.plot(positions, backbone_pred, label="Backbone")
plt.plot(positions, sidechain_pred, label="Sidechain")
plt.plot(positions, agmata_pred, label="Agmata")

plt.legend()
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
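
The example above assumes a FASTA file that contains a record whose identifier is SEQ001. As a minimal sketch for experimenting, the snippet below writes such a file with the standard library only. The helper write_fasta, the file location and the sequence are placeholders chosen for illustration; they are not part of b2bTools.

```python
import os
import tempfile
import textwrap


def write_fasta(path, records, width=60):
    """Write (identifier, sequence) pairs to `path` in FASTA format."""
    with open(path, "w") as handle:
        for identifier, sequence in records:
            handle.write(f">{identifier}\n")
            # Wrap long sequences at a fixed width, a common FASTA convention.
            handle.write("\n".join(textwrap.wrap(sequence, width)) + "\n")


# Placeholder sequence, for illustration only.
example_path = os.path.join(tempfile.mkdtemp(), "example.fasta")
write_fasta(example_path, [("SEQ001", "MKTAYIAKQRQISFVKSHFSRQ")])

with open(example_path) as handle:
    fasta_text = handle.read()
print(fasta_text)
```

The resulting path can then be passed to SingleSeq in place of "/path/to/example.fasta".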

💡 Relevant idea: Jupyter Notebooks are a convenient way to try out the package. If you are using Google Colab, install the package with pip from inside a code cell:

!pip install b2bTools

🐳 Docker-way to quick start

Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.

🔗 Related link: Docker official documentation.

Preconditions

You have downloaded the source code of the Bio2Byte Tools in your local environment:

$ git clone git@bitbucket.org:bio2byte/b2btools.git && cd b2btools

Steps

To exchange files between your host and the container, mount a volume using the -v $(pwd)/swap:/data parameter.

⚠️ Important note: Be sure your input files are inside $(pwd)/swap.

$ docker build --tag b2b-tools .
$ docker run -it -v $(pwd)/swap:/data b2b-tools -disomine -file /data/input_example.fasta -output /data/result.json -identifier test

⚠️ Important note:

  • The output file result.json will be stored inside $(pwd)/swap.
  • The available parameters after b2b-tools are:
Parameter | Purpose | Example
-file | Path to the input file | -file /path/to/input/file.fasta
-output | Path to the output file (a JSON file with the results) | -output /path/to/output/results.json
-dynamine | Run Dynamine predictor | -dynamine
-disomine | Run Disomine predictor | -disomine
-efoldmine | Run EfoldMine predictor | -efoldmine
-agmata | Run AgMata predictor | -agmata
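
Once the container has finished, the JSON output can be inspected from Python. The layout of result.json is not documented in this section, so the sketch below makes no assumption about it and only reports the top-level keys and their types. The helper summarize_json and the stand-in file are hypothetical and exist only to keep the example runnable on its own.

```python
import json
import os
import tempfile


def summarize_json(path):
    """Load a JSON file and describe its top-level layout without assuming a schema."""
    with open(path) as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    return {"<root>": type(data).__name__}


# Stand-in file so the sketch runs on its own; replace the path with
# the result.json written to your swap folder after running the container.
demo_path = os.path.join(tempfile.mkdtemp(), "result.json")
with open(demo_path, "w") as handle:
    json.dump({"placeholder_key": [0.1, 0.2]}, handle)

summary = summarize_json(demo_path)
print(summary)
```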

⚙️ First time setup

The following steps are required in order to install the b2bTools package in your local environment:

Conda package installation

Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.

🔗 Related link: Conda official documentation.

To install this package with conda, run:

$ conda install -c Bio2Byte b2bTools

⚠️ Important note: some Linux users might experience dependency conflicts during the conda installation. Please use the pip installation (described below) if you encounter them.

If you must use conda, use the following command:

$ conda install --override-channels --channel defaults --channel conda-forge --channel Bio2Byte --channel pytorch b2btools

Pip package installation

pip is the package installer for Python. You can use pip to install packages from the Python Package Index and other indexes.

🔗 Related link: Pip official documentation.

$ pip install b2bTools
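
After installing, it can be useful to confirm which version ended up in the active environment. The following is a small sketch using the standard library importlib.metadata module (Python 3.8 or later); the helper name installed_version is an assumption made for this example.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of `distribution`, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("b2bTools")
print(version if version else "b2bTools is not installed in this environment")
```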

🐍 Package usage

Because a predictor can be built on top of others, requesting one predictor usually returns more output predictions than expected:

Predictor | Depends on
Dynamine | None
EfoldMine | [Dynamine]
Disomine | [EfoldMine, Dynamine]
AgMata | [EfoldMine, Dynamine]
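
To make the consequence of this table concrete, the sketch below expands a list of requested predictors into the full set that ends up producing output. It only encodes the dependency table above; the function predictors_involved is hypothetical and does not reflect how b2bTools resolves dependencies internally.

```python
# Direct dependencies, copied from the table above.
DEPENDS_ON = {
    "dynamine": [],
    "efoldmine": ["dynamine"],
    "disomine": ["efoldmine", "dynamine"],
    "agmata": ["efoldmine", "dynamine"],
}


def predictors_involved(requested):
    """Return the requested predictors plus everything they depend on, dependencies first."""
    ordered = []

    def visit(name):
        if name in ordered:
            return
        for dependency in DEPENDS_ON[name]:
            visit(dependency)
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered


involved = predictors_involved(["agmata"])
print(involved)  # ['dynamine', 'efoldmine', 'agmata']
```

Requesting only "agmata" therefore also yields the Dynamine and EfoldMine output keys.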

🧭 Basic flow

This section explains in detail the script shown in the Quick start section.

  1. Import the SingleSeq class from the b2bTools package:
from b2bTools import SingleSeq
  2. Instantiate an object by passing the path to the input file in FASTA format:
single_seq = SingleSeq("/path/to/example.fasta")
  3. Run the predictors you want:
single_seq.predict(tools=['dynamine', 'efoldmine'])

⚠️ Important note: These are all the values accepted by the tools parameter:

Predictor | String value
Dynamine | "dynamine"
EfoldMine | "efoldmine"
Disomine | "disomine"
AgMata | "agmata"
  4. Get the prediction values for a specific sequence identifier after running the selected predictors:
predictions = single_seq.get_all_predictions('SEQ001')

⚠️ Important note: The method get_all_predictions will return a dictionary with the following structure:

{
  "SEQUENCE_ID_000": {
    "seq": "the input sequence 0",
    "result001": [0.001, 0.002, ..., 0.00],
    "result002": [0.001, 0.002, ..., 0.00],
    "...": [...],
    "resultN": [0.001, 0.002, ..., 0.00]
  },
  "SEQUENCE_ID_001": {
    "seq": "the input sequence 1",
    "result001": [0.001, 0.002, ..., 0.00],
    "result002": [0.001, 0.002, ..., 0.00],
    "...": [...],
    "resultN": [0.001, 0.002, ..., 0.00]
  },
  "...": { ... },
  "SEQUENCE_ID_N": {
    "seq": "the input sequence N",
    "result001": [0.001, 0.002, ..., 0.00],
    "result002": [0.001, 0.002, ..., 0.00],
    "...": [...],
    "resultN": [0.001, 0.002, ..., 0.00]
  },
}

All the available result keys are listed in this table:

Predictor | Output key | Output values (type) | Output values (example)
None | "seq" | [Char] | ['M', 'A', ..., 'S', 'T']
Dynamine | "backbone" | [Float] | [0.6786, 0.71, ..., 0.7219]
Dynamine | "sidechain" | [Float] | [0.5823, 0.23, ..., 0.1995]
Dynamine | "helix" | [Float] | [0.0122, 0.84, ..., 0.2345]
Dynamine | "ppII" | [Float] | [0.0420, 0.69, ..., 0.5566]
Dynamine | "coil" | [Float] | [0.6666, 0.13, ..., 0.9954]
Dynamine | "sheet" | [Float] | [0.1992, 0.12, ..., 0.0020]
EfoldMine | "earlyFolding" | [Float] | [0.1989, 0.08, ..., 0.0031]
Disomine | "disoMine" | [Float] | [0.1996, 0.12, ..., 0.0019]
AgMata | "agmata" | [Float] | [0.1954, 0.06, ..., 0.0007]
  5. You are ready to work with the sequence and its predictions. Here is an example of plotting the data:
# matplotlib is only needed for the plot (pip install matplotlib)
import matplotlib.pyplot as plt

backbone_pred = predictions['SEQ001']['backbone']
sidechain_pred = predictions['SEQ001']['sidechain']
agmata_pred = predictions['SEQ001']['agmata']

# One x value per amino acid position
positions = range(len(backbone_pred))
plt.plot(positions, backbone_pred, label="Backbone")
plt.plot(positions, sidechain_pred, label="Sidechain")
plt.plot(positions, agmata_pred, label="Agmata")

plt.legend()
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
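
Besides plotting, the dictionary described in step 4 can be flattened into one row per amino acid, which is convenient for printing or exporting. The sketch below assumes only the structure documented above (a "seq" entry plus lists of equal length per output key). The helper per_residue_rows is hypothetical, and the values in the stand-in dictionary are made up so the example runs without b2bTools.

```python
def per_residue_rows(predictions, sequence_id, keys):
    """Yield one (position, residue, value, ...) tuple per amino acid."""
    entry = predictions[sequence_id]
    columns = [entry[key] for key in keys]
    # zip works whether "seq" is a string or a list of characters.
    for position, values in enumerate(zip(entry["seq"], *columns), start=1):
        yield (position, *values)


# Stand-in dictionary shaped like the structure described above;
# the numbers are invented for illustration.
mock_predictions = {
    "SEQ001": {
        "seq": "MAST",
        "backbone": [0.68, 0.71, 0.70, 0.72],
        "sidechain": [0.58, 0.23, 0.31, 0.20],
    }
}

rows = list(per_residue_rows(mock_predictions, "SEQ001", ["backbone", "sidechain"]))
for row in rows:
    print(row)
```

With real data, pass the dictionary returned by get_all_predictions instead of mock_predictions.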

⌨️ Running as Python module (no Python code involved)

You can use this package directly from your console session, with no Python code involved. Further details on running modules with the -m option are available on the official Python documentation site.

$ python -m b2bTools -file ./swap/input_example.fasta -dynamine -disomine -identifier test -output ./swap/result-from-package.json

⚠️ Important note:

  • The output file result-from-package.json will be stored inside ./swap.
  • The available parameters after python -m b2bTools are:
Parameter | Purpose | Example
-file | Path to the input file | -file /path/to/input/file.fasta
-output | Path to the output file (a JSON file with the results) | -output /path/to/output/results.json
-dynamine | Run Dynamine predictor | -dynamine
-disomine | Run Disomine predictor | -disomine
-efoldmine | Run EfoldMine predictor | -efoldmine
-agmata | Run AgMata predictor | -agmata
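
When the module has to be launched from another Python script (for instance in a batch job), the argument list can be assembled programmatically. This is a sketch that uses only the parameters listed in the table above plus the -identifier parameter from the example command; the helper build_command is hypothetical, and the final subprocess call is left commented out because it requires b2bTools to be installed.

```python
import sys

PREDICTOR_FLAGS = ("-dynamine", "-disomine", "-efoldmine", "-agmata")


def build_command(input_path, output_path, identifier, predictors):
    """Assemble the argument list for `python -m b2bTools`."""
    flags = [f"-{name}" for name in predictors]
    unknown = [flag for flag in flags if flag not in PREDICTOR_FLAGS]
    if unknown:
        raise ValueError(f"Unknown predictor flags: {unknown}")
    return [
        sys.executable, "-m", "b2bTools",
        "-file", input_path,
        "-output", output_path,
        "-identifier", identifier,
        *flags,
    ]


command = build_command(
    "./swap/input_example.fasta",
    "./swap/result-from-package.json",
    "test",
    ["dynamine", "disomine"],
)
print(" ".join(command))
# To actually run it (requires b2bTools to be installed):
# import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.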

📚 Package classes & methods

If you are interested in further details, please read the full documentation on the Bio2Byte website.

To generate the documentation locally, follow the steps described in this section.

Preconditions

You have downloaded the source code of the Bio2Byte Tools in your local environment:

$ git clone git@bitbucket.org:bio2byte/b2btools.git && cd b2btools

Steps

  1. Run the following command:
$ make generate-docs
  2. Then open the folder ./wrapped_documentation

💡 Relevant idea: At any moment, you can read the documentation of a method through its __doc__ attribute (e.g. print(SingleSeq.predict.__doc__)).
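
As a small illustration of docstring lookup, the sketch below compares the raw __doc__ attribute with inspect.getdoc, which additionally strips the indentation of multi-line docstrings. ExamplePredictor is a hypothetical stand-in class so the snippet runs without b2bTools; with the package installed, the same calls can be applied to SingleSeq.predict.

```python
import inspect


class ExamplePredictor:
    """Hypothetical stand-in class; it is not part of b2bTools."""

    def predict(self, tools):
        """Run the predictors named in `tools`.

        This docstring exists only to demonstrate docstring lookup.
        """
        return list(tools)


raw_doc = ExamplePredictor.predict.__doc__
clean_doc = inspect.getdoc(ExamplePredictor.predict)
print(clean_doc)
```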

📖 How to cite

If you use this package or data in this package, please cite:

Predictor | Citation | DOI / URL
Dynamine | Elisa Cilia, Rita Pancsa, Peter Tompa, Tom Lenaerts, and Wim Vranken. From protein sequence to dynamics and disorder with DynaMine. Nature Communications 4:2741 (2013) | https://www.nature.com/articles/ncomms3741
Disomine | Gabriele Orlando, Daniele Raimondi, Francesco Codice, Francesco Tabaro, Wim Vranken. Prediction of disordered regions in proteins with recurrent Neural Networks and protein dynamics. bioRxiv 2020.05.25.115253 (2020) | https://www.biorxiv.org/content/10.1101/2020.05.25.115253v1
EfoldMine | Raimondi, D., Orlando, G., Pancsa, R. et al. Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins. Sci Rep 7, 8826 (2017) | https://doi.org/10.1038/s41598-017-08366-3
AgMata | Gabriele Orlando, Alexandra Silva, Sandra Macedo-Ribeiro, Daniele Raimondi, Wim Vranken. Accurate prediction of protein beta-aggregation with generalized statistical potentials. Bioinformatics, Volume 36, Issue 7, 1 April 2020, Pages 2076–2081 (2020) | https://academic.oup.com/bioinformatics/article/36/7/2076/5670527

📝 Terms of use

  1. The Bio2Byte group aims to promote open science by providing freely available online services, databases and software relating to the life sciences, with a focus on proteins. Where we present scientific data generated by others, we impose no additional restrictions on the use of the contributed data beyond those provided by the data owner.
  2. The Bio2Byte group expects attribution (e.g. in publications, services or products) for any of its online services, databases or software in accordance with good scientific practice. The expected attribution will be indicated in 'How to cite' sections (or equivalent).
  3. The Bio2Byte group is not liable to you or third parties claiming through you, for any loss or damage.
  4. Any questions or comments concerning these Terms of Use can be addressed to Wim Vranken.

© Wim Vranken, Bio2Byte group, VUB

https://bio2byte.be/b2btools/

