bio2Byte software suite to predict protein biophysical properties from their amino-acid sequences
Bio2Byte Tools
This package provides structural predictions for protein sequences, using the predictors developed by the Bio2Byte group.
🧪 List of available predictors
Predictor | Usage |
---|---|
Dynamine | Fast predictor of protein backbone dynamics using only sequence information as input. The version here also predicts side-chain dynamics and secondary structure propensities using the same principle. |
Disomine | Predicts protein disorder with recurrent neural networks not directly from the amino acid sequence, but instead from more generic predictions of key biophysical properties, here protein dynamics, secondary structure and early folding. |
EfoldMine | Predicts from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. |
AgMata | Single-sequence based predictor of protein regions that are likely to cause beta-aggregation. |
🔗 Related link: These listed tools and others are described on the Bio2Byte website inside the Tools section.
⚡️Quick start
First of all, download and install the package:
$ pip install b2bTools
Use this example as an entry point:
import matplotlib.pyplot as plt
from b2bTools import SingleSeq
single_seq = SingleSeq("/path/to/example.fasta")
single_seq.predict(tools=['dynamine', 'agmata'])
predictions = single_seq.get_all_predictions('SEQ001')
backbone_pred = predictions['SEQ001']['backbone']
sidechain_pred = predictions['SEQ001']['sidechain']
agmata_pred = predictions['SEQ001']['agmata']
plt.plot(range(len(backbone_pred)), backbone_pred, label = "Backbone")
plt.plot(range(len(backbone_pred)), sidechain_pred, label = "Sidechain")
plt.plot(range(len(backbone_pred)), agmata_pred, label = "Agmata")
plt.legend()
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
💡 Relevant idea: Jupyter Notebooks are a convenient way to try out the package. If you are using Google Colab, install the package directly from pip inside a code block:
!pip install b2bTools
🐳 Docker-way to quick start
Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.
🔗 Related link: Docker official documentation.
Preconditions
You have downloaded the source code of the Bio2Byte Tools in your local environment:
$ git clone git@bitbucket.org:bio2byte/b2btools.git && cd b2btools
Steps
To exchange files between your host and the container, mount a volume using the -v $(pwd)/swap:/data parameter.
⚠️ Important note: Make sure your input files are inside $(pwd)/swap.
$ docker build --tag b2b-tools .
$ docker run -it -v $(pwd)/swap:/data b2b-tools -disomine -file /data/input_example.fasta -output /data/result.json -identifier test
⚠️ Important note:
- The output file, result.json, will be stored inside $(pwd)/swap.
- The available parameters after b2b-tools are:
Parameter | Purpose | Example |
---|---|---|
-file | Path to the input file | -file /path/to/input/file.fasta |
-output | Path to the output file (a JSON file with the results) | -output /path/to/output/results.json |
-dynamine | Run Dynamine predictor | -dynamine |
-disomine | Run Disomine predictor | -disomine |
-efoldmine | Run EfoldMine predictor | -efoldmine |
-agmata | Run AgMata predictor | -agmata |
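The layout of the JSON output is not described in this section, so the following minimal sketch only loads the file and reports what sits at its top level, without assuming any schema. The helper name `summarize_json` is hypothetical and not part of b2bTools; the path assumes the `swap` folder mounted in the command above.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top level without assuming a schema."""
    data = json.loads(Path(path).read_text())
    if isinstance(data, dict):
        return {"type": "object", "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "array", "length": len(data)}
    return {"type": type(data).__name__}


# Host side of the mounted volume (assumption: the container wrote /data/result.json).
result_path = Path("swap") / "result.json"
if result_path.exists():
    print(summarize_json(result_path))
else:
    print(f"{result_path} not found; run the container first")
```

Inspecting the top level first is a cautious way to learn the actual layout before writing code that depends on it.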
⚙️ First time setup
The following steps are required in order to install the b2bTools package in your local environment:
Conda package installation
Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.
🔗 Related link: Conda official documentation.
To install this package with conda, run:
$ conda install -c Bio2Byte b2bTools
⚠️ Important note: some Linux users might experience dependency conflicts during the conda installation. Please use the pip installation (described below) if you encounter them.
If you must use conda, use the following command:
$ conda install --override-channels --channel defaults --channel conda-forge --channel Bio2Byte --channel pytorch b2btools
Pip package installation
pip is the package installer for Python. You can use pip to install packages from the Python Package Index and other indexes.
🔗 Related link: Pip official documentation.
$ pip install b2bTools
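To confirm that the installation succeeded, one option is to query the installed distribution from Python. This is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# "b2bTools" is the distribution name used in the pip command above.
found = installed_version("b2bTools")
print("b2bTools version:", found if found else "not installed")
```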
🐍 Package usage
Because some predictors are built on top of others, a run usually returns more output predictions than the ones you requested:
Predictor | Depends on |
---|---|
Dynamine | None |
EfoldMine | [Dynamine] |
Disomine | [EfoldMine, Dynamine] |
AgMata | [EfoldMine, Dynamine] |
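The dependency table can be read as a small graph. The sketch below is only an illustration of that table, not the package's internal logic: it expands a list of requested predictors into the full set that would have to run. The names `DEPENDS_ON` and `predictors_that_run` are hypothetical.

```python
# Direct dependencies, copied from the table above.
DEPENDS_ON = {
    "dynamine": [],
    "efoldmine": ["dynamine"],
    "disomine": ["efoldmine", "dynamine"],
    "agmata": ["efoldmine", "dynamine"],
}


def predictors_that_run(requested):
    """Return the requested predictors plus everything they depend on (sorted)."""
    to_visit = list(requested)
    seen = set()
    while to_visit:
        name = to_visit.pop()
        if name in seen:
            continue
        seen.add(name)
        to_visit.extend(DEPENDS_ON[name])
    return sorted(seen)


print(predictors_that_run(["agmata"]))
# → ['agmata', 'dynamine', 'efoldmine']
```

Requesting only AgMata therefore also yields the Dynamine and EfoldMine outputs, which is why extra keys appear in the results.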
🧭 Basic flow
This section explains in detail the script shown in the Quick start section.
- Import the SingleSeq class from the b2bTools package:
from b2bTools import SingleSeq
- Instantiate an object by passing the path to the input file in FASTA format:
single_seq = SingleSeq("/path/to/example.fasta")
- Run the predictions you want:
single_seq.predict(tools=['dynamine', 'efoldmine'])
⚠️ Important note: These are the values accepted by the tools parameter:
Predictor | string value |
---|---|
Dynamine | "dynamine" |
EfoldMine | "efoldmine" |
Disomine | "disomine" |
AgMata | "agmata" |
- Get the prediction values after running the selected predictors for a specific sequence identifier:
predictions = single_seq.get_all_predictions('SEQ001')
⚠️ Important note: The method get_all_predictions will return a dictionary with the following structure:
{
"SEQUENCE_ID_000": {
"seq": "the input sequence 0",
"result001": [0.001, 0.002, ..., 0.00],
"result002": [0.001, 0.002, ..., 0.00],
"...": [...],
"resultN": [0.001, 0.002, ..., 0.00]
},
"SEQUENCE_ID_001": {
"seq": "the input sequence 1",
"result001": [0.001, 0.002, ..., 0.00],
"result002": [0.001, 0.002, ..., 0.00],
"...": [...],
"resultN": [0.001, 0.002, ..., 0.00]
},
"...": { ... },
"SEQUENCE_ID_N": {
"seq": "the input sequence N",
"result001": [0.001, 0.002, ..., 0.00],
"result002": [0.001, 0.002, ..., 0.00],
"...": [...],
"resultN": [0.001, 0.002, ..., 0.00]
}
}
To know all the available result keys, please review this table:
Predictor | Output key | Output values (type) | Output values (example) |
---|---|---|---|
None | "seq" | [Char] | ['M', 'A', ..., 'S', 'T'] |
Dynamine | "backbone" | [Float] | [0.6786, 0.71, ..., 0.7219] |
Dynamine | "sidechain" | [Float] | [0.5823, 0.23, ..., 0.1995] |
Dynamine | "helix" | [Float] | [0.0122, 0.84, ..., 0.2345] |
Dynamine | "ppII" | [Float] | [0.0420, 0.69, ..., 0.5566] |
Dynamine | "coil" | [Float] | [0.6666, 0.13, ..., 0.9954] |
Dynamine | "sheet" | [Float] | [0.1992, 0.12, ..., 0.0020] |
EfoldMine | "earlyFolding" | [Float] | [0.1989, 0.08, ..., 0.0031] |
Disomine | "disoMine" | [Float] | [0.1996, 0.12, ..., 0.0019] |
AgMata | "agmata" | [Float] | [0.1954, 0.06, ..., 0.0007] |
- You are ready to use the sequence and predictions to work with them. Here is an example of plotting the data.
import matplotlib.pyplot as plt
backbone_pred = predictions['SEQ001']['backbone']
sidechain_pred = predictions['SEQ001']['sidechain']
agmata_pred = predictions['SEQ001']['agmata']
plt.plot(range(len(backbone_pred)), backbone_pred, label = "Backbone")
plt.plot(range(len(backbone_pred)), sidechain_pred, label = "Sidechain")
plt.plot(range(len(backbone_pred)), agmata_pred, label = "Agmata")
plt.legend()
plt.xlabel('aa_position')
plt.ylabel('pred_values')
plt.show()
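Besides plotting, the per-residue lists can be lined up with the sequence itself. The following minimal sketch works on mock data shaped like the dictionary documented above, so it runs without b2bTools; the values are made up and the helper name `per_residue_rows` is hypothetical.

```python
# Mock data shaped like the documented structure; the values are made up.
predictions = {
    "SEQ001": {
        "seq": ["M", "A", "S", "T"],
        "backbone": [0.68, 0.71, 0.70, 0.72],
        "sidechain": [0.58, 0.23, 0.40, 0.20],
    }
}


def per_residue_rows(predictions, sequence_id, keys):
    """Build one row per residue: (1-based position, residue, one value per key)."""
    entry = predictions[sequence_id]
    columns = [entry[key] for key in keys]
    rows = []
    for index, residue in enumerate(entry["seq"]):
        rows.append((index + 1, residue) + tuple(column[index] for column in columns))
    return rows


for row in per_residue_rows(predictions, "SEQ001", ["backbone", "sidechain"]):
    print(row)
```

Because the loop only iterates over "seq", it behaves the same whether the sequence is stored as a list of characters or as a plain string.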
⌨️ Running as Python module (no Python code involved)
You can use this package directly from your console session, with no Python code involved. Further details are available on the official Python documentation site.
$ python -m b2bTools -file ./swap/input_example.fasta -dynamine -disomine -identifier test -output ./swap/result-from-package.json
⚠️ Important note:
- The output file, result-from-package.json, will be stored inside ./swap.
- The available parameters after python -m b2bTools are:
Parameter | Purpose | Example |
---|---|---|
-file | Path to the input file | -file /path/to/input/file.fasta |
-output | Path to the output file (a JSON file with the results) | -output /path/to/output/results.json |
-dynamine | Run Dynamine predictor | -dynamine |
-disomine | Run Disomine predictor | -disomine |
-efoldmine | Run EfoldMine predictor | -efoldmine |
-agmata | Run AgMata predictor | -agmata |
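When the module has to be launched from another script, the command line can be assembled programmatically. This is a minimal sketch that uses only the flags listed in the table, plus the -identifier flag seen in the example command; the helper name `build_command` is hypothetical, and nothing is executed unless you pass the result to `subprocess.run`.

```python
import sys

# Predictor flags listed in the parameter table above.
PREDICTOR_FLAGS = ("dynamine", "disomine", "efoldmine", "agmata")


def build_command(input_path, output_path, predictors, identifier=None):
    """Assemble the argument list for `python -m b2bTools` (nothing is executed)."""
    unknown = [name for name in predictors if name not in PREDICTOR_FLAGS]
    if unknown:
        raise ValueError(f"Unknown predictors: {unknown}")
    command = [sys.executable, "-m", "b2bTools", "-file", input_path]
    command += [f"-{name}" for name in predictors]
    if identifier is not None:
        command += ["-identifier", identifier]
    command += ["-output", output_path]
    return command


command = build_command(
    "./swap/input_example.fasta",
    "./swap/result-from-package.json",
    ["dynamine", "disomine"],
    identifier="test",
)
print(" ".join(command))
# To run it: import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.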
📚 Package classes & methods
If you are interested in further details, please read the full documentation on the Bio2Byte website.
To generate the documentation locally, follow the steps described in this section.
Preconditions
You have downloaded the source code of the Bio2Byte Tools in your local environment:
$ git clone git@bitbucket.org:bio2byte/b2btools.git && cd b2btools
Steps
- Run the following command:
$ make generate-docs
- Then open the ./wrapped_documentation folder.
💡 Relevant idea: At any moment, you can read the docs of a method through its __doc__ attribute (e.g. print(SingleSeq.predict.__doc__)).
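The raw __doc__ text keeps the indentation it has in the source file. As a small illustration, the sketch below compares it with the standard library's `inspect.getdoc`, which returns the same docstring with the indentation cleaned up. `ExamplePredictor` is a hypothetical stand-in so the snippet runs without b2bTools installed; the same calls should apply to SingleSeq.predict.

```python
import inspect


class ExamplePredictor:
    """Hypothetical stand-in class; it is not part of b2bTools."""

    def predict(self, tools):
        """Run the selected tools.

        tools: list of predictor names.
        """
        return list(tools)


# __doc__ returns the raw docstring, including its source indentation.
raw = ExamplePredictor.predict.__doc__
# inspect.getdoc returns the same text with the indentation cleaned up.
clean = inspect.getdoc(ExamplePredictor.predict)
print(clean)
```

In an interactive session, `help(ExamplePredictor.predict)` shows the same cleaned text together with the call signature.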
📖 How to cite
If you use this package or data in this package, please cite:
Predictor | Cite | Digital Object Identifier (DOI) |
---|---|---|
Dynamine | Elisa Cilia, Rita Pancsa, Peter Tompa, Tom Lenaerts, and Wim Vranken. From protein sequence to dynamics and disorder with DynaMine. Nature Communications 4:2741 (2013) | https://www.nature.com/articles/ncomms3741 |
Disomine | Gabriele Orlando, Daniele Raimondi, Francesco Codice, Francesco Tabaro, Wim Vranken. Prediction of disordered regions in proteins with recurrent Neural Networks and protein dynamics. bioRxiv 2020.05.25.115253 (2020) | https://www.biorxiv.org/content/10.1101/2020.05.25.115253v1 |
EfoldMine | Raimondi, D., Orlando, G., Pancsa, R. et al. Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins. Sci Rep 7, 8826 (2017) | https://doi.org/10.1038/s41598-017-08366-3 |
AgMata | Gabriele Orlando, Alexandra Silva, Sandra Macedo-Ribeiro, Daniele Raimondi, Wim Vranken. Accurate prediction of protein beta-aggregation with generalized statistical potentials. Bioinformatics, Volume 36, Issue 7, 1 April 2020, Pages 2076–2081 (2020) | https://academic.oup.com/bioinformatics/article/36/7/2076/5670527 |
📝 Terms of use
- The Bio2Byte group aims to promote open science by providing freely available online services, databases and software relating to the life sciences, with focus on proteins. Where we present scientific data generated by others we impose no additional restriction on the use of the contributed data than those provided by the data owner.
- The Bio2Byte group expects attribution (e.g. in publications, services or products) for any of its online services, databases or software in accordance with good scientific practice. The expected attribution will be indicated in 'How to cite' sections (or equivalent).
- The Bio2Byte group is not liable to you or third parties claiming through you, for any loss or damage.
- Any questions or comments concerning these Terms of Use can be addressed to Wim Vranken.
© Wim Vranken, Bio2Byte group, VUB