PARROT: Protein Analysis using RecuRrent neural networks On Training data

PARROT encodes a computationally robust bidirectional recurrent neural network (BRNN) behind an easy-to-use command-line interface. PARROT is well suited for a variety of protein bioinformatics tasks. With only an input datafile containing sequences and mapped values, the user can automatically train a BRNN for a task of their choosing. The trained network can then be applied to new, unlabeled data to generate predictions and biological hypotheses.

This package can handle regression and classification ML problems, as well as sequence-mapped and residue-mapped input data.

Installation:

PARROT is available through GitHub or the Python Package Index (PyPI). To install through PyPI, run

$ pip install idptools-parrot

Depending on which Python packages are already installed on your machine, the installation may fail with errors. If this happens, try installing PARROT in a "clean" virtual environment using conda:

$ conda create --name <env_name> python=3.7
$ conda activate <env_name>

Then install PARROT with pip.

Alternatively, to clone the GitHub repository and gain the ability to modify a local copy of the code, run:

$ git clone https://github.com/idptools/parrot.git
$ cd parrot
$ pip install .

This will install PARROT locally. If you modify the source code in the local repository, be sure to reinstall with pip.

Usage:

There are three primary commands in the parrot package. Each is briefly described below; for more information on usage, visit the individual documentation pages.

  1. Train a BRNN with user-specified hyperparameters
  2. Train a BRNN with automatically-determined, optimal hyperparameters
  3. Generate predictions on unlabeled sequences using a trained BRNN

Input data format:

Before data can be used to train a BRNN, it must be formatted in the following manner:

seqID1 seq1 seq1data1 <seq1data2> <seq1data3> ... <seq1dataN1>  
seqID2 seq2 seq2data1 <seq2data2> <seq2data3> ... <seq2dataN2>  
.
.
.  
seqIDM seqM seqMdata1 <seqMdata2> <seqMdata3> ... <seqMdataNM>

Where Ni is the length of sequence i, and M is the total number of labeled sequences. Items must be whitespace-separated. For sequence-mapped data (i.e. each sequence constitutes a single datapoint), each row contains only three columns. Sequences are not required to be the same length. For residue-mapped data, this means rows can differ in their number of fields: if Sequence #1 has 12 amino acids and Sequence #2 has 15 amino acids, then these two rows in the input file will contain 14 and 17 fields respectively (the ID, the sequence, and one value per residue).

Optionally, you may use datasets that exclude the first column containing the ID of each sequence. In this case, be sure to use the --excludeSeqID flag.
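
The format described above can be checked with a few lines of Python before training. This is only a minimal sketch, not PARROT's own parser: the function name parse_row and its has_seq_id argument are hypothetical, and the example values are made up. It assumes every data value can be read as a number.

```python
def parse_row(line, has_seq_id=True):
    """Split one whitespace-separated row into (seq_id, sequence, values).

    Hypothetical helper for illustration; not part of the parrot package.
    Set has_seq_id=False for files without the leading ID column.
    """
    fields = line.split()
    n_leading = 2 if has_seq_id else 1  # ID + sequence, or sequence only
    if len(fields) <= n_leading:
        raise ValueError("row has no data values: %r" % line)
    seq_id = fields[0] if has_seq_id else None
    sequence = fields[n_leading - 1]
    values = [float(v) for v in fields[n_leading:]]
    # Sequence-mapped rows carry one value; residue-mapped rows carry one per residue.
    if len(values) not in (1, len(sequence)):
        raise ValueError(
            "expected 1 or %d values, found %d" % (len(sequence), len(values))
        )
    return seq_id, sequence, values


# Residue-mapped rows for a 12-residue and a 15-residue sequence (made-up values).
row1 = "seq1 ACDEFGHIKLMN " + " ".join(["0.5"] * 12)
row2 = "seq2 ACDEFGHIKLMNPQR " + " ".join(["1.0"] * 15)
print(len(row1.split()), len(row2.split()))  # 14 17
```

A row that fails this check (for example, a residue-mapped row with a missing value) is worth fixing before it reaches the training step.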

Classification problem: the labeled data should be integer class labels. For example, if there are 3 classes, then each datapoint should be either a '0', '1', or '2' (with no quote marks).

Regression problem: if the user wishes to map each sequence or residue to a continuous real number, then each datapoint should be a float.
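
The distinction between the two problem types can be made concrete with a small label check. This is an illustrative sketch under stated assumptions, not PARROT's internal handling: the function parse_labels and its arguments are hypothetical names, and the check assumes class labels run from 0 to num_classes - 1 as in the example above.

```python
def parse_labels(raw_values, problem_type, num_classes=None):
    """Convert raw string fields into labels for the given problem type.

    Hypothetical helper for illustration; not part of the parrot package.
    """
    if problem_type == "classification":
        # int() rejects strings such as "1.5", so only integer labels pass.
        labels = [int(v) for v in raw_values]
        if num_classes is not None:
            bad = [lab for lab in labels if not 0 <= lab < num_classes]
            if bad:
                raise ValueError("labels outside 0..%d: %r" % (num_classes - 1, bad))
        return labels
    if problem_type == "regression":
        # Any real number is acceptable for regression.
        return [float(v) for v in raw_values]
    raise ValueError("unknown problem type: %r" % problem_type)


class_labels = parse_labels(["0", "2", "1"], "classification", num_classes=3)
reg_labels = parse_labels(["0.25", "-1"], "regression")
```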

For example datasets, see the TSV files provided in the data folder.

1. Train BRNN with provided hyperparameters: parrot-train

The parrot-train command is most useful in the initial stages of data exploration. This command requires the user to specify the hyperparameters used to train the network, so it may not achieve results as good as those from more extensive training and hyperparameter search. However, if one wishes to quickly train a network for a given task, this command will give a sense of how effective a BRNN will be. Running parrot-train on a dataset for a large number of epochs can also indicate how many epochs to use during the more extensive hyperparameter optimization.
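
One simple way to turn a long exploratory run into an epoch count is to look for the epoch with the lowest validation loss. The sketch below assumes you have a per-epoch list of validation losses available by some means; how those are obtained is not specified here, the function name best_epoch is hypothetical, and the loss values are invented purely for illustration.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch number with the lowest validation loss."""
    if not val_losses:
        raise ValueError("no losses recorded")
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


# Invented loss curve: improves quickly, then starts to overfit.
example_losses = [0.90, 0.61, 0.48, 0.44, 0.45, 0.47]
chosen = best_epoch(example_losses)
print(chosen)  # 4
```

Training well past the chosen epoch mostly costs time, so a value near this minimum is a reasonable starting point for the longer optimization run.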

2. Optimize hyperparameters and train BRNN: parrot-optimize

The parrot-optimize command initiates an extensive search for the best-performing network hyperparameters for a given dataset using Bayesian optimization. Three hyperparameters (the learning rate, the number of hidden layers, and the hidden vector size) can greatly impact network performance and training speed, so it is important to tune these for each particular dataset. This command searches hyperparameter space by iteratively training networks and validating their performance (with 5-fold cross validation). The best-performing hyperparameters are then selected and used to train a network from scratch, as if running parrot-train with these parameters.
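
For readers unfamiliar with 5-fold cross validation, the following sketch shows how a dataset can be split so that every sample is used for validation exactly once. It is a generic, standard-library illustration of the idea and not the code parrot-optimize runs; the function name k_fold_indices and the seed argument are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps splits reproducible
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(k_fold_indices(23, k=5))
for train, val in splits:
    print(len(train), len(val))
```

Each candidate set of hyperparameters would be scored by averaging validation performance over the five splits, which is less sensitive to a lucky or unlucky single split.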

3. Generate predictions with trained BRNN: parrot-predict

Once a network has been trained for a particular machine learning task, the user can generate predictions on new sequences with this network using the parrot-predict command. The user provides the saved network and a list of sequences to predict, and the predictions are written to an output file.

Copyright

Copyright (c) 2020, Holehouse Lab

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.3.
