
Peptide Encoder

An encoder for peptides (short amino acid sequences) based on BLOSUM similarity.

In particular, this package includes a model for learning peptide embeddings such that the distance between two peptides in the embedding space is proportional to their BLOSUM62 similarity.
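
For reference, the BLOSUM62 similarity between two aligned peptides of equal length is the sum of the per-position substitution scores. A minimal sketch of that computation using Biopython (not a dependency of this package; the peptide sequences are made up):

from Bio.Align import substitution_matrices

# Load the standard BLOSUM62 substitution matrix.
blosum62 = substitution_matrices.load("BLOSUM62")

def blosum62_similarity(peptide_a: str, peptide_b: str) -> float:
    """Sum the per-position BLOSUM62 scores of two equal-length peptides."""
    assert len(peptide_a) == len(peptide_b)
    return sum(blosum62[a, b] for a, b in zip(peptide_a, peptide_b))

print(blosum62_similarity("AAVLK", "AAILK"))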

Installation

This project is written in Python 3 and can be installed with pip. It is available on PyPI.

pip3 install --find-links https://download.pytorch.org/whl/cu113/torch_stable.html peptide-encoder

Alternatively, the package can be installed from source.

git clone https://github.com/bmmalone/peptide-encoder.git
cd peptide-encoder
pip3 install -r requirements.txt .

(The "period" at the end is required.)

Prerequisites: This project relies on quite a few heavyweight dependencies, such as pytorch, ray, and cudnn. Both the requirements.txt and setup.py files aim to install these dependencies correctly; nevertheless, it may still be preferable to install them before installing this package.

In particular, the --find-links argument to pip may need to be adjusted depending on the available version of CUDA.
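
For example, for a CPU-only installation (one plausible variation; check the PyTorch site for the wheel index matching your setup), the command might instead be:

pip3 install --find-links https://download.pytorch.org/whl/cpu/torch_stable.html peptide-encoder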

Usage

After installation, models can be trained using a command similar to the following:

train-pepenc-models /prj/peptide-encoder/conf/base/config.yaml --num-hyperparameter-configurations 500 --max-training-iterations 30 --name my-pepenc-tune-exp --out-dir /tmp

The --help flag can be used to see a description of all arguments to the script.

For adjusting the hyperparameter search space, algorithms, or schedulers, the pepenc/models/train_pepenc_models.py script can be edited. If the package was not installed in pip "editable" mode, make sure to re-run pip install so that the changes take effect for the next run of ray.
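
As an illustration, a Ray Tune search space defined in that script could look something like the following sketch (the hyperparameter names here are hypothetical; the real ones are in conf/base/config.yaml and the script itself):

from ray import tune

# Hypothetical search space; the actual hyperparameter names live in
# pepenc/models/train_pepenc_models.py and conf/base/config.yaml.
search_space = {
    "embedding_dim": tune.choice([32, 64, 128]),
    "lstm_layers": tune.randint(2, 5),  # avoid a single layer; see "Training the model" below
    "learning_rate": tune.loguniform(1e-4, 1e-2),
}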

Documentation

Unfortunately, there is no Sphinx (or similar) documentation at this time. The file conf/base/config.yaml shows examples of all hyperparameters, data files, etc., used for training models.

Data format

The models in this project require an input csv file with a header row; each remaining row contains one peptide for the various datasets. The column in the csv file with the peptide sequences must be named sequence. (This can be adjusted when calling the pepenc library from Python code.)
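
A minimal example input file (the peptide sequences here are made up) might look as follows.

sequence
AAVLKHE
MKTWYFA
QQPLSGN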

TensorBoard visualization

The <out_dir>/<name> directory (based on the arguments to train-pepenc-models) will contain output suitable for visualization with TensorBoard. The following command uses Docker to expose the results on port 6007.

docker run --rm --mount type=bind,source=/tmp/my-pepenc-tune-exp,target=/tensorboard --publish 6007:6006 nvcr.io/nvidia/tensorflow:21.12-tf2-py3 tensorboard --logdir /tensorboard

The TensorFlow image can be updated as necessary.

N.B. The source of the bind mount must be the <out_dir>/<name> directory (based on the arguments to train-pepenc-models).
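
Alternatively, if TensorBoard is installed locally (for example, via pip3 install tensorboard), Docker is not needed; this assumes the same example output directory as above.

tensorboard --logdir /tmp/my-pepenc-tune-exp --port 6007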

Training the model

The model consistently experiences vanishing (or, occasionally, exploding) gradient issues when using a single LSTM layer. It is not clear why this happens, and it is currently suggested to exclude lstm_layers == 1 from the hyperparameter search space (and to avoid setting it that way directly in the config).
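
For example, assuming the config key is named lstm_layers (a guess consistent with the prose above; check conf/base/config.yaml for the actual key), one could pin a safe value directly in the config:

lstm_layers: 2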

Testing the code

The project uses pytest for unit testing. The testing prerequisites (though not the other dependencies described above) can be installed as follows. The quotes prevent shells such as zsh from expanding the square brackets.

pip3 install ".[test]"

After installing pytest and other testing dependencies, tests can be performed as follows.

cd /path/to/peptide-encoder
pytest .
