
TACaPe: Transformed-based Anti-Cancer Peptide Classification and Generation


TACaPe (Transformed-based Anti-Cancer Peptide Classification and Generation) is a command-line tool for training transformer-based models for anticancer peptide classification and generation. It is built on top of TensorFlow and uses an auto-regressive algorithm for peptide design; the generated peptides can optionally be filtered with a classification model.

Setup

Installing from PyPI using pip

$ pip install tacape

Installing from GitHub

$ git clone https://github.com/omixlab/anticancer-peptide
$ cd anticancer-peptide

Using pip

$ pip install -r requirements.txt -e .

Using conda

$ conda env create
$ conda activate anticancer-peptide

Usage

tacape-train-classifier

Trains a classification model for anticancer peptides.

$ tacape-train-classifier -h

/\__  _\ /\  __ \   /\  ___\   /\  __ \   /\  == \ /\  ___\   
\/_/\ \/ \ \  __ \  \ \ \____  \ \  __ \  \ \  _-/ \ \  __\   
   \ \_\  \ \_\ \_\  \ \_____\  \ \_\ \_\  \ \_\    \ \_____\ 
    \/_/   \/_/\/_/   \/_____/   \/_/\/_/   \/_/     \/_____/ 

usage: TACaPe: Model Training [-h] --positive-train POSITIVE_TRAIN --negative-train NEGATIVE_TRAIN --positive-test POSITIVE_TEST
                              --negative-test NEGATIVE_TEST [--format {text,fasta}] --output OUTPUT [--epochs EPOCHS]

optional arguments:
  -h, --help            show this help message and exit
  --positive-train POSITIVE_TRAIN
                        Input file containing positive peptides for training
  --negative-train NEGATIVE_TRAIN
                        Input file containing negative peptides for training
  --positive-test POSITIVE_TEST
                        Input file containing positive peptides for testing
  --negative-test NEGATIVE_TEST
                        Input file containing negative peptides for testing
  --format {text,fasta}
                        [optional] Input file format (default: text)
  --output OUTPUT       Path prefix of the output files
  --epochs EPOCHS       [optional] Number of epochs to be used during training (default: 30)
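The `--format` option selects between `text` and `fasta` input. The help text does not describe the `text` layout, so the sketch below assumes one peptide per line; it shows how either kind of file could be loaded into a list of sequences, for example to inspect a dataset before training. `read_peptides` is a hypothetical helper, not part of TACaPe.

```python
from typing import List


def read_peptides(path: str, fmt: str = "text") -> List[str]:
    """Read peptide sequences from a plain-text or FASTA file.

    Assumption: in "text" format every non-empty line holds one sequence.
    In "fasta" format, header lines start with ">" and a sequence may be
    wrapped over several lines.
    """
    with open(path) as handle:
        lines = [line.strip() for line in handle]

    if fmt == "text":
        # One sequence per non-empty line (assumed layout).
        return [line for line in lines if line]

    if fmt == "fasta":
        sequences: List[str] = []
        current: List[str] = []
        for line in lines:
            if line.startswith(">"):
                # A new header closes the previous record.
                if current:
                    sequences.append("".join(current))
                current = []
            elif line:
                current.append(line)
        if current:
            sequences.append("".join(current))
        return sequences

    raise ValueError(f"unsupported format: {fmt!r}")
```

Joining wrapped lines matters for FASTA files exported by tools that break long sequences at a fixed width.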

tacape-predict

Runs a trained classification model to predict anticancer activity for the peptides in an input file.

$ tacape-predict -h

usage: TACaPe: Predict [-h] --input INPUT [--format {text,fasta}] --classifier-prefix CLASSIFIER_PREFIX --output OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         Input file
  --format {text,fasta}
                        [optional] Input file format (default: text)
  --classifier-prefix CLASSIFIER_PREFIX
                        [optional] Path to the file prefix of the trained classification model
  --output OUTPUT       Path to the output CSV file
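Transformer classifiers operate on integer token sequences rather than raw strings. As a rough illustration only (TACaPe's actual vocabulary, maximum length and padding scheme are not documented here and may differ), a peptide can be mapped to a fixed-length list of token IDs by giving each standard amino acid its own ID and reserving 0 for padding. The vocabulary, `max_length` default and helper names below are hypothetical.

```python
from typing import Dict, List

# Hypothetical vocabulary: the 20 standard amino acids mapped to IDs 1..20.
# ID 0 is reserved for padding. These choices are illustrative assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_IDS: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
PAD_ID = 0


def encode(peptide: str, max_length: int = 50) -> List[int]:
    """Convert a peptide to a padded (or truncated) list of token IDs."""
    # Raises KeyError for non-standard residues such as "X".
    ids = [TOKEN_IDS[aa] for aa in peptide.upper()]
    ids = ids[:max_length]
    return ids + [PAD_ID] * (max_length - len(ids))


def encode_batch(peptides: List[str], max_length: int = 50) -> List[List[int]]:
    """Encode several peptides so they share the same length."""
    return [encode(p, max_length) for p in peptides]
```

For example, `encode("ACD", 5)` returns `[1, 2, 3, 0, 0]`. Equal-length rows are what lets a batch be stacked into a single tensor.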

tacape-train-generator

Trains an auto-regressive generative model for anticancer peptides.

$ tacape-train-generator -h

usage: TACaPe: Generative Model Training [-h] --positive-train POSITIVE_TRAIN --positive-test POSITIVE_TEST [--format {text,fasta}] --output
                              OUTPUT [--epochs EPOCHS]

optional arguments:
  -h, --help            show this help message and exit
  --positive-train POSITIVE_TRAIN
                        Input file containing positive peptides for training
  --positive-test POSITIVE_TEST
                        Input file containing positive peptides for testing
  --format {text,fasta}
                        [optional] Input file format (default: text)
  --output OUTPUT       Path prefix of the output files containing the trained model
  --epochs EPOCHS       [optional] Number of epochs to be used during training (default: 30)
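An auto-regressive generator is typically trained to predict each residue from the residues that precede it. The sketch below shows the usual way such next-token training pairs are built: the input is the sequence prefixed with a start token, and the target is the same sequence followed by an end token. The `^` and `$` special tokens and the `make_training_pair` helper are hypothetical; the real tool may use a different scheme.

```python
from typing import List, Tuple

START, END = "^", "$"  # hypothetical special tokens marking sequence boundaries


def make_training_pair(peptide: str) -> Tuple[List[str], List[str]]:
    """Build (input, target) token lists for next-token prediction.

    At position i the model sees model_input[: i + 1] and is trained
    to predict target[i], i.e. the next residue (or END at the last step).
    """
    tokens = list(peptide)
    model_input = [START] + tokens
    target = tokens + [END]
    return model_input, target
```

Because `target` is `model_input` shifted one position to the left, a causal attention mask lets every position of a sequence be trained in a single forward pass.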

tacape-generate

Generates a set of peptides with potential anticancer activity from a trained generative model. If a classification model is provided, it is used to filter the generated sequences and to compute their probability of activity.
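The generate-then-filter behavior described above can be sketched as follows. The `generate` and `predict_proba` callables are hypothetical stand-ins for the trained generator and classifier; the defaults of 1000 sequences and a 0.5 threshold mirror the documented `--number-of-sequences` and `--threshold` defaults. Two details are assumptions: that the count refers to sequences generated before filtering, and that a probability equal to the threshold is kept (`>=`).

```python
from typing import Callable, List, Optional, Tuple


def generate_and_filter(
    generate: Callable[[], str],
    predict_proba: Optional[Callable[[str], float]] = None,
    number_of_sequences: int = 1000,
    threshold: float = 0.5,
) -> List[Tuple[str, Optional[float]]]:
    """Generate peptides and, if a classifier is given, keep likely actives.

    Returns (sequence, probability) pairs. The probability is None when no
    classifier is supplied, in which case nothing is filtered out.
    """
    candidates = [generate() for _ in range(number_of_sequences)]
    if predict_proba is None:
        return [(seq, None) for seq in candidates]

    results: List[Tuple[str, Optional[float]]] = []
    for seq in candidates:
        probability = predict_proba(seq)
        if probability >= threshold:  # assumption: inclusive threshold
            results.append((seq, probability))
    return results
```

Under this reading, raising the threshold trades the number of returned peptides for higher predicted confidence.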

$ tacape-generate -h

usage: TACaPe: Generate [-h] --generator-prefix GENERATOR_PREFIX [--classifier-prefix CLASSIFIER_PREFIX]
                        [--number-of-sequences NUMBER_OF_SEQUENCES] [--temperature TEMPERATURE] [--threshold THRESHOLD] --output
                        OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --generator-prefix GENERATOR_PREFIX
                        Path to the file prefix of the trained generative model
  --classifier-prefix CLASSIFIER_PREFIX
                        [optional] Path to the file prefix of the trained classification model
  --number-of-sequences NUMBER_OF_SEQUENCES
                        [optional] Number of sequences to be generated (default: 1000)
  --temperature TEMPERATURE
                        [optional] Temperature used for logit scaling when sampling aminoacids during auto-regressive generation
                        (default: 1.0)
  --threshold THRESHOLD
                        [optional] Classification probability threshold (default: 0.5)
  --output OUTPUT       Path to the output CSV file
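Temperature scaling conventionally divides the logits by the temperature before the softmax. A minimal standard-library sketch of that step inside an auto-regressive loop follows; `next_logits` is a hypothetical stand-in for the trained generator, and the `$` end token and 50-residue cap are assumptions rather than TACaPe's documented behavior.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

END = "$"  # hypothetical end-of-sequence token


def sample_with_temperature(
    logits: Sequence[float], temperature: float, rng: random.Random
) -> int:
    """Sample an index with probability softmax(logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    # random.choices normalizes the weights, so no explicit division is needed.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate_peptide(
    next_logits: Callable[[str], Sequence[float]],
    vocabulary: List[str],
    temperature: float = 1.0,
    max_length: int = 50,
    seed: Optional[int] = None,
) -> str:
    """Append one sampled token at a time until END or max_length."""
    rng = random.Random(seed)
    peptide = ""
    while len(peptide) < max_length:
        index = sample_with_temperature(next_logits(peptide), temperature, rng)
        token = vocabulary[index]
        if token == END:
            break
        peptide += token
    return peptide
```

Temperatures below 1.0 sharpen the distribution towards the most likely residues, while values above 1.0 flatten it and increase the diversity of the generated peptides.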

Example: generating sequences from the AntiCP2 dataset

Creating a peptide classifier for 100 epochs

$ tacape-train-classifier \
    --positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
    --negative-train data/raw/anti_cp/anticp2_main_internal_negative.txt \
    --positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
    --negative-test data/raw/anti_cp/anticp2_main_validation_negative.txt \
    --output data/models/classifier \
    --epochs 100

Run the classifier on the positive validation peptides

$ tacape-predict \
    --input data/raw/anti_cp/anticp2_main_validation_positive.txt \
    --format text \
    --classifier-prefix data/models/classifier \
    --output data/models/internal_results.csv
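Because this command is run only on the positive validation file, the share of peptides predicted as active estimates the classifier's sensitivity (recall). A small sketch of that calculation is shown below. The `probability` column name and the 0.5 cut-off are assumptions (`tacape-predict` documents no threshold option), so check the header of `internal_results.csv` before relying on it; both helper functions are hypothetical.

```python
import csv
from typing import Iterable, List, TextIO


def read_probabilities(handle: TextIO, column: str = "probability") -> List[float]:
    """Read predicted probabilities from a CSV file.

    Assumption: predictions are stored in a numeric column named
    "probability"; adjust `column` to match the real output header.
    """
    return [float(row[column]) for row in csv.DictReader(handle)]


def sensitivity(probabilities: Iterable[float], threshold: float = 0.5) -> float:
    """Fraction of known-positive peptides predicted as active."""
    probs = list(probabilities)
    if not probs:
        raise ValueError("no predictions to evaluate")
    return sum(p >= threshold for p in probs) / len(probs)
```

Running the same calculation on predictions for the negative validation file gives the false-positive rate, and one minus that value is the specificity.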

Creating a peptide generator for 100 epochs

$ tacape-train-generator \
    --positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
    --positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
    --output data/models/generator \
    --epochs 100

Run the generative model to generate 100 sequences

$ tacape-generate \
    --generator-prefix data/models/generator \
    --classifier-prefix data/models/classifier \
    --number-of-sequences 100 \
    --output data/models/generated.csv

Convert generated peptides to FASTA

$ tacape-csv-to-fasta \
    --input data/models/generated.csv \
    --output data/models/generated.fasta
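To post-process the generated CSV without the bundled command, a similar conversion can be approximated with the standard library. The column name `sequence` is a hypothetical assumption (inspect the header of `generated.csv` for the actual name), and the `peptide_N` record identifiers are illustrative; this is not the implementation of `tacape-csv-to-fasta`.

```python
import csv
from typing import TextIO


def csv_to_fasta(csv_handle: TextIO, fasta_handle: TextIO, column: str = "sequence") -> int:
    """Write each row's peptide as a FASTA record and return the count written.

    Assumption: the peptide is stored in a column named "sequence".
    """
    count = 0
    for row in csv.DictReader(csv_handle):
        sequence = row[column].strip()
        if not sequence:
            continue  # skip rows with an empty sequence cell
        count += 1
        fasta_handle.write(f">peptide_{count}\n{sequence}\n")
    return count
```

To use it on the example output, open `data/models/generated.csv` (with `newline=""`, as the `csv` module recommends) and `data/models/generated.fasta` for writing, then pass both handles to `csv_to_fasta`.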

Download files


Source Distributions

No source distribution files available for this release.

Built Distribution

tacape-0.0.6-py3-none-any.whl (11.6 kB view details)

Uploaded Python 3

File details

Details for the file tacape-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: tacape-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 11.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.13

File hashes

Hashes for tacape-0.0.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       61c69b1dcec0cd371cf80b6598e7de63d9220e429ad54e7e48333bb073360eeb
MD5          f8baf36ec77f3e869de718e79fb638ef
BLAKE2b-256  714774798ad33c381990d0da57728237758b381f159dafac93e848a7b3ae94b8
