TACaPe: Transformer-based Anti-Cancer Peptide Classification and Generation
TACaPe (Transformer-based Anti-Cancer Peptide Classification and Generation) is a command-line tool for training transformer-based models for anticancer peptide classification and generation. It is built on top of TensorFlow and uses an auto-regressive algorithm for peptide design, whose results can be filtered using an optional classification model.
Setup
Installing from PyPI using pip
$ pip install tacape
Installing from GitHub
$ git clone https://github.com/omixlab/anticancer-peptide
$ cd anticancer-peptide
Using pip
$ pip install -r requirements.txt -e .
Using conda
$ conda env create
$ conda activate anticancer-peptide
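After either installation route, a quick way to confirm that the command-line entry points are reachable is to look them up on the `PATH`. The sketch below only uses the command names that appear in this README; the helper `find_tacape_commands` is a hypothetical name and not part of TACaPe.

```python
import shutil
from typing import Dict

# Command names as they appear in this README.
TACAPE_COMMANDS = [
    "tacape-train-classifier",
    "tacape-predict",
    "tacape-train-generator",
    "tacape-generate",
    "tacape-csv-to-fasta",
]


def find_tacape_commands() -> Dict[str, bool]:
    """Map each command name to True if it is found on the PATH."""
    return {name: shutil.which(name) is not None for name in TACAPE_COMMANDS}


if __name__ == "__main__":
    for name, found in find_tacape_commands().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If a command is reported as missing, the environment in which TACaPe was installed is probably not the active one.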
Usage
tacape-train-classifier
Trains a classification model for anticancer peptides.
$ tacape-train-classifier -h
/\__ _\ /\ __ \ /\ ___\ /\ __ \ /\ == \ /\ ___\
\/_/\ \/ \ \ __ \ \ \ \____ \ \ __ \ \ \ _-/ \ \ __\
\ \_\ \ \_\ \_\ \ \_____\ \ \_\ \_\ \ \_\ \ \_____\
\/_/ \/_/\/_/ \/_____/ \/_/\/_/ \/_/ \/_____/
usage: TACaPe: Model Training [-h] --positive-train POSITIVE_TRAIN --negative-train NEGATIVE_TRAIN --positive-test POSITIVE_TEST
                              --negative-test NEGATIVE_TEST [--format {text,fasta}] --output OUTPUT [--epochs EPOCHS]

optional arguments:
  -h, --help            show this help message and exit
  --positive-train POSITIVE_TRAIN
                        Input file containing positive peptides for training
  --negative-train NEGATIVE_TRAIN
                        Input file containing negative peptides for training
  --positive-test POSITIVE_TEST
                        Input file containing positive peptides for testing
  --negative-test NEGATIVE_TEST
                        Input file containing negative peptides for testing
  --format {text,fasta}
                        [optional] Input file format (default: text)
  --output OUTPUT       Path prefix of the output files
  --epochs EPOCHS       [optional] Number of epochs to be used during training (default: 30)
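The `--format` option accepts `text` or `fasta`. As a rough illustration of how the two formats differ, the sketch below assumes that `text` means one peptide sequence per line and that `fasta` means `>`-prefixed header lines followed by one or more sequence lines. The function `read_peptides` is a hypothetical helper written for this README, not TACaPe's own parser.

```python
from typing import Iterable, List


def read_peptides(lines: Iterable[str], fmt: str = "text") -> List[str]:
    """Return the peptide sequences found in `lines` (hypothetical helper)."""
    if fmt not in ("text", "fasta"):
        raise ValueError(f"unsupported format: {fmt}")
    sequences: List[str] = []
    current: List[str] = []  # chunks of the FASTA record being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines in both formats
        if fmt == "text":
            # Assumption: one sequence per line.
            sequences.append(line.upper())
        elif line.startswith(">"):
            # A header starts a new record; flush the previous one.
            if current:
                sequences.append("".join(current).upper())
            current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current).upper())
    return sequences
```

With a file on disk, this could be called as `read_peptides(open(path), fmt="fasta")`.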
tacape-predict
Runs a trained classification model to predict anticancer activity for the peptides in an input file.
$ tacape-predict -h
/\__ _\ /\ __ \ /\ ___\ /\ __ \ /\ == \ /\ ___\
\/_/\ \/ \ \ __ \ \ \ \____ \ \ __ \ \ \ _-/ \ \ __\
\ \_\ \ \_\ \_\ \ \_____\ \ \_\ \_\ \ \_\ \ \_____\
\/_/ \/_/\/_/ \/_____/ \/_/\/_/ \/_/ \/_____/
usage: TACaPe: Predict [-h] --input INPUT [--format {text,fasta}] --classifier-prefix CLASSIFIER_PREFIX --output OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         Input file
  --format {text,fasta}
                        [optional] Input file format (default: text)
  --classifier-prefix CLASSIFIER_PREFIX
                        [optional] Path to the file prefix of the trained classification model
  --output OUTPUT       Path to the output CSV file
tacape-train-generator
Trains an auto-regressive generative model for anticancer peptides.
$ tacape-train-generator -h
/\__ _\ /\ __ \ /\ ___\ /\ __ \ /\ == \ /\ ___\
\/_/\ \/ \ \ __ \ \ \ \____ \ \ __ \ \ \ _-/ \ \ __\
\ \_\ \ \_\ \_\ \ \_____\ \ \_\ \_\ \ \_\ \ \_____\
\/_/ \/_/\/_/ \/_____/ \/_/\/_/ \/_/ \/_____/
usage: TACaPe: Generative Model Training [-h] --positive-train POSITIVE_TRAIN --positive-test POSITIVE_TEST [--format {text,fasta}] --output
                                         OUTPUT [--epochs EPOCHS]

optional arguments:
  -h, --help            show this help message and exit
  --positive-train POSITIVE_TRAIN
                        Input file containing positive peptides for training
  --positive-test POSITIVE_TEST
                        Input file containing positive peptides for testing
  --format {text,fasta}
                        [optional] Input file format (default: text)
  --output OUTPUT       Path prefix of the output files containing the trained model
  --epochs EPOCHS       [optional] Number of epochs to be used during training (default: 30)
tacape-generate
Generates a set of peptides with potential anticancer activity from a trained generative model. If a classification model is provided, it is used to compute a probability of activity for each generated sequence and to filter the sequences by that probability.
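The filtering step can be pictured as scoring each generated sequence and keeping those at or above a cutoff. The sketch below is only an illustration under stated assumptions: `filter_by_probability` and `toy_classifier` are hypothetical names, the toy scorer stands in for a trained model, and keeping sequences with probability greater than or equal to the threshold (rather than strictly greater) is a guess about the comparison used.

```python
from typing import Callable, List, Tuple


def filter_by_probability(
    sequences: List[str],
    predict_proba: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score every sequence and keep those with probability >= threshold."""
    scored = [(seq, predict_proba(seq)) for seq in sequences]
    return [(seq, prob) for seq, prob in scored if prob >= threshold]


def toy_classifier(seq: str) -> float:
    # Stand-in for a trained classifier: the fraction of lysine (K) residues.
    # This has no biological meaning and is used for demonstration only.
    return seq.count("K") / len(seq) if seq else 0.0


kept = filter_by_probability(["KKLA", "AAAA", "KAKA"], toy_classifier, threshold=0.5)
print(kept)
```

Raising the threshold keeps fewer, higher-scoring sequences; lowering it keeps more of the generated set.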
$ tacape-generate -h
/\__ _\ /\ __ \ /\ ___\ /\ __ \ /\ == \ /\ ___\
\/_/\ \/ \ \ __ \ \ \ \____ \ \ __ \ \ \ _-/ \ \ __\
\ \_\ \ \_\ \_\ \ \_____\ \ \_\ \_\ \ \_\ \ \_____\
\/_/ \/_/\/_/ \/_____/ \/_/\/_/ \/_/ \/_____/
usage: TACaPe: Generate [-h] --generator-prefix GENERATOR_PREFIX [--classifier-prefix CLASSIFIER_PREFIX]
                        [--number-of-sequences NUMBER_OF_SEQUENCES] [--temperature TEMPERATURE] [--threshold THRESHOLD] --output
                        OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --generator-prefix GENERATOR_PREFIX
                        Path to the file prefix of the trained generative model
  --classifier-prefix CLASSIFIER_PREFIX
                        [optional] Path to the file prefix of the trained classification model
  --number-of-sequences NUMBER_OF_SEQUENCES
                        [optional] Number of sequences to be generated (default: 1000)
  --temperature TEMPERATURE
                        [optional] Temperature used for logit scaling when sampling aminoacids during auto-regressive generation
                        (default: 1.0)
  --threshold THRESHOLD
                        [optional] Classification probability threshold (default: 0.5)
  --output OUTPUT       Path to the output CSV file
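To give an intuition for the `--temperature` option, the sketch below shows the usual form of temperature scaling: logits are divided by the temperature before a softmax, and the next token is drawn from the resulting distribution. It is a generic, standard-library illustration, not TACaPe's implementation; the function names and the example logits are made up for this README.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(
    logits: Sequence[float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one index according to the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


example_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate residues
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(example_logits, t)])
```

Temperatures below 1.0 concentrate probability on the highest-scoring residues, giving more conservative sequences, while temperatures above 1.0 flatten the distribution and give more diverse ones.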
Example: generating sequences from the AntiCP2 dataset
Train a peptide classifier for 100 epochs
$ tacape-train-classifier \
--positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
--negative-train data/raw/anti_cp/anticp2_main_internal_negative.txt \
--positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
--negative-test data/raw/anti_cp/anticp2_main_validation_negative.txt \
--output data/models/classifier \
--epochs 100
Run the trained classifier on the positive validation set
$ tacape-predict \
--input data/raw/anti_cp/anticp2_main_validation_positive.txt \
--format text \
--classifier-prefix data/models/classifier \
--output data/models/internal_results.csv
Train a peptide generator for 100 epochs
$ tacape-train-generator \
--positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
--positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
--output data/models/generator \
--epochs 100
Run the generative model to generate 100 sequences
$ tacape-generate \
--generator-prefix data/models/generator \
--classifier-prefix data/models/classifier \
--number-of-sequences 100 \
--output data/models/generated.csv
Convert generated peptides to FASTA
$ tacape-csv-to-fasta \
--input data/models/generated.csv \
--output data/models/generated.fasta
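For readers who want to post-process the generated CSV themselves, the conversion to FASTA amounts to writing one header line and one sequence line per row. The sketch below is an assumption-laden illustration: the column name `sequence`, the record names `peptide_1`, `peptide_2`, ... and the helper `csv_to_fasta` are all hypothetical, and the actual columns written by `tacape-generate` should be checked in the output file.

```python
import csv
import io
from typing import TextIO


def csv_to_fasta(csv_file: TextIO, fasta_file: TextIO, column: str = "sequence") -> int:
    """Write one FASTA record per CSV row and return the number of records."""
    reader = csv.DictReader(csv_file)
    count = 0
    for count, row in enumerate(reader, start=1):
        # Record names are invented here; `column` is an assumed column name.
        fasta_file.write(f">peptide_{count}\n{row[column]}\n")
    return count


# In-memory demonstration with made-up rows.
demo_csv = io.StringIO("sequence,probability\nKKLA,0.9\nFLKW,0.7\n")
demo_fasta = io.StringIO()
written = csv_to_fasta(demo_csv, demo_fasta)
print(written)
print(demo_fasta.getvalue(), end="")
```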