Classify prokaryote protein sequences into COG functional categories
COGclassifier
Overview
COGclassifier is a tool for classifying prokaryote protein sequences into COG functional categories.
Fig.1: Bar chart of COG functional category classification results for E. coli
Fig.2: Pie chart of COG functional category classification results for E. coli
Installation
COGclassifier is implemented in Python 3 (tested on Ubuntu 20.04).
Install PyPI stable version with pip:
pip install cogclassifier
COGclassifier requires RPS-BLAST for COG database search.
Download the latest BLAST executable binaries from the NCBI FTP site and add them to PATH.
:warning: The 'mt_mode' option is available in BLAST v2.12.0 and newer. Setting 'mt_mode=1' makes effective use of multi-threading and is faster, so installing the latest version is recommended. See NCBI's article Threading By Query for details.
Workflow
- Download COG & CDD resources
- RPS-BLAST query sequences against COG database
- Classify query sequences into COG functional category
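The second and third workflow steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not COGclassifier's actual implementation: the function names are hypothetical, the database path is whatever formatted COG database you point it at, and treating the highest bit score as the "top hit" is an assumption. The `rpsblast` flags are standard BLAST+ options, and `-mt_mode` needs BLAST v2.12.0 or newer.

```python
from typing import Dict, List


def build_rpsblast_cmd(query: str, db: str, out: str,
                       evalue: float = 0.01, threads: int = 1) -> List[str]:
    """Build an rpsblast command line that writes tabular (outfmt 6) output."""
    return [
        "rpsblast",
        "-query", query,
        "-db", db,  # path to a formatted COG database (supplied by the caller)
        "-outfmt", "6",
        "-out", out,
        "-evalue", str(evalue),
        "-num_threads", str(threads),
        "-mt_mode", "1",  # thread by query; needs BLAST v2.12.0 or newer
    ]


def top_hits(outfmt6_text: str) -> Dict[str, List[str]]:
    """Keep the highest bit-score hit for each query ID (assumed top-hit rule).

    Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, List[str]] = {}
    for line in outfmt6_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")
        query_id, bitscore = cols[0], float(cols[11])
        if query_id not in best or bitscore > float(best[query_id][11]):
            best[query_id] = cols
    return best
```

The command list can be passed to `subprocess.run(cmd, check=True)`, and the text of the resulting output file to `top_hits`.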
Command Usage
Basic Command
COGclassifier -i [query protein fasta file] -o [output directory]
Options
-h, --help show this help message and exit
-i , --infile Input query protein fasta file
-o , --outdir Output directory
-d , --download_dir Download COG & CDD FTP data directory (Default: './cog_download')
-t , --thread_num RPS-BLAST num_thread parameter (Default: MaxThread - 1)
-e , --evalue RPS-BLAST e-value parameter (Default: 0.01)
-v, --version Print version information
Example Command
Classify E.coli protein sequences into COG functional category (ecoli.faa):
COGclassifier -i ./example/input/ecoli.faa -o ./ecoli_cog_classifier
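For scripted runs, the command can also be assembled from Python. The sketch below uses only the options listed above; the helper names are hypothetical, and clamping the thread count to at least 1 is an assumption about how "MaxThread - 1" should behave on a single-core machine.

```python
import os
from typing import List, Optional


def default_thread_num() -> int:
    """Approximate the documented default 'MaxThread - 1' (assumed minimum of 1)."""
    return max((os.cpu_count() or 1) - 1, 1)


def build_cogclassifier_cmd(infile: str, outdir: str,
                            download_dir: str = "./cog_download",
                            thread_num: Optional[int] = None,
                            evalue: float = 0.01) -> List[str]:
    """Build a COGclassifier command line from the documented options."""
    if thread_num is None:
        thread_num = default_thread_num()
    return [
        "COGclassifier",
        "-i", infile,
        "-o", outdir,
        "-d", download_dir,
        "-t", str(thread_num),
        "-e", str(evalue),
    ]
```

The returned list can be run with `subprocess.run(cmd, check=True)` once COGclassifier and RPS-BLAST are on PATH.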
Output Contents
COGclassifier outputs 4 result text files and 3 HTML-format chart files.

- rpsblast_result.tsv (example)
  Result of the RPS-BLAST search against the COG database (format = outfmt 6).

- classifier_result.tsv (example)
  Query sequences classified into COG functional categories.
  This file contains all classified query sequences and their associated COG information.

  Table of detailed tsv format information (9 columns)

  | Columns | Contents | Example Value |
  |---|---|---|
  | QUERY_ID | Query ID | NP_414544.1 |
  | COG_ID | COG ID of RPS-BLAST top hit result | COG0083 |
  | CDD_ID | CDD ID of RPS-BLAST top hit result | 223161 |
  | EVALUE | RPS-BLAST top hit evalue | 2.5e-150 |
  | IDENTITY | RPS-BLAST top hit identity | 45.806 |
  | GENE_NAME | Abbreviated gene name | ThrB |
  | COG_NAME | COG gene name | Homoserine kinase |
  | COG_LETTER | Letter of COG functional category | E |
  | COG_DESCRIPTION | Description of COG functional category | Amino acid transport and metabolism |

- classifier_count.tsv (example)
  Count of classified sequences per COG functional category.

  Table of detailed tsv format information (4 columns)

  | Columns | Contents | Example Value |
  |---|---|---|
  | LETTER | Letter of COG functional category | J |
  | COUNT | Count of COG classified sequence | 259 |
  | COLOR | Symbol color of COG functional category | #FCCCFC |
  | DESCRIPTION | Description of COG functional category | Translation, ribosomal structure and biogenesis |

- classifier_stats.txt (example)
  The percentage of classified sequences is reported as in the example below.
  86.35% (3575 / 4140) sequences classified into COG functional category.

- classifier_count_barchart.html
  Bar chart of the COG functional category classification result.
  COGclassifier uses the Altair visualization library for plotting HTML-format charts.
  In a web browser, Altair charts display tooltips interactively and can be exported as PNG or SVG images.

- classifier_count_piechart.html
  Pie chart of the COG functional category classification result.
  Functional categories accounting for less than 1% do not display their letter on the pie chart.

- classifier_count_piechart_sort.html
  Pie chart sorted in descending order by count.
  Functional categories accounting for less than 1% do not display their letter on the pie chart.
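The count file lends itself to simple post-processing. The sketch below reads classifier_count.tsv content and computes each category's percentage, flagging categories that fall under the 1% threshold used for pie chart letters. It assumes the file is tab-delimited with a header row naming the four columns above, and that percentages are taken relative to the sum of all counts; neither detail is confirmed here, and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def load_counts(tsv_text: str) -> List[Dict[str, str]]:
    """Parse classifier_count.tsv content (assumed to have a header row)."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def category_percentages(rows: List[Dict[str, str]],
                         min_label_percent: float = 1.0) -> List[Dict[str, object]]:
    """Compute the percentage per category and whether its letter would be shown."""
    total = sum(int(row["COUNT"]) for row in rows)
    result = []
    for row in rows:
        percent = 100 * int(row["COUNT"]) / total if total else 0.0
        result.append({
            "letter": row["LETTER"],
            "percent": percent,
            # Letters are hidden below the threshold (1% per the description above)
            "show_letter": percent >= min_label_percent,
        })
    return result
```

In practice, `load_counts` would be given the text read from the classifier_count.tsv file in the output directory.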
Customize charts
COGclassifier also provides bar chart and pie chart plotting scripts for customizing chart appearance. Each script supports the features listed below. See wiki for details.
- Features of plot_cog_classifier_barchart script
  - Adjust figure width, height, barwidth
  - Plot charts with percentage style instead of count number style
  - Fix maximum value of Y-axis
  - Descending sort by count number or not
  - Plot charts from user-customized classifier_count.tsv
- Features of plot_cog_classifier_piechart script
  - Adjust figure width, height
  - Descending sort by count number or not
  - Show letter on piechart or not
  - Plot charts from user-customized classifier_count.tsv
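One way to prepare a user-customized classifier_count.tsv is sketched below: it overrides category colors and optionally drops categories with a zero count. It assumes the file is tab-delimited with a header row containing LETTER, COUNT, COLOR and DESCRIPTION; the function name and its parameters are hypothetical and are not part of COGclassifier.

```python
import csv
import io
from typing import Dict


def customize_counts(tsv_text: str, new_colors: Dict[str, str],
                     drop_zero: bool = True) -> str:
    """Return classifier_count.tsv content with colors replaced per letter."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if drop_zero and int(row["COUNT"]) == 0:
            continue  # leave empty categories out of the customized file
        # Keep the original color unless an override is given for this letter
        row["COLOR"] = new_colors.get(row["LETTER"], row["COLOR"])
        writer.writerow(row)
    return out.getvalue()
```

The returned text can be saved to a new file and used as the customized classifier_count.tsv input for the plotting scripts.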
Hashes for cogclassifier-0.1.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | b4a8074cacdbcee52e6e70d3e7fb45e87ea821806a4d5d23dde45fabcadb1a57 |
MD5 | db697ad34afab55ada554f08c5121be8 |
BLAKE2b-256 | 8dc1b346550e0aaec8ca5d7cffa6c85b95edfa3442567b02a5c675700cb459c0 |