Classify prokaryote protein sequences into COG functional categories

COGclassifier

Overview

COGclassifier is a tool for classifying prokaryote protein sequences into COG functional categories.

ecoli_barchart_fig
Fig. 1: Bar chart of the COG functional category classification result for E. coli

ecoli_piechart_sort_fig
Fig. 2: Pie chart of the COG functional category classification result for E. coli

Installation

COGclassifier is implemented in Python 3 (tested on Ubuntu 20.04).

Install PyPI stable version with pip:

pip install cogclassifier

COGclassifier requires RPS-BLAST for the COG database search.
Download the latest BLAST executable binaries from the NCBI FTP site and add them to PATH.

Warning: the 'mt_mode' option is available in BLAST v2.12.0 and newer. Setting 'mt_mode=1' makes more effective use of multi-threading and is faster, so installing the latest version is recommended. See NCBI's article Threading By Query for details.
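As an illustration of the version requirement above, the following minimal sketch decides whether 'mt_mode' can be passed and builds an RPS-BLAST command line accordingly. The function names and the example database name are hypothetical, and the assumption that `rpsblast -version` prints a string such as `rpsblast: 2.12.0+` should be checked against your installation. This is not COGclassifier's own code.

```python
import re


def parse_blast_version(version_text):
    """Extract (major, minor, patch) from text such as 'rpsblast: 2.12.0+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return tuple(int(part) for part in match.groups())


def supports_mt_mode(version_text):
    """Return True when the reported BLAST version is 2.12.0 or newer."""
    return parse_blast_version(version_text) >= (2, 12, 0)


def build_rpsblast_cmd(query, db, out, evalue=0.01, threads=1, mt_mode=True):
    """Build an rpsblast argument list (hypothetical helper, not the tool's API)."""
    cmd = [
        "rpsblast",
        "-query", str(query),
        "-db", str(db),
        "-out", str(out),
        "-outfmt", "6",            # tabular output
        "-evalue", str(evalue),
        "-num_threads", str(threads),
    ]
    if mt_mode:
        # Threading by query; only valid for BLAST 2.12.0 or newer.
        cmd += ["-mt_mode", "1"]
    return cmd
```

The version text could be obtained with `subprocess.run(["rpsblast", "-version"], capture_output=True, text=True).stdout`, and the resulting list passed to `subprocess.run(cmd, check=True)`.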

Workflow

  1. Download COG & CDD resources

  2. Search query sequences against the COG database with RPS-BLAST

  3. Classify query sequences into COG functional categories
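Step 3 relies on the top hit of each query. A minimal sketch of that idea, assuming the default 12-column outfmt 6 layout and taking "top hit" to mean the highest bit score per query, might look like this. The selection rule, the `top_hits` name, and every sample value except the query ID, CDD ID, identity, and e-value shown later in this README are assumptions for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Illustrative rows only; the second hit is entirely hypothetical.
SAMPLE_RPSBLAST_TSV = (
    "NP_414544.1\tCDD:223161\t45.806\t310\t160\t4\t1\t306\t1\t306\t2.5e-150\t420\n"
    "NP_414544.1\tCDD:999999\t30.000\t100\t70\t0\t5\t104\t3\t102\t1e-05\t45\n"
)


def top_hits(tsv_text):
    """Return {query_id: row} keeping the highest-bitscore hit per query."""
    best = {}
    reader = csv.DictReader(
        io.StringIO(tsv_text), fieldnames=OUTFMT6_COLUMNS, delimiter="\t"
    )
    for row in reader:
        query = row["qseqid"]
        current = best.get(query)
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[query] = row
    return best


hits = top_hits(SAMPLE_RPSBLAST_TSV)
for query_id, hit in hits.items():
    print(query_id, hit["sseqid"], hit["evalue"])
```

Mapping the CDD ID of each top hit to a COG ID and functional category requires the downloaded COG and CDD resources from step 1 and is not shown here.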

Command Usage

Basic Command

COGclassifier -i [query protein fasta file] -o [output directory]

Options

-h, --help            show this help message and exit
-i , --infile         Input query protein fasta file
-o , --outdir         Output directory
-d , --download_dir   Download COG & CDD FTP data directory (Default: './cog_download')
-t , --thread_num     RPS-BLAST num_thread parameter (Default: MaxThread - 1)
-e , --evalue         RPS-BLAST e-value parameter (Default: 0.01)
-v, --version         Print version information
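For scripted runs, the options above can be assembled from Python. The sketch below is a hypothetical wrapper that uses only the flags listed above. The reading of "MaxThread - 1" as `os.cpu_count() - 1` with a floor of 1 is an assumption, not taken from the tool's source.

```python
import os


def default_thread_num():
    """Assumed interpretation of 'MaxThread - 1', never going below 1."""
    cpu_total = os.cpu_count() or 1
    return max(1, cpu_total - 1)


def build_cogclassifier_cmd(infile, outdir, download_dir="./cog_download",
                            thread_num=None, evalue=0.01):
    """Build a COGclassifier argument list (hypothetical helper)."""
    if thread_num is None:
        thread_num = default_thread_num()
    return [
        "COGclassifier",
        "-i", str(infile),
        "-o", str(outdir),
        "-d", str(download_dir),
        "-t", str(thread_num),
        "-e", str(evalue),
    ]
```

The returned list can be run with `subprocess.run(cmd, check=True)` once COGclassifier and RPS-BLAST are installed.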

Example Command

Classify E. coli protein sequences (ecoli.faa) into COG functional categories:

COGclassifier -i ./example/input/ecoli.faa -o ./ecoli_cog_classifier

Output Contents

COGclassifier outputs four result text files and three HTML chart files.

  • rpsblast_result.tsv (example)

    Result of the RPS-BLAST search against the COG database (format: outfmt 6).

  • classifier_result.tsv (example)

    Result of classifying query sequences into COG functional categories.
    This file contains all classified query sequences and their associated COG information.

    Details of the TSV format (9 columns):

    | Column | Contents | Example Value |
    | --- | --- | --- |
    | QUERY_ID | Query ID | NP_414544.1 |
    | COG_ID | COG ID of RPS-BLAST top hit result | COG0083 |
    | CDD_ID | CDD ID of RPS-BLAST top hit result | 223161 |
    | EVALUE | RPS-BLAST top hit evalue | 2.5e-150 |
    | IDENTITY | RPS-BLAST top hit identity | 45.806 |
    | GENE_NAME | Abbreviated gene name | ThrB |
    | COG_NAME | COG gene name | Homoserine kinase |
    | COG_LETTER | Letter of COG functional category | E |
    | COG_DESCRIPTION | Description of COG functional category | Amino acid transport and metabolism |

  • classifier_count.tsv (example)

    Count of classified sequences per COG functional category.

    Details of the TSV format (4 columns):

    | Column | Contents | Example Value |
    | --- | --- | --- |
    | LETTER | Letter of COG functional category | J |
    | COUNT | Count of COG classified sequences | 259 |
    | COLOR | Symbol color of COG functional category | #FCCCFC |
    | DESCRIPTION | Description of COG functional category | Translation, ribosomal structure and biogenesis |

  • classifier_stats.txt (example)

    The percentage of classified sequences is reported as in the example below.

    86.35% (3575 / 4140) sequences classified into COG functional category.

  • classifier_count_barchart.html

    Bar chart of the COG functional category classification result.
    COGclassifier uses the Altair visualization library to plot HTML charts.
    In a web browser, Altair charts display tooltips interactively and can be exported as PNG or SVG images.

    classifier_count_barchart

  • classifier_count_piechart.html

    Pie chart of the COG functional category classification result.
    Functional categories that account for less than 1% do not display their letter on the pie chart.

    classifier_count_piechart

  • classifier_count_piechart_sort.html

    Pie chart sorted by count in descending order.
    Functional categories that account for less than 1% do not display their letter on the pie chart.

    classifier_count_piechart
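The count file and the 1% labelling rule described above can be explored with the standard library alone. The sketch below is a minimal illustration that assumes percentages are taken relative to the sum of the COUNT column. Only the J row comes from this README; the Z row's count and color are hypothetical sample values.

```python
import csv
import io

# Header and J row follow the column table above; the Z row is hypothetical.
SAMPLE_COUNT_TSV = (
    "LETTER\tCOUNT\tCOLOR\tDESCRIPTION\n"
    "J\t259\t#FCCCFC\tTranslation, ribosomal structure and biogenesis\n"
    "Z\t2\t#CCCCCC\tCytoskeleton\n"
)


def category_percentages(tsv_text):
    """Return one dict per category with its share of all counted sequences."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    total = sum(int(row["COUNT"]) for row in rows)
    result = []
    for row in rows:
        count = int(row["COUNT"])
        percent = 100 * count / total if total else 0.0
        result.append({
            "letter": row["LETTER"],
            "count": count,
            "percent": percent,
            # Mirrors the rule that categories under 1% show no letter.
            "show_letter": percent >= 1.0,
        })
    return result


for item in category_percentages(SAMPLE_COUNT_TSV):
    print(f"{item['letter']}: {item['percent']:.2f}% show_letter={item['show_letter']}")
```

To use a real result, read `classifier_count.tsv` from the output directory and pass its text to `category_percentages`.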

Customize charts

COGclassifier also provides bar chart and pie chart plotting scripts for customizing chart appearance. Each script supports the features listed below. See the wiki for details.

  • Features of plot_cog_classifier_barchart script

    • Adjust figure width, height, barwidth
    • Plot charts with percentage style instead of count number style
    • Fix maximum value of Y-axis
    • Descending sort by count number or not
    • Plot charts from user-customized classifier_count.tsv
  • Features of plot_cog_classifier_piechart script

    • Adjust figure width, height
    • Descending sort by count number or not
    • Show letter on piechart or not
    • Plot charts from user-customized classifier_count.tsv
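One way to prepare a user-customized classifier_count.tsv is to rewrite the original file while keeping its four columns. The sketch below is a hypothetical helper that drops selected categories and overrides colors. The S row's count and color are sample values, and the exact input the plotting scripts accept should be confirmed in the wiki.

```python
import csv
import io

# The J row follows this README; the S row's count and color are hypothetical.
SAMPLE_COUNT_TSV = (
    "LETTER\tCOUNT\tCOLOR\tDESCRIPTION\n"
    "J\t259\t#FCCCFC\tTranslation, ribosomal structure and biogenesis\n"
    "S\t100\t#CCCCCC\tFunction unknown\n"
)


def customize_count_tsv(tsv_text, drop_letters=(), color_overrides=None):
    """Return TSV text without the dropped letters and with overridden colors."""
    color_overrides = color_overrides or {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row["LETTER"] in drop_letters:
            continue  # Skip categories the user does not want plotted.
        row["COLOR"] = color_overrides.get(row["LETTER"], row["COLOR"])
        writer.writerow(row)
    return out.getvalue()


customized = customize_count_tsv(
    SAMPLE_COUNT_TSV, drop_letters={"S"}, color_overrides={"J": "#FF0000"}
)
print(customized)
```

The returned text can be written to a new file and supplied to the plotting scripts in place of the original count file.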
