
CPC2: A fast and accurate coding potential calculator based on sequence intrinsic features. This package is maintained by Pranjal Pruthi, BioinformaticsOnLine organization.

Project description

CPC2 standalone

  • 2019-11-23 15:30, Yang Ding
    • CPC2 now supports both Python 2 and Python 3 (thanks to HyperOdin for their help)

1 Prerequisites

a. Biopython package: a local copy can be downloaded from http://biopython.org/wiki/Download
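Before installing, it can help to confirm that Biopython is importable by the same Python interpreter you will use to run CPC2. A minimal sketch using only the standard library. `Bio` is Biopython's import name, and the helper name `biopython_available` is hypothetical.

```python
import importlib.util


def biopython_available() -> bool:
    """Return True if Biopython (import name "Bio") can be found by this interpreter."""
    return importlib.util.find_spec("Bio") is not None


if __name__ == "__main__":
    if biopython_available():
        import Bio  # imported only after we know the package exists
        print("Biopython", Bio.__version__, "found")
    else:
        print("Biopython not found; install it before running CPC2")
```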

2 Install

a. Unpack the tarball:

tom@linux$ gzip -dc CPC2-beta.tar.gz | tar xf -

b. Build the bundled third-party package (libsvm):

tom@linux$ cd CPC2-beta
tom@linux$ export CPC_HOME="$PWD"
tom@linux$ cd libs/libsvm
tom@linux$ gzip -dc libsvm-3.18.tar.gz | tar xf -
tom@linux$ cd libsvm-3.18
tom@linux$ make clean && make

3 Run the prediction

tom@linux$ cd $CPC_HOME
tom@linux$ bin/CPC2.py -i (input_seq) -o (result_in_table)

example: tom@linux$ bin/CPC2.py -i data/example.fa -o example_output
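To run many FASTA files in one go, the command above can be wrapped from Python. This is a sketch, not part of CPC2: `build_cpc2_command` and `run_cpc2` are hypothetical helpers, and it assumes that bin/CPC2.py accepts the same `-r` and `--ORF` flags that the `cpc2` command accepts in the examples further down. CPC_HOME is read from the environment, as set during installation.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_cpc2_command(input_fasta, output_prefix, reverse=False, orf=False,
                       cpc_home=None):
    """Assemble a bin/CPC2.py command line (hypothetical helper)."""
    home = Path(cpc_home or os.environ.get("CPC_HOME", "."))
    cmd = [sys.executable, str(home / "bin" / "CPC2.py"),
           "-i", str(input_fasta), "-o", str(output_prefix)]
    if reverse:
        cmd.append("-r")     # assumed to behave like `cpc2 -r` (reverse strand)
    if orf:
        cmd.append("--ORF")  # adds the ORF_Start column to the output
    return cmd


def run_cpc2(input_fasta, output_prefix, **options):
    """Run CPC2; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(build_cpc2_command(input_fasta, output_prefix, **options),
                   check=True)


if __name__ == "__main__":
    # Batch example: one output prefix per FASTA file given on the command line.
    for fasta in sys.argv[1:]:
        run_cpc2(fasta, Path(fasta).stem + "_cpc2", orf=True)
```

Using `check=True` makes a failed run stop the batch immediately instead of silently leaving a missing result file.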

4 Output format

The result is written as a table in tab-delimited plain text.

Default output:
#ID transcript_length peptide_length Fickett_score pI ORF_integrity coding_probability label

Set '--ORF' to also output the start position of the longest ORF:
#ID transcript_length peptide_length Fickett_score pI ORF_integrity ORF_Start coding_probability label
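Because columns are named in the header line, the result file can be read by column name, which handles both layouts above. A minimal sketch with hypothetical helper names; it assumes the first non-empty line is the tab-delimited header starting with '#ID'.

```python
from collections import Counter
from typing import Dict, Iterable, List


def read_cpc2_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CPC2's tab-delimited output into one dict per transcript.

    The first non-empty line is taken as the header; its leading '#' is
    stripped so the first column is keyed as "ID". Columns are looked up
    by name, so the default and '--ORF' layouts are both handled.
    """
    header = None
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if header is None:
            header = [fields[0].lstrip("#")] + fields[1:]
        else:
            rows.append(dict(zip(header, fields)))
    return rows


def label_counts(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Count transcripts per value of the 'label' column."""
    return dict(Counter(row["label"] for row in rows))
```

For example, `with open(path) as fh: rows = read_cpc2_table(fh)`, where `path` is the result file CPC2 wrote for your `-o` argument; values are kept as strings, so convert with `float(row["coding_probability"])` as needed.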

Contact

See the website for a tutorial and more details (http://cpc2.cbi.pku.edu.cn).

This is a beta version of CPC2; if you have any questions, please report them to us.

Contact: cpc@mail.cbi.pku.edu.cn

About CPC2

Here are some example commands:

  • To run a basic test: cpc2 -i data/example.fa -o test_output
  • To check the reverse strand: cpc2 -i data/example.fa -o test_output -r
  • To output the longest ORF: cpc2 -i data/example.fa -o test_output --ORF
  • To get help: cpc2 --help
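The `-r` option asks CPC2 to also check the reverse strand. How CPC2 scores and reports the two strands is not described here; the sketch below only shows the underlying operation, taking the reverse complement of a transcript so both orientations can be inspected. The helper names are hypothetical, U is treated as T, and IUPAC ambiguity codes other than N are passed through uncomplemented (an assumption kept for brevity).

```python
# Complement table for unambiguous bases; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA/RNA sequence, returned as DNA (U read as T)."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def both_strands(seq: str):
    """Return (forward, reverse) strand sequences for one transcript."""
    forward = seq.upper().replace("U", "T")
    return forward, reverse_complement(seq)
```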

Coding Potential Calculator distinguishes protein-coding from non-coding RNAs based on the sequence features of the input transcripts. CPC2 is an updated version of CPC1, designed to be faster and more accurate in discriminating coding and non-coding transcripts.
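The Fickett TESTCODE score that CPC2 reports is one of these intrinsic sequence features. In Fickett's method each base contributes a position parameter (how unevenly it is spread across the three codon positions) and a content parameter (its overall frequency); both are then mapped through published probability and weight tables and combined into the score. The sketch below computes only the two raw parameters. The lookup tables are deliberately omitted, so it is not a replacement for CPC2's Fickett_score column, and the function name is hypothetical.

```python
from typing import Dict, Tuple

BASES = "ACGT"


def fickett_raw_parameters(seq: str) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return Fickett's (position, content) parameters for each base.

    position[b] = max(n1, n2, n3) / (min(n1, n2, n3) + 1), where n1..n3 count
    base b at codon positions 1, 2 and 3 (counted from the first base).
    content[b]  = fraction of the sequence made up of base b.
    """
    seq = seq.upper().replace("U", "T")
    counts = {b: [0, 0, 0] for b in BASES}
    for i, base in enumerate(seq):
        if base in counts:  # N and other ambiguity codes are not counted
            counts[base][i % 3] += 1
    position = {b: max(c) / (min(c) + 1.0) for b, c in counts.items()}
    n = len(seq)
    content = {b: (sum(c) / n if n else 0.0) for b, c in counts.items()}
    return position, content
```

A strongly periodic sequence such as "ATGATGATG" puts every A at codon position 1, giving A a high position parameter (3.0), which is the kind of three-base periodicity that coding regions tend to show.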

Input Requirements

CPC2 accepts transcripts either as sequences in FASTA format or as genomic coordinates in GTF/GFF/BED format.

FASTA format:

  • Size: fewer than 100,000 lines in the online input box; no line limit in batch mode. Maximum upload file size: 50 MB.
  • Name: Each sequence header must begin with ‘>’. Any characters after the first space in the header are discarded.
  • Sequence: Only characters found in DNA and RNA sequences are allowed.
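The FASTA rules above can be checked locally before uploading. A minimal reader sketch with a hypothetical name; it assumes that "characters found in DNA and RNA sequences" means the IUPAC nucleotide codes plus U, which may differ from the exact alphabet the server accepts.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumption: IUPAC nucleotide codes plus U, in either case.
_ALLOWED = re.compile(r"[ACGTURYSWKMBDHVN]*", re.IGNORECASE)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs, keeping only the header text before the first space."""
    seq_id, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            if not words:
                raise ValueError("empty sequence name after '>'")
            seq_id, chunks = words[0], []
        elif seq_id is None:
            raise ValueError("sequence data found before the first '>' header")
        elif not _ALLOWED.fullmatch(line):
            raise ValueError(f"unexpected characters in sequence {seq_id!r}")
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)
```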

GTF/GFF/BED format:

  • Supported formats: BED6, BED12, GTF, and GFF.
  • Size: fewer than 50,000 lines. Maximum upload file size: 50 MB.
  • Supported genomes for GTF/GFF/BED: Human (hg38, hg19), Chimpanzee (panTro4), Mouse (mm10), Rat (rn6), Zebrafish (danRer7), Xenopus (xenTro3), Fruit fly (dm6).
  • Note: BED input may slow down processing.
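GTF/GFF/BED input carries coordinates rather than sequence, which is presumably why a supported genome must be selected. To illustrate what a BED12 record encodes, this sketch converts one line into 0-based, half-open exon intervals following the standard UCSC BED12 layout, where blockStarts are offsets from chromStart. The helper name is hypothetical; extracting and joining the genomic sequence for these intervals (and reverse-complementing '-' strand transcripts) is left out, and how the CPC2 server does this internally is not described in this document.

```python
from typing import List, Tuple


def bed12_exons(line: str) -> Tuple[str, str, str, List[Tuple[int, int]]]:
    """Return (chrom, name, strand, exons) for one tab-separated BED12 line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 BED columns")
    chrom, chrom_start, name, strand = fields[0], int(fields[1]), fields[3], fields[5]
    block_count = int(fields[9])
    # blockSizes/blockStarts are comma-separated and often end with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    exons = [(chrom_start + s, chrom_start + s + z) for s, z in zip(starts, sizes)]
    return chrom, name, strand, exons
```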

Features

  • Speed and Accuracy: CPC2 employs a novel discriminative model based on sequence intrinsic features, making it significantly faster than CPC1 and other popular tools, while also offering superior accuracy.
  • Species-Neutral: The model used in CPC2 is species-neutral, making it suitable for analyzing transcriptomes from a wide range of organisms, including non-model organisms.
  • Output: Results include the sequence ID, coding/noncoding classification, coding probability, putative peptide length, Fickett TESTCODE score, putative isoelectric point (pI), and ORF integrity.
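Peptide length, ORF integrity and (with `--ORF`) ORF start all derive from the longest open reading frame of a transcript. The sketch below shows one common way to find it on the forward strand; it is an illustration, not CPC2's own code. Assumptions: ATG is the only start codon, TAA/TAG/TGA are the stops, the start is reported 0-based (CPC2's ORF_Start convention may differ), and "integrity" is modelled here as a boolean that is True when an in-frame stop codon closes the ORF.

```python
from typing import NamedTuple, Optional

STOPS = {"TAA", "TAG", "TGA"}


class Orf(NamedTuple):
    start: int           # 0-based offset of the ATG in the transcript
    peptide_length: int  # amino acids, excluding the stop codon
    complete: bool       # True if an in-frame stop codon closes the ORF


def longest_orf(seq: str) -> Optional[Orf]:
    """Longest ATG-initiated ORF over the three forward frames, or None."""
    seq = seq.upper().replace("U", "T")
    best = None

    def keep(candidate: Orf) -> None:
        nonlocal best
        if best is None or candidate.peptide_length > best.peptide_length:
            best = candidate

    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                keep(Orf(start, (i - start) // 3, True))
                start = None
        if start is not None:
            # Reached the end of the transcript without a stop codon.
            keep(Orf(start, (len(seq) - start) // 3, False))
    return best
```

Only the first ATG after each stop opens an ORF, so the result is the longest stop-to-stop stretch that begins with ATG in each frame.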

For more detailed information on the web server, input/output formats, and additional features like BLAST integration, please refer to the original CPC2 documentation and publication.

Maintained for PyPI by:

Pranjal Pruthi, Project Scientist, BioinformaticsOnLine organization. Email: mail@pranjal.work

Original Publication:

Kang Y. J., Yang D. C., Kong L., Hou M., Meng Y. Q., Wei L., Gao G. 2017. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Research 45(Web Server issue): W12–W16.



Download files


Source Distribution

cpc2_standalone-1.0.8.tar.gz (1.4 MB)

Uploaded Source

Built Distribution


cpc2_standalone-1.0.8-py3-none-any.whl (831.3 kB)

Uploaded Python 3

File details

Details for the file cpc2_standalone-1.0.8.tar.gz.

File metadata

  • Download URL: cpc2_standalone-1.0.8.tar.gz
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for cpc2_standalone-1.0.8.tar.gz
Algorithm Hash digest
SHA256 04f8b127ee5de0d853313d8b04ff438e4b3cb9c262b8919d00e8ab3c20a1be06
MD5 daecb902348df9393687cb33953a7993
BLAKE2b-256 05496b40720b548ee05c6ef5e1ff99df59ae61c02303330528cff7e17e8c1167
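A downloaded file can be checked against the SHA256 digest listed above. A minimal sketch using Python's standard `hashlib`; the helper names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("cpc2_standalone-1.0.8.tar.gz", "04f8b127ee5de0d853313d8b04ff438e4b3cb9c262b8919d00e8ab3c20a1be06")` should return True for an intact download. pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file) performs the same check at install time.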


File details

Details for the file cpc2_standalone-1.0.8-py3-none-any.whl.


File hashes

Hashes for cpc2_standalone-1.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 1553a55043979e4f8649a42fc1e7a04babad30d1285f2d5293b065cdb5e1c90d
MD5 3638c4bc8c9c9f13150a4dec0afbb4fa
BLAKE2b-256 8ae121ecd4098c916de7c661f638e6e41ece70a16470c0e21618ba0d86294e6b

