CIRIquant

circular RNA quantification pipeline

These details have not been verified by PyPI

Project links

Homepage

Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python

Project description

CIRIquant

GitHub release (latest by date) GitHub All Releases SourceForge

CIRIquant is a comprehensive analysis pipeline for circRNA detection and quantification in RNA-Seq data

Author

Authors: Jinyang Zhang(zhangjinyang@biols.ac.cn), Fangqing Zhao(zhfq@biols.ac.cn)

Maintainer: Jinyang Zhang

Release Notes

Version 1.0.1: Fixed bug, sync version number with pypi.org
Version 1.0: The first released version of CIRIquant

License

The code is released under the MIT License. See the LICENSE file for more detail.

Citing CIRIquant

Zhang, J., Chen, S., Yang, J. et al. Accurate quantification of circular RNAs identifies extensive circular isoform switching events. Nat Commun 11, 90 (2020) doi:10.1038/s41467-019-13840-9

Prerequisites

Softwares:
    bwa
    hisat2
    stringtie
    samtools>=1.9

Python packages:
    PyYAML
    argparse
    pysam
    numpy
    scipy
    scikit-learn

NOTES:

1. Only python2 is supported

2. Samtools version should be higher than 1.9, as older version of samtools may use deprecated parameters in sort and index commands

1. Installation

Please use the latest released version from GitHub or SourceForge

Use the setup.py for CIRIquant installation (clean install under virutalenv is highly recommended).

# create and activate virtual env
pip install virtualenv
virtualenv venv
source ./venv/bin/activate

# Install CIRIquant and its requirement automatically
tar zxvf CIRIquant.tar.gz
cd CIRIquant
python setup.py install

# Manual installation of required pacakges is also supported
pip install -r requirements.txt

The package should take approximately 40 seconds to install on a normal computer.

Update: installation using pip is now available!

pip install CIRIquant

2. Running CIRIquant For circRNA quantifcation

Usage:
  CIRIquant [options] --config <config> -1 <m1> -2 <m2>

  <config>          Config file
  <m1>              Input mate1 reads (for paired-end data)
  <m2>              Input mate2 reads (for paired-end data)


Options (defaults in parentheses):

  -v                Run in verbose mode
  -o, -out          Output directory (default: current directory)
  -e, --log         Specific log file (default: sample_prefix.log)
  -p, --prefix      Output sample prefix (default: input sample name)
  -t, --threads     Number of CPU threads to use (defualt: 4)

  --bed             User provided Back-Spliced Junction Site in BED format
  --circ            User provided circRNA prediction results
  --tool            User provided tool name for circRNA prediction

  --RNaseR          CIRIquant output file of RNase R data (required for RNase R correction)
  --bam             Specific hisat2 alignment bam file against reference genome
  --no-gene         Skip StringTie estimation of gene abundance

A YAML-formated config file is needed for CIRIquant to find software and reference needed. A valid example of config file is demonstrated below.

// Example of config file
name: hg19
tools:
  bwa: /home/zhangjy/bin/bwa
  hisat2: /home/zhangjy/bin/hisat2
  stringtie: /home/zhangjy/bin/stringtie
  samtools: /home/zhangjy/bin/samtools

reference:
  fasta: /home/zhangjy/Data/database/hg19.fa
  gtf: /home/zhangjy/Data/database/gencode.v19.annotation.gtf
  bwa_index: /home/zhangjy/Data/database/hg19/_BWAtmp/hg19
  hisat_index: /home/zhangjy/Data/database/hg19/_HISATtmp/hg19

Key	Description
name	the name of config file
bwa	the path of `bwa`
hisat2	the path of `hisat2`
stringtie	the path of `stringite`
samtools	the path of `samtools`, samtools version below 1.3.1 is not supported
fasta	reference genome fasta, a fai index by `samtools faidx` is also needed under the same directory
gtf	annotation file of reference genome
bwa_index	prefix of BWA index for reference genome
hisat_index	prefix of HISAT2 index for reference genome

For quantification of user-provided circRNAs, a list of junction sites in bed format is required, for example:

chr1    10000   10099   chr1:10000|10099    .   +
chr1    31000   31200   chr1:31000|31200    .   -

NOTE:

For now, --circ and --tool options can only parse CIRI2 results.
Gene expression values are needed for normalization, do not use --no-gene if you need to run DE analysis afterwards.

3. Output files

The main output of CIRIquant is a GTF file, that contains detailed information of BSJ and FSJ reads of circRNAs and annotation of circRNA back-spliced regions in the attribute columns

Description of each columns's value

column	name	description
1	chrom	chromosome / contig name
2	source	CIRIquant
3	type	circRNA
4	start	5' back-spliced junction site
5	end	3' back-spliced junction site
6	score	CPM of circRNAs (#BSJ / #Mapped reads)
7	strand	strand information
8	.	.
9	attributes	attributes seperated by semicolon

The attributes containing several pre-defined keys and values:

key	description
circ_id	name of circRNA
circ_type	circRNA types: exon / intron / intergenic
bsj	number of bsj reads
fsj	number of fsj reads
junc_ratio	circular to linear ratio: 2 * bsj / ( 2 * bsj + fsj)
rnaser_bsj	number of bsj reads in RNase R data (only when --RNaseR is specificed)
rnaser_fsj	number of fsj reads in RNase R data (only when --RNaseR is specificed)
gene_id	ensemble id of host gene
gene_name	HGNC symbol of host gene
gene_type	type of host gene in gtf file

4. Example Usage

Test data set can be retrived from test_data.tar.gz, you can replace the path of required software in the chr1.yml with your own version

tar zxvf test_data.tar.gz
cd test_data/quant
CIRIquant -t 4 \
          -1 ./test_1.fq.gz \
          -2 ./test_2.fq.gz \
          --config ./chr1.yml \
          --no-gene \
          -o ./test \
          -p test

The output file test.gtf should be located under test_data/quant/test

The demo dataset should take approximately 5 minutes on a personal computer. It has been tested on my PC with Intel i7-8700 processor and 16G of memory, running Ubuntu 18.04 LTS.

5. Generate RNase R effect corrected BSJ information

In order to remove effect for RNase R treatment, two steps of programs are needed

Run CIRIquant with RNase R treated sample
Use output gtf file in Step1 and run CIRIquant with --RNaseR option using output gtf in previous step

The output is in the same format as normal run, however the header line is appended with additional information of RNase R treatment

6. Run differential expression analysis for circRNAs

Study without biological replicate

For sample without replicate, the differential expression & differential splicing analysis is performed using CIRI_DE

Usage:
  CIRI_DE [options] -n <control> -c <case> -o <out>

  <control>         CIRIquant result of control sample
  <case>            CIRIquant result of treatment cases
  <out>             Output file

Options (defaults in parentheses):

  -p                p value threshold for DE and DS score calculation (default: 0.05)
  -t                numer of threads (default: 4)

Example usage:
  CIRI_DE -n control.gtf -c case.gtf -o CIRI_DE.csv

The output format CIRI_DE is in the format below:

column	name	description
1	circRNA_ID	circRNA identifier
2	Case_BSJ	number of BSJ reads in case
3	Case_FSJ	number of FSJ reads in case
4	Case_Ratio	junction ratio in case
5	Ctrl_BSJ	number of BSJ reads in control
6	Ctrl_FSJ	number of FSJ reads in control
7	Ctrl_Ratio	junction ratio in control
8	DE_score	differential expression score
9	DS_score	differential splicing score

Study with biological replicates

For study with biological replicates, a customed analysis pipeline of edgeR is recommended and we provide prep_CIRIquant to generate matrix of circRNA expression level / junction ratio and CIRI_DE_replicate for DE analysis

Step1: Prepare CIRIquant output files

One should provide a text file listing sample information and path to CIRIquant output GTF files

CONTROL1 ./c1/c1.gtf C 1
CONTROL2 ./c2/c2.gtf C 2
CONTROL3 ./c3/c3.gtf C 3
CASE1 ./t1/t1.gtf T 1
CASE2 ./t2/t2.gtf T 2
CASE3 ./t3/t3.gtf T 3

The first three columns is required by default. For paired samples, you could also add a column of subject name.

column	description
1	sample name
2	path to CIRIquant output gtf
3	group ("C" for control, "T" for treatment)
4	subject (optional, only for paired samples)

Then, run prep_CIRIquant to summarize the circRNA expression profile in all samples

Usage:
  prep_CIRIquant [options]

  -i                the file of sample list
  --lib             where to output library information
  --circ            where to output circRNA annotation information
  --bsj             where to output the circRNA expression matrix
  --ratio           where to output the circRNA junction ratio matrix

Example:
  prep_CIRIquant -i sample.lst \
                 --lib library_info.csv \
                 --circ circRNA_info.csv \
                 --bsj circRNA_bsj.csv \
                 --ratio circRNA_ratio.csv

These count matrices (CSV files) can then be imported into R for use by DESeq2 and edgeR (using the DESeqDataSetFromMatrix and DGEList functions, respectively).

Step2: Prepare StringTie output

The output of StringTie should locate under output_dir/gene/prefix_out.gtf. You need to use prepDE.py from stringTie to generate the gene count matrix for normalization.

For example, one can provide a text file sample_gene.lst containing sample IDs and path to StringTie outputs:

CONTROL1 ./c1/gene/c1_out.gtf
CONTROL2 ./c2/gene/c2_out.gtf
CONTROL3 ./c3/gene/c3_out.gtf
CASE1 ./t1/gene/t1_out.gtf
CASE2 ./t2/gene/t2_out.gtf
CASE3 ./t3/gene/t3_out.gtf

Then, run prepDE.py -i sample_gene.lst and use gene_count_matrix.csv generated under current working directory for further analysis.

Step3: Differential expression analysis

For differential analysis using CIRI_DE_replicate, you need to install a R environment and edgeR package from Bioconductor.

Usage:
  CIRI_DE_replicate [options]

  --lib             library information by CIRIquant
  --bsj             circRNA expression matrix
  --gene            gene expression matrix
  --out             output differential expression result

Example:
  CIRI_DE_replicate --lib  library_info.csv \
            --bsj  circRNA_bsj.csv \
            --gene gene_count_matrix.csv \
            --out  circRNA_de.csv

Please be noted that the output results is unfiltered, and you could apply a more stringent filter on expression values to get a more convincing result.

Project details

These details have not been verified by PyPI

Project links

Homepage

Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python

Release history Release notifications | RSS feed

1.1.3

Nov 8, 2023

1.1.2

Mar 9, 2021

1.1.1

Jul 15, 2020

1.1

Apr 16, 2020

1.0.2

Apr 7, 2020

This version

1.0.1

Mar 4, 2020

1.0

Mar 4, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

CIRIquant-1.0.1.tar.gz (41.7 kB view details)

Uploaded Mar 4, 2020 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

CIRIquant-1.0.1-py2-none-any.whl (46.6 kB view details)

Uploaded Mar 4, 2020 Python 2

File details

Details for the file CIRIquant-1.0.1.tar.gz.

File metadata

Download URL: CIRIquant-1.0.1.tar.gz
Upload date: Mar 4, 2020
Size: 41.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/2.7.5

File hashes

Hashes for CIRIquant-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`ff22fb20f05fa02fbbe3e3dc7a5373ba3fcf8ec259b8aa70911331ba44996b1f`
MD5	`a48e77d88b447a21aedffe9c3c2c5ed3`
BLAKE2b-256	`572010c1d01368e92767990adb1b3df40fc3230d4d913f1a7a35eb21c23918a5`

See more details on using hashes here.

File details

Details for the file CIRIquant-1.0.1-py2-none-any.whl.

File metadata

Download URL: CIRIquant-1.0.1-py2-none-any.whl
Upload date: Mar 4, 2020
Size: 46.6 kB
Tags: Python 2
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/2.7.5

File hashes

Hashes for CIRIquant-1.0.1-py2-none-any.whl
Algorithm	Hash digest
SHA256	`b0cdb8c373a82201183ae639c050aa793109fb76aa7bab64cf2bf5b8aca7a59d`
MD5	`e66d7bf14a996ddb9bbc8359be508fe2`
BLAKE2b-256	`4f517c2b69f294bef6ba21cddbd257adff4ff01fea07c63574fa407455e27a3a`

See more details on using hashes here.

CIRIquant 1.0.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

CIRIquant

Author

Release Notes

License

Citing CIRIquant

Prerequisites

1. Installation

2. Running CIRIquant For circRNA quantifcation

3. Output files

4. Example Usage

5. Generate RNase R effect corrected BSJ information

6. Run differential expression analysis for circRNAs

Study without biological replicate

Study with biological replicates

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes