A toolset for investigating the interactions between circRNA - miRNA - mRNA.

CircMiMi

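Requirements

Python 3 and the external tools bedtools, miRanda, and BLAT. The installation commands for all of them are given in the Installation section.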

Installation

The recommended way to install CircMiMi is via conda (https://docs.conda.io/en/latest/), a package and environment management system.

You may install circmimi by the following steps:

$ conda create -n circmimi python=3
$ conda activate circmimi
$ pip install circmimi

The external tools can also be installed via conda from the bioconda channel (https://bioconda.github.io/):

$ conda install -c bioconda bedtools miranda blat

To test the installation, run the following command:

$ circmimi_tools --help

It should print the help message.

Quick Start

  1. Generate the references
$ circmimi_tools genref --species hsa --source ensembl --version 98 ./refs
  2. Run the main pipeline of CircMiMi
$ circmimi_tools run -r ./refs -i circRNAs.tsv -o ./out/ -p 5 --checkAA --miranda-sc 175
  3. Create the network file for Cytoscape
$ circmimi_tools network create ./out/out.tsv ./out/out.xgmml
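
The three Quick Start steps can also be driven from a script. The following is a minimal sketch, not part of CircMiMi itself: the function name `quick_start_commands` and its keyword arguments are hypothetical, and only the command-line flags shown in the steps above are used. It builds the commands as argument lists and prints them; running them is left to the caller.

```python
import shlex


def quick_start_commands(ref_dir="./refs", circ_file="circRNAs.tsv",
                         out_prefix="./out/", species="hsa",
                         source="ensembl", version="98",
                         num_proc=5, miranda_sc=175):
    """Return the three Quick Start commands as argument lists.

    Hypothetical helper: it only assembles the flags shown in the
    Quick Start section and does not call CircMiMi itself.
    """
    genref = ["circmimi_tools", "genref", "--species", species,
              "--source", source, "--version", str(version), ref_dir]
    run = ["circmimi_tools", "run", "-r", ref_dir, "-i", circ_file,
           "-o", out_prefix, "-p", str(num_proc), "--checkAA",
           "--miranda-sc", str(miranda_sc)]
    # File names follow the Quick Start example (out.tsv / out.xgmml).
    network = ["circmimi_tools", "network", "create",
               out_prefix + "out.tsv", out_prefix + "out.xgmml"]
    return [genref, run, network]


if __name__ == "__main__":
    for cmd in quick_start_commands():
        # Print only; to execute, pass cmd to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Keeping the commands as lists avoids shell quoting problems if a path contains spaces.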

Usage

Generate the references

circmimi_tools genref --species SPECIES --source SOURCE [--version RELEASE_VER] REF_DIR

Parameters

Parameter Description
--species SPECIES Assign the species for references. Use the species code for SPECIES. [required]
--source SOURCE Available values for SOURCE: "ensembl", "ensembl_plants", "ensembl_metazoa", "gencode". [required]
--version RELEASE_VER The release version of the SOURCE. For example, "98" for ("hsa", "ensembl") or "M24" for ("mmu", "gencode"). If no version is given, the latest one is used.

Available species and sources

Code Name E G EP EM MB MTB MDB
ath Arabidopsis thaliana V V V
bmo Bombyx mori V V V
bta Bos taurus V V V
cel Caenorhabditis elegans V V V V
cfa Canis familiaris V V V V
dre Danio rerio V V V
dme Drosophila melanogaster V V V
gga Gallus gallus V V V V
hsa Homo sapiens V V V V V
mmu Mus musculus V V V V V
osa Oryza sativa V V V
ola Oryzias latipes V V V
oar Ovis aries V V V
rno Rattus norvegicus V V V V
ssc Sus scrofa V V V
tgu Taeniopygia guttata V V V
xtr Xenopus tropicalis V V V
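E, G, EP, EM: Gene annotation (Ensembl, GENCODE, Ensembl Plants, Ensembl Metazoa)
MB: Database for miRNAs (miRBase)
MTB, MDB: Databases for miRNA-mRNA interactions (miRTarBase, miRDB)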

Run the main pipeline

circmimi_tools run -r REF_DIR -i CIRC_FILE [-o OUT_PREFIX] [-p NUM_PROC] [--checkAA] \
[--miranda-sc SCORE] [--miranda-en ENERGY] [--miranda-scale SCALE] [--miranda-strict] [--miranda-go X] [--miranda-ge Y]

Parameters

Parameter Description
-r, --ref REF_DIR The directory of the pre-generated reference files. [required]
-i, --circ CIRC_FILE The file of circRNAs. [required]
-o, --out-prefix OUT_PREFIX Assign the prefix for the output filenames. (default: "./out/")
-p, --num_proc NUM_PROC Assign the number of processes.
--checkAA Check the circRNAs for ambiguous alignments.

The following miRanda parameters are also available (see the miRanda manual).

Parameter Description
--miranda-sc SCORE Set the alignment score threshold to SCORE. Only alignments with scores >= SCORE will be used for further analysis. (default: 140.0)
--miranda-en ENERGY Set the energy threshold to ENERGY. Only alignments with energies <= ENERGY will be used for further analysis. A negative value is required for filtering to occur. (default: 1.0)
--miranda-scale SCALE Set the scaling parameter to SCALE. This scaling is applied to match / mismatch scores in the critical 7bp region near the 5' end of the microRNA. Many known examples of miRNA:Target duplexes are highly complementary in this region. This parameter can be thought of as a contrast function to more effectively detect alignments of this type. (default: 4.0)
--miranda-strict Require strict alignment in the seed region (offset positions 2-8). This option prevents the detection of target sites that contain gaps or non-canonical base pairing in this region.
--miranda-go X Set the gap-opening penalty to X for alignments. This value must be negative. (default: -4.0)
--miranda-ge Y Set the gap-extend penalty to Y for alignments. This value must be negative. (default: -9.0)
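
The two threshold options can be read as a simple filter on each alignment. The sketch below only illustrates the semantics described in the table (score >= SCORE, energy <= ENERGY, with energy filtering active only for a negative ENERGY). It is not miRanda's or CircMiMi's implementation, and the function name `passes_thresholds` is hypothetical.

```python
def passes_thresholds(score, energy, sc=140.0, en=1.0):
    """Illustrate the documented --miranda-sc / --miranda-en semantics.

    score, energy: values reported for one alignment.
    sc, en: thresholds, defaulting to the documented defaults.
    """
    # Only alignments with score >= SCORE are kept.
    if score < sc:
        return False
    # Energy filtering takes effect only when the threshold is negative;
    # with the default of 1.0 no alignment is rejected on energy.
    if en < 0 and energy > en:
        return False
    return True
```

For example, with `--miranda-sc 175 --miranda-en -20`, an alignment scoring 180 with an energy of -10 would be dropped under this reading, while one with an energy of -25 would be kept.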

Input file

The input file (CIRC_FILE) is a TAB-separated file with the following columns:

# Column Description
1 chr Chromosome name
2 pos1 One position of the circRNA junction site
3 pos2 The other position of the circRNA junction site
4 strand + / -

Note.

  • The chromosome name must be the same as the name in the SOURCE.
    • For example, "1" for "ensembl", and "chr1" for "gencode".
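
A CIRC_FILE can be produced with a few lines of Python. This is a minimal sketch under two assumptions that the text above does not confirm: that the file carries no header line, and that positions are plain integers. The helper name `write_circ_file` is hypothetical.

```python
import csv


def write_circ_file(path, junctions):
    """Write (chr, pos1, pos2, strand) tuples as a TAB-separated CIRC_FILE.

    Assumption: no header line is written.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chrom, pos1, pos2, strand in junctions:
            # The strand column only accepts '+' or '-'.
            if strand not in ("+", "-"):
                raise ValueError(f"strand must be '+' or '-', got {strand!r}")
            writer.writerow([chrom, int(pos1), int(pos2), strand])
```

A call such as `write_circ_file("circRNAs.tsv", [("1", 1000, 2000, "+")])` uses made-up coordinates and an Ensembl-style chromosome name ("1"); with a "gencode" reference the name would be "chr1" instead.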

Output files

The main pipeline of CircMiMi outputs two main files: "summary_list.tsv" and "all_interactions.tsv".

summary_list.tsv

The summary list contains the counts of interactions and some checking results of the circRNAs.

# Column Description
1 chr Chromosome name
2 pos1 One position of the circRNA junction site
3 pos2 The other position of the circRNA junction site
4 strand + / -
5 pass 'yes' if the circRNA passes all of the checks (columns 9 to 13). Otherwise 'no'.
6 #circRNA_miRNA Count for the circRNA-miRNA interactions.
7 #circRNA_mRNA Count for the circRNA-mRNA interactions.
8 #circRNA_miRNA_mRNA Count for the circRNA-miRNA-mRNA interactions.
9 donor site not at the annotated boundary '1' if the donor site of the circRNA is NOT at the annotated exon boundary. Otherwise '0'.
10 acceptor site not at the annotated boundary '1' if the acceptor site of the circRNA is NOT at the annotated exon boundary. Otherwise '0'.
11 donor/acceptor sites not at the same transcript isoform '1' if the donor and acceptor are not at the same annotated transcript isoform. Otherwise '0'.
12 ambiguity with an co-linear explanation '1' if the merged flanking sequence of the circRNA junction sites has a co-linear explanation. Otherwise '0'.
13 ambiguity with multiple hits '1' if the merged flanking sequence of the circRNA junction sites has multiple hits. Otherwise '0'.
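
A common next step is to keep only the circRNAs that passed every check. The sketch below reads the columns by position as listed in the table above. Whether the file starts with a header line is an assumption; a header row would simply fail the 'yes' test and be skipped. The function name `passing_circrnas` is hypothetical.

```python
import csv


def passing_circrnas(summary_path):
    """Yield (chr, pos1, pos2, strand) for rows whose 'pass' column is 'yes'."""
    with open(summary_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Column 5 (index 4) is 'pass'. Header rows and short lines
            # do not match and are skipped.
            if len(row) >= 5 and row[4] == "yes":
                yield tuple(row[:4])
```

All values are returned as strings, exactly as they appear in the file.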

all_interactions.tsv

# Column Description
1 chr Chromosome name
2 pos1 One position of the circRNA junction site
3 pos2 The other position of the circRNA junction site
4 strand + / -
5 host_gene Host gene of the circRNA
6 mirna The miRNA that may bind to the circRNA
7 max_score The maximum binding score reported by miRanda
8 count The number of the miRNA-binding sites on the circRNA
9 cross_boundary Whether there is a binding site across the junction of the circRNA
10 target_gene The miRNA-targeted gene
11 miRTarBase Whether the miRNA-mRNA interaction is reported in miRTarBase
12 miRDB Whether the miRNA-mRNA interaction is reported in miRDB
13 miRTarBase__ref_count The number of references reporting the interaction
14 miRDB__targeting_score The predicted target score from miRDB
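
The interaction table can be regrouped into circRNA-miRNA-mRNA axes with the standard library. The following is a sketch under stated assumptions: columns are read by position as listed above, and a header line, if one exists, is assumed to start with the literal field "chr". The function name `mirna_targets_by_circrna` is hypothetical.

```python
import csv
from collections import defaultdict


def mirna_targets_by_circrna(interactions_path):
    """Map each circRNA (chr, pos1, pos2, strand) to {miRNA: set of target genes}."""
    axes = defaultdict(lambda: defaultdict(set))
    with open(interactions_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip short lines and an assumed header whose first field is "chr".
            if len(row) < 10 or row[0] == "chr":
                continue
            circ = tuple(row[:4])
            mirna, target = row[5], row[9]   # columns 6 and 10
            targets = axes[circ][mirna]      # registers the miRNA even without a target
            if target:
                targets.add(target)
    return axes
```

The result can be inspected with, for example, `len(axes[circ])` to count the distinct miRNAs recorded for one circRNA.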

Create the network file for Cytoscape

circmimi_tools network create IN_FILE OUT_FILE

Parameters

Parameter Description
IN_FILE Input the file "all_interactions.tsv" produced from the CircMiMi main pipeline.
OUT_FILE The output filename. The file extension should be ".xgmml" or ".xml" so that Cytoscape recognizes the file as an XGMML network file.

This command generates a Cytoscape-readable file (.xgmml) for visualizing the input circRNA-miRNA-mRNA regulatory axes in Cytoscape.
