CircMiMi
A package for constructing CLIP-seq data-supported circRNA-miRNA-mRNA interactions
Table of Contents
- Requirements
- Installation
- Quick Start
- Usage
- Example
Requirements
- Python (3.6 or above)
- External tools
  - bedtools (2.29.0) (https://github.com/arq5x/bedtools2)
  - miranda (aug2010, 3.3a) (http://www.microrna.org/microrna/getDownloads.do)
  - blat (https://genome.ucsc.edu/FAQ/FAQblat.html)
  - blast (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
Installation
The recommended way is via conda, a package and environment management system (https://docs.conda.io/en/latest/).
You may install circmimi with the following steps:
$ conda create -n circmimi python=3
$ conda activate circmimi
$ pip install circmimi
The external tools can also be installed via conda with the bioconda (https://bioconda.github.io/) channel:
$ conda install -c bioconda bedtools=2.29.0 miranda blat blast
To test the installation, run the following command:
$ circmimi_tools --help
It should print the help message.
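Before running the pipeline, it can help to confirm that the external tools are on the PATH. The following is a minimal sketch, not part of circmimi. The executable names are assumptions derived from the tools listed under Requirements ("blastn" stands in for the BLAST suite), so adjust them to match your installation.

```python
import shutil

# Executable names are assumptions based on the Requirements list;
# "blastn" is used as a representative BLAST binary.
REQUIRED_TOOLS = ("bedtools", "miranda", "blat", "blastn", "circmimi_tools")


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=REQUIRED_TOOLS):
    """Return the names that could not be found on PATH, in the given order."""
    found = find_tools(names)
    return [name for name in names if found[name] is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found.")
```

Run it inside the activated conda environment so that the same PATH is inspected.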
Quick Start
- Generate the references
$ circmimi_tools genref --species hsa --source ensembl --version 100 refs/
- Check the circRNAs and do some pre-filtering (optional)
$ circmimi_tools checking -r refs/ -i circRNAs.tsv -o out/ -p 5 --dist 10000
$ cat out/checking.results.tsv | awk -F'\t' '($9==1)&&($12==0)&&($16==1)' | cut -f '-5' > out/circRNAs.filtered.tsv
- Predict the interactions between circRNA-miRNA-mRNA
$ circmimi_tools interactions -r refs/ -i out/circRNAs.filtered.tsv -o out/ -p 5 --miranda-sc 175
- Visualize the interactions by creating a Cytoscape-acceptable XGMML file (optional)
$ circmimi_tools visualize out/all_interactions.miRNA.tsv out/all_interactions.miRNA.xgmml
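The awk/cut pre-filtering step above can also be written in Python, which may be easier to adapt when the filtering criteria change. This is a sketch under the assumption that "checking.results.tsv" follows the 16-column layout described in the Usage section; the function name is hypothetical and not part of circmimi.

```python
def filter_checked_circrnas(in_path, out_path):
    """Keep rows where column 9 is 1, column 12 is 0 and column 16 is 1,
    and write only the first five columns (chr, pos1, pos2, strand, circ_id).

    Values are compared as strings, so a header row (if present) is dropped,
    as it is by the awk expression.
    """
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 16:
                continue  # skip malformed or truncated rows
            if fields[8] == "1" and fields[11] == "0" and fields[15] == "1":
                dst.write("\t".join(fields[:5]) + "\n")
                kept += 1
    return kept


if __name__ == "__main__":
    n = filter_checked_circrnas("out/checking.results.tsv",
                                "out/circRNAs.filtered.tsv")
    print(n, "circRNAs kept")
```

The three conditions select circRNAs whose donor and acceptor sites are on the same transcript isoform, that show no alignment ambiguity, and that satisfy the RCS criterion in column 16.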
Usage
Generate the references
circmimi_tools genref --species SPECIES --source SOURCE [--version RELEASE_VER] REF_DIR
Parameters
Parameter | Description |
---|---|
--species SPECIES | Assign the species for references. Use the species code for SPECIES. [required] |
--source SOURCE | Available values for SOURCE: "ensembl", "ensembl_plants", "ensembl_metazoa", "gencode". [required] |
--version RELEASE_VER | The release version of the SOURCE. For example, "98" for ("hsa", "ensembl"), "M24" for ("mmu", "gencode"). If the version is not specified, the latest one will be used. |
REF_DIR | The directory for all generated references. |
Available species and sources
Code | Name | E | G | EP | EM | MB | MTB | MDB | ECR |
---|---|---|---|---|---|---|---|---|---|
ath | Arabidopsis thaliana | V | V | V | |||||
bmo | Bombyx mori | V | V | V | |||||
bta | Bos taurus | V | V | V | |||||
cel | Caenorhabditis elegans | V | V | V | V | ||||
cfa | Canis familiaris | V | V | V | V | ||||
cgr | Cricetulus griseus | V | V | V | |||||
dre | Danio rerio | V | V | V | |||||
dme | Drosophila melanogaster | V | V | V | |||||
gga | Gallus gallus | V | V | V | V | ||||
hsa | Homo sapiens | V | V | V | V | V | V | ||
mmu | Mus musculus | V | V | V | V | V | |||
osa | Oryza sativa | V | V | V | |||||
ola | Oryzias latipes | V | V | V | |||||
oar | Ovis aries | V | V | V | |||||
rno | Rattus norvegicus | V | V | V | V | ||||
ssc | Sus scrofa | V | V | V | |||||
tgu | Taeniopygia guttata | V | V | V | |||||
xtr | Xenopus tropicalis | V | V | V |
Gene annotation
- E: Ensembl (https://www.ensembl.org/index.html)
- G: Gencode (https://www.gencodegenes.org/)
- EP: Ensembl Plants (https://plants.ensembl.org/index.html)
- EM: Ensembl Metazoa (https://metazoa.ensembl.org/index.html)
Database for miRNAs
- MB: miRBase (v22) (http://www.mirbase.org/)
Databases for miRNA-mRNA interactions
- MTB: miRTarBase (v7.0) (http://mirtarbase.mbc.nctu.edu.tw/php/index.php) (https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2019/php/index.php)
- MDB: miRDB (v6.0) (http://mirdb.org/)
Databases for miRNA-mRNA interactions and RBP-related data
- ECR: ENCORI (http://starbase.sysu.edu.cn/index.php)
(Optional) Check the circRNAs
circmimi_tools checking -r REF_DIR -i CIRC_FILE [-o OUT_PREFIX] [-p NUM_PROC] [--dist INTEGER]
Parameters
Parameter | Description |
---|---|
-r, --ref REF_DIR | The directory of the pre-generated reference files. [required] |
-i, --circ CIRC_FILE | The file of circRNAs. [required] |
-o, --out-prefix OUT_PREFIX | The prefix for the output filenames. (default: "./") |
-p, --num_proc NUM_PROC | The number of processes to use. |
-d, --dist INTEGER | The distance range for RCS checking. (default: 10000) |
Input file
The input file (CIRC_FILE) is a TAB-separated file with the following columns:
# | Column | Description |
---|---|---|
1 | chr | Chromosome name |
2 | pos1 | One of the two positions of the circRNA junction site |
3 | pos2 | The other position of the circRNA junction site |
4 | strand | + / - |
5 | circ_id | (Optional) User-specified name/id of the circRNA |
Note.
- The chromosome name must be the same as the name in the SOURCE.
- For example, "1" for "ensembl", and "chr1" for "gencode".
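Because the chromosome naming must match the SOURCE, a small helper can convert names and format rows of the input file. This is a sketch only: the function names are hypothetical, it covers just the two naming styles mentioned in the note ("ensembl" and "gencode"), and it does not handle special cases such as the mitochondrial chromosome, which is named differently in the two sources.

```python
def to_source_chrom(name, source):
    """Add or strip the "chr" prefix to match the naming style of SOURCE.

    Only "ensembl" (e.g. "1") and "gencode" (e.g. "chr1") are handled here.
    """
    bare = name[3:] if name.startswith("chr") else name
    if source == "gencode":
        return "chr" + bare
    if source == "ensembl":
        return bare
    raise ValueError("unsupported source in this sketch: %r" % (source,))


def format_circ_row(chrom, pos1, pos2, strand, circ_id=None):
    """Return one TAB-separated line (without newline) for CIRC_FILE."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-', got %r" % (strand,))
    fields = [chrom, str(int(pos1)), str(int(pos2)), strand]
    if circ_id is not None:  # the fifth column is optional
        fields.append(circ_id)
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    chrom = to_source_chrom("chr1", "ensembl")
    print(format_circ_row(chrom, 1000, 2000, "+", "my_circ_1"))
```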
Output file
checking.results.tsv
# | Column | Description |
---|---|---|
1 | chr | Chromosome name |
2 | pos1 | One of the two positions of the circRNA junction site |
3 | pos2 | The other position of the circRNA junction site |
4 | strand | + / - |
5 | circ_id | The user-specified or auto-generated name/id of the circRNA. |
6 | host_gene | The gene symbol of the host gene |
7 | donor_site_at_the_annotated_boundary | '1' if the donor site of the circRNA is at the annotated exon boundary. Otherwise '0'. |
8 | acceptor_site_at_the_annotated_boundary | '1' if the acceptor site of the circRNA is at the annotated exon boundary. Otherwise '0'. |
9 | donor_acceptor_sites_at_the_same_transcript_isoform | '1' if the donor and acceptor are at the same annotated transcript isoform. Otherwise '0'. |
10 | with an alternative co-linear explanation | '1' if the merged flanking sequence of the circRNA junction sites has an alternative co-linear explanation. Otherwise '0'. |
11 | with multiple_hits | '1' if the merged flanking sequence of the circRNA junction sites has multiple hits. Otherwise '0'. |
12 | alignment ambiguity (with an alternative co-linear explanation or multiple hits) | '1' if the merged flanking sequence of the circRNA junction sites has an alternative co-linear explanation or multiple hits. Otherwise '0'. |
13 | #RCS across flanking sequences | The number of RCS pairs that span the two flanking sequences. |
14 | #RCS within the flanking sequence (the donor side) | The number of RCS pairs that lie within the flanking sequence of the donor site. |
15 | #RCS within the flanking sequence (the acceptor side) | The number of RCS pairs that lie within the flanking sequence of the acceptor site. |
16 | #RCS_across-#RCS_within>=1 (yes: 1; no: 0) |
Predict the interactions between circRNA-miRNA-mRNA
circmimi_tools interactions -r REF_DIR -i CIRC_FILE [-o OUT_PREFIX] [-p NUM_PROC] \
[--miranda-sc SCORE] [--miranda-en ENERGY] [--miranda-scale SCALE] [--miranda-strict] [--miranda-go X] [--miranda-ge Y]
Parameters
Parameter | Description |
---|---|
-r, --ref REF_DIR | The directory of the pre-generated reference files. [required] |
-i, --circ CIRC_FILE | The file of circRNAs. [required] |
-o, --out-prefix OUT_PREFIX | The prefix for the output filenames. (default: "./") |
-p, --num_proc NUM_PROC | The number of processes. |
The miRanda parameters are also available (see the manual of miRanda).
Parameters | Description |
---|---|
--miranda-sc SCORE | Set the alignment score threshold to SCORE. Only alignments with scores >= SCORE will be used for further analysis. (default: 140.0) |
--miranda-en ENERGY | Set the energy threshold to ENERGY. Only alignments with energies <= ENERGY will be used for further analysis. A negative value is required for filtering to occur. (default: 1.0) |
--miranda-scale SCALE | Set the scaling parameter to SCALE. This scaling is applied to match / mismatch scores in the critical 7bp region near the 5' end of the microRNA. Many known examples of miRNA:Target duplexes are highly complementary in this region. This parameter can be thought of as a contrast function to more effectively detect alignments of this type. (default: 4.0) |
--miranda-strict | Require strict alignment in the seed region (offset positions 2-8). This option prevents the detection of target sites which contain gaps or non-canonical base pairing in this region. |
--miranda-go X | Set the gap-opening penalty to X for alignments. This value must be negative. (default: -4.0) |
--miranda-ge Y | Set the gap-extend penalty to Y for alignments. This value must be negative. (default: -9.0) |
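When the command is launched from a script, the options above can be assembled into an argument list and validated first. The sketch below uses only the flags listed in the synopsis and tables above; the function name and its keyword arguments are hypothetical, and the function only builds the command rather than running it.

```python
import shlex


def build_interactions_cmd(ref_dir, circ_file, out_prefix="./", num_proc=None,
                           sc=None, en=None, scale=None, strict=False,
                           go=None, ge=None):
    """Build the argument list for "circmimi_tools interactions".

    Options left as None are omitted, so the tool's own defaults apply.
    """
    cmd = ["circmimi_tools", "interactions",
           "-r", ref_dir, "-i", circ_file, "-o", out_prefix]
    if num_proc is not None:
        cmd += ["-p", str(num_proc)]
    for flag, value in (("--miranda-sc", sc),
                        ("--miranda-en", en),
                        ("--miranda-scale", scale)):
        if value is not None:
            cmd += [flag, str(value)]
    if strict:
        cmd.append("--miranda-strict")
    for flag, value in (("--miranda-go", go), ("--miranda-ge", ge)):
        if value is not None:
            if value >= 0:
                # The tables above state that gap penalties must be negative.
                raise ValueError("%s must be negative, got %r" % (flag, value))
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_interactions_cmd("refs/", "out/circRNAs.filtered.tsv",
                                 out_prefix="out/", num_proc=5, sc=175)
    # Print a shell-safe version; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building a list instead of a single string avoids shell quoting problems when paths contain spaces.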
Input file
The input file (CIRC_FILE) is a TAB-separated file with the following columns:
# | Column | Description |
---|---|---|
1 | chr | Chromosome name |
2 | pos1 | One of the two positions of the circRNA junction site |
3 | pos2 | The other position of the circRNA junction site |
4 | strand | + / - |
5 | circ_id | (Optional) User-specified name/id of the circRNA |
Note.
- The chromosome name must be the same as the name in the SOURCE.
- For example, "1" for "ensembl", and "chr1" for "gencode".
Output files
Two main output files are generated:
- "summary_list.tsv"
- "all_interactions.miRNA.tsv"
summary_list.tsv
The summary list contains the counts of interactions and some checking results of the circRNAs.
# | Column | Description |
---|---|---|
1 | chr | Chromosome name |
2 | pos1 | One of the two positions of the circRNA junction site |
3 | pos2 | The other position of the circRNA junction site |
4 | strand | + / - |
5 | circ_id | The user-specified or auto-generated name/id of the circRNA. |
6 | host_gene | The gene symbol of the host gene |
7 | #circRNA_miRNA | Count for the circRNA-miRNA interactions. |
8 | #circRNA_mRNA | Count for the miRNA-mediated circRNA-mRNA interactions. |
9 | #circRNA_miRNA_mRNA | Count for the circRNA-miRNA-mRNA interactions. |
10 | pass | 'yes' if the circRNA passes all of the checking items (columns 11 to 15). Otherwise 'no'. |
11 | donor site not at the annotated boundary | '1' if the donor site of the circRNA is NOT at the annotated exon boundary. Otherwise '0'. |
12 | acceptor site not at the annotated boundary | '1' if the acceptor site of the circRNA is NOT at the annotated exon boundary. Otherwise '0'. |
13 | donor/acceptor sites not at the same transcript isoform | '1' if the donor and acceptor are not at the same annotated transcript isoform. Otherwise '0'. |
14 | ambiguity with an co-linear explanation | '1' if the merged flanking sequence of the circRNA junction sites has an alternative co-linear explanation. Otherwise '0'. |
15 | ambiguity with multiple hits | '1' if the merged flanking sequence of the circRNA junction sites has multiple hits. Otherwise '0'. |
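As an illustration of how the summary list might be used, the sketch below ranks the circRNAs that pass all checks by their number of circRNA-miRNA-mRNA interactions. It assumes the column order in the table above and addresses columns by position; the function name is hypothetical. Rows whose tenth column is not 'yes' (including a header row, if present) are skipped.

```python
def top_passing_circrnas(path, limit=10):
    """Return up to `limit` tuples (circ_id, host_gene, n_axes) for circRNAs
    with pass == 'yes', sorted by column 9 (#circRNA_miRNA_mRNA), descending."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 10 or fields[9] != "yes":
                continue
            try:
                n_axes = int(fields[8])
            except ValueError:
                continue  # non-numeric count: skip the row
            rows.append((fields[4], fields[5], n_axes))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for circ_id, host_gene, n_axes in top_passing_circrnas("out/summary_list.tsv"):
        print(circ_id, host_gene, n_axes, sep="\t")
```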
all_interactions.miRNA.tsv
# | Column | Description |
---|---|---|
1 | chr | Chromosome name |
2 | pos1 | One of the two positions of the circRNA junction site |
3 | pos2 | The other position of the circRNA junction site |
4 | strand | + / - |
5 | circ_id | The user-specified or auto-generated name/id of the circRNA. |
6 | host_gene | Host gene of the circRNA |
7 | mirna | The miRNA that may bind to the circRNA |
8 | max_score | The maximum binding score reported by miRanda |
9 | num_binding_sites | The number of binding sites of the miRNA on the circRNA |
10 | cross_boundary | '1' if there is a binding site across the junction of the circRNA. Otherwise '0'. |
11 | MaxAgoExpNum | The maximum number of supporting CLIP-seq experiments |
12 | num_AGO_supported_binding_sites | The number of AGO-supported miRNA-binding sites |
13 | target_gene | The miRNA-targeted gene |
14 | miRTarBase | '1' if the miRNA-mRNA interaction is reported from miRTarBase. Otherwise '0'. |
15 | miRDB | '1' if the miRNA-mRNA interaction is reported from miRDB. Otherwise '0'. |
16 | ENCORI | '1' if the miRNA-mRNA interaction is reported from ENCORI. Otherwise '0'. |
17 | miRTarBase__ref_count | The number of references reporting the interaction |
18 | miRDB__targeting_score | The predicted target score from miRDB |
19 | ENCORI__geneID | The gene ID of the target gene |
20 | ENCORI__geneType | The gene type of the target gene |
21 | ENCORI__clipExpNum | The number of supporting CLIP-seq experiments |
22 | ENCORI__RBP | RBP name |
23 | ENCORI__PITA | The number of target sites predicted by PITA |
24 | ENCORI__RNA22 | The number of target sites predicted by RNA22 |
25 | ENCORI__miRmap | The number of target sites predicted by miRmap |
26 | ENCORI__microT | The number of target sites predicted by microT |
27 | ENCORI__miRanda | The number of target sites predicted by miRanda |
28 | ENCORI__PicTar | The number of target sites predicted by PicTar |
29 | ENCORI__TargetScan | The number of target sites predicted by TargetScan |
30 | ENCORI__pancancerNum | The number of cancer types |
Note.
For now, the ENCORI data are available only for human and mouse.
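One possible way to prioritize interactions is to require support from several databases together with AGO CLIP-seq evidence. The sketch below is an illustration, not a recommended threshold: it counts the '1' flags in columns 14 to 16 (miRTarBase, miRDB, ENCORI) and reads column 11 (MaxAgoExpNum), treating an empty or non-numeric value as 0. The function names and default cut-offs are assumptions.

```python
def _to_int(text):
    """Parse an integer, treating empty or non-numeric text as 0."""
    try:
        return int(text)
    except ValueError:
        return 0


def select_supported(path, min_databases=2, min_ago_experiments=1):
    """Return (circ_id, mirna, target_gene) for rows supported by at least
    `min_databases` of miRTarBase/miRDB/ENCORI and whose MaxAgoExpNum is at
    least `min_ago_experiments`. Columns are addressed by position."""
    selected = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 16:
                continue
            n_databases = sum(1 for flag in fields[13:16] if flag == "1")
            if n_databases < min_databases:
                continue
            if _to_int(fields[10]) < min_ago_experiments:
                continue
            selected.append((fields[4], fields[6], fields[12]))
    return selected


if __name__ == "__main__":
    for axis in select_supported("out/all_interactions.miRNA.tsv"):
        print("\t".join(axis))
```

For species other than human and mouse, set `min_ago_experiments=0` and lower `min_databases` if needed, since ENCORI-derived evidence is not available for them.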
(Optional) Visualize the interactions
circmimi_tools visualize [options] IN_FILE OUT_FILE
Parameters
Parameter | Description |
---|---|
IN_FILE | The input file "all_interactions.miRNA.tsv", which is the output of the 'interactions' command. |
OUT_FILE | The output filename. The file extension should be ".xgmml" or ".xml" so that Cytoscape can recognize the file as an XGMML network file. |
-1 INT | column key for circRNAs. |
-2 INT | column key for mediators. |
-3 INT | column key for mRNAs. |
This command generates a Cytoscape-readable file (.xgmml) for visualizing the input circRNA-miRNA-mRNA regulatory axes in Cytoscape.
Example
Please see the "examples" directory.