Python tools for extracting intron retention events

intron_retention_utils

License: GPL v3

Software for calculating intron retention events genome-wide from RNA sequencing data.

Dependency

Python

pysam>=0.9.0 and annot_utils>=0.3.0 packages.

Software

bedtools, htslib

Install

pip install intron_retention_utils

Alternatively, you can install from the source code.

git clone https://github.com/friend1ws/intron_retention_util.git
cd intron_retention_utils
python setup.py build install

This package has been tested on Python 2.7, 3.5, 3.6.

Preparation

For the allele_count command, the Smith-Waterman shared library (libssw.so) from Mengyao Zhao is required. Create libssw.so and add its path to the LD_LIBRARY_PATH environment variable. Also take care if your BAM file is aligned to a reference genome that uses a naming convention other than UCSC; the --grc option shown in the usages below appears to address this case.
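
Before running allele_count, it can help to confirm that libssw.so is actually reachable through LD_LIBRARY_PATH. The following is a minimal sketch, not part of this package: the function name find_libssw is hypothetical, and it only checks that a file with that name exists in one of the listed directories.

```python
import os
from pathlib import Path


def find_libssw(env=None, lib_name="libssw.so"):
    """Return the first LD_LIBRARY_PATH directory containing lib_name, or None.

    `env` defaults to the current process environment; passing a dict
    makes the function easy to try out without changing the real one.
    """
    env = os.environ if env is None else env
    for entry in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        # Skip empty entries produced by leading/trailing separators.
        if entry and (Path(entry) / lib_name).is_file():
            return entry
    return None


location = find_libssw()
if location is None:
    print("libssw.so not found on LD_LIBRARY_PATH")
else:
    print("libssw.so found in", location)
```

If the library is reported as missing, add the directory that contains libssw.so to LD_LIBRARY_PATH and check again.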

Commands

simple_count

Simple intron retention count program. It calculates the number of reads covering each exon-intron boundary and the number of putative intron retention reads, that is, reads covering the region enlarged around the boundary by the specified margin size (e.g. from -5 bp to +5 bp of that boundary).

intron_retention_utils simple_count [-h] 
                                    [--grc]
                                    [--genome_id {hg19,hg38,mm10}]
                                    [--intron_retention_check_size intron_retention_check_size]
                                    [--mapping_qual_thres mapping_qual_thres]
                                    [--keep_improper_pair] [--debug]
                                    sequence.bam output_file
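
When simple_count is called from a pipeline script, the argument list can be assembled in Python. The sketch below is an illustration only: the helper name simple_count_command and its parameter names are hypothetical, and it uses only the options shown in the usage above. Optional flags are added only when a value is given, so the tool's own defaults stay in effect otherwise.

```python
def simple_count_command(bam, output, genome_id=None, grc=False,
                         check_size=None, mapq_thres=None):
    """Build the argument list for `intron_retention_utils simple_count`."""
    cmd = ["intron_retention_utils", "simple_count"]
    if grc:
        cmd.append("--grc")
    if genome_id is not None:
        cmd += ["--genome_id", genome_id]
    if check_size is not None:
        cmd += ["--intron_retention_check_size", str(check_size)]
    if mapq_thres is not None:
        cmd += ["--mapping_qual_thres", str(mapq_thres)]
    # Positional arguments come last: input BAM, then output file.
    cmd += [bam, output]
    return cmd


# File names are placeholders taken from the usage text.
cmd = simple_count_command("sequence.bam", "output_file", genome_id="hg38")
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once the package is installed.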

About result

  • Chr: chromosome of the exon-intron boundary
  • Boundary_Pos: coordinate of the exon-intron boundary (the last exonic base)
  • Gene_Symbol: gene symbol from refGene.txt.gz
  • Motif_Type: splicing donor or acceptor
  • Strand: transcription strand of the gene
  • Junction_List: canonical splicing junction list from that exon-intron boundary
  • Gene_ID_List: refGene ID list with that exon-intron boundary
  • Exon_Num_List: exon numbers for each refGene IDs
  • Edge_Read_Count: the number of reads covering each exon-intron boundary
  • Intron_Retention_Read_Count: the number of putative intron retention reads
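
The difference between Edge_Read_Count and Intron_Retention_Read_Count can be illustrated with a simplified sketch. This is not the package's implementation: it assumes each read is a single contiguous aligned block given as a (start, end) pair in 1-based inclusive coordinates (a hypothetical input format), and it ignores mapping quality, pairing, and spliced alignments, which the real command has to handle.

```python
def count_boundary_reads(blocks, boundary_pos, margin=5):
    """Count blocks covering a boundary and blocks spanning it by `margin`.

    blocks: iterable of (start, end) pairs, 1-based inclusive.
    Returns (edge_count, retention_count).
    """
    edge_count = 0
    retention_count = 0
    for start, end in blocks:
        if start <= boundary_pos <= end:
            edge_count += 1
            # Putative retention: the block runs uninterrupted from
            # `margin` bases before the boundary to `margin` bases after it.
            if start <= boundary_pos - margin and end >= boundary_pos + margin:
                retention_count += 1
    return edge_count, retention_count


# Made-up blocks around a boundary at position 100.
blocks = [(90, 100), (80, 120), (96, 104), (95, 105), (101, 150)]
print(count_boundary_reads(blocks, boundary_pos=100, margin=5))  # prints (4, 2)
```

Here four blocks cover position 100, but only two of them extend at least 5 bp on both sides, so only those two count as putative intron retention reads.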

allele_count

intron_retention_utils allele_count [-h] 
                                    [--grc]
                                    [--genome_id {hg19,hg38,mm10}]
                                    [--donor_size donor_size]
                                    [--acceptor_size acceptor_size]
                                    [--template_size check_size]
                                    [--template_score_margin check_size]
                                    [--read_search_margin read_search_margin]
                                    [--debug]
                                    sequence.bam mutation.txt
                                    output.txt reference.fa

About result

  • Gene_Symbol: gene symbol
  • Chr_Mut: chromosome of the mutation
  • Start_Mut: start coordinate of the mutation
  • End_Mut: end coordinate of the mutation
  • Ref_Mut: reference allele of the mutation
  • Alt_Mut: alternative allele of the mutation
  • Chr_Motif: chromosome of the splicing motif
  • Start_Motif: start coordinate of the splicing motif
  • End_Motif: end coordinate of the splicing motif
  • Type_Motif: donor or acceptor
  • Strand_Motif: transcription strand of the gene
  • Splice_Junction_Negative: the number of normally spliced reads without the alternative allele
  • Splice_Junction_Positive: the number of normally spliced reads with the alternative allele
  • Intron_Retention_Negative: the number of putative intron retention reads without the alternative allele
  • Intron_Retention_Positive: the number of putative intron retention reads with the alternative allele
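
One way to read the four count columns is to compare the fraction of intron retention reads among reads with and without the alternative allele. The sketch below assumes the output is a tab-delimited file with a header line containing the column names listed above (an assumption, not confirmed here); the function names and the example record are made up for illustration.

```python
import csv
import io


def retention_fraction(retained, spliced):
    """Fraction of retention reads among retained + spliced, or None if no reads."""
    total = retained + spliced
    return retained / total if total else None


def summarize(handle):
    """Return (Gene_Symbol, fraction without alt allele, fraction with alt allele)."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        neg = retention_fraction(int(rec["Intron_Retention_Negative"]),
                                 int(rec["Splice_Junction_Negative"]))
        pos = retention_fraction(int(rec["Intron_Retention_Positive"]),
                                 int(rec["Splice_Junction_Positive"]))
        rows.append((rec["Gene_Symbol"], neg, pos))
    return rows


# Made-up record with only the columns this sketch needs.
example = (
    "Gene_Symbol\tSplice_Junction_Negative\tSplice_Junction_Positive\t"
    "Intron_Retention_Negative\tIntron_Retention_Positive\n"
    "GENE1\t18\t2\t2\t8\n"
)
print(summarize(io.StringIO(example)))
```

A much higher fraction among reads carrying the alternative allele suggests that the retention is linked to the mutation rather than to background retention at that boundary.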

merge_control

Merge intron retention files, typically from control data, for later filtering.

intron_retention_utils merge_control [-h] 
                                     [--ratio_thres RATIO_THRES]
                                     [--sample_num_thres SAMPLE_NUM_THRES]
                                     intron_retention_list.txt
                                     output_file

filter

Filter out intron retentions that do not satisfy the specified conditions.

intron_retention_utils filter [-h] 
                              [--num_thres NUM_THRES]
                              [--ratio_thres RATIO_THRES]
                              [--pooled_control_file POOLED_CONTROL_FILE]
                              intron_retention.txt output.txt
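
The exact meaning of --num_thres and --ratio_thres is not spelled out above. As a rough sketch of what such a threshold filter could look like, the code below assumes that the count threshold applies to Intron_Retention_Read_Count and that the ratio is Intron_Retention_Read_Count divided by Edge_Read_Count. Both are assumptions, the function name is hypothetical, and the threshold values in the example are arbitrary, not the tool's defaults.

```python
def passes_filter(edge_count, retention_count, num_thres, ratio_thres):
    """Return True if a boundary meets both thresholds (assumed semantics)."""
    if retention_count < num_thres:
        return False
    if edge_count == 0:
        # No reads at the boundary, so no ratio can be computed.
        return False
    return retention_count / edge_count >= ratio_thres


# Made-up (Edge_Read_Count, Intron_Retention_Read_Count) pairs.
records = [(100, 10), (100, 2), (1000, 10)]
kept = [r for r in records if passes_filter(r[0], r[1], num_thres=3, ratio_thres=0.05)]
print(kept)  # prints [(100, 10)]
```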

associate

Associate intron retention counts (typically the output of the simple_count command) with mutations.

intron_retention_utils associate [-h] [--donor_size donor_size]
                                        [--acceptor_size acceptor_size]
                                        [--mutation_format {vcf,anno}]
                                        [--reference reference.fa] [--sv]
                                        [--intron_margin intron_margin]
                                        [--debug]
                                        intron_retention.txt mutation.txt
                                        output_file

About result

The following columns are added to the input files:

  • Mutation_Key: vcf format mutation aggregated by commas
  • Motif_Pos: coordinate of motif positions
  • Mutation_Type: splicing donor disruption or splicing acceptor disruption
  • Is_Canonical: whether or not the mutation disrupts the canonical splicing motifs (GT-AG)
  • Intron_Retention_Type: direct-impact or opposite-side-impact
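
For readers unfamiliar with the GT-AG rule behind Is_Canonical: a canonical intron starts with GT (donor side) and ends with AG (acceptor side) when read along the transcription strand. The sketch below only illustrates that rule on a plain sequence string; it is not how the associate command decides whether a mutation hits a canonical position, and both function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string (needed for genes on the minus strand)."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_canonical_intron(intron_seq):
    """True if the intron, given 5'->3' on the transcription strand, is GT...AG."""
    seq = intron_seq.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Made-up intron sequences.
print(is_canonical_intron("GTAAGTCCCAG"))                      # prints True
print(is_canonical_intron("GCAAGTCCCAG"))                      # prints False
# The same intron of a minus-strand gene, written on the genome plus strand.
print(is_canonical_intron(reverse_complement("CTGGGACTTAC")))  # prints True
```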
