
Molecular Outlier DEtection from RNA-seq data



Introduction

MODER (Molecular Outlier DEtection from RNA sequencing assays) is a comprehensive, user-friendly toolkit for detecting aberrant gene expression, alternative splicing, and allele-specific expression across multiple samples. MODER is written in Python 3 and is easy to use: users only need to provide a list of BAM files, and MODER performs all of the complex, error-prone processing automatically and returns all three kinds of outliers as gene~sample pairs.

Framework

Documentation

Documentation can be found here.

Dependency

bioinformatics software

  • samtools
  • bedtools
  • bcftools

If you have conda installed, you can install samtools, bedtools, and bcftools with the following commands:

conda install -c bioconda samtools
conda install -c bioconda bedtools
conda install -c bioconda bcftools

If you are working on a Debian-based Linux system, you can also install samtools, bedtools, and bcftools with the apt package manager:

sudo apt install samtools
sudo apt install bedtools
sudo apt install bcftools
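To confirm that these tools are available before running MODER, a quick check such as the following can help. This is a minimal sketch using only the Python standard library; the helper missing_tools is hypothetical and not part of MODER.

```python
import shutil

# Command-line tools MODER depends on, taken from the list above.
REQUIRED_TOOLS = ["samtools", "bedtools", "bcftools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("samtools, bedtools and bcftools are all on PATH")
```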

python packages

  • Cython
  • numpy
  • pystan
  • pysam
  • pandas
  • plotnine
  • scipy
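A similar sketch can check that the Python packages above are importable. The import names are an assumption: pystan 2.x is imported as pystan, while pystan 3.x is imported as stan, so adjust the list to the version you installed. missing_modules is a hypothetical helper, not part of MODER.

```python
import importlib.util

# Import names for the packages listed above.
# Assumption: pystan 2.x imports as "pystan"; pystan 3.x imports as "stan".
REQUIRED_MODULES = ["Cython", "numpy", "pystan", "pysam", "pandas", "plotnine", "scipy"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules the import system cannot locate (hypothetical helper)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```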

Installation

To install MODER, use git to clone the code to your Linux system. Make sure samtools, bedtools, bcftools, and all third-party Python dependencies are installed; you can then run MODER through the Python script moder.py. See Usage for more information on how to use the program.

git clone -b singleTissue https://github.com/Xu-Dong/mOutlierPipe.git

Usage

mode arguments

option            description
--expression      run the gene expression analysis
--splicing        run the splicing analysis
--ase             run the allele-specific expression (ASE) analysis

These three options decide which analysis pipeline is run; if none of them is given, all three pipelines are run:
See module1 for more information on the expression pipeline.
See module2 for more information on the splicing pipeline.
See module3 for more information on the ASE pipeline.
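The mode selection described above can be sketched with argparse. This only illustrates the documented behavior (no mode option means all three pipelines run); it is not the actual parser in moder.py, and build_parser and selected_modes are hypothetical names.

```python
import argparse

MODES = ("expression", "splicing", "ase")


def build_parser():
    # Hypothetical sketch of the three mode flags; not MODER's real parser.
    parser = argparse.ArgumentParser(description="Select MODER analysis modes")
    parser.add_argument("--expression", action="store_true", help="run the gene expression pipeline")
    parser.add_argument("--splicing", action="store_true", help="run the splicing pipeline")
    parser.add_argument("--ase", action="store_true", help="run the ASE pipeline")
    return parser


def selected_modes(argv):
    """Return the pipelines to run; with no mode flag, all three are selected."""
    args = build_parser().parse_args(argv)
    modes = [name for name in MODES if getattr(args, name)]
    return modes or list(MODES)


if __name__ == "__main__":
    print(selected_modes([]))         # ['expression', 'splicing', 'ase']
    print(selected_modes(["--ase"]))  # ['ase']
```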

basic arguments

option            description
-i, --input       text file listing the paths of all input BAM files (required)
--gtf             genome annotation file in GTF format (required)
-o, --output      directory in which to store all result files
                  (optional; default: current directory)
-p, --parallel    number of parallel jobs
                  (optional; default: 1)
--threshold       z-score threshold; gene~sample pairs whose absolute z-score is larger than this value are reported as outliers
                  (optional; default: 2)

For more arguments and their usage, refer to the documentation of featureCounts, PEER, LeafCutter, SPOT, gtfToGenePred, and genePredToBed.
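The -i/--input option expects a text file listing the input BAM paths. Assuming one path per line (the exact format is not spelled out here, so check the documentation), such a file could be generated with a sketch like the one below; write_bam_list is a hypothetical helper, not part of MODER.

```python
from pathlib import Path


def write_bam_list(bam_dir, list_path="file_path.txt"):
    """Write the absolute path of every *.bam file in bam_dir, one per line.

    Hypothetical helper; assumes MODER reads one BAM path per line from --input.
    """
    bams = sorted(p.resolve() for p in Path(bam_dir).glob("*.bam"))
    Path(list_path).write_text("".join(f"{p}\n" for p in bams))
    return bams
```

For example, write_bam_list("bams") would collect every BAM file in a directory named bams (a placeholder name) and write their paths to file_path.txt, which can then be passed as --input file_path.txt.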

To run all three pipelines, use a command such as:

python moder.py -p 8 \
	--input file_path.txt \
	--gtf genome_annotation.gtf \
	--vcf example.vcf.gz \
	--variation Vg_GTEx_v8.txt \
	--tissue MSCLSK \
	--threshold 2

module1: Expression Data Analysis

This module analyzes gene expression data. Its basic command-line arguments are described below; for additional parameters, refer to RNA-SeQC and PEER.

command line arguments

option            description
--expression      run the gene expression analysis
-i, --input       text file listing the paths of all input BAM files (required)
--gtf             genome annotation file in GTF format (required)
-o, --output      directory in which to store all result files
                  (optional; default: current directory)
-p, --parallel    number of parallel jobs
                  (optional; default: 1)
--threshold       z-score threshold used to filter results; values larger than the threshold are kept
                  (optional; default: 2)
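To illustrate how a z-score threshold turns a genes × samples matrix into gene~sample outlier pairs, here is a simplified sketch. It assumes the expression values are already normalized and simply standardizes each gene across samples, flagging pairs whose absolute z-score exceeds the threshold. MODER's actual pipeline (RNA-SeQC and PEER) involves more steps, and zscore_outliers is a hypothetical helper.

```python
import numpy as np


def zscore_outliers(values, genes, samples, threshold=2.0):
    """Return (gene, sample, z) tuples whose |z| exceeds threshold.

    values: genes x samples matrix of already-normalized expression (assumption).
    Each gene (row) is standardized across samples with the population SD.
    """
    x = np.asarray(values, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)  # population SD (ddof=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (x - mean) / std  # constant genes give NaN and are never flagged
        hits = np.argwhere(np.abs(z) > threshold)
    return [(genes[i], samples[j], float(z[i, j])) for i, j in hits]
```

With n samples and the population standard deviation, the largest attainable |z| is sqrt(n - 1), so the default threshold of 2 can only be exceeded with at least six samples.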

running example

python mOutlierPipe.py --expression \
	--parallel 8 \
	--input file_path.txt \
	--gtf sample_annotation.gtf \
	--threshold 2

module2: Splicing Data Analysis

This module analyzes splicing data. Its basic command-line arguments are described below; for additional parameters, refer to LeafCutter, SPOT, and PEER.

command line arguments

option            description
--splicing        run the splicing analysis
-i, --input       text file listing the paths of all input BAM files (required)
--gtf             genome annotation file in GTF format, used to map cluster IDs to gene IDs (required)
-o, --output      directory in which to store all result files
                  (optional; default: current directory)
-p, --parallel    number of parallel jobs
                  (optional; default: 1)
--threshold       z-score threshold; in the splicing pipeline, z-scores are converted to p-values
                  (optional; default: 0.0027)
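The default of 0.0027 matches the two-sided standard normal tail probability at |z| = 3. Assuming the z-to-p conversion is a two-sided normal test (the exact conversion MODER uses is not stated here), it can be sketched with the standard library; z_to_p is a hypothetical helper.

```python
import math


def z_to_p(z):
    """Two-sided p-value of a z-score under a standard normal: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    print(round(z_to_p(3), 4))  # 0.0027
    print(round(z_to_p(2), 4))  # 0.0455
```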

running example

python mOutlierPipe.py --splicing \
	--parallel 8 \
	--input file_path.txt \
	--gtf genome_annotation.gtf \
	--threshold 2

module3: Allele Specific Expression Analysis

This module analyzes allele-specific expression (ASE) data. Its basic command-line arguments are described below; for additional parameters, refer to phASER.

command line arguments

option            description
--ase             run the allele-specific expression (ASE) analysis
-i, --input       text file listing the paths of all input BAM files (required)
--gtf             genome annotation file in GTF format, used to map cluster IDs to gene IDs (required)
--vcf             Variant Call Format (VCF) file containing variant information for the genome (required)
--variant         tissue-specific estimates of genetic variation in gene dosage (required)
-o, --output      directory in which to store all result files
                  (optional; default: current directory)
-p, --parallel    number of parallel jobs
                  (optional; default: 1)
--threshold       z-score threshold; in the ASE pipeline, z-scores are converted to p-values
                  (optional; default: 0.0027)
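As a much simpler, generic illustration of allelic imbalance (not MODER's method, which also uses the --variant dosage estimates that this sketch ignores), the code below computes the reference-allele ratio at a site and an exact two-sided binomial test of the reference and alternative read counts against the balanced expectation of 0.5. The helper names are hypothetical.

```python
import math


def allelic_ratio(ref, alt):
    """Fraction of reads supporting the reference allele."""
    total = ref + alt
    if total == 0:
        raise ValueError("no reads covering the site")
    return ref / total


def binomial_imbalance_p(ref, alt):
    """Exact two-sided binomial test of ref vs. alt reads against a 0.5 ratio."""
    n = ref + alt
    if n == 0:
        raise ValueError("no reads covering the site")
    k = min(ref, alt)  # Binomial(n, 0.5) is symmetric, so double the smaller tail
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```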

running example

python mOutlierPipe.py --ase \
	--parallel 8 \
	--input file_path.txt \
	--gtf genome_annotation.gtf \
	--vcf sample.vcf \
	--variant Vg_GTEx_v8.txt \
	--threshold 2



Download files


Source Distribution

glh-test-0.0.1.tar.gz (8.0 MB)
