Outrigger detects *de novo* exons and quantifies their percent spliced-in
Large-scale detection and calculation of alternative splicing with Outrigger
Outrigger is a program that uses junction reads from RNA-seq data and a graph database to create a de novo alternative splicing annotation, and to quantify percent spliced-in (Psi) of the events.
Free software: BSD license
Documentation is available here: http://yeolab.github.io/outrigger/
Features
Finds novel splicing events, including novel exons, from .bam or SJ.out.tab files (outrigger index)
(optional) Validates that exons have correct splice sites, e.g. GT/AG and AT/AC for mammalian systems (outrigger validate)
Calculates “percent spliced-in” (Psi/Ψ) scores for all your samples given the validated events (or the original events if you opted not to validate) (outrigger psi)
Installation
To install outrigger, we recommend using the Anaconda Python Distribution and creating an environment.
You’ll want to add the bioconda channel to make installing bedtools and its Python wrapper, pybedtools, easy (these programs are necessary for both outrigger index and outrigger validate).
conda config --add channels r
conda config --add channels bioconda
Create an environment called outrigger-env. Python 2.7, Python 3.4, Python 3.5, and Python 3.6 are supported.
conda create --name outrigger-env outrigger
Now activate that environment:
source activate outrigger-env
To check that it installed properly, try the command with the help option (-h), outrigger -h. The output should look like this:
$ outrigger -h
usage: outrigger [-h] [--version] {index,validate,psi} ...

outrigger (1.0.0dev). Calculate "percent-spliced in" (Psi) scores of
alternative splicing on a *de novo*, custom-built splicing index -- just for
you!

positional arguments:
  {index,validate,psi}  Sub-commands
    index               Build an index of splicing events using a graph
                        database on your junction reads and an annotation
    validate            Ensure that the splicing events found all have the
                        correct splice sites
    psi                 Calculate "percent spliced-in" (Psi) values using the
                        splicing event index built with "outrigger index"

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
Bleeding edge code from GitHub
For advanced users, if you have git and Anaconda Python installed, you can:
Clone this repository
Change into that directory
Create an environment named outrigger-env with the necessary packages from Anaconda and the Python Package Index (PyPI).
Activate the environment
These steps are shown in code below.
git clone https://github.com/YeoLab/outrigger.git
cd outrigger
conda env create --file environment.yml
source activate outrigger-env
Quick start
If you just want to know how to run this on your data with the default parameters, start here. Let’s say you performed your alignment in the folder called ~/projects/tasic2016/analysis/tasic2016_v1, and that’s where your SJ.out.tab files from the STAR aligner are (they’re output into the same folder as the .bam files). First you’ll need to change directories to that folder with cd.
cd ~/projects/tasic2016/analysis/tasic2016_v1
Then you need to find all alternative splicing events, which you do by running outrigger index on the splice junction files and the gtf. Here is an example command:
Input: .SJ.out.tab files
outrigger index --sj-out-tab *SJ.out.tab \
    --gtf /projects/ps-yeolab/genomes/mm10/gencode/m10/gencode.vM10.annotation.gtf
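For orientation, the junction reads come from STAR's SJ.out.tab, a headerless tab-separated file with nine columns. The sketch below reads such a file with the Python standard library only. The column labels and the `read_sj_out_tab` helper are illustrative names chosen here, not part of outrigger's API, and the example row is made up.

```python
import csv
import io

# Column meanings follow the STAR manual; the labels themselves are our own.
SJ_COLUMNS = [
    "chrom", "intron_start", "intron_stop", "strand", "intron_motif",
    "annotated", "unique_reads", "multimap_reads", "max_overhang",
]
INT_COLUMNS = ("intron_start", "intron_stop", "unique_reads",
               "multimap_reads", "max_overhang")
# STAR encodes strand as 0 (undefined), 1 (+) or 2 (-)
STRANDS = {"0": ".", "1": "+", "2": "-"}


def read_sj_out_tab(handle):
    """Yield one dict per junction from an open SJ.out.tab file handle."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(SJ_COLUMNS, row))
        for key in INT_COLUMNS:
            record[key] = int(record[key])
        record["strand"] = STRANDS[record["strand"]]
        yield record


# Made-up example row: one junction on chr1, plus strand, 25 unique reads
example = "chr1\t3000\t4000\t1\t1\t1\t25\t3\t40\n"
junctions = list(read_sj_out_tab(io.StringIO(example)))
print(junctions[0]["chrom"], junctions[0]["strand"], junctions[0]["unique_reads"])
```

With a real file, pass `open("sample.SJ.out.tab")` (a hypothetical file name) instead of the `io.StringIO` object.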
Input: .bam files
If you’re using .bam files instead of SJ.out.tab files, never despair! Below is an example command. Keep in mind that for this program to work, the .bam files must be sorted and indexed.
outrigger index --bam *sorted.bam \
    --gtf /projects/ps-yeolab/genomes/mm10/gencode/m10/gencode.vM10.annotation.gtf
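Before indexing from .bam files, it can help to confirm that each one has an index file next to it. The sketch below is a minimal check under the assumption that indexes are named either `sample.bam.bai` or `sample.bai`; the helper name `bams_missing_index` and the demo file names are made up for illustration. It only checks that an index exists, not that the reads are sorted.

```python
import tempfile
from pathlib import Path


def bams_missing_index(folder, pattern="*sorted.bam"):
    """Return names of .bam files in `folder` with no .bai index beside them."""
    missing = []
    for bam in sorted(Path(folder).glob(pattern)):
        # Assumed naming conventions: sample.bam.bai or sample.bai
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(candidate.exists() for candidate in candidates):
            missing.append(bam.name)
    return missing


# Demo in a throwaway folder with made-up, empty files
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.sorted.bam", "a.sorted.bam.bai", "b.sorted.bam"):
        Path(folder, name).touch()
    print(bams_missing_index(folder))  # ['b.sorted.bam']
```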
Next, you’ll want to validate that the splicing events you found follow biological rules, such as containing GT/AG (mammalian major spliceosome) or AT/AC (mammalian minor spliceosome) sequences. To do that, you’ll need to provide the genome name (e.g. mm10) and the genome sequences. An example command is below:
outrigger validate --genome mm10 \
    --fasta /projects/ps-yeolab/genomes/mm10/GRCm38.primary_assembly.genome.fa
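To make the rule concrete, here is a minimal sketch of a splice-site check on a single intron sequence, read 5' to 3' on the transcribed strand. It only illustrates the GT/AG and AT/AC rule; it is not how outrigger validate is implemented, and the function name and example sequences are invented for illustration.

```python
# Allowed (first two bases, last two bases) pairs for an intron
VALID_SPLICE_SITES = {("GT", "AG"), ("AT", "AC")}


def has_valid_splice_sites(intron_seq, valid=VALID_SPLICE_SITES):
    """Return True if the intron starts and ends with an allowed dinucleotide pair."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        return False  # too short to hold two separate dinucleotides
    return (seq[:2], seq[-2:]) in valid


print(has_valid_splice_sites("GTAAGTCTTTCAG"))  # True: GT ... AG
print(has_valid_splice_sites("CCAAGTCTTTCAG"))  # False: starts with CC
```

Passing a different `valid` set is one way to express other splice-site pairs for non-mammalian genomes.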
Finally, you can calculate percent spliced in (Psi) of your splicing events! Thankfully this is very easy:
outrigger psi
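As a rough illustration of what a Psi value means for a skipped exon, the sketch below uses a common junction-only definition: the average of the two inclusion junction counts, divided by that average plus the exclusion junction count. This is an assumption made for illustration, not a statement of outrigger's exact rules; the function name and the `min_reads` threshold of 10 are hypothetical.

```python
def psi_skipped_exon(incl_upstream, incl_downstream, excl, min_reads=10):
    """Junction-only Psi for a skipped exon, or None if coverage is too low.

    incl_upstream / incl_downstream: reads on the two inclusion junctions.
    excl: reads on the junction that skips the exon.
    """
    # Average the two inclusion junctions so they are comparable to one exclusion junction
    inclusion = (incl_upstream + incl_downstream) / 2
    total = inclusion + excl
    if total < min_reads:
        return None  # not enough reads to make a call
    return inclusion / total


print(psi_skipped_exon(30, 50, 10))  # 0.8
print(psi_skipped_exon(2, 1, 0))     # None
```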
It should be noted that ALL of these commands should be performed in the same directory, so no moving.
Quick start summary
Here is a summary of the commands in the order you would use them for outrigger!
cd ~/projects/tasic2016/analysis/tasic2016_v1
outrigger index --sj-out-tab *SJ.out.tab \
    --gtf /projects/ps-yeolab/genomes/mm10/gencode/m10/gencode.vM10.annotation.gtf
outrigger validate --genome mm10 \
    --fasta /projects/ps-yeolab/genomes/mm10/GRCm38.primary_assembly.genome.fa
outrigger psi
This will create a folder called outrigger_output, which at the end should look like the one below. Each file and folder is annotated with which command produced it.
$ tree outrigger_output
outrigger_output..........................................................index
├── index.................................................................index
│   ├── gtf...............................................................index
│   │   ├── gencode.vM10.annotation.gtf...................................index
│   │   ├── gencode.vM10.annotation.gtf.db................................index
│   │   └── novel_exons.gtf...............................................index
│   ├── exon_direction_junction_triples.csv...............................index
│   ├── mxe...............................................................index
│   │   ├── event.bed.....................................................index
│   │   ├── events.csv....................................................index
│   │   ├── exon1.bed.....................................................index
│   │   ├── exon2.bed.....................................................index
│   │   ├── exon3.bed.....................................................index
│   │   ├── exon4.bed.....................................................index
│   │   ├── intron.bed....................................................index
│   │   ├── splice_sites.csv...........................................validate
│   │   └── validated..................................................validate
│   │       └── events.csv.............................................validate
│   └── se................................................................index
│       ├── event.bed.....................................................index
│       ├── events.csv....................................................index
│       ├── exon1.bed.....................................................index
│       ├── exon2.bed.....................................................index
│       ├── exon3.bed.....................................................index
│       ├── intron.bed....................................................index
│       ├── splice_sites.csv...........................................validate
│       └── validated..................................................validate
│           └── events.csv.............................................validate
├── junctions.............................................................index
│   ├── metadata.csv......................................................index
│   └── reads.csv.........................................................index
└── psi.....................................................................psi
    ├── mxe.................................................................psi
    │   ├── psi.csv.........................................................psi
    │   └── summary.csv.....................................................psi
    ├── outrigger_psi.csv...................................................psi
    └── se..................................................................psi
        ├── psi.csv.........................................................psi
        └── summary.csv.....................................................psi

10 directories, 26 files
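If you want to confirm how far a run got, a small script can compare the output folder against the tree above. The sketch below checks a handful of files per command; the selection of files, the `EXPECTED_OUTPUTS` mapping, and the `missing_outputs` helper are illustrative choices, not something outrigger provides.

```python
import tempfile
from pathlib import Path

# A few paths copied from the tree above, relative to outrigger_output,
# grouped by the command that produces them
EXPECTED_OUTPUTS = {
    "index": ["index/se/events.csv", "index/mxe/events.csv", "junctions/reads.csv"],
    "validate": ["index/se/validated/events.csv", "index/mxe/validated/events.csv"],
    "psi": ["psi/outrigger_psi.csv"],
}


def missing_outputs(output_folder, expected=EXPECTED_OUTPUTS):
    """Map each command name to the expected files that do not exist yet."""
    root = Path(output_folder)
    missing = {}
    for command, relative_paths in expected.items():
        absent = [p for p in relative_paths if not (root / p).is_file()]
        if absent:
            missing[command] = absent
    return missing


# Demo: a throwaway folder that looks as if only "outrigger index" has run
with tempfile.TemporaryDirectory() as folder:
    for relative_path in EXPECTED_OUTPUTS["index"]:
        path = Path(folder, relative_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    print(sorted(missing_outputs(folder)))  # ['psi', 'validate']
```

On real data you would call `missing_outputs("outrigger_output")` from the directory where the commands were run.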
History
v1.1.0 (June 28th, 2017)
This is a minor release of outrigger.
Bug fixes
Fixed UNIQUE ID error that happened somewhat stochastically when adding new exons to the database
Miscellaneous
Explicitly added Python 3.6 compatibility
Change logo location to logo/ instead of logo/v1 since there’s only one version anyway…
v1.0.0 (April 3rd, 2017)
This is the first major release of outrigger!!!
v1.0.0 New features
Parallelized event finding across chromosomes
Added --low-memory flag for index, validate, and psi commands to use a smaller memory footprint when reading CSV files.
Added --splice-types option to specify only one kind of splicing you’d like to find
Creates a summary.csv file indicating the number of reads found at each junction, for all samples, so the user can double-check the Psi calculation. This file also shows which “Case” corresponds to each event in each sample, so you can see whether there were sufficient or insufficient reads on the junctions of each event, and how outrigger judged it.
Added functions to extract constitutive and alternative exons separately
v1.0.0 Bug fixes
Fixed a bug that stalled on .bam files while counting the junctions
v1.0.0 Miscellaneous
Added GC/AG to valid splice sites
v0.2.9 (November 11th, 2016)
This is a non-breaking release with many speed improvements, and upgrade is recommended.
v0.2.9 New features
Add bam alignment files as input option
Miscellaneous
Parallelized Psi calculation. The exact number of processors can be specified with --n-jobs; the default, -1, means use as many processors as are available.
v0.2.8 (October 23rd, 2016)
Updated README/HISTORY files
v0.2.7 (October 23rd, 2016)
v0.2.7 New features
Added outrigger validate command to check for canonical splice sites by default: GT/AG (U1, major spliceosome) and AT/AC (U12, minor spliceosome). Both of these are user-adjustable as they are only the standard for mammalian genomes.
v0.2.7 API changes
Added --resume and --force options to outrigger index to prevent the overwriting of interrupted indexing operations, or to force overwriting. By default, outrigger complains and cowardly exits.
v0.2.7 Bug fixes
Support ENSEMBL gtf files, which specify chromosome names with a number, e.g. 4 instead of chr4. Thank you to lcscs12345 for pointing this out!
v0.2.7 Miscellaneous
Added version info with outrigger --version
Sped up gffutils queries and event finding by running ANALYZE on SQLite databases.
v0.2.6 (September 15th, 2016)
This is a non-breaking patch release
v0.2.6 Bug fixes
Fixed exons not being concatenated properly after parallelizing
v0.2.6 Miscellaneous
Clarified .gtf file example for directory output
v0.2.5 (September 14th, 2016)
v0.2.5 Bug fixes
Added joblib to requirements
v0.2.4 (September 14th, 2016)
This is a non-breaking patch release of outrigger.
v0.2.4 New features
Actually parallelized exon finding for novel exons. Previously the parallel code had been written, but only the non-parallelized version had been tested; the parallelized version is now actually used!
v0.2.4 Bug fixes
outrigger no longer needs the --debug flag turned on in order to run
v0.2.3 (September 13th, 2016)
This is a patch release of outrigger, with non-breaking changes from the previous one.
Bug fixes
Subfolders get copied when installing
Add test for checking that outrigger -h command works
v0.2.2 (September 12th, 2016)
This is a point release which includes the index submodule in the __all__ statement.
v0.2.1 (September 12th, 2016)
This is a point release which actually includes the requirements.txt file that specifies which packages outrigger depends on.
v0.2.0 (September 9th, 2016)
This is the second release of outrigger!
New features
Parallelized exon finding for novel exons
Added outrigger validate command to check that your new exons have proper splice sites (e.g. GT/AG and AT/AC)
Added more test data for other event types (even though we don’t detect them yet)
v0.1.0 (May 25, 2016)
This is the initial release of outrigger
Source Distribution
File details
Details for the file outrigger-1.1.1.tar.gz.
File metadata
- Download URL: outrigger-1.1.1.tar.gz
- Upload date:
- Size: 57.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 456f882f2562cab6543aec4b8e177815384b72b7b33363b6982d9fd752fe2526
MD5 | ac911a16daa78480902ae7cdcde655b0
BLAKE2b-256 | ca03860e2dda28aeba28f9fc552c6af9e9e2ff13e806a4437726c8c024c0efd2