Python package for testing strandedness of RNA-Seq fastq files
Ever get RNA-Seq data where the library prep or strandedness has been omitted in the methods?
This package should save you some headaches later in your pipeline and analysis, when you realise you’ve used the wrong strandedness setting (RF/fr-firststrand, FR/fr-secondstrand or unstranded).
Requirements
how_are_we_stranded_here requires the following packages be installed:
kallisto == 0.44.x
python >= 3.6.0
RSeQC
It also requires a transcriptome annotation (a .fasta file, e.g. Ensembl’s .cdna.fasta, or a prebuilt kallisto index) and a corresponding GTF file.
Sometimes pseudoalignments will not work with newer versions of kallisto. If this is an issue, we suggest downgrading to 0.44.0.
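A quick way to confirm the requirements above before running anything is sketched below. It is only an illustration: the helper name `missing_requirements` is hypothetical, and it assumes the external tools are installed on your PATH under the executable names `kallisto` and `infer_experiment.py`. It does not check the kallisto version.

```python
import shutil
import sys

# Assumption: these are the executable names the tools are installed under.
REQUIRED_TOOLS = ("kallisto", "infer_experiment.py")


def missing_requirements(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return a list of problems found; the list is empty if all checks pass."""
    problems = []
    # The README asks for python >= 3.6.0.
    if sys.version_info < (3, 6):
        problems.append("python >= 3.6.0 is required")
    # Each external tool must be discoverable on PATH.
    for tool in tools:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


for problem in missing_requirements():
    print(problem)
```

If nothing is printed, the interpreter and both executables were found.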
Installation
pip install how_are_we_stranded_here
Usage
For basic usage, run check_strandedness with a GTF transcript annotation, a transcripts FASTA file and the FASTQ read files from one sample.
check_strandedness --gtf Yeast.gtf --transcripts Yeast_cdna.fasta --reads_1 Sample_A_1.fq.gz --reads_2 Sample_A_2.fq.gz
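Since the command takes reads from one sample at a time, checking several samples means one call per sample. The sketch below builds those calls using only the four flags shown above. The helper names (`pair_fastqs`, `check_strandedness_cmd`, `run_all`) are hypothetical, and the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming scheme is an assumption taken from the example file names.

```python
import subprocess
from pathlib import Path


def pair_fastqs(files):
    """Group FASTQ paths into {sample: (reads_1, reads_2)}.

    Assumes files are named <sample>_1.fq.gz and <sample>_2.fq.gz;
    samples missing either mate are dropped.
    """
    pairs = {}
    for f in sorted(files):
        name = Path(f).name
        for mate, suffix in ((0, "_1.fq.gz"), (1, "_2.fq.gz")):
            if name.endswith(suffix):
                sample = name[: -len(suffix)]
                pairs.setdefault(sample, [None, None])[mate] = str(f)
    return {s: tuple(p) for s, p in pairs.items() if None not in p}


def check_strandedness_cmd(gtf, transcripts, reads_1, reads_2):
    """Build the argument list for one check_strandedness call."""
    return [
        "check_strandedness",
        "--gtf", gtf,
        "--transcripts", transcripts,
        "--reads_1", reads_1,
        "--reads_2", reads_2,
    ]


def run_all(fastq_dir, gtf, transcripts):
    """Run check_strandedness once per paired sample found in fastq_dir."""
    for sample, (r1, r2) in pair_fastqs(Path(fastq_dir).glob("*.fq.gz")).items():
        print(f"== {sample} ==")
        subprocess.run(check_strandedness_cmd(gtf, transcripts, r1, r2), check=True)
```

Calling `run_all(".", "Yeast.gtf", "Yeast_cdna.fasta")` would then process every complete pair in the current directory.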
Output
check_strandedness will print to console the results of infer_experiment.py (http://rseqc.sourceforge.net/#infer-experiment-py), along with an interpretation.
checking strandedness
Reading reference gene model stranded_test_WT_yeast_rep1_1_val_1_1/Saccharomyces_cerevisiae.R64-1-1.98.bed ... Done
Loading SAM/BAM file ... Total 20000 usable reads were sampled
This is PairEnd Data
Fraction of reads failed to determine: 0.0595
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0073 (0.8% of explainable reads)
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9332 (99.2% of explainable reads)
Over 90% of reads explained by "1+-,1-+,2++,2--"
Data is likely RF/fr-firststrand
Any intermediate files are written to a folder in your current working directory derived from the name of the reads_1 file.
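The interpretation in the sample output can be reproduced from the two "Fraction of reads explained by" lines. The following is a minimal sketch, not the package's own code: the function name `interpret` is hypothetical, the 90% cutoff is taken from the sample output, and the 40–60% band used to call data unstranded is an assumption. It handles paired-end output only.

```python
import re

# Matches lines such as:
#   Fraction of reads explained by "1+-,1-+,2++,2--": 0.9332 (99.2% ...)
LINE_RE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')

FR_KEY = "1++,1--,2+-,2-+"  # read 1 on the transcript strand
RF_KEY = "1+-,1-+,2++,2--"  # read 1 on the opposite strand


def interpret(report, stranded_cutoff=0.9, unstranded_band=(0.4, 0.6)):
    """Classify paired-end infer_experiment.py output text."""
    fractions = {m.group(1): float(m.group(2)) for m in LINE_RE.finditer(report)}
    fr = fractions.get(FR_KEY, 0.0)
    rf = fractions.get(RF_KEY, 0.0)
    explainable = fr + rf
    if explainable == 0:
        return "undetermined"
    # Shares are computed over explainable reads only, as in the sample output.
    fr_share = fr / explainable
    rf_share = rf / explainable
    if rf_share > stranded_cutoff:
        return "RF/fr-firststrand"
    if fr_share > stranded_cutoff:
        return "FR/fr-secondstrand"
    low, high = unstranded_band  # assumed band, not taken from the README
    if low <= fr_share <= high:
        return "unstranded"
    return "undetermined"
```

For the sample above, 0.9332 / (0.0073 + 0.9332) is about 0.992, which is over the 90% cutoff, so the sketch also reports RF/fr-firststrand.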
How it Works
check_strandedness.py runs a series of commands to check which direction reads align once mapped to transcripts.
It first creates a kallisto index (or uses a pre-made index) of your organism’s transcriptome.
It then maps a small subset of reads (default 200000) to the transcriptome, and uses kallisto’s --genomebam argument to project pseudoalignments to a genome-sorted BAM file.
It finally runs RSeQC’s infer_experiment.py to check which direction the first and second reads of each pair are aligned in relation to the transcript strand, and provides output with the likely strandedness of your data.
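The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, the subset is assumed to be simply the first n records of each FASTQ file, the BED gene model passed to infer_experiment.py is assumed to have been prepared from the GTF separately, and the BAM path assumes kallisto writes `pseudoalignments.bam` into its output folder.

```python
import gzip
from itertools import islice


def subsample_fastq(src, dst, n_reads=200000):
    """Copy the first n_reads records (4 lines each) of a gzipped FASTQ.

    Assumption: taking the head of the file is an acceptable subset.
    """
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(islice(fin, 4 * n_reads))


def pipeline_commands(transcripts, gtf, bed, reads_1, reads_2, workdir):
    """Return the three external commands as argument lists, in run order."""
    index = f"{workdir}/transcripts.idx"
    kallisto_out = f"{workdir}/kallisto"
    return [
        # 1. Build a kallisto index from the transcriptome FASTA.
        ["kallisto", "index", "-i", index, transcripts],
        # 2. Pseudoalign the subset and project onto the genome as a BAM.
        ["kallisto", "quant", "-i", index, "-o", kallisto_out,
         "--genomebam", "--gtf", gtf, reads_1, reads_2],
        # 3. Infer strandedness from the BAM against a BED gene model.
        ["infer_experiment.py", "-r", bed,
         "-i", f"{kallisto_out}/pseudoalignments.bam"],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in turn, after `subsample_fastq` has been applied to both read files.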