Estimate abundances of genomic features from read densities
Rnacounter estimates abundances of genes and their different transcripts from read alignments. Exons and introns can also be quantified.
It provides fast read counting in annotated genomic features, as well as a simple yet efficient solution for quantifying isoforms from RNA-seq data. The method used is described in [<ref>]. A typical run is expected to take less than 2 minutes for a 1 GB BAM file from mouse RNA sequencing, with running time increasing linearly with BAM size.
For all these tasks it only requires a BAM file of reads aligned to the genome, and a single GTF/GFF file describing the exon structure, such as those provided by Ensembl or GenRep.
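To make the counting task concrete, here is a minimal pure-Python sketch of the two inputs at work: it reads exon records from GTF lines and counts reads overlapping each exon. It assumes Ensembl-style GTF attributes (gene_id, exon_id), ungapped reads of one fixed length on a single chromosome, and exons that are sorted and non-overlapping; the function names are hypothetical and not part of rnacounter. rnacounter itself reads BAM alignments through pysam, and its reported counts are not always integers, so its rule for reads spanning several features evidently differs from the one-count-per-exon rule used here.

```python
from bisect import bisect_right


def parse_gtf_exons(lines):
    """Collect exon records from GTF lines (Ensembl-style attributes assumed)."""
    exons = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # keep only well-formed exon lines
        attributes = {}
        for item in fields[8].split(";"):
            key, _, value = item.strip().partition(" ")
            if key:
                attributes[key] = value.strip().strip('"')
        exons.append({
            "chrom": fields[0],
            # GTF is 1-based inclusive; convert to 0-based half-open
            "start": int(fields[3]) - 1,
            "end": int(fields[4]),
            "strand": fields[6],
            "gene_id": attributes.get("gene_id"),
            "exon_id": attributes.get("exon_id"),
        })
    return exons


def count_reads_per_exon(exons, read_starts, read_len):
    """Count ungapped reads overlapping each exon on a single chromosome.

    exons must be sorted by start and non-overlapping (a simplification).
    A read touching two exons is counted once for each of them.
    """
    starts = [e["start"] for e in exons]
    counts = [0] * len(exons)
    for pos in read_starts:
        read_end = pos + read_len  # read covers [pos, read_end)
        # Index of the last exon starting before the read ends
        i = bisect_right(starts, read_end - 1) - 1
        # Sorted, non-overlapping exons have increasing ends, so stop at
        # the first exon that ends at or before the read start
        while i >= 0 and exons[i]["end"] > pos:
            counts[i] += 1
            i -= 1
    return counts
```

For example, with two exons at GTF coordinates 101-200 and 301-400 and 50 bp reads starting at 0-based positions 150, 180, 250 and 290, count_reads_per_exon returns [2, 1]: the read at 250 falls in the intron.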
rnacounter is not meant to be used as a library, but through its command-line tool “rnacounter”.
The code project is hosted on GitHub (https://github.com/delafont/rnacounter), GPL-2 licensed.
See “rnacounter --help” and the tutorial at http://bbcf.epfl.ch/bbcflib/tutorial_rnacounter.html, also available in the doc/ folder.
Basic usage:
rnacounter test.bam test.gtf
First ensure that you have numpy installed, then install rnacounter. With easy_install:
sudo easy_install numpy
sudo easy_install rnacounter
Or better yet, with pip:
sudo pip install numpy
sudo pip install rnacounter
It installs as a standard Python library, and also puts the rnacounter executable somewhere in your $PATH. Dependencies are installed automatically.
Check that it works with the test command:
It should display something similar to this:
ID                  Count          RPKM           Chrom  Start      End        Strand  GeneName  Type  Sense  Synonym
ENSMUSG00000038271  0.0            0.0            chr6   125095258  125111800  1       Iffo1     Gene  .      .
ENSMUSG00000057666  3956.87179487  434612.223694  chr6   125111870  125116485  -1      Gapdh     Gene  .      .
ENSMUSG00000038252  0.0            0.0            chr6   125118026  125141613  -1      Ncapd2    Gene  .      .
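If you want to post-process this output in Python, a minimal parser could look like the sketch below. It assumes a header row followed by one row per feature, with fields separated by tabs or spaces and no whitespace inside any field; parse_rnacounter_output and rpkm are hypothetical helper names, not part of rnacounter. The rpkm helper uses the textbook definition (reads per kilobase of feature per million mapped reads); which feature length and which read total rnacounter itself uses is not stated here, so it may not reproduce the RPKM column exactly.

```python
def parse_rnacounter_output(text):
    """Parse rnacounter's tabular output into a list of dicts.

    Assumes a header line, then one row per feature, with fields separated
    by whitespace and no whitespace inside a field.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        # Convert the numeric columns; the others stay as strings
        row["Count"] = float(row["Count"])
        row["RPKM"] = float(row["RPKM"])
        row["Start"] = int(row["Start"])
        row["End"] = int(row["End"])
        rows.append(row)
    return rows


def rpkm(count, length_bp, total_reads):
    """Textbook RPKM: count / (length in kb) / (total mapped reads in millions)."""
    return count * 1e9 / (length_bp * total_reads)
```

For instance, sorted(rows, key=lambda r: r["RPKM"], reverse=True) would list Gapdh first for the sample output above.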
To uninstall with pip:
sudo pip uninstall rnacounter
The code is fully compatible with Python 2.7 and Python 3.
Building from source:
This allows you to modify the Cython source code (rnacounter.pyx) before rebuilding.
Clone or download the repository from https://github.com/delafont/rnacounter.
You need Cython installed (pip install cython).
From the directory containing rnacounter.pyx (rnacounter/rnacounter/), run:
sudo python setup.py build_ext
This regenerates rnacounter.c from rnacounter.pyx and compiles it. Then either add the executable (rnacounter/bin/rnacounter) to your $PATH, or install from the package root (rnacounter/) with:
sudo python setup.py install
Tested with the library versions below; earlier versions may also work.
- setuptools 7.0+ (installation)
- pysam 0.7.5+ (samtools wrapper)
- numpy 1.6.2+ (efficient numeric arrays)
- scipy 0.9.0+ (NNLS algorithm)
- docopt 0.6.1+ (command-line args parsing)
- cython 0.20+ (translate Python code to C)
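To compare your environment against the versions listed above, a small script like the following sketch can help. It assumes Python 3 and that each package exposes a __version__ attribute; the import names (notably Cython for cython) and the simplified version parsing, which keeps only the leading integers of each component and ignores pre-release tags, are assumptions rather than anything rnacounter ships.

```python
import importlib

# Tested versions from the list above (not necessarily hard minimums)
MINIMUM_VERSIONS = {
    "setuptools": "7.0",
    "pysam": "0.7.5",
    "numpy": "1.6.2",
    "scipy": "0.9.0",
    "docopt": "0.6.1",
    "Cython": "0.20",
}


def version_tuple(version):
    """Turn '1.6.2' or '0.20.1rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at components such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return a status string for each required module."""
    report = {}
    for module_name, minimum in minimums.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "missing"
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[module_name] = "unknown version"
        elif version_tuple(installed) >= version_tuple(minimum):
            report[module_name] = "ok (%s)" % installed
        else:
            report[module_name] = "too old (%s < %s)" % (installed, minimum)
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print("%-10s %s" % (name, status))
```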
Testing files in the testfiles/ folder:
- gapdhKO.bam: alignment on mm9 with only Gapdh covered.
- mm9_3genes_renamed.gtf: extract of the Ensembl GTF with Gapdh, the gene before it and the gene after it.
- mm9_Gapdh_renamed.gtf: extract of the Ensembl GTF with Gapdh only.
Example run, equivalent to what the test command does:
rnacounter testfiles/gapdhKO.bam testfiles/mm9_3genes_renamed.gtf
The BAM contains 4041 reads, all aligning perfectly to Gapdh (ENSMUSG00000057666) exons: mostly to ENSMUSE00000487077, but also to ENSMUSE00000751942 and ENSMUSE00000886744. No reads fall on the other exons, which makes it a good example of badly conditioned input data…
The least-squares method assigns counts to transcripts ENSMUST00000117757, ENSMUST00000118875 and ENSMUST00000147954, and none to ENSMUST00000073605, ENSMUST00000144205 and ENSMUST00000144588.
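The dependency list above names scipy for its NNLS (non-negative least squares) algorithm. The toy example below sketches why the non-negativity constraint matters when coverage is uneven across exons. The exon-by-transcript matrix A and the exon densities b are invented for illustration, and the model ignores exon lengths and junction reads, so this is not rnacounter's exact formulation, which is described in the cited reference.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy gene: 4 exons (rows) x 3 transcripts (columns).
# A[i, j] = 1 if transcript j contains exon i.
A = np.array([
    [1.0, 1.0, 0.0],  # exon E1: in T1 and T2
    [1.0, 1.0, 0.0],  # exon E2: in T1 and T2
    [0.0, 1.0, 1.0],  # exon E3: in T2 and T3
    [0.0, 0.0, 1.0],  # exon E4: in T3 only
])
# Hypothetical read density observed on each exon
b = np.array([10.0, 10.0, 0.0, 5.0])

# Ordinary least squares: exact fit, but abundances may be negative
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

# Non-negative least squares: every abundance is constrained to be >= 0
x_nnls, residual = nnls(A, b)

print("least squares:", np.round(x_ls, 3))
print("NNLS:", np.round(x_nnls, 3), "residual norm:", round(residual, 3))
```

Here ordinary least squares fits the data exactly only by giving T2 a negative abundance (T1 = 15, T2 = -5, T3 = 5), whereas NNLS returns T1 = 10, T2 = 0 and T3 = 2.5 at the cost of a non-zero residual. Transcripts driven to exactly zero echo the result above, where several Gapdh transcripts receive no counts.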