Normalization of RNA-seq gene expression.
rnanorm supports the following normalization methods for RNA-seq gene expression data:
Counts per million (CPM)
Transcripts per million (TPM)
Quantile normalization to the average distribution
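The CPM and TPM definitions can be sketched in NumPy as follows. This is a minimal illustration, not rnanorm's implementation or API: the function names are hypothetical, and the sketch assumes a count matrix with genes as rows and samples as columns, plus gene lengths in base pairs.

```python
import numpy as np


def cpm(counts):
    """Counts per million: scale each sample (column) so it sums to 1e6."""
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)  # total counts per sample
    return counts / library_sizes * 1e6


def tpm(counts, gene_lengths):
    """Transcripts per million: divide by gene length, then scale columns to 1e6."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths, dtype=float) / 1e3
    rate = counts / lengths_kb[:, np.newaxis]  # reads per kilobase, per gene
    return rate / rate.sum(axis=0) * 1e6


# Toy matrix: 3 genes (rows) x 2 samples (columns); lengths in base pairs.
counts = np.array([[10, 20], [30, 0], [60, 80]])
lengths = np.array([1000, 2000, 500])
print(cpm(counts))
print(tpm(counts, lengths))
```

Because every TPM column is rescaled to sum to one million, the choice of kilobases versus bases for the length unit cancels out; what matters is that longer genes are down-weighted before the per-sample scaling.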
TPM normalization either accepts pre-computed gene lengths as input or computes gene lengths from a gene annotation in GTF format, using the union-exon approach. The computed gene lengths are identical to the lengths reported by featureCounts (validated on Ensembl and UCSC annotations for Homo sapiens, Mus musculus, Rattus norvegicus, and Macaca mulatta).
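The union-exon idea can be sketched in plain Python: collect every exon of a gene, merge overlapping intervals, and sum the lengths of the merged blocks. This is an illustrative sketch only, not rnanorm's code, and it is not claimed to reproduce featureCounts on every annotation. It assumes that genes are identified by the `gene_id` attribute, that only records with feature type `exon` count, and that exons on different chromosomes are never merged; `union_exon_lengths` is a hypothetical name.

```python
import re
from collections import defaultdict

# Regex for the gene_id attribute in GTF column 9 (assumed attribute name).
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def union_exon_lengths(gtf_lines):
    """Map each gene_id to the total length of the union of its exons."""
    exons = defaultdict(list)  # gene_id -> [(chrom, start, end), ...]
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records contribute to gene length
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue
        # GTF coordinates are 1-based and inclusive on both ends.
        exons[match.group(1)].append((fields[0], int(fields[3]), int(fields[4])))

    lengths = {}
    for gene_id, intervals in exons.items():
        intervals.sort()  # by chromosome, then start, then end
        total = 0
        chrom, cur_start, cur_end = intervals[0]
        for c, start, end in intervals[1:]:
            if c == chrom and start <= cur_end + 1:
                cur_end = max(cur_end, end)  # overlapping or adjacent: extend block
            else:
                total += cur_end - cur_start + 1  # close the current block
                chrom, cur_start, cur_end = c, start, end
        total += cur_end - cur_start + 1
        lengths[gene_id] = total
    return lengths


# Toy GTF records (tab-separated); G1 has two overlapping exons.
gtf = [
    "#!genome-build toy\n",
    'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t150\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
    'chr1\tsrc\texon\t401\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr2\tsrc\texon\t1\t50\t.\t-\t.\tgene_id "G2"; transcript_id "T3";\n',
]
print(union_exon_lengths(gtf))  # {'G1': 301, 'G2': 50}
```

Here exons 100-200 and 150-300 of G1 merge into one 201 bp block, which together with the separate 100 bp exon gives 301 bp. Summing the raw exon lengths instead would count the overlap twice.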
Quantile normalization is implemented as described on Wikipedia. First, we compute the average distribution by sorting each sample (column) independently and taking the mean across samples in each row; these means are the rank values. Second, we rank the values within each column (sample) and replace each value with the rank value for its rank (the average expression at that rank).
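The two steps can be sketched in NumPy, again assuming genes as rows and samples as columns. This is a minimal illustration with a hypothetical function name, not rnanorm's implementation. In particular, it breaks ties by original row order (stable sort), whereas the Wikipedia description assigns tied values the mean of their rank values, so results may differ when a sample contains ties.

```python
import numpy as np


def quantile_normalize(expr):
    """Quantile-normalize samples (columns) to their average distribution."""
    expr = np.asarray(expr, dtype=float)
    # Step 1: sort each column, then average across columns row by row.
    # rank_values[i] is the mean of the i-th smallest value of every sample.
    rank_values = np.sort(expr, axis=0).mean(axis=1)
    # Step 2: rank values within each column (0 = smallest); ties keep row order.
    ranks = np.argsort(np.argsort(expr, axis=0, kind="stable"), axis=0, kind="stable")
    # Replace every value with the rank value for its rank.
    return rank_values[ranks]


# Toy matrix: 4 genes (rows) x 3 samples (columns), no ties within a column.
expr = np.array([[5.0, 4.0, 3.0],
                 [2.0, 1.0, 4.0],
                 [3.0, 4.5, 6.0],
                 [4.0, 2.0, 8.0]])
print(quantile_normalize(expr))
```

After normalization every column contains the same set of values (the rank values 2, 3, 14/3 and 35/6 in this toy case); only their order within each sample is preserved.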
Usage
Install the rnanorm Python package:
pip install rnanorm
See the rnanorm command help:
rnanorm --help
Run rnanorm with pre-computed gene lengths:
rnanorm expr.tsv --cpm-output=expr.cpm.tsv --tpm-output=expr.tpm.tsv --gene-lengths=lengths.tsv
Run rnanorm with a genome annotation (gene lengths are computed on the fly):
rnanorm expr.tsv --cpm-output=expr.cpm.tsv --tpm-output=expr.tpm.tsv --annotation=annot.gtf
For quantile normalization, we suggest using TPM-normalized expression as input:
rnanorm expr.tpm.tsv --quantile-output=expr.quantile.tsv
Contributing
Install the rnanorm Python package for development:
flit install --deps=all --symlink
Run all tests and linters:
tox