Hierarchical non-parametric Bayesian clustering of digital expression data
DGEclust is a program for clustering and differential expression analysis of digital expression data generated by next-generation sequencing assays, such as RNA-seq, CAGE and others. It takes a table of count data as input and estimates both the number of clusters supported by the data and the parameters of each cluster. These estimates can then be used to identify differentially expressed genes and to cluster the original data matrix by gene and by sample. Internally, DGEclust models over-dispersed count data with a Hierarchical Dirichlet Process Mixture Model and fits it with a blocked Gibbs sampler for efficient Bayesian learning.
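To give a feel for the two ingredients mentioned above, here is a minimal, self-contained sketch in plain Python. It is not DGEclust's code or API: the function names `stick_breaking_weights` and `neg_binomial_log_pmf` are hypothetical, the negative binomial is assumed here as a common choice for over-dispersed counts, and the truncated stick-breaking construction is shown only because it is the standard device that makes blocked Gibbs sampling of Dirichlet process mixtures practical.

```python
import math
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking construction.

    alpha is the concentration parameter; truncation is the maximum
    number of clusters kept (both are illustrative parameters).
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a cluster
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # the last cluster takes whatever is left
    return weights


def neg_binomial_log_pmf(count, mean, dispersion):
    """Log-probability of a count under a negative binomial with
    variance = mean + dispersion * mean**2 (requires mean > 0, dispersion > 0)."""
    r = 1.0 / dispersion
    p = r / (r + mean)
    return (math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
            + r * math.log(p) + count * math.log1p(-p))


# Usage: cluster weights that sum to one, and a count distribution whose
# variance exceeds its mean (the defining feature of over-dispersion).
rng = random.Random(0)
weights = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)

mean, dispersion = 10.0, 0.5
probs = [math.exp(neg_binomial_log_pmf(k, mean, dispersion)) for k in range(2001)]
nb_total = sum(probs)
nb_mean = sum(k * q for k, q in enumerate(probs))
nb_var = sum((k - nb_mean) ** 2 * q for k, q in enumerate(probs))
```

With a Poisson model the variance would equal the mean; here the extra `dispersion * mean**2` term absorbs the additional variability typically seen between biological replicates. A hierarchical model would additionally share the cluster parameters across sample groups, which this sketch does not attempt.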
This program is part of the software collection of the [Computational Genomics Group](http://bioinformatics.bris.ac.uk/) at the University of Bristol and is under continuous development. Technical details of the statistical methods used in this software are given in the following papers:
- http://www.genomebiology.com/2015/16/1/39 (Vavoulis et al., Genome Biology 16:39, 2015)
- http://arxiv.org/abs/1301.4144 (Vavoulis & Gough, J Comput Sci Syst Biol 7:001-009, 2013)