A python3 bokeh based categorical dendrogram and heatmap plotting library.
Project description
# BokehHeat
## Abstract
Bokehheat provides a python3, bokeh based, interactive
categorical dendrogram and heatmap plotting implementation.
+ Minimal requirement: python 3.6
+ Dependencies: bokeh, pandas, scipy
+ Programmer: bue, jenny
+ Date origin: 2018-08
+ License: >= GPLv3
+ User manual: this README file
+ Source code: [https://gitlab.com/biotransistor/bokehheat](https://gitlab.com/biotransistor/bokehheat)
Available bokehheat plots are:
+ heat.cdendro: an interactive categorical dendrogram plot implementation.
+ heat.bbar: an interactive boolean bar plot implementation.
+ heat.cbar: an interactive categorical bar plot implementation.
+ heat.qbar: an interactive quantitative bar plot implementation.
+ heat.heatmap: an interactive heatmap implementation.
+ heat.clustermap: an interactive cluster heatmap implementation which combines
heat.cdendro, heat.bbar, heat.cbar, heat.qbar and heat.heatmap under the hood.
## Example Results
For the real interactive experience please clone or download this repository
and open theclustermap.html file with your favorite web browser
(we recommend [FireFox](https://www.mozilla.org/en-US/firefox/developer/)).
![heat.clustermap image](theclustermap.png)
**Figure:** This is a poor, static heat.clustermap html result screenshot.
## HowTo Guide
How to install bokehheat?
```bash
pip3 install bokehheat
```
How to load the bokehheat library?
```python
from bokehheat import heat
```
Howto get reference information about how to use each bokehheat module?
```python
from bokehheat import heat
help(heat.cdendro)
help(heat.bbar)
help(heat.cbar)
help(heat.qbar)
help(heat.heatmap)
help(heat.clustermap)
```
Howto integrate bokehheat plots into [pweave](https://github.com/mpastell/Pweave)
documents?
```python
from pweave.bokeh import output_pweave, show
output_pweave()
o_clustermap, ls_xaxis, ls_yaxis = heat.clustermap(...)
show(o_clustermap)
```
## Tutorial
This tutorial guides you through a cluster heatmap generation process.
1. Load libraries needed for this tutorial:
```python
# library
from bokehheat import heat
from bokeh.io import show
from bokeh.palettes import Reds9, YlGn8, Colorblind8
import numpy as np
import pandas as pd
```
1. Prepare data:
```python
# generate test data
ls_sample = ['sampleA','sampleB','sampleC','sampleD','sampleE','sampleF','sampleG','sampleH']
ls_variable = ['geneA','geneB','geneC','geneD','geneE','geneF','geneG','geneH', 'geneI']
ar_z = np.random.rand(8,9)
df_matrix = pd.DataFrame(ar_z)
df_matrix.index = ls_sample
df_matrix.columns = ls_variable
df_matrix.index.name = 'y'
df_matrix.columns.name = 'x'
# generate some sample annotation
df_sample = pd.DataFrame({
'y': ls_sample,
'age_year': list(np.random.randint(0,101, 8)),
'sampletype': ['LumA','LumA','LumA','LumB','LumB','Basal','Basal','Basal'],
'sampletype_color': ['Cyan','Cyan','Cyan','Blue','Blue','Red','Red','Red'],
})
df_sample.index = df_sample.y
# generate some gene annotation
df_variable = pd.DataFrame({
'x': ls_variable,
'genereal': list(np.random.random(9) * 2 - 1),
'genetype': ['Lig','Lig','Lig','Lig','Lig','Lig','Rec','Rec','Rec'],
'genetype_color': ['Yellow','Yellow','Yellow','Yellow','Yellow','Yellow','Brown','Brown','Brown'],
})
df_variable.index = df_variable.x
```
1. Generate categorical and quantitative sample and gene
annotation tuple of tuples:
```python
t_ycat = (df_sample, ['sampletype'], ['sampletype_color'])
t_yquant = (df_sample, ['age_year'], [0], [128], [YlGn8])
t_xcat = (df_variable, ['genetype'], ['genetype_color'])
t_xquant = (df_variable, ['genereal'], [-1], [1], [Colorblind8])
tt_catquant = (t_ycat, t_yquant, t_xquant, t_xcat)
```
1. Generate the cluster heatmap:
```python
s_file = "theclustermap.html"
o_clustermap, ls_xaxis, ls_yaxis = heat.clustermap(
df_matrix = df_matrix,
ls_color_palette = Reds9,
r_low = 0,
r_high = 1,
s_z = "log2",
tt_axis_annot = tt_catquant,
b_ydendo = True,
b_xdendo = True,
#s_method='single',
#s_metric='euclidean',
#b_optimal_ordering=True,
#i_px = 80,
#i_height = 8,
#i_width = 8,
s_filename=s_file,
s_filetitel="the Clustermap",
)
```
1. Display the result:
```python
print(f"check out: {s_file}")
print(f"y axis is: {ls_yaxis}")
print(f"x axis is: {ls_xaxis}")
show(o_clustermap)
```
The resulting clustermap should look something like the example result
in the section above.
## Discussion
In bioinformatics a clustered heatmap is a common plot to present
gene expression data from many patient samples.
There are well established open source clustering software kits like
[Cluster and TreeView](http://bonsai.hgc.jp/%7Emdehoon/software/cluster/index.html)
for producing and investigating such heatmaps.
### Static cluster heaptmap implementations
There exist a wealth of
[R](https://cran.r-project.org/) and R/[bioconductor](https://www.bioconductor.org/)
packages with static cluster heatmaps functions (e.g. heatmap.2 from the gplots library),
each one with his own pros and cons.
In Python the static cluster heatmap landscape looks much more deserted.
There are some ancient [mathplotlib](https://matplotlib.org/) based implementations
like this [active state recipe](https://code.activestate.com/recipes/578175-hierarchical-clustering-heatmap-python/)
or the [heatmapcluster](https://github.com/WarrenWeckesser/heatmapcluster) library.
There is the [seaborn clustermap](https://seaborn.pydata.org/generated/seaborn.clustermap.html) implementation,
which looks good but might need hours of tweaking to get an agreeable plot with all the needed information out.
So, static heatmaps are not really a tool for exploring data.
### Interactive cluster heatmap implementations
There exist d3heatmap a R/d3.js based interactive cluster heatmap packages.
And heatmaply, a R/plotly based package.
Or on a more basic level R/plotly based cluster heatmaps can be written
with the ggdendro and ggplot2 library.
But I have not found a full fledged python based interactive cluster heatmap library.
Neither Python/[plottly](https://plot.ly/) nor Python/[bokeh](https://bokeh.pydata.org/en/latest/) based.
The only Python/bokeh based cluster heatmap implementation I found was this
[listing](https://russodanielp.github.io/plotting-a-heatmap-with-a-dendrogram-using-bokeh.html)
from Daniel Russo.
### Synopsis
All in all, all of this implementations were not really what I was looking for.
That is why I rolled my own.
Bokehheat is a Python/[bokeh](https://bokeh.pydata.org/en/latest/) based interactive cluster heatmap library.
The challenges this implementation tried to solve are,
the library should be:
+ easy to use with [pandas](https://pandas.pydata.org/) datafarmes.
+ interactive, this means the results should be hover and zoomable plots.
+ output should be in computer platform independent and easy accessible format,
like java script spiced up html file, which can be opened in any webbrowser.
+ possibility to add as many categorical and quantitative y and x annotation bars as wished.
+ possibility to cluster y and/or x axis.
+ snappy interactivity, even with big datasets with lot of samples and genes.
(It turns out bokehheat is ok with hundreds of samples and genes but noth with thousends.)
#### Future directions
An [altair](https://altair-viz.github.io/) based cluster heatmap implementation.
I think that this will be the future. Check out Jake VanderPlas talk
[Python Visualization Landscape](https://www.youtube.com/watch?v=FytuB8nFHPQ)
from the PyCon 2017 in Portland Oregon (USA).
## Contributions
+ Implementation: Elmar Bucher
+ Documentation: Jennifer Eng, Elmar Bucher
+ Helpfull discussion: Mark Dane, Daniel Derrick, Hongmei Zhang,
Annette Kolodize, Jim Korkola, Laura Heiser
## Abstract
Bokehheat provides a python3, bokeh based, interactive
categorical dendrogram and heatmap plotting implementation.
+ Minimal requirement: python 3.6
+ Dependencies: bokeh, pandas, scipy
+ Programmer: bue, jenny
+ Date origin: 2018-08
+ License: >= GPLv3
+ User manual: this README file
+ Source code: [https://gitlab.com/biotransistor/bokehheat](https://gitlab.com/biotransistor/bokehheat)
Available bokehheat plots are:
+ heat.cdendro: an interactive categorical dendrogram plot implementation.
+ heat.bbar: an interactive boolean bar plot implementation.
+ heat.cbar: an interactive categorical bar plot implementation.
+ heat.qbar: an interactive quantitative bar plot implementation.
+ heat.heatmap: an interactive heatmap implementation.
+ heat.clustermap: an interactive cluster heatmap implementation which combines
heat.cdendro, heat.bbar, heat.cbar, heat.qbar and heat.heatmap under the hood.
## Example Results
For the real interactive experience please clone or download this repository
and open theclustermap.html file with your favorite web browser
(we recommend [FireFox](https://www.mozilla.org/en-US/firefox/developer/)).
![heat.clustermap image](theclustermap.png)
**Figure:** This is a poor, static heat.clustermap html result screenshot.
## HowTo Guide
How to install bokehheat?
```bash
pip3 install bokehheat
```
How to load the bokehheat library?
```python
from bokehheat import heat
```
Howto get reference information about how to use each bokehheat module?
```python
from bokehheat import heat
help(heat.cdendro)
help(heat.bbar)
help(heat.cbar)
help(heat.qbar)
help(heat.heatmap)
help(heat.clustermap)
```
Howto integrate bokehheat plots into [pweave](https://github.com/mpastell/Pweave)
documents?
```python
from pweave.bokeh import output_pweave, show
output_pweave()
o_clustermap, ls_xaxis, ls_yaxis = heat.clustermap(...)
show(o_clustermap)
```
## Tutorial
This tutorial guides you through a cluster heatmap generation process.
1. Load libraries needed for this tutorial:
```python
# library
from bokehheat import heat
from bokeh.io import show
from bokeh.palettes import Reds9, YlGn8, Colorblind8
import numpy as np
import pandas as pd
```
1. Prepare data:
```python
# generate test data
ls_sample = ['sampleA','sampleB','sampleC','sampleD','sampleE','sampleF','sampleG','sampleH']
ls_variable = ['geneA','geneB','geneC','geneD','geneE','geneF','geneG','geneH', 'geneI']
ar_z = np.random.rand(8,9)
df_matrix = pd.DataFrame(ar_z)
df_matrix.index = ls_sample
df_matrix.columns = ls_variable
df_matrix.index.name = 'y'
df_matrix.columns.name = 'x'
# generate some sample annotation
df_sample = pd.DataFrame({
'y': ls_sample,
'age_year': list(np.random.randint(0,101, 8)),
'sampletype': ['LumA','LumA','LumA','LumB','LumB','Basal','Basal','Basal'],
'sampletype_color': ['Cyan','Cyan','Cyan','Blue','Blue','Red','Red','Red'],
})
df_sample.index = df_sample.y
# generate some gene annotation
df_variable = pd.DataFrame({
'x': ls_variable,
'genereal': list(np.random.random(9) * 2 - 1),
'genetype': ['Lig','Lig','Lig','Lig','Lig','Lig','Rec','Rec','Rec'],
'genetype_color': ['Yellow','Yellow','Yellow','Yellow','Yellow','Yellow','Brown','Brown','Brown'],
})
df_variable.index = df_variable.x
```
1. Generate categorical and quantitative sample and gene
annotation tuple of tuples:
```python
t_ycat = (df_sample, ['sampletype'], ['sampletype_color'])
t_yquant = (df_sample, ['age_year'], [0], [128], [YlGn8])
t_xcat = (df_variable, ['genetype'], ['genetype_color'])
t_xquant = (df_variable, ['genereal'], [-1], [1], [Colorblind8])
tt_catquant = (t_ycat, t_yquant, t_xquant, t_xcat)
```
1. Generate the cluster heatmap:
```python
s_file = "theclustermap.html"
o_clustermap, ls_xaxis, ls_yaxis = heat.clustermap(
df_matrix = df_matrix,
ls_color_palette = Reds9,
r_low = 0,
r_high = 1,
s_z = "log2",
tt_axis_annot = tt_catquant,
b_ydendo = True,
b_xdendo = True,
#s_method='single',
#s_metric='euclidean',
#b_optimal_ordering=True,
#i_px = 80,
#i_height = 8,
#i_width = 8,
s_filename=s_file,
s_filetitel="the Clustermap",
)
```
1. Display the result:
```python
print(f"check out: {s_file}")
print(f"y axis is: {ls_yaxis}")
print(f"x axis is: {ls_xaxis}")
show(o_clustermap)
```
The resulting clustermap should look something like the example result
in the section above.
## Discussion
In bioinformatics a clustered heatmap is a common plot to present
gene expression data from many patient samples.
There are well established open source clustering software kits like
[Cluster and TreeView](http://bonsai.hgc.jp/%7Emdehoon/software/cluster/index.html)
for producing and investigating such heatmaps.
### Static cluster heaptmap implementations
There exist a wealth of
[R](https://cran.r-project.org/) and R/[bioconductor](https://www.bioconductor.org/)
packages with static cluster heatmaps functions (e.g. heatmap.2 from the gplots library),
each one with his own pros and cons.
In Python the static cluster heatmap landscape looks much more deserted.
There are some ancient [mathplotlib](https://matplotlib.org/) based implementations
like this [active state recipe](https://code.activestate.com/recipes/578175-hierarchical-clustering-heatmap-python/)
or the [heatmapcluster](https://github.com/WarrenWeckesser/heatmapcluster) library.
There is the [seaborn clustermap](https://seaborn.pydata.org/generated/seaborn.clustermap.html) implementation,
which looks good but might need hours of tweaking to get an agreeable plot with all the needed information out.
So, static heatmaps are not really a tool for exploring data.
### Interactive cluster heatmap implementations
There exist d3heatmap a R/d3.js based interactive cluster heatmap packages.
And heatmaply, a R/plotly based package.
Or on a more basic level R/plotly based cluster heatmaps can be written
with the ggdendro and ggplot2 library.
But I have not found a full fledged python based interactive cluster heatmap library.
Neither Python/[plottly](https://plot.ly/) nor Python/[bokeh](https://bokeh.pydata.org/en/latest/) based.
The only Python/bokeh based cluster heatmap implementation I found was this
[listing](https://russodanielp.github.io/plotting-a-heatmap-with-a-dendrogram-using-bokeh.html)
from Daniel Russo.
### Synopsis
All in all, all of this implementations were not really what I was looking for.
That is why I rolled my own.
Bokehheat is a Python/[bokeh](https://bokeh.pydata.org/en/latest/) based interactive cluster heatmap library.
The challenges this implementation tried to solve are,
the library should be:
+ easy to use with [pandas](https://pandas.pydata.org/) datafarmes.
+ interactive, this means the results should be hover and zoomable plots.
+ output should be in computer platform independent and easy accessible format,
like java script spiced up html file, which can be opened in any webbrowser.
+ possibility to add as many categorical and quantitative y and x annotation bars as wished.
+ possibility to cluster y and/or x axis.
+ snappy interactivity, even with big datasets with lot of samples and genes.
(It turns out bokehheat is ok with hundreds of samples and genes but noth with thousends.)
#### Future directions
An [altair](https://altair-viz.github.io/) based cluster heatmap implementation.
I think that this will be the future. Check out Jake VanderPlas talk
[Python Visualization Landscape](https://www.youtube.com/watch?v=FytuB8nFHPQ)
from the PyCon 2017 in Portland Oregon (USA).
## Contributions
+ Implementation: Elmar Bucher
+ Documentation: Jennifer Eng, Elmar Bucher
+ Helpfull discussion: Mark Dane, Daniel Derrick, Hongmei Zhang,
Annette Kolodize, Jim Korkola, Laura Heiser
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
bokehheat-0.0.3.tar.gz
(15.7 kB
view hashes)
Built Distribution
bokehheat-0.0.3-py3-none-any.whl
(11.6 kB
view hashes)
Close
Hashes for bokehheat-0.0.3-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 280e40d728299b055f5a73a5f41ac3859c2972c8e0608f74111069f83268b40d |
|
MD5 | 18b06fc4d108596f749898417e08a0a4 |
|
BLAKE2b-256 | a5c7f87c5297a9db1c214b7facbe084c76fe56e255925c843f9d89ee45b0dfda |