A python3 bokeh based categorical dendrogram and heatmap plotting library.
# BokehHeat
## Abstract
Bokehheat provides a python3, bokeh based, interactive
categorical dendrogram and heatmap plotting implementation.
+ Minimal requirement: python 3.6
+ Dependencies: bokeh, pandas, scipy
+ Programmer: bue, jenny
+ Date origin: 2018-08
+ License: >= GPLv3
+ User manual: this README file
+ Result example: [clustermap](theclustermap.html) plot
+ Source code: [https://gitlab.com/biotransistor/bokehheat](https://gitlab.com/biotransistor/bokehheat)
Available bokehheat plots are:
+ heat.cdendro: an interactive categorical dendrogram plot implementation.
+ heat.cabar: an interactive categorical bar plot implementation.
+ heat.qabar: an interactive quantitative bar plot implementation.
+ heat.heatmap: an interactive heatmap implementation.
+ heat.clustermap: an interactive cluster heatmap implementation which combines
heat.cdendro, heat.cabar, heat.qabar and heat.heatmap under the hood.
## HowTo Guide
How to install bokehheat?
```
pip install bokehheat
```
How to load the bokehheat library?
```
from bokehheat import heat
```
How to get reference information about how to use each bokehheat module?
```
from bokehheat import heat
help(heat.cdendro)
help(heat.cabar)
help(heat.qabar)
help(heat.heatmap)
help(heat.clustermap)
```
## Tutorial
This tutorial guides you through a cluster heatmap generation process.
1. Load libraries needed for this tutorial:
```
# library
from bokehheat import heat
from bokeh.io import show  # needed for show(o_clustermap) in the last step
from bokeh.palettes import Reds9, YlGn8, Colorblind8
import numpy as np
import pandas as pd
```
1. Prepare data:
```
# generate test data
ls_sample = ['sampleA','sampleB','sampleC','sampleD','sampleE','sampleF','sampleG','sampleH']
ls_variable = ['geneA','geneB','geneC','geneD','geneE','geneF','geneG','geneH', 'geneI']
ar_z = np.random.rand(8,9)
df_matrix = pd.DataFrame(ar_z)
df_matrix.index = ls_sample
df_matrix.columns = ls_variable
df_matrix.index.name = 'y'
df_matrix.columns.name = 'x'
# generate some sample annotation
df_sample = pd.DataFrame({
'y': ls_sample,
'age_year': list(np.random.randint(0,101, 8)),
'sampletype': ['LumA','LumA','LumA','LumB','LumB','Basal','Basal','Basal'],
'sampletype_color': ['Cyan','Cyan','Cyan','Blue','Blue','Red','Red','Red'],
})
df_sample.index = df_sample.y
# generate some gene annotation
df_variable = pd.DataFrame({
'x': ls_variable,
'genereal': list(np.random.random(9) * 2 - 1),
'genetype': ['Lig','Lig','Lig','Lig','Lig','Lig','Rec','Rec','Rec'],
'genetype_color': ['Yellow','Yellow','Yellow','Yellow','Yellow','Yellow','Brown','Brown','Brown'],
})
df_variable.index = df_variable.x
```
1. Generate categorical and quantitative sample and gene
annotation tuple of tuples:
```
t_ycat = (df_sample, ['sampletype'], ['sampletype_color'])
t_yquant = (df_sample, ['age_year'], [0], [128], [YlGn8])
t_xcat = (df_variable, ['genetype'], ['genetype_color'])
t_xquant = (df_variable, ['genereal'], [-1], [1], [Colorblind8])
tt_catquant = (t_ycat, t_yquant, t_xquant, t_xcat)
```
1. Generate the cluster heatmap:
```
s_file = "theclustermap.html"
# clustermap lives in the heat module imported above
o_clustermap, ls_xaxis, ls_yaxis = heat.clustermap(
    df_matrix = df_matrix,
    ls_color_palette = Reds9,
    r_low = 0,
    r_high = 1,
    s_z = "log2",
    tt_axis_annot = tt_catquant,
    b_ydendo = True,
    b_xdendo = True,
    #s_method='single',
    #s_metric='euclidean',
    #b_optimal_ordering=True,
    #i_px = 80,
    s_filename=s_file,
    s_filetitel="the Clustermap",
)
```
1. Display the result:
```
print(f"check out: {s_file}")
print(f"y axis is: {ls_yaxis}")
print(f"x axis is: {ls_xaxis}")
show(o_clustermap)
```
The resulting clustermap should look something like [this](theclustermap.html).
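The tutorial returns the clustered axis labels as `ls_xaxis` and `ls_yaxis`. The sketch below is not bokehheat code. It only illustrates, under the assumption that the clustering follows scipy's `linkage` (scipy is a listed dependency, and the commented-out `s_method`, `s_metric` and `b_optimal_ordering` options have the same names as scipy's `method`, `metric` and `optimal_ordering`), how such an axis order can be derived and applied to a dataframe. The helper `cluster_order` and the `df_demo` matrix are hypothetical names introduced for this example.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage

# small reproducible matrix: rows are samples (y), columns are genes (x)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(
    rng.random((4, 3)),
    index=['sampleA', 'sampleB', 'sampleC', 'sampleD'],
    columns=['geneA', 'geneB', 'geneC'],
)

def cluster_order(df, s_method='single', s_metric='euclidean'):
    """Hypothetical helper: return the row labels of df in dendrogram leaf order."""
    ar_link = linkage(
        df.values,
        method=s_method,
        metric=s_metric,
        optimal_ordering=True,
    )
    return list(df.index[leaves_list(ar_link)])

ls_yorder = cluster_order(df_demo)    # cluster the samples
ls_xorder = cluster_order(df_demo.T)  # cluster the genes

# reorder the matrix the way a clustered heatmap would display it
df_ordered = df_demo.loc[ls_yorder, ls_xorder]
print(df_ordered)
```

If `ls_yaxis` and `ls_xaxis` hold the axis labels in plotting order, as the print statements in the last tutorial step suggest, the same `df_matrix.loc[ls_yaxis, ls_xaxis]` indexing should give the matrix in displayed order.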
<!--
bue 2018-08-29: would be good to have a png from the result in the readme markdown document


-->
## Discussion
In bioinformatics a clustered heatmap is a common plot to present gene expression data
from many patient samples.
There are well established open source clustering software kits like
[Cluster and TreeView](http://bonsai.hgc.jp/%7Emdehoon/software/cluster/index.html)
for producing and investigating such heatmaps.
There exist a wealth of
[R](https://cran.r-project.org/) and R/[bioconductor](https://www.bioconductor.org/)
packages that do this (e.g. heatmap.2), each one with its own pros and cons.
In Python the cluster heatmap landscape looks much more deserted.
There are some ancient [matplotlib](https://matplotlib.org/) based implementations
like this [active state recipe](https://code.activestate.com/recipes/578175-hierarchical-clustering-heatmap-python/)
or the [heatmapcluster](https://github.com/WarrenWeckesser/heatmapcluster) library.
There is the [seaborn clustermap](https://seaborn.pydata.org/generated/seaborn.clustermap.html) implementation,
which looks good but might need hours of tweaking to get a static plot with all the needed information out.
So it is not really a tool for exploring data.
There are R based interactive heatmaps like d3heatmap and
R/plotly based implementations like ggplot2 and heatmaply.
But I have not found any Python based interactive clustermap library,
neither Python/[plotly](https://plot.ly/) nor Python/[bokeh](https://bokeh.pydata.org/en/latest/) based.
The only Python/bokeh based implementation I found was this
[listing](https://russodanielp.github.io/plotting-a-heatmap-with-a-dendrogram-using-bokeh.html)
from Daniel Russo.
All in all, none of these implementations were really what I was looking for.
That is why I rolled my own.
Bokehheat is a Python/[bokeh](https://bokeh.pydata.org/en/latest/) based interactive cluster heatmap library.
The library tries to meet the following goals:
+ easy to use with [pandas](https://pandas.pydata.org/) dataframes.
+ interactive: the resulting plots should support hovering and zooming.
+ output in a platform-independent and easily accessible format, such as a JavaScript-enhanced HTML file,
which can be opened in any web browser.
+ any number of categorical and quantitative annotation bars can be added on the y and x axes.
+ clustering of the y and/or x axis is possible.
+ snappy interactivity, even with big datasets with thousands of samples and genes.
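To make the goals about hovering, zooming and HTML output concrete, here is a minimal sketch of a plain bokeh heatmap written to a standalone HTML file. It is not the bokehheat implementation. It uses only generic bokeh calls, and the names `df_demo`, `df_long`, `o_plot` and the output file `sketch_heatmap.html` are made up for this example.

```python
import os

import numpy as np
import pandas as pd
from bokeh.io import output_file, save
from bokeh.models import ColumnDataSource, LinearColorMapper
from bokeh.palettes import Reds9
from bokeh.plotting import figure
from bokeh.transform import transform

# small demo matrix: rows are samples (y), columns are genes (x)
ls_y = ['sampleA', 'sampleB', 'sampleC']
ls_x = ['geneA', 'geneB', 'geneC', 'geneD']
df_demo = pd.DataFrame(np.random.rand(3, 4), index=ls_y, columns=ls_x)

# long format: one row per heatmap cell with the columns y, x and z
df_long = pd.DataFrame(
    [(s_y, s_x, df_demo.loc[s_y, s_x]) for s_y in ls_y for s_x in ls_x],
    columns=['y', 'x', 'z'],
)
o_source = ColumnDataSource(df_long)

# map z values in [0, 1] to colors, light for low and dark for high values
o_mapper = LinearColorMapper(palette=Reds9[::-1], low=0, high=1)

# categorical axes plus hover and zoom tools
o_plot = figure(
    x_range=ls_x,
    y_range=ls_y,
    tools="hover,pan,wheel_zoom,box_zoom,reset,save",
    tooltips=[('sample', '@y'), ('gene', '@x'), ('value', '@z{0.00}')],
    title="sketch heatmap",
)
o_plot.rect(
    x='x', y='y', width=1, height=1, source=o_source,
    fill_color=transform('z', o_mapper), line_color=None,
)

# write a standalone html file that opens in any web browser
s_out = "sketch_heatmap.html"
output_file(s_out, title="sketch heatmap")
save(o_plot)
print(f"check out: {os.path.abspath(s_out)}")
```

Opening the resulting file in a browser should show a heatmap where hovering over a cell displays its sample, gene and value.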
#### Future directions
An [altair](https://altair-viz.github.io/) based cluster heatmap implementation.
I think that this will be the future. Check out Jake VanderPlas's talk
[Python Visualization Landscape](https://www.youtube.com/watch?v=FytuB8nFHPQ)
from PyCon 2017 in Portland, Oregon (USA).
## Contributions
+ Implementation: Elmar Bucher
+ Documentation: Jennifer Eng, Elmar Bucher
+ Helpful discussion: Mark Dane, Daniel Derrick, Hongmei Zhang,
Annette Kolodize, Jim Korkola, Laura Heiser