Visualize relationship between clusterings at different resolutions
clustree
Status
Functionality: Implemented
- Directed graph representing clustree. Nodes are parsed images and node information is encoded by a border surrounding the image.
- Loading: Data provided directly or through a path to parent directory. Images provided through a path to parent directory.
- Appearance: Edge and node color can correspond to one of: #samples that pass through edge/node, cluster resolution K, or a fixed color. In the case of node color, a column name in the data and aggregate function can be used too. Use of column name and #samples creates a continuous colormap, whilst the other options result in discrete colors.
- Layout: Reingold-Tilford algorithm used for node positioning. Not recommended for kk > 12 due to memory bottleneck in igraph dependency.
- Legend: demonstration of node / edge color.
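The quantities that drive node and edge color can be illustrated with a small standalone sketch. It is not clustree's implementation: the toy data, the `score` column, and the helper names `node_sample_counts`, `edge_sample_counts`, and `node_aggregate` are all hypothetical, and only the standard library is used. It shows how the number of samples per node, the number of samples per edge, and an aggregated column value per node could be computed.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: cluster numbers of 6 samples at resolutions K = 1, 2, 3.
memberships = {
    1: [1, 1, 1, 1, 1, 1],
    2: [1, 1, 1, 2, 2, 2],
    3: [1, 1, 3, 2, 2, 3],
}
# Hypothetical metadata column, one value per sample.
score = [0.2, 0.4, 0.9, 1.0, 1.2, 0.7]


def node_sample_counts(memberships):
    """Count the samples in each node, keyed by (K, k)."""
    counts = Counter()
    for K, labels in memberships.items():
        for k in labels:
            counts[(K, k)] += 1
    return dict(counts)


def edge_sample_counts(memberships):
    """Count the samples that move from node (K, k) to node (K + 1, k')."""
    counts = Counter()
    for K in sorted(memberships):
        if K + 1 not in memberships:
            continue
        for parent, child in zip(memberships[K], memberships[K + 1]):
            counts[((K, parent), (K + 1, child))] += 1
    return dict(counts)


def node_aggregate(memberships, values, aggr=mean):
    """Aggregate a per-sample column over the samples in each node."""
    grouped = {}
    for K, labels in memberships.items():
        for k, value in zip(labels, values):
            grouped.setdefault((K, k), []).append(value)
    return {node: aggr(vals) for node, vals in grouped.items()}


nodes = node_sample_counts(memberships)
edges = edge_sample_counts(memberships)
node_max_score = node_aggregate(memberships, score, aggr=max)
```

Sample counts and aggregated column values are numeric, which is why they map naturally onto a continuous colormap, whereas coloring by resolution or by a fixed color yields discrete colors.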
Functionality: To Add
- Legend: demonstration of transparency of edges.
- Layout: Bespoke implementation of Reingold-Tilford algorithm to overcome dependency's memory bottleneck.
Usage
Installation
Install the package with pip:
pip install clustree
Quickstart
The powerhouse function of the library is clustree. Use
from clustree import clustree
to import the function. A detailed description of the parameters is provided below.
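As a rough starting point, the sketch below prepares a toy csv and wraps a call to clustree. Several things here are assumptions rather than documented behaviour: the column names are assumed to be the prefix followed by the cluster resolution (`K1`, `K2`, `K3`), the helper names `write_toy_clusterings` and `draw_toy_clustree` are hypothetical, and the contents of the image directory are not covered, so a suitable directory has to be supplied by the reader. Only the data preparation is executed; the call itself is left inside a function.

```python
import csv
import tempfile
from pathlib import Path


def write_toy_clusterings(csv_path, prefix="K"):
    """Write a toy csv with one clustering column per resolution.

    Assumption: columns are named prefix + resolution, e.g. K1, K2, K3.
    """
    labels = {
        1: [1, 1, 1, 1, 1],
        2: [1, 1, 2, 2, 2],
        3: [1, 3, 2, 2, 3],
    }
    fieldnames = [f"{prefix}{K}" for K in sorted(labels)]
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(fieldnames)
        # One row per sample, one column per resolution.
        writer.writerows(zip(*(labels[K] for K in sorted(labels))))
    return Path(csv_path)


def draw_toy_clustree(csv_path, images_dir, output_path):
    """Call clustree with documented parameters only (not executed here)."""
    # Imported lazily so the data preparation runs without clustree installed.
    from clustree import clustree

    return clustree(
        data=csv_path,
        prefix="K",
        images=images_dir,        # directory of node images, supplied by the reader
        output_path=output_path,  # absolute path; if an extension is given, .png
        draw=True,
    )


# Write the toy csv to a temporary directory and read the header back.
with tempfile.TemporaryDirectory() as tmp:
    toy_csv = write_toy_clusterings(Path(tmp) / "clusterings.csv")
    with open(toy_csv, newline="") as fh:
        rows = list(csv.reader(fh))
header, first_row = rows[0], rows[1]
```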
def clustree(
    data: Union[Path, str],
    prefix: str,
    images: Union[Path, str],
    output_path: Optional[Union[Path, str]] = None,
    draw: bool = True,
    node_color: str = "prefix",
    node_color_aggr: Optional[Union[Callable, str]] = None,
    node_cmap: Union[mpl.colors.Colormap, str] = "inferno",
    edge_color: str = "samples",
    edge_cmap: Union[mpl.colors.Colormap, str] = "viridis",
    orientation: Literal["vertical", "horizontal"] = "vertical",
    layout_reingold_tilford: bool = None,
    min_cluster_number: Literal[0, 1] = 1,
    border_size: float = 0.05,
    figsize: tuple[float, float] = None,
    arrows: bool = None,
    node_size: float = 300,
    node_size_edge: Optional[float] = None,
    dpi: float = 500,
    kk: Optional[int] = None,
) -> DiGraph:
    """
    data: Path of csv or DataFrame object.
    prefix: String indicating columns containing clustering information.
    images: Path of directory that contains images.
    output_path: Absolute path to save clustree drawing at. If file extension is supplied, must be .png. If None, then output not written to file.
    draw: Whether to draw the clustree. Defaults to True. If False and output_path supplied, will be overridden.
    node_color: For continuous colormap, use 'samples' or the name of a metadata column to color nodes by. For discrete colors, use 'prefix' to color by resolution or specify a fixed color (see Specifying colors in Matplotlib tutorial here: https://matplotlib.org/stable/tutorials/colors/colors.html). If None, default set equal to value of prefix to color by resolution.
    node_color_aggr: If node_color is a column name then a function or string giving the name of a function to aggregate that column for samples in each cluster.
    node_cmap: If node_color is 'samples' or a column name then a colormap to use (see Colormap Matplotlib tutorial here: https://matplotlib.org/stable/tutorials/colors/colormaps.html).
    edge_color: For continuous colormap, use 'samples'. For discrete colors, use 'prefix' to color by resolution or specify a fixed color (see Specifying colors in Matplotlib tutorial here: https://matplotlib.org/stable/tutorials/colors/colors.html). If None, default set to 'samples'.
    edge_cmap: If edge_color is 'samples' then a colormap to use (see Colormap Matplotlib tutorial here: https://matplotlib.org/stable/tutorials/colors/colormaps.html).
    orientation: Orientation of clustree drawing. Defaults to 'vertical'.
    layout_reingold_tilford: Whether to use the Reingold-Tilford algorithm for node positioning. Defaults to True if (kk <= 12), False otherwise. Setting True not recommended if (kk > 12) due to memory bottleneck in igraph dependency.
    min_cluster_number: Cluster number can take values (0, ..., K-1) or (1, ..., K). If the former option is preferred, parameter should take value 0, and 1 otherwise. Defaults to None, in which case, minimum cluster number is found automatically.
    border_size: Border width as proportion of image width. Defaults to 0.05.
    figsize: Passed to matplotlib to determine figure size. Defaults to (kk/2, kk/2), clipped to a minimum of (3,3) and maximum of (10,10).
    arrows: Whether to add arrows to graph edges. Removing arrows alleviates appearance issue caused by arrows overlapping nodes. Defaults to True.
    node_size: Size of nodes in clustree graph drawing. Passed directly to networkx.draw_networkx_nodes. Defaults to 300.
    node_size_edge: Controls edge start and end point. Passed directly to networkx.draw_networkx_edges.
    dpi: Controls resolution of output if saved to file.
    kk: Choose custom depth of clustree graph.
    """
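The kk-dependent defaults described above can be written out as a short sketch. This only restates the documented rules and is not taken from the library's source; the function names `default_figsize` and `default_use_reingold_tilford` are hypothetical.

```python
def default_figsize(kk):
    """Documented default: (kk/2, kk/2), clipped to between (3, 3) and (10, 10)."""
    side = min(max(kk / 2, 3), 10)
    return (side, side)


def default_use_reingold_tilford(kk):
    """Documented default: Reingold-Tilford layout only while kk <= 12."""
    return kk <= 12


# Shallow, medium, and deep trees.
for kk in (4, 12, 30):
    print(kk, default_figsize(kk), default_use_reingold_tilford(kk))
```

For a deep tree the figure size saturates at the maximum and the Reingold-Tilford layout is switched off by default, which matches the memory caveat given for the igraph dependency.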
Glossary
- cluster resolution: Upper case K. For example, at cluster resolution K=2 data is clustered into 2 distinct clusters.
- cluster number: Lower case k. For example, at cluster resolution 2 data is clustered into 2 distinct clusters k=1 and k=2.
- kk: highest value of K (cluster resolution) shown in clustree.
- cluster membership: The association between data points and cluster numbers for fixed cluster resolution. For example, [1, 1, 2, 2, 2] would mean the first 2 data points belong to cluster number 1 and the following 3 data points belong to cluster number 2.
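The glossary terms can be tied together in a few lines of plain Python. The toy data and the helper names `cluster_numbers` and `infer_min_cluster_number` are hypothetical, and taking the smallest label as the minimum cluster number is an assumption about how automatic detection could work, not a description of clustree's internals.

```python
# Hypothetical toy input: cluster membership for each cluster resolution K.
memberships = {
    1: [1, 1, 1, 1, 1],
    2: [1, 1, 2, 2, 2],  # the cluster membership example from the glossary
    3: [1, 3, 2, 2, 3],
}


def cluster_numbers(labels):
    """Distinct cluster numbers k present in one cluster membership."""
    return sorted(set(labels))


def infer_min_cluster_number(memberships):
    """Smallest cluster number across all resolutions (0 or 1 in practice)."""
    return min(min(labels) for labels in memberships.values())


# kk is the highest cluster resolution K in the data.
kk = max(memberships)

# At cluster resolution K there are K distinct cluster numbers.
for K, labels in memberships.items():
    assert len(cluster_numbers(labels)) == K

start = infer_min_cluster_number(memberships)
```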