Skip to main content

Visualize relationship between clusterings at different resolutions

Project description

clustree

Status

Functionality: Implemented

  • Directed graph representing clustree. Nodes are parsed images and node information is encoded by a border surrounding the image.
  • Loading: Data provided directly or through a path to parent directory. Images provided through a path to parent directory.
  • Appearance: Edge and node color can correspond to one of: #samples that pass through edge/node, cluster resolution K, or a fixed color. In the case of node color, a column name in the data and aggregate function can be used too. Use of column name and #samples creates a continuous colormap, whilst the other options result in discrete colors.
  • Layout: Reingold-Tilford algorithm used for node positioning. Not recommended for kk > 12 due to memory bottleneck in igraph dependency.
  • Legend: demonstration of node / edge color.

Functionality: To Add

  • Legend: demonstration of transparency of edges.
  • Layout: Bespoke implementation of Reingold-Tilford algorithm to overcome dependency's memory bottleneck.

Usage

Installation

Install the package with pip:

pip install clustree

Quickstart

The powerhouse function of the library is clustree. Use

from clustree import clustree

to import the function. A detailed description of the parameters is provided below.

def clustree(
    data: Union[Path, str],
    prefix: str,
    images: Union[Path, str],
    output_path: Optional[Union[Path, str]] = None,
    draw: bool = True,
    node_color: str = "prefix",
    node_color_aggr: Optional[Union[Callable, str]] = None,
    node_cmap: Union[mpl.colors.Colormap, str] = "inferno",
    edge_color: str = "samples",
    edge_cmap: Union[mpl.colors.Colormap, str] = "viridis",
    orientation: Literal["vertical", "horizontal"] = "vertical",
    layout_reingold_tilford: bool = None,
    min_cluster_number: Literal[0, 1] = 1,
    border_size: float = 0.05,
    figsize: tuple[float, float] = None,
    arrows: bool = None,
    node_size: float = 300,
    node_size_edge: Optional[float] = None,
    dpi: float = 500,
    kk: Optional[int] = None,
) -> DiGraph:
    """

  • data : Path of csv or DataFrame object.
  • prefix : String indicating columns containing clustering information.
  • images : Path of directory that contains images.
  • output_path : Absolute path to save clustree drawing at. If file extension is supplied, must be .png. If None, then output not written to file.
  • draw : Whether to draw the clustree. Defaults to True. If False and output_path supplied, will be overridden.
  • node_color : For continuous colormap, use 'samples' or the name of a metadata column to color nodes by. For discrete colors, use 'prefix' to color by resolution or specify a fixed color (see Specifying colors in Matplotlib tutorial here: https://matplotlib.org/stable/tutorials/colors/colors.html). If None, default set equal to value of prefix to color by resolution.
  • node_color_aggr : If node_color is a column name then a function or string giving the name of a function to aggregate that column for samples in each cluster.
  • node_cmap : If node_color is 'samples' or a column name then a colourmap to use (see Colormap Matplotlib tutorial here: https://matplotlib.org/stable/tutorials/colors/colormaps.html).
  • edge_color : For continuous colormap, use 'samples'. For discrete colors, use 'prefix' to color by resolution or specify a fixed color (see Specifying colors in Matplotlib tutorial here: https://matplotlib.org/stable/tutorials/colors/colors.html). If None, default set to 'samples'.
  • edge_cmap : If edge_color is 'samples' then a colourmap to use (see Colormap Matplotlib tutorial here: https://matplotlib.org/stable/tutorials/colors/colormaps.html).
  • orientation : Orientation of clustree drawing. Defaults to 'vertical'.
  • layout_reingold_tilford : Whether to use the Reingold-Tilford algorithm for node positioning. Defaults to True if (kk <= 12), False otherwise. Setting True not recommended if (kk > 12) due to memory bottleneck in igraph dependency.
  • min_cluster_number : Cluster number can take values (0, ..., K-1) or (1, ..., K). If the former option is preferred, parameter should take value 0, and 1 otherwise. Defaults to None, in which case, minimum cluster number is found automatically.
  • border_size : Border width as proportion of image width. Defaults to 0.05.
  • figsize : Parsed to matplotlib to determine figure size. Defaults to (kk/2, kk/2), clipped to a minimum of (3,3) and maximum of (10,10).
  • arrows : Whether to add arrows to graph edges. Removing arrows alleviates appearance issue caused by arrows overlapping nodes. Defaults to True.
  • node_size : Size of nodes in clustree graph drawing. Parsed directly to networkx.draw_networkx_nodes. Default to 300.
  • node_size_edge: Controls edge start and end point. Parsed directly to networkx.draw_networkx_edges.
  • dpi : Controls resolution of output if saved to file.
  • kk : Choose custom depth of clustree graph.

Glossary

  • cluster resolution: Upper case K. For example, at cluster resolution K=2 data is clustered into 2 distinct clusters.
  • cluster number: Lower case k. For example, at cluster resolution 2 data is clustered into 2 distinct clusters k=1 and k=2.
  • kk: highest value of K (cluster resolution) shown in clustree.
  • cluster membership: The association between data points and cluster numbers for fixed cluster resolution. For example, [1, 1, 2, 2, 2] would mean the first 2 data points belong to cluster number 1 and the following 3 data points belong to cluster number 2.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

clustree-0.2.1.tar.gz (24.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

clustree-0.2.1-py3-none-any.whl (25.5 kB view details)

Uploaded Python 3

File details

Details for the file clustree-0.2.1.tar.gz.

File metadata

  • Download URL: clustree-0.2.1.tar.gz
  • Upload date:
  • Size: 24.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.9.12 Darwin/22.4.0

File hashes

Hashes for clustree-0.2.1.tar.gz
Algorithm Hash digest
SHA256 4d8d26279c65bc15ec11c3c1111ab6922eb32ed759aa3ac4d027328bfacfe346
MD5 c87374c4b7d562cf41373310c2b55016
BLAKE2b-256 40327da6e2c5ad94915b09847984cb1c441a8b1284e50fde257e23a743617c42

See more details on using hashes here.

File details

Details for the file clustree-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: clustree-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 25.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.9.12 Darwin/22.4.0

File hashes

Hashes for clustree-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0845225a41a008e6f26535546c3090e03055daaea865691122b56b00b69c0e26
MD5 363556ccc50cf33abb66beb6e7347a58
BLAKE2b-256 4ef50f4c458c357ba2cfd245a5a9db58b12899a64a1afa236fa5b7e86cfc824a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page