
Visualization tool for collective categorical data


Palette diagram

A palette diagram is a visualization tool for a large number of categorical datasets, each comprising several categories.

[Figure: schematic of a palette diagram]

Linear palette diagram

[Figure: linear palette diagram]

This is a stream plot, which is usually used for plotting time series data. Each categorical dataset is stacked vertically, and these stacked plots are aligned horizontally so that the neighboring datasets have similar vertical patterns.

Circular palette diagram

[Figure: circular palette diagram]

Each categorical dataset is represented along the radial coordinate: Each layer corresponds to a category and the thickness represents the normalized (or unnormalized) quantity within a category. The set of categorical data is aligned along the angular coordinate.

The central part indicates the color of the dominant category within each categorical dataset (i.e., the maximum a posteriori estimate).
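The layer stacking and the dominant category can be illustrated with a few lines of pandas. This is a minimal sketch, not the package's implementation: the table values and the names `table`, `shares`, `outer_edges`, and `dominant` are made up for illustration, and normalizing each row by its sum is an assumption about what "normalized" means here.

```python
import pandas as pd

# Illustrative table: two datasets (rows) and three categories (columns).
table = pd.DataFrame(
    {"A": [15, 9], "B": [31, 11], "C": [2, 83]}, index=["d0", "d1"]
)

# Normalized quantity per category (assumed: each row divided by its sum).
shares = table.div(table.sum(axis=1), axis=0)

# Outer boundary of each stacked layer along the radial coordinate.
outer_edges = shares.cumsum(axis=1)

# Category with the largest quantity in each dataset.
dominant = table.idxmax(axis=1)
```

The last column of `outer_edges` equals one for every dataset, which is why all datasets reach the same radius when the rows are normalized.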

Installation

palette-diagram can be installed from PyPI:

pip install palette-diagram

Usage

The palette_diagram function generates a linear or circular palette diagram from a data table.

palette_diagram(df,
	palette_type='circular',
	n_neighbors=100,
	n_epochs=100,
	lr=0.0005,
	norm=True,
	export=True,
	export_table=True,
	group_names = None,
	cmap_name = None,
	remove_empty_groups=-1)

input

A data table df given as a pandas DataFrame. Each row represents one categorical dataset (one data element). Here is what the DataFrame should look like:

     category A  category B  category C  category D
0            15          31           2           8
1            24         nan          45         112
2             9          11          83           0
...         ...         ...         ...         ...
  • The (i,k) element of the DataFrame represents the quantity of the k-th category in the i-th dataset.
  • The DataFrame must have column names representing the category labels.
  • The indices (the first column) will be used as the dataset IDs in data_ordering.csv.
  • The value of each cell must be non-negative.
  • A dataset is allowed to have missing cells (nan). Missing cells are filled with zeros.
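The requirements above can be checked before the table is passed on. The following is a minimal sketch in pandas: the values are copied from the example table, the four distinct column labels are an assumption made for this illustration, and `check_input` is a hypothetical helper that is not part of the package.

```python
import numpy as np
import pandas as pd

# Values copied from the example table; the distinct column labels
# are an assumption made for this illustration.
df = pd.DataFrame(
    {
        "category A": [15, 24, 9],
        "category B": [31, np.nan, 11],
        "category C": [2, 45, 83],
        "category D": [8, 112, 0],
    }
)


def check_input(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing cells with 0 and reject negatives."""
    filled = table.fillna(0)
    if (filled.to_numpy() < 0).any():
        raise ValueError("all quantities must be non-negative")
    return filled


clean = check_input(df)
```

Here `clean` has the same shape as `df`, with the missing cell in row 1 replaced by zero.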

output

A linear palette diagram or circular palette diagram.

Optional parameters

  • palette_type ('circular' or 'linear'): 'circular' draws a circular palette diagram; 'linear' draws a linear palette diagram.
  • n_neighbors (integer): A hyperparameter for the linear palette diagram (see "Order optimization" below for details).
  • n_epochs (integer): A hyperparameter for the circular palette diagram (see "Order optimization" below for details).
  • lr (float): A hyperparameter for the circular palette diagram (see "Order optimization" below for details).
  • norm (boolean): If True, each categorical dataset (row of the data matrix) is normalized to unity. The diagram has non-uniform layer thickness when norm=False.
  • export (boolean): If True, the palette diagram is saved as a PDF file in ./output/.
  • export_table (boolean): If True, the ordering of the datasets is saved as a CSV file in ./output/.
  • group_names (list of category names): If provided, the color assignment of each category is controlled manually.
  • cmap_name (string): If the name of a qualitative color palette in matplotlib is provided, that color map is used.
  • remove_empty_groups (0, 1, 2, or -1): 0 removes all empty (zero-valued) rows (datasets); 1 removes all empty (zero-valued) columns (categories); 2 removes all empty (zero-valued) rows and columns; -1 disables the removal.
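The effect of norm and remove_empty_groups on a table can be sketched in pandas. This is an illustration of the documented behavior, not the package's code: `preprocess` is a hypothetical stand-in, row normalization by the row sum is an assumption, and leaving all-zero rows untouched during normalization is a choice made here to avoid division by zero.

```python
import pandas as pd


def preprocess(table, norm=True, remove_empty_groups=-1):
    """Hypothetical stand-in for the documented options (illustration only)."""
    out = table.fillna(0)
    if remove_empty_groups in (0, 2):
        out = out.loc[out.sum(axis=1) > 0]        # drop all-zero rows
    if remove_empty_groups in (1, 2):
        out = out.loc[:, out.sum(axis=0) > 0]     # drop all-zero columns
    if norm:
        totals = out.sum(axis=1)
        # Rows that sum to zero are divided by 1, so they stay all-zero.
        out = out.div(totals.where(totals > 0, 1), axis=0)
    return out


raw = pd.DataFrame({"A": [1, 0, 3], "B": [3, 0, 1], "C": [0, 0, 0]})
result = preprocess(raw, norm=True, remove_empty_groups=2)
```

With these inputs, `result` keeps rows 0 and 2 and columns A and B, and each remaining row sums to one.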

Order optimization

linear palette diagram

In the linear palette diagram, the order of the datasets is optimized through Isomap. n_neighbors is a hyperparameter used to construct the k-nearest neighbor graph in Isomap.
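To make the role of n_neighbors concrete, the steps of a one-dimensional Isomap ordering (k-nearest neighbor graph, shortest-path distances, classical MDS) can be sketched in NumPy. This is an illustrative re-implementation under stated assumptions, not the package's code: `isomap_order` is a hypothetical helper, Euclidean distance between rows is an assumption, and the demo data are synthetic.

```python
import numpy as np


def isomap_order(X, n_neighbors=5):
    """Order the rows of X by a 1-D Isomap embedding (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between datasets (rows).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # k-nearest neighbor graph; column 0 of the argsort is the row itself
    # (assuming distinct rows), so it is skipped.
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), nearest.shape[1])
    graph = np.full((n, n), np.inf)
    graph[rows, nearest.ravel()] = dist[rows, nearest.ravel()]
    graph = np.minimum(graph, graph.T)  # make the graph undirected
    np.fill_diagonal(graph, 0.0)
    # Shortest-path (geodesic) distances via Floyd-Warshall.
    for k in range(n):
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    if not np.isfinite(graph).all():
        raise ValueError("k-NN graph is disconnected; increase n_neighbors")
    # Classical MDS: double-center the squared distances, keep the top eigenvector.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (graph ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    coord = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
    return np.argsort(coord)


# Synthetic demo: normalized rows that vary smoothly with a hidden parameter t.
t = np.linspace(0.0, 1.0, 30)
X = np.stack([(1 - t) ** 2, 2 * t * (1 - t), t ** 2], axis=1)
shuffle = np.random.default_rng(0).permutation(len(t))
order = isomap_order(X[shuffle], n_neighbors=5)
recovered_t = t[shuffle][order]  # hidden parameter in the recovered order
```

If n_neighbors is too small, the graph can split into disconnected pieces, which this sketch reports as an error. If it is too large, the graph distances approach plain Euclidean distances. This is one reason the appropriate value depends on the data.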

circular palette diagram

In the circular palette diagram, the stochastic gradient descent (SGD) method is used for the order optimization. n_epochs and lr are hyperparameters for the SGD: n_epochs is the number of epochs and lr is the learning rate.

We strongly recommend trying various values of these hyperparameters, as the appropriate values depend on the input data table.
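One simple way to try several settings is to enumerate a small grid of n_epochs and lr values. The sketch below only builds the keyword arguments. `hyperparameter_grid` is a hypothetical helper, and the candidate values are illustrative, not recommendations from the authors.

```python
from itertools import product


def hyperparameter_grid(n_epochs_values, lr_values):
    """Return one keyword-argument dict per (n_epochs, lr) combination."""
    return [
        {"n_epochs": n_epochs, "lr": lr}
        for n_epochs, lr in product(n_epochs_values, lr_values)
    ]


# Illustrative candidate values around the documented defaults.
grid = hyperparameter_grid([50, 100, 200], [0.0001, 0.0005, 0.001])

# Each dict can be unpacked into the documented call, for example:
#   palette_diagram(df, palette_type='circular', **params)
# for every params in grid.
```

The same pattern applies to the linear palette diagram with a list of n_neighbors values.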

References

  • Please cite the following paper when you use the palette diagram:

Chihiro Noguchi and Tatsuro Kawamoto, "Palette diagram: A Python package for visualization of collective categorical data," in preparation, (2020).

  • You can find more details about the (linear) palette diagram in the following article:

Chihiro Noguchi and Tatsuro Kawamoto, "Evaluating network partitions through visualization," arXiv:1906.00699, unpublished (2019).
