Visualization tool for collective categorical data
Palette diagram
A palette diagram is a visualization tool for a large number of categorical datasets, each comprising several categories.
Linear palette diagram
This is a stream plot, which is usually used for plotting time series data. Each categorical dataset is stacked vertically, and these stacked plots are aligned horizontally so that the neighboring datasets have similar vertical patterns.
Circular palette diagram
Each categorical dataset is represented along the radial coordinate: Each layer corresponds to a category and the thickness represents the normalized (or unnormalized) quantity within a category. The set of categorical data is aligned along the angular coordinate.
The central part indicates the color of the dominant category within each categorical dataset (i.e., the maximum a posteriori estimate).
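As a rough illustration of what "dominant category" means here, the category with the largest value in each row can be looked up with pandas. This is a sketch rather than the package's own code, and the table values are made up for the example.

```python
import pandas as pd

# Hypothetical data: rows are datasets, columns are categories.
df = pd.DataFrame(
    {
        "category A": [15, 24, 9],
        "category B": [31, 0, 11],
        "category C": [2, 45, 83],
    }
)

# Label of the largest-valued category in each row.
dominant = df.idxmax(axis=1)
print(dominant.tolist())
```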
Installation
palette-diagram can be installed from PyPI:
pip install palette-diagram
Usage
This function generates a linear or circular palette diagram from a data table.
    palette_diagram(df,
                    palette_type='circular',
                    n_neighbors=100,
                    n_epochs=100,
                    lr=0.0005,
                    norm=True,
                    export=True,
                    export_table=True,
                    group_names=None,
                    cmap_name=None,
                    remove_empty_groups=-1)
input
A data table df as a pandas DataFrame.
Each row holds the categorical data of one data element (one dataset).
Here is what the DataFrame should look like:
| | category A | category B | category C | category D |
---|---|---|---|---|
0 | 15 | 31 | 2 | 8 |
1 | 24 | nan | 45 | 112 |
2 | 9 | 11 | 83 | 0 |
... | ... | ... | ... | ... |
- The (i,k) element in the DataFrame represents the quantity for the kth category in the ith dataset.
- The DataFrame must have column names representing the category labels.
- The indices (the first column) will be used as the dataset IDs in data_ordering.csv.
- The value of each cell has to be non-negative.
- A dataset is allowed to have missing cells (nan). The missing cells are filled with zeros.
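The requirements above can be sketched in pandas as follows. The helper `prepare_table` is hypothetical and only mirrors the stated rules (missing cells become zeros, values must be non-negative); the package applies its own handling internally.

```python
import numpy as np
import pandas as pd


def prepare_table(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing cells with zeros and check that all values are non-negative."""
    filled = df.fillna(0)
    if (filled < 0).any().any():
        raise ValueError("all quantities must be non-negative")
    return filled


# Column names are the category labels; the index holds the dataset IDs.
df = pd.DataFrame(
    {
        "category A": [15, 24, 9],
        "category B": [31, np.nan, 11],
        "category C": [2, 45, 83],
        "category D": [8, 112, 0],
    },
    index=[0, 1, 2],
)

table = prepare_table(df)
print(table)
```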
output
A linear palette diagram or circular palette diagram.
Optional parameters
argument | type | description |
---|---|---|
palette_type | 'circular', 'linear' | 'circular': circular palette diagram; 'linear': linear palette diagram |
n_neighbors | integer | A hyperparameter for the linear palette diagram (see below for details) |
n_epochs | integer | A hyperparameter for the circular palette diagram (see below for details) |
lr | float | A hyperparameter for the circular palette diagram (see below for details) |
norm | boolean | If True, each categorical dataset (row in the data matrix) will be normalized to unity. The diagram has non-uniform layer thickness when norm=False. |
export | boolean | If True, the palette diagram will be saved as a PDF file in ./output/. |
export_table | boolean | If True, the ordering of the datasets will be saved as a CSV file in ./output/. |
group_names | list of category names | If provided, you can manually control the color assignment of each category. |
cmap_name | string | If the name of a qualitative color palette in matplotlib is provided, the specified color map will be used. |
remove_empty_groups | {0,1,2,-1} | 0: Remove all empty (zero-valued) rows (data); 1: Remove all empty (zero-valued) columns (categories); 2: Remove all empty (zero-valued) rows and columns; -1: Ignored |
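To make the effect of norm and remove_empty_groups concrete, here is a minimal pandas sketch of the behavior described in the table. The helpers `drop_empty` and `normalize_rows` are hypothetical names introduced for illustration and are not part of the package; the actual implementation may differ in detail.

```python
import pandas as pd


def drop_empty(df, mode=-1):
    """mode 0: drop all-zero rows; 1: drop all-zero columns; 2: both; -1: no-op."""
    if mode in (0, 2):
        df = df.loc[(df != 0).any(axis=1)]
    if mode in (1, 2):
        df = df.loc[:, (df != 0).any(axis=0)]
    return df


def normalize_rows(df):
    """Scale each row to sum to one; all-zero rows are left as zeros."""
    totals = df.sum(axis=1)
    return df.div(totals.where(totals != 0, 1), axis=0)


df = pd.DataFrame(
    {"A": [1, 0, 3], "B": [3, 0, 1], "C": [0, 0, 0]},
    index=["x", "y", "z"],
)

cleaned = drop_empty(df, mode=2)      # drops row "y" and column "C"
normalized = normalize_rows(cleaned)  # each remaining row sums to one
print(normalized)
```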
Order optimization
linear palette diagram
In the linear palette diagram, the order of the datasets is optimized through ISOMAP.
n_neighbors is a hyperparameter used to construct a k-nearest neighbor graph in ISOMAP.
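The idea of ordering datasets with ISOMAP can be sketched with scikit-learn: embed the rows into one dimension and sort along that coordinate. This is an assumption-laden illustration using `sklearn.manifold.Isomap` on random data, not the package's internal code, and any preprocessing the package performs is not reproduced here. Note that n_neighbors must be smaller than the number of datasets.

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)

# Hypothetical data: 60 datasets with 4 categories, each row summing to one.
X = rng.dirichlet(alpha=np.ones(4), size=60)

# Embed the rows into one dimension via a k-nearest-neighbor graph.
n_neighbors = 10
embedding = Isomap(n_neighbors=n_neighbors, n_components=1).fit_transform(X)

# Sorting along the embedding places similar datasets next to each other.
order = np.argsort(embedding[:, 0])
X_ordered = X[order]
```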
circular palette diagram
In the circular palette diagram, the stochastic gradient descent (SGD) method is used for the order optimization.
n_epochs and lr are hyperparameters for the SGD: n_epochs is the number of epochs and lr is the learning rate.
We strongly recommend that users try various values of these hyperparameters, as the appropriate values vary depending on the input data table.
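One simple way to try several hyperparameter values is a small grid sweep. The sketch below uses a hypothetical helper `sweep` and a stand-in callable so that it runs on its own; in practice the callable would wrap a call to palette_diagram with the chosen n_epochs and lr, and the resulting diagrams would be compared by eye.

```python
from itertools import product


def sweep(run, n_epochs_values, lr_values):
    """Call run(n_epochs=..., lr=...) for every combination and collect the results."""
    results = {}
    for n_epochs, lr in product(n_epochs_values, lr_values):
        results[(n_epochs, lr)] = run(n_epochs=n_epochs, lr=lr)
    return results


# Stand-in for a real call; replace the body with a call to palette_diagram
# (palette_type='circular') that passes n_epochs and lr through.
def dummy_run(n_epochs, lr):
    return n_epochs * lr


results = sweep(dummy_run, n_epochs_values=[50, 100, 200], lr_values=[0.0001, 0.0005])
print(sorted(results))
```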
References
- Please cite the following paper when you use the palette diagram:
Chihiro Noguchi and Tatsuro Kawamoto, "Palette diagram: A Python package for visualization of collective categorical data," in preparation, (2020).
- You can find more details about the (linear) palette diagram in the following article:
Chihiro Noguchi and Tatsuro Kawamoto, "Evaluating network partitions through visualization," arXiv:1906.00699, unpublished (2019).
Hashes for palette_diagram-1.0.1-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2162a97cfd78b90e45e05a46a58fff91dfb1d17c7e976b494616deb4666a2f26 |
MD5 | 9814408b814b6f378cce1b3d7934bf2a |
BLAKE2b-256 | 338765abbca6f80fa765974447dbbf9de954e918b1f0fccb7d4dedd51087c4f6 |