A Python Toolkit for Evaluating the Reliability of Dimensionality Reduction Embeddings
ZADU
A-to-Z python library for evalUating Dimensionality reduction
ZADU is a Python library that provides distortion measures for evaluating and analyzing dimensionality reduction (DR) embeddings. The library supports a diverse set of local, cluster-level, and global distortion measures, allowing users to assess DR techniques from various structural perspectives. By offering an optimized execution and pointwise local distortions, ZADU enables efficient and in-depth analysis of DR embeddings.
Installation
You can install ZADU via pip:
pip install zadu
Use ZADU with Context7
This repository is prepared for Context7 indexing so AI coding assistants can fetch up-to-date ZADU docs and examples.
For Users (Cursor / Claude Code)
Install Context7 MCP (recommended: remote HTTP with API key) to your coding assistant.
Refer to Context7 Installation for detailed instructions.
Add the following rule (so users do not have to type use context7 each time) to agent instructions like CLAUDE.md or AGENTS.md:
Always use Context7 MCP when I need library/API documentation, code generation, setup, or configuration steps. When users want to evaluate dimensionality reduction embeddings or you think that the evaluation is necessary, use Context7 to access ZADU documentation and examples. You can access it via the "/hj-n/zadu" path.
Supported Distortion Measures
ZADU currently supports a total of 20 distortion measures, including:
- 7 local measures
- 5 cluster-level measures
- 8 global measures
For a complete list of supported measures, refer to measures. The library provided 17 measures when it was first introduced in our academic paper. We have since added three more measures (label trustworthiness & continuity, non-metric stress, and scale-normalized stress).
How To Use ZADU
ZADU provides two different interfaces for executing distortion measures. You can either use the main class that wraps the measures, or directly access and invoke the functions that define each distortion measure.
Using the Main Class
Use the main class of ZADU to compute distortion measures. This approach benefits from ZADU's execution optimization, which makes it faster when several measures are executed together.
from zadu import zadu
hd, ld = load_datasets()
spec = [{
"id" : "tnc",
"params": { "k": 20 },
}, {
"id" : "snc",
"params": { "k": 30, "clustering_strategy": "dbscan" }
}]
scores = zadu.ZADU(spec, hd).measure(ld)
print("T&C:", scores[0])
print("S&C:", scores[1])
Here, hd is the high-dimensional data and ld is its low-dimensional embedding.
You can also use a typed helper for better IDE autocomplete:
from zadu import ZADU, MEASURE, make_spec
spec = [
make_spec(MEASURE.TNC, k=20),
make_spec(MEASURE.SNC, k=30, clustering_strategy="dbscan"),
]
scores = ZADU(spec, hd).measure(ld)
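The snippets above call `load_datasets()`, which is not defined in this README. As a stand-in for experimenting, the sketch below builds a small synthetic pair: `hd` is random Gaussian data and `ld` is its projection onto the first two principal components, computed with NumPy's SVD. The name `make_toy_datasets` and its parameters are hypothetical and not part of ZADU.

```python
import numpy as np


def make_toy_datasets(n_points=200, n_dims=10, seed=0):
    """Return (hd, ld): random high-dimensional data and a 2D PCA projection.

    Hypothetical helper for trying out the examples; not part of ZADU.
    """
    rng = np.random.default_rng(seed)
    hd = rng.normal(size=(n_points, n_dims))

    # PCA via SVD: center the data, then project onto the two
    # right singular vectors with the largest singular values.
    centered = hd - hd.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    ld = centered @ vt[:2].T
    return hd, ld


hd, ld = make_toy_datasets()
print(hd.shape, ld.shape)  # (200, 10) (200, 2)
```

Any pair of NumPy arrays with the same number of rows should work in place of this toy data.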
ZADU Class
The ZADU class provides the main interface for the ZADU library, allowing users to evaluate and analyze dimensionality reduction (DR) embeddings effectively and reliably.
Class Constructor
The ZADU class constructor has the following signature:
class ZADU(spec: List[Dict[str, Union[str, dict]]], hd: np.ndarray, return_local: bool = False)
Parameters:
spec
A list of dictionaries that define the distortion measures to execute and their hyperparameters. Each dictionary must contain the following keys:
- "id": The identifier of the distortion measure, such as "tnc" or "snc".
- "params": A dictionary containing hyperparameters specific to the chosen distortion measure.
List of ID/Parameters for Each Function
Warning: While using dsc, ivm, c_evm, nh, and ca_tnc, please be aware that these measures assume that class labels are well-separated in the original high-dimensional space. If the class labels are not well-separated, the measures may produce unreliable results. Use the measure only if you are confident that the class labels are well-separated. Please refer to the related academic paper for more detail.
Local Measures
| Measure | ID | Parameters | Range | Optimum |
|---|---|---|---|---|
| Trustworthiness & Continuity | tnc | k=20 | [0.5, 1] | 1 |
| Mean Relative Rank Errors | mrre | k=20 | [0, 1] | 1 |
| Local Continuity Meta-Criteria | lcmc | k=20 | [0, 1] | 1 |
| Neighborhood hit | nh | k=20 | [0, 1] | 1 |
| Neighbor Dissimilarity | nd | k=20 | R+ | 0 |
| Class-Aware Trustworthiness & Continuity | ca_tnc | k=20 | [0.5, 1] | 1 |
| Procrustes Measure | proc | k=20 | R+ | 0 |

Cluster-level Measures
| Measure | ID | Parameters | Range | Optimum |
|---|---|---|---|---|
| Steadiness & Cohesiveness | snc | iteration=150, walk_num_ratio=0.3, alpha=0.1, k=50, clustering_strategy="dbscan" | [0, 1] | 1 |
| Distance Consistency | dsc | | [0.5, 1] | 0.5 |
| Internal Validation Measures | ivm | measure="silhouette" | Depends on IVM | Depends on IVM |
| Clustering + External Clustering Validation Measures | c_evm | measure="arand", clustering="kmeans", clustering_args=None | Depends on EVM | Depends on EVM |
| Label Trustworthiness & Continuity[^1] | l_tnc | cvm="dsc" | [0, 1] | 1 |
[^1]: The current implementation does not apply the rescaling step from the original paper on the cvm score when cvm='dsc'. The original transformation was intended to map the DSC score into the [0,1] range, but it is not needed here.
Global Measures
| Measure | ID | Parameters | Range | Optimum |
|---|---|---|---|---|
| Stress | stress | | R+ | 0 |
| Non-metric stress | nm_stress | | R+ | 0 |
| Scale-normalized stress | sn_stress | | R+ | 0 |
| Kullback-Leibler Divergence | kl_div | sigma=0.1 | R+ | 0 |
| Distance-to-Measure | dtm | sigma=0.1 | R+ | 0 |
| Topographic Product | topo | k=20 | R | 0 |
| Pearson's correlation coefficient | pr | | [-1, 1] | 1 |
| Spearman's rank correlation coefficient | srho | | [-1, 1] | 1 |
hd
A high-dimensional dataset (numpy array) to register and reuse during the evaluation process.
return_local
A boolean flag that, when set to True, enables the computation of local pointwise distortions for each data point. The default value is False.
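Because a spec is a plain list of dictionaries, a mistyped ID or a missing "params" key is easy to overlook. The sketch below checks a spec against the measure IDs listed in this README before it is passed to the constructor. `validate_spec` and `KNOWN_IDS` are hypothetical names introduced here for illustration; ZADU itself may validate specs differently, or not at all.

```python
# Measure IDs copied from the tables in this README (7 local, 5 cluster-level, 8 global).
KNOWN_IDS = {
    "tnc", "mrre", "lcmc", "nh", "nd", "ca_tnc", "proc",
    "snc", "dsc", "ivm", "c_evm", "l_tnc",
    "stress", "nm_stress", "sn_stress", "kl_div", "dtm", "topo", "pr", "srho",
}


def validate_spec(spec):
    """Return a list of human-readable problems found in a ZADU-style spec.

    Hypothetical helper; not part of ZADU.
    """
    problems = []
    for i, entry in enumerate(spec):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected a dict, got {type(entry).__name__}")
            continue
        measure_id = entry.get("id")
        if measure_id not in KNOWN_IDS:
            problems.append(f"entry {i}: unknown measure id {measure_id!r}")
        if not isinstance(entry.get("params"), dict):
            problems.append(f"entry {i}: 'params' must be a dict")
    return problems


spec = [
    {"id": "tnc", "params": {"k": 20}},
    {"id": "snc", "params": {"k": 30, "clustering_strategy": "dbscan"}},
    {"id": "tsne", "params": {}},  # a DR technique, not a measure ID, so it is flagged
]
print(validate_spec(spec))  # ["entry 2: unknown measure id 'tsne'"]
```

An empty list means the spec has the expected shape; it does not check that the hyperparameter names are valid for each measure.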
Directly Accessing Functions
You can also directly access and invoke the functions defining each distortion measure for greater flexibility.
from zadu.measures import *
mrre = mean_relative_rank_error.measure(hd, ld, k=20)
pr = pearson_r.measure(hd, ld)
nh = neighborhood_hit.measure(ld, label, k=20)
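To give a sense of what a global measure such as Pearson's correlation coefficient captures, the sketch below correlates the pairwise distances in the original space with those in the embedding, using only NumPy. It illustrates the general idea under the assumption of Euclidean distances; it is not ZADU's implementation, and `pairwise_distance_pearson` is a hypothetical name.

```python
import numpy as np


def pairwise_distances(data):
    """Euclidean distance between every pair of rows, shape (n, n)."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pairwise_distance_pearson(hd, ld):
    """Pearson correlation between matching pairwise distances in hd and ld."""
    # Use each unordered pair once (upper triangle, excluding the diagonal).
    rows, cols = np.triu_indices(hd.shape[0], k=1)
    hd_dist = pairwise_distances(hd)[rows, cols]
    ld_dist = pairwise_distances(ld)[rows, cols]
    return float(np.corrcoef(hd_dist, ld_dist)[0, 1])


rng = np.random.default_rng(0)
hd = rng.normal(size=(50, 8))

# A uniformly scaled copy keeps all distance ratios, so the correlation is 1
# up to floating-point error.
scaled_score = pairwise_distance_pearson(hd, 3.0 * hd)

# An unrelated random 2D layout has no reason to preserve distances.
random_score = pairwise_distance_pearson(hd, rng.normal(size=(50, 2)))

print(round(scaled_score, 6), round(random_score, 6))
```

The first score should print as 1.0, matching the optimum listed for pr, while the second depends on the random draw.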
Advanced Features
Optimizing the Execution
ZADU automatically optimizes the execution of multiple distortion measures. It minimizes the computational overhead associated with preprocessing stages such as pairwise distance calculation, pointwise distance ranking determination, and k-nearest neighbor identification.
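The idea of sharing preprocessing can be illustrated with a small NumPy sketch: the pairwise distance matrix is computed once, and both the distance rankings and the k-nearest neighbors are derived from it rather than recomputed per measure. This is a simplified illustration, not ZADU's internal code, and `shared_preprocessing` is a hypothetical name.

```python
import numpy as np


def shared_preprocessing(data, k):
    """Compute pairwise distances once and derive rankings and k-NN from them."""
    # Pairwise Euclidean distances, shape (n, n).
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # order[i] lists point indices sorted by distance from point i.
    # Point i itself comes first as long as there are no duplicate points.
    order = np.argsort(dist, axis=1, kind="stable")

    # rank[i, j] is the position of point j in i's ordering (0 for i itself).
    n = data.shape[0]
    rank = np.empty_like(order)
    rank[np.arange(n)[:, None], order] = np.arange(n)[None, :]

    # The k nearest neighbors of each point, skipping the point itself.
    knn = order[:, 1:k + 1]
    return dist, rank, knn


# Four points on a line make the result easy to check by hand.
points = np.array([[0.0], [1.0], [3.0], [7.0]])
dist, rank, knn = shared_preprocessing(points, k=2)
print(knn.tolist())      # [[1, 2], [0, 2], [1, 0], [2, 1]]
print(rank[3].tolist())  # [3, 2, 1, 0]
```

Several rank-based or neighborhood-based measures could then read from `rank` and `knn` without repeating the O(n²) distance computation.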
Computing Pointwise Local Distortions
Users can obtain local pointwise distortions by setting the return_local flag. If a specified distortion measure produces local pointwise distortion as intermediate results, it returns a list of pointwise distortions when the flag is raised.
from zadu import zadu
spec = [{
"id" : "dtm",
"params": {}
}, {
"id" : "mrre",
"params": { "k": 30 }
}]
zadu_obj = zadu.ZADU(spec, hd, return_local=True)
global_, local_ = zadu_obj.measure(ld)
print("MRRE local distortions:", local_[1])
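Once pointwise distortions are available, a common next step is to find the points that the embedding represents least faithfully. The sketch below ranks points by a per-point score. It assumes the scores form a one-dimensional sequence with one value per data point; the exact structure ZADU returns for each measure may differ, and `worst_points` and the sample values are hypothetical.

```python
import numpy as np


def worst_points(pointwise_scores, n_worst=5, higher_is_better=True):
    """Return indices of the n_worst least reliable points.

    Hypothetical helper; not part of ZADU.
    """
    scores = np.asarray(pointwise_scores, dtype=float)
    order = np.argsort(scores, kind="stable")  # ascending: lowest scores first
    if not higher_is_better:
        order = order[::-1]                    # highest scores first instead
    return order[:n_worst].tolist()


# Made-up scores for six points, for illustration only.
local_scores = [0.91, 0.42, 0.88, 0.15, 0.97, 0.60]
print(worst_points(local_scores, n_worst=3))                          # [3, 1, 5]
print(worst_points(local_scores, n_worst=3, higher_is_better=False))  # [4, 0, 2]
```

Set `higher_is_better=False` for measures whose optimum is 0, where larger values indicate more distortion.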
Visualizing Local Distortions
With the pointwise local distortions obtained from ZADU, users can visualize the distortions using various distortion visualizations. We provide ZADUVis, a Python library that enables the rendering of two distortion visualizations: CheckViz and the Reliability Map.
from zadu import zadu
from zaduvis import zaduvis
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.datasets import fetch_openml
hd = fetch_openml('mnist_784', version=1, cache=True).data.to_numpy()[::7]
ld = TSNE().fit_transform(hd)
## Computing local pointwise distortions
spec = [{
"id": "tnc",
"params": {"k": 25}
},{
"id": "snc",
"params": {"k": 50}
}]
zadu_obj = zadu.ZADU(spec, hd, return_local=True)
scores, local_list = zadu_obj.measure(ld)
tnc_local = local_list[0]
snc_local = local_list[1]
local_trustworthiness = tnc_local["local_trustworthiness"]
local_continuity = tnc_local["local_continuity"]
local_steadiness = snc_local["local_steadiness"]
local_cohesiveness = snc_local["local_cohesiveness"]
fig, ax = plt.subplots(1, 4, figsize=(50, 12.5))
zaduvis.checkviz(ld, local_trustworthiness, local_continuity, ax=ax[0])
zaduvis.reliability_map(ld, local_trustworthiness, local_continuity, k=10, ax=ax[1])
zaduvis.checkviz(ld, local_steadiness, local_cohesiveness, ax=ax[2])
zaduvis.reliability_map(ld, local_steadiness, local_cohesiveness, k=10, ax=ax[3])
The above code snippet demonstrates how to visualize local pointwise distortions using CheckViz and Reliability Map plots.
Documentation
For more information about the available distortion measures, their use cases, and examples, please refer to our paper (IEEE VIS 2023 Short).
Citation
Hyeon Jeon, Aeri Cho, Jinhwa Jang, Soohyun Lee, Jake Hyun, Hyung-Kwon Ko, Jaemin Jo, and Jinwook Seo. ZADU: A Python library for evaluating the reliability of dimensionality reduction embeddings. In 2023 IEEE Visualization and Visual Analytics (VIS), pages 196-200, 2023.
@INPROCEEDINGS{jeon23vis,
author={Jeon, Hyeon and Cho, Aeri and Jang, Jinhwa and Lee, Soohyun and Hyun, Jake and Ko, Hyung-Kwon and Jo, Jaemin and Seo, Jinwook},
booktitle={2023 IEEE Visualization and Visual Analytics (VIS)},
title={ZADU: A Python Library for Evaluating the Reliability of Dimensionality Reduction Embeddings},
year={2023},
volume={},
number={},
pages={196-200},
keywords={Dimensionality reduction;Visual analytics;Design methodology;Distortion;Libraries;Time measurement;Distortion measurement;Human-centered computing;Visualization;Visualization design and evaluation methods},
doi={10.1109/VIS54172.2023.00048}}