ZADU: A Python Toolkit for Evaluating the Reliability of Dimensionality Reduction Embeddings

ZADU is a Python library that provides a comprehensive suite of distortion measures for evaluating and analyzing dimensionality reduction (DR) embeddings. The library supports a diverse set of local, cluster-level, and global distortion measures, allowing users to assess DR techniques from various structural perspectives. By offering an optimized scheduling scheme and pointwise local distortions, ZADU enables efficient and in-depth analysis of DR embeddings.

Installation

You can install ZADU via pip:

pip install zadu

Supported Distortion Measures

ZADU currently supports 17 distortion measures:

  • 7 local measures
  • 4 cluster-level measures
  • 6 global measures

For the complete list of supported measures, refer to the measures documentation.

How To Use ZADU

ZADU provides two different interfaces for executing distortion measures. You can either use the main class that wraps the measures, or directly access and invoke the functions that define each distortion measure.

Using the Main Class

Use ZADU's main class to compute several distortion measures in one call. This approach benefits from the scheduling scheme described under Advanced Features, which shares preprocessing across measures and therefore runs faster.

from zadu import zadu

spec = {
    "tnc": { "k": 20 },
    "snc": { "k": 30, "clustering": "hdbscan" }
}
scores = zadu.ZADU(spec).run(hd, ld)
print("T&C:", scores[0])
print("S&C:", scores[1])

Here, hd is the high-dimensional data and ld is its low-dimensional embedding.
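To make the T&C scores less of a black box, here is a from-scratch NumPy sketch of trustworthiness and continuity using the standard rank-based definitions. It is not ZADU's implementation: the function names are hypothetical, and it assumes hd and ld are NumPy arrays of shape (n_samples, n_features) with matching row order and that k < n/2.

```python
import numpy as np

def rank_matrix(x):
    """ranks[i, j] = position of point j when all points are sorted by distance to i (self = 0)."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, -1.0)                      # guarantee each point ranks itself first
    order = np.argsort(d, axis=1, kind="stable")
    n = len(x)
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)
    return ranks

def trustworthiness(hd, ld, k):
    """Penalize LD neighbors that are not HD neighbors, weighted by their HD rank (needs k < n/2)."""
    n = len(hd)
    r_hd, r_ld = rank_matrix(hd), rank_matrix(ld)
    in_hd = (r_hd >= 1) & (r_hd <= k)              # k nearest neighbors in the original space
    in_ld = (r_ld >= 1) & (r_ld <= k)              # k nearest neighbors in the embedding
    penalty = np.sum(np.where(in_ld & ~in_hd, r_hd - k, 0))
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty

def continuity(hd, ld, k):
    """Same formula with the roles swapped: HD neighbors missing from the LD neighborhood."""
    return trustworthiness(ld, hd, k)

rng = np.random.default_rng(0)
hd = rng.normal(size=(200, 10))   # synthetic high-dimensional data
ld = hd[:, :2]                    # naive 2-D "embedding": keep the first two coordinates
print("T:", trustworthiness(hd, ld, k=20), "C:", continuity(hd, ld, k=20))
```

The normalization constant is chosen so that the worst possible ranking scores 0 and a perfectly preserved neighborhood scores 1; computing continuity by swapping the arguments keeps the two measures exactly symmetric.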

Directly Accessing Functions

You can also directly access and invoke the functions defining each distortion measure for greater flexibility.

from zadu.measures import *

# hd: high-dimensional data, ld: low-dimensional embedding, label: class label of each point
mrre = mean_relative_rank_error.run(hd, ld, k=20)
pr  = pearson_r.run(hd, ld)
nh  = neighborhood_hit.run(ld, label, k=20)
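Neighborhood hit is the only measure above that needs class labels. As a rough illustration of what it computes, the following NumPy sketch takes the average fraction of each point's k nearest neighbors in the embedding that share its label. The function name is hypothetical and this is not ZADU's code; it assumes ld is an (n, 2) array and label is a length-n array.

```python
import numpy as np

def neighborhood_hit_sketch(ld, label, k):
    """Average fraction of each point's k nearest LD neighbors that share its label."""
    ld = np.asarray(ld, dtype=float)
    label = np.asarray(label)
    d = np.linalg.norm(ld[:, None, :] - ld[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                          # exclude the point itself
    knn = np.argsort(d, axis=1)[:, :k]                   # indices of the k nearest neighbors
    hits = (label[knn] == label[:, None]).mean(axis=1)   # pointwise hit ratio
    return hits.mean()

# Two well-separated groups laid out on a line, labeled by group.
ld = np.vstack([np.c_[np.arange(10.0), np.zeros(10)],
                np.c_[100 + np.arange(10.0), np.zeros(10)]])
label = np.repeat([0, 1], 10)
print(neighborhood_hit_sketch(ld, label, k=3))   # 1.0: every neighbor shares its point's label
```

Because it only looks at the embedding and the labels, this measure says how well classes stay separated in ld, not how faithfully ld reflects hd.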

Advanced Features

Scheduling the Execution

ZADU optimizes the execution of multiple distortion measures through a scheduling scheme that minimizes the overhead of shared preprocessing stages, such as computing pairwise distances, ranking the distances around each point, and identifying k-nearest neighbors.
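ZADU's actual scheduler is not reproduced here. The following is a hypothetical sketch of the underlying idea: compute pairwise distances, the sorted neighbor order, ranks, and kNN sets at most once per dataset, and let every measure reuse them. The class and method names are illustrative only.

```python
import numpy as np

class PreprocessingCache:
    """Hypothetical per-dataset cache: each expensive artifact is computed at most once."""

    def __init__(self, x):
        self.x = np.asarray(x, dtype=float)
        self._dist = None
        self._order = None
        self._ranks = None
        self.computed = []  # records which artifacts were actually computed, in order

    def distances(self):
        if self._dist is None:
            self._dist = np.linalg.norm(self.x[:, None, :] - self.x[None, :, :], axis=-1)
            self.computed.append("distances")
        return self._dist

    def order(self):
        # Row i lists all points sorted by distance to point i (point i itself first).
        if self._order is None:
            d = self.distances().copy()
            np.fill_diagonal(d, -1.0)
            self._order = np.argsort(d, axis=1, kind="stable")
            self.computed.append("order")
        return self._order

    def knn(self, k):
        # Any k is just a slice of the cached order, so no extra sorting is needed.
        return self.order()[:, 1:k + 1]

    def ranks(self):
        # ranks[i, j] = position of point j in row i of the sorted order.
        if self._ranks is None:
            order = self.order()
            n = len(order)
            ranks = np.empty_like(order)
            ranks[np.arange(n)[:, None], order] = np.arange(n)
            self._ranks = ranks
            self.computed.append("ranks")
        return self._ranks

# Two "measures" with different needs share one cache for the same data.
hd = np.random.default_rng(0).normal(size=(100, 5))
cache = PreprocessingCache(hd)
knn_20 = cache.knn(20)   # e.g. a kNN-based measure with k=20
knn_30 = cache.knn(30)   # a second kNN-based measure reuses the same sort
ranks = cache.ranks()    # e.g. a rank-based measure
print(cache.computed)    # ['distances', 'order', 'ranks']
```

The O(n^2) distance matrix and the O(n^2 log n) sort dominate the cost, so sharing them is where most of the savings come from when several measures run on the same hd and ld.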

Computing Pointwise Local Distortions

Users can obtain pointwise local distortions by setting the return_local flag to True. If a specified distortion measure produces pointwise local distortions as intermediate results, ZADU returns them alongside the global scores when the flag is set.

from zadu import zadu

spec = {
    "dtm" : {},
    "mrre": { "k": 30 }
}
zadu_obj = zadu.ZADU(spec, return_local=True)
global_, local_ = zadu_obj.run(hd, ld)
print("MRRE local distortions:", local_["mrre"])

Visualizing Local Distortions

With the pointwise local distortions obtained from ZADU, users can build distortion visualizations. For example, CheckViz and the Reliability Map can be drawn on matplotlib axes with zaduvis.

from zadu import zadu
from zaduvis import zaduvis
import matplotlib.pyplot as plt

# Computing local pointwise distortions
specs = [{"measure": "snc", "params": {"k": 50}}]
zadu_obj = zadu.ZADU(spec, return_local=True)
global_, local_ = zadu_obj.run(hd, ld)
l_s = local_["local_steadiness"]
l_c = local_["local_cohesiveness"]

# Visualizing local distortions
fig, ax = plt.subplots(1, 2, figsize=(20, 10))
zaduvis.checkviz(ld, l_s, l_c, ax=ax[0])
zaduvis.reliability_map(ld, l_s, l_c, ax=ax[1])

The snippet above visualizes pointwise local distortions with CheckViz and the Reliability Map. zaduvis.checkviz generates a CheckViz plot, which shows local Steadiness (x-axis) against local Cohesiveness (y-axis) for each point in the embedding. zaduvis.reliability_map creates a Reliability Map, which colors each point of the embedding according to its local distortion scores, giving a spatial view of where the DR embedding is distorted.

Documentation

For more information about the available distortion measures, their use cases, and examples, please refer to our paper.

Citation

License

Contributing


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zadu-0.0.7.tar.gz (14.9 kB)

Uploaded Source

Built Distribution

zadu-0.0.7-py3-none-any.whl (20.9 kB)

Uploaded Python 3

File details

Details for the file zadu-0.0.7.tar.gz.

File metadata

  • Download URL: zadu-0.0.7.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5

File hashes

Hashes for zadu-0.0.7.tar.gz
  • SHA256: 3004576bceef2285d59d8c720fff0234a401c2d0418f5615d17bc872af2b8193
  • MD5: c764db55e697edbde6bb9fda86e135aa
  • BLAKE2b-256: e4514cc0a673f9e7c9f7d7cbbc538f2b4dbc82b77fee193036ae2b6bf13c7a26


File details

Details for the file zadu-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: zadu-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 20.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5

File hashes

Hashes for zadu-0.0.7-py3-none-any.whl
  • SHA256: 75bf89053e786a52613040fcc40e4e30245600ce0d84da9c0c71a866a23ab326
  • MD5: aee94475256b3327016d8b3475256c73
  • BLAKE2b-256: 1ee237ddaffbd727cfa91cb0f1eb1e91334dae0dc3d26252fa765a75373b9f3b

