A package with fast, TensorFlow-based implementations of projection (i.e., dimensionality reduction) quality metrics.
Accelerated Projection Quality Metrics
When evaluating Dimensionality Reduction (AKA Projection) techniques, a number of quality metrics are usually employed.
These metrics score a projection numerically, and help determine whether an algorithm (e.g., t-SNE or UMAP) has produced a sane projection.
In this repository, I aim to provide a comprehensive set of implementations of projection quality metrics that are fast and use idiomatic TensorFlow in their implementation.
Quality Metrics
A quality metric is a function $\mathcal{M}_\eta$ with two arguments: a dataset $\mathbf{X} \in \mathbb{R}^{n\times D}$ of $n$ $D$-dimensional data points, and a corresponding projection $\mathbf{Y} = \mathcal{P}(\mathbf{X}) \in \mathbb{R}^{n\times d}$, where $d$ is usually 2 or 3. We denote by $\eta$ the hyperparameters associated with $\mathcal{M}$ -- for instance, the size $k$ of the neighborhood in neighborhood-based metrics such as trustworthiness, continuity, and neighborhood hit.
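To make the signature concrete, here is a minimal NumPy sketch of a neighborhood-based metric with a single hyperparameter $k$. The function names `knn_indices` and `knn_overlap` are hypothetical and chosen for illustration; this is not the package's implementation, only one simple way a function of the form $\mathcal{M}_\eta(\mathbf{X}, \mathbf{Y})$ could look.

```python
import numpy as np


def knn_indices(data: np.ndarray, k: int) -> np.ndarray:
    """Return, for each row, the indices of its k nearest other rows."""
    diffs = data[:, None, :] - data[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]


def knn_overlap(X: np.ndarray, Y: np.ndarray, k: int = 7) -> float:
    """Illustrative metric M_eta(X, Y) with eta = {k} (hypothetical name).

    Averages, over all points, the fraction of each point's k nearest
    neighbors in X that are also among its k nearest neighbors in Y.
    """
    nn_X = knn_indices(X, k)
    nn_Y = knn_indices(Y, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_X, nn_Y)]
    return float(np.mean(shared) / k)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # n = 50 points, D = 5
Y = X[:, :2]                  # a crude stand-in "projection": keep two coordinates
score = knn_overlap(X, Y, k=7)
print(f"k-NN overlap: {score:.3f}")
```

The score lies in $[0, 1]$, and comparing a dataset with itself gives exactly 1, which is a convenient sanity check for any metric of this kind.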
Projection algorithms can generate $\mathbf{Y}$ in many ways. Of course, not all such projections are equally useful and/or truthful to the data they are based on. While some techniques might be better at representing global aspects of the original dataset $\mathbf{X}$, others might instead favor local neighborhood preservation.
Each $\mathcal{M}_\eta(\mathbf{X}, \mathbf{Y})$ returns a single score representing the quality of $\mathbf{Y}$ as a projection of $\mathbf{X}$. Different quality metrics evaluate different aspects of data pattern preservation. For example, Trustworthiness measures the number of false neighbors introduced in a projection -- that is to say, points that were not close in $D$-dimensional space and have been wrongfully brought together by $\mathcal{P}$. Stress is another metric, aimed at measuring discrepancies between pairwise distances in $\mathbf{X}$ and the corresponding pairwise distances in $\mathbf{Y}$.
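The two metrics just described can be sketched in plain NumPy. The code below is a reference-style illustration, not the package's TensorFlow code. It assumes the usual Venna and Kaski normalization for trustworthiness (valid for $k < n/2$) and one common definition of normalized stress (sum of squared distance differences divided by the sum of squared original distances); other variants exist.

```python
import numpy as np


def pairwise_distances(data: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows."""
    diffs = data[:, None, :] - data[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def trustworthiness(X: np.ndarray, Y: np.ndarray, k: int) -> float:
    """Penalize points that are neighbors in Y but not in X."""
    n = X.shape[0]
    d_X = pairwise_distances(X)
    d_Y = pairwise_distances(Y)
    np.fill_diagonal(d_X, np.inf)  # exclude each point from its own neighbors
    np.fill_diagonal(d_Y, np.inf)

    # rank_X[i, j] = rank of j among the neighbors of i in X (1 = nearest).
    order_X = np.argsort(d_X, axis=1)
    rows = np.arange(n)[:, None]
    rank_X = np.empty((n, n), dtype=int)
    rank_X[rows, order_X] = np.arange(1, n + 1)[None, :]

    # For the k nearest neighbors in Y, look up their ranks in X.
    nn_Y = np.argsort(d_Y, axis=1)[:, :k]
    ranks = rank_X[rows, nn_Y]
    # True neighbors (rank <= k) add nothing; false ones add (rank - k).
    penalty = np.maximum(ranks - k, 0).sum()
    return float(1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty)


def normalized_stress(X: np.ndarray, Y: np.ndarray) -> float:
    """Squared distance discrepancies, relative to the distances in X."""
    d_X = pairwise_distances(X)
    d_Y = pairwise_distances(Y)
    return float(((d_X - d_Y) ** 2).sum() / (d_X ** 2).sum())


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
Y = X[:, :2]  # stand-in projection for demonstration only
print(f"Trustworthiness: {trustworthiness(X, Y, k=5):.3f}")
print(f"Normalized stress: {normalized_stress(X, Y):.3f}")
```

Trustworthiness is better when higher (1 means no false neighbors), while stress is better when lower (0 means all pairwise distances are preserved).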
Installation
Installation is possible using pip directly:
pip install tensorflow-projection-qm
If you have CUDA available, explicitly enable the CUDA-capable TensorFlow dependency by installing with
pip install tensorflow-projection-qm[and-cuda]
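After installing, it can be useful to confirm that TensorFlow actually sees a GPU. The snippet below is a small sketch using only `tf.config.list_physical_devices`; the helper name `describe_tensorflow_devices` is made up for this example, and the function falls back to a message when TensorFlow is not installed.

```python
import importlib.util


def describe_tensorflow_devices() -> str:
    """Report whether TensorFlow is importable and how many GPUs it sees."""
    if importlib.util.find_spec("tensorflow") is None:
        return "TensorFlow is not installed in this environment."
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return f"TensorFlow {tf.__version__} sees {len(gpus)} GPU(s)."


print(describe_tensorflow_devices())
```

If zero GPUs are reported on a CUDA-capable machine, the CPU-only dependency was probably installed.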
Using
The functions that calculate the quality metrics all sit in the tensorflow_projection_qm.metrics package.
from tensorflow_projection_qm.metrics import continuity, trustworthiness
# Set up some fake data
import numpy as np
X = np.random.randn(100, 5) # 100 data points with 5 dimensions.
# Project to 2-D with TSNE
from sklearn.manifold import TSNE
X_proj = TSNE(n_components=2).fit_transform(X).astype(X.dtype)
# Evaluate the projection:
C = continuity.continuity(X, X_proj, k=21).numpy()
T = trustworthiness.trustworthiness(X, X_proj, k=21).numpy()
print(f"Continuity: {C}")
print(f"Trustworthiness: {T}")
# Compute per-point value of a metric (not all metrics support this)
C_i = continuity.continuity_with_local(X, X_proj, k=21)[1].numpy()
T_i = trustworthiness.trustworthiness_with_local(X, X_proj, k=21)[1].numpy()
print(f"Per-point Continuity: {C_i}")
print(f"Per-point Trustworthiness: {T_i}")
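Per-point values are handy for locating the worst-projected points. The sketch below uses NumPy only and a made-up stand-in array in place of `T_i`; it assumes the per-point result is a 1-D array with one score per data point and that higher scores are better, as is the case for trustworthiness and continuity. The helper `worst_points` is hypothetical and not part of the package.

```python
import numpy as np


def worst_points(per_point_scores, n_worst: int = 5) -> np.ndarray:
    """Indices of the n_worst lowest-scoring points, worst first."""
    scores = np.asarray(per_point_scores)
    return np.argsort(scores)[:n_worst]


# Stand-in for an array such as T_i; real values would come from the
# *_with_local calls shown above.
example_scores = np.array([0.95, 0.40, 0.88, 0.99, 0.61, 0.73])
print(worst_points(example_scores, n_worst=3))
```

The returned indices can then be used to highlight those points in a scatter plot of the projection.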
Implemented metrics
- Average Local Error [7]
- Class-Aware Continuity [3]
- Class-Aware Trustworthiness [3]
- Continuity [1]
- Distance Consistency [5]
- False Neighbors [7]
- Jaccard Dissimilarity of Neighbor Sets
- Mean Relative Rank Errors [2]
- Missing Neighbors [7]
- Neighborhood Hit [9]
- Normalized Stress [8]
- Pearson Correlation of Distances [11]
- Procrustes Statistic [4]
- Scale-Normalized Stress [10]
- Shepard Goodness [6]
- True Neighbors [7]
- Trustworthiness [1]
[1] Venna and Kaski. Local Multidimensional Scaling. 2006.
[2] Lee and Verleysen. Quality assessment of dimensionality reduction: Rank-based criteria. 2008.
[3] Colange et al. Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction. 2020.
[4] Goldberg and Ritov. Local procrustes for manifold embedding: a measure of embedding quality and embedding algorithms. 2009.
[5] Sips et al. Selecting good views of high-dimensional data using class consistency. 2009.
[6] Sidney. Nonparametric statistics for the behavioral sciences. 1957.
[7] Martins et al. Visual analysis of dimensionality reduction quality for parameterized projections. 2014.
[8] Joia et al. Local Affine Multidimensional Projection. 2011.
[9] Paulovich et al. Least Square Projection: A Fast High-Precision Multidimensional Projection Technique and Its Application to Document Mapping. 2007.
[10] Smelser et al. "Normalized Stress" is Not Normalized: How to Interpret Stress Correctly. Preprint, 2024.
[11] Geng et al. Supervised nonlinear dimensionality reduction for visualization and classification. 2005.
Why this package?
I have a recurring need in my research (see About Me below) to evaluate different projection algorithms with respect to different quality metrics. While there are some libraries for this, and I am grateful for their authors' work in gathering and implementing different quality metrics (see, for example, ZADU), I have found some implementations to be slower than I need (keep in mind I evaluate thousands of projections at a time) and sometimes buggy.
At some point I noticed I had been re-implementing the same quality metrics over and over again, sometimes introducing bugs myself due to mistakes when copying and adapting code from a public source, such as Espadoto's comprehensive survey.
Instead, I have chosen to start this package with the goals of:
- Having easy access to standard implementations of projection quality metrics;
- Implementing quality metrics in vectorized manners as often as possible, taking advantage of parallel execution for speeding up calculations;
- Sharing this code openly as my first package to be published on PyPI;
- Using an easily-available framework (TensorFlow) to back up my implementations and seamlessly take advantage of GPUs when available.
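The vectorization goal in the list above can be illustrated with pairwise squared distances, a building block of most of these metrics. The sketch uses NumPy rather than TensorFlow to stay self-contained, and both function names are hypothetical. It compares an explicit double loop with an equivalent expression built from matrix operations, which is the kind of formulation that maps well onto parallel hardware.

```python
import numpy as np


def pairwise_sq_dists_loop(data: np.ndarray) -> np.ndarray:
    """Straightforward O(n^2) Python loop, one pair at a time."""
    n = data.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = data[i] - data[j]
            out[i, j] = diff @ diff
    return out


def pairwise_sq_dists_vectorized(data: np.ndarray) -> np.ndarray:
    """Same result via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b."""
    sq_norms = (data ** 2).sum(axis=1)
    gram = data @ data.T
    sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off


rng = np.random.default_rng(0)
data = rng.normal(size=(40, 5))
same = np.allclose(pairwise_sq_dists_loop(data), pairwise_sq_dists_vectorized(data))
print(f"Loop and vectorized versions agree: {same}")
```

The vectorized form replaces $n^2$ interpreted loop iterations with a single matrix product, which is where most of the speed-up comes from.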
About
This package is under active development, and is very much in its early stages. Please feel free to report bugs, but also be mindful that this is a best-effort attempt to generalize/speed up my own implementations of quality metrics.
About Me
My name is Alister Machado. I am a PhD Candidate researching Data Visualization (more specifically, dimensionality reduction and explainable AI) with the Visualization and Graphics Group (VIG) of Utrecht University, in the Netherlands. I am the person behind ShaRP and the Differentiable DBMs. You can check out my research here. I am currently in the 4th year of my PhD (out of 5 total), and am expected to graduate in 2026. Feel free to reach out!