Measures of projection quality
sortedness
sortedness is a measure of the quality of a data transformation, often dimensionality reduction.
It is less sensitive to irrelevant distortions and returns values in a more meaningful interval than Kruskal's stress formula 1.
This Python library provides a reference implementation of the functions presented here (the paper is unavailable until publication).
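To make the comparison with Kruskal's stress formula 1 concrete, the sketch below computes stress-1 from pairwise Euclidean distances in plain Python. It is not part of the sortedness library: the helper names `pairwise_distances` and `kruskal_stress1` are hypothetical, and normalizing by the original distances is an assumed convention (others exist). It shows that a uniform rescaling, which keeps every neighborhood ordering intact, is enough to move stress-1 from 0 to 1.

```python
from itertools import combinations
from math import dist, sqrt


def pairwise_distances(points):
    """Euclidean distances for all unordered pairs, in a fixed order."""
    return [dist(p, q) for p, q in combinations(points, 2)]


def kruskal_stress1(X, X_):
    """Stress-1 between original points X and transformed points X_.

    Assumption: the sum of squared residuals is normalized by the
    squared original distances (other conventions exist).
    """
    d = pairwise_distances(X)
    d_ = pairwise_distances(X_)
    residual = sum((a - b) ** 2 for a, b in zip(d, d_))
    return sqrt(residual / sum(a ** 2 for a in d))


X = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
X_scaled = [(2 * x, 2 * y) for x, y in X]  # same orderings, different scale

print(kruskal_stress1(X, X))         # 0.0
print(kruskal_stress1(X, X_scaled))  # 1.0
```

A measure based only on the ordering of distances would not change under such a rescaling, which is presumably one of the irrelevant distortions mentioned above.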
Overview
Local variants return a value for each provided point. The global variant returns a single value for all points. Any local variant can be used as a global measure by taking the mean value.
Local variants: sortedness(X, X_), pwsortedness(X, X_), rsortedness(X, X_).
Global variant: global_sortedness(X, X_).
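The relation between local and global values can be illustrated without the library. The sketch below is a simplified stand-in, not the sortedness implementation: it assumes a plain (unweighted) Kendall tau over each point's distance ordering, and the names `kendall_tau` and `local_rank_agreement` are hypothetical. It returns one value per point and then reduces them to a single number by taking the mean.

```python
from itertools import combinations
from math import dist
from statistics import mean


def kendall_tau(a, b):
    """Plain Kendall tau-a between two equally long sequences (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def local_rank_agreement(X, X_):
    """One value per point: agreement of distance orderings in X and X_.

    Needs at least 3 points, so that each point has 2 or more neighbors.
    """
    scores = []
    for i in range(len(X)):
        d = [dist(X[i], X[j]) for j in range(len(X)) if j != i]
        d_ = [dist(X_[i], X_[j]) for j in range(len(X_)) if j != i]
        scores.append(kendall_tau(d, d_))
    return scores


X = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
X_ = [(2 * x, 2 * y) for x, y in X]  # rescaled copy: orderings preserved

local = local_rank_agreement(X, X_)
print(local)        # [1.0, 1.0, 1.0, 1.0]
print(mean(local))  # 1.0
```

The per-point list plays the role of a local variant's output, and its mean plays the role of a global summary.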
Python installation
from package through pip
# Set up a virtualenv.
python3 -m venv venv
source venv/bin/activate
# Install from PyPI
pip install -U sortedness
from source
git clone https://github.com/sortedness/sortedness
cd sortedness
poetry install
Examples
Sortedness
import numpy as np
from numpy.random import permutation
from sklearn.decomposition import PCA
from sortedness import sortedness
# Some synthetic data.
mean = (1, 2)
cov = np.eye(2)
rng = np.random.default_rng(seed=0)
original = rng.multivariate_normal(mean, cov, size=12)
projected2 = PCA(n_components=2).fit_transform(original)
projected1 = PCA(n_components=1).fit_transform(original)
np.random.seed(0)
projectedrnd = permutation(original)
# Print `min`, `mean`, and `max` values.
s = sortedness(original, original)
print(min(s), sum(s) / len(s), max(s))
"""
1.0 1.0 1.0
"""
s = sortedness(original, projected2)
print(min(s), sum(s) / len(s), max(s))
"""
1.0 1.0 1.0
"""
s = sortedness(original, projected1)
print(min(s), sum(s) / len(s), max(s))
"""
0.432937128932 0.7813889452999166 0.944810120534
"""
s = sortedness(original, projectedrnd)
print(min(s), sum(s) / len(s), max(s))
"""
-0.578096068617 -0.06328160775358334 0.396112816715
"""
Pairwise sortedness
import numpy as np
from numpy.random import permutation
from sklearn.decomposition import PCA
from sortedness import pwsortedness
# Some synthetic data.
mean = (1, 2)
cov = np.eye(2)
rng = np.random.default_rng(seed=0)
original = rng.multivariate_normal(mean, cov, size=12)
projected2 = PCA(n_components=2).fit_transform(original)
projected1 = PCA(n_components=1).fit_transform(original)
np.random.seed(0)
projectedrnd = permutation(original)
# Print `min`, `mean`, and `max` values.
s = pwsortedness(original, original)
print(min(s), sum(s) / len(s), max(s))
"""
1.0 1.0 1.0
"""
s = pwsortedness(original, projected2)
print(min(s), sum(s) / len(s), max(s))
"""
1.0 1.0 1.0
"""
s = pwsortedness(original, projected1)
print(min(s), sum(s) / len(s), max(s))
"""
0.730078995423 0.7744573488776667 0.837310352695
"""
s = pwsortedness(original, projectedrnd)
print(min(s), sum(s) / len(s), max(s))
"""
-0.198780473657 -0.0645984203715 0.147224384381
"""
Global pairwise sortedness
import numpy as np
from numpy.random import permutation
from sklearn.decomposition import PCA
from sortedness import global_pwsortedness
# Some synthetic data.
mean = (1, 2)
cov = np.eye(2)
rng = np.random.default_rng(seed=0)
original = rng.multivariate_normal(mean, cov, size=12)
projected2 = PCA(n_components=2).fit_transform(original)
projected1 = PCA(n_components=1).fit_transform(original)
np.random.seed(0)
projectedrnd = permutation(original)
# Print measurement result and p-value.
s = global_pwsortedness(original, original)
print(list(s))
"""
[1.0, 3.6741408919675163e-93]
"""
s = global_pwsortedness(original, projected2)
print(list(s))
"""
[1.0, 3.6741408919675163e-93]
"""
s = global_pwsortedness(original, projected1)
print(list(s))
"""
[0.7715617715617715, 5.240847664048334e-20]
"""
s = global_pwsortedness(original, projectedrnd)
print(list(s))
"""
[-0.06107226107226107, 0.46847188611226276]
"""
Copyright (c) 2023. Davi Pereira dos Santos and Tacito Neves
Grants