Measures of projection quality

License: GPL v3

sortedness

sortedness is the level of agreement between two points regarding how they rank all remaining points in a dataset. The definition holds even when the two points come from different spaces, which makes it possible to measure the quality of data transformation processes, most often dimensionality reduction. Compared with Kruskal stress formula I, it is less sensitive to irrelevant distortions and returns values in a more meaningful interval.
This Python library provides a reference implementation for the functions presented here (paper unavailable until publication).
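
To make the definition above concrete, here is a minimal pure-Python sketch of the idea. It is not the library's implementation: the helper names `kendall_tau` and `naive_local_agreement` are hypothetical, and plain (unweighted) Kendall tau on Euclidean distances is an assumption chosen for simplicity. For each point, it orders the remaining points by distance in both spaces and scores how well the two orderings agree.

```python
from itertools import combinations
from math import dist


def kendall_tau(a, b):
    """Plain Kendall tau-a between two equally long sequences (tied pairs score 0)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def naive_local_agreement(X, X_):
    """For each point, compare how it orders the remaining points in X and in X_."""
    scores = []
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        d_orig = [dist(X[i], X[j]) for j in others]    # distances in the original space
        d_proj = [dist(X_[i], X_[j]) for j in others]  # distances in the transformed space
        scores.append(kendall_tau(d_orig, d_proj))
    return scores


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
X_scaled = [(2 * x, 2 * y) for x, y in X]  # uniform scaling keeps every ranking intact
X_line = [(x,) for x, _ in X]              # dropping a coordinate changes some rankings

print(naive_local_agreement(X, X_scaled))  # [1.0, 1.0, 1.0, 1.0]
print(naive_local_agreement(X, X_line))
```

Uniform scaling changes every distance but no ranking, so the rank-based score stays at 1.0. This is the kind of irrelevant distortion that a stress-style measure based on raw distances would penalize.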

Overview

Local variants return a value for each provided point. The global variant returns a single value for all points. Any local variant can be used as a global measure by taking the mean value.

Local variants: sortedness(X, X_), pwsortedness(X, X_), rsortedness(X, X_).

Global variant: global_pwsortedness(X, X_).
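
Using a local variant as a global measure, as described above, amounts to averaging its per-point values. The sketch below shows this with a hypothetical helper, `as_global`, and a stand-in local measure named `toy_local`. Neither is part of the library. Any function with the `(X, X_)` call shape of the local variants listed above should fit in the same way.

```python
from statistics import fmean


def as_global(local_measure, X, X_):
    """Collapse a local variant into a single number by averaging its per-point values."""
    return fmean(local_measure(X, X_))


def toy_local(X, X_):
    """Stand-in for a local variant: one made-up value per point."""
    return [1.0, 0.5, 0.0]


print(as_global(toy_local, None, None))  # 0.5

# With the library installed, the same helper could be called as, for example:
#   from sortedness import sortedness
#   as_global(sortedness, original, projected1)
```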

Python installation

from package through pip

# Set up a virtualenv. 
python3 -m venv venv
source venv/bin/activate

# Install from PyPI
pip install -U sortedness

from source

git clone https://github.com/sortedness/sortedness
cd sortedness
poetry install
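
After either installation route, a quick way to confirm that the package is visible to the active interpreter is to query its installed version. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package="sortedness"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())  # a version string if the install succeeded, otherwise None
```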

Examples

Sortedness

import numpy as np
from numpy.random import permutation
from sklearn.decomposition import PCA

from sortedness import sortedness

# Some synthetic data.
mean = (1, 2)
cov = np.eye(2)
rng = np.random.default_rng(seed=0)
original = rng.multivariate_normal(mean, cov, size=12)
projected2 = PCA(n_components=2).fit_transform(original)
projected1 = PCA(n_components=1).fit_transform(original)
np.random.seed(0)
projectedrnd = permutation(original)

# Print `min`, `mean`, and `max` values.
s = sortedness(original, original)
print(min(s), sum(s) / len(s), max(s))
"""
1.0 1.0 1.0
"""
s = sortedness(original, projected2)
print(min(s), sum(s) / len(s), max(s))
"""
1.0 1.0 1.0
"""
s = sortedness(original, projected1)
print(min(s), sum(s) / len(s), max(s))
"""
0.393463224666 0.7565797804351666 0.944810120534
"""
s = sortedness(original, projectedrnd)
print(min(s), sum(s) / len(s), max(s))
"""
-0.648305479567 -0.09539895194975 0.397019507592
"""
# Single point fast calculation.
s = sortedness(original, projectedrnd, 2)
print(s)
"""
0.231079547491
"""
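
Because a local variant returns one value per point, the result can also be used to locate the points whose neighborhoods were preserved worst. The following sketch assumes only that the result is a sequence of numbers. The helper name `worst_points` and the sample values are made up for illustration.

```python
def worst_points(values, k=3):
    """Return (index, value) pairs for the k lowest local values, lowest first."""
    ranked = sorted(enumerate(values), key=lambda pair: pair[1])
    return ranked[:k]


local_values = [0.94, 0.39, 0.81, 0.76, 0.55]  # made-up per-point values
print(worst_points(local_values, k=2))  # [(1, 0.39), (4, 0.55)]
```

With the library installed, `values` could be the output of a call such as `sortedness(original, projected1)`.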

Pairwise sortedness

import numpy as np
from numpy.random import permutation
from sklearn.decomposition import PCA

from sortedness import pwsortedness

# Some synthetic data.
mean = (1, 2)
cov = np.eye(2)
rng = np.random.default_rng(seed=0)
original = rng.multivariate_normal(mean, cov, size=12)
projected2 = PCA(n_components=2).fit_transform(original)
projected1 = PCA(n_components=1).fit_transform(original)
np.random.seed(0)
projectedrnd = permutation(original)

# Print `min`, `mean`, and `max` values.
s = pwsortedness(original, original)
print(min(s), sum(s) / len(s), max(s))
"""
1.0 1.0 1.0
"""
s = pwsortedness(original, projected2)
print(min(s), sum(s) / len(s), max(s))
"""
1.0 1.0 1.0
"""
s = pwsortedness(original, projected1)
print(min(s), sum(s) / len(s), max(s))
"""
0.649315577592 0.7534291438323333 0.834601601062
"""
s = pwsortedness(original, projectedrnd)
print(min(s), sum(s) / len(s), max(s))
"""
-0.168611098044 -0.07988253899799999 0.14442446342
"""
# Single point fast calculation.
s = pwsortedness(original, projectedrnd, 2)
print(s)
"""
0.036119718802
"""

Global pairwise sortedness

import numpy as np
from numpy.random import permutation
from sklearn.decomposition import PCA

from sortedness import global_pwsortedness

# Some synthetic data.
mean = (1, 2)
cov = np.eye(2)
rng = np.random.default_rng(seed=0)
original = rng.multivariate_normal(mean, cov, size=12)
projected2 = PCA(n_components=2).fit_transform(original)
projected1 = PCA(n_components=1).fit_transform(original)
np.random.seed(0)
projectedrnd = permutation(original)

# Print measurement result and p-value.
s = global_pwsortedness(original, original)
print(list(s))
"""
[1.0, 3.6741408919675163e-93]
"""
s = global_pwsortedness(original, projected2)
print(list(s))
"""
[1.0, 3.6741408919675163e-93]
"""
s = global_pwsortedness(original, projected1)
print(list(s))
"""
[0.7715617715617715, 5.240847664048334e-20]
"""
s = global_pwsortedness(original, projectedrnd)
print(list(s))
"""
[-0.06107226107226107, 0.46847188611226276]
"""
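
The global result above is a pair: the measurement and its p-value. A small helper can turn that pair into a readable line. This is only a sketch: the name `describe` is hypothetical, and the 0.05 threshold is a conventional assumption rather than something prescribed by the library.

```python
def describe(result, alpha=0.05):
    """Format a (value, p-value) pair and flag whether p falls below alpha."""
    value, p_value = result
    verdict = "significant" if p_value < alpha else "not significant"
    return f"sortedness={value:.3f}, p={p_value:.2e} ({verdict} at alpha={alpha})"


# Values copied from the outputs shown above.
print(describe([0.7715617715617715, 5.240847664048334e-20]))
# sortedness=0.772, p=5.24e-20 (significant at alpha=0.05)
print(describe([-0.06107226107226107, 0.46847188611226276]))
# sortedness=-0.061, p=4.68e-01 (not significant at alpha=0.05)
```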

Copyright (c) 2023. Davi Pereira dos Santos and Tacito Neves

TODO

Future work will address the handling of large datasets: an approximate sortedness value and a size-insensitive weighting scheme.

Reference

Please use the following reference to cite this work:

@inproceedings {10.2312:eurova.20231093,
booktitle = {EuroVis Workshop on Visual Analytics (EuroVA)},
editor = {Angelini, Marco and El-Assady, Mennatallah},
title = {{Nonparametric Dimensionality Reduction Quality Assessment based on Sortedness of Unrestricted Neighborhood}},
author = {Pereira-Santos, Davi and Neves, Tácito Trindade Araújo Tiburtino and Carvalho, André C. P. L. F. de and Paulovich, Fernando V.},
year = {2023},
publisher = {The Eurographics Association},
ISSN = {2664-4487},
ISBN = {978-3-03868-222-6},
DOI = {10.2312/eurova.20231093}
}

Grants

This work was supported by Wellcome Leap 1kD Program; São Paulo Research Foundation (FAPESP) - grant 2020/09835-1; Canadian Institute for Health Research (CIHR) Canadian Research Chairs (CRC) stipend [award number 1024586]; Canadian Foundation for Innovation (CFI) John R. Evans Leaders Fund (JELF) [grant number 38835]; Dalhousie Medical Research Fund (DMRF) COVID-19 Research Grant [grant number 603082]; and the Canadian Institute for Health Research (CIHR) Project Grant [award number 177968].
