Projection Separability Indices - Python wrapper

Projection Separability Indices

This Python package is based on the MATLAB project projection-separability-indices.

Description

The projection separability indices (PSIs) are projection-based statistical measures designed to quantify the group separability of data samples in a geometrical space. For instance, PSIs can be used to evaluate the quality of the low-dimensional representations produced by embedding algorithms. Currently, this package implements four PSIs for evaluating group separability, plus a statistical test termed trustworthiness, which uses a null model to assess the statistical significance of each PSI through a p-value.

For more details, see “Measuring group separability in geometrical space for evaluation of pattern recognition and dimension reduction algorithms.”
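
The projection idea can be illustrated with a short sketch. This is not the package's implementation: the function name `project_on_centroid_line` is hypothetical, and the sketch assumes exactly two groups and median centroids. Each sample is projected onto the line joining the two group centroids, which reduces the separability question to a single dimension.

```python
import numpy as np


def project_on_centroid_line(embedding, labels, group_a, group_b):
    """Project samples of two groups onto the line joining their centroids.

    Hypothetical helper for illustration only; assumes samples are rows.
    """
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    # Median centroid of each group (one coordinate per feature)
    centroid_a = np.median(embedding[labels == group_a], axis=0)
    centroid_b = np.median(embedding[labels == group_b], axis=0)
    direction = centroid_b - centroid_a
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("Centroids coincide; the projection line is undefined.")
    # Signed position of every sample along the unit direction a -> b
    return (embedding - centroid_a) @ (direction / norm)


embedding = np.array([[1, 2], [3, 4], [5, 6], [7, 8],
                      [10, 11], [12, 13], [14, 15], [16, 17]])
labels = np.array(['group1'] * 4 + ['group2'] * 4)
scores = project_on_centroid_line(embedding, labels, 'group1', 'group2')
print(scores)
```

In this toy data every `group1` score lies below every `group2` score, so any separability statistic computed on `scores` would report perfect separation.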

PSI measures

  • psi-p: Based on the Mann-Whitney U-test p-value [1]
  • psi-roc: Based on the Area Under the ROC-Curve [2]
  • psi-pr: Based on the Area Under the Precision-Recall Curve [3]
  • psi-mcc: Based on the Matthews Correlation Coefficient [4]

[1] H. B. Mann and D. R. Whitney, “On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other,” Ann. Math. Stat., vol. 18, no. 1, pp. 50–60, 1947, doi: 10.1214/aoms/1177730491.

[2] J. A. Hanley and B. J. McNeil, “The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve,” Radiology, vol. 143, no. 1, pp. 29–36, 1982.

[3] V. Raghavan, P. Bollmann, and G. S. Jung, “A critical investigation of recall and precision as measures of retrieval system performance,” ACM Trans. Inf. Syst., vol. 7, no. 3, pp. 205–229, 1989, doi: 10.1145/65943.65945.

[4] B. W. Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” BBA - Protein Struct., vol. 405, no. 2, pp. 442–451, 1975, doi: 10.1016/0005-2795(75)90109-9.
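
As a rough illustration of two of the statistics behind these measures, the sketch below computes the area under the ROC curve and the Matthews correlation coefficient on one-dimensional scores using NumPy only. The helper names and the fixed threshold are assumptions made for this example; the package may select thresholds and handle multiple groups differently. The AUC uses the standard identity AUC = U / (n_pos * n_neg), where U is the Mann-Whitney statistic.

```python
import numpy as np


def auc_from_scores(scores, is_positive):
    """AUC as U / (n_pos * n_neg), counting ties as one half."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[is_positive], scores[~is_positive]
    # Compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    u_statistic = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return u_statistic / (pos.size * neg.size)


def mcc_at_threshold(scores, is_positive, threshold):
    """MCC when samples with score > threshold are predicted positive."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    predicted = scores > threshold
    tp = int(np.sum(predicted & is_positive))
    tn = int(np.sum(~predicted & ~is_positive))
    fp = int(np.sum(predicted & ~is_positive))
    fn = int(np.sum(~predicted & is_positive))
    denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when any marginal count is zero
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


toy_scores = np.array([0.1, 0.4, 0.35, 0.8])
toy_is_positive = np.array([False, False, True, True])
toy_auc = auc_from_scores(toy_scores, toy_is_positive)
toy_mcc = mcc_at_threshold(toy_scores, toy_is_positive, threshold=0.5)
print(toy_auc, toy_mcc)
```

Three of the four positive/negative pairs are ordered correctly here, so the AUC is 0.75; the MCC depends on the chosen threshold, which is why a fixed cut point is only a simplification.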

Installation

Run the following to install:

pip install psis

Usage

Compute PSIs

import numpy as np
from psis import indices

"""
Simulated embedding obtained by a dimension reduction method.
In this example, only two dimensions are used. However, an arbitrary 
number of dimensions can be evaluated.
Note: Samples are expected as rows and features/variables as columns.
"""
embedding = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [10, 11], [12, 13], [14, 15], [16, 17]])

"""
List of sample labels (groups/classes).
In this example, only two different groups are used. However, an arbitrary
number of classes can be evaluated.
"""
labels = np.array(['group1', 'group1', 'group1', 'group1', 'group2', 'group2', 'group2', 'group2'])

"""
List of positive samples.
Depending on the study, the positive classes are usually the labels
for which a particular prediction is desired.

For instance:
- sick patients (positive class) versus controls (negative class)
- burnout (positive class), depression (positive class), versus control (negative class)

If you are not sure which classes are positive, omit this input and the
algorithm will treat the groups with the fewest samples as positive.
"""
positives = np.array(['group1'])

"""
Base approach for projecting the points.

Available options are:
- centroid [default]
- lda
"""
projection_type = 'centroid'

"""
Base approach for defining the groups' centroids. 
NOTE: Only applicable if projection_type is centroid, ignored otherwise.

Available options are:
- mean
- median [default]
- mode
"""
center_formula = 'median'

# Group separability evaluation
psi_p, psi_roc, psi_pr, psi_mcc = indices.compute_psis(embedding, labels, positives, projection_type, center_formula)

print(psi_p)
print(psi_roc)
print(psi_pr)
print(psi_mcc)
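
The fallback described above for the positive class (the group with the fewest samples) can be sketched as follows. The helper `infer_positive_class` is hypothetical and is not part of `psis`; in particular, how the package breaks ties between equally sized groups is not stated, so the tie-breaking shown here is an assumption.

```python
import numpy as np


def infer_positive_class(labels):
    """Return the label with the fewest samples (hypothetical helper).

    Ties are resolved by np.unique's sorted order, which is an assumption
    of this sketch and not necessarily what the package does.
    """
    classes, counts = np.unique(np.asarray(labels), return_counts=True)
    return classes[np.argmin(counts)]


example_labels = np.array(['control'] * 5 + ['sick'] * 3)
positive = infer_positive_class(example_labels)
print(positive)
```

Passing `positives` explicitly avoids relying on this kind of heuristic when the group sizes are balanced.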

Compute trustworthiness of PSIs

from sklearn.datasets import load_iris
from psis import indices

# Sample data. Details at https://scikit-learn.org/stable/datasets/toy_dataset.html
data = load_iris()

# Number of iterations for the Null model
iterations = 50

# Random seed (for reproducibility)
seed = 10

# Linear Discriminant Analysis (LDA) based projection
projection = 'lda'

# Group separability evaluation.
# In this example, group separability is assessed directly in the
# high-dimensional (HD) space, and the trustworthiness of each PSI
# (together with details about its null model evaluation) is returned
results = indices.compute_trustworthiness(data.data, data.target, iterations=iterations, projection_type=projection, seed=seed)

# Accessing the results
# Only 'psi_roc' is shown here. The other indices' results can be
# accessed in the same way
print(results['psi_roc']['value'])  # Initial index value
print(results['psi_roc']['min'])  # Minimum permuted value
print(results['psi_roc']['max'])  # Maximum permuted value
print(results['psi_roc']['p_value'])  # Separability significance (p-value)
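
To give an intuition for how a null model can yield a p-value, the sketch below runs a plain label-permutation test on one-dimensional scores. It assumes the null distribution is built by shuffling labels, which this page does not specify, so it should not be read as the algorithm behind `compute_trustworthiness`. The function names and the stand-in statistic `mean_gap` are hypothetical; the returned keys simply mirror the ones printed above.

```python
import numpy as np


def permutation_p_value(scores, is_positive, statistic, iterations=50, seed=10):
    """Estimate how often shuffled labels match or beat the observed statistic."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    observed = statistic(scores, is_positive)
    # Recompute the statistic on randomly reassigned labels
    null_values = np.array([
        statistic(scores, rng.permutation(is_positive))
        for _ in range(iterations)
    ])
    # Add-one correction keeps the estimate strictly above zero
    p_value = (np.sum(null_values >= observed) + 1) / (iterations + 1)
    return {'value': observed, 'min': null_values.min(),
            'max': null_values.max(), 'p_value': p_value}


def mean_gap(scores, is_positive):
    """Absolute difference between the group means (stand-in statistic)."""
    return abs(scores[is_positive].mean() - scores[~is_positive].mean())


demo_scores = np.concatenate([np.linspace(0, 1, 20), np.linspace(3, 4, 20)])
demo_is_positive = np.arange(40) >= 20
demo_result = permutation_p_value(demo_scores, demo_is_positive, mean_gap,
                                  iterations=200)
print(demo_result['p_value'])
```

With more iterations the smallest attainable p-value shrinks (it is bounded below by 1 / (iterations + 1)), at the cost of proportionally more computation.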

Issues

Please report any issues at psis/issues.

