Visualizing and propagating uncertainty in PCA

# VIPurPCA

VIPurPCA visualizes uncertainty that is propagated through the dimensionality reduction technique Principal Component Analysis (PCA) by automatic differentiation.

### Installation

VIPurPCA requires Python 3.7.3 or later and can be installed via:

```
pip install vipurpca
```

A website showing results and animations can be found here.

### Usage

#### Propagating uncertainty through PCA and visualizing output uncertainty as an animated scatter plot

To propagate uncertainty through PCA, use the class `PCA`, which has the following parameters, attributes, and methods:

**Parameters**

- `matrix` (array_like): Array of size `[n, p]` containing the mean values to which VIPurPCA should be applied.
- `n_components` (int or float, default=None, optional): Number of components to keep.
- `axis` ({0, 1}, default=0, optional): The default expects samples in rows and features in columns.
- `cov_data` (array_like of shape `[n*p]` or `[n*p, n*p]`, default=None, optional): Uncertainties attached to the numbers in `matrix`. If `cov_data` is one-dimensional, it is assumed to be the diagonal of a diagonal matrix. If None
- `compute_jacobian` (Boolean, default=False, optional): Whether to propagate uncertainty through PCA.
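
The two accepted shapes of `cov_data` can be illustrated with plain NumPy. This is a minimal sketch that relies only on the convention stated above (a one-dimensional array is read as the diagonal of a diagonal covariance matrix). The variable names and the toy values are illustrative assumptions and not part of the VIPurPCA API.

```python
import numpy as np

n, p = 3, 2                        # 3 samples, 2 features
rng = np.random.default_rng(0)
Y = rng.normal(size=(n, p))        # mean matrix of shape [n, p]

# Independent uncertainties: one variance per entry of Y, shape [n*p]
variances = np.full(n * p, 0.1)

# The same information as a full covariance matrix of shape [n*p, n*p]
full_cov = np.diag(variances)

print(variances.shape)             # (6,)
print(full_cov.shape)              # (6, 6)
```

The one-dimensional form is enough when the entries of `matrix` are assumed to be uncorrelated; the full form is needed only to express correlations between entries.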

**Attributes**

- `size` (`[n, p]`): Dimension of `matrix` (n: number of samples, p: number of dimensions).
- `covariance` (ndarray of size `[p, p]`): Covariance matrix of the features.
- `eigenvalues` (ndarray of size `[n_components]`): Eigenvalues obtained from the eigenvalue decomposition of the covariance matrix.
- `eigenvectors` (ndarray of size `[n_components*p, n*p]`): Eigenvectors obtained from the eigenvalue decomposition of the covariance matrix.
- `jacobian` (ndarray of size `[n_components*p, n*p]`): Jacobian containing derivatives of the eigenvectors w.r.t. the input matrix.
- `jacobian_eigenvalues` (ndarray of size `[n_components*p, n*p]`): Jacobian containing derivatives of the eigenvalues w.r.t. the input matrix.
- `cov_eigenvectors` (ndarray of size `[n_components*p, n_components*p]`): Propagated uncertainties of the eigenvectors.
- `cov_eigenvalues` (ndarray of size `[n_components*n_components]`): Propagated uncertainties of the eigenvalues.
- `transformed_data` (ndarray of size `[n, n_components]`): Low-dimensional representation of the data after applying PCA.

**Methods**

- `pca_value()`: Apply PCA to the matrix.
- `pca_grad(center=True)`: Apply PCA to the matrix and compute `jacobian` and `jacobian_eigenvalues` using automatic differentiation.
- `transform_data()`: Transform the matrix according to the eigenvectors and reduce dimensionality according to `n_components`.
- `compute_cov_eigenvectors()`: Compute uncertainties of the eigenvectors.
- `compute_cov_eigenvalues()`: Compute uncertainties of the eigenvalues.
- `animate(n_frames=10, labels=None, outfile='animation.html')`: Generate an animation with `n_frames` frames using plotly. `labels` (list or 1-D array) gives the label of each sample. The animation is saved as HTML at `outfile`.
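
To make the relationship between the Jacobian and the propagated uncertainties concrete, the following is a minimal NumPy sketch of standard first-order (linear) error propagation, in which the output covariance is `J @ cov @ J.T`. It is not VIPurPCA's implementation: the Jacobian is approximated here with central finite differences instead of automatic differentiation, and the helper names `leading_eigenvectors` and `numerical_jacobian` are hypothetical. The resulting shapes match those listed for `jacobian` and `cov_eigenvectors` above.

```python
import numpy as np

def leading_eigenvectors(flat, n, p, k, reference=None):
    """Map a flattened [n*p] data matrix to its k leading eigenvectors, flattened to [k*p]."""
    Y = flat.reshape(n, p)
    Yc = Y - Y.mean(axis=0)                  # center each feature
    C = Yc.T @ Yc / (n - 1)                  # feature covariance, [p, p]
    _, vecs = np.linalg.eigh(C)              # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k].T               # rows are the k leading eigenvectors
    if reference is not None:
        # Eigenvectors are defined up to sign; align with the reference
        # so that the map is continuous under small perturbations.
        signs = np.sign(np.sum(W * reference.reshape(k, p), axis=1))
        W = W * signs[:, None]
    return W.ravel()

def numerical_jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian (a stand-in for automatic differentiation)."""
    J = np.zeros((f(x).size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

n, p, k = 5, 3, 2
rng = np.random.default_rng(0)
Y = rng.normal(size=(n, p))                  # toy mean matrix
cov_Y = np.diag(np.full(n * p, 0.01))        # toy input covariance, [n*p, n*p]

w0 = leading_eigenvectors(Y.ravel(), n, p, k)
f = lambda x: leading_eigenvectors(x, n, p, k, reference=w0)

J = numerical_jacobian(f, Y.ravel())         # [k*p, n*p]
cov_W = J @ cov_Y @ J.T                      # [k*p, k*p]
print(J.shape, cov_W.shape)                  # (6, 15) (6, 6)
```

Finite differences need two function evaluations per input entry, so they scale poorly with `n*p`; this is one reason to prefer automatic differentiation for real datasets.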

#### Example datasets

After installing VIPurPCA, three example datasets can be loaded, each providing a mean, a covariance, and labels.

```python
from vipurpca import load_data
```

#### Example

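```python
from vipurpca import load_data
from vipurpca import PCA

# load mean (Y), uncertainty estimates (cov_Y) and labels (y)
# NOTE: the loader name below is an assumption; use whichever loader load_data provides
Y, cov_Y, y = load_data.load_studentgrades_dataset()
pca_student_grades = PCA(matrix=Y, cov_data=cov_Y, n_components=2, axis=0, compute_jacobian=True)
# compute PCA with backprop
pca_student_grades.pca_grad()
# Bayesian inference: propagate the input uncertainty to the eigenvectors
pca_student_grades.compute_cov_eigenvectors()
# Transform data
pca_student_grades.transform_data()
# animate the uncertainty of the low-dimensional representation
pca_student_grades.animate(n_frames=10, labels=y, outfile='animation.html')
```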

The resulting animation can be found here.

