Visualizing and propagating uncertainty in PCA
VIPurPCA
VIPurPCA offers a visualization of uncertainty propagated through the dimensionality reduction technique Principal Component Analysis (PCA) by automatic differentiation.
Installation
VIPurPCA requires Python 3.7.3 or later and can be installed via:
pip install vipurpca
A website showing results and animations can be found here.
Usage
Propagating uncertainty through PCA and visualizing output uncertainty as an animated scatter plot
To propagate uncertainty through PCA, use the class PCA, which has the following parameters, attributes, and methods:
Parameters | |
---|---|
matrix : array_like | Array of size [n, p] containing the mean values to which VIPurPCA should be applied. |
sample_cov : array_like of shape [n, n] or [n], default=None, optional | Input uncertainties in terms of the sample covariance matrix. If sample_cov is one-dimensional, its values are assumed to be the diagonal of a diagonal matrix. Used to compute the total covariance matrix over the input as the Kronecker product of sample_cov and feature_cov. |
feature_cov : array_like of shape [p, p] or [p], default=None, optional | Input uncertainties in terms of the feature covariance matrix. If feature_cov is one-dimensional, its values are assumed to be the diagonal of a diagonal matrix. Used to compute the total covariance matrix over the input as the Kronecker product of sample_cov and feature_cov. |
full_cov : array_like of shape [n·p, n·p] or [n·p], default=None, optional | Input uncertainties in terms of the full covariance matrix. If full_cov is one-dimensional, its values are assumed to be the diagonal of a diagonal matrix. Used as an alternative to the Kronecker product of sample_cov and feature_cov. Requires more memory. |
n_components : int or float, default=None, optional | Number of components to keep. |
axis : {0, 1}, default=0, optional | The default expects samples in rows and features in columns. |
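The Kronecker-product construction described for sample_cov and feature_cov can be sketched with plain NumPy. This is only an illustration, not VIPurPCA's internal code: the helper names `as_matrix` and `total_covariance` are hypothetical, and the factor order `kron(sample_cov, feature_cov)` assumes the [n, p] matrix is flattened row by row.

```python
import numpy as np

def as_matrix(cov):
    """Return a 2-D covariance matrix; a 1-D input is read as a diagonal."""
    cov = np.asarray(cov, dtype=float)
    return np.diag(cov) if cov.ndim == 1 else cov

def total_covariance(sample_cov, feature_cov):
    """Hypothetical helper: covariance over all n*p entries of the input.

    Assumes row-major flattening of the [n, p] matrix, so the sample
    covariance is the left factor of the Kronecker product.
    """
    return np.kron(as_matrix(sample_cov), as_matrix(feature_cov))

n, p = 3, 2
sample_cov = np.array([0.1, 0.2, 0.3])             # shape [n]: diagonal entries
feature_cov = np.array([[1.0, 0.5], [0.5, 2.0]])   # shape [p, p]

full_cov = total_covariance(sample_cov, feature_cov)
print(full_cov.shape)  # (6, 6), i.e. [n*p, n*p]
```

The result has n·p rows and columns, which is why passing full_cov directly requires more memory than passing the two factors.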
Attributes | |
---|---|
size : [n, p] | Dimension of matrix (n: number of samples, p: number of dimensions). |
eigenvalues : ndarray of size [n_components] | Eigenvalues obtained from eigenvalue decomposition of the covariance matrix. |
eigenvectors : ndarray of size [n_components·p, n·p] | Eigenvectors obtained from eigenvalue decomposition of the covariance matrix. |
jacobian : ndarray of size [n_components·p, n·p] | Jacobian containing derivatives of eigenvectors w.r.t. input matrix. |
cov_eigenvectors : ndarray of size [n_components·p, n_components·p] | Propagated uncertainties of eigenvectors. |
transformed data : ndarray of size [n, n_components] | Low dimensional representation of data after applying PCA. |
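The sizes of jacobian and cov_eigenvectors listed above are consistent with first-order (linear) uncertainty propagation, where the output covariance is J Σ Jᵀ. The sketch below only checks that shape relationship with random stand-in arrays. It assumes this standard formula and does not reproduce how VIPurPCA computes its Jacobian by automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, n_components = 4, 3, 2

# Stand-in Jacobian of the flattened eigenvectors w.r.t. the flattened input.
# Random values here; the library would obtain it by automatic differentiation.
jacobian = rng.normal(size=(n_components * p, n * p))

# Stand-in input covariance of shape [n*p, n*p], symmetric positive semi-definite.
a = rng.normal(size=(n * p, n * p))
input_cov = a @ a.T

# First-order propagation (assumed formula): Sigma_out = J @ Sigma_in @ J.T
cov_eigenvectors = jacobian @ input_cov @ jacobian.T

print(cov_eigenvectors.shape)  # (6, 6), i.e. [n_components*p, n_components*p]
```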
Methods | |
---|---|
pca_value() | Apply PCA to the matrix. |
compute_cov_eigenvectors(save_jacobian=False) | Compute uncertainties of eigenvectors. |
animate(pcx=1, pcy=2, n_frames=10, labels=None, outfile='animation.html') | Generate an animation of the PCA plot of PC pcx vs. PC pcy with n_frames frames. labels (list, 1d array) indicates the labelling of individual samples. |
Example datasets
Two example datasets ship with VIPurPCA. Each loader returns the mean, the covariance, and the labels.
from vipurpca import load_data
Y, cov_Y, y = load_data.load_studentgrades_dataset()
Y, cov_Y, y = load_data.load_estrogen_dataset()
More information on the datasets can be found here.
Example
from vipurpca import load_data
from vipurpca import PCA
# load mean (Y), uncertainty estimates (cov_Y) and labels (y)
Y, cov_Y, y = load_data.load_estrogen_dataset()
pca = PCA(matrix=Y, sample_cov=None, feature_cov=None,
          full_cov=cov_Y, n_components=3, axis=0)
# compute PCA
pca.pca_value()
# compute uncertainties of eigenvectors
pca.compute_cov_eigenvectors(save_jacobian=False)
# create animation of PC 1 vs. PC 2
pca.animate(1, 2, labels=y)
The resulting animation can be found here.