Skip to main content

Sparse Principal Component Analysis in Python

Project description

pypi versionpython version pypi downloads

Uses an alternating manifold proximal gradient (A-ManPG) method to find sparse principal component loadings from the given data or covariance matrix.

Requires numpy to be installed.

The GitHub repository can be found here.

Usage

spca(z, lambda1, lambda2, 
     x0=None, y0=None, k=0, gamma=0.5, type=0, 
     maxiter=1e4, tol=1e-5, f_palm=1e5,
	 normalize=True, verbose=False):

Arguments

Name Type Description
z numpy.ndarray Either the data matrix or sample covariance matrix
lambda1 float list List of parameters of length n for L1-norm penalty
lambda2 float or numpy.inf L2-norm penalty term
x0 numpy.ndarray Initial x-values for the gradient method, default value is the first n right singular vectors
y0 numpy.ndarray Initial y-values for the gradient method, default value is the first n right singular vectors
k int Number of principal components desired, default is 0 (returns min(n-1, p) principal components)
gamma float Parameter to control how quickly the step size changes in each iteration, default is 0.5
type int If 0, b is expected to be a data matrix, and otherwise b is expected to be a covariance matrix; default is 0
maxiter int Maximum number of iterations allowed in the gradient method, default is 1e4
tol float Tolerance value required to indicate convergence (calculated as difference between iteration f-values), default is 1e-5
f_palm float Upper bound for the F-value to reach convergence, default is 1e5
normalize bool Center and normalize rows to Euclidean length 1 if True, default is True
verbose bool Function prints progress between iterations if True, default is False

Value

Returns a dictionary with the following key-value pairs:

Key Value Type Value
loadings numpy.ndarray Loadings of the sparse principal components
f_manpg float Final F-value
x numpy.ndarray Corresponding ndarray in subproblem to the loadings
iter int Total number of iterations executed
sparsity float Number of sparse loadings (loadings == 0) divided by number of all loadings
time float Execution time in seconds

Authors

Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou

References

Chen, S., Ma, S., Xue, L., and Zou, H. (2020) "An Alternating Manifold Proximal Gradient Method for Sparse Principal Component Analysis and Sparse Canonical Correlation Analysis" INFORMS Journal on Optimization 2:3, 192-208.

Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286.

Zou, H., & Xue, L. (2018). A selective overview of sparse principal component analysis. Proceedings of the IEEE, 106(8), 1311-1320.

Example

See sparsepca.py for a more in-depth example.

import numpy as np
from sparsepca import spca_amanpg

k = 4  # columns
d = 500  # dimensions
m = 1000  # sample size
lambda1 = 0.1 * np.ones((k, 1))
lambda2 = 1

np.random.seed(10)
a = np.random.normal(0, 1, size=(m, d))  # generate random normal 1000x500 matrix
fin_sprout = spca_amanpg(a, lambda1, lambda2, k=k)
print(f"Finite: {fin_sprout['iter']} iterations with final value 
		{fin_sprout['f_manpg']}, sparsity {fin_sprout['sparsity']}, 
		timediff {fin_sprout['time']}.")

fin_sprout['loadings']

inf_sprout = spca_amanpg(a, lambda1, np.inf, k=4)
print(f"Infinite: {inf_sprout['iter']} iterations with final value 
		{inf_sprout['f_manpg']}, sparsity {inf_sprout['sparsity']}, 
		timediff {inf_sprout['time']}.")

inf_sprout['loadings']

History

0.2.3

  • Doc fixes

0.2.2

  • Doc fixes
  • PyPI metadata fixes

0.2.1

  • Doc fixes

0.2.0

  • Initial PyPI release

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sparsepca-0.2.3.tar.gz (6.2 kB view hashes)

Uploaded Source

Built Distribution

sparsepca-0.2.3-py3-none-any.whl (6.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page