Sparse Principal Component Analysis in Python

Uses an alternating manifold proximal gradient (A-ManPG) method to find sparse principal component loadings from the given data or covariance matrix.
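
For orientation, the sparse PCA criterion of Zou, Hastie, and Tibshirani (2006), listed in the References, can be written as below. Reading the symbols \lambda_{1,j} and \lambda_2 as this package's lambda1 and lambda2 arguments is an assumption based on the argument descriptions, not something the project description states. Here x_i is the i-th observation, B_j is the j-th column of B, and the normalized columns of B give the sparse loadings.

```latex
\min_{A,\,B \in \mathbb{R}^{p \times k}} \;
\sum_{i=1}^{n} \lVert x_i - A B^{\top} x_i \rVert_2^2
+ \lambda_2 \sum_{j=1}^{k} \lVert B_j \rVert_2^2
+ \sum_{j=1}^{k} \lambda_{1,j} \lVert B_j \rVert_1
\quad \text{subject to } A^{\top} A = I_k
```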

Requires numpy to be installed.

The GitHub repository can be found here.

Usage

spca(z, lambda1, lambda2,
     x0=None, y0=None, k=0, gamma=0.5, type=0,
     maxiter=1e4, tol=1e-5, f_palm=1e5,
     normalize=True, verbose=False)

Arguments

Name | Type | Description
--- | --- | ---
z | numpy.ndarray | Either the data matrix or the sample covariance matrix
lambda1 | list of float | Parameters for the L1-norm penalty, one per component (length k)
lambda2 | float or numpy.inf | L2-norm penalty term
x0 | numpy.ndarray | Initial x-values for the gradient method; default is the first k right singular vectors
y0 | numpy.ndarray | Initial y-values for the gradient method; default is the first k right singular vectors
k | int | Number of principal components desired; default is 0, which returns min(n-1, p) principal components
gamma | float | Controls how quickly the step size changes in each iteration; default is 0.5
type | int | If 0, z is expected to be a data matrix; otherwise z is expected to be a covariance matrix; default is 0
maxiter | int | Maximum number of iterations allowed in the gradient method; default is 1e4
tol | float | Convergence tolerance, measured as the difference between the F-values of successive iterations; default is 1e-5
f_palm | float | Upper bound for the F-value to reach convergence; default is 1e5
normalize | bool | If True, center and normalize rows to Euclidean length 1; default is True
verbose | bool | If True, print progress between iterations; default is False
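
The type argument allows a covariance matrix to be passed in place of raw data. As a minimal sketch of how such a matrix might be prepared with numpy alone, the snippet below centers a data matrix and forms its sample covariance. The variable names and the n-1 divisor are assumptions for illustration; the description above does not say which divisor the package expects.

```python
import numpy as np

rng = np.random.default_rng(10)
data = rng.normal(size=(200, 20))  # 200 samples (rows), 20 variables (columns)

# Center each column, then form the unbiased sample covariance matrix.
centered = data - data.mean(axis=0)
cov = centered.T @ centered / (data.shape[0] - 1)

print(cov.shape)  # (20, 20)
```

A matrix built this way would presumably be passed as z together with a nonzero type, leaving the other arguments unchanged.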

Value

Returns a dictionary with the following key-value pairs:

Key | Type | Description
--- | --- | ---
loadings | numpy.ndarray | Loadings of the sparse principal components
f_manpg | float | Final F-value
x | numpy.ndarray | Corresponding ndarray in subproblem to the loadings
iter | int | Total number of iterations executed
sparsity | float | Number of sparse loadings (loadings == 0) divided by number of all loadings
time | float | Execution time in seconds
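
The sparsity value is described as the share of loadings that are exactly zero. The sketch below reproduces that ratio for a small, made-up loadings matrix; the matrix is hypothetical and the calculation is an illustration of the definition, not the package's own code.

```python
import numpy as np

# Hypothetical 3x2 loadings matrix (3 variables, 2 components).
loadings = np.array([[0.0, 0.5],
                     [0.0, 0.0],
                     [1.2, 0.0]])

# Count entries that are exactly zero and divide by the total number of entries.
sparsity = np.count_nonzero(loadings == 0) / loadings.size
print(sparsity)  # 4 of the 6 entries are zero
```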

Authors

Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou

References

Chen, S., Ma, S., Xue, L., & Zou, H. (2020). An alternating manifold proximal gradient method for sparse principal component analysis and sparse canonical correlation analysis. INFORMS Journal on Optimization, 2(3), 192-208.

Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286.

Zou, H., & Xue, L. (2018). A selective overview of sparse principal component analysis. Proceedings of the IEEE, 106(8), 1311-1320.

Example

See sparsepca.py for a more in-depth example.

import numpy as np
from sparsepca import spca_amanpg

k = 4  # number of principal components
d = 500  # dimensions (columns of the data matrix)
m = 1000  # sample size (rows of the data matrix)
lambda1 = 0.1 * np.ones((k, 1))  # one L1 penalty per component
lambda2 = 1  # finite L2 penalty

np.random.seed(10)
a = np.random.normal(0, 1, size=(m, d))  # generate random normal 1000x500 matrix

# Finite L2 penalty
fin_sprout = spca_amanpg(a, lambda1, lambda2, k=k)
print(f"Finite: {fin_sprout['iter']} iterations with final value "
      f"{fin_sprout['f_manpg']}, sparsity {fin_sprout['sparsity']}, "
      f"timediff {fin_sprout['time']}.")
print(fin_sprout['loadings'])

# Infinite L2 penalty
inf_sprout = spca_amanpg(a, lambda1, np.inf, k=k)
print(f"Infinite: {inf_sprout['iter']} iterations with final value "
      f"{inf_sprout['f_manpg']}, sparsity {inf_sprout['sparsity']}, "
      f"timediff {inf_sprout['time']}.")
print(inf_sprout['loadings'])
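
To judge how sparse the result is, it can help to have ordinary (dense) PCA loadings for the same data as a baseline. The following is a minimal sketch using only numpy, taking the first k right singular vectors of the centered data; the Arguments section names these vectors as the default starting point. The names dense_loadings and dense_sparsity are introduced here for illustration, and the sketch only centers the data, so it does not replicate the package's normalize step.

```python
import numpy as np

np.random.seed(10)
a = np.random.normal(0, 1, size=(1000, 500))  # same shape as the example data
k = 4

# Center the columns before taking the SVD.
centered = a - a.mean(axis=0)

# Rows of vt are right singular vectors, ordered by decreasing singular value.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
dense_loadings = vt[:k].T  # one column per component

# Share of exactly-zero entries, for comparison with the 'sparsity' value.
dense_sparsity = np.mean(dense_loadings == 0)
print(dense_loadings.shape, dense_sparsity)
```

Dense loadings like these typically contain no exact zeros, which is the contrast the sparsity value is meant to capture.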

History

0.2.3

  • Doc fixes

0.2.2

  • Doc fixes
  • PyPI metadata fixes

0.2.1

  • Doc fixes

0.2.0

  • Initial PyPI release
