Sparse Principal Component Analysis in Python
Uses an alternating manifold proximal gradient (A-ManPG) method to find sparse principal component loadings from the given data or covariance matrix.
Requires numpy to be installed.
The GitHub repository can be found here.
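To give a feel for the "proximal" part of the method, here is a minimal sketch of soft-thresholding, the standard proximal operator of the L1 norm. It is an illustration only, not the package's code: the function name `soft_threshold` and the sample vector are made up for this example, and how A-ManPG combines this step with its manifold update is not shown.

```python
import numpy as np


def soft_threshold(v, threshold):
    """Proximal operator of threshold * ||v||_1, applied elementwise."""
    # Shrink every entry toward zero by `threshold`; entries whose
    # magnitude is below the threshold become zero.
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)


# Hypothetical vector, chosen only to show the effect.
v = np.array([0.9, -0.05, 0.3, -0.6, 0.02])
shrunk = soft_threshold(v, 0.1)
print(shrunk)
```

Entries smaller in magnitude than the threshold are set to zero, which is how an L1 penalty can produce sparse loadings.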
Usage
spca_amanpg(z, lambda1, lambda2,
x0=None, y0=None, k=0, gamma=0.5, type=0,
maxiter=1e4, tol=1e-5, f_palm=1e5,
normalize=True, verbose=False)
Arguments
Name | Type | Description |
---|---|---|
z | numpy.ndarray | Either the data matrix or the sample covariance matrix |
lambda1 | float list | List of parameters of length n for the L1-norm penalty |
lambda2 | float or numpy.inf | L2-norm penalty term |
x0 | numpy.ndarray | Initial x-values for the gradient method; default is the first n right singular vectors |
y0 | numpy.ndarray | Initial y-values for the gradient method; default is the first n right singular vectors |
k | int | Number of principal components desired; default is 0 (returns min(n-1, p) principal components) |
gamma | float | Controls how quickly the step size changes in each iteration; default is 0.5 |
type | int | If 0, z is expected to be a data matrix; otherwise z is expected to be a covariance matrix; default is 0 |
maxiter | int | Maximum number of iterations allowed in the gradient method; default is 1e4 |
tol | float | Tolerance required to indicate convergence (calculated as the difference between successive F-values); default is 1e-5 |
f_palm | float | Upper bound for the F-value to reach convergence; default is 1e5 |
normalize | bool | If True, center and normalize rows to Euclidean length 1; default is True |
verbose | bool | If True, print progress between iterations; default is False |
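The table describes two kinds of input for `z` (a data matrix or a sample covariance matrix) and initial values built from right singular vectors. The sketch below shows one way these objects could be prepared with numpy alone. It is a hedged illustration, not the package's internal code: the sizes, the variable names (`data`, `cov`, `v_k`), and the choice to center the data before the SVD are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, k = 50, 8, 3               # samples, features, components (illustrative sizes)
data = rng.normal(size=(m, d))   # data matrix: one observation per row

# Sample covariance matrix of the features (d x d).
# rowvar=False tells numpy that columns are the variables.
cov = np.cov(data, rowvar=False)

# First k right singular vectors of the centered data,
# stored as the columns of a d x k matrix.
centered = data - data.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
v_k = vt[:k].T

# One L1 penalty value per component, shaped as in the example in this README.
lambda1 = 0.1 * np.ones((k, 1))

print(cov.shape, v_k.shape, lambda1.shape)
```

Going by the table, `data` would be passed with `type=0` and `cov` with a non-zero `type`. The columns of `v_k` are orthonormal and are also the leading eigenvectors of `cov`.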
Value
Returns a dictionary with the following key-value pairs:
Key | Value Type | Value |
---|---|---|
loadings | numpy.ndarray | Loadings of the sparse principal components |
f_manpg | float | Final F-value |
x | numpy.ndarray | Corresponding ndarray in subproblem to the loadings |
iter | int | Total number of iterations executed |
sparsity | float | Number of sparse loadings (loadings == 0) divided by number of all loadings |
time | float | Execution time in seconds |
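The `sparsity` value is defined above as the share of loadings that are exactly zero. A minimal sketch of that calculation follows, using a small made-up loadings matrix rather than real output from the package. The `support` list is an extra illustration and is not one of the returned keys.

```python
import numpy as np

# Hypothetical 4 x 2 loadings matrix (4 variables, 2 components),
# invented for illustration only.
loadings = np.array([
    [0.8, 0.0],
    [0.0, 0.6],
    [0.6, 0.0],
    [0.0, 0.8],
])

# Number of zero loadings divided by the number of all loadings.
sparsity = np.count_nonzero(loadings == 0) / loadings.size

# Row indices of the variables that each component actually uses.
support = [np.flatnonzero(loadings[:, j]) for j in range(loadings.shape[1])]

print(sparsity)
print(support)
```

A value near 1 means most loadings are zero, so each component depends on only a few variables.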
Authors
Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou
References
Chen, S., Ma, S., Xue, L., & Zou, H. (2020). An alternating manifold proximal gradient method for sparse principal component analysis and sparse canonical correlation analysis. INFORMS Journal on Optimization, 2(3), 192-208.
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286.
Zou, H., & Xue, L. (2018). A selective overview of sparse principal component analysis. Proceedings of the IEEE, 106(8), 1311-1320.
Example
See sparsepca.py for a more in-depth example.
import numpy as np
from sparsepca import spca_amanpg

k = 4     # number of principal components
d = 500   # dimensions
m = 1000  # sample size
lambda1 = 0.1 * np.ones((k, 1))  # L1 penalty for each component
lambda2 = 1                      # finite L2 penalty

np.random.seed(10)
a = np.random.normal(0, 1, size=(m, d))  # generate random normal 1000x500 matrix

# Finite L2 penalty
fin_sprout = spca_amanpg(a, lambda1, lambda2, k=k)
print(f"Finite: {fin_sprout['iter']} iterations with final value "
      f"{fin_sprout['f_manpg']}, sparsity {fin_sprout['sparsity']}, "
      f"timediff {fin_sprout['time']}.")
print(fin_sprout['loadings'])

# Infinite L2 penalty
inf_sprout = spca_amanpg(a, lambda1, np.inf, k=k)
print(f"Infinite: {inf_sprout['iter']} iterations with final value "
      f"{inf_sprout['f_manpg']}, sparsity {inf_sprout['sparsity']}, "
      f"timediff {inf_sprout['time']}.")
print(inf_sprout['loadings'])
History
0.2.3
- Doc fixes
0.2.2
- Doc fixes
- PyPI metadata fixes
0.2.1
- Doc fixes
0.2.0
- Initial PyPI release