PyTorch implementation of PCA (similar to sklearn PCA).
PyTorch Principal Component Analysis (PCA)
Principal Component Analysis (PCA) in PyTorch. The intention is to provide a
simple and easy-to-use implementation of PCA in PyTorch that is as similar as
possible to sklearn's PCA (in terms of API and, of course, output).
In addition, this implementation is fully differentiable and faster (thanks to GPU parallelization)!
Links
GitHub repository: https://github.com/valentingol/torch_pca
PyPI project: https://pypi.org/project/torch_pca/
Documentation: https://torch-pca.readthedocs.io/en/latest/
Installation
Simply install it with pip:
pip install torch-pca
How to use
Use it exactly like sklearn.decomposition.PCA, but with PyTorch tensors as input and output!
from torch_pca import PCA
# Create like sklearn.decomposition.PCA, e.g.:
pca_model = PCA(n_components=None, svd_solver='full')
# Use like sklearn.decomposition.PCA, e.g.:
new_train_data = pca_model.fit_transform(train_data)
new_test_data = pca_model.transform(test_data)
print(pca_model.explained_variance_ratio_)
# [0.756, 0.142, 0.062, ...]
More details and features in the API documentation.
Gradient backward pass
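Using the PyTorch framework allows automatic differentiation of the PCA!
The PCA transform method is always differentiable, so it is always possible to compute gradients like this:
import torch
from torch_pca import PCA

# neural_net, optimizer, loss_fn, inputs, targets and n_epochs are assumed to be defined.
pca = PCA()
for ep in range(n_epochs):
    optimizer.zero_grad()
    out = neural_net(inputs)
    # Fit without tracking gradients: the fitted parameters act as constants.
    with torch.no_grad():
        pca.fit(out)
    # The transform is differentiable with respect to its input.
    out = pca.transform(out)
    loss = loss_fn(out, targets)
    loss.backward()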
If you want to compute the gradient over the full PCA model (including the
fitted pca.n_components), use the "full" SVD solver and pass determinist=False
to the fit or fit_transform method. This disables the part of the fit method
that enforces a deterministic output. That part sorts the components by
singular value and changes their signs accordingly, so it is not differentiable
by nature, but it may be unnecessary if you don't care about the determinism of
the output:
pca = PCA(svd_solver="full")
for ep in range(n_epochs):
    optimizer.zero_grad()
    out = neural_net(inputs)
    # Fit and transform in one step, skipping the deterministic post-processing.
    out = pca.fit_transform(out, determinist=False)
    loss = loss_fn(out, targets)
    loss.backward()
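To make the "deterministic output" step more concrete, here is a minimal pure-Python sketch of one possible convention: sort the components by decreasing singular value, then flip each component so that its largest-magnitude entry is positive. The function name make_deterministic and this particular sign rule are assumptions for illustration; the rule torch_pca actually applies may differ. Both the ordering and the sign choice are discrete decisions, which is consistent with the remark that this step is not differentiable.

```python
def make_deterministic(singular_values, components):
    """Sort components by decreasing singular value, then fix each sign.

    Sign rule used here (an assumption, not necessarily torch_pca's):
    flip a component so that its largest-magnitude entry is positive.
    """
    # Indices that order the singular values from largest to smallest.
    order = sorted(
        range(len(singular_values)),
        key=lambda i: singular_values[i],
        reverse=True,
    )
    sorted_values = [singular_values[i] for i in order]
    sorted_components = []
    for i in order:
        row = components[i]
        # The entry with the largest absolute value decides the sign.
        pivot = max(row, key=abs)
        sign = -1.0 if pivot < 0 else 1.0
        sorted_components.append([sign * x for x in row])
    return sorted_values, sorted_components


# Two decompositions of the same data that differ only in order and sign.
values_a, comps_a = make_deterministic([1.0, 3.0], [[0.6, -0.8], [0.8, 0.6]])
values_b, comps_b = make_deterministic([3.0, 1.0], [[-0.8, -0.6], [-0.6, 0.8]])
print(comps_a == comps_b)
```

After this step the two inputs map to the same ordered, sign-fixed components, which is what makes repeated fits comparable.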
Comparison of execution time with sklearn's PCA
The PyTorch PCA is faster than sklearn's PCA in all the configurations tested, with each PCA model using its default parameters.
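If you want to run a comparison of this kind on your own data, a small timing helper is enough. The sketch below uses only the standard library; the helper name best_time and the stand-in workload are hypothetical, and this is not the benchmark script used for the comparison above.

```python
import time


def best_time(fn, repeats=5):
    """Return the shortest wall-clock duration of fn() over several runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return min(durations)


# Stand-in workload; replace it with a call that fits a PCA model.
elapsed = best_time(lambda: sum(i * i for i in range(10_000)), repeats=3)
print(f"best of 3: {elapsed:.6f} s")
```

When timing on a GPU, remember that CUDA kernels run asynchronously: call torch.cuda.synchronize() inside the timed function before it returns, otherwise the measured time may only cover the kernel launch.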
Implemented features
- fit, transform, fit_transform methods
- All attributes from sklearn's PCA are available: explained_variance_(ratio_), singular_values_, components_, mean_, noise_variance_, ...
- Full SVD solver
- SVD by covariance matrix solver
- Randomized SVD solver
- (absent from sklearn) Decide how to center the input data in transform method (default is like sklearn's PCA)
- Find number of components with explained variance proportion
- Automatically find number of components with MLE
- inverse_transform method
- Whitening option
- get_covariance method
- get_precision method and score/score_samples methods
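As an illustration of the "explained variance proportion" feature, the sketch below shows the underlying idea in plain Python: compute the share of variance carried by each component from the singular values, then keep the smallest number of components whose cumulative share reaches the requested proportion. The function names and the singular values are hypothetical, and the exact threshold rule used by torch_pca (or sklearn) may differ at the boundary.

```python
def explained_variance_ratio(singular_values):
    """Share of total variance carried by each component.

    The variance along a component is proportional to its squared singular
    value, so the common 1 / (n_samples - 1) factor cancels in the ratio.
    """
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [v / total for v in squared]


def n_components_for(ratios, proportion):
    """Smallest k whose first k ratios sum to at least `proportion`."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= proportion:
            return k
    return len(ratios)


# Hypothetical singular values, already sorted in decreasing order.
ratios = explained_variance_ratio([4.0, 2.0, 2.0, 1.0])
k = n_components_for(ratios, proportion=0.9)
print(ratios, k)
```

Here the first two components explain about 80% of the variance, so three components are needed to reach 90%.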
To be implemented
- Support sparse matrices with ARPACK solver
Contributing
Feel free to contribute to this project! Just fork it and open an issue or a pull request.
See the CONTRIBUTING.md file for more information.