
PyTorch implementation of PCA (similar to sklearn PCA).

Project description

Pytorch Principal Component Analysis (PCA)

Principal Component Analysis (PCA) in PyTorch. The goal is to provide a simple, easy-to-use implementation of PCA in PyTorch that is as close to sklearn's PCA as possible, in terms of both API and, of course, output. As a bonus, this implementation is fully differentiable and faster, thanks to GPU parallelization!


Links

Github repository: https://github.com/valentingol/torch_pca

Pypi project: https://pypi.org/project/torch_pca/

Documentation: https://torch-pca.readthedocs.io/en/latest/

Installation

Simply install it with pip:

pip install torch-pca

How to use

Exactly like sklearn.decomposition.PCA, but with PyTorch tensors as input and output!

from torch_pca import PCA

# Create the model like sklearn.decomposition.PCA, e.g.:
pca_model = PCA(n_components=None, svd_solver='full')

# Use it like sklearn.decomposition.PCA, e.g.:
new_train_data = pca_model.fit_transform(train_data)
new_test_data = pca_model.transform(test_data)
print(pca_model.explained_variance_ratio_)
# [0.756, 0.142, 0.062, ...]

More details and features in the API documentation.

Gradient backward pass

Using the PyTorch framework allows automatic differentiation of the PCA!

The PCA transform method is always differentiable, so it is always possible to compute gradients like this:

pca = PCA()
for ep in range(n_epochs):
    optimizer.zero_grad()
    out = neural_net(inputs)
    with torch.no_grad():
        pca.fit(out)
    out = pca.transform(out)
    loss = loss_fn(out, targets)
    loss.backward()

If you want to compute the gradient through the full PCA model (including the fitted pca.n_components), use the "full" SVD solver and disable the part of the fit method that enforces deterministic output by passing determinist=False to the fit or fit_transform method. This step sorts the components by their singular values and flips their signs accordingly, so it is not differentiable by nature, but it is safe to drop if you don't care about the determinism of the output:

pca = PCA(svd_solver="full")
for ep in range(n_epochs):
    optimizer.zero_grad()
    out = neural_net(inputs)
    out = pca.fit_transform(out, determinist=False)
    loss = loss_fn(out, targets)
    loss.backward()

Comparison of execution time with sklearn's PCA

As the benchmark below shows, the PyTorch PCA is faster than sklearn's PCA in every configuration tested, using the default parameters for each PCA model:

(Benchmark figure: execution time of torch_pca vs. sklearn's PCA; see the repository for the image.)

Implemented features

  • fit, transform, fit_transform methods.
  • All attributes from sklearn's PCA are available: explained_variance_(ratio_), singular_values_, components_, mean_, noise_variance_, ...
  • Full SVD solver
  • SVD by covariance matrix solver
  • Randomized SVD solver
  • (absent from sklearn) Decide how to center the input data in transform method (default is like sklearn's PCA)
  • Find number of components with explained variance proportion
  • Automatically find number of components with MLE
  • inverse_transform method
  • Whitening option
  • get_covariance method
  • get_precision method and score/score_samples methods

To be implemented

  • Support sparse matrices with ARPACK solver

Contributing

Feel free to contribute to this project! Just fork it and open an issue or a pull request.

See the CONTRIBUTING.md file for more information.

Project details


Download files

Download the file for your platform.

Source Distribution

torch_pca-1.1.0.tar.gz (92.8 kB)

Uploaded Source

Built Distribution


torch_pca-1.1.0-py3-none-any.whl (13.8 kB)

Uploaded Python 3

File details

Details for the file torch_pca-1.1.0.tar.gz.

File metadata

  • Download URL: torch_pca-1.1.0.tar.gz
  • Upload date:
  • Size: 92.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for torch_pca-1.1.0.tar.gz

  • SHA256: 096963335e32089e7262755b5344bc7b0cd02cd4daa9445b62b662299229be63
  • MD5: 4d91c6c32f673ca6317f9da6a88aa2ca
  • BLAKE2b-256: e5c3b930921e991842f20fd106068e45c39bc5f43529a166cc4f42612ca4b34f


File details

Details for the file torch_pca-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: torch_pca-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for torch_pca-1.1.0-py3-none-any.whl

  • SHA256: c20ea3111a64f93b3beef914afdeb640e12e99834f9dc276fa829479d6b236af
  • MD5: 66edc695e9f6b8f0cdc54f1712359fc8
  • BLAKE2b-256: f2169753276886dfb5e4ad7f379fd63fa44518bd139f012b5916962b8b766444

