
pca: A Python Package for Principal Component Analysis.

Project description


pca is a Python package for Principal Component Analysis. The core of pca is built on sklearn functionality for maximum compatibility when combined with other packages. But this package can do a lot more: besides regular PCA, it can also perform SparsePCA and TruncatedSVD. Depending on your input data, the best approach will be chosen. ⭐️ Star this repo if you like it ⭐️

Other functionalities of pca are:

  • Biplot to plot the loadings
  • Determine the explained variance
  • Extract the best performing features
  • Scatter plot with the loadings
  • Outlier detection using Hotelling's T2 and/or SPE/DmodX

Support

Your ❤️ is important to keep maintaining my packages. You can support in various ways; have a look at the sponsor page. Report bugs and issues, or help out with developing new features! If you don't have the time to help or are still learning, you can also take a Medium Membership using my referral link to keep reading all my hands-on blogs. If you don't need that either, there is always the coffee! Thank you! Buy Me a Coffee at ko-fi.com

Read the Medium blog for more details.

1. What are PCA loadings and how to effectively use Biplots?

2. Outlier Detection Using Principal Component Analysis and Hotelling’s T2 and SPE/DmodX Methods

3. Quantitative comparisons between t-SNE, UMAP, PCA, and Other Mappings.

Documentation pages

On the documentation pages you can find detailed information about how pca works, with many examples.

Installation

pip install pca
Import pca package
from pca import pca

Quick start

Make biplot

Plot Explained variance

3D plots

Normalizing out the first (or more) components from the data. This is useful if the data is separated in its first component(s) by unwanted or biased variance, such as sex or experiment location.
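As a rough illustration of what normalizing out a component means, the sketch below subtracts the first principal component from a small made-up dataset using only the Python standard library. It is not the package's implementation: the function names (`first_component`, `normalize_out_first_component`), the power-iteration approach, and the toy data are all assumptions chosen for this example.

```python
import math

def first_component(centered, iters=200):
    """Estimate the first principal axis (unit vector) by power iteration."""
    d = len(centered[0])
    v = [1.0 / math.sqrt(d)] * d  # arbitrary non-zero starting direction
    for _ in range(iters):
        # Multiply v by X^T X without forming the matrix explicitly.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        nxt = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        v = [x / norm for x in nxt]
    return v

def normalize_out_first_component(data):
    """Return the data with its PC1 contribution removed, plus PC1 and the column means."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    v = first_component(centered)
    cleaned = []
    for row in centered:
        score = sum(x * w for x, w in zip(row, v))  # coordinate of this sample on PC1
        # Subtract the PC1 part and add the column means back (a choice made for this sketch).
        cleaned.append([x - score * w + m for x, w, m in zip(row, v, means)])
    return cleaned, v, means

# Hypothetical toy data: two batches that differ mainly by a shift of about 10 in the first feature.
data = [
    [0.0, 1.0, 0.5],
    [0.2, 0.8, 0.4],
    [-0.1, 1.2, 0.6],
    [10.1, 1.1, 0.5],
    [9.9, 0.9, 0.3],
    [10.0, 1.0, 0.7],
]

cleaned, pc1, col_means = normalize_out_first_component(data)
print("PC1 direction:", [round(w, 3) for w in pc1])
print("first feature after normalization:", [round(row[0], 3) for row in cleaned])
```

In this toy case the batch shift dominates the first component, so after normalization the two batches are no longer separated along the first feature, while the smaller variation in the other features is left largely intact.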

Make the biplot. It can be nicely seen that the feature with the most variance (f1) is almost horizontal in the plot, whereas the feature with the second most variance (f2) is almost vertical. This is expected because most of the variance is in f1, followed by f2, and so on.
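The alignment described above can be checked by hand in the two-feature case, where the eigenvalues and eigenvectors of the covariance matrix have a closed form. The following is a minimal sketch under stated assumptions, not the package's code: `pca_two_features` and the sample values are hypothetical, and the sign of the second component is an arbitrary choice.

```python
import math

def pca_two_features(f1, f2):
    """Closed-form PCA for two features: explained variance ratios and loadings."""
    n = len(f1)
    m1, m2 = sum(f1) / n, sum(f2) / n
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum((x - m1) ** 2 for x in f1) / (n - 1)
    c = sum((y - m2) ** 2 for y in f2) / (n - 1)
    b = sum((x - m1) * (y - m2) for x, y in zip(f1, f2)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix.
    mid, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    eigenvalues = (mid + rad, mid - rad)
    # Angle of the first principal axis relative to the f1 axis.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    ratios = tuple(ev / (a + c) for ev in eigenvalues)
    # Each feature's coordinates on (PC1, PC2), i.e. its arrow in a biplot.
    loadings = {
        "f1": (math.cos(theta), -math.sin(theta)),
        "f2": (math.sin(theta), math.cos(theta)),
    }
    return ratios, theta, loadings

# Hypothetical data: f1 varies a lot, f2 only a little, with weak correlation.
f1 = [-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0]
f2 = [0.5, -1.0, 1.0, 0.0, -1.0, 1.0, 0.5]

ratios, theta, loadings = pca_two_features(f1, f2)
print("explained variance ratios:", [round(r, 3) for r in ratios])
print("f1 arrow (PC1, PC2):", [round(v, 3) for v in loadings["f1"]])
print("f2 arrow (PC1, PC2):", [round(v, 3) for v in loadings["f2"]])
```

Because f1 carries most of the variance, PC1 lies close to the f1 axis, so the f1 arrow has a PC2 coordinate near zero (almost horizontal) and the f2 arrow has a PC1 coordinate near zero (almost vertical).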

Explained variance

Biplot in 2D and 3D. Here we see the nice addition of the expected f3 in the plot in the z-direction.

biplot

biplot3d

To detect outliers across the multi-dimensional space of PCA, Hotelling's T2 test is incorporated. In practice, this means that chi-square tests are computed across the top n_components (default: PC1 to PC5). Because of the nature of PCA, the highest variance (and thus the outliers) is expected in the first few components, so going deeper into PC space may not be required, although the depth is configurable. This approach results in a P-value matrix (samples x PCs), and the P-values per sample are then combined using Fisher's method. This makes it possible to identify outliers and to rank them (strongest to weakest). The alpha parameter sets the significance threshold for outlier detection (default: 0.05).
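The procedure described above (chi-square test per component, then Fisher's method per sample) can be sketched with the Python standard library alone. This is an illustration under assumptions, not the package's code: the function names, the use of one degree of freedom per component, the pooled mean and variance used for standardization, and the made-up PC scores are all choices made for this example.

```python
import math

def chi2_sf_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def combined_pvalues(scores):
    """scores: one row per sample, one PC score per column. Returns one P-value per sample."""
    flat = [s for row in scores for s in row]
    mean = sum(flat) / len(flat)
    var = sum((s - mean) ** 2 for s in flat) / len(flat)  # pooled over all PCs (assumption)
    combined = []
    for row in scores:
        # One chi-square P-value per principal component (a row of the samples x PCs matrix).
        pvals = [max(chi2_sf_df1((s - mean) ** 2 / var), 1e-300) for s in row]
        # Fisher's method: -2 * sum(log p) follows a chi-square with 2k degrees of freedom.
        stat = -2.0 * sum(math.log(p) for p in pvals)
        combined.append(chi2_sf_even(stat, 2 * len(pvals)))
    return combined

# Hypothetical PC scores for 8 samples on 2 components; sample 6 is far from the rest.
pc_scores = [
    [0.5, -0.2], [-0.4, 0.3], [0.1, 0.1], [-0.3, -0.4],
    [0.2, 0.5], [-0.1, -0.3], [6.0, 4.0], [0.0, 0.0],
]

alpha = 0.05
p_combined = combined_pvalues(pc_scores)
outliers = [i for i, p in enumerate(p_combined) if p < alpha]
ranking = sorted(range(len(p_combined)), key=lambda i: p_combined[i])  # strongest first
print("outlier indices:", outliers)
print("ranking (strongest to weakest):", ranking)
```

Sorting by the combined P-value gives the ranking from strongest to weakest outlier, and comparing against alpha gives the outlier flags. A practical implementation may also correct for multiple testing across samples, which this sketch leaves out.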


Citation

Please cite pca in your publications if it is useful for your research (see citation).

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pca-2.0.7.tar.gz (38.3 kB)

Uploaded Source

Built Distribution

pca-2.0.7-py3-none-any.whl (36.2 kB)

Uploaded Python 3

File details

Details for the file pca-2.0.7.tar.gz.

File metadata

  • Download URL: pca-2.0.7.tar.gz
  • Upload date:
  • Size: 38.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for pca-2.0.7.tar.gz
Algorithm Hash digest
SHA256 7c1c772c8f171e3fc3f51322383af60983ca46eccccf5d4311d953716aaaa56f
MD5 1f858c10b0e504e61497602bca380ef7
BLAKE2b-256 e24125ba4b3906162a2942d11146d45cc2447cbc8b527e12be974fd8810dea4e

See more details on using hashes here.

File details

Details for the file pca-2.0.7-py3-none-any.whl.

File metadata

  • Download URL: pca-2.0.7-py3-none-any.whl
  • Upload date:
  • Size: 36.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for pca-2.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 a50b97b7dd3a37e063212c291cb3a802ff5259295edb8ad847d5ebb6969df469
MD5 99e4c79c0bf1954c0f11ae045ae9b46d
BLAKE2b-256 586321e880ee3cee3b6f46b7d595b9e3fe2033440ff140f4aad580efd87a34b2

See more details on using hashes here.
