
pca is a Python package that performs principal component analysis and creates insightful plots.

Project description

pca


pca is a Python package for performing Principal Component Analysis (PCA) and creating insightful plots. The core of pca is built on sklearn functionality to maximize compatibility when combined with other packages. But this package can do a lot more. Besides regular PCA, it can also perform SparsePCA and TruncatedSVD. Depending on your input data, the best approach will be chosen.

Other functionalities of pca are:

  • Biplot to plot the loadings
  • Determine the explained variance
  • Extract the best performing features
  • Scatter plot with the loadings
  • Outlier detection using Hotelling T2 and/or SPE/Dmodx

⭐️ Star this repo if you like it ⭐️

Install pca from PyPI

pip install pca

Import pca package

from pca import pca

Documentation pages

On the documentation pages you can find detailed information about how pca works, with many examples.

Examples

Normalizing out the first (or more) components from the data. This is useful if the data is separated in its first component(s) by unwanted or biased variance, such as sex or experiment location.
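The idea of normalizing out components can be sketched with plain NumPy. This is an illustration of the technique only, not the package's own implementation: the helper name remove_top_components, the synthetic data, and the simulated "experiment location" shift are all assumptions made for this example.

```python
import numpy as np


def remove_top_components(X, n_remove=1):
    """Return X with its first `n_remove` principal components projected out."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:n_remove]  # shape: (n_remove, n_features)
    # Subtract the part of the centered data that lies along the top axes.
    X_clean = Xc - Xc @ top.T @ top
    return X_clean + mean


rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 5 features, measured at two locations.
X = rng.normal(size=(200, 5))
batch = np.repeat([0.0, 1.0], 100)
# Unwanted variance: every feature is shifted by 10 at the second location.
X += np.outer(batch, np.full(5, 10.0))

X_clean = remove_top_components(X, n_remove=1)

# Difference between the two locations before and after the correction.
gap_before = abs(X[batch == 1].mean() - X[batch == 0].mean())
gap_after = abs(X_clean[batch == 1].mean() - X_clean[batch == 0].mean())
print(f"gap before: {gap_before:.2f}, gap after: {gap_after:.2f}")
```

In this toy setup the location effect dominates the first component, so projecting that component out removes most of the shift while leaving the remaining variance in place.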

Make the biplot. It can be clearly seen that the feature with the most variance (f1) is almost horizontal in the plot, whereas the feature with the second most variance (f2) is almost vertical. This is expected because most of the variance is in f1, followed by f2, and so on.
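Why f1 lines up with the first axis and f2 with the second can be checked numerically. The following is a minimal sketch using NumPy only, assuming three independent synthetic features with decreasing variance; the feature names and scales are chosen for illustration and the code does not call the pca package itself.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: three independent features with decreasing spread.
scales = np.array([10.0, 3.0, 1.0])  # standard deviation of f1, f2, f3
X = rng.normal(size=(500, 3)) * scales
feature_names = ["f1", "f2", "f3"]

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of the total variance captured by each principal component.
explained_ratio = s**2 / np.sum(s**2)

# Row i of `loadings` holds the feature weights of PC(i+1).
loadings = Vt
dominant = [feature_names[int(np.argmax(np.abs(row)))] for row in loadings]

for i, name in enumerate(dominant):
    print(f"PC{i + 1}: {explained_ratio[i]:.1%} of variance, dominated by {name}")
```

Because the features are independent here, each principal component is almost aligned with a single feature, which is what the near-horizontal and near-vertical arrows in the biplot reflect.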

Explained variance

Biplot in 2D and 3D. Here we see the nice addition of the expected f3 in the plot in the z-direction.

biplot

biplot3d

To detect outliers across the multi-dimensional space of PCA, the Hotelling's T2 test is incorporated. This means that chi-square tests are computed across the top n_components (default is PC1 to PC5). Because of the nature of PCA, the highest variance (and thus the outliers) is expected in the first few components. Going deeper into PC space may therefore not be required, but the depth is configurable. This approach results in a P-value matrix (samples x PCs), and the P-values per sample are then combined using Fisher's method. This makes it possible to determine outliers and to rank them (strongest to weakest). The alpha parameter sets the significance threshold for detecting outliers (default: 0.05).
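The procedure described above can be sketched as follows. This is a simplified illustration with NumPy and SciPy, not the package's actual code: the function name chi2_fisher_outliers, the per-PC standardization, the single degree of freedom per test, and the synthetic scores are assumptions, and the real implementation may differ (for example in how the variance is estimated or whether P-values are corrected for multiple testing).

```python
import numpy as np
from scipy import stats


def chi2_fisher_outliers(pcs, n_components=5, alpha=0.05):
    """Per-PC chi-square P-values, combined per sample with Fisher's method."""
    pcs = np.asarray(pcs, dtype=float)[:, :n_components]
    k = pcs.shape[1]
    # Squared standardized score per sample and PC (assumed ~ chi2 with 1 df).
    z2 = (pcs - pcs.mean(axis=0)) ** 2 / pcs.var(axis=0)
    p_matrix = stats.chi2.sf(z2, df=1)  # P-value matrix: samples x PCs
    p_matrix = np.clip(p_matrix, 1e-300, 1.0)  # guard against log(0)
    # Fisher's method: -2 * sum(ln p) follows chi2 with 2k degrees of freedom.
    fisher_stat = -2.0 * np.log(p_matrix).sum(axis=1)
    p_combined = stats.chi2.sf(fisher_stat, df=2 * k)
    is_outlier = p_combined <= alpha
    ranking = np.argsort(p_combined)  # strongest outliers first
    return p_combined, is_outlier, ranking


rng = np.random.default_rng(1)

# Hypothetical PC scores: 300 samples, 5 PCs with decreasing spread.
scales = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
scores = rng.normal(size=(300, 5)) * scales
scores[0, :] = 8.0 * scales  # plant one extreme sample

p_combined, is_outlier, ranking = chi2_fisher_outliers(scores, alpha=0.05)
print("number of flagged samples:", int(is_outlier.sum()))
print("strongest outlier index:", int(ranking[0]))
```

Lowering alpha makes the detection stricter, and the ranking gives the order from strongest to weakest outlier regardless of the threshold.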


Citation

Please cite pca in your publications if it is useful for your research (see citation).

Maintainers

Contribute

  • All kinds of contributions are welcome!
  • If you wish to buy me a coffee for this work, it is much appreciated :)

