pca: A Python Package for Principal Component Analysis.

pca is a Python package for Principal Component Analysis. The core of PCA is built on sklearn functionality to maximize compatibility when combining it with other packages. But this package can do a lot more. Besides regular PCA, it can also perform SparsePCA and TruncatedSVD. Depending on your input data, the best approach will be chosen.

Other functionalities of PCA are:

  • Biplot to plot the loadings
  • Determine the explained variance
  • Extract the best performing features
  • Scatter plot with the loadings
  • Outlier detection using Hotelling T2 and/or SPE/Dmodx

⭐️ Star this repo if you like it ⭐️

Install pca from PyPI

pip install pca

Import pca package

from pca import pca

Documentation pages

On the documentation pages you can find detailed information about how pca works, with many examples.

Examples

Normalizing out the first (and further) components from the data. This is useful if the data is separated in its first component(s) by unwanted or biased variance, such as sex or experiment location.
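To make the idea of normalizing out components concrete, here is a minimal NumPy sketch. It is not the package's own code: the helper name `remove_components`, the toy data, and the `batch` grouping are all hypothetical, and it assumes the unwanted variance really does sit in the first principal axis.

```python
import numpy as np

def remove_components(X, n_remove=1):
    """Return X with its first `n_remove` principal components projected out."""
    mean = X.mean(axis=0)
    Xc = X - mean                         # PCA operates on centered data
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_remove].T                   # shape: (n_features, n_remove)
    # Subtract the part of each sample that lies along the removed axes.
    return Xc - Xc @ V @ V.T + mean

rng = np.random.default_rng(0)
batch = np.repeat([0.0, 1.0], 50)         # hypothetical unwanted grouping, e.g. location
X = rng.normal(size=(100, 4))
X[:, 0] += 10 * batch                     # the grouping dominates the variance

X_clean = remove_components(X, n_remove=1)
```

After this step the data keeps its original shape and feature means, but no longer varies along the removed axis, so the two hypothetical groups should largely overlap again.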

Make the biplot. The feature with the most variance (f1) is almost horizontal in the plot, whereas the feature with the second most variance (f2) is almost vertical. This is expected because most of the variance is in f1, followed by f2, etc.
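The relation between feature variance, loadings, and explained variance can be sketched with plain NumPy. This is an illustration on hypothetical data (three independent features with decreasing variance), not output from the package; the variable names are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: three independent features with decreasing variance.
scales = np.array([5.0, 2.0, 0.5])        # standard deviations of f1, f2, f3
X = rng.normal(size=(500, 3)) * scales

Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_ratio = s**2 / np.sum(s**2)     # fraction of total variance per component
loadings = Vt                             # row i holds the weights of f1..f3 on PC(i+1)

# Index of the feature with the largest absolute weight on each component.
top_feature = np.abs(loadings).argmax(axis=1)

# Coordinates of the f1 arrow in the PC1-PC2 plane of a biplot.
f1_arrow = loadings[:2, 0]
```

Because f1 carries most of the variance, its weight on PC1 should be close to 1 in absolute value and close to 0 on PC2, which is why its arrow appears nearly horizontal; the same reasoning places f2 along PC2.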

Explained variance

Biplot in 2d and 3d. Here we see the nice addition of the expected f3 in the plot in the z-direction.

[Figures: biplot (2D) and biplot3d (3D)]

To detect outliers across the multi-dimensional space of PCA, Hotelling's T2 test is incorporated. This means that chi-square tests are computed across the top n_components (default is PC1 to PC5). The highest variance (and thus the outliers) is expected in the first few components because of the nature of PCA. Going deeper into PC space may therefore not be required, but the depth is configurable. This approach results in a P-value matrix (samples x PCs) in which the P-values per sample are then combined using Fisher's method. This makes it possible to determine outliers and to rank them (strongest to weakest). The alpha parameter sets the significance threshold for outlier detection (default: 0.05).
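The procedure described above can be sketched as follows. This is a simplified illustration, not the package's implementation: the function names are hypothetical, the chi-square test is assumed to use one degree of freedom, each component is standardized by its own mean and variance (an assumption), and random numbers stand in for real PC scores.

```python
import math
import numpy as np

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def fisher_combine(pvalues):
    """Combine independent P-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with 2k
    degrees of freedom, whose upper tail has a closed form for even 2k.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)      # statistic divided by 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

def detect_outliers(scores, n_components=5, alpha=0.05):
    """Flag samples whose combined P-value over the top components is below alpha."""
    Y = scores[:, :n_components]
    # Squared standardized distance per sample and component.
    z2 = (Y - Y.mean(axis=0)) ** 2 / Y.var(axis=0)
    # P-value matrix (samples x PCs); clip to avoid log(0) in Fisher's method.
    P = np.maximum(np.vectorize(chi2_sf_df1)(z2), 1e-300)
    p_combined = np.array([fisher_combine(row) for row in P])
    return p_combined, p_combined < alpha

rng = np.random.default_rng(2)
scores = rng.normal(size=(200, 5))         # stand-in for PC scores
scores[0] = [12.0, -9.0, 8.0, 0.0, 0.0]    # one planted outlier

p_combined, is_outlier = detect_outliers(scores)
ranking = np.argsort(p_combined)           # strongest outlier first
```

Sorting the combined P-values gives the ranking from strongest to weakest outlier, and comparing them against alpha gives the outlier flags. With alpha = 0.05 and no correction for multiple testing, a few ordinary samples may also be flagged by chance.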


Citation

Please cite in your publications if this is useful for your research (see citation).

Maintainers

Contribute

  • All kinds of contributions are welcome!
  • If you wish to buy me a Coffee for this work, it is very appreciated :)


