
pca is a Python package that performs principal component analysis and creates insightful plots.


pca


  • pca is a Python package that performs principal component analysis (PCA) and creates insightful plots.
  • Biplot to plot the loadings
  • Explained variance
  • Scatter plot with the loadings

Method overview

# Fit
model = pca.fit(X)
# Biplot
ax = pca.biplot(model)
ax = pca.biplot3d(model)
# Plot explained variance
ax = pca.plot(model)
# Normalize out components from your dataset
Xnorm = pca.norm(X)


Installation

  • Install pca from PyPI (recommended). pca is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
  • It is distributed under the MIT license.

Requirements

  • Creating a new environment is not strictly necessary.
conda create -n env_pca python=3.6
conda activate env_pca
pip install numpy matplotlib scikit-learn

Quick Start

pip install pca
  • Alternatively, install pca from the GitHub source:
git clone https://github.com/erdogant/pca.git
cd pca
python setup.py install

Import pca package

import pca as pca

Load example data

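import numpy as np
from sklearn.datasets import load_iris
# Load the dataset once so that the data, feature names, and targets come from the same object
iris = load_iris()
X = iris.data
label = iris.feature_names
labx = iris.target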

X looks like this:

X=array([[5.1, 3.5, 1.4, 0.2],
         [4.9, 3. , 1.4, 0.2],
         [4.7, 3.2, 1.3, 0.2],
         [4.6, 3.1, 1.5, 0.2],
         ...
         [5. , 3.6, 1.4, 0.2],
         [5.4, 3.9, 1.7, 0.4],
         [4.6, 3.4, 1.4, 0.3],
         [5. , 3.4, 1.5, 0.2],

labx=[0, 0, 0, 0,...,2, 2, 2, 2, 2]
label=['label1','label2','label3','label4']

Reduce dimensions with PCA and plot the explained variance

# Fit
model = pca.fit(X)
# Plot the explained variance. The total captured variance is 1, and PC1 captures more than 90% of it.
ax = pca.plot(model)
# Biplot in 2D, which shows the directions of the features and their weights of influence
ax = pca.biplot(model)
# Biplot in 3D
ax = pca.biplot3d(model)
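
To make the "explained variance" in the plot concrete, here is a minimal NumPy sketch of how the per-component variance fractions can be computed. It is not taken from the pca package; the function name `explained_variance_ratio` and the synthetic data `X_demo` are assumptions made for illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Return the fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                  # center each feature
    # Singular values of the centered data, sorted from largest to smallest
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2 / (X.shape[0] - 1)          # variance along each component
    return var / var.sum()

# Hypothetical data: 4 independent features with very different spreads
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 4)) * np.array([5.0, 1.0, 0.5, 0.1])

ratio = explained_variance_ratio(X_demo)
print(ratio)           # fraction per component
print(ratio.cumsum())  # cumulative fraction, ending at 1
```

The fractions always sum to 1, which is why the total captured variance in the plot is 1.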

Reduce dimensions as above, but now plot with class labels (labx) and feature names (label)

# Pass the feature names loaded earlier as `label`
model = pca.fit(X, labx=labx, feat=label)
ax  = pca.biplot(model)
ax  = pca.biplot3d(model)
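
The arrows in a biplot correspond to the loadings, that is, the weight of each feature on each component. As a rough sketch of that idea, assuming plain NumPy rather than the pca package, the feature with the largest absolute loading per component can be looked up as follows. The helper `top_feature_per_component`, the names `f1` to `f4`, and the synthetic data are hypothetical.

```python
import numpy as np

def top_feature_per_component(X, feature_names):
    """Return, for each principal component, the feature with the largest absolute loading."""
    Xc = X - X.mean(axis=0)
    # Each row of Vt holds the loadings of one principal component
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return [feature_names[int(np.argmax(np.abs(row)))] for row in Vt]

# Hypothetical data: the first feature has by far the largest spread
rng = np.random.default_rng(0)
names = ["f1", "f2", "f3", "f4"]
X_demo = rng.normal(size=(150, 4)) * np.array([5.0, 1.0, 0.5, 0.1])

top = top_feature_per_component(X_demo, names)
print(top)
```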

Reduce dimensions to the number of components that capture 95% of the explained variance

# Fit the model and determine the number of components required to capture 95% of the explained variance.
model = pca.fit(X, n_components=0.95)
# Plot the explained variance. Two components are required to capture 95% of the variance.
ax = pca.plot(model)
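
A fractional `n_components` can be read as "keep the smallest number of components whose cumulative explained variance reaches this threshold". The sketch below shows that selection rule in NumPy. It is an assumption about the rule, not the package's code, and the ratios are illustrative values rather than measured ones.

```python
import numpy as np

def n_components_for(ratio, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches the threshold."""
    cum = np.cumsum(ratio)
    # Index of the first cumulative value >= threshold, converted to a count
    k = int(np.searchsorted(cum, threshold)) + 1
    return min(k, len(cum))  # guard against floating-point shortfall

# Illustrative ratios (hypothetical): PC1 dominates
example_ratio = [0.92, 0.05, 0.02, 0.01]
k95 = n_components_for(example_ratio, threshold=0.95)
print(k95)
```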

Reduce dimensions to exactly 2D and 3D

# Set n_components=2 to reduce to 2D
model = pca.fit(X, n_components=2)
# Set n_components=3 to reduce to 3D
model = pca.fit(X, n_components=3)
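
Reducing to a fixed number of dimensions amounts to projecting the centered data onto the first k principal directions. The following is a minimal NumPy sketch of that projection; the `project` helper and the random `X_demo` are hypothetical and only stand in for the package's fit step.

```python
import numpy as np

def project(X, k):
    """Project X onto its first k principal components."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # shape: (n_samples, k)

# Hypothetical data with the same shape as the iris example
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 4)) * np.array([5.0, 1.0, 0.5, 0.1])

Z2 = project(X_demo, 2)
Z3 = project(X_demo, 3)
print(Z2.shape, Z3.shape)
```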

PCA normalization

# Normalize out the first component(s) from the data.
# This is useful if the data is separated in its first component(s) by unwanted or biased variance, such as sex or experiment location.

print(X.shape)
# (150, 4)

# Normalize out the 1st component and return the data
Xnorm = pca.norm(X, pcexclude=[1])

print(Xnorm.shape)
# (150, 4)

# In this case, PC1 is "removed" and PC2 has become PC1, etc.
# Refit on the normalized data so that the biplot reflects the change
model = pca.fit(Xnorm)
ax = pca.biplot(model)
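
One way to picture "normalizing out" a component is to reconstruct the data from all components except the excluded ones, so the shape stays the same while the variance along the excluded direction disappears. The sketch below assumes that interpretation and 1-based component numbering as in `pcexclude=[1]`. It is not confirmed to match what `pca.norm` does internally, and `remove_components` is a hypothetical helper.

```python
import numpy as np

def remove_components(X, exclude=(1,)):
    """Reconstruct X without the listed principal components (1-based, assumed)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = np.ones(len(s), dtype=bool)
    keep[[i - 1 for i in exclude]] = False        # convert 1-based to 0-based
    scores = Xc @ Vt.T                            # coordinates in component space
    return scores[:, keep] @ Vt[keep] + mean      # back to the original feature space

# Hypothetical data with the same shape as the iris example
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 4)) * np.array([4.0, 2.0, 1.0, 0.5])

X_removed = remove_components(X_demo, exclude=(1,))
print(X_demo.shape, X_removed.shape)
```

After this step the former second component carries the most variance, which matches the note that PC2 becomes PC1.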

Citation

Please cite pca in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2019pca,
  title={pca},
  author={Erdogan Taskesen},
  year={2019},
  howpublished={\url{https://github.com/erdogant/pca}},
}

Maintainers

Contribute

  • Contributions are welcome.

Licence

See LICENSE for details.

TODO

  • Add feature importance in the output.

Donation

  • This work is created and maintained in my free time. If you wish to buy me a coffee for this work, it is very much appreciated.
