Skip to main content

pca is a python package that performs the principal component analysis and to make insightful plots.

Project description

pca

Python PyPI Version License Downloads

  • pca is a python package that performs the principal component analysis and creates insightful plots.
  • Biplot to plot the loadings
  • Explained variance
  • Scatter plot with the loadings

Method overview

# fit
model=pca.fit(X)
# biplot
ax=pca.biplot(model)
ax=pca.biplot3d(model)
# plot explained variance
ax = pca.plot(model)
# Normalize out components from your dataset
Xnorm=pca.norm(X)

Contents

Installation

  • Install pca from PyPI (recommended). pca is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
  • It is distributed under the MIT license.

Requirements

  • Creation of a new environment is not necessarily.
conda create -n env_pca python=3.6
conda activate env_pca
pip install numpy matplotlib sklearn

Quick Start

pip install pca
  • Alternatively, install pca from the GitHub source:
git clone https://github.com/erdogant/pca.git
cd pca
python setup.py install

Import pca package

import pca as pca

Load example data

import numpy as np
from sklearn.datasets import load_iris
X = load_iris().data
label=iris.feature_names
labx=iris.target

X looks like this:

X=array([[5.1, 3.5, 1.4, 0.2],
         [4.9, 3. , 1.4, 0.2],
         [4.7, 3.2, 1.3, 0.2],
         [4.6, 3.1, 1.5, 0.2],
         ...
         [5. , 3.6, 1.4, 0.2],
         [5.4, 3.9, 1.7, 0.4],
         [4.6, 3.4, 1.4, 0.3],
         [5. , 3.4, 1.5, 0.2],

labx=[0, 0, 0, 0,...,2, 2, 2, 2, 2]
label=['label1','label2','label3','label4']

PCA reduce dimensions and plot explained variance

# Fit
model = pca.fit(X)
# Plot the explained variance. The total of captured variance is 1 and PC1 captures more then 90% of it.
ax = pca.plot(model)
# Biplot in 2D with shows the directions of features and weights of influence
ax  = pca.biplot(model)
# Biplot in 3D
ax  = pca.biplot3d(model)

Reduce dimensions as above but now plot with labx and label names

model = pca.fit(X, labx=labx, feat=feat)
ax  = pca.biplot(model)
ax  = pca.biplot3d(model)

Reduce dimensions to the number of components that capture 95% of the explained variance

# Fit model and determine the number of required components that captures 95% of the explained variance.
model = pca.fit(X, components=0.95)
# Plot the explained variance. The required number of components is 2 to capture 95% of the variance.
ax = pca.plot(model)

Reduce dimensions to exactly 2d and 3d

# Set components=2 to reduce to 2d
model = pca.fit(X, components=2)
# Set components=3 to reduce to 3d
model = pca.fit(X, components=3)

PCA normalization.

# Normalizing out the 1st and more components from the data. 
# This is usefull if the data is seperated in its first component(s) by unwanted or biased variance. Such as sex or experiment location etc. 

print(X.shape)
(150, 4)

# Normalize out 1st component and return data
Xnorm = pca.norm(X, pcexclude=[1])

print(Xnorm.shape)
(150, 4)

# In this case, PC1 is "removed" and the PC2 has become PC1 etc
ax = pca.biplot(model)

Citation

Please cite pca in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2019pca,
  title={pca},
  author={Erdogan Taskesen},
  year={2019},
  howpublished={\url{https://github.com/erdogant/pca}},
}

Maintainers

Contribute

  • Contributions are welcome.

© Copyright

See LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pca-0.1.2.tar.gz (7.1 kB view hashes)

Uploaded Source

Built Distribution

pca-0.1.2-py3-none-any.whl (7.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page