pca is a Python package that performs principal component analysis and creates insightful plots.

Project description

pca

  • pca is a Python package that performs principal component analysis and creates insightful plots.
  • Biplot to plot the loadings
  • Explained variance
  • Scatter plot with the loadings

Method overview

# fit
model=pca.fit(X)
# biplot
ax=pca.biplot(model)
ax=pca.biplot3d(model)
# plot explained variance
ax = pca.plot(model)
# Normalize out components from your dataset
Xnorm=pca.norm(X)

Installation

  • Install pca from PyPI (recommended). pca is compatible with Python 3.6+ and runs on Linux, macOS and Windows.
  • It is distributed under the MIT license.

Requirements

  • Creating a new environment is optional.
conda create -n env_pca python=3.6
conda activate env_pca
pip install numpy matplotlib scikit-learn

Quick Start

pip install pca
  • Alternatively, install pca from the GitHub source:
git clone https://github.com/erdogant/pca.git
cd pca
python setup.py install

Import pca package

import pca

Load example data

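import numpy as np
from sklearn.datasets import load_iris
# Load the dataset once and keep a handle to it
iris = load_iris()
X = iris.data
# Feature names (one per column of X)
label = iris.feature_names
# Class label for each sample (one per row of X)
labx = iris.target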

X looks like this:

X=array([[5.1, 3.5, 1.4, 0.2],
         [4.9, 3. , 1.4, 0.2],
         [4.7, 3.2, 1.3, 0.2],
         [4.6, 3.1, 1.5, 0.2],
         ...
         [5. , 3.6, 1.4, 0.2],
         [5.4, 3.9, 1.7, 0.4],
         [4.6, 3.4, 1.4, 0.3],
         [5. , 3.4, 1.5, 0.2]])

labx=[0, 0, 0, 0,...,2, 2, 2, 2, 2]
label=['label1','label2','label3','label4']

PCA reduce dimensions and plot explained variance

# Fit
model = pca.fit(X)
# Plot the explained variance. The total captured variance is 1, and PC1 captures more than 90% of it.
ax = pca.plot(model)
# Biplot in 2D, which shows the directions of the features and their weights of influence
ax  = pca.biplot(model)
# Biplot in 3D
ax  = pca.biplot3d(model)
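
The explained variance plotted above can be reproduced by hand. The following is a minimal sketch using only NumPy, not the package's own implementation: the helper name explained_variance_ratio and the synthetic data X_demo are assumptions introduced for illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of the total variance captured by each principal component."""
    # Center each feature so the SVD describes variance around the mean
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data, returned in descending order
    s = np.linalg.svd(Xc, compute_uv=False)
    # Squared singular values are proportional to the component variances
    var = s**2 / (X.shape[0] - 1)
    return var / var.sum()

# Synthetic stand-in data (hypothetical): 150 samples, 4 features of different scale
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 4)) * np.array([5.0, 1.0, 0.5, 0.1])

ratio = explained_variance_ratio(X_demo)
print(ratio)        # one value per component, largest first
print(ratio.sum())  # the ratios sum to 1 (up to rounding)
```

The ratios always sum to 1, which is why the plot's cumulative curve ends at 1.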

Reduce dimensions as above but now plot with labx and label names

# Pass the class labels (labx) and the feature names (label) loaded above
model = pca.fit(X, labx=labx, feat=label)
ax  = pca.biplot(model)
ax  = pca.biplot3d(model)

Reduce dimensions to the number of components that capture 95% of the explained variance

# Fit the model and determine the number of components required to capture 95% of the explained variance.
model = pca.fit(X, components=0.95)
# Plot the explained variance. The required number of components is 2 to capture 95% of the variance.
ax = pca.plot(model)
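
How a fractional threshold such as 0.95 translates into a component count can be sketched in a few lines. This is an illustration of the idea, not the package's code: the function n_components_for is hypothetical, and the ratios are assumed values chosen to agree with the statements above (PC1 above 90%, two components for 95%).

```python
from itertools import accumulate

def n_components_for(ratios, threshold=0.95):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    for k, total in enumerate(accumulate(ratios), start=1):
        if total >= threshold:
            return k
    # Threshold never reached (e.g. rounding): keep all components
    return len(ratios)

# Assumed explained variance ratios, for illustration only
ratios = [0.92, 0.05, 0.02, 0.01]
print(n_components_for(ratios, 0.95))  # prints 2
```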

Reduce dimensions to exactly 2d and 3d

# Set components=2 to reduce to 2d
model = pca.fit(X, components=2)
# Set components=3 to reduce to 3d
model = pca.fit(X, components=3)

PCA normalization.

# Normalize out the first component (or the first few components) from the data.
# This is useful if the data is separated in its first component(s) by unwanted or biased variance, such as sex or experiment location.

print(X.shape)
(150, 4)

# Normalize out 1st component and return data
Xnorm = pca.norm(X, pcexclude=[1])

print(Xnorm.shape)
(150, 4)

# Refit on the normalized data. PC1 is now "removed", so the former PC2 becomes PC1, and so on.
model = pca.fit(Xnorm)
ax = pca.biplot(model)
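
One common way to normalize out a component is to reconstruct the data from all remaining components. The sketch below shows that approach with NumPy. Whether pca.norm works this way internally is an assumption; the helper remove_components and the synthetic X_demo are hypothetical names used only here.

```python
import numpy as np

def remove_components(X, exclude=(1,)):
    """Rebuild X without the listed principal components (1-based indices)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = np.ones(len(s), dtype=bool)
    keep[[i - 1 for i in exclude]] = False
    # Reconstruct in the original feature space from the kept components only
    Xr = (U[:, keep] * s[keep]) @ Vt[keep]
    return Xr + mean, Vt

# Synthetic stand-in data (hypothetical): 150 samples, 4 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 4)) * np.array([5.0, 1.0, 0.5, 0.1])

X_removed, Vt = remove_components(X_demo, exclude=[1])
print(X_removed.shape)  # same shape as the input
```

The output keeps the original shape because the data is mapped back to feature space. Only the variance along the excluded direction is gone, so a new fit ranks the former PC2 first.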

Citation

Please cite pca in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2019pca,
  title={pca},
  author={Erdogan Taskesen},
  year={2019},
  howpublished={\url{https://github.com/erdogant/pca}},
}

Maintainers

Contribute

  • Contributions are welcome.

© Copyright

See LICENSE for details.

Download files

Files for pca, version 0.1.3
Filename                      Size     File type   Python version
pca-0.1.3-py3-none-any.whl    8.5 kB   Wheel       py3
pca-0.1.3.tar.gz              7.8 kB   Source      None
