pca
- pca is a Python package that performs principal component analysis and creates insightful plots.
- Biplot to plot the loadings
- Explained variance
- Scatter plot with the loadings
Method overview
# fit
model=pca.fit(X)
# biplot
ax=pca.biplot(model)
ax=pca.biplot3d(model)
# plot explained variance
ax = pca.plot(model)
# Normalize out components from your dataset
Xnorm=pca.norm(X)
Installation
- Install pca from PyPI (recommended). pca is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
- It is distributed under the MIT license.
Requirements
- Creating a new environment is not strictly necessary.
conda create -n env_pca python=3.6
conda activate env_pca
pip install numpy matplotlib scikit-learn
Quick Start
pip install pca
- Alternatively, install pca from the GitHub source:
git clone https://github.com/erdogant/pca.git
cd pca
python setup.py install
Import pca package
import pca as pca
Load example data
import numpy as np
from sklearn.datasets import load_iris
# Keep the dataset object so the feature names and targets can be read from it
iris = load_iris()
X = iris.data
label = iris.feature_names
labx = iris.target
X looks like this:
X=array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
...
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
labx=[0, 0, 0, 0,...,2, 2, 2, 2, 2]
label=['label1','label2','label3','label4']
Reduce dimensions with PCA and plot the explained variance
# Fit
model = pca.fit(X)
# Plot the explained variance. The total captured variance is 1, and PC1 captures more than 90% of it.
ax = pca.plot(model)
# Biplot in 2D, which shows the directions of the features and the weights of their influence
ax = pca.biplot(model)
# Biplot in 3D
ax = pca.biplot3d(model)
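The explained-variance figures quoted above can be cross-checked independently of this package. The following is a minimal sketch that uses scikit-learn's `PCA` directly; it is an assumption that this mirrors what `pca.fit` computes internally, and the name `sk_model` is only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Fit all four components so the ratios cover the full variance.
# Note: this is scikit-learn's PCA, not the pca package's own fit().
sk_model = PCA().fit(X)
explained = sk_model.explained_variance_ratio_
cumulative = np.cumsum(explained)

# Print the per-component and cumulative share of the variance.
for i, (ratio, total) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {ratio:.3f} (cumulative {total:.3f})")
```

The ratios sum to 1 across all components, and the first entry is the share captured by PC1.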
Reduce dimensions as above but now plot with labx and label names
model = pca.fit(X, labx=labx, feat=label)
ax = pca.biplot(model)
ax = pca.biplot3d(model)
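The arrows in a biplot correspond to the feature loadings, that is, the weight of each feature on each principal component. As a rough illustration, assuming scikit-learn's `PCA` as a stand-in for this package's internals, the loadings can be listed next to the feature names like this (`sk_model` and `top_feature` are illustrative names):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# scikit-learn's PCA is used here as a stand-in; it is not the pca package API.
sk_model = PCA(n_components=2).fit(iris.data)

# Each row of components_ is one principal axis; each column is one feature.
for name, (w1, w2) in zip(iris.feature_names, sk_model.components_.T):
    print(f"{name:20s} PC1={w1:+.2f} PC2={w2:+.2f}")

# Feature with the largest absolute weight on PC1.
top_idx = int(np.abs(sk_model.components_[0]).argmax())
top_feature = iris.feature_names[top_idx]
print("Strongest influence on PC1:", top_feature)
```

The sign of a loading depends on the orientation chosen for the axis, so only the absolute value is used to rank the features.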
Reduce dimensions to the number of components that capture 95% of the explained variance
# Fit the model and determine the number of components required to capture 95% of the explained variance.
model = pca.fit(X, components=0.95)
# Plot the explained variance. The required number of components is 2 to capture 95% of the variance.
ax = pca.plot(model)
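Selecting components by a variance threshold amounts to finding the first point where the cumulative explained variance reaches that threshold. Below is a small sketch of that idea, again using scikit-learn rather than this package, so how `pca.fit` implements `components=0.95` is an assumption; `n_required` is an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 0.95.
n_required = int(np.searchsorted(cumulative, 0.95) + 1)
print("Components needed for 95%:", n_required)

# scikit-learn applies the same idea when n_components is a float in (0, 1).
sk_model = PCA(n_components=0.95, svd_solver="full").fit(X)
print("scikit-learn selected:", sk_model.n_components_)
```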
Reduce dimensions to exactly 2d and 3d
# Set components=2 to reduce to 2d
model = pca.fit(X, components=2)
# Set components=3 to reduce to 3d
model = pca.fit(X, components=3)
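For comparison, a fixed number of components produces a projected array with exactly that many columns. This sketch uses scikit-learn's `PCA` and is only meant to show the resulting shapes; how this package stores the projected data is not shown here.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Project the 4-dimensional data onto the first 2 and first 3 principal axes.
X2 = PCA(n_components=2).fit_transform(X)
X3 = PCA(n_components=3).fit_transform(X)

print(X.shape, "->", X2.shape, "and", X3.shape)
```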
PCA normalization.
# Normalize out the first component (or more) from the data.
# This is useful if the data is separated in its first component(s) by unwanted or biased variance, such as sex or experiment location.
print(X.shape)
(150, 4)
# Normalize out 1st component and return data
Xnorm = pca.norm(X, pcexclude=[1])
print(Xnorm.shape)
(150, 4)
# Refit on the normalized data. PC1 is now "removed", so the former PC2 has become PC1, and so on.
model = pca.fit(Xnorm)
ax = pca.biplot(model)
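One common way to normalize out a component is to project the data onto the principal axes, zero the unwanted scores, and map the result back to the original feature space. The sketch below does this with scikit-learn; whether `pca.norm` follows exactly these steps is an assumption. Note that the sketch uses a 0-based index for the component, whereas `pcexclude=[1]` above refers to PC1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Project onto all principal axes (scikit-learn, not the pca package).
full = PCA().fit(X)
scores = full.transform(X)

# Zero the scores of the first component (index 0) and reconstruct.
scores[:, 0] = 0.0
X_removed = full.inverse_transform(scores)
print(X.shape, "->", X_removed.shape)

# Refit: the leading axis of the cleaned data matches the original PC2.
refit = PCA().fit(X_removed)
alignment = abs(np.dot(refit.components_[0], full.components_[1]))
print(f"Alignment of new PC1 with old PC2: {alignment:.4f}")
```

The data keeps its original shape because the reconstruction maps back to all four features; only the variance along the removed axis is gone.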
Citation
Please cite pca in your publications if it is useful for your research. Here is an example BibTeX entry:
@misc{erdogant2019pca,
title={pca},
author={Erdogan Taskesen},
year={2019},
howpublished={\url{https://github.com/erdogant/pca}},
}
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- Contributions are welcome.
© Copyright
See LICENSE for details.
Download files
- Source Distribution: pca-0.1.3.tar.gz (7.8 kB)
- Built Distribution: pca-0.1.3-py3-none-any.whl (8.5 kB)