pca is a python package that performs the principal component analysis and to make insightful plots.
Project description
pca
- pca is a python package that performs the principal component analysis and creates insightful plots.
- Biplot to plot the loadings
- Explained variance
- Scatter plot with the loadings
Method overview
# fit
model=pca.fit(X)
# biplot
ax=pca.biplot(model)
ax=pca.biplot3d(model)
# plot explained variance
ax = pca.plot(model)
# Normalize out components from your dataset
Xnorm=pca.norm(X)
Contents
Installation
- Install pca from PyPI (recommended). pca is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
- It is distributed under the MIT license.
Requirements
- Creation of a new environment is not necessarily.
conda create -n env_pca python=3.6
conda activate env_pca
pip install numpy matplotlib sklearn
Quick Start
pip install pca
- Alternatively, install pca from the GitHub source:
git clone https://github.com/erdogant/pca.git
cd pca
python setup.py install
Import pca package
import pca as pca
Load example data
import numpy as np
from sklearn.datasets import load_iris
X = load_iris().data
label=iris.feature_names
labx=iris.target
X looks like this:
X=array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
...
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
labx=[0, 0, 0, 0,...,2, 2, 2, 2, 2]
label=['label1','label2','label3','label4']
PCA reduce dimensions and plot explained variance
# Fit
model = pca.fit(X)
# Plot the explained variance. The total of captured variance is 1 and PC1 captures more then 90% of it.
ax = pca.plot(model)
# Biplot in 2D with shows the directions of features and weights of influence
ax = pca.biplot(model)
# Biplot in 3D
ax = pca.biplot3d(model)
Reduce dimensions as above but now plot with labx and label names
model = pca.fit(X, labx=labx, feat=feat)
ax = pca.biplot(model)
ax = pca.biplot3d(model)
Reduce dimensions to the number of components that capture 95% of the explained variance
# Fit model and determine the number of required components that captures 95% of the explained variance.
model = pca.fit(X, n_components=0.95)
# Plot the explained variance. The required number of components is 2 to capture 95% of the variance.
ax = pca.plot(model)
Reduce dimensions to exactly 2d and 3d
# Set components=2 to reduce to 2d
model = pca.fit(X, n_components=2)
# Set components=3 to reduce to 3d
model = pca.fit(X, n_components=3)
PCA normalization.
# Normalizing out the 1st and more components from the data.
# This is usefull if the data is seperated in its first component(s) by unwanted or biased variance. Such as sex or experiment location etc.
print(X.shape)
(150, 4)
# Normalize out 1st component and return data
Xnorm = pca.norm(X, pcexclude=[1])
print(Xnorm.shape)
(150, 4)
# In this case, PC1 is "removed" and the PC2 has become PC1 etc
ax = pca.biplot(model)
Citation
Please cite pca in your publications if this is useful for your research. Here is an example BibTeX entry:
@misc{erdogant2019pca,
title={pca},
author={Erdogan Taskesen},
year={2019},
howpublished={\url{https://github.com/erdogant/pca}},
}
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- Contributions are welcome.
Licence
See LICENSE for details.
TODO
- Add feature importance in the output.
Donation
This package is created and maintained in my free time. If this package is usefull, feel free to use more of my packages. Sponser here.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pca-0.1.5.tar.gz
(8.5 kB
view hashes)
Built Distribution
pca-0.1.5-py3-none-any.whl
(9.2 kB
view hashes)