pca is a python package that performs principal component analysis and creates insightful plots.
pca
Star it if you like it!
- pca is a python package that performs the principal component analysis and creates insightful plots.
- Biplot to plot the loadings
- Explained variance
- Scatter plot with the loadings
Installation
- Install pca from PyPI (recommended). pca is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
- It is distributed under the MIT license.
Requirements
- Creating a new environment is not required, but if you wish to do so:
conda create -n env_pca python=3.6
conda activate env_pca
pip install numpy pandas matplotlib scikit-learn
Installation
pip install pca
- Install the latest version from the GitHub source:
git clone https://github.com/erdogant/pca.git
cd pca
python setup.py install
Import pca package
from pca import pca
Load example data
import pandas as pd
from sklearn.datasets import load_iris
from pca import pca
# Load the dataset once and reuse it
iris = load_iris()
X = pd.DataFrame(data=iris.data, columns=iris.feature_names, index=iris.target)
# Initialize to reduce the data to the number of components that explains 95% of the variance.
model = pca(n_components=0.95)
# Alternatively, reduce the data to exactly 3 PCs (this replaces the model above)
model = pca(n_components=3)
# Fit and transform
results = model.fit_transform(X)
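Passing a float such as `0.95` as `n_components` asks for the smallest number of components whose cumulative explained variance reaches that fraction. The following is a minimal NumPy sketch of that selection rule, assuming the data is only centered (not scaled); `n_components_for_variance` is a hypothetical helper, not part of the pca package, and the package itself may treat the exact boundary case differently.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.95):
    """Return (k, ratios): the smallest k whose first k PCs explain >= `threshold`.

    Hypothetical helper for illustration; not part of the pca package.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    explained_ratio = s**2 / np.sum(s**2)        # share of variance per PC
    cumulative = np.cumsum(explained_ratio)
    # First index where the cumulative share reaches the threshold
    k = int(np.searchsorted(cumulative, threshold, side="left")) + 1
    return min(k, len(s)), explained_ratio


if __name__ == "__main__":
    # Two uncorrelated, zero-mean features with variances in ratio 9:1
    X = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
    k, ratio = n_components_for_variance(X, threshold=0.95)
    print(k, np.round(ratio, 3))  # 2 [0.9 0.1]
```

For the unscaled iris features, the first two components already explain roughly 98% of the variance, so the same rule with a 0.95 threshold keeps two components.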
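X holds the iris measurements and looks like this (the DataFrame index holds the class labels labx, and the columns hold the feature names label):
X=array([[5.1, 3.5, 1.4, 0.2],
         [4.9, 3. , 1.4, 0.2],
         [4.7, 3.2, 1.3, 0.2],
         [4.6, 3.1, 1.5, 0.2],
         ...
         [5. , 3.6, 1.4, 0.2],
         [5.4, 3.9, 1.7, 0.4],
         [4.6, 3.4, 1.4, 0.3],
         [5. , 3.4, 1.5, 0.2],
         ...])
labx=[0, 0, 0, 0,...,2, 2, 2, 2, 2]
label=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']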
Make scatter plot
fig, ax = model.scatter()
Make biplot
fig, ax = model.biplot(n_feat=4)
Plot the explained variance
fig, ax = model.plot()
Make 3d plots
fig, ax = model.scatter3d()
fig, ax = model.biplot3d(n_feat=2)
PCA normalization
Normalize out the 1st (and optionally further) components from the data. This is useful if the data is separated in its first component(s) by unwanted or biased variance, such as sex or experiment location.
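print(X.shape)
# (150, 4)
# Normalize out the 1st component and return the data
model = pca()
Xnorm = model.norm(X, pcexclude=[1])
print(Xnorm.shape)
# (150, 4)
# In this case, PC1 is "removed" and PC2 has become PC1, etc.
# Fit a new model on the normalized data to inspect the result
model_norm = pca()
model_norm.fit_transform(Xnorm)
fig, ax = model_norm.biplot()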
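"Normalizing out" a component can be read as: decompose the centered data, drop the excluded components, and reconstruct the data from the remaining ones. The NumPy sketch below assumes that reading; `remove_components` is a hypothetical name, and the package's own `norm` method may differ in details such as how the mean is added back.

```python
import numpy as np


def remove_components(X, pcexclude=(1,)):
    """Reconstruct X without the (1-based) principal components in `pcexclude`.

    Hypothetical helper for illustration; not the pca package's `norm` method.
    Returns the reconstructed data and the PC directions (rows of Vt).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    excluded = set(pcexclude)
    keep = [i for i in range(len(s)) if (i + 1) not in excluded]
    # Sum of the rank-1 terms u_i * s_i * v_i^T over the kept components only
    Xc_kept = (U[:, keep] * s[keep]) @ Vt[keep, :]
    return mean + Xc_kept, Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 4)) * [5.0, 2.0, 1.0, 0.5]
    Xnorm, Vt = remove_components(X, pcexclude=[1])
    print(Xnorm.shape)  # (150, 4)
    # The normalized data has (numerically) no variance left along the original PC1
    print(np.allclose((Xnorm - Xnorm.mean(axis=0)) @ Vt[0], 0.0))  # True
```

Excluding nothing (`pcexclude=[]`) reconstructs X exactly, which makes a quick sanity check.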
Example to extract the feature importance:
# Import libraries
import numpy as np
import pandas as pd
from pca import pca
# Let's create a dataset with features that have decreasing variance.
# We want to extract feature f1 as the most important, followed by f2, etc.
f1=np.random.randint(0,100,250)
f2=np.random.randint(0,50,250)
f3=np.random.randint(0,25,250)
f4=np.random.randint(0,10,250)
f5=np.random.randint(0,5,250)
f6=np.random.randint(0,4,250)
f7=np.random.randint(0,3,250)
f8=np.random.randint(0,2,250)
f9=np.random.randint(0,1,250)
# Combine into dataframe
X = np.c_[f1,f2,f3,f4,f5,f6,f7,f8,f9]
X = pd.DataFrame(data=X, columns=['f1','f2','f3','f4','f5','f6','f7','f8','f9'])
# Initialize
model = pca()
# Fit transform
out = model.fit_transform(X)
# Print the top features. The results show that f1 is the most important, followed by f2, etc.
print(out['topfeat'])
# PC feature
# 0 PC1 f1
# 1 PC2 f2
# 2 PC3 f3
# 3 PC4 f4
# 4 PC5 f5
# 5 PC6 f6
# 6 PC7 f7
# 7 PC8 f8
# 8 PC9 f9
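One common way to define the top feature of a PC is the feature with the largest absolute loading on that component. The sketch below assumes that definition (the package may weigh loadings differently); `top_features` is a hypothetical helper, and the demo data are three orthogonal, zero-mean patterns chosen so that the expected answer is unambiguous.

```python
import numpy as np


def top_features(X, names):
    """For each PC, return (PC label, feature with the largest absolute loading).

    Hypothetical helper for illustration; not the pca package's `topfeat` logic.
    """
    X = np.asarray(X, dtype=float)
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    # Row i of Vt holds the loadings of PC(i+1) on every feature
    best = np.argmax(np.abs(Vt), axis=1)
    return [(f"PC{i + 1}", names[j]) for i, j in enumerate(best)]


if __name__ == "__main__":
    # Three mutually orthogonal, zero-mean patterns with decreasing spread
    w1 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
    w2 = np.array([1, 1, -1, -1, 1, 1, -1, -1])
    w3 = np.array([1, -1, 1, -1, 1, -1, 1, -1])
    X = np.column_stack([10 * w1, 5 * w2, 2 * w3])
    for pc, feat in top_features(X, ["f1", "f2", "f3"]):
        print(pc, feat)
    # PC1 f1
    # PC2 f2
    # PC3 f3
```

Because the absolute value is taken, the arbitrary sign of each singular vector does not affect the result.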
Make the plots
model.plot()
Make the biplot. The feature with the most variance (f1) is almost horizontal in the plot, whereas the feature with the second most variance (f2) is almost vertical. This is expected because most of the variance is in f1, followed by f2, etc.
fig, ax = model.biplot(n_feat=10, legend=False)
Biplot in 3D. Here, the expected f3 appears in the z-direction.
fig, ax = model.biplot3d(n_feat=10, legend=False)
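In a biplot, each sample is drawn at its PC scores and each feature as an arrow whose coordinates are its loadings on the plotted PCs. That is why a feature dominating PC1 points along the horizontal axis, one dominating PC2 points vertically, and one dominating PC3 points along z. The NumPy sketch below computes those coordinates under this assumption; `biplot_coordinates` is hypothetical, the demo data mimic the f1 to f3 columns above with NumPy's newer `Generator` API, and the drawing itself (e.g. with matplotlib) is left out.

```python
import numpy as np


def biplot_coordinates(X, n_pcs=2):
    """Return sample scores and feature arrow tips for the first `n_pcs` PCs.

    Hypothetical helper for illustration; not the pca package's `biplot`.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pcs] * s[:n_pcs]  # one row per sample: position in PC space
    arrows = Vt[:n_pcs].T              # one row per feature: loadings on each PC
    return scores, arrows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.column_stack([
        rng.integers(0, 100, 250),  # f1: most variance
        rng.integers(0, 50, 250),   # f2
        rng.integers(0, 25, 250),   # f3: least variance
    ])
    scores, arrows = biplot_coordinates(X, n_pcs=3)
    # Each feature's arrow should lie close to its own PC axis (signs may flip),
    # so this is expected to be close to the 3x3 identity matrix
    print(np.round(np.abs(arrows), 2))
```

Multiplying the scores by the transposed arrows recovers the centered data when all PCs are kept, which ties the two halves of the biplot together.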
Citation
Please cite pca in your publications if it is useful for your research. Here is an example BibTeX entry:
@misc{erdogant2019pca,
title={pca},
author={Erdogan Taskesen},
year={2019},
howpublished={\url{https://github.com/erdogant/pca}},
}
Maintainer
Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
Contributions are welcome.