Skip to main content

Factor analysis for in-memory datasets in Python

Project description

<div align="center">
<img src="docs/_static/logo.png" alt="prince_logo"/>
</div>

<br/>

<div align="center">
<a href='http://prince.readthedocs.io/en/latest/?badge=latest'>
<img src='https://readthedocs.org/projects/prince/badge/?version=latest' alt='Documentation Status' />
</a>
<a href="https://badge.fury.io/py/prince">
<img src="https://badge.fury.io/py/prince.svg?style=flat-square" alt="PyPI version"/>
</a>
<a href="https://travis-ci.org/MaxHalford/Prince?branch=master">
<img src="https://travis-ci.org/MaxHalford/Prince.svg?branch=master&style=flat-square" alt="Build Status"/>
</a>
<a href="https://coveralls.io/github/MaxHalford/Prince?branch=master">
<img src="https://coveralls.io/repos/github/MaxHalford/Prince/badge.svg?branch=master&style=flat-square" alt="Coverage Status"/>
</a>
<a href="https://www.codacy.com/app/maxhalford25/Prince?utm_source=github.com&utm_medium=referral&utm_content=MaxHalford/Prince&utm_campaign=Badge_Grade">
<img src="https://api.codacy.com/project/badge/Grade/d58aa963d6fa425b97d2b9364aecbba1?style=flat-square" alt="Codacy Badge"/>
</a>
<a href="https://requires.io/github/MaxHalford/Prince/requirements/?branch=master">
<img src="https://requires.io/github/MaxHalford/Prince/requirements.svg?branch=master&style=flat-square" alt="Requirements Status"/>
</a>
</div>

<br/>

<br/>
<div align="center">Prince is a factor analysis library for datasets that fit in memory.</div>
<br/>


## Quick start

Prince uses [pandas](http://pandas.pydata.org/) to manipulate dataframes, as such it expects an initial dataframe to work with. In the following example, a [Principal Component Analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) is applied to the iris dataset. Under the hood Prince decomposes the dataframe into two eigenvector matrices and one eigenvalue array thanks to a [Singular Value Decomposition (SVD)](https://www.wikiwand.com/en/Singular_value_decomposition). The eigenvectors can then be used to project the initial dataset onto lower dimensions.

```python
import matplotlib.pyplot as plt
import pandas as pd

import prince


df = pd.read_csv('data/iris.csv')

pca = prince.PCA(df, n_components=4)

fig1, ax1 = pca.plot_cumulative_inertia()
fig2, ax2 = pca.plot_rows(color_by='class', ellipse_fill=True)

plt.show()
```

The first plot displays the rows in the initial dataset projected on to the two first right eigenvectors (the obtained projections are called principal coordinates). The ellipses are 90% confidence intervals.

![row_principal coordinates](docs/_static/pca_row_principal_coordinates.png)

The second plot displays the cumulative contributions of each eigenvector (by looking at the corresponding eigenvalues). In this case the total contribution is above 95% while only considering the two first eigenvectors.

![cumulative_inertia](docs/_static/pca_cumulative_inertia.png)


## Installation

Although it isn't a requirement, using [Anaconda](https://www.continuum.io/downloads) is a good idea in general for doing data science in Python.

**Via PyPI**

```sh
$ pip install prince
```

**Via GitHub for the latest development version**

```sh
$ pip install git+https://github.com/MaxHalford/Prince
```

Prince has the following dependencies:

- [pandas](http://pandas.pydata.org/) for manipulating dataframes
- [matplotlib](http://matplotlib.org/) as a default plotting backend
- [fbpca](http://fbpca.readthedocs.org/en/latest/), [Facebook's randomized SVD implementation](https://research.facebook.com/blog/fast-randomized-svd/)


## Documentation

Please check out the [documentation](http://prince.readthedocs.io) for a list of available methods.


## License

<a href="https://opensource.org/licenses/MIT">
<img src="http://img.shields.io/:license-mit-ff69b4.svg?style=flat-square" alt="mit"/>
</a>

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prince-0.2.0.tar.gz (15.1 kB view details)

Uploaded Source

File details

Details for the file prince-0.2.0.tar.gz.

File metadata

  • Download URL: prince-0.2.0.tar.gz
  • Upload date:
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for prince-0.2.0.tar.gz
Algorithm Hash digest
SHA256 bb8d60140e80ec3f33fdd6cf1a1756b9d6223daaeca92d6fb0cb14d6b46d45b2
MD5 e9381b6847b47fadfed64fc9d988194a
BLAKE2b-256 9d08a01d11893ef8631f4a9acfeffc2dff3f6e44b09092229c8165398319e233

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page