prince

Statistical factor analysis in Python

<div align="center">
<img src="images/logo.png" alt="prince_logo"/>
</div>

<br/>

<div align="center">

<a href="https://pypi.python.org/pypi/prince">
<img src="https://img.shields.io/badge/python-3.x-blue.svg?style=flat-square" alt="PyPI version"/>
</a>

<a href="https://pypi.org/project/prince/">
<img src="https://badge.fury.io/py/prince.svg" alt="PyPI"/>
</a>

<a href="https://travis-ci.org/MaxHalford/Prince?branch=master">
<img src="https://img.shields.io/travis/MaxHalford/Prince/master.svg?style=flat-square" alt="Build Status"/>
</a>

<a href="https://coveralls.io/github/MaxHalford/Prince?branch=master">
<img src="https://coveralls.io/repos/github/MaxHalford/Prince/badge.svg?branch=master&style=flat-square" alt="Coverage Status"/>
</a>

<a href="https://opensource.org/licenses/MIT">
<img src="http://img.shields.io/:license-mit-ff69b4.svg?style=flat-square" alt="license"/>
</a>
</div>

<br/>

## Introduction

Prince is a library for doing [factor analysis](https://www.wikiwand.com/en/Factor_analysis). It covers a variety of methods, including [principal component analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) and [correspondence analysis (CA)](https://www.wikiwand.com/en/Correspondence_analysis). The goal is to provide an efficient implementation of each algorithm along with a nice API.

## Installation

:warning: Prince is only compatible with Python 3.

:snake: Although it isn't a requirement, using [Anaconda](https://www.continuum.io/downloads) is highly recommended.

**Via PyPI**

```sh
pip install prince
```

**Via GitHub for the latest development version**

```sh
pip install git+https://github.com/MaxHalford/Prince
```

Prince doesn't have any extra dependencies apart from the usual suspects (`sklearn`, `pandas`, `matplotlib`) which are included with Anaconda.

## Usage

### Guidelines

Under the hood Prince uses a [randomised version of SVD](https://research.fb.com/fast-randomized-svd/). This is much faster than the classical approach, but the results carry a small inherent randomness. For most applications this doesn't matter and you shouldn't have to worry about it. However, if you want reproducible results then you should set your random number generator's seed:

```python
>>> import numpy as np
>>> np.random.seed(42)

```

The randomised version of SVD is an iterative method. Because each of Prince's algorithms uses SVD, they all have an `n_iter` parameter which controls the number of iterations used for computing the SVD. The higher `n_iter` is, the more precise the results will be, but the longer the computation takes. In general the algorithm converges very quickly, so using a low `n_iter` (which is the default behaviour) is recommended.
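To make the roles of the seed and of `n_iter` concrete, here is a minimal NumPy sketch of a randomised SVD in the spirit of the "Finding structure with randomness" paper. It is not Prince's implementation: the function name `randomized_svd`, the `n_oversamples` and `seed` parameters and the synthetic matrix are assumptions made for illustration.

```python
import numpy as np


def randomized_svd(A, n_components, n_iter=3, n_oversamples=10, seed=None):
    """Simplified randomised SVD (illustrative sketch, not Prince's code)."""
    rng = np.random.default_rng(seed)
    # Random test matrix used to sample the column space of A
    omega = rng.normal(size=(A.shape[1], n_components + n_oversamples))
    Q, _ = np.linalg.qr(A @ omega)
    # Power iterations: each pass sharpens the estimate of the dominant subspace
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project A onto the small subspace and run an exact SVD there
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :n_components], s[:n_components], Vt[:n_components]


# Synthetic matrix: rank 5 plus a little noise (made up for this example)
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 40))
A = A + 0.01 * rng.normal(size=(200, 40))

# Same seed, same result
_, s_a, _ = randomized_svd(A, n_components=5, n_iter=3, seed=42)
_, s_b, _ = randomized_svd(A, n_components=5, n_iter=3, seed=42)

# Exact singular values for comparison
s_exact = np.linalg.svd(A, compute_uv=False)[:5]

print(np.allclose(s_a, s_b))
print(np.abs(s_a - s_exact).max())
```

With a fixed seed the two runs agree, and on a matrix whose leading singular values are well separated from the rest a handful of iterations is already very close to the exact decomposition.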

The following papers give a good overview of the field of factor analysis if you want to go deeper:

- [A Tutorial on Principal Component Analysis](https://arxiv.org/pdf/1404.1100.pdf)
- [Theory of Correspondence Analysis](http://statmath.wu.ac.at/courses/CAandRelMeth/caipA.pdf)
- [Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions](https://arxiv.org/pdf/0909.4061.pdf)
- [Computation of Multiple Correspondence Analysis, with code in R](https://core.ac.uk/download/pdf/6591520.pdf)
- [Singular Value Decomposition Tutorial](https://davetang.org/file/Singular_Value_Decomposition_Tutorial.pdf)

### Principal component analysis (PCA)

If you're using PCA it is assumed you have a dataframe consisting of numerical variables. In this example we're going to be using the [Iris flower dataset](https://www.wikiwand.com/en/Iris_flower_data_set).

```python
>>> import pandas as pd
>>> import prince
>>> from sklearn import datasets

>>> X, y = datasets.load_iris(return_X_y=True)
>>> X = pd.DataFrame(data=X, columns=['Sepal length', 'Sepal width', 'Petal length', 'Petal width'])
>>> y = pd.Series(y).map({0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'})
>>> X.head()
   Sepal length  Sepal width  Petal length  Petal width
0           5.1          3.5           1.4          0.2
1           4.9          3.0           1.4          0.2
2           4.7          3.2           1.3          0.2
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2

```

`prince.PCA` supports scikit-learn's `fit`/`transform` API. Its parameters have to be passed at initialisation, before calling the `fit` method.

```python
>>> pca = prince.PCA(
...     n_components=2,
...     n_iter=3,
...     rescale_with_mean=True,
...     rescale_with_std=True,
...     copy=True,
...     engine='auto'
... )
>>> pca = pca.fit(X)

```

The available parameters are:

- `n_components`: the number of components that are computed. You only need two if your intention is to make a chart.
- `n_iter`: the number of iterations used for computing the SVD
- `rescale_with_mean`: whether to subtract each column's mean
- `rescale_with_std`: whether to divide each column by its standard deviation
- `copy`: if `False` then the computations will be done in place, which can have side effects on the input data
- `engine`: which SVD engine to use (should be one of `['auto', 'fbpca', 'sklearn']`)
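As a rough illustration of what the two rescaling options and `copy` mean, the following NumPy sketch centres and scales each column. The helper `rescale` is hypothetical and is not part of Prince. It uses the population standard deviation (`ddof=0`), which is an assumption; Prince may compute it differently.

```python
import numpy as np


def rescale(X, with_mean=True, with_std=True):
    """Centre and/or scale each column of a 2-D array (hypothetical helper)."""
    X = np.array(X, dtype=float)  # np.array copies, so the caller's data is untouched
    if with_mean:
        X -= X.mean(axis=0)
    if with_std:
        X /= X.std(axis=0)  # population standard deviation (ddof=0), an assumption
    return X


# First rows of the Iris table from the example (first three columns only,
# because the fourth is constant over these rows and cannot be scaled)
rows = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [4.7, 3.2, 1.3],
    [4.6, 3.1, 1.5],
    [5.0, 3.6, 1.4],
])

Z = rescale(rows)
print(np.allclose(Z.mean(axis=0), 0.0))  # columns are centred
print(np.allclose(Z.std(axis=0), 1.0))   # columns have unit spread
```

Working on a copy, as this helper does, corresponds to the behaviour described for `copy=True`: the original array is left unchanged.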

Once the `PCA` has been fitted, it can be used to extract the row principal coordinates like so:

```python
>>> pca.transform(X).head()  # Same as pca.row_principal_coordinates(X).head()
          0         1
0 -2.264542  0.505704
1 -2.086426 -0.655405
2 -2.367950 -0.318477
3 -2.304197 -0.575368
4 -2.388777  0.674767

```
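For readers who want to see where such coordinates come from, here is a small NumPy sketch of PCA through a plain (exact) SVD. It is not Prince's code, and it runs on a toy matrix taken from the first rows of the Iris table rather than on the full dataset, so the numbers will not match the output above. The variable names are made up for the example.

```python
import numpy as np

# Toy data: first rows of the Iris table, first three columns only
rows = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [4.7, 3.2, 1.3],
    [4.6, 3.1, 1.5],
    [5.0, 3.6, 1.4],
])

# Standardise each column (population standard deviation, an assumption)
Z = (rows - rows.mean(axis=0)) / rows.std(axis=0)

# Exact SVD: Z = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

n_components = 2
# Row principal coordinates: project the rows onto the first right singular vectors
coords_from_v = Z @ Vt[:n_components].T
# Equivalent form: left singular vectors scaled by the singular values
coords_from_u = U[:, :n_components] * s[:n_components]

print(coords_from_u.shape)
print(np.allclose(coords_from_u, coords_from_v))
```

Both expressions give the same table: one row per observation and one column per component.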

Each column stands for a principal component whilst each row stands for a row in the original dataset. You can display these projections with the `plot_row_principal_coordinates` method:

```python
>>> ax = pca.plot_row_principal_coordinates(
...     X,
...     ax=None,
...     figsize=(7, 7),
...     x_component=0,
...     y_component=1,
...     labels=None,
...     group_labels=y,
...     ellipse_outline=False,
...     ellipse_fill=True,
...     show_points=True
... )
>>> ax.get_figure().savefig('images/row_principal_coordinates.png')

```

<div align="center">
<img src="images/row_principal_coordinates.png" />
</div>

### Correspondence analysis (CA)

### Multiple correspondence analysis (MCA)

## Going faster

By default `prince` uses `sklearn`'s SVD implementation (the one used under the hood for [`TruncatedSVD`](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html)). One of the goals of Prince is to make it possible to use a different SVD backend. For the time being the only other supported backend is [Facebook's randomized SVD implementation](https://research.facebook.com/blog/fast-randomized-svd/), called [fbpca](http://fbpca.readthedocs.org/en/latest/). You can use it by setting the `engine` parameter to `'fbpca'`:

```python
>>> import prince
>>> pca = prince.PCA(engine='fbpca')

```

If you are using Anaconda then you should be able to install `fbpca` without any pain by running `pip install fbpca`.
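If you are not sure whether `fbpca` is available on a given machine, one option is to pick the engine name at runtime. The helper below is a hypothetical sketch that uses only the standard library; `pick_engine` is not part of Prince, and the two strings it returns are the engine names listed in the parameter description.

```python
import importlib.util


def pick_engine():
    """Return 'fbpca' if the fbpca package can be imported, else 'sklearn'."""
    # find_spec returns None when the package is not installed
    if importlib.util.find_spec('fbpca') is not None:
        return 'fbpca'
    return 'sklearn'


engine = pick_engine()
print(engine)
```

The result can then be passed along, for example as `prince.PCA(engine=engine)`.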

## Incoming features

I've got a lot on my hands aside from `prince`, so feel free to give me a hand!

- [Factor Analysis of Mixed Data (FAMD)](https://www.wikiwand.com/en/Factor_analysis_of_mixed_data)
- [Generalized Procrustes Analysis (GPA)](https://www.wikiwand.com/en/Generalized_Procrustes_analysis)
- [Multiple Factor Analysis (MFA)](https://www.wikiwand.com/en/Multiple_factor_analysis)

## License

The MIT License (MIT). Please see the [license file](LICENSE) for more information.


This version: 0.3.0 (Apr 25, 2018)
