Skip to main content

Factor analysis in Python: PCA, CA, MCA, MFA, FAMD, GPA, PGA

Project description

prince_logo


Prince is a Python library for multivariate exploratory data analysis in Python. It includes a variety of methods for summarizing tabular data, including principal component analysis (PCA) and correspondence analysis (CA). Prince provides efficient implementations, using a scikit-learn API.

I made Prince when I was at university, back in 2016. I spent a significant amount of time in 2022 to revamp the entire package. It is thoroughly tested and supports many features, such as supplementary row/columns, as well as row/column weights.

Example usage

>>> import prince

>>> dataset = prince.datasets.load_decathlon()
>>> decastar = dataset.query('competition == "Decastar"')

>>> pca = prince.PCA(n_components=5)
>>> pca = pca.fit(decastar, supplementary_columns=['rank', 'points'])
>>> pca.eigenvalues_summary
          eigenvalue % of variance % of variance (cumulative)
component
0              3.114        31.14%                     31.14%
1              2.027        20.27%                     51.41%
2              1.390        13.90%                     65.31%
3              1.321        13.21%                     78.52%
4              0.861         8.61%                     87.13%

>>> pca.transform(dataset).tail()
component                       0         1         2         3         4
competition athlete
OlympicG    Lorenzo      2.070933  1.545461 -1.272104 -0.215067 -0.515746
            Karlivans    1.321239  1.318348  0.138303 -0.175566 -1.484658
            Korkizoglou -0.756226 -1.975769  0.701975 -0.642077 -2.621566
            Uldal        1.905276 -0.062984 -0.370408 -0.007944 -2.040579
            Casarsa      2.282575 -2.150282  2.601953  1.196523 -3.571794
>>> chart = pca.plot(dataset)

This chart is interactive, which doesn't show on GitHub. The green points are the column loadings.

>>> chart = pca.plot(
...     dataset,
...     show_row_labels=True,
...     show_row_markers=False,
...     row_labels_column='athlete',
...     color_rows_by='competition'
... )

Installation

pip install prince

🎨 Prince uses Altair for making charts.

Methods

flowchart TD
    cat?(Categorical data?) --> |"✅"| num_too?(Numerical data too?)
    num_too? --> |"✅"| FAMD
    num_too? --> |"❌"| multiple_cat?(More than two columns?)
    multiple_cat? --> |"✅"| MCA
    multiple_cat? --> |"❌"| CA
    cat? --> |"❌"| groups?(Groups of columns?)
    groups? --> |"✅"| MFA
    groups? --> |"❌"| shapes?(Analysing shapes?)
    shapes? --> |"✅"| GPA
    shapes? --> |"❌"| manifold?(Data on a manifold?)
    manifold? --> |"✅"| PGA
    manifold? --> |"❌"| PCA

Principal component analysis (PCA)

Correspondence analysis (CA)

Multiple correspondence analysis (MCA)

Multiple factor analysis (MFA)

Factor analysis of mixed data (FAMD)

Generalized procrustes analysis (GPA)

Principal geodesic analysis (PGA)

Correctness

Prince is tested against scikit-learn and FactoMineR. For the latter, rpy2 is used to run code in R, and convert the results to Python, which allows running automated tests. PGA is tested against geomstats. See more in the tests directory.

Citation

Please use this citation if you use this software as part of a scientific publication.

@software{Halford_Prince,
    author = {Halford, Max},
    license = {MIT},
    title = {{Prince}},
    url = {https://github.com/MaxHalford/prince}
}

License

The MIT License (MIT). Please see the license file for more information.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prince-0.18.0.tar.gz (202.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

prince-0.18.0-py3-none-any.whl (195.1 kB view details)

Uploaded Python 3

File details

Details for the file prince-0.18.0.tar.gz.

File metadata

  • Download URL: prince-0.18.0.tar.gz
  • Upload date:
  • Size: 202.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for prince-0.18.0.tar.gz
Algorithm Hash digest
SHA256 0e956a87104aaae7368fa8116214da43b6760b2953ddb1122245c7b4cd36509c
MD5 cc3bbfcf2db87711c83b43ce4399d0df
BLAKE2b-256 ae43d11940bad15397266139a83bddd8c84a0f624e76f7d7c5c1dbb319fc4232

See more details on using hashes here.

File details

Details for the file prince-0.18.0-py3-none-any.whl.

File metadata

  • Download URL: prince-0.18.0-py3-none-any.whl
  • Upload date:
  • Size: 195.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for prince-0.18.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8d3dc04ba2e7a9a247eaedc647ca6ba8da2b5d25f9e4929f6c55db566e6b59b5
MD5 cf5147133b804fe78583373bf913faba
BLAKE2b-256 be69dc3b5dbfc84d61ac0309f92b271f6a5e96663d1398d699323dc19b82b50f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page