ppca-cpp

Probabilistic PCA (PPCA) with missing-data support — fast C++ core, clean Python API.

Overview

ppca-cpp implements Probabilistic Principal Component Analysis (PPCA) as described by Tipping & Bishop (1999), with a focus on speed, usability, and robust handling of missing data. The core is written in C++ (Armadillo + CARMA + pybind11), exposed via a simple Python interface.

Key Features

  • Handles missing values natively: No need for manual imputation—just use np.nan for missing entries.
  • Familiar API: Drop-in replacement for scikit-learn PCA with attributes like components_, explained_variance_, etc.
  • Probabilistic modeling: Compute log-likelihoods, posterior latent variable distributions, multiple imputations, and more.
  • Fast and scalable: Optimized C++ backend for large datasets.
  • Flexible: Supports both batch and online (mini-batch) EM.
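To make "posterior latent variable distributions" with missing entries concrete, here is a minimal NumPy sketch of the textbook PPCA posterior for a single sample whose missing features are marked with np.nan. It is not taken from the ppca-cpp source: the function name posterior_latent_one and its arguments (W, mu, sigma2) are hypothetical, and the C++ core may organize the computation differently.

```python
import numpy as np

def posterior_latent_one(x, W, mu, sigma2):
    """Posterior mean and covariance of the latent z for one sample x.

    Hypothetical helper for illustration only (not part of the ppca API).
    x: (d,) sample with np.nan for missing entries
    W: (d, q) loading matrix, mu: (d,) mean, sigma2: scalar noise variance
    """
    obs = ~np.isnan(x)                      # mask of observed features
    W_o = W[obs]                            # rows of W for observed features
    q = W.shape[1]
    M = W_o.T @ W_o + sigma2 * np.eye(q)    # (q, q)
    mean = np.linalg.solve(M, W_o.T @ (x[obs] - mu[obs]))
    cov = sigma2 * np.linalg.inv(M)
    return mean, cov

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))
mu = rng.normal(size=5)
sigma2 = 0.1

x = W @ rng.normal(size=2) + mu
x[1] = np.nan                               # one missing feature

mean, cov = posterior_latent_one(x, W, mu, sigma2)
print(mean.shape, cov.shape)
```

Only the observed rows of W enter the computation, so no imputation is needed before inference. If every entry of x is missing, the result falls back to the standard-normal prior (zero mean, identity covariance).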

Quick Start

pip install ppca-py

Note: pre-built wheels are produced only for Linux and macOS (CI builds target ubuntu-latest and macos-latest). On other platforms (e.g. Windows) you will need to build from source (see further below).

Usage example:

import numpy as np
from ppca import PPCA

X_train = np.random.randn(600, 10) + 0.1  # (n_samples, n_features)
X_train[::7, 3] = np.nan                  # missing values
X_test = np.random.randn(100, 10) + 0.1
X_test[::7, 2] = np.nan                   # missing values

model = PPCA(n_components=3, batch_size=200)
model.fit(X_train)

mZ, covZ = model.posterior_latent(X_test) # latent representation
mX, covX = model.likelihood(mZ)           # reconstruction
ll = model.score_samples(X_test)          # data log likelihood

# multiple imputation (return shape: (n_draws, n_samples, n_features))
X_imputed = model.sample_missing(X_test, n_draws=5)

# estimate of components, mean and noise variance
print("Components:", model.components_)
print("Mean:", model.mean_)
print("Noise variance:", model.noise_variance_)

For a short PPCA reference, see docs/ppca.md; some example scripts are provided in examples/.
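The multiple-imputation array from the usage example has shape (n_draws, n_samples, n_features), and a common next step is to pool the draws into a point estimate plus a per-entry spread. The sketch below uses only NumPy and a synthetic stand-in for the draws so that it runs without the package installed. It assumes that observed entries are identical across draws, which is not confirmed by the text above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X[::3, 1] = np.nan                           # missing values

# Synthetic stand-in for model.sample_missing(X, n_draws=5):
# observed entries are copied, missing entries get random fills.
n_draws = 5
draws = np.repeat(X[None, :, :], n_draws, axis=0)
missing = np.isnan(draws)
draws[missing] = rng.normal(size=missing.sum())

X_point = draws.mean(axis=0)                 # pooled point estimate
X_spread = draws.std(axis=0, ddof=1)         # between-draw spread per entry

print(X_point.shape, X_spread.shape)
```

Under the stated assumption, the spread is zero wherever a value was observed and positive wherever it was imputed, so X_spread doubles as a per-entry uncertainty map.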

Installation from Source

Minimum requirements

  • CMake >= 3.18
  • Python >= 3.9 (+ development headers)
  • C++17-capable compiler (clang on macOS, gcc on Linux, MSVC on Windows)
  • BLAS/LAPACK implementation (OpenBLAS, MKL, or Accelerate on macOS)
  • git (to fetch submodules)
  • Network access (CMake downloads Armadillo into extern by default); alternatively, provide extern/armadillo-<version>/ or a system Armadillo install

Quick install (fresh clone)

git clone https://github.com/brdav/ppca-cpp.git
cd ppca-cpp
git submodule update --init --recursive   # ensure extern/carma is present
python -m pip install .                   # build and install

Editable install for development

git clone https://github.com/brdav/ppca-cpp.git
cd ppca-cpp
git submodule update --init --recursive
python -m pip install -e '.[dev]'         # editable install
pre-commit install                        # optional: register hooks

Note: Builds on Windows are untested in CI. You can attempt a Windows build but expect manual steps.

Internals

PPCA uses an Expectation-Maximization (EM) algorithm to learn parameters through maximum likelihood estimation. For details see the reference paper listed below. The equations for the EM algorithm in the presence of missing values are shown in docs/equations.md.
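As a rough illustration of the EM iteration, here is a NumPy sketch of the complete-data updates from Tipping & Bishop (1999). It deliberately ignores missing values and mini-batching, and it is not the library's implementation: the function name ppca_em, its arguments, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    """Batch EM for PPCA on fully observed data (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)                      # ML estimate of the mean
    Xc = X - mu
    W = rng.normal(size=(d, q))              # random initial loadings
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ M_inv                  # (n, q), row i is E[z_i]
        sum_Ezz = n * sigma2 * M_inv + Ez.T @ Ez   # sum_i E[z_i z_i^T]
        # M-step: update loadings, then noise variance
        W_new = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (
            np.sum(Xc ** 2)
            - 2.0 * np.sum(Ez * (Xc @ W_new))
            + np.trace(sum_Ezz @ W_new.T @ W_new)
        ) / (n * d)
        W = W_new
    return W, mu, sigma2

# Synthetic data: 2 latent dimensions, noise std 0.5 (true variance 0.25)
rng = np.random.default_rng(1)
W_true = rng.normal(size=(8, 2))
X = rng.normal(size=(2000, 2)) @ W_true.T + 0.5 * rng.normal(size=(2000, 8))

W_hat, mu_hat, sigma2_hat = ppca_em(X, q=2)
print("estimated noise variance:", sigma2_hat)
```

For fully observed data, the maximum-likelihood noise variance equals the average of the discarded eigenvalues of the sample covariance, which gives a convenient check on the EM result. With missing values, the same moments are computed per sample from the observed features only.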

Citing

If you use this code academically, cite the original PPCA paper:

  • M. Tipping & C. Bishop. Probabilistic Principal Component Analysis. JRSS B, 1999.

You may also reference the library name or URL.

License

MIT License — see LICENSE.


Questions or requests? Open an issue.
