Probabilistic PCA (PPCA) with missing-data support - fast C++ core, clean Python API

ppca-cpp


Overview

ppca-cpp implements Probabilistic Principal Component Analysis (PPCA) as described by Tipping & Bishop (1999), with a focus on speed, usability, and robust handling of missing data. The core is written in C++ (Armadillo + CARMA + pybind11), exposed via a simple Python interface.

Key Features

  • Handles missing values natively: No manual imputation is needed; mark missing entries with np.nan.
  • Familiar API: Drop-in replacement for scikit-learn PCA with attributes like components_, explained_variance_, etc.
  • Probabilistic modeling: Compute log-likelihoods, posterior latent variable distributions, multiple imputations, and more.
  • Fast and scalable: Optimized C++ backend for large datasets.
  • Flexible: Supports both batch and online (mini-batch) EM.
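As a rough illustration of how a PPCA model can score data with missing entries, the sketch below evaluates the Gaussian marginal density over only the observed coordinates of each row. Under PPCA the data are distributed as N(mu, W Wᵀ + sigma2·I), so marginalizing out missing coordinates amounts to dropping the corresponding rows and columns of the covariance. This is plain NumPy, not the library's C++ code; the function name `ppca_loglik_observed` and the convention that `W` is an (n_features, n_components) loading matrix are assumptions made for this example.

```python
import numpy as np

def ppca_loglik_observed(X, W, mu, sigma2):
    """Per-row log-density over the observed (non-NaN) entries of X.

    Hypothetical helper for illustration only; assumes every row has at
    least one observed entry.
    """
    X = np.asarray(X, dtype=float)
    d = W.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)          # marginal covariance of x
    out = np.empty(X.shape[0])
    for i, x in enumerate(X):
        o = ~np.isnan(x)                      # mask of observed features
        r = x[o] - mu[o]                      # centered observed values
        C_oo = C[np.ix_(o, o)]                # covariance of observed block
        _, logdet = np.linalg.slogdet(C_oo)
        maha = r @ np.linalg.solve(C_oo, r)   # squared Mahalanobis distance
        out[i] = -0.5 * (o.sum() * np.log(2 * np.pi) + logdet + maha)
    return out
```

The per-row loop keeps the sketch readable; an efficient implementation would more likely group rows by missingness pattern or work in the latent space.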

Quick Start

pip install ppca-py

Note: Pre-built wheels are produced only for Linux and macOS (CI builds target ubuntu-latest and macos-latest). On other platforms (e.g. Windows) you will need to build from source (see "Installation from Source" below).

Usage example:

import numpy as np
from ppca import PPCA

X_train = np.random.randn(600, 10) + 0.1  # (n_samples, n_features)
X_train[::7, 3] = np.nan                  # missing values
X_test = np.random.randn(100, 10) + 0.1
X_test[::7, 2] = np.nan                   # missing values

model = PPCA(n_components=3, batch_size=200)
model.fit(X_train)

mZ, covZ = model.posterior_latent(X_test) # latent representation
mX, covX = model.likelihood(mZ)           # reconstruction
ll = model.score_samples(X_test)          # data log likelihood

# multiple imputation (return shape: (n_draws, n_samples, n_features))
X_imputed = model.sample_missing(X_test, n_draws=5)

# estimate of components, mean and noise variance
print("Components:", model.components_)
print("Mean:", model.mean_)
print("Noise variance:", model.noise_variance_)

For a short PPCA reference, see docs/ppca.md. Usage examples are provided in examples/.
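To give a feel for what a latent posterior and a multiple-imputation draw involve mathematically, here is a minimal NumPy sketch for a single row with missing entries. It uses the standard PPCA result that, given the observed coordinates, the latent posterior is Gaussian with mean M⁻¹ W_oᵀ (x_o − mu_o) and covariance sigma2·M⁻¹, where M = W_oᵀ W_o + sigma2·I. The helper names `posterior_latent_row` and `impute_row`, and the (n_features, n_components) layout of `W`, are assumptions for illustration and are not taken from the library.

```python
import numpy as np

def posterior_latent_row(x, W, mu, sigma2):
    """Mean and covariance of p(z | observed entries of x). Illustrative only."""
    o = ~np.isnan(x)                              # observed features
    W_o = W[o]                                    # loadings of observed features
    M_o = W_o.T @ W_o + sigma2 * np.eye(W.shape[1])
    mean = np.linalg.solve(M_o, W_o.T @ (x[o] - mu[o]))
    cov = sigma2 * np.linalg.inv(M_o)
    return mean, cov

def impute_row(x, W, mu, sigma2, n_draws, rng):
    """Draw the missing entries of x from p(x_missing | x_observed)."""
    mean, cov = posterior_latent_row(x, W, mu, sigma2)
    m = np.isnan(x)                               # missing features
    z = rng.multivariate_normal(mean, cov, size=n_draws)       # latent draws
    noise = np.sqrt(sigma2) * rng.standard_normal((n_draws, m.sum()))
    draws = np.tile(x, (n_draws, 1))              # observed values are copied
    draws[:, m] = z @ W[m].T + mu[m] + noise      # fill in missing values
    return draws

# Tiny example with made-up parameters
rng = np.random.default_rng(0)
W = np.array([[1.0], [1.0]])
mu = np.zeros(2)
x = np.array([2.0, np.nan])
z_mean, z_cov = posterior_latent_row(x, W, mu, sigma2=1.0)
x_draws = impute_row(x, W, mu, sigma2=1.0, n_draws=4, rng=rng)
```

Sampling the latent variable first and then adding isotropic noise works because, under PPCA, the missing and observed coordinates are conditionally independent given the latent variable.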

Installation from Source

Minimum requirements

  • CMake >= 3.18
  • Python >= 3.9 (+ development headers)
  • C++17-capable compiler (clang on macOS, gcc on Linux, MSVC on Windows)
  • BLAS/LAPACK implementation (OpenBLAS, MKL, or Accelerate on macOS)
  • git (to fetch submodules)
  • Network access (by default, CMake downloads Armadillo into extern/); alternatively, provide extern/armadillo-<version>/ or a system-wide Armadillo installation

Quick install (fresh clone)

git clone https://github.com/brdav/ppca-cpp.git
cd ppca-cpp
git submodule update --init --recursive   # ensure extern/carma is present
python -m pip install .                   # build and install

Editable install for development

git clone https://github.com/brdav/ppca-cpp.git
cd ppca-cpp
git submodule update --init --recursive
python -m pip install -e '.[dev]'         # editable install
pre-commit install                        # optional: register hooks

Note: Builds on Windows are untested in CI. You can attempt a Windows build but expect manual steps.

Internals

PPCA uses an Expectation-Maximization (EM) algorithm to learn parameters through maximum likelihood estimation. For details see the reference paper listed below. The equations for the EM algorithm in the presence of missing values are shown in docs/equations.md.
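For readers who want to see the shape of the algorithm, the following is a minimal NumPy sketch of the EM updates from Tipping & Bishop (1999) for fully observed data. It is not the library's implementation: it ignores missing values and mini-batching, uses a fixed iteration count instead of a convergence check, and the function name `ppca_em` and its parameters are chosen for this example only.

```python
import numpy as np

def ppca_em(X, n_components, n_iter=200, seed=0):
    """EM for PPCA on complete data (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    q = n_components
    mu = X.mean(axis=0)                       # ML estimate of the mean
    Xc = X - mu
    W = rng.standard_normal((d, q))           # random initial loadings
    sigma2 = 1.0                              # initial noise variance
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        M = W.T @ W + sigma2 * np.eye(q)
        M_inv = np.linalg.inv(M)
        Ez = Xc @ W @ M_inv                   # row i holds E[z_i]
        Ezz = n * sigma2 * M_inv + Ez.T @ Ez  # sum over i of E[z_i z_i^T]
        # M-step: update loadings, then noise variance
        W_new = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (
            np.sum(Xc ** 2)
            - 2.0 * np.sum((Xc @ W_new) * Ez)
            + np.trace(Ezz @ W_new.T @ W_new)
        ) / (n * d)
        W = W_new
    return W, mu, sigma2

# Synthetic check: 2 latent dimensions, noise standard deviation 0.5
rng = np.random.default_rng(0)
W_true = 3.0 * rng.standard_normal((8, 2))
Z = rng.standard_normal((2000, 2))
X = Z @ W_true.T + 0.5 * rng.standard_normal((2000, 8)) + 1.0
W_hat, mu_hat, sigma2_hat = ppca_em(X, n_components=2)
print("estimated noise variance:", sigma2_hat)
```

At convergence the noise-variance estimate should approach the average of the discarded eigenvalues of the sample covariance, which is the closed-form maximum-likelihood solution for complete data. With missing values, the same E- and M-steps are restricted to the observed entries of each row.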

Citing

If you use this code academically, cite the original PPCA paper:

  • M. E. Tipping and C. M. Bishop. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611-622, 1999.

You may also reference the library name or URL.

License

MIT License — see LICENSE.


Questions or requests? Open an issue.
