Probabilistic PCA (PPCA) with missing-data support - fast C++ core, clean Python API

ppca-cpp

Overview

ppca-cpp implements Probabilistic Principal Component Analysis (PPCA) as described by Tipping & Bishop (1999), with a focus on speed, usability, and robust handling of missing data. The core is written in C++ (Armadillo + CARMA + pybind11), exposed via a simple Python interface.

Key Features

  • Handles missing values natively: No need for manual imputation—just use np.nan for missing entries.
  • Familiar API: Drop-in replacement for scikit-learn PCA with attributes like components_, explained_variance_, etc.
  • Probabilistic modeling: Compute log-likelihoods, posterior latent variable distributions, multiple imputations, and more.
  • Fast and scalable: Optimized C++ backend for large datasets.
  • Flexible: Supports both batch and online (mini-batch) EM.
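Because missing entries are encoded as np.nan, the input array needs a floating-point dtype; NumPy integer arrays cannot hold NaN. The following is a minimal NumPy-only sketch, independent of ppca-cpp, of marking entries as missing and inspecting the missingness pattern before fitting. The toy array and the variable names are illustrative assumptions, not part of the library.

```python
import numpy as np

# Toy data: 4 samples, 3 features. A float dtype is required to store NaN.
X = np.arange(12, dtype=float).reshape(4, 3)
X[::2, 1] = np.nan                          # mark feature 1 missing in rows 0 and 2

missing = np.isnan(X)                       # boolean mask, True where an entry is missing
missing_per_feature = missing.mean(axis=0)  # fraction of missing entries per column
complete_rows = ~missing.any(axis=1)        # True for rows with no missing entries

print(missing_per_feature)
print(complete_rows)
```

A quick check like this can show whether any feature is missing in every sample, in which case there is no observed data from which to estimate that feature.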

Quick Start

pip install ppca-py

Note: pre-built wheels are produced only for Linux and macOS (CI builds target ubuntu-latest and macos-latest). On other platforms (e.g. Windows) you will need to build from source (see further below).

Usage example:

import numpy as np
from ppca import PPCA

X_train = np.random.randn(600, 10) + 0.1  # (n_samples, n_features)
X_train[::7, 3] = np.nan                  # missing values
X_test = np.random.randn(100, 10) + 0.1
X_test[::7, 2] = np.nan                   # missing values

model = PPCA(n_components=3, batch_size=200)
model.fit(X_train)

mZ, covZ = model.posterior_latent(X_test) # latent representation
mX, covX = model.likelihood(mZ)           # reconstruction
ll = model.score_samples(X_test)          # data log likelihood

# multiple imputation (return shape: (n_draws, n_samples, n_features))
X_imputed = model.sample_missing(X_test, n_draws=5)

# estimate of components, mean and noise variance
print("Components:", model.components_)
print("Mean:", model.mean_)
print("Noise variance:", model.noise_variance_)

For a short PPCA reference, see docs/ppca.md. Usage examples are provided in examples/.
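To make the latent-representation step in the usage example more concrete, here is a NumPy-only sketch of the standard PPCA posterior over the latent vector for a single sample with missing entries. It is not the library's implementation: the function name posterior_latent_row, the loading matrix W with shape (n_features, n_components), and the parameter values are all assumptions chosen for illustration. Only the observed rows of W enter the computation.

```python
import numpy as np

def posterior_latent_row(x, W, mu, sigma2):
    """Posterior mean and covariance of z given the observed entries of x.

    x: (d,) sample with np.nan marking missing entries (hypothetical input)
    W: (d, q) loading matrix, mu: (d,) mean, sigma2: isotropic noise variance
    """
    obs = ~np.isnan(x)                       # observed-entry mask
    W_o = W[obs]                             # keep only rows for observed features
    q = W.shape[1]
    M = W_o.T @ W_o + sigma2 * np.eye(q)     # M_o = W_o^T W_o + sigma^2 I
    mean = np.linalg.solve(M, W_o.T @ (x[obs] - mu[obs]))
    cov = sigma2 * np.linalg.inv(M)
    return mean, cov

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 3))             # illustrative parameters, not fitted
mu = np.full(10, 0.1)
sigma2 = 0.5

x = W @ rng.standard_normal(3) + mu
x[3] = np.nan                                # one missing feature
mean, cov = posterior_latent_row(x, W, mu, sigma2)
print(mean.shape, cov.shape)
```

With this formulation, a sample whose entries are all missing falls back to the prior: the posterior mean is zero and the covariance is the identity.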

Installation from Source

Minimum requirements

  • CMake >= 3.18
  • Python >= 3.9 (+ development headers)
  • C++17-capable compiler (clang on macOS, gcc on Linux, MSVC on Windows)
  • BLAS/LAPACK implementation (OpenBLAS, MKL, or Accelerate on macOS)
  • git (to fetch submodules)
  • Armadillo: by default CMake downloads it into extern (requires network access); alternatively, provide extern/armadillo-<version>/ or use a system Armadillo install

Quick install (fresh clone)

git clone https://github.com/brdav/ppca-cpp.git
cd ppca-cpp
git submodule update --init --recursive   # ensure extern/carma is present
python -m pip install .                   # build and install

Editable install for development

git clone https://github.com/brdav/ppca-cpp.git
cd ppca-cpp
git submodule update --init --recursive
python -m pip install -e '.[dev]'         # editable install
pre-commit install                        # optional: register hooks

Note: Builds on Windows are untested in CI. You can attempt a Windows build but expect manual steps.

Internals

PPCA uses an Expectation-Maximization (EM) algorithm to learn parameters through maximum likelihood estimation. For details see the reference paper listed below. The equations for the EM algorithm in the presence of missing values are shown in docs/equations.md.
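For orientation, the standard PPCA model from Tipping & Bishop can be summarized as follows. The notation here (W for the loading matrix, q latent dimensions, d features, and the subscript o for the observed entries of a sample) is an assumption of this summary and may differ from the notation in docs/equations.md.

```latex
% Generative model
x = W z + \mu + \epsilon, \qquad
z \sim \mathcal{N}(0, I_q), \qquad
\epsilon \sim \mathcal{N}(0, \sigma^2 I_d)

% Marginal distribution of the data
x \sim \mathcal{N}\!\left(\mu,\; W W^\top + \sigma^2 I_d\right)

% Posterior over the latent vector given the observed entries x_o,
% where W_o and \mu_o keep only the rows for observed features
M_o = W_o^\top W_o + \sigma^2 I_q, \qquad
\mathbb{E}[z \mid x_o] = M_o^{-1} W_o^\top (x_o - \mu_o), \qquad
\operatorname{Cov}[z \mid x_o] = \sigma^2 M_o^{-1}
```

The E-step computes these posterior moments for each sample, and the M-step re-estimates W, \mu, and \sigma^2 from them.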

Citing

If you use this code academically, cite the original PPCA paper:

  • M. Tipping & C. Bishop. Probabilistic Principal Component Analysis. JRSS B, 1999.

You may also reference the library name or URL.

License

MIT License — see LICENSE.


Questions or requests? Open an issue.
