
Probabilistic PCA (PPCA) with missing-data support - fast C++ core, clean Python API


ppca-cpp


Overview

ppca-cpp implements Probabilistic Principal Component Analysis (PPCA) as described by Tipping & Bishop (1999), with a focus on speed, usability, and robust handling of missing data. The core is written in C++ using the Armadillo linear algebra library and is exposed through a simple Python interface.
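
The generative model behind PPCA (Tipping & Bishop, 1999) is compact enough to sketch directly. In the NumPy snippet below, a latent vector z ~ N(0, I_q) is mapped to x = W z + μ + ε with isotropic noise ε ~ N(0, σ² I_d), so that x is marginally Gaussian with covariance C = W Wᵀ + σ² I. The function sample_ppca and the parameter values are hypothetical illustrations, not part of the ppca-py API.

```python
import numpy as np


def sample_ppca(W, mu, sigma2, n_samples, rng):
    """Draw samples from the PPCA generative model x = W z + mu + eps.

    W: (d, q) loading matrix, mu: (d,) mean, sigma2: isotropic noise variance.
    (Hypothetical helper for illustration only.)
    """
    d, q = W.shape
    Z = rng.standard_normal((n_samples, q))                      # z ~ N(0, I_q)
    eps = np.sqrt(sigma2) * rng.standard_normal((n_samples, d))  # eps ~ N(0, sigma2 I_d)
    return Z @ W.T + mu + eps


rng = np.random.default_rng(0)
d, q = 10, 3
W = rng.standard_normal((d, q))
mu = np.full(d, 0.1)
sigma2 = 0.5

X = sample_ppca(W, mu, sigma2, n_samples=200_000, rng=rng)

# Under PPCA the marginal covariance of x is C = W W^T + sigma2 * I.
C_model = W @ W.T + sigma2 * np.eye(d)
C_empirical = np.cov(X, rowvar=False)
print(np.abs(C_empirical - C_model).max())  # shrinks as n_samples grows
```

Because the noise is isotropic, all variance not captured by the q columns of W is absorbed by the single scalar σ², which is what distinguishes PPCA from general factor analysis.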

Key Features

  • Handles missing values natively: No need for manual imputation; just use np.nan for missing entries.
  • Familiar API: Designed as a drop-in replacement for scikit-learn's PCA, with attributes such as components_ and explained_variance_.
  • Probabilistic modeling: Compute log-likelihoods, posterior latent variable distributions, multiple imputations, and more.
  • Fast and scalable: Optimized C++ backend for large datasets.
  • Flexible: Supports both batch and online (mini-batch) EM.
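
The probabilistic quantities listed above have closed forms under the PPCA model. The following NumPy sketch, written independently of ppca-py's C++ code, computes the posterior p(z | x_o) and the log-likelihood of one sample from its observed entries only. Missing coordinates are marginalized out, which for a Gaussian amounts to dropping the corresponding rows of W and μ. The function posterior_and_loglik and its arguments are hypothetical names chosen for illustration.

```python
import numpy as np


def posterior_and_loglik(x, W, mu, sigma2):
    """Posterior of z and log p(x_o) for one sample x; NaN entries are missing.

    Hypothetical helper for illustration, not the ppca-py API.
    """
    obs = ~np.isnan(x)               # mask of observed features
    W_o, mu_o = W[obs], mu[obs]      # keep only the observed rows
    diff = x[obs] - mu_o
    q = W.shape[1]

    # Posterior p(z | x_o) = N(M^{-1} W_o^T (x_o - mu_o), sigma2 * M^{-1}),
    # with M = W_o^T W_o + sigma2 * I_q.
    M = W_o.T @ W_o + sigma2 * np.eye(q)
    mean_z = np.linalg.solve(M, W_o.T @ diff)
    cov_z = sigma2 * np.linalg.inv(M)

    # Marginal of the observed entries: x_o ~ N(mu_o, C_o),
    # with C_o = W_o W_o^T + sigma2 * I.
    C_o = W_o @ W_o.T + sigma2 * np.eye(obs.sum())
    _, logdet = np.linalg.slogdet(C_o)
    maha = diff @ np.linalg.solve(C_o, diff)
    loglik = -0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + maha)
    return mean_z, cov_z, loglik


rng = np.random.default_rng(1)
W = rng.standard_normal((10, 3))
mu = np.zeros(10)
x = rng.standard_normal(10)
x[[2, 7]] = np.nan                   # two missing features

mean_z, cov_z, ll = posterior_and_loglik(x, W, mu, sigma2=0.5)
print(mean_z.shape, cov_z.shape, ll)
```

Working with the q × q matrix M rather than the d × d covariance keeps the posterior computation cheap when q is much smaller than d.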

Quick Start

pip install ppca-py

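Note: pre-built wheels are provided only for Linux and macOS (CI builds target ubuntu-latest and macos-latest); for release 1.0.3 these are CPython 3.9 to 3.11 wheels for Linux x86-64 and macOS ARM64. On other platforms, such as Windows, you need to build from source (see Installation from Source below).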

Usage example:

import numpy as np
from ppca import PPCA

X_train = np.random.randn(600, 10) + 0.1  # (n_samples, n_features)
X_train[::7, 3] = np.nan                  # missing values
X_test = np.random.randn(100, 10) + 0.1
X_test[::7, 2] = np.nan                   # missing values

model = PPCA(n_components=3, batch_size=200)
model.fit(X_train)

mZ, covZ = model.posterior_latent(X_test) # latent representation
mX, covX = model.likelihood(mZ)           # reconstruction
ll = model.score_samples(X_test)          # data log likelihood

# multiple imputation (return shape: (n_draws, n_samples, n_features))
X_imputed = model.sample_missing(X_test, n_draws=5)

# estimate of components, mean and noise variance
print("Components:", model.components_)
print("Mean:", model.mean_)
print("Noise variance:", model.noise_variance_)

A short PPCA reference is available in docs/ppca.md, and further usage examples are provided in examples/.
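
The multiple imputations returned by sample_missing in the usage example can be understood through the model's conditional Gaussian. With C = W Wᵀ + σ² I, the missing entries x_m given the observed entries x_o follow N(μ_m + C_mo C_oo⁻¹ (x_o - μ_o), C_mm - C_mo C_oo⁻¹ C_om). The NumPy sketch below draws from that conditional for a single row. It assumes already-fitted parameters W, mu and sigma2, uses a hypothetical helper impute_draws, and is not taken from the library's source.

```python
import numpy as np


def impute_draws(x, W, mu, sigma2, n_draws, rng):
    """Return (n_draws, d) copies of x with NaNs replaced by conditional draws.

    Hypothetical helper for illustration, not the ppca-py API.
    """
    d = W.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)     # PPCA marginal covariance
    m = np.isnan(x)                      # missing mask
    o = ~m                               # observed mask
    out = np.tile(x, (n_draws, 1))
    if not m.any():
        return out

    C_oo = C[np.ix_(o, o)]
    C_mo = C[np.ix_(m, o)]
    C_mm = C[np.ix_(m, m)]
    K = np.linalg.solve(C_oo, C_mo.T).T  # K = C_mo C_oo^{-1}
    cond_mean = mu[m] + K @ (x[o] - mu[o])
    cond_cov = C_mm - K @ C_mo.T
    cond_cov = 0.5 * (cond_cov + cond_cov.T)  # symmetrize against round-off
    out[:, m] = rng.multivariate_normal(cond_mean, cond_cov, size=n_draws)
    return out


rng = np.random.default_rng(2)
W = rng.standard_normal((10, 3))
mu = np.full(10, 0.1)
x = rng.standard_normal(10)
x[[3, 5]] = np.nan

draws = impute_draws(x, W, mu, sigma2=0.5, n_draws=5, rng=rng)
print(draws.shape)  # (5, 10)
```

Observed entries are copied unchanged into every draw; only the missing entries vary, and their spread across draws reflects the model's uncertainty about them.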

Installation from Source

For a development install from source:

git clone https://github.com/brdav/ppca-cpp.git
cd ppca-cpp
git submodule update --init --recursive
python -m pip install -e '.[dev]'
pre-commit install

Minimum build dependencies

  • CMake >= 3.18
  • Python >= 3.9 (development headers)
  • C++17-capable compiler (clang, gcc, or MSVC)
  • BLAS/LAPACK implementation (OpenBLAS, MKL, or Accelerate)

Note: Windows builds are not tested in CI. You can attempt a Windows build, but expect some manual setup.

The PPCA C++ core can also be built independently:

cmake -S src/cpp -B build/cpp -DCMAKE_BUILD_TYPE=Release
cmake --build build/cpp --target ppca -j

Internals

The PPCA parameters (the loading matrix W, the mean μ, and the noise variance σ²) are learned by maximum likelihood using the Expectation-Maximization (EM) algorithm. For details, see the reference paper listed below. The EM equations in the presence of missing values are given in docs/equations.md.

Citing

If you use this code in academic work, please cite the original PPCA paper:

  • M. E. Tipping & C. M. Bishop. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society, Series B, 61(3):611–622, 1999.

You may also reference the library name or URL.

License

MIT License — see LICENSE.


Questions or requests? Open an issue.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


  • ppca_py-1.0.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.4 MB): CPython 3.11, manylinux (glibc 2.27+ / 2.28+), x86-64
  • ppca_py-1.0.3-cp311-cp311-macosx_11_0_arm64.whl (1.4 MB): CPython 3.11, macOS 11.0+, ARM64
  • ppca_py-1.0.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.4 MB): CPython 3.10, manylinux (glibc 2.27+ / 2.28+), x86-64
  • ppca_py-1.0.3-cp310-cp310-macosx_11_0_arm64.whl (1.4 MB): CPython 3.10, macOS 11.0+, ARM64
  • ppca_py-1.0.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.4 MB): CPython 3.9, manylinux (glibc 2.27+ / 2.28+), x86-64
  • ppca_py-1.0.3-cp39-cp39-macosx_11_0_arm64.whl (1.4 MB): CPython 3.9, macOS 11.0+, ARM64

File hashes

Hashes for ppca_py-1.0.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 960f75b10e0c0fbb459331f75592990249a4290e13917fa98661b15f2b0369fe
MD5 b31fcf3214b97e5c26f9c518b3883183
BLAKE2b-256 c580aa9e5001e77c6f8b226c6a17319f4cd25cc7779a6cdc95364ad6bba74680

Hashes for ppca_py-1.0.3-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 05312cd80dcd940116f4fabddc430b0522ad968343fe437927b326e204db9bd4
MD5 96a2a88a960a725c5de1242ef92c78e3
BLAKE2b-256 828568db895e99042663c9729733787daa399aa80994dab791de26c75dcc21f6

Hashes for ppca_py-1.0.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 2be5a598603da62d747c48ca1cf6cc335a5752063ba78d54a50e900e08211332
MD5 2fcf65db881eef9dfaab85f0d1462e14
BLAKE2b-256 544ef8617bb42e68eadb416e45546980f670481f006072c2393c1f8fb150db66

Hashes for ppca_py-1.0.3-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 db0f0600c91b0f12c48d1af58093d0c7f833945a5eda6f847ab354345cb50d06
MD5 f78a55acd2d62b00c0645b36ee41027b
BLAKE2b-256 4f2afd8a1d8c2bf7ddabd1586f2caeb43b9eeef18d244755050f5743a280210c

Hashes for ppca_py-1.0.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d253706ac2a4aca89cf8ebf969af2dbad9665ba26dfffe8e682954569551e7c0
MD5 7eef27ec137a939a10011eea222c7913
BLAKE2b-256 655f88f064cdb045c8baf001e3d8e6a1f50ed2ad3f0c1986960e17beec398dd7

Hashes for ppca_py-1.0.3-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d5aceb743c2730b2bee8d98bb7fb464bbb602349ad2f5eb24650c2873fdc815a
MD5 abaee8cda0db0ad4f6177c5abc659144
BLAKE2b-256 dd380bcbbf1a7aad07e6b93f017f1ef97512c7c5d6239b87029bc630aa1cedc1
