
Python+Rust implementation of the Probabilistic Principal Component Analysis model


Probabilistic Principal Component Analysis (PPCA) model


This project provides a PPCA model written in Rust and exposed to Python using PyO3 and maturin.

Installing

This package is available on PyPI!

pip install ppca-rs

Why use PPCA?

Glad you asked!

  • The PPCA is a simple extension of the PCA (principal component analysis), but it can be more robust to train overall.
  • The PPCA is a proper statistical model. It doesn't spit out only the mean. You get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.
  • The PPCA model can handle missing values. If there is data missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.
  • The training converges quickly and will always tend to a global maximum. No metaparameters to dabble with and no local maxima.
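
To make the missing-values point concrete, here is a minimal NumPy sketch of how a PPCA model can fill in gaps. It assumes the standard PPCA formulation, in which a sample is Gaussian with mean `mu` and covariance `W W^T + sigma2 * I`, and applies ordinary Gaussian conditioning. The function `impute_missing` and the parameter names `W`, `mu` and `sigma2` are illustrative choices, not part of the ppca-rs API, and this is not necessarily how the library computes it internally.

```python
import numpy as np


def impute_missing(x, W, mu, sigma2):
    """Fill the NaN entries of one sample `x` under a PPCA model.

    Assumed model: x ~ N(mu, W W^T + sigma2 * I).
    Returns the completed sample and the standard deviation of each
    imputed entry (0 for entries that were observed).
    """
    x = np.asarray(x, dtype=float)
    missing = ~np.isfinite(x)
    observed = ~missing

    # Marginal covariance of the full sample.
    cov = W @ W.T + sigma2 * np.eye(len(x))
    c_oo = cov[np.ix_(observed, observed)]
    c_mo = cov[np.ix_(missing, observed)]
    c_mm = cov[np.ix_(missing, missing)]

    # Gaussian conditioning on the observed entries.
    gain = np.linalg.solve(c_oo, c_mo.T).T  # equals c_mo @ inv(c_oo)
    cond_mean = mu[missing] + gain @ (x[observed] - mu[observed])
    cond_cov = c_mm - gain @ c_mo.T

    filled = x.copy()
    filled[missing] = cond_mean
    std = np.zeros_like(x)
    std[missing] = np.sqrt(np.diag(cond_cov))
    return filled, std


# Toy model: 3 outputs driven by 1 hidden state.
W = np.array([[1.0], [1.0], [1.0]])
mu = np.zeros(3)
sigma2 = 0.1
filled, std = impute_missing(np.array([1.0, 1.0, np.nan]), W, mu, sigma2)
```

An approximate 95% interval for each imputed entry is then `filled ± 1.96 * std`, which is the kind of confidence interval the list above refers to.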

Why use ppca-rs?

That's an easy one!

  • It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.
  • It uses rayon to parallelize computations evenly across as many CPUs as you have.
  • It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
  • Battle-tested at Vio.com with some ridiculously huge datasets.
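
The list above does not say which linear algebra tricks are used. As an illustration only, one identity that commonly helps with PPCA-shaped covariances is the Woodbury matrix identity, which replaces a solve against the large `d x d` matrix `W W^T + sigma2 * I` with a solve against a small `q x q` one. Whether ppca-rs uses this exact identity is an assumption, and `solve_ppca_cov` is a hypothetical helper written for this sketch.

```python
import numpy as np


def solve_ppca_cov(W, sigma2, b):
    """Solve (W W^T + sigma2 * I) y = b using only a q x q linear system.

    Woodbury identity:
        inv(W W^T + s I_d) = (I_d - W inv(W^T W + s I_q) W^T) / s
    """
    q = W.shape[1]
    small = W.T @ W + sigma2 * np.eye(q)  # q x q instead of d x d
    return (b - W @ np.linalg.solve(small, W.T @ b)) / sigma2


rng = np.random.default_rng(0)
d, q = 200, 5  # output size and state size
W = rng.normal(size=(d, q))
b = rng.normal(size=d)
sigma2 = 0.5

fast = solve_ppca_cov(W, sigma2, b)
# Reference answer: build and solve the full d x d system.
direct = np.linalg.solve(W @ W.T + sigma2 * np.eye(d), b)
```

Both routes give the same answer up to rounding, but the first never forms the `d x d` matrix, which matters when the output dimension is much larger than the state size.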

Quick example

import numpy as np
from ppca_rs import Dataset, PPCAModel, PPCATrainer

samples: np.ndarray  # your data goes here

# Create your dataset from a rank 2 np.ndarray, where each row is a sample.
# Use non-finite values (`inf`s and `nan`s) to signal masked values.
dataset = Dataset(samples)

# Train the model (convenient edition!):
model: PPCAModel = PPCATrainer(dataset).train(state_size=10, n_iters=10)


# And now, here is a free sample of what you can do:

# Extrapolates the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)

# Smooths (removes noise from) samples and fills in missing values:
extrapolated: Dataset = model.filter_extrapolate(dataset)

# ... go back to numpy:
extrapolated_np = extrapolated.numpy()
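
The example above leaves `samples` undefined. As a minimal sketch of preparing such an array, the pure NumPy snippet below builds a rank 2 array and marks some entries as missing with `nan`. The shapes, the 20% missing rate and the names `missing` and `masked_samples` are arbitrary choices made for illustration. The hand-off to `Dataset` is left commented out so the snippet runs without ppca-rs installed, and any dtype requirement `Dataset` may have is not covered here.

```python
import numpy as np

rng = np.random.default_rng(42)

# 100 samples (rows) with 6 features (columns): a rank 2 array.
samples = rng.normal(size=(100, 6))

# Pretend roughly 20% of the entries were never measured.
missing = rng.random(samples.shape) < 0.2
masked_samples = np.where(missing, np.nan, samples)

# Non-finite entries are the ones that signal masked values.
n_masked = int((~np.isfinite(masked_samples)).sum())

# With ppca-rs installed, the array can be handed over as in the example above:
# dataset = Dataset(masked_samples)
```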

Juicy extras!

  • Tired of the linear? Support for PPCA mixture models is coming soon. Clustering and dimensionality reduction in a single tool.
  • Support for adaptation of DataFrames using either pandas or polars. Never juggle those dfs in your code again.

Building from source

Prerequisites

You will need Rust, which can be installed locally (i.e., without sudo), and maturin, which can be installed with

pip install maturin

pipenv is also a good idea if you are going to mess around with it locally. At the very least, you need a virtual environment set up; otherwise, maturin will complain.

Installing it locally

Check the Makefile for the available commands (or just type make). To install it locally, do

make install    # optional: i=python.version (e.g., `i=3.9`)

Messing around and testing

To mess around, inside a virtual environment (a Pipfile is provided for the pipenv lovers), do

maturin develop  # use the flag --release to unlock superspeed!

This will install the package locally, as-is, from source.

How do I use this stuff?

See the examples in the examples folder. Also, all functions are type hinted and commented. If you are using pylance or mypy, it should be easy to navigate.

Is it faster than the pure Python implementation you made?

You bet!
