
Python+Rust implementation of the Probabilistic Principal Component Analysis model


Probabilistic Principal Component Analysis (PPCA) model


This project provides a PPCA model implemented in Rust, with Python bindings built using PyO3 and maturin.

Installing

This package is available on PyPI!

pip install ppca-rs

And you can also use it natively in Rust:

cargo add ppca

Why use PPCA?

Glad you asked!

  • The PPCA is a simple extension of PCA (principal component analysis), but it can be more robust to train overall.
  • The PPCA is a proper statistical model. It doesn't spit out only the mean. You get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.
  • The PPCA model can handle missing values. If there is data missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.
  • The training converges quickly and will always tend to the global maximum. No metaparameters to dabble with and no local maxima.
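To make the missing-value point concrete, here is a minimal NumPy sketch of how a PPCA model can fill in a masked entry and attach an uncertainty to it. It uses the textbook conditional-Gaussian formulas for a model with covariance C = W Wᵀ + σ² I. The names `ppca_covariance` and `impute_sample` are hypothetical and are not part of ppca-rs, whose internals may well differ.

```python
import numpy as np


def ppca_covariance(W, sigma2):
    """Marginal covariance of the observations: C = W W^T + sigma2 * I."""
    d = W.shape[0]
    return W @ W.T + sigma2 * np.eye(d)


def impute_sample(x, mean, W, sigma2):
    """Fill the non-finite entries of one sample with their conditional mean.

    Hypothetical helper (not the ppca-rs API). Returns the completed sample
    and the standard deviation of each entry (zero for observed entries).
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    missing = ~np.isfinite(x)
    observed = ~missing
    C = ppca_covariance(W, sigma2)

    C_oo = C[np.ix_(observed, observed)]
    C_mo = C[np.ix_(missing, observed)]
    C_mm = C[np.ix_(missing, missing)]

    # gain = C_mo C_oo^{-1}, computed with a solve instead of an inverse.
    gain = np.linalg.solve(C_oo, C_mo.T).T

    # E[x_m | x_o] = mean_m + C_mo C_oo^{-1} (x_o - mean_o)
    filled = x.copy()
    filled[missing] = mean[missing] + gain @ (x[observed] - mean[observed])

    # Cov[x_m | x_o] = C_mm - C_mo C_oo^{-1} C_om
    cond_cov = C_mm - gain @ C_mo.T
    std = np.zeros_like(x)
    std[missing] = np.sqrt(np.clip(np.diag(cond_cov), 0.0, None))
    return filled, std


# Toy model: two features driven by one latent factor, unit noise variance.
W = np.array([[1.0], [1.0]])
filled, std = impute_sample([2.0, np.nan], np.zeros(2), W, sigma2=1.0)
```

In this toy case the missing entry is filled with 1.0, with a standard deviation of about 1.22, so an approximate 95% interval would be `filled ± 1.96 * std`.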

Why use ppca-rs?

That's an easy one!

  • It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.
  • It uses rayon to parallelize computations evenly across as many CPUs as you have.
  • It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
  • Battle-tested at Vio.com with some ridiculously huge datasets.
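As an illustration of the kind of linear algebra shortcut that applies to PPCA, the sketch below uses the Woodbury identity to solve systems with C = W Wᵀ + σ² I through a small q×q matrix instead of the full d×d one. This is a standard trick for low-rank-plus-diagonal covariances; whether ppca-rs uses exactly this one is an assumption, and `solve_ppca_covariance` is a hypothetical name.

```python
import numpy as np


def solve_ppca_covariance(W, sigma2, b):
    """Solve (W W^T + sigma2 * I) y = b without forming the d x d matrix.

    Woodbury identity: C^{-1} = (I - W M^{-1} W^T) / sigma2,
    where M = W^T W + sigma2 * I is only q x q.
    Hypothetical helper, not part of the ppca-rs API.
    """
    q = W.shape[1]
    M = W.T @ W + sigma2 * np.eye(q)
    return (b - W @ np.linalg.solve(M, W.T @ b)) / sigma2


# Compare against the direct (and much more expensive) d x d solve.
rng = np.random.default_rng(0)
W = rng.normal(size=(50, 3))
b = rng.normal(size=50)
fast = solve_ppca_covariance(W, 0.5, b)
direct = np.linalg.solve(W @ W.T + 0.5 * np.eye(50), b)
assert np.allclose(fast, direct)
```

With d features and q latent dimensions, the shortcut costs roughly O(d q² + q³) instead of O(d³), which matters when d is large and q is small.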

Quick example

import numpy as np
from ppca_rs import Dataset, PPCATrainer

# Your data goes here: a rank 2 array of shape (n_samples, n_features).
samples: np.ndarray

# Create your dataset from a rank 2 np.ndarray, where each row is a sample.
# Use non-finite values (`inf`s and `nan`s) to signal masked values.
dataset = Dataset(samples)

# Train the model (convenient edition!):
model = PPCATrainer(dataset).train(state_size=10, n_iters=10)

# And now, here is a free sample of what you can do:

# Extrapolates the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)

# Smooths (removes noise from) samples and fills in missing values:
filtered: Dataset = model.filter_extrapolate(dataset)

# ... go back to numpy:
extrapolated_np = extrapolated.numpy()
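The quick example expects masked values to be marked with non-finite entries. A small NumPy-only sketch of preparing such an array is shown below; `mask_entries` is a hypothetical helper, not part of ppca-rs, and the sentinel value of -1 is just an assumption for the sake of the example.

```python
import numpy as np


def mask_entries(samples, missing):
    """Return a float copy of `samples` with NaN wherever `missing` is True.

    Hypothetical helper for illustration; not part of the ppca-rs API.
    """
    samples = np.array(samples, dtype=float)  # copy; NaN needs a float dtype
    missing = np.asarray(missing, dtype=bool)
    if samples.ndim != 2 or samples.shape != missing.shape:
        raise ValueError("expected two rank 2 arrays of the same shape")
    samples[missing] = np.nan
    return samples


# Each row is a sample; here -1 is assumed to be a sentinel for "missing".
raw = np.array([[1.0, 2.0, 3.0], [4.0, -1.0, 6.0]])
masked = mask_entries(raw, raw < 0)
```

The resulting `masked` array is what would then be handed to `Dataset(...)` as in the quick example.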

Juicy extras!

  • Tired of the linear? We have support for PPCA mixture models. Make the most of your data with clustering and dimensionality reduction in a single tool!
  • Support for adaptation of DataFrames using either pandas or polars. Never juggle those dfs in your code again.

Building from source

Prerequisites

You will need Rust, which can be installed locally (i.e., without sudo), and you will also need maturin, which can be installed with

pip install maturin

pipenv is also a good idea if you are going to mess around with it locally. At the very least, you need a virtual environment set up; otherwise, maturin will complain.

Installing it locally

Check the Makefile for the available commands (or just type make). To install it locally, do

make install    # optional: i=python.version (e.g., `i=3.9`)

Messing around and testing

To mess around, inside a virtual environment (a Pipfile is provided for the pipenv lovers), do

maturin develop  # use the flag --release to unlock superspeed!

This will install the package locally, as is, from source.

How do I use this stuff?

See the examples in the examples folder. Also, all functions are type hinted and commented. If you are using pylance or mypy, it should be easy to navigate.

Is it faster than the pure Python implementation you made?

You bet!

