Python+Rust implementation of the Probabilistic Principal Component Analysis model
Probabilistic Principal Component Analysis (PPCA) model
This project provides a PPCA model implemented in Rust for Python, using pyO3 and maturin.
Installing
This package is available in PyPI!
pip install ppca-rs
Why use PPCA?
Glad you asked!
- PPCA is a simple extension of PCA (principal component analysis), but can be overall more robust to train.
- PPCA is a proper statistical model. It doesn't spit out only the mean. You get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.
- The PPCA model can handle missing values. If there is data missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.
- The training converges quickly and will always tend to a global maximum. No metaparameters to dabble with and no local maxima.
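To make the missing-value point concrete, here is a minimal sketch in plain Python. It does not use ppca-rs; it assumes the standard PPCA model (`x = w * z + mean + noise`, with a standard normal hidden state `z` and isotropic Gaussian noise) and uses made-up toy parameters (`w`, `mean` and `sigma2` are hypothetical values, and `extrapolate_second` is an illustrative helper, not part of the package). Given only the first coordinate of a sample, it computes the most probable value of the second one and its standard deviation.

```python
import math
from typing import Tuple

# Hypothetical toy PPCA parameters: 2 observed dimensions, 1 hidden state.
# Model: x = w * z + mean + noise, with z ~ N(0, 1) and noise ~ N(0, sigma2 * I).
w = [2.0, 1.0]        # loading vector (assumed values)
mean = [10.0, 5.0]    # per-dimension mean (assumed values)
sigma2 = 0.5          # isotropic noise variance (assumed value)

# Marginal covariance of x under the model: C = w w^T + sigma2 * I.
c00 = w[0] * w[0] + sigma2
c01 = w[0] * w[1]
c11 = w[1] * w[1] + sigma2


def extrapolate_second(x0: float) -> Tuple[float, float]:
    """Mean and standard deviation of x[1] when only x[0] was observed."""
    # Standard Gaussian conditioning on the observed coordinate.
    cond_mean = mean[1] + c01 / c00 * (x0 - mean[0])
    cond_var = c11 - c01 * c01 / c00
    return cond_mean, math.sqrt(cond_var)


guess, std = extrapolate_second(12.0)
print(f"x[1] is about {guess:.3f} +/- {1.96 * std:.3f} (95% interval)")
```

The same conditioning generalizes to any number of observed and missing dimensions, which is presumably what the extrapolation methods of the model do at scale.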
Why use ppca-rs?
That's an easy one!
- It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.
- It uses rayon to parallelize computations evenly across as many CPUs as you have.
- It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
- Battle-tested at Vio.com with some ridiculously huge datasets.
Quick example
import numpy as np
from ppca_rs import Dataset, PPCATrainer, PPCAModel
samples: np.ndarray
# Create your dataset from a rank 2 np.ndarray, where each line is a sample.
# Use non-finite values (`inf`s and `nan`) to signal masked values
dataset = Dataset(samples)
# Train the model (convenient edition!):
model: PPCAModel = PPCATrainer(dataset).train(state_size=10, n_iters=10)
# And now, here is a free sample of what you can do:
# Extrapolates the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)
# Smooths (removes noise from) samples and fills in missing values:
extrapolated: Dataset = model.filter_extrapolate(dataset)
# ... go back to numpy:
extrapolated_np = extrapolated.numpy()
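The masking convention above (non-finite values mark missing entries) can be prepared before any array is built. The following is a small sketch in plain Python, assuming records that use `None` for missing fields; the field names and the `to_masked_rows` helper are hypothetical and not part of ppca-rs.

```python
import math

# Hypothetical raw records; None marks a value that was never measured.
raw_rows = [
    {"height": 1.80, "weight": 75.0, "age": 31.0},
    {"height": 1.65, "weight": None, "age": 44.0},
    {"height": None, "weight": 62.0, "age": None},
]
columns = ["height", "weight", "age"]


def to_masked_rows(rows, columns):
    """Turn records into equal-length float rows, using NaN for missing fields."""
    return [
        [float("nan") if row.get(col) is None else float(row[col]) for col in columns]
        for row in rows
    ]


samples_as_lists = to_masked_rows(raw_rows, columns)

# Count the masked (non-finite) entries in each sample.
missing_per_row = [
    sum(not math.isfinite(value) for value in row) for row in samples_as_lists
]
print(missing_per_row)  # prints [0, 1, 2]
```

Passing `samples_as_lists` through `np.asarray(..., dtype=float)` should then give the rank 2 array, one sample per line, that the example above hands to `Dataset`.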
Juicy extras!
- Tired of the linear? Support for PPCA mixture models is coming soon. Clustering and dimensionality reduction in a single tool.
- Support for adaptation of DataFrames using either pandas or polars. Never juggle those dfs in your code again.
Building from source
Prerequisites
You will need Rust, which can be installed locally (i.e., without sudo), and you will also need maturin, which can be installed with
pip install maturin
pipenv is also a good idea if you are going to mess around with it locally. At the very least, you need a venv set up; otherwise, maturin will complain.
Installing it locally
Check the Makefile for the available commands (or just type make). To install it locally, run
make install # optional: i=python.version (e.g., `i=3.9`)
Messing around and testing
To mess around, inside a virtual environment (a Pipfile is provided for the pipenv lovers), run
maturin develop # use the flag --release to unlock superspeed!
This will install the package locally as is from source.
How do I use this stuff?
See the examples in the examples folder. Also, all functions are type hinted and commented. If you are using pylance or mypy, it should be easy to navigate.
Is it faster than the pure Python implementation you made?
You bet!