Python+Rust implementation of the Probabilistic Principal Component Analysis model
Probabilistic Principal Component Analysis (PPCA) model
This project provides a PPCA model implemented in Rust, with Python bindings built using pyO3 and maturin.
Installing
This package is available on PyPI!
pip install ppca-rs
And you can also use it natively in Rust:
cargo add ppca
Why use PPCA?
Glad you asked!
- PPCA is a simple extension of PCA (principal component analysis), but can be overall more robust to train.
- PPCA is a proper statistical model. It doesn't spit out only the mean. You get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.
- The PPCA model can handle missing values. If there is data missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.
- The training converges quickly and will always tend to a global maximum. No metaparameters to dabble with and no local maxima.
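To make the points above concrete, here is a minimal NumPy sketch of what a PPCA model is and how it fills in missing values. It is not the code used by this package: it assumes fully observed training data and uses the textbook closed-form maximum-likelihood solution, and the function names `fit_ppca_closed_form` and `impute` are made up for illustration.

```python
import numpy as np


def fit_ppca_closed_form(x: np.ndarray, state_size: int):
    """Maximum-likelihood PPCA for fully observed data (illustrative only)."""
    mean = x.mean(axis=0)
    cov = np.cov(x - mean, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # make it descending
    # Isotropic noise variance: average of the discarded eigenvalues.
    sigma2 = eigvals[state_size:].mean()
    # Loadings: leading eigenvectors scaled by the variance above the noise floor.
    w = eigvecs[:, :state_size] * np.sqrt(eigvals[:state_size] - sigma2)
    return mean, w, sigma2


def impute(sample: np.ndarray, mean: np.ndarray, w: np.ndarray, sigma2: float):
    """Fill the NaNs of one sample with their conditional mean.

    Also returns the conditional covariance of the filled-in entries,
    which is where confidence intervals come from.
    """
    missing = ~np.isfinite(sample)
    obs = ~missing
    cov = w @ w.T + sigma2 * np.eye(len(mean))           # model covariance
    gain = cov[np.ix_(missing, obs)] @ np.linalg.inv(cov[np.ix_(obs, obs)])
    cond_mean = mean[missing] + gain @ (sample[obs] - mean[obs])
    cond_cov = cov[np.ix_(missing, missing)] - gain @ cov[np.ix_(obs, missing)]
    filled = sample.copy()
    filled[missing] = cond_mean
    return filled, cond_cov


# Toy data: 5 observed dimensions driven by 2 hidden ones, plus noise.
rng = np.random.default_rng(0)
true_w = rng.normal(size=(5, 2))
x = rng.normal(size=(2000, 2)) @ true_w.T + 0.1 * rng.normal(size=(2000, 5))

mean, w, sigma2 = fit_ppca_closed_form(x, state_size=2)

sample = x[0].copy()
sample[[1, 3]] = np.nan                                  # pretend two entries are missing
filled, cond_cov = impute(sample, mean, w, sigma2)
std = np.sqrt(np.diag(cond_cov))                         # one standard deviation per filled entry
```

With missing values in the training set there is no closed form like this one, and an iterative procedure is needed instead.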
Why use ppca-rs?
That's an easy one!
- It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.
- It uses rayon to parallelize computations evenly across as many CPUs as you have.
- It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
- Battle-tested at Vio.com with some ridiculously huge datasets.
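As an illustration of the kind of trick that pays off in PPCA-style models, the sketch below uses the Woodbury identity to solve a system with the model covariance `W W^T + sigma2 * I` while only factoring a small `q x q` matrix instead of the full `d x d` one. Whether ppca-rs uses this exact identity is an assumption; the function names are hypothetical.

```python
import numpy as np


def solve_naive(w: np.ndarray, sigma2: float, b: np.ndarray) -> np.ndarray:
    """Solve (W W^T + sigma2 I) x = b by factoring the full d x d matrix."""
    d = w.shape[0]
    return np.linalg.solve(w @ w.T + sigma2 * np.eye(d), b)


def solve_woodbury(w: np.ndarray, sigma2: float, b: np.ndarray) -> np.ndarray:
    """Same solution, but only a q x q system is ever factored."""
    q = w.shape[1]
    small = sigma2 * np.eye(q) + w.T @ w                 # q x q
    return (b - w @ np.linalg.solve(small, w.T @ b)) / sigma2


rng = np.random.default_rng(1)
w = rng.normal(size=(200, 5))                            # d = 200 outputs, q = 5 hidden states
b = rng.normal(size=200)

x_naive = solve_naive(w, 0.5, b)
x_fast = solve_woodbury(w, 0.5, b)
```

The naive version costs on the order of `d^3` operations, while the Woodbury version costs on the order of `d * q^2`, which matters when the state size is much smaller than the output size.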
Quick example
import numpy as np
from ppca_rs import Dataset, PPCATrainer
# Placeholder: bring your own data here.
samples: np.ndarray
# Create your dataset from a rank 2 np.ndarray, where each line is a sample.
# Use non-finite values (`inf`s and `nan`) to signal masked values
dataset = Dataset(samples)
# Train the model (convenient edition!):
model = PPCATrainer(dataset).train(state_size=10, n_iters=10)
# And now, here is a free sample of what you can do:
# Extrapolates the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)
# Smooths (removes noise from) samples and fills in missing values:
extrapolated: Dataset = model.filter_extrapolate(dataset)
# ... go back to numpy:
extrapolated_np = extrapolated.numpy()
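If you just want something to feed into the example, the sketch below builds a toy `samples` array and masks some entries with `nan`, following the "non-finite values signal masked values" convention described in the comments. It only uses NumPy; the helper name `mask_at_random` is made up for illustration and is not part of ppca-rs.

```python
import numpy as np


def mask_at_random(samples: np.ndarray, fraction: float, seed: int = 0) -> np.ndarray:
    """Return a float copy of `samples` with roughly `fraction` of entries set to NaN."""
    rng = np.random.default_rng(seed)
    masked = samples.astype(float, copy=True)
    masked[rng.random(samples.shape) < fraction] = np.nan
    return masked


# Toy data: 1000 samples with 8 features each, one sample per line.
samples = np.random.default_rng(42).normal(size=(1000, 8))
masked_samples = mask_at_random(samples, fraction=0.1)

# `masked_samples` can now stand in for `samples` in the example,
# e.g. `Dataset(masked_samples)`.
```

Masking entries you actually know is also a cheap way to sanity check a trained model: compare the extrapolated values against the ones you hid.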
Juicy extras!
- Tired of the linear? We have support for PPCA mixture models. Make the most of your data with clustering and dimensionality reduction in a single tool!
- Support for adaptation of DataFrames using either pandas or polars. Never juggle those dfs in your code again.
Building from source
Prerequisites
You will need Rust, which can be installed locally (i.e., without sudo), and you will also need maturin, which can be installed with
pip install maturin
pipenv is also a good idea if you are going to mess around with it locally. At the very least, you need a venv set up; otherwise, maturin will complain.
Installing it locally
Check the Makefile for the available commands (or just type make). To install it locally, do
make install # optional: i=python.version (e.g., `i=3.9`)
Messing around and testing
To mess around, inside a virtual environment (a Pipfile is provided for the pipenv lovers), do
maturin develop # use the flag --release to unlock superspeed!
This will install the package locally as is from source.
How do I use this stuff?
See the examples in the examples folder. Also, all functions are type hinted and commented. If you are using pylance or mypy, it should be easy to navigate.
Is it faster than the pure Python implementation you made?
You bet!