Python+Rust implementation of the Probabilistic Principal Component Analysis model
Probabilistic Principal Component Analysis (PPCA) model
This project provides a PPCA model implemented in Rust for Python, using pyO3 and maturin.
Installing
This package is available in PyPI!
pip install ppca-rs
And you can also use it natively in Rust:
cargo add ppca
Why use PPCA?
Glad you asked!
- The PPCA is a simple extension of the PCA (principal component analysis), but can be overall more robust to train.
- The PPCA is a proper statistical model. It doesn't spit out only the mean. You get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.
- The PPCA model can handle missing values. If there is data missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.
- The training converges quickly and will always tend to the global maximum. No metaparameters to dabble with and no local maxima.
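To make the "proper statistical model" and "missing values" points a bit more concrete, here is a minimal NumPy sketch of the textbook PPCA model, not the code that ppca-rs actually runs. It samples data from x = W z + mu + noise, compares the sample covariance with the model covariance W Wᵀ + sigma² I, and fills a missing entry with its conditional mean given the observed ones. All names (`W`, `mu`, `sigma`, `impute`) and parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, output_size, state_size = 5000, 6, 2

# Hypothetical "true" parameters, chosen only for illustration.
W = rng.normal(size=(output_size, state_size))  # loadings
mu = rng.normal(size=output_size)               # mean
sigma = 0.1                                     # isotropic noise std

# Generative model: x = W z + mu + noise, with z ~ N(0, I).
z = rng.normal(size=(n_samples, state_size))
noise = sigma * rng.normal(size=(n_samples, output_size))
x = z @ W.T + mu + noise

# Covariance implied by the model versus the one measured on the samples.
C = W @ W.T + sigma**2 * np.eye(output_size)
empirical_C = np.cov(x, rowvar=False)


def impute(sample, mu, C):
    """Fill non-finite entries with their conditional mean given the rest."""
    missing = ~np.isfinite(sample)
    observed = ~missing
    filled = sample.copy()
    if missing.any() and observed.any():
        C_oo = C[np.ix_(observed, observed)]
        C_mo = C[np.ix_(missing, observed)]
        residual = sample[observed] - mu[observed]
        filled[missing] = mu[missing] + C_mo @ np.linalg.solve(C_oo, residual)
    elif missing.all():
        # Nothing observed: the best guess is the mean itself.
        filled[:] = mu
    return filled


# Hide one entry of a sample and fill it back in.
sample = x[0].copy()
sample[3] = np.nan
filled = impute(sample, mu, C)
```

The same conditional-Gaussian formula also yields a covariance for the missing entries, which is where confidence intervals come from.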
Why use ppca-rs?
That's an easy one!
- It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.
- It uses rayon to parallelize computations evenly across as many CPUs as you have.
- It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
- Battle-tested at Vio.com with some ridiculously huge datasets.
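The README does not say which linear algebra tricks are used, so the following is only an example of the kind of trick that fits this model, and an assumption rather than a description of ppca-rs internals. A PPCA covariance has the form sigma² I + W Wᵀ, with W tall and thin, and the Woodbury identity lets you solve systems with it through a small state_size × state_size matrix instead of the full one. The function name `solve_low_rank` and the sizes below are hypothetical.

```python
import numpy as np


def solve_low_rank(W, sigma2, b):
    """Solve (sigma2 * I + W @ W.T) x = b via the Woodbury identity."""
    q = W.shape[1]
    # Only a small (q x q) system is ever factorized.
    M = sigma2 * np.eye(q) + W.T @ W
    return (b - W @ np.linalg.solve(M, W.T @ b)) / sigma2


rng = np.random.default_rng(42)
d, q = 500, 5  # many outputs, few hidden states
W = rng.normal(size=(d, q))
sigma2 = 0.5
b = rng.normal(size=d)

# Fast path: cost grows linearly with d.
x_fast = solve_low_rank(W, sigma2, b)

# Reference path: build the full (d x d) matrix and solve directly.
C = sigma2 * np.eye(d) + W @ W.T
x_direct = np.linalg.solve(C, b)
```

Both paths give the same answer up to rounding; the difference is that the direct solve scales cubically with the number of output dimensions.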
Quick example
import numpy as np
from ppca_rs import Dataset, PPCATrainer, PPCA
# Placeholder: bring your own data here.
samples: np.ndarray
# Create your dataset from a rank 2 np.ndarray, where each line is a sample.
# Use non-finite values (`inf`s and `nan`) to signal masked values
dataset = Dataset(samples)
# Train the model (convenient edition!):
model = PPCATrainer(dataset).train(state_size=10, n_iters=10)
# And now, here is a free sample of what you can do:
# Extrapolates the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)
# Smooths (removes noise from) samples and fills in missing values:
extrapolated: Dataset = model.filter_extrapolate(dataset)
# ... go back to numpy:
extrapolated_np = extrapolated.numpy()
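If you just want something to feed into the quick example, here is a small sketch of how a `samples` array with masked entries could be put together using NumPy alone. The shape, the random data, and the 10% missing rate are arbitrary choices for illustration; the only thing taken from the example above is the convention that non-finite values mark missing entries.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 100 samples (rows) with 8 features (columns).
samples = rng.normal(size=(100, 8))

# Pretend roughly 10% of the entries were never measured.
mask = rng.random(samples.shape) < 0.1
samples[mask] = np.nan

# Quick sanity checks before handing the array over.
n_missing = int((~np.isfinite(samples)).sum())
missing_fraction = n_missing / samples.size
print(f"rank: {samples.ndim}, missing entries: {n_missing}")
```

The resulting array is what would be passed as `Dataset(samples)`.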
Juicy extras!
- Tired of the linear? We have support for PPCA mixture models. Make the most of your data with clustering and dimensionality reduction in a single tool!
- Support for adaptation of DataFrames using either pandas or polars. Never juggle those dfs in your code again.
Building from source
Prerequisites
You will need Rust, which can be installed locally (i.e., without sudo), and you will also need maturin, which can be installed by
pip install maturin
pipenv is also a good idea if you are going to mess around with it locally. At the very least, you need a venv set up; otherwise, maturin will complain.
Installing it locally
Check the Makefile for the available commands (or just type make). To install it locally, do
make install # optional: i=python.version (e.g., `i=3.9`)
Messing around and testing
To mess around, inside a virtual environment (a Pipfile is provided for the pipenv lovers), do
maturin develop # use the flag --release to unlock superspeed!
This will install the package locally as is from source.
How do I use this stuff?
See the examples in the examples folder. Also, all functions are type hinted and commented. If you are using pylance or mypy, it should be easy to navigate.
Is it faster than the pure Python implementation you made?
You bet!