Python+Rust implementation of the Probabilistic Principal Component Analysis model

Probabilistic Principal Component Analysis (PPCA) model

This project provides a PPCA model implemented in Rust, with Python bindings built using PyO3 and maturin.

Installing

This package is available on PyPI!

pip install ppca-rs

And you can also use it natively in Rust:

cargo add ppca

Why use PPCA?

Glad you asked!

  • PPCA is a simple extension of PCA (principal component analysis), but can be more robust to train overall.
  • PPCA is a proper statistical model. It doesn't spit out only the mean. You get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.
  • The PPCA model can handle missing values. If there is data missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.
  • The training converges quickly and will always tend to a global maximum. No metaparameters to dabble with and no local maxima.
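
For reference, the points above rest on the standard PPCA formulation (due to Tipping and Bishop), sketched below. The symbols are chosen here for illustration and are not necessarily the names used in the ppca-rs code: x is a sample with d entries, z is the hidden state with q entries, W is the d × q loading matrix, μ is the mean, and σ² is the isotropic noise variance.

```latex
x = W z + \mu + \varepsilon,
\qquad z \sim \mathcal{N}(0, I_q),
\qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d)
\quad\Longrightarrow\quad
x \sim \mathcal{N}\!\left(\mu,\; W W^\top + \sigma^2 I_d\right)
```

Because x is Gaussian, any missing entries conditioned on the observed ones are Gaussian too, which is where the extrapolated values and their covariances come from.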

Why use ppca-rs?

That's an easy one!

  • It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.
  • It uses rayon to parallelize computations evenly across as many CPUs as you have.
  • It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
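
The list above does not say which linear algebra tricks are used. As an illustration of the general kind of saving available in PPCA-type models, and not a description of ppca-rs internals, the sketch below uses the Woodbury identity to invert a d × d covariance of the form W Wᵀ + σ² I by inverting only a q × q matrix. All names (`W`, `sigma2`, `d`, `q`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
d, q = 50, 3            # output dimension and (much smaller) state dimension
W = rng.normal(size=(d, q))
sigma2 = 0.5            # isotropic noise variance

# Naive route: build and invert the full d x d covariance.
C = W @ W.T + sigma2 * np.eye(d)
C_inv_naive = np.linalg.inv(C)

# Woodbury route: only a q x q system is solved.
M = W.T @ W + sigma2 * np.eye(q)
C_inv_fast = (np.eye(d) - W @ np.linalg.solve(M, W.T)) / sigma2

# Both routes should agree up to floating-point error.
max_abs_diff = float(np.max(np.abs(C_inv_naive - C_inv_fast)))
print(max_abs_diff < 1e-8)
```

When q is much smaller than d, the second route replaces a cubic cost in d with a cubic cost in q plus a few matrix products.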
  • Battle-tested at Vio.com with some ridiculously huge datasets.

Quick example

import numpy as np
from ppca_rs import Dataset, PPCATrainer, PPCAModel

# Your data goes here: a rank 2 array with one sample per row.
samples: np.ndarray

# Create your dataset from a rank 2 np.ndarray, where each row is a sample.
# Use non-finite values (`inf`s and `nan`) to signal masked values
dataset = Dataset(samples)

# Train the model (convenient edition!):
model: PPCAModel = PPCATrainer(dataset).train(state_size=10, n_iters=10)


# And now, here is a free sample of what you can do:

# Extrapolates the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)

# Smooths (removes noise from) samples and fills in missing values:
extrapolated: Dataset = model.filter_extrapolate(dataset)

# ... go back to numpy:
extrapolated_np = extrapolated.numpy()
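
The example above leaves `samples` undefined. As a minimal sketch of how such an array might be prepared, the block below builds hypothetical toy data with NumPy alone and marks a few entries as missing by setting them to `nan`, following the masking convention described in the comments above. The shape, seed, and masked positions are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical toy data: 6 samples (rows) with 4 features (columns).
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=(6, 4))

# Mark a few entries as missing by overwriting them with NaN.
samples[0, 1] = np.nan
samples[3, 2] = np.nan
samples[5, 0] = np.nan

# Boolean mask of the entries that count as missing (non-finite).
masked = ~np.isfinite(samples)
print(f"{masked.sum()} masked values out of {samples.size}")
```

An array prepared this way is the kind of input that `Dataset(samples)` expects in the example above.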

Juicy extras!

  • Tired of the linear? We have support for PPCA mixture models. Make the most of your data with clustering and dimensionality reduction in a single tool!
  • Support for adaptation of DataFrames using either pandas or polars. Never juggle those dfs in your code again.

Building from source

Prerequisites

You will need Rust, which can be installed locally (i.e., without sudo), and maturin, which can be installed with

pip install maturin

pipenv is also a good idea if you are going to mess around with it locally. At the very least, you need a venv set up; otherwise, maturin will complain.

Installing it locally

Check the Makefile for the available commands (or just type make). To install it locally, do

make install    # optional: i=python.version (e.g., `i=3.9`)

Messing around and testing

To mess around, inside a virtual environment (a Pipfile is provided for the pipenv lovers), do

maturin develop  # use the flag --release to unlock superspeed!

This will install the package locally as is from source.

How do I use this stuff?

See the examples in the examples folder. Also, all functions are type hinted and commented. If you are using pylance or mypy, it should be easy to navigate.

Is it faster than the pure Python implementation you made?

You bet!


