
Principal component pursuit built on top of scikit-learn


skpcp

Robust principal component analysis via Principal Component Pursuit (PCP) with scikit-learn transformer interface.


Installation

pip install skpcp

Getting Started

Principal Component Pursuit (PCP) is a method for decomposing a data matrix X into a low-rank component L and a sparse component S, i.e., X = L + S. The skpcp package provides an implementation of PCP with a scikit-learn compatible transformer interface.

At its core, the algorithm solves the following optimization problem:

$$ \min_{L,S} \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad X = L + S $$

where $\|L\|_*$ is the nuclear norm (sum of singular values) of L, $\|S\|_1$ is the element-wise $\ell_1$ norm of S, and $\lambda > 0$ is a regularization parameter that controls the trade-off between the low-rank and sparse components. In practice, the user does not need to set the value of $\lambda$, as it is automatically chosen based on the dimensions of the input data matrix X. We refer users to the original paper by Candès et al. (2011) for more details: Robust Principal Component Analysis?.
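To make the optimization problem above more concrete, here is a minimal NumPy sketch of the augmented Lagrangian iteration described in Candès et al. (2011). It is an illustration only and not necessarily how skpcp is implemented: the function names (`shrink`, `svd_threshold`, `pcp_sketch`) are hypothetical, and the defaults $\lambda = 1/\sqrt{\max(n_1, n_2)}$ and $\mu = n_1 n_2 / (4 \|X\|_1)$ are the values suggested in the paper, assumed here rather than taken from skpcp.

```python
import numpy as np


def shrink(A, tau):
    """Element-wise soft-thresholding: proximal operator of tau * l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def svd_threshold(A, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def pcp_sketch(X, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Hypothetical PCP solver (not skpcp's code); assumes X is nonzero."""
    X = np.asarray(X, dtype=float)
    n1, n2 = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))  # value suggested in the paper
    if mu is None:
        mu = n1 * n2 / (4.0 * np.abs(X).sum())  # value suggested in the paper
    S = np.zeros_like(X)
    Y = np.zeros_like(X)  # Lagrange multiplier for the constraint X = L + S
    norm_X = np.linalg.norm(X)
    for _ in range(max_iter):
        L = svd_threshold(X - S + Y / mu, 1.0 / mu)  # low-rank update
        S = shrink(X - L + Y / mu, lam / mu)         # sparse update
        residual = X - L - S
        Y = Y + mu * residual                        # dual update
        if np.linalg.norm(residual) <= tol * norm_X:
            break
    return L, S


# Small synthetic check: rank-3 matrix plus 5% large sparse corruptions.
rng = np.random.default_rng(0)
L_true = rng.normal(size=(80, 3)) @ rng.normal(size=(3, 60))
mask = rng.random(size=(80, 60)) < 0.05
S_true = mask * rng.normal(scale=10, size=(80, 60))
X = L_true + S_true

L_hat, S_hat = pcp_sketch(X, tol=1e-6, max_iter=5000)
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
print(f"relative error on L: {rel_err:.2e}")
```

Each iteration alternates a singular value thresholding step for L, a soft-thresholding step for S, and a multiplier update that pushes L + S towards X. For real use, prefer the PCP class provided by the package.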

import numpy as np
from skpcp import PCP

# Generate synthetic data with low-rank and sparse components
RNG = np.random.default_rng(42)
n_samples, n_features, rank = 100, 50, 5
L = np.dot(RNG.normal(size=(n_samples, rank)), RNG.normal(size=(rank, n_features)))  # Low rank component
S = RNG.binomial(1, 0.1, size=(n_samples, n_features)) * RNG.normal(loc=0, scale=10, size=(n_samples, n_features))  # Sparse component
X = L + S

# Fit PCP model
pcp = PCP()
pcp.fit(X)
L_est = pcp.low_rank_  # Estimated low-rank component
S_est = pcp.sparse_  # Estimated sparse component
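Because the synthetic example knows the true L and S, it is natural to check how well they are recovered. The helpers below are a small sketch for that purpose; `relative_error` and `numerical_rank` are hypothetical names that are not part of skpcp, and the `L_est_demo` array is only a stand-in for `pcp.low_rank_` so that the snippet runs without the package installed.

```python
import numpy as np


def relative_error(estimate, truth):
    """Relative Frobenius-norm error between an estimate and the ground truth."""
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)


def numerical_rank(A, rel_tol=1e-6):
    """Count singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    if s.size == 0 or s[0] == 0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


# Stand-in data: a rank-5 matrix and a slightly perturbed "estimate" of it.
rng = np.random.default_rng(0)
L = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 50))
L_est_demo = L + 1e-9 * rng.normal(size=L.shape)  # stand-in for pcp.low_rank_

print("relative error:", relative_error(L_est_demo, L))
print("numerical rank:", numerical_rank(L_est_demo))
```

With a fitted model, the same helpers can be applied to `L_est` against `L` and to `S_est` against `S`.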

Alternatively, you can use the fit_transform method to fit the model and obtain the low-rank component in one step:

L_est = pcp.fit_transform(X)

Note that the fit method decomposes the input data matrix X into its low-rank component L_est and sparse component S_est. The behavior of the transform method of PCP differs from that of a typical scikit-learn transformer: it accepts only the same data matrix X that was used in fit. You cannot pass a new data matrix to transform, because the decomposition is specific to the input data used in fit.
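The "transform only works on the fit data" pattern described above can be illustrated with a toy class. This is a hypothetical stand-in, not skpcp's PCP class: it uses a truncated SVD instead of PCP, and raising a ValueError for a different matrix is an assumption made for the illustration, since how skpcp itself reacts to new data is not stated here.

```python
import numpy as np


class FitDataOnlyDecomposer:
    """Hypothetical toy class (not skpcp's PCP) whose transform is tied to the fit data."""

    def __init__(self, rank=2):
        self.rank = rank

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Stand-in decomposition: truncated SVD, not Principal Component Pursuit.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        k = self.rank
        self.low_rank_ = (U[:, :k] * s[:k]) @ Vt[:k]
        self.residual_ = X - self.low_rank_
        self._X_fit = X.copy()  # remembered so transform can compare against it
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # The decomposition belongs to the fit data, so reject anything else.
        if X.shape != self._X_fit.shape or not np.array_equal(X, self._X_fit):
            raise ValueError("transform only accepts the matrix passed to fit")
        return self.low_rank_

    def fit_transform(self, X):
        return self.fit(X).transform(X)


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
model = FitDataOnlyDecomposer(rank=2).fit(X)
same = model.transform(X)  # allowed: same matrix as in fit

try:
    model.transform(rng.normal(size=(6, 4)))  # a different matrix
    rejected = False
except ValueError:
    rejected = True
print("new data rejected:", rejected)
```

The point of the design is that a decomposition X = L + S has no learned mapping that could be applied to unseen rows, unlike, for example, the projection learned by ordinary PCA.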

Please see the examples and the API reference for more details.

Documentation

The documentation is built with Sphinx and hosted on GitHub Pages.

To build the HTML pages locally, first make sure you have installed the package with its documentation dependencies:

uv pip install -e ".[docs]"

then run the following:

sphinx-build docs docs/_build
