Advanced Projection Pursuit implementation with tied/untied weights, nonlinear/linear distance distortion, and comprehensive documentation

🪈 pyppur: Python Projection Pursuit Unsupervised Reduction

Overview

pyppur is a Python package that implements projection pursuit methods for dimensionality reduction. Unlike PCA, which finds linear directions of maximal variance, pyppur searches for projections, optionally passed through a nonlinear ridge function, that minimize either reconstruction loss or distance distortion.

Installation

pip install pyppur

Features

  • Two optimization objectives:
    • Distance Distortion: Preserves pairwise distances between data points
    • Reconstruction: Minimizes reconstruction error using ridge functions
  • Multiple initialization strategies (PCA-based and random)
  • Fully scikit-learn-compatible API
  • Supports standardization and custom weighting
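
As a rough illustration of the two initialization strategies listed above, the sketch below builds starting projection directions in plain NumPy. It is not pyppur's internal code: the helper names `pca_init` and `random_init` are hypothetical, and the choice to return unit-norm rows of shape (n_components × n_features) is an assumption made for the example.

```python
import numpy as np

def pca_init(X, n_components):
    """Hypothetical helper: top principal directions of centered X, one per row."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, sorted by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components]

def random_init(n_components, n_features, rng):
    """Hypothetical helper: random Gaussian directions scaled to unit norm."""
    A = rng.standard_normal((n_components, n_features))
    return A / np.linalg.norm(A, axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))

A_pca = pca_init(X, n_components=2)
A_rand = random_init(2, X.shape[1], rng)
print(A_pca.shape, A_rand.shape)
```

A PCA-based start is deterministic for a given dataset, while random starts differ from run to run, which is presumably why several of them (see `n_init` in the usage example) are tried.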

Usage

Basic Example

import numpy as np
from pyppur import ProjectionPursuit, Objective
from sklearn.datasets import load_digits

# Load data
digits = load_digits()
X = digits.data
y = digits.target

# Projection pursuit with distance distortion
pp_dist = ProjectionPursuit(
    n_components=2,
    objective=Objective.DISTANCE_DISTORTION,
    alpha=1.5,  # Steepness of the ridge function
    n_init=3,   # Number of random initializations
    verbose=True
)

# Fit and transform
X_transformed = pp_dist.fit_transform(X)

# Projection pursuit with reconstruction loss (tied weights)
pp_recon_tied = ProjectionPursuit(
    n_components=2,
    objective=Objective.RECONSTRUCTION,
    alpha=1.0,
    tied_weights=True
)

# Projection pursuit with reconstruction loss (free decoder)
pp_recon_free = ProjectionPursuit(
    n_components=2,
    objective=Objective.RECONSTRUCTION,
    alpha=1.0,
    tied_weights=False,
    l2_reg=0.01
)

# Fit and transform
X_transformed_recon_tied = pp_recon_tied.fit_transform(X)
X_transformed_recon_free = pp_recon_free.fit_transform(X)

# Evaluate the methods
dist_metrics = pp_dist.evaluate(X, y)
recon_tied_metrics = pp_recon_tied.evaluate(X, y)
recon_free_metrics = pp_recon_free.evaluate(X, y)

print("Distance distortion method:")
print(f"  Trustworthiness: {dist_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {dist_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {dist_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {dist_metrics['reconstruction_error']:.4f}")

print("\nReconstruction method (tied weights):")
print(f"  Trustworthiness: {recon_tied_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {recon_tied_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {recon_tied_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {recon_tied_metrics['reconstruction_error']:.4f}")

print("\nReconstruction method (free decoder):")
print(f"  Trustworthiness: {recon_free_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {recon_free_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {recon_free_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {recon_free_metrics['reconstruction_error']:.4f}")

API Reference

The main class in pyppur is ProjectionPursuit, which provides the following methods:

  • fit(X): Fit the model to data
  • transform(X): Apply dimensionality reduction to new data
  • fit_transform(X): Fit the model and transform data
  • reconstruct(X): Reconstruct data from projections
  • reconstruction_error(X): Compute reconstruction error
  • distance_distortion(X): Compute distance distortion
  • compute_trustworthiness(X, n_neighbors): Measure how well local structure is preserved
  • compute_silhouette(X, labels): Measure how well clusters are separated
  • evaluate(X, labels, n_neighbors): Compute all evaluation metrics at once

Theory

Projection pursuit finds interesting low-dimensional projections of multivariate data. When used for dimensionality reduction, it aims to optimize an "interestingness" index which can be:

  1. Distance Distortion: Minimizes the difference between pairwise distances in original and projected spaces (optionally with nonlinearity)
  2. Reconstruction Error: Minimizes the error when reconstructing the data using ridge functions

Mathematical Formulations

Tied-Weights Ridge Autoencoder (Default)

Z = g(X A^T)
X̂ = Z A

Free Decoder Ridge Autoencoder (Available with tied_weights=False)

Z = g(X A^T)  
X̂ = Z B

Where:

  • X is the input data matrix (n_samples × n_features)
  • A are the encoder projection directions (n_components × n_features)
  • B are the decoder weights (n_components × n_features, when untied)
  • g(z) = tanh(α * z) is the ridge function with steepness parameter α
  • Z is the projected data (n_samples × n_components)
  • X̂ is the reconstructed data (n_samples × n_features)
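
The two formulations can be written out in a few lines of NumPy to make the shapes concrete. This is a minimal sketch under stated assumptions, not pyppur's implementation: the helper names `ridge_encode` and `reconstruct` are hypothetical, the data are random, the error is summarized as a plain mean squared error, and the free decoder B is obtained by ordinary least squares for a fixed A rather than by the package's own optimizer.

```python
import numpy as np

def ridge_encode(X, A, alpha=1.0):
    """Z = g(X A^T) with g(z) = tanh(alpha * z)."""
    return np.tanh(alpha * (X @ A.T))

def reconstruct(Z, decoder):
    """X_hat = Z @ decoder, where decoder is A (tied) or B (free)."""
    return Z @ decoder

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 100, 8, 2

X = rng.standard_normal((n_samples, n_features))
A = rng.standard_normal((n_components, n_features))
A /= np.linalg.norm(A, axis=1, keepdims=True)  # unit-norm rows (assumption)

Z = ridge_encode(X, A, alpha=1.0)              # (n_samples, n_components)

# Tied weights: the encoder directions A are reused as the decoder.
X_hat_tied = reconstruct(Z, A)

# Free decoder: B is a separate matrix. Here it is fitted by least squares
# with A held fixed, purely for illustration.
B, *_ = np.linalg.lstsq(Z, X, rcond=None)      # (n_components, n_features)
X_hat_free = reconstruct(Z, B)

mse_tied = np.mean((X - X_hat_tied) ** 2)
mse_free = np.mean((X - X_hat_free) ** 2)
print(f"tied MSE: {mse_tied:.4f}, free MSE: {mse_free:.4f}")
```

For a fixed encoder A, the least-squares B can never reconstruct worse than the tied choice B = A, which is the extra flexibility the free decoder buys at the cost of more parameters.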

Distance Distortion Options

  • With nonlinearity: Compares distances between original space and g(X A^T)
  • Without nonlinearity: Compares distances between original space and linear projections X A^T
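
The two options can be compared with a small NumPy/SciPy sketch. The function below is hypothetical and only illustrates the idea: it assumes Euclidean pairwise distances and an unnormalized mean squared difference, which may not match the exact loss pyppur optimizes.

```python
import numpy as np
from scipy.spatial.distance import pdist

def distance_distortion(X, A, alpha=1.0, use_nonlinearity=True):
    """Mean squared difference between pairwise distances in X and in its projection."""
    P = X @ A.T                                    # linear projections X A^T
    Z = np.tanh(alpha * P) if use_nonlinearity else P
    d_orig = pdist(X)                              # condensed Euclidean distances
    d_proj = pdist(Z)
    return np.mean((d_orig - d_proj) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))

# Two random unit-norm directions.
A = rng.standard_normal((2, 5))
A /= np.linalg.norm(A, axis=1, keepdims=True)

dd_nonlinear = distance_distortion(X, A, alpha=1.5, use_nonlinearity=True)
dd_linear = distance_distortion(X, A, use_nonlinearity=False)

# Sanity check: a full orthonormal basis with no nonlinearity preserves distances.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
dd_full = distance_distortion(X, Q.T, use_nonlinearity=False)

print(f"nonlinear: {dd_nonlinear:.4f}, linear: {dd_linear:.4f}, full basis: {dd_full:.2e}")
```

Because tanh bounds every projected coordinate to (-1, 1), the nonlinear variant compresses large distances, so the two options can rank the same directions differently.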

Requirements

  • Python 3.8+
  • NumPy (>=1.20.0)
  • SciPy (>=1.7.0)
  • scikit-learn (>=1.0.0)
  • matplotlib (>=3.3.0)

License

MIT

Citation

If you use pyppur in your research, please cite it as:

@software{pyppur,
  author = {Gaurav Sood},
  title = {pyppur: Python Projection Pursuit Unsupervised Reduction},
  url = {https://github.com/gojiplus/pyppur},
  version = {0.2.0},
  year = {2025},
}
