Advanced Projection Pursuit implementation with tied/untied weights, nonlinear/linear distance distortion, and comprehensive documentation

🪈 pyppur: Python Projection Pursuit Unsupervised Reduction

Overview

pyppur is a Python package that implements projection pursuit methods for dimensionality reduction. Unlike traditional methods such as PCA, pyppur focuses on finding interesting non-linear projections by minimizing either reconstruction loss or distance distortion.

Installation

pip install pyppur

Features

  • Two optimization objectives:
    • Distance Distortion: Preserves pairwise distances between data points
    • Reconstruction: Minimizes reconstruction error using ridge functions
  • Multiple initialization strategies (PCA-based and random)
  • Full scikit-learn compatible API
  • Supports standardization and custom weighting
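
To make the two initialization strategies concrete, here is a minimal NumPy sketch of how starting directions could be drawn. The helper name `init_directions`, its arguments, and the choice to orthonormalize the rows are assumptions made for illustration; they are not taken from the pyppur source.

```python
import numpy as np

def init_directions(X, n_components, strategy="pca", seed=0):
    """Return an (n_components, n_features) matrix with orthonormal rows.

    Hypothetical helper for illustration only; not part of the pyppur API.
    """
    Xc = X - X.mean(axis=0)
    if strategy == "pca":
        # Rows of Vt are the principal axes, ordered by explained variance.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Vt[:n_components]
    if strategy == "random":
        rng = np.random.default_rng(seed)
        G = rng.standard_normal((X.shape[1], n_components))
        # Reduced QR gives orthonormal columns; transpose for orthonormal rows.
        Q, _ = np.linalg.qr(G)
        return Q.T
    raise ValueError(f"unknown strategy: {strategy!r}")

rng = np.random.default_rng(42)
X_demo = rng.standard_normal((100, 5))
A_pca = init_directions(X_demo, 2, "pca")
A_rand = init_directions(X_demo, 2, "random")
print(A_pca.shape, A_rand.shape)
```

Starting from the PCA axes gives the optimizer a deterministic, variance-maximizing start, while several random starts (compare `n_init` in the usage example) reduce the risk of settling in a poor local optimum.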

Usage

Basic Example

import numpy as np
from pyppur import ProjectionPursuit, Objective
from sklearn.datasets import load_digits

# Load data
digits = load_digits()
X = digits.data
y = digits.target

# Projection pursuit with distance distortion
pp_dist = ProjectionPursuit(
    n_components=2,
    objective=Objective.DISTANCE_DISTORTION,
    alpha=1.5,  # Steepness of the ridge function
    n_init=3,   # Number of random initializations
    verbose=True
)

# Fit and transform
X_transformed = pp_dist.fit_transform(X)

# Projection pursuit with reconstruction loss (tied weights)
pp_recon_tied = ProjectionPursuit(
    n_components=2,
    objective=Objective.RECONSTRUCTION,
    alpha=1.0,
    tied_weights=True
)

# Projection pursuit with reconstruction loss (free decoder)
pp_recon_free = ProjectionPursuit(
    n_components=2,
    objective=Objective.RECONSTRUCTION,
    alpha=1.0,
    tied_weights=False,
    l2_reg=0.01
)

# Fit and transform
X_transformed_recon_tied = pp_recon_tied.fit_transform(X)
X_transformed_recon_free = pp_recon_free.fit_transform(X)

# Evaluate the methods
dist_metrics = pp_dist.evaluate(X, y)
recon_tied_metrics = pp_recon_tied.evaluate(X, y)
recon_free_metrics = pp_recon_free.evaluate(X, y)

print("Distance distortion method:")
print(f"  Trustworthiness: {dist_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {dist_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {dist_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {dist_metrics['reconstruction_error']:.4f}")

print("\nReconstruction method (tied weights):")
print(f"  Trustworthiness: {recon_tied_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {recon_tied_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {recon_tied_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {recon_tied_metrics['reconstruction_error']:.4f}")

print("\nReconstruction method (free decoder):")
print(f"  Trustworthiness: {recon_free_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {recon_free_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {recon_free_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {recon_free_metrics['reconstruction_error']:.4f}")

API Reference

The main class in pyppur is ProjectionPursuit, which provides the following methods:

  • fit(X): Fit the model to data
  • transform(X): Apply dimensionality reduction to new data
  • fit_transform(X): Fit the model and transform data
  • reconstruct(X): Reconstruct data from projections
  • reconstruction_error(X): Compute reconstruction error
  • distance_distortion(X): Compute distance distortion
  • compute_trustworthiness(X, n_neighbors): Measure how well local structure is preserved
  • compute_silhouette(X, labels): Measure how well clusters are separated
  • evaluate(X, labels, n_neighbors): Compute all evaluation metrics at once
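
For reference, trustworthiness and silhouette can also be computed directly with scikit-learn on any 2-D embedding. The sketch below uses a PCA projection as a stand-in embedding; whether pyppur's `compute_trustworthiness` and `compute_silhouette` wrap these exact scikit-learn functions is an assumption, not something this README confirms.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness
from sklearn.metrics import silhouette_score

X_digits, y_digits = load_digits(return_X_y=True)

# Stand-in embedding: any (n_samples, 2) array could be scored the same way.
X_2d = PCA(n_components=2).fit_transform(X_digits)

# Trustworthiness lies in [0, 1]; higher means local neighborhoods are better preserved.
trust = trustworthiness(X_digits, X_2d, n_neighbors=5)

# Silhouette lies in [-1, 1]; higher means the labeled classes are better separated.
sil = silhouette_score(X_2d, y_digits)

print(f"Trustworthiness: {trust:.4f}")
print(f"Silhouette: {sil:.4f}")
```

Scoring a PCA baseline this way gives a point of comparison for the values returned by `evaluate`.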

Theory

Projection pursuit finds interesting low-dimensional projections of multivariate data. When used for dimensionality reduction, it optimizes an "interestingness" index, which in pyppur is one of:

  1. Distance Distortion: Minimizes the difference between pairwise distances in original and projected spaces (optionally with nonlinearity)
  2. Reconstruction Error: Minimizes the error when reconstructing the data using ridge functions

Mathematical Formulations

Tied-Weights Ridge Autoencoder (Default)

Z = g(X A^T)
X̂ = Z A

Free Decoder Ridge Autoencoder (Available with tied_weights=False)

Z = g(X A^T)  
X̂ = Z B

Where:

  • X is the input data matrix (n_samples × n_features)
  • A are the encoder projection directions (n_components × n_features)
  • B are the decoder weights (n_components × n_features, when untied)
  • g(z) = tanh(α * z) is the ridge function with steepness parameter α
  • Z is the projected data (n_samples × n_components)
  • X̂ is the reconstructed data (n_samples × n_features)
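
The two formulations can be sketched in a few lines of NumPy. This is an illustration of the equations above, not pyppur's implementation: the function names are hypothetical, and fitting the free decoder B by closed-form ridge regression (with `l2_reg` as the penalty weight) is an assumption about how the regularization might be applied.

```python
import numpy as np

def ridge_encode(X, A, alpha=1.0):
    # Z = g(X A^T) with g(z) = tanh(alpha * z)
    return np.tanh(alpha * (X @ A.T))

def fit_free_decoder(Z, X, l2_reg=0.01):
    # Assumed fitting rule: closed-form minimizer of
    # ||X - Z B||^2 + l2_reg * ||B||^2 over B.
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + l2_reg * np.eye(k), Z.T @ X)

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 6))
X_demo -= X_demo.mean(axis=0)

# Two orthonormal encoder directions, shape (n_components, n_features).
A = np.linalg.qr(rng.standard_normal((6, 2)))[0].T

Z = ridge_encode(X_demo, A, alpha=1.0)

# Tied weights: the decoder reuses the encoder directions, X_hat = Z A.
X_hat_tied = Z @ A

# Free decoder: a separate matrix B of the same shape, X_hat = Z B.
B = fit_free_decoder(Z, X_demo, l2_reg=0.01)
X_hat_free = Z @ B

mse_tied = np.mean((X_demo - X_hat_tied) ** 2)
mse_free = np.mean((X_demo - X_hat_free) ** 2)
print(f"tied MSE: {mse_tied:.4f}, free MSE: {mse_free:.4f}")
```

In this sketch only the decoder differs between the two variants: the encoder directions A and the projected data Z are shared, and B is fit for a fixed A.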

Distance Distortion Options

  • With nonlinearity: Compares distances between original space and g(X A^T)
  • Without nonlinearity: Compares distances between original space and linear projections X A^T
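
As a rough sketch of both options, the distortion can be written as the mean squared difference between pairwise Euclidean distances before and after projection. The normalization pyppur actually uses is not given here, so the mean over unique pairs is an assumption, and the `nonlinearity` flag is a hypothetical parameter name.

```python
import numpy as np

def pairwise_distances(M):
    # Euclidean distances between all rows of M via the Gram matrix.
    sq = np.sum(M ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)
    return np.sqrt(np.maximum(d2, 0.0))

def distance_distortion(X, A, alpha=1.0, nonlinearity=True):
    # Compare distances in the original space with distances in
    # g(X A^T) (nonlinear) or X A^T (linear).
    P = X @ A.T
    Z = np.tanh(alpha * P) if nonlinearity else P
    D_x = pairwise_distances(X)
    D_z = pairwise_distances(Z)
    iu = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    return np.mean((D_x[iu] - D_z[iu]) ** 2)

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 4))
A = np.linalg.qr(rng.standard_normal((4, 2)))[0].T

dd_nonlinear = distance_distortion(X_demo, A, alpha=1.5, nonlinearity=True)
dd_linear = distance_distortion(X_demo, A, nonlinearity=False)

# Sanity check: a full-rank identity projection without tanh preserves all distances.
dd_identity = distance_distortion(X_demo, np.eye(4), nonlinearity=False)
print(dd_nonlinear, dd_linear, dd_identity)
```

Because tanh bounds every projected coordinate to (-1, 1), the nonlinear variant compresses large distances, which is why the two options can rank the same directions differently.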

Requirements

  • Python 3.10+
  • NumPy (>=1.20.0)
  • SciPy (>=1.7.0)
  • scikit-learn (>=1.0.0)
  • matplotlib (>=3.3.0)

License

MIT

Citation

If you use pyppur in your research, please cite it as:

@software{pyppur,
  author = {Gaurav Sood},
  title = {pyppur: Python Projection Pursuit Unsupervised Reduction},
  url = {https://github.com/gojiplus/pyppur},
  version = {0.2.0},
  year = {2025},
}
