Projection Pursuit implementation for minimizing reconstruction loss and distance distortion

🪈 pyppur: Python Projection Pursuit Unsupervised Reduction

Overview

pyppur is a Python package that implements projection pursuit methods for dimensionality reduction. Unlike traditional methods such as PCA, which picks directions of maximal variance, pyppur searches for interesting non-linear projections by minimizing either reconstruction loss or distance distortion.

Installation

pip install pyppur

Features

  • Two optimization objectives:
    • Distance Distortion: Preserves pairwise distances between data points
    • Reconstruction: Minimizes reconstruction error using ridge functions
  • Multiple initialization strategies (PCA-based and random)
  • Fully scikit-learn-compatible API
  • Supports standardization and custom weighting

Usage

Basic Example

import numpy as np
from pyppur import ProjectionPursuit, Objective
from sklearn.datasets import load_digits

# Load data
digits = load_digits()
X = digits.data
y = digits.target

# Projection pursuit with distance distortion
pp_dist = ProjectionPursuit(
    n_components=2,
    objective=Objective.DISTANCE_DISTORTION,
    alpha=1.5,  # Steepness of the ridge function
    n_init=3,   # Number of random initializations
    verbose=True
)

# Fit and transform
X_transformed = pp_dist.fit_transform(X)

# Projection pursuit with reconstruction loss
pp_recon = ProjectionPursuit(
    n_components=2,
    objective=Objective.RECONSTRUCTION,
    alpha=1.0
)

# Fit and transform
X_transformed_recon = pp_recon.fit_transform(X)

# Evaluate the methods
dist_metrics = pp_dist.evaluate(X, y)
recon_metrics = pp_recon.evaluate(X, y)

print("Distance distortion method:")
print(f"  Trustworthiness: {dist_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {dist_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {dist_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {dist_metrics['reconstruction_error']:.4f}")

print("\nReconstruction method:")
print(f"  Trustworthiness: {recon_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {recon_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {recon_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {recon_metrics['reconstruction_error']:.4f}")

API Reference

The main class in pyppur is ProjectionPursuit, which provides the following methods:

  • fit(X): Fit the model to data
  • transform(X): Apply dimensionality reduction to new data
  • fit_transform(X): Fit the model and transform data
  • reconstruct(X): Reconstruct data from projections
  • reconstruction_error(X): Compute reconstruction error
  • distance_distortion(X): Compute distance distortion
  • compute_trustworthiness(X, n_neighbors): Measure how well local structure is preserved
  • compute_silhouette(X, labels): Measure how well clusters are separated
  • evaluate(X, labels, n_neighbors): Compute all evaluation metrics at once
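Trustworthiness and silhouette are standard metrics that can also be computed directly with scikit-learn, which is handy for comparing against other embeddings. The sketch below is an illustration only: it uses PCA as a stand-in embedding so that it runs without pyppur, and whether pyppur calls these same scikit-learn functions internally is an assumption.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness
from sklearn.metrics import silhouette_score

digits = load_digits()
X, y = digits.data, digits.target

# Stand-in 2-D embedding (PCA is used here only so the sketch is self-contained).
Z = PCA(n_components=2).fit_transform(X)

# Trustworthiness: how well local neighborhoods survive the projection, in [0, 1].
trust = trustworthiness(X, Z, n_neighbors=5)

# Silhouette: how well the labeled groups are separated in the embedding, in [-1, 1].
sil = silhouette_score(Z, y)

print(f"Trustworthiness: {trust:.4f}")
print(f"Silhouette: {sil:.4f}")
```

Replacing Z with the output of fit_transform from a fitted ProjectionPursuit model should give numbers comparable to the corresponding entries returned by evaluate, provided the same n_neighbors and labels are used.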

Theory

Projection pursuit finds interesting low-dimensional projections of multivariate data. When used for dimensionality reduction, it aims to optimize an "interestingness" index which can be:

  1. Distance Distortion: Minimizes the difference between pairwise distances in original and projected spaces
  2. Reconstruction Error: Minimizes the error when reconstructing the data using ridge functions
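As a rough illustration of the first objective, the following NumPy sketch measures distance distortion as the mean squared difference between pairwise Euclidean distances before and after projection. The function names and the choice of mean squared difference are assumptions made for this example; the exact normalization pyppur uses may differ.

```python
import numpy as np

def pairwise_distances(A):
    """Euclidean distances between all distinct pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(A.shape[0], k=1)  # each pair counted once
    return D[i, j]

def distance_distortion(X, Z):
    """Mean squared gap between pairwise distances in X and in Z (hypothetical helper)."""
    return float(np.mean((pairwise_distances(X) - pairwise_distances(Z)) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))

# Random orthonormal projection onto 2 dimensions.
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))
Z = X @ Q

print(distance_distortion(X, Z))
```

An orthonormal linear projection can only shrink distances, so the distortion is zero only when no pairwise distance changes.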

The mathematical formulation for the ridge function autoencoder is:

z_ij = a_j^T x_i
x̂_i = ∑_j g(z_ij) a_j

Where:

  • x_i is the input data point
  • a_j are the projection directions
  • z_ij is the projection of x_i onto direction a_j
  • g(z) is the ridge function (tanh in our implementation)
  • x̂_i is the reconstructed data point
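The formulation above can be sketched in a few lines of NumPy. This is a minimal illustration and not pyppur's code: the function names are hypothetical, and applying alpha as a scale inside tanh is an assumption based on the "steepness" description in the usage example.

```python
import numpy as np

def ridge_reconstruct(X, A, alpha=1.0):
    """Reconstruct X from its projections onto the rows of A.

    X has shape (n_samples, n_features); A has shape (n_components, n_features),
    one projection direction a_j per row.
    """
    Z = X @ A.T             # a_j^T x_i for every point i and direction j
    G = np.tanh(alpha * Z)  # ridge function; alpha as steepness is an assumption
    return G @ A            # sum over j of g(a_j^T x_i) * a_j

def reconstruction_error(X, A, alpha=1.0):
    """Mean squared reconstruction error (hypothetical helper)."""
    return float(np.mean((X - ridge_reconstruct(X, A, alpha)) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
A, _ = np.linalg.qr(rng.normal(size=(6, 2)))
A = A.T  # two orthonormal directions as rows

X_hat = ridge_reconstruct(X, A, alpha=1.5)
print(X_hat.shape, reconstruction_error(X, A, alpha=1.5))
```

Because tanh is bounded, each direction contributes at most a unit multiple of a_j to the reconstruction, which is one reason standardizing the data before fitting matters for this objective.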

Requirements

  • Python 3.8+
  • NumPy (>=1.20.0)
  • SciPy (>=1.7.0)
  • scikit-learn (>=1.0.0)
  • matplotlib (>=3.3.0)

License

MIT

Citation

If you use pyppur in your research, please cite it as:

@software{pyppur,
  author = {Gaurav Sood},
  title = {pyppur: Python Projection Pursuit Unsupervised Reduction},
  url = {https://github.com/gojiplus/pyppur},
  version = {0.2.0},
  year = {2025},
}
