
Projection Pursuit implementation for minimizing reconstruction loss and distance distortion


pyppur: Python Projection Pursuit Unsupervised Reduction

Overview

pyppur is a Python package that implements projection pursuit methods for dimensionality reduction. Unlike PCA, which finds the linear projections that capture the most variance, pyppur learns projection directions combined with a non-linear ridge function (tanh) and chooses them by minimizing either reconstruction loss or distance distortion.

Installation

pip install pyppur

Features

  • Two optimization objectives:
    • Distance Distortion: Preserves pairwise distances between data points
    • Reconstruction: Minimizes reconstruction error using ridge functions
  • Multiple initialization strategies (PCA-based and random)
  • scikit-learn-compatible API (fit, transform, fit_transform)
  • Supports standardization and custom weighting
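The standardization step and the two initialization strategies can be sketched in NumPy as below. This is a minimal illustration under assumptions: the function names `standardize` and `init_directions` are hypothetical, not part of pyppur's API, and pyppur's actual initialization may differ in details such as normalization or seeding.

```python
import numpy as np


def standardize(X):
    """Center each feature and scale it to unit variance (constant features are only centered)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std


def init_directions(X, n_components, strategy="pca", seed=None):
    """Return an (n_features, n_components) matrix whose columns are unit-norm directions.

    strategy="pca":    top principal axes of the centered data (right singular vectors).
    strategy="random": Gaussian random vectors normalized to unit length.
    """
    if strategy == "pca":
        Xc = X - X.mean(axis=0)
        # Rows of Vt are principal axes, ordered by decreasing singular value.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        A = Vt[:n_components].T
    elif strategy == "random":
        rng = np.random.default_rng(seed)
        A = rng.standard_normal((X.shape[1], n_components))
        A /= np.linalg.norm(A, axis=0, keepdims=True)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return A
```

Several random restarts, like the `n_init` setting in the usage example, would call the random strategy with different seeds and keep the best-scoring result; how pyppur combines PCA and random starts is not specified here.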

Usage

Basic Example

import numpy as np
from pyppur import ProjectionPursuit, Objective
from sklearn.datasets import load_digits

# Load data
digits = load_digits()
X = digits.data
y = digits.target

# Projection pursuit with distance distortion
pp_dist = ProjectionPursuit(
    n_components=2,
    objective=Objective.DISTANCE_DISTORTION,
    alpha=1.5,  # Steepness of the ridge function
    n_init=3,   # Number of random initializations
    verbose=True
)

# Fit and transform
X_transformed = pp_dist.fit_transform(X)

# Projection pursuit with reconstruction loss
pp_recon = ProjectionPursuit(
    n_components=2,
    objective=Objective.RECONSTRUCTION,
    alpha=1.0
)

# Fit and transform
X_transformed_recon = pp_recon.fit_transform(X)

# Evaluate the methods
dist_metrics = pp_dist.evaluate(X, y)
recon_metrics = pp_recon.evaluate(X, y)

print("Distance distortion method:")
print(f"  Trustworthiness: {dist_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {dist_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {dist_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {dist_metrics['reconstruction_error']:.4f}")

print("\nReconstruction method:")
print(f"  Trustworthiness: {recon_metrics['trustworthiness']:.4f}")
print(f"  Silhouette: {recon_metrics['silhouette']:.4f}")
print(f"  Distance distortion: {recon_metrics['distance_distortion']:.4f}")
print(f"  Reconstruction error: {recon_metrics['reconstruction_error']:.4f}")

Comparing Methods

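import matplotlib.pyplot as plt  # matplotlib is needed only for plotting

# Plot both embeddings side by side, colored by digit label
# (X_transformed, X_transformed_recon and y come from the basic example)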
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X_transformed[:, 0], X_transformed[:, 1], c=y, cmap='tab10')
plt.title('Projection Pursuit (Distance Distortion)')
plt.colorbar()

plt.subplot(1, 2, 2)
plt.scatter(X_transformed_recon[:, 0], X_transformed_recon[:, 1], c=y, cmap='tab10')
plt.title('Projection Pursuit (Reconstruction)')
plt.colorbar()

plt.tight_layout()
plt.show()

API Reference

The main class in pyppur is ProjectionPursuit, which provides the following methods:

  • fit(X): Fit the model to data
  • transform(X): Apply dimensionality reduction to new data
  • fit_transform(X): Fit the model and transform data
  • reconstruct(X): Reconstruct data from projections
  • reconstruction_error(X): Compute reconstruction error
  • distance_distortion(X): Compute distance distortion
  • compute_trustworthiness(X, n_neighbors): Measure how well local structure is preserved
  • compute_silhouette(X, labels): Measure how well clusters are separated
  • evaluate(X, labels, n_neighbors): Compute all evaluation metrics at once
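The trustworthiness score reported by `compute_trustworthiness` is, in its standard definition (the one scikit-learn's `sklearn.manifold.trustworthiness` also implements), a penalty on points that appear among a sample's k nearest neighbors in the embedding but not in the original space, weighted by how far down the original distance ranking they sit. The NumPy sketch below implements that standard formula with Euclidean distances; it is not pyppur's code, so values may differ if pyppur uses another metric or normalization.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off


def trustworthiness(X, X_emb, n_neighbors=5):
    """Trustworthiness T(k) in [0, 1]; 1 means the embedding introduces no false neighbors."""
    n = X.shape[0]
    k = n_neighbors
    if not 0 < k < n / 2:
        raise ValueError("n_neighbors must satisfy 0 < n_neighbors < n_samples / 2")
    d_orig = pairwise_sq_dists(X)
    d_emb = pairwise_sq_dists(X_emb)
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(d_orig, np.inf)
    np.fill_diagonal(d_emb, np.inf)

    rows = np.arange(n)[:, None]
    # ranks[i, j] = 1-based position of j when sorting the original distances from i.
    order = np.argsort(d_orig, axis=1, kind="stable")
    ranks = np.empty((n, n), dtype=int)
    ranks[rows, order] = np.arange(1, n + 1)

    # k nearest neighbors of each point in the embedding.
    nn_emb = np.argsort(d_emb, axis=1, kind="stable")[:, :k]
    # Embedding neighbors whose original rank exceeds k are false neighbors,
    # each penalized by (rank - k).
    penalty = np.maximum(ranks[rows, nn_emb] - k, 0).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty
```

With the digits example, `trustworthiness(X, X_transformed, n_neighbors=5)` would score the distance-distortion embedding on the same scale as scikit-learn's function.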

Theory

Projection pursuit finds interesting low-dimensional projections of multivariate data. When used for dimensionality reduction, it optimizes an "interestingness" index; pyppur supports two:

  1. Distance Distortion: Minimizes the difference between pairwise distances in original and projected spaces
  2. Reconstruction Error: Minimizes the error when reconstructing the data using ridge functions
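To make the first objective concrete, distance distortion can be scored as the mean squared difference between pairwise Euclidean distances before and after projection. The sketch below assumes exactly that definition; pyppur's loss may normalize or weight the distances differently. The helper `ridge_embed`, which builds an embedding as tanh(alpha * X A), is hypothetical and only illustrates a ridge-transformed projection. Because tanh squashes values into [-1, 1], the raw score is sensitive to the scale of X.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances between all pairs of rows, as a condensed 1-D vector (i < j).

    Builds an (n, n, d) difference array, so it is meant for small examples only.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt(np.sum(diff ** 2, axis=-1))
    i, j = np.triu_indices(X.shape[0], k=1)
    return D[i, j]


def distance_distortion(X, Z):
    """Mean squared difference between pairwise distances in X and in the embedding Z."""
    return float(np.mean((pairwise_distances(X) - pairwise_distances(Z)) ** 2))


def ridge_embed(X, A, alpha=1.0):
    """Hypothetical ridge-transformed projection: tanh(alpha * X @ A)."""
    return np.tanh(alpha * (X @ A))
```

A rotation of the data leaves every pairwise distance unchanged and therefore scores (numerically) zero, while collapsing all points onto one location scores the mean squared original distance.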

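The mathematical formulation for the ridge function autoencoder is:

$$z_{ij} = a_j^\top x_i$$

$$\hat{x}_i = \sum_{j=1}^{k} g(z_{ij})\, a_j$$

Where:

  • x_i is the i-th input data point
  • a_j is the j-th projection direction (j = 1, …, k)
  • z_{ij} is the projection of x_i onto a_j
  • g(z) is the ridge function (tanh in our implementation)
  • x̂_i is the reconstructed data point
  • k is the number of components (n_components)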
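The formulation above translates directly into a forward pass and a squared-error loss. In this sketch, A is an (n_features, k) matrix whose columns are the directions a_j, and g is taken to be tanh(alpha * z), reading `alpha` as the ridge-function steepness described in the usage example; that parameterization, the loss definition (mean over samples of the squared norm of x_i minus x̂_i) and the function names are assumptions, not pyppur's code.

```python
import numpy as np


def ridge_reconstruct(X, A, alpha=1.0):
    """Ridge function autoencoder: z_ij = a_j^T x_i, x_hat_i = sum_j g(z_ij) a_j."""
    Z = X @ A               # (n_samples, k): projections z_ij
    G = np.tanh(alpha * Z)  # ridge function g applied elementwise
    return G @ A.T          # (n_samples, n_features): sum_j g(z_ij) a_j


def reconstruction_loss(X, A, alpha=1.0):
    """Mean squared reconstruction error, averaged over samples."""
    residual = X - ridge_reconstruct(X, A, alpha)
    return float(np.mean(np.sum(residual ** 2, axis=1)))
```

One way to fit A under this sketch would be to hand `reconstruction_loss` over a flattened A to a general-purpose optimizer such as `scipy.optimize.minimize`; this README does not say which optimizer pyppur uses.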

Requirements

  • Python 3.8+
  • NumPy
  • SciPy
  • scikit-learn

License

MIT

Citation

If you use pyppur in your research, please cite it as:

@software{pyppur,
  author = {Your Name},
  title = {pyppur: Python Projection Pursuit Unsupervised Reduction},
  url = {https://github.com/yourusername/pyppur},
  version = {0.1.0},
  year = {2023},
}



Download files


Source Distribution

pyppur-0.1.0.tar.gz (10.4 kB)

Uploaded Source

Built Distribution


pyppur-0.1.0-py3-none-any.whl (8.9 kB)

Uploaded Python 3

File details

Details for the file pyppur-0.1.0.tar.gz.

File metadata

  • Download URL: pyppur-0.1.0.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for pyppur-0.1.0.tar.gz
Algorithm Hash digest
SHA256 567084e5a918bb14467c04becf5d30c6984a7e060223bad9095c8d11de81e321
MD5 8df3eb65e34a92c0b9d8f48f30276ba3
BLAKE2b-256 b0ca5a00c4092c9b3853facb81e7602303fdb4aaad9ebabe6ad39a8f512e27b6


File details

Details for the file pyppur-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pyppur-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for pyppur-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1db0aa70640f7dc23d0c176d0f78811a59faf09547501dbdd7b1317ea43df5b7
MD5 598153dc6b6ed7b7339bd7284c191a5f
BLAKE2b-256 a3bfefab35636a417bdc41b36ee99c09083dbc57f29b8bb3fa811e4bb115d551

