Projection Pursuit implementation for minimizing reconstruction loss and distance distortion
pyppur: Python Projection Pursuit Unsupervised Reduction
Overview
pyppur is a Python package that implements projection pursuit methods for dimensionality reduction. Unlike PCA, which selects directions of maximum variance, pyppur searches for interesting non-linear projections by minimizing either reconstruction loss or distance distortion.
Installation
pip install pyppur
Features
- Two optimization objectives:
  - Distance Distortion: Preserves pairwise distances between data points
  - Reconstruction: Minimizes reconstruction error using ridge functions
- Multiple initialization strategies (PCA-based and random)
- Fully scikit-learn-compatible API
- Supports standardization and custom weighting
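The standardization and initialization options listed above can be illustrated with plain NumPy. This is a minimal sketch, not pyppur's own code: the helper names `standardize`, `pca_init`, and `random_init` are hypothetical, and it assumes that projection directions are stored as rows of an array of shape (n_components, n_features).

```python
import numpy as np

def standardize(X):
    """Center each feature and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features are only centered
    return (X - mean) / std

def pca_init(X, n_components):
    """PCA-based start: the leading principal directions, one per row."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components]

def random_init(n_features, n_components, rng):
    """Random start: orthonormal directions from a QR decomposition."""
    G = rng.standard_normal((n_features, n_components))
    Q, _ = np.linalg.qr(G)  # Q has orthonormal columns
    return Q.T

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
Xs = standardize(X)
A_pca = pca_init(Xs, 2)
A_rand = random_init(Xs.shape[1], 2, rng)
print(A_pca.shape, A_rand.shape)
```

A PCA-based start is deterministic, while random starts explore other regions of the search space, which is presumably why several of them are tried and the best result kept.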
Usage
Basic Example
import numpy as np
from pyppur import ProjectionPursuit, Objective
from sklearn.datasets import load_digits
# Load data
digits = load_digits()
X = digits.data
y = digits.target
# Projection pursuit with distance distortion
pp_dist = ProjectionPursuit(
    n_components=2,
    objective=Objective.DISTANCE_DISTORTION,
    alpha=1.5,  # Steepness of the ridge function
    n_init=3,  # Number of random initializations
    verbose=True
)
# Fit and transform
X_transformed = pp_dist.fit_transform(X)
# Projection pursuit with reconstruction loss
pp_recon = ProjectionPursuit(
    n_components=2,
    objective=Objective.RECONSTRUCTION,
    alpha=1.0
)
# Fit and transform
X_transformed_recon = pp_recon.fit_transform(X)
# Evaluate the methods
dist_metrics = pp_dist.evaluate(X, y)
recon_metrics = pp_recon.evaluate(X, y)
print("Distance distortion method:")
print(f" Trustworthiness: {dist_metrics['trustworthiness']:.4f}")
print(f" Silhouette: {dist_metrics['silhouette']:.4f}")
print(f" Distance distortion: {dist_metrics['distance_distortion']:.4f}")
print(f" Reconstruction error: {dist_metrics['reconstruction_error']:.4f}")
print("\nReconstruction method:")
print(f" Trustworthiness: {recon_metrics['trustworthiness']:.4f}")
print(f" Silhouette: {recon_metrics['silhouette']:.4f}")
print(f" Distance distortion: {recon_metrics['distance_distortion']:.4f}")
print(f" Reconstruction error: {recon_metrics['reconstruction_error']:.4f}")
Comparing Methods
import matplotlib.pyplot as plt
# Plot the results
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.scatter(X_transformed[:, 0], X_transformed[:, 1], c=y, cmap='tab10')
plt.title('Projection Pursuit (Distance Distortion)')
plt.colorbar()
plt.subplot(1, 2, 2)
plt.scatter(X_transformed_recon[:, 0], X_transformed_recon[:, 1], c=y, cmap='tab10')
plt.title('Projection Pursuit (Reconstruction)')
plt.colorbar()
plt.tight_layout()
plt.show()
API Reference
The main class in pyppur is ProjectionPursuit, which provides the following methods:
- fit(X): Fit the model to data
- transform(X): Apply dimensionality reduction to new data
- fit_transform(X): Fit the model and transform data
- reconstruct(X): Reconstruct data from projections
- reconstruction_error(X): Compute reconstruction error
- distance_distortion(X): Compute distance distortion
- compute_trustworthiness(X, n_neighbors): Measure how well local structure is preserved
- compute_silhouette(X, labels): Measure how well clusters are separated
- evaluate(X, labels, n_neighbors): Compute all evaluation metrics at once
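To make the cluster-separation metric concrete, here is a NumPy-only sketch of the standard silhouette score computed on projected points. It is not pyppur's implementation; the function name `silhouette` and the choice of Euclidean distance are assumptions made for illustration.

```python
import numpy as np

def silhouette(Z, labels):
    """Mean silhouette coefficient of points Z under the given labels."""
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if len(classes) < 2:
        raise ValueError("silhouette needs at least two clusters")
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    scores = np.zeros(len(Z))
    for i in range(len(Z)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score is defined as 0
        # Mean distance to the other members of the same cluster (D[i, i] is 0).
        a = D[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in classes if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

Z = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette(Z, [0, 0, 1, 1])  # labels match the two groups
bad = silhouette(Z, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

Scores near 1 indicate compact, well-separated clusters in the projection, while negative scores indicate points that sit closer to another cluster than to their own.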
Theory
Projection pursuit finds interesting low-dimensional projections of multivariate data. When used for dimensionality reduction, it aims to optimize an "interestingness" index which can be:
- Distance Distortion: Minimizes the difference between pairwise distances in original and projected spaces
- Reconstruction Error: Minimizes the error when reconstructing the data using ridge functions
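The distance distortion objective can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the package's exact formula: it takes the mean squared difference between Euclidean pairwise distances, and the names `pairwise_distances` and `distance_distortion` are hypothetical stand-ins (pyppur may normalize or weight the terms differently).

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix of shape (n_samples, n_samples)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_distortion(X, Z):
    """Mean squared gap between pairwise distances in X and in Z."""
    D_x = pairwise_distances(X)
    D_z = pairwise_distances(Z)
    iu = np.triu_indices(len(X), k=1)  # count each unordered pair once
    return np.mean((D_x[iu] - D_z[iu]) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
A = np.eye(4)[:2]  # assumed projection: keep the first two coordinates
Z = X @ A.T

identity_distortion = distance_distortion(X, X)
projected_distortion = distance_distortion(X, Z)
print(identity_distortion, projected_distortion)
```

A projection that discards coordinates can only shrink pairwise distances, so the distortion is zero for an identity mapping and positive once information is lost.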
The mathematical formulation for the ridge function autoencoder is:
z_{ij} = a_j^T x_i
x̂_i = ∑_j g(z_{ij}) a_j
Where:
- x_i is the input data point
- a_j are the projection directions
- g(z) is the ridge function (tanh in our implementation)
- x̂_i is the reconstructed data point
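The formulation above translates directly into matrix operations. The following is a minimal NumPy sketch, not the package's code: the function names are hypothetical, directions a_j are assumed to be stored as rows of `A`, and the steepness parameter is assumed to enter as g(z) = tanh(alpha * z).

```python
import numpy as np

def encode(X, A):
    """Projections: entry (i, j) is a_j^T x_i."""
    return X @ A.T

def reconstruct(X, A, alpha=1.0):
    """Reconstruction: x_hat_i = sum_j g(a_j^T x_i) a_j, with g(z) = tanh(alpha * z)."""
    Z = encode(X, A)
    return np.tanh(alpha * Z) @ A

def reconstruction_error(X, A, alpha=1.0):
    """Mean squared error between the data and its reconstruction."""
    return np.mean((X - reconstruct(X, A, alpha)) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
A = np.eye(3)[:2]  # assumed directions: the first two coordinate axes

X_hat = reconstruct(X, A)
err = reconstruction_error(X, A)
print(X_hat.shape, err)
```

With coordinate axes as directions, each reconstructed coordinate is simply the tanh of the original one, and any coordinate not covered by a direction is reconstructed as zero.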
Requirements
- Python 3.8+
- NumPy
- SciPy
- scikit-learn
License
MIT
Citation
If you use pyppur in your research, please cite it as:
@software{pyppur,
author = {Your Name},
title = {pyppur: Python Projection Pursuit Unsupervised Reduction},
url = {https://github.com/yourusername/pyppur},
version = {0.1.0},
year = {2023},
}