Advanced Projection Pursuit implementation with tied/untied weights, nonlinear/linear distance distortion, and comprehensive documentation
🪈 pyppur: Python Projection Pursuit Unsupervised Reduction
Overview
pyppur is a Python package that implements projection pursuit methods for dimensionality reduction. Unlike traditional methods such as PCA, pyppur focuses on finding interesting non-linear projections by minimizing either reconstruction loss or distance distortion.
Installation
pip install pyppur
Features
- Two optimization objectives:
  - Distance Distortion: Preserves pairwise distances between data points
  - Reconstruction: Minimizes reconstruction error using ridge functions
- Multiple initialization strategies (PCA-based and random)
- Full scikit-learn compatible API
- Supports standardization and custom weighting
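The two initialization strategies listed above can be illustrated with plain NumPy. This is a minimal sketch, not pyppur's internal code: the helper names `pca_init` and `random_init` are hypothetical, and it assumes the projection directions are stored as rows of an (n_components × n_features) matrix.

```python
import numpy as np

def pca_init(X, n_components):
    """Top principal directions of X as rows (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, sorted by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components]

def random_init(n_components, n_features, rng):
    """Random Gaussian directions, each row scaled to unit length (hypothetical helper)."""
    A = rng.standard_normal((n_components, n_features))
    return A / np.linalg.norm(A, axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))   # synthetic data for illustration
A_pca = pca_init(X, 2)              # deterministic starting point
A_rand = random_init(2, 6, rng)     # one of several random restarts
```

A PCA start gives a reproducible, variance-maximizing starting point, while several random starts (as with `n_init` in the usage example) reduce the chance of ending in a poor local optimum.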
Usage
Basic Example
import numpy as np
from pyppur import ProjectionPursuit, Objective
from sklearn.datasets import load_digits
# Load data
digits = load_digits()
X = digits.data
y = digits.target
# Projection pursuit with distance distortion
pp_dist = ProjectionPursuit(
    n_components=2,
    objective=Objective.DISTANCE_DISTORTION,
    alpha=1.5,  # Steepness of the ridge function
    n_init=3,  # Number of random initializations
    verbose=True,
)
# Fit and transform
X_transformed = pp_dist.fit_transform(X)
# Projection pursuit with reconstruction loss (tied weights)
pp_recon_tied = ProjectionPursuit(
    n_components=2,
    objective=Objective.RECONSTRUCTION,
    alpha=1.0,
    tied_weights=True,
)
# Projection pursuit with reconstruction loss (free decoder)
pp_recon_free = ProjectionPursuit(
    n_components=2,
    objective=Objective.RECONSTRUCTION,
    alpha=1.0,
    tied_weights=False,
    l2_reg=0.01,
)
# Fit and transform
X_transformed_recon_tied = pp_recon_tied.fit_transform(X)
X_transformed_recon_free = pp_recon_free.fit_transform(X)
# Evaluate the methods
dist_metrics = pp_dist.evaluate(X, y)
recon_tied_metrics = pp_recon_tied.evaluate(X, y)
recon_free_metrics = pp_recon_free.evaluate(X, y)
print("Distance distortion method:")
print(f" Trustworthiness: {dist_metrics['trustworthiness']:.4f}")
print(f" Silhouette: {dist_metrics['silhouette']:.4f}")
print(f" Distance distortion: {dist_metrics['distance_distortion']:.4f}")
print(f" Reconstruction error: {dist_metrics['reconstruction_error']:.4f}")
print("\nReconstruction method (tied weights):")
print(f" Trustworthiness: {recon_tied_metrics['trustworthiness']:.4f}")
print(f" Silhouette: {recon_tied_metrics['silhouette']:.4f}")
print(f" Distance distortion: {recon_tied_metrics['distance_distortion']:.4f}")
print(f" Reconstruction error: {recon_tied_metrics['reconstruction_error']:.4f}")
print("\nReconstruction method (free decoder):")
print(f" Trustworthiness: {recon_free_metrics['trustworthiness']:.4f}")
print(f" Silhouette: {recon_free_metrics['silhouette']:.4f}")
print(f" Distance distortion: {recon_free_metrics['distance_distortion']:.4f}")
print(f" Reconstruction error: {recon_free_metrics['reconstruction_error']:.4f}")
API Reference
The main class in pyppur is ProjectionPursuit, which provides the following methods:
- fit(X): Fit the model to data
- transform(X): Apply dimensionality reduction to new data
- fit_transform(X): Fit the model and transform data
- reconstruct(X): Reconstruct data from projections
- reconstruction_error(X): Compute reconstruction error
- distance_distortion(X): Compute distance distortion
- compute_trustworthiness(X, n_neighbors): Measure how well local structure is preserved
- compute_silhouette(X, labels): Measure how well clusters are separated
- evaluate(X, labels, n_neighbors): Compute all evaluation metrics at once
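To give a feel for what the trustworthiness and silhouette metrics measure, the sketch below computes both with scikit-learn directly. It does not call pyppur: PCA stands in for the fitted projection so the example runs on its own, the data are synthetic, and computing the silhouette on the embedded points (rather than the original data) is an assumption about how such a metric is typically applied.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in 5 dimensions (synthetic data).
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 5)),
    rng.normal(6.0, 1.0, size=(50, 5)),
])
labels = np.repeat([0, 1], 50)

# Stand-in embedding: PCA is used only so the sketch runs without pyppur.
Z = PCA(n_components=2).fit_transform(X)

# Trustworthiness lies in [0, 1]; values near 1 mean the neighbors of each
# point in the embedding were also its neighbors in the original space.
trust = trustworthiness(X, Z, n_neighbors=5)

# Silhouette lies in [-1, 1]; values near 1 mean tight, well-separated clusters.
sil = silhouette_score(Z, labels)

print(f"Trustworthiness: {trust:.4f}")
print(f"Silhouette: {sil:.4f}")
```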
Theory
Projection pursuit finds interesting low-dimensional projections of multivariate data. When used for dimensionality reduction, it optimizes an "interestingness" index, which in pyppur is one of:
- Distance Distortion: Minimizes the difference between pairwise distances in original and projected spaces (optionally with nonlinearity)
- Reconstruction Error: Minimizes the error when reconstructing the data using ridge functions
Mathematical Formulations
Tied-Weights Ridge Autoencoder (Default)
Z = g(X A^T)
X̂ = Z A
Free Decoder Ridge Autoencoder (Available with tied_weights=False)
Z = g(X A^T)
X̂ = Z B
Where:
- X is the input data matrix (n_samples × n_features)
- A are the encoder projection directions (n_components × n_features)
- B are the decoder weights (n_components × n_features, when untied)
- g(z) = tanh(α * z) is the ridge function with steepness parameter α
- Z is the projected data (n_samples × n_components)
- X̂ is the reconstructed data
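The two formulations can be written out in a few lines of NumPy. This is a sketch of the equations above, not pyppur's implementation: the helper names are hypothetical, A is fixed rather than optimized, and fitting the free decoder B by ridge-regularized least squares (with a penalty playing the role of `l2_reg`) is an assumption about how an untied decoder might be obtained.

```python
import numpy as np

def encode(X, A, alpha=1.0):
    """Z = g(X A^T) with g(z) = tanh(alpha * z)."""
    return np.tanh(alpha * (X @ A.T))

def fit_free_decoder(X, Z, l2_reg=0.01):
    """Ridge least-squares B minimizing ||X - Z B||^2 + l2_reg * ||B||^2.

    Assumption: this closed form is for illustration only.
    """
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + l2_reg * np.eye(k), Z.T @ X)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))                    # synthetic data
A = np.linalg.qr(rng.standard_normal((5, 2)))[0].T   # 2 orthonormal rows

Z = encode(X, A, alpha=1.0)       # (n_samples x n_components)

X_hat_tied = Z @ A                # tied weights: decoder reuses A
B = fit_free_decoder(X, Z)        # free decoder: separate weights B
X_hat_free = Z @ B

mse_tied = np.mean((X - X_hat_tied) ** 2)
mse_free = np.mean((X - X_hat_free) ** 2)
print(f"tied MSE: {mse_tied:.4f}, free MSE: {mse_free:.4f}")
```

Because B is fitted to the data given Z, the free decoder can compensate for the shrinkage that tanh applies to the projections, at the cost of extra parameters.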
Distance Distortion Options
- With nonlinearity: Compares distances between original space and g(X A^T)
- Without nonlinearity: Compares distances between original space and linear projections X A^T
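The two options differ only in whether g is applied before distances are compared. The sketch below assumes the distortion is the mean squared difference between Euclidean pairwise distances; the exact loss and any weighting used by pyppur may differ, and the function names are hypothetical.

```python
import numpy as np

def pairwise_distances(M):
    """Euclidean distances between all pairs of rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_distortion(X, A, alpha=1.0, nonlinearity=True):
    """Mean squared gap between original and projected pairwise distances.

    Assumption: an unweighted mean over all pairs, for illustration only.
    """
    P = X @ A.T
    Z = np.tanh(alpha * P) if nonlinearity else P
    return np.mean((pairwise_distances(X) - pairwise_distances(Z)) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))   # synthetic data
A_full = np.eye(4)                 # identity "projection" keeps every feature

# A linear projection onto all features preserves distances exactly,
# while tanh compresses large values and so introduces distortion.
d_linear = distance_distortion(X, A_full, nonlinearity=False)
d_nonlinear = distance_distortion(X, A_full, alpha=1.5, nonlinearity=True)
print(f"linear: {d_linear:.4f}, nonlinear: {d_nonlinear:.4f}")
```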
Requirements
- Python 3.10+
- NumPy (>=1.20.0)
- SciPy (>=1.7.0)
- scikit-learn (>=1.0.0)
- matplotlib (>=3.3.0)
License
MIT
Citation
If you use pyppur in your research, please cite it as:
@software{pyppur,
author = {Gaurav Sood},
title = {pyppur: Python Projection Pursuit Unsupervised Reduction},
url = {https://github.com/gojiplus/pyppur},
version = {0.2.0},
year = {2025},
}
🔗 Adjacent Repositories
- gojiplus/get-weather-data — Get weather data for a list of zip codes for a range of dates
- gojiplus/text-as-data — Pipeline for Analyzing Text Data: Acquire, Preprocess, Analyze
- gojiplus/calibre — Advanced Calibration Models
- gojiplus/skiplist_join
- gojiplus/rmcp — R MCP Server