
A streamlined and fast implementation of parametric UMAP using PyTorch and FAISS

Project description

Parametric UMAP

A PyTorch implementation of Parametric UMAP (Uniform Manifold Approximation and Projection) for learning low-dimensional parametric embeddings of high-dimensional data.

Install

A working installation of PyTorch (ideally with GPU acceleration) is recommended before installing this package. The package can then be installed with

pip install parametric_umap

Overview

Parametric UMAP (original paper) extends the original UMAP algorithm by learning a neural network that can map new data points to the lower-dimensional space without having to rerun the entire optimization. This (unofficial) implementation provides a flexible and efficient way to perform parametric dimensionality reduction leveraging PyTorch and FAISS.

Features

  • Neural network-based parametric mapping
  • Efficient nearest neighbor computation using FAISS
  • Sparse matrix operations for memory efficiency
  • GPU acceleration support
  • Model saving and loading capabilities
  • Correlation loss term to preserve distance relationships

Quick start

from parametric_umap import ParametricUMAP
from sklearn.datasets import make_swiss_roll
import numpy as np

# Create sample data
n_samples = 1000
X, color = make_swiss_roll(n_samples=n_samples, random_state=42)

# Initialize and fit the model
pumap = ParametricUMAP(
    n_components=2,
    hidden_dim=128,
    n_layers=3,
    n_epochs=10
)

# Fit and transform the data
embeddings = pumap.fit_transform(X)

# Transform new data
X_new = np.random.rand(100, 3)
new_embeddings = pumap.transform(X_new)
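One way to picture the correlation loss term from the feature list is as a penalty on the Pearson correlation between pairwise distances in input space and in the embedding. The sketch below is hypothetical (the package's actual loss may differ in detail), but it shows the idea of preserving distance relationships:

```python
import torch

def correlation_loss(x, z):
    # Pairwise distances in input space and in the embedding.
    dx = torch.pdist(x)
    dz = torch.pdist(z)
    # Center, then compute the Pearson correlation between the two
    # distance vectors; the loss is 0 when they are perfectly correlated.
    dx = dx - dx.mean()
    dz = dz - dz.mean()
    corr = (dx * dz).sum() / (dx.norm() * dz.norm() + 1e-8)
    return 1.0 - corr
```

A term like this, scaled by `correlation_weight`, would be added to the usual UMAP attraction/repulsion objective.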

Key Parameters

Default hyperparameter values follow the original UMAP implementation.

UMAP parameters

  • a: Parameter scaling distances between embedded points
  • b: Parameter controlling the sharpness of the curve's transition between attraction and repulsion
  • n_neighbors: Number of neighbors used to build the UMAP kNN graph (default: 15)
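Together, a and b define UMAP's low-dimensional similarity curve, q(d) = 1 / (1 + a·d^(2b)), which governs how attraction gives way to repulsion as embedding distance d grows. A minimal sketch (the constants shown are approximately the values the reference UMAP implementation fits for its default min_dist of 0.1):

```python
import numpy as np

def low_dim_similarity(d, a=1.577, b=0.895):
    # Similarity between embedded points as a function of their distance d:
    # 1 at d = 0, decaying smoothly toward 0 as d grows.
    return 1.0 / (1.0 + a * d ** (2 * b))

# Larger a shrinks the high-similarity plateau; larger b sharpens the drop-off.
print(low_dim_similarity(np.array([0.0, 0.5, 1.0, 2.0])))
```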

Parametric model

  • n_components: Dimension of the output embedding (default: 2)
  • hidden_dim: Dimension of hidden layers in the MLP (default: 1024)
  • n_layers: Number of hidden layers (default: 3)
  • correlation_weight: Weight of the correlation loss term (default: 0.1)
  • learning_rate: Learning rate for optimization (default: 1e-4)
  • n_epochs: Number of training epochs (default: 10)
  • batch_size: Training batch size (default: 32)
  • use_batchnorm: Whether to use batch normalization in the embedding MLP (default: False)
  • use_dropout: Whether to use dropout in the embedding MLP (default: False)
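Taken together, these parameters suggest an embedding MLP of roughly the following shape. This is a hypothetical sketch, not the package's actual class; the name `make_embedder` and the `p_dropout` value are illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_embedder(input_dim, n_components=2, hidden_dim=1024, n_layers=3,
                  use_batchnorm=False, use_dropout=False, p_dropout=0.1):
    # n_layers hidden layers of width hidden_dim, with optional batch norm
    # and dropout, mapping input_dim -> n_components.
    layers, width = [], input_dim
    for _ in range(n_layers):
        layers.append(nn.Linear(width, hidden_dim))
        if use_batchnorm:
            layers.append(nn.BatchNorm1d(hidden_dim))
        layers.append(nn.ReLU())
        if use_dropout:
            layers.append(nn.Dropout(p_dropout))
        width = hidden_dim
    layers.append(nn.Linear(width, n_components))
    return nn.Sequential(*layers)

net = make_embedder(input_dim=3, hidden_dim=128)  # e.g. for swiss roll data
z = net(torch.randn(32, 3))
print(z.shape)  # torch.Size([32, 2])
```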

