A streamlined and fast implementation of parametric UMAP using PyTorch and FAISS
Parametric UMAP
A PyTorch implementation of Parametric UMAP (Uniform Manifold Approximation and Projection) for learning low-dimensional parametric embeddings of high-dimensional data.
Install
A working installation of PyTorch (ideally with GPU acceleration) is recommended before installing this package. The package can then be installed with:

```
pip install parametric_umap
```
Overview
Parametric UMAP (original paper) extends the original UMAP algorithm by learning a neural network that can map new data points to the lower-dimensional space without rerunning the entire optimization. This (unofficial) implementation provides a flexible and efficient way to perform parametric dimensionality reduction, leveraging PyTorch for the network and FAISS for neighbor search.
Features
- Neural network-based parametric mapping
- Efficient nearest neighbor computation using FAISS
- Sparse matrix operations for memory efficiency
- GPU acceleration support
- Model saving and loading capabilities
- Correlation loss term to preserve distance relationships
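The FAISS-backed neighbor search listed above is used to build the UMAP kNN graph. As a rough illustration of the computation it accelerates (this is not this package's API), here is a brute-force NumPy equivalent:

```python
import numpy as np

# What the FAISS-backed neighbor search computes: for each point, the
# indices of its k nearest neighbors under squared Euclidean distance.
# (FAISS does this far faster; this brute-force version shows the semantics.)
def knn_graph(X, k=15):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-matches
    return np.argsort(d2, axis=1)[:, :k]              # (n_samples, k) neighbor indices

X = np.random.rand(200, 3)
neighbors = knn_graph(X, k=15)
print(neighbors.shape)  # (200, 15)
```

FAISS replaces the quadratic distance matrix with an index structure, which is what makes the graph construction scale to large datasets.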
Quick start
```python
from parametric_umap import ParametricUMAP
from sklearn.datasets import make_swiss_roll
import numpy as np

# Create sample data
n_samples = 1000
X, color = make_swiss_roll(n_samples=n_samples, random_state=42)

# Initialize and fit the model
pumap = ParametricUMAP(
    n_components=2,
    hidden_dim=128,
    n_layers=3,
    n_epochs=10,
)

# Fit and transform the data
embeddings = pumap.fit_transform(X)

# Transform new data
X_new = np.random.rand(100, 3)
new_embeddings = pumap.transform(X_new)
```
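The feature list above mentions a correlation loss term that preserves distance relationships. A rough sketch of what such a term can compute (the package's exact formulation may differ) is the Pearson correlation between input-space and embedding-space pairwise distances:

```python
import numpy as np

# Sketch of a correlation loss: rewards embeddings whose pairwise distances
# correlate with the input-space pairwise distances. Illustrative only; the
# package's actual loss may be defined differently.
def pairwise_distances(X):
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.sqrt(d2)

def correlation_loss(X, Z):
    iu = np.triu_indices(len(X), k=1)       # unique point pairs only
    dx = pairwise_distances(X)[iu]
    dz = pairwise_distances(Z)[iu]
    r = np.corrcoef(dx, dz)[0, 1]           # Pearson correlation of distances
    return 1.0 - r                          # 0 when distances correlate perfectly

X = np.random.rand(50, 3)
print(round(correlation_loss(X, X[:, :2]), 3))
```

In training, a term like this would be scaled by `correlation_weight` and added to the main UMAP objective.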
Key Parameters
Default hyperparameter values follow the original UMAP implementation.
UMAP parameters
- a: parameter for scaling distances between embedded points
- b: parameter controlling the sharpness of the curve's transition between attraction and repulsion
- n_neighbors: number of neighbors to compute for the UMAP kNN graph (default: 15)
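In UMAP, a and b parameterize the low-dimensional similarity curve phi(d) = 1 / (1 + a·d^(2b)). A standalone sketch of that curve (the a, b values shown approximate UMAP's defaults for min_dist=0.1 and are not taken from this package):

```python
import numpy as np

# UMAP's low-dimensional similarity between embedded points at distance d:
#   phi(d) = 1 / (1 + a * d^(2b))
# Larger a tightens clusters; b controls how sharply attraction gives way
# to repulsion as d grows. The defaults below roughly match the values UMAP
# fits for min_dist=0.1 (assumption, not read from this package).
def low_dim_similarity(d, a=1.577, b=0.895):
    return 1.0 / (1.0 + a * d ** (2.0 * b))

for d in (0.0, 0.5, 1.0, 2.0):
    print(d, round(low_dim_similarity(d), 3))
```

Identical points (d = 0) get similarity 1, and the similarity decays smoothly toward 0 with distance.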
Parametric model
- n_components: Dimension of the output embedding (default: 2)
- hidden_dim: Dimension of hidden layers in the MLP (default: 1024)
- n_layers: Number of hidden layers (default: 3)
- n_neighbors: Number of nearest neighbors (default: 15)
- correlation_weight: Weight of the correlation loss term (default: 0.1)
- learning_rate: Learning rate for optimization (default: 1e-4)
- n_epochs: Number of training epochs (default: 10)
- batch_size: Training batch size (default: 32)
- use_batchnorm: Whether to use batch normalization in the embedding MLP (default: False)
- use_dropout: Whether to use dropout in the embedding MLP (default: False)
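Taken together, these parameters describe a standard MLP encoder. The following is a plausible sketch of such an architecture, not the package's actual model class:

```python
import torch
import torch.nn as nn

# Illustrative encoder matching the hyperparameters above (assumption: the
# package uses something similar, but this is not its actual implementation).
def make_encoder(input_dim, n_components=2, hidden_dim=1024, n_layers=3,
                 use_batchnorm=False, use_dropout=False):
    layers, dim = [], input_dim
    for _ in range(n_layers):
        layers.append(nn.Linear(dim, hidden_dim))
        if use_batchnorm:
            layers.append(nn.BatchNorm1d(hidden_dim))
        layers.append(nn.ReLU())
        if use_dropout:
            layers.append(nn.Dropout(0.2))  # 0.2 is an arbitrary illustrative rate
        dim = hidden_dim
    layers.append(nn.Linear(dim, n_components))  # project to embedding space
    return nn.Sequential(*layers)

encoder = make_encoder(input_dim=3)
z = encoder(torch.randn(32, 3))
print(z.shape)  # torch.Size([32, 2])
```

Because the mapping is a plain feed-forward network, transforming new points is a single forward pass, which is what makes the method "parametric".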
File details
Details for the file parametric_umap-0.1.0.tar.gz.
File metadata
- Download URL: parametric_umap-0.1.0.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c63d650b2a7fb3c7e6fdc503bb928046d7a55dce93b6055d81d5f2e1eb2eb1b |
| MD5 | 1cd8b323e431526d877da18753220421 |
| BLAKE2b-256 | 3e5f39d05bf3485bae8de439a0fe9e8b2a065408f273c5cf6e0920a5d5b68a8f |
File details
Details for the file parametric_umap-0.1.0-py3-none-any.whl.
File metadata
- Download URL: parametric_umap-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ca7db111d677c66e1971f898b98309ac8262b0c1681dda018bf8a42e1c429a2 |
| MD5 | 32c59f3f00e0b883f9bd3de591508e28 |
| BLAKE2b-256 | 12e2b9580ddbe88f02ac077418d7d898d6e5be338a5d295e19437f23b0fb44fd |