Torch Dimensionality Reduction Library
TorchDR is an open-source library for dimensionality reduction (DR) built on PyTorch. DR constructs low-dimensional representations (or embeddings) that best preserve the intrinsic geometry of an input dataset encoded via a pairwise affinity matrix. TorchDR provides GPU-accelerated implementations of popular DR algorithms in a unified framework, ensuring high performance by leveraging the latest advances of the PyTorch ecosystem.
Key Features
🚀 Blazing Fast: engineered for speed with GPU acceleration, torch.compile support, and optimized algorithms leveraging sparsity and negative sampling.
🧩 Modular by Design: every component is designed to be easily customized, extended, or replaced to fit your specific needs.
🪶 Memory-Efficient: natively handles sparsity and memory-efficient symbolic operations to process massive datasets without memory overflows.
🤝 Seamless Integration: Fully compatible with the scikit-learn and PyTorch ecosystems. Use familiar APIs and integrate effortlessly into your existing workflows.
📦 Minimal Dependencies: requires only PyTorch, NumPy, and scikit‑learn; optionally add Faiss for fast k‑NN or KeOps for symbolic computation.
Getting Started
TorchDR offers a user-friendly API similar to scikit-learn's, where dimensionality reduction modules are called with the fit_transform method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, ensuring that the output matches the type and backend of the input.
```python
from sklearn.datasets import fetch_openml

from torchdr import UMAP

x = fetch_openml("mnist_784").data.astype("float32")
z = UMAP(n_neighbors=30).fit_transform(x)
```
🚀 GPU Acceleration
TorchDR is fully GPU compatible, enabling significant speed-ups when a GPU is available. To run computations on the GPU, simply set device="cuda" as shown in the example below:
```python
z_gpu = UMAP(n_neighbors=30, device="cuda").fit_transform(x)
```
🔥 PyTorch 2.0+ torch.compile Support
TorchDR supports torch.compile for an additional performance boost on modern PyTorch versions. Just add the compile=True flag as follows:
```python
z_gpu_compile = UMAP(n_neighbors=30, device="cuda", compile=True).fit_transform(x)
```
⚙️ Backends
The backend keyword specifies which tool to use for handling kNN computations and memory-efficient symbolic computations.
- Set `backend="faiss"` to rely on Faiss for fast kNN computations (recommended).
- Set `backend="keops"` to leverage the KeOps library for exact symbolic tensor computations on the GPU without memory limitations. KeOps can also compute kNN graphs.
- Set `backend=None` to use raw PyTorch for all computations.
Methods
Neighbor Embedding (optimal for data visualization)
TorchDR provides a suite of neighbor embedding methods.
- Linear-time (negative sampling). State-of-the-art speed on large datasets: UMAP, LargeVis, InfoTSNE, PACMAP.
- Quadratic-time (exact repulsion). Compute the full pairwise repulsion: SNE, TSNE, TSNEkhorn, COSNE.
Remark. For quadratic-time algorithms, TorchDR provides exact implementations that scale linearly in memory using `backend="keops"`. For TSNE specifically, one can also explore fast approximations, such as FIt-SNE implemented in tsne-cuda, which bypass full pairwise repulsion.
Spectral Embedding
TorchDR provides various spectral embedding methods: PCA, IncrementalPCA, KernelPCA, PHATE.
Benchmarks
Relying on TorchDR enables orders-of-magnitude improvements in runtime compared to CPU-based implementations. See the code.
Examples
See the examples folder for all examples.
MNIST. (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.
CIFAR100. (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.
Advanced Features
Affinities
TorchDR features a wide range of affinities which can then be used as a building block for DR algorithms. It includes:
- Affinities based on k-NN normalizations: `SelfTuningAffinity`, `MAGICAffinity`, `UMAPAffinity`, `PHATEAffinity`, `PACMAPAffinity`.
- Doubly stochastic affinities: `SinkhornAffinity`, `DoublyStochasticQuadraticAffinity`.
- Adaptive affinities with entropy control: `EntropicAffinity`, `SymmetricEntropicAffinity`.
Evaluation Metric
TorchDR provides efficient GPU-compatible evaluation metrics, such as `silhouette_score`.
Installation
Install the core torchdr library from PyPI:
```shell
pip install torchdr
```
:warning: torchdr does not install faiss-gpu or pykeops by default. You need to install them separately to use the corresponding backends.

- Faiss (recommended): for the fastest k-NN computations, install Faiss by following their official installation guide. A common method is using conda:

  ```shell
  conda install -c pytorch -c nvidia faiss-gpu
  ```

- KeOps: for memory-efficient symbolic computations, install PyKeOps:

  ```shell
  pip install pykeops
  ```
Installation from Source
If you want to use the latest, unreleased version of torchdr, you can install it directly from GitHub:
```shell
pip install git+https://github.com/torchdr/torchdr
```
Finding Help
If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.
File details
Details for the file torchdr-0.3.tar.gz.
File metadata
- Download URL: torchdr-0.3.tar.gz
- Upload date:
- Size: 3.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb6df12cad9b7fb56bf01a737a670c4a4e67d320f6563059e2f8eccba3904d7d |
| MD5 | 649187c6b50254a576619b9e5c2dda08 |
| BLAKE2b-256 | 733a9c0fe7f8ae03c6e5ce8f95554d8b8e0692b812a3211d2dc7dc0118aea6cc |
File details
Details for the file torchdr-0.3-py3-none-any.whl.
File metadata
- Download URL: torchdr-0.3-py3-none-any.whl
- Upload date:
- Size: 124.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0bedb5a79788971a1f9d12919922ffd660664757dd44606915feb521f659cc7 |
| MD5 | fcfaa5ab2a23bd128a3b292a8b47a21f |
| BLAKE2b-256 | 4c9ff9a9a6709c013323ba4b78eab8c5bfac3c0ccdb790515d90b3a27856f000 |