UMAP with GPUs
GPU Parallelized Uniform Manifold Approximation and Projection (GPUMAP) is the GPU-ported version of the UMAP dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction.
At the moment only CUDA-capable GPUs are supported. Because of a dependency on FAISS, only Linux (and potentially macOS) is supported.
For further information on UMAP, see the original implementation: https://github.com/lmcinnes/umap/.
How to use GPUMAP
The gpumap package inherits from sklearn classes, and thus drops in neatly next to other sklearn transformers with an identical calling API.
import gpumap
from sklearn.datasets import load_digits
digits = load_digits()
embedding = gpumap.GPUMAP().fit_transform(digits.data)
There are a number of parameters that can be set for the GPUMAP class; the major ones are as follows:
n_neighbors: This determines the number of neighboring points used in local approximations of manifold structure. Larger values will result in more global structure being preserved at the cost of detailed local structure. In general this parameter should be in the range 5 to 50, with a choice of 10 to 15 being a sensible default.
min_dist: This controls how tightly the embedding is allowed to compress points together. Larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimise more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5, with 0.1 being a reasonable default.
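As a rough illustration of the ranges quoted above, the following sketch checks a pair of settings before they are handed to the GPUMAP constructor. The helper check_params is hypothetical and not part of gpumap; its bounds are simply the ones suggested in the parameter descriptions.

```python
def check_params(n_neighbors, min_dist):
    """Return notes for settings outside the ranges suggested above.

    Hypothetical helper for illustration; it is not part of gpumap.
    """
    notes = []
    if not 5 <= n_neighbors <= 50:
        notes.append("n_neighbors is outside the suggested range 5 to 50")
    if not 0.001 <= min_dist <= 0.5:
        notes.append("min_dist is outside the suggested range 0.001 to 0.5")
    return notes


params = {"n_neighbors": 15, "min_dist": 0.1}
print(check_params(**params))  # an empty list: both values are in range

# With gpumap installed, the same settings can be passed on:
# embedding = gpumap.GPUMAP(**params).fit_transform(digits.data)
```

Values outside these ranges are not necessarily wrong, so the helper only reports notes instead of raising an error.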
The metric parameter is supported to keep the interface aligned with UMAP. However, setting it to anything but ‘euclidean’ will cause a fallback to the sequential version. Sparse matrices are not supported either, and will similarly cause a fallback to the sequential version for parts of the algorithm.
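The fallback rule described above can be summarised in a few lines. This is only a sketch of the documented behaviour, not gpumap's internal logic: the function name execution_path and its return labels are made up for illustration, and giving the metric check precedence over the sparse check is an assumption.

```python
def execution_path(metric="euclidean", sparse_input=False):
    """Predict which code path the text above describes.

    Hypothetical helper; the labels are illustrative, not gpumap output.
    """
    if metric != "euclidean":
        # Any non-euclidean metric falls back to the sequential version.
        return "sequential"
    if sparse_input:
        # Sparse input falls back for parts of the algorithm.
        return "partly sequential"
    return "gpu"


for metric, sparse in [("euclidean", False), ("cosine", False), ("euclidean", True)]:
    print(metric, sparse, "->", execution_path(metric, sparse))
```

In practice this means dense input with the default metric is what is expected to benefit from the GPU.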
Performance and Examples
GPUMAP, like UMAP, is very efficient at embedding large high-dimensional datasets. In particular, it scales well with both input dimension and embedding dimension. Performance depends strongly on the GPU used. For a problem such as the 784-dimensional MNIST digits dataset with 70000 data samples, GPUMAP can complete the embedding in around 30 seconds on an (outdated) NVIDIA GTX 745 graphics card. More recent hardware should be correspondingly faster. Despite this runtime efficiency, GPUMAP still produces high-quality embeddings.
The obligatory MNIST digits dataset, embedded in 29 seconds using a 3.6 GHz Intel Core i7 processor and an NVIDIA GTX 745 GPU (n_neighbors=10, min_dist=0.001):
The MNIST digits dataset is fairly straightforward, however. A better test is the more recent “Fashion MNIST” dataset of images of fashion items (again 70000 data samples in 784 dimensions). GPUMAP produced this embedding in exactly 2 minutes (n_neighbors=5, min_dist=0.1):
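Because runtimes depend so strongly on the hardware, it may be useful to measure them locally. The sketch below times any fit_transform-style callable with the standard library. The names timed and fake_embed are hypothetical, and fake_embed is only a stand-in so the example runs without a GPU.

```python
import time


def timed(fit_transform, data):
    """Call fit_transform(data) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fit_transform(data)
    return result, time.perf_counter() - start


def fake_embed(rows):
    """Stand-in for gpumap.GPUMAP(...).fit_transform; not a real embedding."""
    return [(sum(row), len(row)) for row in rows]


embedding, seconds = timed(fake_embed, [[0.0, 1.0], [2.0, 3.0]])
print(f"embedded {len(embedding)} samples in {seconds:.3f} s")

# With gpumap installed, the real call would look like:
# embedding, seconds = timed(gpumap.GPUMAP().fit_transform, digits.data)
```

Note that a first run may include one-off costs such as JIT compilation by numba, so timing a second run is likely to give a more representative figure.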
Installing
GPUMAP has the same dependencies as UMAP, namely scikit-learn, numpy, scipy and numba. GPUMAP adds a requirement for faiss to perform nearest-neighbor search on GPUs.
Requirements:
scikit-learn
(numpy)
(scipy)
numba
faiss
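To see which of these requirements are already present in the current environment, a small check along the following lines may help. It is a sketch using only the standard library; missing_packages is a hypothetical helper, and the mapping assumes the usual import names (scikit-learn is imported as sklearn).

```python
import importlib.util

# Package name -> module name used at import time.
REQUIRED = {
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "scipy": "scipy",
    "numba": "numba",
    "faiss": "faiss",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


print("missing:", missing_packages())
```

An empty list only shows that the modules can be located; it does not confirm that the installed FAISS build has GPU support.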
Install Options
GPUMAP can be installed via Conda, PyPI or from source:
Option 1: Conda
Set up a new conda environment, if needed.
conda create -n env
conda activate env
conda install python
Install the dependencies, including Numba and FAISS, followed by GPUMAP:
conda install numba
conda install scikit-learn
conda install faiss-gpu cudatoolkit=10.0 -c pytorch # For CUDA10
# For older CUDA versions:
# conda install faiss-gpu cudatoolkit=8.0 -c pytorch # For CUDA8
# conda install faiss-gpu cudatoolkit=9.0 -c pytorch # For CUDA9
conda install -c conda-forge gpumap
Option 2: PyPI
GPUMAP is also available as a PyPI package.
pip install scikit-learn numba faiss gpumap
Note that the prebuilt FAISS library is not officially supported by upstream.
Option 3: Build
Building from source is easy. Clone the repository, or get the code onto your computer by other means, and run the installer with:
python setup.py install
Note that the dependencies need to be installed beforehand. These are the FAISS library (https://github.com/facebookresearch/faiss/blob/master/INSTALL.md) and Numba (http://numba.pydata.org/numba-doc/latest/user/installing.html).
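After installing by any of the options above, a quick check such as the following can indicate whether FAISS sees a GPU. This is a sketch, not part of gpumap: faiss_gpu_count is a hypothetical helper, and it assumes the installed FAISS build exposes get_num_gpus, returning None when it does not.

```python
def faiss_gpu_count():
    """Return the number of GPUs FAISS reports, or None if unavailable.

    Hypothetical helper for illustration; it is not part of gpumap.
    """
    try:
        import faiss
    except ImportError:
        return None  # FAISS is not installed in this environment
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    if not callable(get_num_gpus):
        return None  # this build does not expose a GPU count
    return get_num_gpus()


count = faiss_gpu_count()
if count is None:
    print("FAISS GPU count unavailable")
else:
    print(f"FAISS reports {count} GPU(s)")
```

A result of 0 or None suggests that the GPU build of FAISS or the CUDA setup needs another look.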
License
The gpumap package is based on the umap package and thus is also 3-clause BSD licensed.
Contributing
Contributions are always welcome! Fork away!