Skip to main content

Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)

Project description

Introduction

t-Stochastic Neighborhood Embedding (t-SNE) is a highly successful method for dimensionality reduction and visualization of high dimensional datasets. A popular implementation of t-SNE uses the Barnes-Hut algorithm to approximate the gradient at each iteration of gradient descent. We modified this implementation as follows:

  • Computation of Input Similiarities: Instead of computing nearest neighbors using vantage-point trees, we approximate nearest neighbors using the Annoy library. The neighbor lookups are multithreaded to take advantage of machines with multiple cores. Using “near” neighbors as opposed to strictly “nearest” neighbors is faster, but also has a smoothing effect, which can be useful for embedding some datasets (see Linderman et al. (2017)).

  • Computation of the N-body Simulation: Instead of approximating the N-body simulation using Barnes-Hut, we interpolate onto an equispaced grid and use FFT to perform the convolution, dramatically reducing the time to compute the gradient at each iteration of gradient descent.

  • Early exaggeration: In Linderman and Steinerberger (2017), we showed that appropriately choosing the early exaggeration coefficient can lead to improved embedding of swissrolls and other synthetic datase ts

  • Late exaggeration: By increasing the exaggeration coefficient late in the optimization process (e.g. after 800 of 1000 iterations) can improve separation of the clusters

Check out our preprint for more details and some benchmarks.

This PyPI package is a Cython wrapper for FIt-SNE and was written by Gioele La Manno.

Installation

The only prerequisite is FFTW. FFTW and fitsne can be installed as follows:

conda config --add channels conda-forge #if not already in your channels. Needed for fftw.
conda install cython numpy fftw
pip install fitsne

And you’re good to go!

References

If you use our software, please cite:

George C. Linderman, Manas Rachh, Jeremy G. Hoskins, Stefan Steinerberger, Yuval Kluger. (2017). Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding. (2017) arXiv:1712.09005 (link)

Our implementation is derived from the Barnes-Hut implementation:

Laurens van der Maaten (2014). Accelerating t-SNE using tree-based algorithms. Journal of Machine Learning Research, 15(1):3221–3245. (link)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fitsne-0.1.7.tar.gz (137.4 kB view details)

Uploaded Source

File details

Details for the file fitsne-0.1.7.tar.gz.

File metadata

  • Download URL: fitsne-0.1.7.tar.gz
  • Upload date:
  • Size: 137.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for fitsne-0.1.7.tar.gz
Algorithm Hash digest
SHA256 f64fb882e826f2ba6ab2a00bbc69b2d1126f019f01c6ef479f2550f47229c521
MD5 8826983c90bdcb9b119d9d37b740c0c4
BLAKE2b-256 1db01507b43aacce04621cb808362b57f9e0f374af8adef3ed259f8a5cf4d40e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page