
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)

Project description

Introduction

t-distributed Stochastic Neighbor Embedding ([t-SNE](https://lvdmaaten.github.io/tsne/)) is a highly successful method for dimensionality reduction and visualization of high-dimensional datasets. A popular implementation of t-SNE uses the Barnes-Hut algorithm to approximate the gradient at each iteration of gradient descent. We accelerated this implementation as follows:

  • Computation of the N-body Simulation: Instead of approximating the N-body simulation using Barnes-Hut, we interpolate onto an equispaced grid and use the FFT to perform the convolution, dramatically reducing the time to compute the gradient at each iteration of gradient descent. See this post for some intuition on how it works.

  • Computation of Input Similarities: Instead of computing nearest neighbors using vantage-point trees, we approximate nearest neighbors using the Annoy library. The neighbor lookups are multithreaded to take advantage of machines with multiple cores. Using “near” neighbors as opposed to strictly “nearest” neighbors is faster, but also has a smoothing effect, which can be useful for embedding some datasets (see Linderman et al. (2017)). If subtle detail is required (e.g. in identifying small clusters), then use vantage-point trees (vantage-point tree search is also multithreaded in this implementation).
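For intuition on the first point, here is a one-dimensional sketch (our own illustration, not the library's code): values on an equispaced grid are convolved with a Cauchy-like kernel, once directly in O(n²) and once in O(n log n) by embedding the Toeplitz kernel matrix in a circulant and multiplying in Fourier space.

```python
import numpy as np

# Illustrative 1-D version of the FFT convolution step (not FIt-SNE's code):
# values q[j] sit on an equispaced grid, and we want
#   phi[i] = sum_j K(|i - j| * h) * q[j]
# for a radially symmetric kernel K.
n = 64
h = 1.0 / n
k = 1.0 / (1.0 + (np.arange(n) * h) ** 2)        # kernel at grid separations
q = np.random.default_rng(1).standard_normal(n)  # "charges" on the grid

# Direct O(n^2) evaluation via the (Toeplitz) kernel matrix.
idx = np.arange(n)
K = k[np.abs(idx[:, None] - idx[None, :])]
phi_direct = K @ q

# O(n log n) evaluation: embed the Toeplitz matrix in a 2n x 2n circulant,
# whose action is a circular convolution, i.e. a pointwise product in
# Fourier space.
c = np.concatenate([k, [0.0], k[:0:-1]])  # first column of the circulant
q_pad = np.concatenate([q, np.zeros(n)])
phi_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(q_pad)).real[:n]

assert np.allclose(phi_direct, phi_fft)
```

FIt-SNE applies this idea in two (or three) dimensions after interpolating the embedding points onto the grid, which is what replaces the Barnes-Hut tree traversal.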

Check out our paper or preprint for more details and some benchmarks.
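On the input-similarity side, once (approximate) nearest neighbors are found, standard t-SNE calibrates a per-point Gaussian bandwidth so that the conditional distribution over those neighbors matches a target perplexity. The following is a minimal numpy sketch of that calibration step; the function name and structure are ours, not the package's API.

```python
import numpy as np

def calibrate_bandwidth(d2, perplexity=30.0, tol=1e-5, max_iter=100):
    """Binary-search beta = 1 / (2 sigma^2) so that the entropy of the
    conditional distribution over a point's neighbors equals log(perplexity).
    d2: squared distances from one point to its k (approximate) neighbors."""
    target = np.log(perplexity)
    lo, hi = 0.0, np.inf
    beta = 1.0
    for _ in range(max_iter):
        w = np.exp(-d2 * beta)
        p = w / w.sum()
        h = -np.sum(p * np.log(np.maximum(p, 1e-12)))  # Shannon entropy
        if abs(h - target) < tol:
            break
        if h > target:   # distribution too flat -> sharpen (increase beta)
            lo = beta
            beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
        else:            # too peaked -> flatten (decrease beta)
            hi = beta
            beta = (beta + lo) / 2.0
    return p, beta
```

The number of neighbors k must exceed the perplexity for the search to succeed; FIt-SNE performs this calibration for every point, over the neighbors returned by Annoy or the vantage-point tree.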

Features

Additionally, this implementation includes the following features:

  • Early exaggeration: In Linderman and Steinerberger (2018), we showed that appropriately choosing the early exaggeration coefficient can lead to improved embedding of swiss rolls and other synthetic datasets. Early exaggeration is built into all t-SNE implementations; here we highlight its importance as a tunable parameter.

  • Late exaggeration: Increasing the exaggeration coefficient late in the optimization process can improve separation of the clusters. Kobak and Berens (2019) suggest starting late exaggeration immediately following early exaggeration.

  • Initialization: Custom initialization can be provided from Python/Matlab/R. As suggested by Kobak and Berens (2019), initializing t-SNE with the first two principal components (scaled to have standard deviation 0.0001) results in an embedding which often preserves the global structure more effectively than the default random initialization. See that paper for other initialization tricks.

  • Variable degrees of freedom: Kobak et al. (2019) show that decreasing the degrees of freedom (df) of the t-distribution (resulting in heavier tails) reveals fine structure that is not visible in standard t-SNE.

  • All optimization parameters can be controlled from Python. For example, Belkina et al. (2019) highlight the importance of increasing the learning rate when embedding large datasets.

Implementations

There are (at least) three options for using FIt-SNE in Python:

  • This PyPI package (see installation instructions below), which is a Cython wrapper for FIt-SNE and was written by Gioele La Manno. This package is not directly updated; rather, the C++ code from FIt-SNE is ported here occasionally. The current version of the C++ code corresponds to FIt-SNE 1.1.0 (commit 714e12e).

  • The Python wrapper available from the FIt-SNE GitHub repository. It is not on PyPI; instead, it wraps the compiled FIt-SNE binary.

  • OpenTSNE, which is a pure Python implementation of FIt-SNE, also available on PyPI.

Installation

The only prerequisite is FFTW. FFTW and fitsne can be installed as follows:

conda config --add channels conda-forge  # if not already in your channels; needed for fftw
conda install cython numpy fftw
pip install fitsne

And you’re good to go!

Bug reports, feature requests, etc.

If you have any problems with this package, please open an issue on the GitHub repository.

References

If you use our software, please cite:

George C. Linderman, Manas Rachh, Jeremy G. Hoskins, Stefan Steinerberger, Yuval Kluger. (2019). Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods. (link)

Our implementation is derived from the Barnes-Hut implementation:

Laurens van der Maaten (2014). Accelerating t-SNE using tree-based algorithms. Journal of Machine Learning Research, 15(1):3221–3245. (link)
