Fast Laplacian Eigenmaps in Python

Open-source Laplacian Eigenmaps for dimensionality reduction of large data in Python. Comes with a wrapper for NMSlib to compute approximate nearest neighbors, and runs several times faster than the default scikit-learn implementation.
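
For context, Laplacian Eigenmaps embeds data by building a k-nearest-neighbors affinity graph, forming its graph Laplacian, and keeping the eigenvectors associated with the smallest non-trivial eigenvalues. The snippet below is a minimal sketch of that standard pipeline, using exact neighbors and only scikit-learn/SciPy; it is an illustration of the idea, not this package's internal code, which instead relies on NMSlib's approximate nearest neighbors and the similarity kernels described under Parameters.

# Minimal Laplacian Eigenmaps sketch (illustration only; not fastlapmap's internals)
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps_sketch(X, n_eigs=2, k=10):
    # Symmetrized kNN affinity graph (exact neighbors; fastlapmap uses NMSlib's ANN instead)
    W = kneighbors_graph(X, n_neighbors=k, mode='connectivity', include_self=False)
    W = 0.5 * (W + W.T)
    # Symmetric normalized graph Laplacian: L = I - D^(-1/2) W D^(-1/2)
    deg = np.asarray(W.sum(axis=1)).ravel()
    D_inv_sqrt = diags(1.0 / np.sqrt(deg))
    L = identity(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    # Smallest eigenpairs (which='SM' is simplest here; shift-invert is typically used for speed)
    evals, evecs = eigsh(L, k=n_eigs + 1, which='SM')
    # Sort ascending and discard the first, near-constant eigenvector
    order = np.argsort(evals)
    return evecs[:, order[1:n_eigs + 1]]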

Installation

You'll need NMSlib to use this package properly. Installing it without binaries is recommended if your CPU supports advanced instructions (it probably does):

pip3 install --no-binary :all: nmslib

Along with requirements:

pip3 install numpy pandas scipy scikit-learn 

Then you can install this package with pip:

pip3 install fastlapmap

Usage

See the following example with the handwritten digits data. Here, I visually compare results from the scikit-learn Laplacian Eigenmaps implementation to those from my implementation.

Note that this implementation contains two main similarity-learning algorithms, both shown here: anisotropic diffusion maps ('diffusion') and fuzzy simplicial sets ('fuzzy'). A continuous k-nearest-neighbors option ('cknn') is also available; see the Parameters section.

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import SpectralEmbedding
from fastlapmap import LapEigenmap

# Load some data
from sklearn.datasets import load_digits
digits = load_digits()
data = digits.data

# Define hyperparameters
N_EIGS = 2
N_NEIGHBORS = 10
N_JOBS = 10

sk_se = SpectralEmbedding(n_components=N_EIGS, n_neighbors=N_NEIGHBORS, n_jobs=N_JOBS).fit_transform(data)

flapmap_diff = LapEigenmap(data, n_eigs=N_EIGS, similarity='diffusion', norm_laplacian=True, k=N_NEIGHBORS, n_jobs=N_JOBS)
flapmap_fuzzy = LapEigenmap(data, n_eigs=N_EIGS, similarity='fuzzy', norm_laplacian=True, k=N_NEIGHBORS, n_jobs=N_JOBS)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle('Handwritten digits data:', fontsize=24)
ax1.scatter(sk_se[:, 0], sk_se[:, 1], c=digits.target, cmap='Spectral', s=5)
ax1.set_title('Sklearn\'s Laplacian Eigenmaps', fontsize=20)
ax2.scatter(flapmap_diff[:, 0], flapmap_diff[:, 1], c=digits.target, cmap='Spectral', s=5)
ax2.set_title('Fast Laplacian Eigenmaps with diffusion harmonics', fontsize=20)
ax3.scatter(flapmap_fuzzy[:, 0], flapmap_fuzzy[:, 1], c=digits.target, cmap='Spectral', s=5)
ax3.set_title('Fast Laplacian Eigenmaps with fuzzy simplicial sets', fontsize=20)
plt.show()

As we can see, results are nearly identical.

Parameters

data : numpy.ndarray, pandas.DataFrame or scipy.sparse.csr_matrix. Input data. By default, NMSlib is used for approximate nearest neighbors, which works on both numpy arrays and sparse matrices (the faster and cheaper option). Alternatively, users can provide a precomputed affinity matrix by setting metric='precomputed' (see the sketch after this parameter list).

n_eigs : int (optional, default 10). Number of eigenvectors to decompose the graph Laplacian into.

k : int (optional, default 10). Number of k-nearest-neighbors to use when computing affinities.

metric : str (optional, default 'euclidean'). Which metric to use when computing neighborhood distances. Accepted metrics include:
- 'sqeuclidean'
- 'euclidean'
- 'l1'
- 'lp' (requires setting the parameter p; equivalent to the Minkowski distance)
- 'cosine'
- 'angular'
- 'negdotprod'
- 'levenshtein'
- 'hamming'
- 'jaccard'
- 'jansen-shan'

M : int (optional, default 10). Defines the maximum number of neighbors in the zero and above-zero layers during HNSW (Hierarchical Navigable Small World) graph construction. Note that the actual default maximum number of neighbors for the zero layer is 2*M. A reasonable range for this parameter is 5-100. For more information on HNSW, see https://arxiv.org/abs/1603.09320. HNSW is implemented in Python via NMSlib; see https://github.com/nmslib/nmslib for details.

efC : int (optional, default 20). An HNSW parameter. Increasing this value improves the quality of the constructed graph and leads to higher search accuracy, but also to longer indexing times. A reasonable range for this parameter is 10-500.

efS : int (optional, default 100). An HNSW parameter. Similar to efC, increasing this value improves recall at the expense of longer retrieval time. A reasonable range for this parameter is 10-500.

n_jobs : int (optional, default 1). How many threads to use in the approximate-nearest-neighbors computation.

similarity : str (optional, default 'diffusion'). Which algorithm to use for similarity learning. Options are diffusion harmonics ('diffusion'), fuzzy simplicial sets ('fuzzy') and continuous k-nearest-neighbors ('cknn').

norm_laplacian : bool (optional, default True). Whether to renormalize the graph Laplacian.

return_evals : bool (optional, default False). Whether to also return the eigenvalues; if True, the result is a tuple of (eigenvectors, eigenvalues).

verbose : bool (optional, default False). Whether to report information on the current progress of the algorithm.
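
To make the options above concrete, here is a hedged usage sketch combining the HNSW knobs, metric='precomputed' and return_evals. The affinity construction with scikit-learn's kneighbors_graph is my own choice for illustration and is not part of this package; I also assume the precomputed affinity matrix is simply passed in place of data, and that return_evals=True yields an (eigenvectors, eigenvalues) tuple as described above.

# Hedged usage sketch for the parameters above (assumptions noted in the lead-in)
from scipy.sparse import csr_matrix
from sklearn.datasets import load_digits
from sklearn.neighbors import kneighbors_graph
from fastlapmap import LapEigenmap

data = load_digits().data

# Tuning the HNSW accuracy/speed trade-off (M, efC, efS ranges as listed above)
emb = LapEigenmap(data, n_eigs=2, k=10, M=30, efC=100, efS=100,
                  similarity='fuzzy', n_jobs=4)

# Precomputed affinity: a symmetric sparse kNN graph built externally
# (any symmetric, non-negative affinity matrix should work here)
W = kneighbors_graph(data, n_neighbors=10, mode='connectivity', include_self=False)
W = csr_matrix(0.5 * (W + W.T))
# Assumption: the affinity matrix is passed in place of data when metric='precomputed'
evecs, evals = LapEigenmap(W, n_eigs=2, metric='precomputed', return_evals=True)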

Benchmark

See the runtime comparison between this implementation and scikit-learn:

# Load the benchmark function
from fastlapmap.benchmark import runtime_benchmark

# Load data
from sklearn.datasets import load_digits
digits = load_digits()
data = digits.data

# Define hyperparameters
N_EIGS = 2
N_NEIGHBORS = 10
N_JOBS = 10
SIZES = [1000, 5000, 10000, 25000, 50000, 100000]
N_RUNS = 3

runtime_benchmark(data,
                  n_eigs=N_EIGS,
                  n_neighbors=N_NEIGHBORS,
                  n_jobs=N_JOBS,
                  sizes=SIZES,
                  n_runs=N_RUNS)

As you can see, the diffusion harmonics model is the fastest, followed closely by fuzzy simplicial sets. Both outperform the default scikit-learn implementation and scale linearly with sample size.
