Fast Laplacian Eigenmaps for dimensionality reduction.

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Fast Laplacian Eigenmaps in python

Open-source Laplacian Eigenmaps for dimensionality reduction of large data in python. Comes with an wrapper for NMSlib to compute approximate-nearest-neighbors. Performs several times faster than the default scikit-learn implementation.

Installation

You'll need NMSlib for using this package properly. Installing it with no binaries is recommended if your CPU supports advanced instructions (it problably does):

pip3 install --no-binary :all: nmslib

Along with requirements:

pip3 install numpy pandas scipy scikit-learn

Then you can install this package with pip:

pip3 install fastlapmap

Usage

See the following example with the handwritten digits data. Here, I visually compare results from the scikit-learn Laplacian Eigenmaps implementation to those from my implementation.

Note that this implementation contains two similarity-learning algorithms: anisotropic diffusion maps and fuzzy simplicial sets.

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import SpectralEmbedding
from fastlapmap import LapEigenmap

# Load some data
from sklearn.datasets import load_digits
digits = load_digits()
data = digits.data

# Define hyperparameters
N_EIGS=2
N_NEIGHBORS=10
N_JOBS=10

sk_se = SpectralEmbedding(n_components=N_EIGS, n_neighbors=N_NEIGHBORS, n_jobs=N_JOBS).fit_transform(data)

flapmap_diff = LapEigenmap(data, n_eigs=2, similarity='diffusion', norm_laplacian=True, k=N_NEIGHBORS, n_jobs=N_JOBS)
flapmap_fuzzy = LapEigenmap(data, n_eigs=2, similarity='fuzzy', norm_laplacian=True, k=N_NEIGHBORS, n_jobs=N_JOBS)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle('Handwritten digits data:', fontsize=24)
ax1.scatter(sk_se[:, 0], sk_se[:, 1], c=digits.target, cmap='Spectral', s=5)
ax1.set_title('Sklearn\'s Laplacian Eigenmaps', fontsize=20)
ax2.scatter(flapmap_diff[:, 0], flapmap_diff[:, 1], c=digits.target, cmap='Spectral', s=5)
ax2.set_title('Fast Laplacian Eigenmaps with diffusion harmonics', fontsize=20)
ax3.scatter(flapmap_fuzzy[:, 0], flapmap_fuzzy[:, 1], c=digits.target, cmap='Spectral', s=5)
ax3.set_title('Fast Laplacian Eigenmaps with fuzzy simplicial sets', fontsize=20)
plt.show()

As we can see, results are nearly identical.

Parameters

data : numpy.ndarray, pandas.DataFrame or scipy.sparse.csr_matrix Input data. By default will use nmslib for approximate nearest-neighbors, which works both on numpy arrays and sparse matrices (faster and cheaper option). Alternatively, users can provide a precomputed affinity matrix by stating metric='precomputed'.

n_eigs : int (optional, default 10). Number of eigenvectors to decompose the graph Laplacian into.

k : int (optional, default 10). Number of k-nearest-neighbors to use when computing affinities.

metric : str (optional, default 'euclidean'). which metric to use when computing neighborhood distances. Defaults to 'euclidean'. Accepted metrics include: -'sqeuclidean' -'euclidean' -'l1' -'lp' - requires setting the parameter p - equivalent to minkowski distance -'cosine' -'angular' -'negdotprod' -'levenshtein' -'hamming' -'jaccard' -'jansen-shan'

M : int (optional, default 10). defines the maximum number of neighbors in the zero and above-zero layers during HSNW (Hierarchical Navigable Small World Graph). However, the actual default maximum number of neighbors for the zero layer is 2*M. A reasonable range for this parameter is 5-100. For more information on HSNW, please check https://arxiv.org/abs/1603.09320. HSNW is implemented in python via NMSlib. Please check more about NMSlib at https://github.com/nmslib/nmslib.

efC : int (optional, default 20). A 'hnsw' parameter. Increasing this value improves the quality of a constructed graph and leads to higher accuracy of search. However this also leads to longer indexing times. A reasonable range for this parameter is 10-500.

efS : int (optional, default 100). A 'hnsw' parameter. Similarly to efC, increasing this value improves recall at the expense of longer retrieval time. A reasonable range for this parameter is 10-500.

n_jobs : int (optional, default 1) How many threads to use in approximate-nearest-neighbors computation.

similarity : str (optional, default 'diffusion'). Which algorithm to use for similarity learning. Options are diffusion harmonics ('diffusion') , fuzzy simplicial sets ('fuzzy') and continuous k-nearest-neighbors ('cknn').

norm_laplacian : bool (optional, default True). Whether to renormalize the graph Laplacian.

return_evals : bool (optional, default False). Whether to also return the eigenvalues in a tuple of eigenvectors, eigenvalues. Defaults to False.

verbose : bool (optional, default False). Whether to report information on the current progress of the algorithm.

Benchmark

See the runtime comparison between this implementation and scikit-learn:

## Load benchmark function:
from fastlapmap.benchmark import runtime_benchmark

# Load data
from sklearn.datasets import load_digits
digits = load_digits()
data = digits.data

# Define hyperparameters
N_EIGS = 2
N_NEIGHBORS = 10
N_JOBS = 10
SIZES = [1000, 5000, 10000, 25000, 50000, 100000]
N_RUNS = 3

runtime_benchmark(data,
                  n_eigs=N_EIGS,
                  n_neighbors=N_NEIGHBORS,
                  n_jobs=N_JOBS,
                  sizes=SIZES,
                  n_runs=N_RUNS)

As you can see, the diffusion harmoics model is the fastest, followed closely by fuzzy simplicial sets. Both outperform scikit-learn default implementation and escalate linearly with sample size.

Project details

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

This version

0.1.2

Oct 27, 2021

0.1.1

Oct 24, 2021

0.1

Oct 24, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastlapmap-0.1.2.tar.gz (238.0 kB view details)

Uploaded Oct 27, 2021 Source

Built Distributions

fastlapmap-0.1.2-py3.8.egg (37.9 kB view details)

Uploaded Oct 27, 2021 Egg

fastlapmap-0.1.2-py3-none-any.whl (20.3 kB view details)

Uploaded Oct 27, 2021 Python 3

File details

Details for the file fastlapmap-0.1.2.tar.gz.

File metadata

Download URL: fastlapmap-0.1.2.tar.gz
Upload date: Oct 27, 2021
Size: 238.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.4.2 requests/2.25.1 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.8.5

File hashes

Hashes for fastlapmap-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`f1e731e4473079e73631cf257f3451a42f64eabec8feba1f010f66a908c6e05f`
MD5	`502a856b67bd7035030e0a44e7910662`
BLAKE2b-256	`859103728164fd2d501bac13ff3f0724bc41a039edc33b8316561f3e065ac919`

See more details on using hashes here.

File details

Details for the file fastlapmap-0.1.2-py3.8.egg.

File metadata

Download URL: fastlapmap-0.1.2-py3.8.egg
Upload date: Oct 27, 2021
Size: 37.9 kB
Tags: Egg
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.4.2 requests/2.25.1 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.8.5

File hashes

Hashes for fastlapmap-0.1.2-py3.8.egg
Algorithm	Hash digest
SHA256	`6a4b67c9f0371bc1c326a92ee0433e66c29f3b6d898696c01be5440062e33098`
MD5	`910e37028c2bde0350095401c414b309`
BLAKE2b-256	`fb58d5b9eecb236fbb1e0855d3173c77a3c1d486cde69eeebf6402efa89bcd48`

See more details on using hashes here.

File details

Details for the file fastlapmap-0.1.2-py3-none-any.whl.

File metadata

Download URL: fastlapmap-0.1.2-py3-none-any.whl
Upload date: Oct 27, 2021
Size: 20.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.4.2 requests/2.25.1 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.8.5

File hashes

Hashes for fastlapmap-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`03c3527abb2e605066923738e879921aad4eea0872906534a5c2af95a435cc35`
MD5	`a293a4b62c135878fab880d2e1077b86`
BLAKE2b-256	`2319cc7278c95f7dd302a7983e2f88e3d28e563ba857e2e8bb7475b17c156ccc`

See more details on using hashes here.

fastlapmap 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Fast Laplacian Eigenmaps in python

Installation

Usage

Parameters

Benchmark

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distributions

File details

File metadata

File hashes

File details

File metadata

File hashes

File details

File metadata

File hashes