Fast Laplacian Eigenmaps for dimensionality reduction.
Project description
Fast Laplacian Eigenmaps in Python
Open-source Laplacian Eigenmaps for dimensionality reduction of large data in Python. Comes with a wrapper for NMSlib to compute approximate nearest neighbors, and runs several times faster than the default scikit-learn implementation.
Installation
You'll need NMSlib to use this package properly. Installing it with no binaries is recommended if your CPU supports advanced instructions (it probably does):
pip3 install --no-binary :all: nmslib
Along with the other requirements:
pip3 install numpy pandas scipy scikit-learn
Then you can install this package with pip:
pip3 install fastlapmap
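To check that NMSlib built correctly, you can run a quick sanity check (a minimal sketch; nmslib.init is NMSlib's standard entry point):
# Sanity check: NMSlib should import and initialize an HNSW index without errors
import nmslib
index = nmslib.init(method='hnsw', space='l2')
print('NMSlib is ready')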
Usage
See the following example with the handwritten digits data. Here, I visually compare results from the scikit-learn Laplacian Eigenmaps implementation to those from my implementation.
Note that this implementation contains two main similarity-learning algorithms, anisotropic diffusion maps ('diffusion') and fuzzy simplicial sets ('fuzzy'), which are compared below; a third option, continuous k-nearest-neighbors ('cknn'), is described under the similarity parameter.
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import SpectralEmbedding
from fastlapmap import LapEigenmap
# Load some data
from sklearn.datasets import load_digits
digits = load_digits()
data = digits.data
# Define hyperparameters
N_EIGS=2
N_NEIGHBORS=10
N_JOBS=10
# Embed with scikit-learn's Laplacian Eigenmaps (SpectralEmbedding)
sk_se = SpectralEmbedding(n_components=N_EIGS, n_neighbors=N_NEIGHBORS, n_jobs=N_JOBS).fit_transform(data)
# Embed with Fast Laplacian Eigenmaps, once per similarity-learning algorithm
flapmap_diff = LapEigenmap(data, n_eigs=N_EIGS, similarity='diffusion', norm_laplacian=True, k=N_NEIGHBORS, n_jobs=N_JOBS)
flapmap_fuzzy = LapEigenmap(data, n_eigs=N_EIGS, similarity='fuzzy', norm_laplacian=True, k=N_NEIGHBORS, n_jobs=N_JOBS)
# Plot the three embeddings side by side
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle('Handwritten digits data:', fontsize=24)
ax1.scatter(sk_se[:, 0], sk_se[:, 1], c=digits.target, cmap='Spectral', s=5)
ax1.set_title('Sklearn\'s Laplacian Eigenmaps', fontsize=20)
ax2.scatter(flapmap_diff[:, 0], flapmap_diff[:, 1], c=digits.target, cmap='Spectral', s=5)
ax2.set_title('Fast Laplacian Eigenmaps with diffusion harmonics', fontsize=20)
ax3.scatter(flapmap_fuzzy[:, 0], flapmap_fuzzy[:, 1], c=digits.target, cmap='Spectral', s=5)
ax3.set_title('Fast Laplacian Eigenmaps with fuzzy simplicial sets', fontsize=20)
plt.show()
As we can see, results are nearly identical.
Parameters
data
: numpy.ndarray, pandas.DataFrame or scipy.sparse.csr_matrix.
Input data. By default, nmslib is used for approximate nearest neighbors, which works on both numpy arrays and sparse matrices (the faster and cheaper option). Alternatively, users can provide a precomputed affinity matrix by setting metric='precomputed'.
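For example, here is a minimal sketch of the precomputed path (it assumes the documented metric='precomputed' option accepts any symmetric sparse affinity matrix; the k-NN graph below is just one way to build one):
import numpy as np
from sklearn.neighbors import kneighbors_graph
from fastlapmap import LapEigenmap

# Build a symmetric k-nearest-neighbors affinity matrix with scikit-learn
X = np.random.rand(500, 30)
A = kneighbors_graph(X, n_neighbors=10, mode='connectivity', include_self=False)
A = A.maximum(A.T)  # symmetrize the adjacency graph

# Hand the precomputed affinities directly to LapEigenmap
embedding = LapEigenmap(A, n_eigs=2, metric='precomputed')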
n_eigs
: int (optional, default 10).
Number of eigenvectors to decompose the graph Laplacian into.
k
: int (optional, default 10).
Number of k-nearest-neighbors to use when computing affinities.
metric
: str (optional, default 'euclidean').
Which metric to use when computing neighborhood distances.
Accepted metrics include:
- 'sqeuclidean'
- 'euclidean'
- 'l1'
- 'lp' : Minkowski distance; requires setting the parameter p (see the example after this list)
- 'cosine'
- 'angular'
- 'negdotprod'
- 'levenshtein'
- 'hamming'
- 'jaccard'
- 'jansen-shan'
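As referenced in the list above, here is a minimal sketch of the 'lp' metric (it assumes the p keyword is passed directly to LapEigenmap, as the entry above suggests):
from sklearn.datasets import load_digits
from fastlapmap import LapEigenmap

data = load_digits().data
# Minkowski (Lp) distance with p=3; per the docs, 'lp' requires setting p
embedding_lp = LapEigenmap(data, n_eigs=2, metric='lp', p=3)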
M
: int (optional, default 10).
Defines the maximum number of neighbors in the zero and above-zero layers during HNSW
(Hierarchical Navigable Small World Graph) construction. However, the actual default maximum number
of neighbors for the zero layer is 2*M. A reasonable range for this parameter
is 5-100. For more information on HNSW, please check https://arxiv.org/abs/1603.09320.
HNSW is implemented in Python via NMSlib; for more about NMSlib, see https://github.com/nmslib/nmslib.
efC
: int (optional, default 20).
A 'hnsw' parameter. Increasing this value improves the quality of a constructed graph
and leads to higher search accuracy. However, this also leads to longer indexing times.
A reasonable range for this parameter is 10-500.
efS
: int (optional, default 100).
A 'hnsw' parameter. Similarly to efC, increasing this value improves recall at the
expense of longer retrieval time. A reasonable range for this parameter is 10-500.
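A hedged example of tuning M, efC and efS together (the values are illustrative picks within the ranges suggested above, not recommended settings):
from sklearn.datasets import load_digits
from fastlapmap import LapEigenmap

data = load_digits().data
# Larger M/efC/efS trade longer indexing and search time for a more accurate k-NN graph
embedding = LapEigenmap(data, n_eigs=2, k=15, M=30, efC=100, efS=150, n_jobs=4)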
n_jobs
: int (optional, default 1).
How many threads to use in approximate-nearest-neighbors computation.
similarity
: str (optional, default 'diffusion').
Which algorithm to use for similarity learning. Options are diffusion harmonics ('diffusion'),
fuzzy simplicial sets ('fuzzy') and continuous k-nearest-neighbors ('cknn').
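For instance (a short sketch; 'cknn' is listed as an option but is not shown in the usage example above):
from sklearn.datasets import load_digits
from fastlapmap import LapEigenmap

data = load_digits().data
# The three documented similarity-learning options
emb_diffusion = LapEigenmap(data, n_eigs=2, similarity='diffusion')
emb_fuzzy = LapEigenmap(data, n_eigs=2, similarity='fuzzy')
emb_cknn = LapEigenmap(data, n_eigs=2, similarity='cknn')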
norm_laplacian
: bool (optional, default True).
Whether to normalize the graph Laplacian.
return_evals
: bool (optional, default False).
Whether to also return the eigenvalues, as a tuple (eigenvectors, eigenvalues).
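For example (a minimal sketch of the documented return behavior):
from sklearn.datasets import load_digits
from fastlapmap import LapEigenmap

data = load_digits().data
# With return_evals=True the call returns (eigenvectors, eigenvalues)
eigenvectors, eigenvalues = LapEigenmap(data, n_eigs=2, return_evals=True)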
verbose
: bool (optional, default False).
Whether to report information on the current progress of the algorithm.
Benchmark
See the runtime comparison between this implementation and scikit-learn:
# Load the benchmark function
from fastlapmap.benchmark import runtime_benchmark
# Load data
from sklearn.datasets import load_digits
digits = load_digits()
data = digits.data
# Define hyperparameters
N_EIGS = 2
N_NEIGHBORS = 10
N_JOBS = 10
SIZES = [1000, 5000, 10000, 25000, 50000, 100000]
N_RUNS = 3
runtime_benchmark(data,
n_eigs=N_EIGS,
n_neighbors=N_NEIGHBORS,
n_jobs=N_JOBS,
sizes=SIZES,
n_runs=N_RUNS)
As you can see, the diffusion harmonics model is the fastest, followed closely by fuzzy simplicial sets. Both outperform scikit-learn's default implementation and scale linearly with sample size.
Download files
File details
Details for the file fastlapmap-0.1.2.tar.gz.
File metadata
- Download URL: fastlapmap-0.1.2.tar.gz
- Upload date:
- Size: 238.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.4.2 requests/2.25.1 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | f1e731e4473079e73631cf257f3451a42f64eabec8feba1f010f66a908c6e05f
MD5 | 502a856b67bd7035030e0a44e7910662
BLAKE2b-256 | 859103728164fd2d501bac13ff3f0724bc41a039edc33b8316561f3e065ac919
File details
Details for the file fastlapmap-0.1.2-py3.8.egg.
File metadata
- Download URL: fastlapmap-0.1.2-py3.8.egg
- Upload date:
- Size: 37.9 kB
- Tags: Egg
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.4.2 requests/2.25.1 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6a4b67c9f0371bc1c326a92ee0433e66c29f3b6d898696c01be5440062e33098
MD5 | 910e37028c2bde0350095401c414b309
BLAKE2b-256 | fb58d5b9eecb236fbb1e0855d3173c77a3c1d486cde69eeebf6402efa89bcd48
File details
Details for the file fastlapmap-0.1.2-py3-none-any.whl.
File metadata
- Download URL: fastlapmap-0.1.2-py3-none-any.whl
- Upload date:
- Size: 20.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.4.2 requests/2.25.1 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 03c3527abb2e605066923738e879921aad4eea0872906534a5c2af95a435cc35
MD5 | a293a4b62c135878fab880d2e1077b86
BLAKE2b-256 | 2319cc7278c95f7dd302a7983e2f88e3d28e563ba857e2e8bb7475b17c156ccc