
Python Implementation of the Glimmer algorithm for multidimensional scaling


PyGlimmerMDS

Multidimensional scaling (MDS) for large data sets - a python implementation of the Glimmer algorithm.
[Glimmer: Multilevel MDS on the GPU - 2009 - IEEE TVCG - Ingram, Munzner, Olano]

Glimmer performs dimensionality reduction on high-dimensional data sets with many instances. It avoids the quadratic runtime of naive MDS implementations by using a multilevel (coarse-to-fine) approach. This implementation has an optional GPU mode, but it also gives a considerable speedup on the CPU alone, which makes MDS on large data sets feasible.
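To illustrate what "multilevel (coarse to fine)" means, here is a minimal sketch of one way to build a hierarchy of nested random subsets, in the spirit of the cited paper. It assumes that each level is `decimation_factor` times smaller than the next finer one; `build_levels` and `min_level_size` are hypothetical names, and PyGlimmerMDS may choose its level sizes differently.

```python
import numpy as np


def build_levels(n, decimation_factor=2, min_level_size=1000, rng=None):
    """Return index arrays for nested levels, coarsest first (hypothetical scheme).

    One random permutation is drawn; each level uses a prefix of it, so every
    coarse level is a subset of the next finer one. Level sizes shrink by
    `decimation_factor` until the next level would fall below `min_level_size`.
    """
    rng = np.random.default_rng(rng)
    perm = rng.permutation(n)
    sizes = [n]
    while int(sizes[-1] // decimation_factor) >= min_level_size:
        sizes.append(int(sizes[-1] // decimation_factor))
    # coarse to fine: small prefixes first, the full permutation last
    return [perm[:s] for s in reversed(sizes)]


levels = build_levels(38400, decimation_factor=2, min_level_size=1000, rng=0)
print([len(level) for level in levels])
```

The coarse levels are cheap to lay out, and each finer level starts from the positions found on the coarser one instead of from scratch.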

Glimmer is a metric MDS method and uses the Euclidean distance in the high-dimensional space as the dissimilarity measure. It is not classical MDS, which has a linear projection as its solution; instead, it solves the following optimization problem:

$$\underset{y_1,\dots,y_n}{\mathrm{argmin}} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \Big(\lVert x_i-x_j \rVert - \lVert y_i-y_j \rVert\Big)^2 \quad \text{where } x_i \in \mathbb{R}^D,\ y_i \in \mathbb{R}^d,\ d \ll D$$
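As a concrete reading of the objective, the sketch below evaluates it directly with NumPy. It builds full distance matrices, which is exactly the quadratic cost a naive implementation pays. `raw_stress` is a hypothetical helper, and the value PyGlimmerMDS reports as `mds.stress` may be normalized differently.

```python
import numpy as np


def raw_stress(x, y):
    """Sum over pairs i < j of (||x_i - x_j|| - ||y_i - y_j||)^2.

    Builds n x n distance matrices (and an n x n x D difference tensor),
    so time and memory grow quadratically with the number of points n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # high-dim distances
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)  # low-dim distances
    i, j = np.triu_indices(len(x), k=1)  # each unordered pair once (j > i)
    return float(np.sum((dx[i, j] - dy[i, j]) ** 2))


# two points at distance 5 in 2-D, placed at distance 2 in 1-D: (5 - 2)^2 = 9
print(raw_stress([[0, 0], [3, 4]], [[0], [2]]))
```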

Installation

PyGlimmerMDS is available on PyPI and can be installed with pip.

pip install PyGlimmerMDS

or, to install a specific commit, use

pip install git+https://github.com/hageldave/PyGlimmerMDS@<commit_hash>

How to use

Very briefly

Performing Glimmer on a data set works like this:

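from pyglimmermds import Glimmer
import numpy as np
rng = np.random.default_rng(seed=0xBA0BAB)
mds = Glimmer(decimation_factor=2, stress_ratio_tol=1-1e-5, rng=rng)
projection = mds.fit_transform(data)  # data: (n, D) array; alternative: projection, stress = execute_glimmer(data)
print(f"final stress={mds.stress}")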

Enable GPU acceleration

The GPU implementation is based on CuPy, an optional dependency, and is only available if a CuPy package is installed. Which CuPy package to install depends on the available hardware and driver (e.g. cupy-cuda12x, cupy-cuda13x, or cupy-rocm-7-0 at the time of writing). GPU acceleration can then be enabled as follows:

mds = Glimmer(gpu=True, ....)
# or
projection, stress = execute_glimmer_gpu(data, ....)
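Because CuPy is optional, a script may want to fall back to the CPU when no usable CuPy installation is present. The sketch below is one way to check this; `cupy_available` is a hypothetical helper, not part of PyGlimmerMDS, and it only tests whether `cupy` imports and reports at least one device.

```python
import importlib.util


def cupy_available() -> bool:
    """True if a CuPy package is installed and sees at least one GPU (hypothetical helper)."""
    if importlib.util.find_spec("cupy") is None:
        return False  # no CuPy package installed
    try:
        import cupy

        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # CuPy is installed but the driver or runtime is unusable
        return False


print(cupy_available())
```

The result could then be passed on, e.g. `mds = Glimmer(gpu=cupy_available(), decimation_factor=2)`, so the same script runs on machines with and without a GPU.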

Complete example

This example jitters the Iris data set to produce a data set of 38,400 points (150 points doubled eight times) and then performs Glimmer on it.

from pyglimmermds import Glimmer, execute_glimmer
from sklearn import preprocessing as prep
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0xBA0BAB)

# get iris data
dataset = datasets.load_iris()
data = dataset.data
labels = dataset.target
# duplicate data with added noise
for _ in range(8):
  data = np.vstack((data,data+(rng.random((data.shape[0], data.shape[1]))*0.2-.1)))
  labels = np.append(labels,labels)
print(data.shape)
print(labels.shape)

# perform MDS
data = prep.StandardScaler().fit_transform(data)
mds = Glimmer(decimation_factor=2, stress_ratio_tol=1-1e-5, rng=rng)
projection = mds.fit_transform(data) # alternative: projection, stress = execute_glimmer(data)
print(f"final stress={mds.stress}")

# show scatter plot
fig, ax = plt.subplots()
scatter = ax.scatter(projection[:, 0], projection[:, 1], c=labels, s=0.02)
ax.axis('equal')
plt.show()

[Image: glimmer_iris]
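For 38,400 points, evaluating the objective over all pairs means roughly 7.4 × 10⁸ distances, which is costly. A cheaper, independent sanity check is to estimate a normalized stress on a random subsample. This is a sketch: `sampled_normalized_stress` is a hypothetical helper, and its normalization (dividing by the sum of squared high-dimensional distances) is a common convention, not necessarily the one behind `mds.stress`.

```python
import numpy as np


def sampled_normalized_stress(data, projection, n_samples=1000, rng=None):
    """Estimate sqrt(sum (dx - dy)^2 / sum dx^2) on a random subset of points."""
    rng = np.random.default_rng(rng)
    n = data.shape[0]
    idx = rng.choice(n, size=min(n_samples, n), replace=False)
    x, y = data[idx], projection[idx]
    dx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    iu = np.triu_indices(len(idx), k=1)  # each pair once
    dx, dy = dx[iu], dy[iu]
    return float(np.sqrt(np.sum((dx - dy) ** 2) / np.sum(dx ** 2)))
```

After the complete example, `sampled_normalized_stress(data, projection, rng=0)` gives a value near 0 for a layout that preserves distances well; repeating it with different seeds shows how much the estimate varies.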

The following video shows how the layout develops per level and iteration:

https://github.com/user-attachments/assets/aa9f7a8c-1c03-46a3-8ee1-19b3d2d4033e
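What happens within a single level can be illustrated with a simplified stand-in. As we understand the cited paper, Glimmer's per-level update is a stochastic force-directed scheme; the sketch below swaps in full-batch gradient descent on the objective above. With step size 1/(2n), each step equals (up to a translation) the SMACOF Guttman update, so the stress never increases. Reading `stress_ratio_tol` as a threshold on the ratio of successive stress values is an assumption, and `layout_single_level` is a hypothetical name.

```python
import numpy as np


def pairwise_dist(z):
    """Full n x n Euclidean distance matrix."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def layout_single_level(x, y0, stress_ratio_tol=1 - 1e-5, max_iter=500):
    """Gradient descent on the raw stress of one level (simplified, not Glimmer's update).

    Assumes no two points of the layout coincide (division by d_ij).
    Stops once stress_new / stress_old exceeds stress_ratio_tol.
    """
    dx = pairwise_dist(np.asarray(x, dtype=float))
    y = np.array(y0, dtype=float)  # copy, so the caller's array is untouched
    n = len(y)
    stress = 0.5 * np.sum((pairwise_dist(y) - dx) ** 2)  # sum over pairs i < j
    for _ in range(max_iter):
        dy = pairwise_dist(y)
        np.fill_diagonal(dy, 1.0)  # avoid 0/0 on the diagonal
        coeff = 1.0 - dx / dy  # (d_ij - delta_ij) / d_ij
        np.fill_diagonal(coeff, 0.0)  # diagonal terms do not contribute
        # gradient_i = 2 * sum_j coeff_ij * (y_i - y_j)
        grad = 2.0 * (coeff.sum(axis=1)[:, None] * y - coeff @ y)
        y -= grad / (2.0 * n)
        new_stress = 0.5 * np.sum((pairwise_dist(y) - dx) ** 2)
        ratio = new_stress / stress if stress > 0 else 1.0
        stress = new_stress
        if ratio > stress_ratio_tol:  # relative improvement has become too small
            break
    return y, stress
```

In a multilevel scheme, a routine like this would run on the coarsest level first, and its result would seed the positions of the next finer level.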



Download files

Download the file for your platform.

Source Distribution

pyglimmermds-1.2.1.tar.gz (14.8 kB)

Uploaded Source

Built Distribution


pyglimmermds-1.2.1-py3-none-any.whl (19.9 kB)

Uploaded Python 3

File details

Details for the file pyglimmermds-1.2.1.tar.gz.

File metadata

  • Download URL: pyglimmermds-1.2.1.tar.gz
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for pyglimmermds-1.2.1.tar.gz
Algorithm Hash digest
SHA256 a412389ba13d1493dc9287501f4c13882204590ec0f82ad0675f0c9155c2f777
MD5 3798f62fde63699b0b0d2db84baee042
BLAKE2b-256 b97aebc23b800fd6c9fe2714c1c560a8aa468bd053c9065a357dc19444512002


File details

Details for the file pyglimmermds-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: pyglimmermds-1.2.1-py3-none-any.whl
  • Size: 19.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for pyglimmermds-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3faf397d08d1ea086e65df8576e723ac6b674351edad0bdfe373a174958e9b16
MD5 23e8ad6ba528c68b2be6fa2b95834396
BLAKE2b-256 0a7110a75c9642907c083c6ec0897314800b3f94a331fd6b092cc964a6db1c05
