Python Implementation of the Glimmer algorithm for multidimensional scaling

PyGlimmerMDS

Multidimensional scaling (MDS) for large data sets - a python implementation of the Glimmer algorithm.
[Glimmer: Multilevel MDS on the GPU - 2009 - IEEE TVCG - Ingram, Munzner, Olano]

Glimmer performs dimensionality reduction on high-dimensional data sets with many instances. It avoids the quadratic runtime of naive MDS implementations by employing a multilevel (coarse-to-fine) approach. This implementation has a GPU switch, but it also gives a considerable speedup on the CPU and makes MDS on large data sets feasible.
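To illustrate what a coarse-to-fine hierarchy can look like, here is a minimal sketch that builds nested random subsets of the data, shrinking by a decimation factor at each level. The function `nested_levels` and the `min_size=1000` threshold are assumptions made for illustration only; they are not part of the PyGlimmerMDS API and may differ from how the library builds its levels internally.

```python
import numpy as np

def nested_levels(n, decimation_factor=2, min_size=1000, rng=None):
    """Hypothetical helper: index arrays for nested random subsets, coarsest first."""
    if decimation_factor < 2:
        raise ValueError("decimation_factor must be at least 2")
    rng = np.random.default_rng() if rng is None else rng
    # Level sizes, finest first: n, n/f, n/f^2, ... while the next level stays >= min_size.
    sizes = [n]
    while sizes[-1] // decimation_factor >= min_size:
        sizes.append(sizes[-1] // decimation_factor)
    # Taking prefixes of one shared permutation makes every coarse level
    # a subset of the next finer level.
    order = rng.permutation(n)
    return [order[:s] for s in reversed(sizes)]

levels = nested_levels(38_400, decimation_factor=2, rng=np.random.default_rng(0))
print([len(idx) for idx in levels])  # [1200, 2400, 4800, 9600, 19200, 38400]
```

In a multilevel scheme of this kind, the layout would be computed on the coarsest subset first and then used as the starting point for each finer level, so the expensive full-size level begins close to a good solution.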

Glimmer is a metric MDS method that uses Euclidean distance in the high-dimensional space as the dissimilarity measure. It is not classical MDS, which has a linear projection solution. Instead, it solves the following optimization problem:

$$\underset{y_1,\dots,y_n}{\mathrm{argmin}} ~ \sum_{i=1}^n \sum_{j=i+1}^n \Big(\lVert x_i-x_j \rVert - \lVert y_i-y_j \rVert\Big)^2 \quad \text{where } x_i \in \mathbb{R}^D \text{ and } y_i \in \mathbb{R}^d,\ d \ll D$$
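The objective above can be evaluated directly for small data sets. The following is a minimal NumPy sketch; `raw_stress` is a hypothetical helper and not part of PyGlimmerMDS, and the value reported by `mds.stress` may be normalized differently.

```python
import numpy as np

def raw_stress(X, Y):
    """Sum over all pairs i < j of (||x_i - x_j|| - ||y_i - y_j||)^2."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Index pairs (i, j) with i < j, matching the double sum in the formula.
    i, j = np.triu_indices(len(X), k=1)
    d_high = np.linalg.norm(X[i] - X[j], axis=1)  # distances in R^D
    d_low = np.linalg.norm(Y[i] - Y[j], axis=1)   # distances in R^d
    return float(np.sum((d_high - d_low) ** 2))

# Three points in 3D, projected to 2D by dropping the last coordinate.
X = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 5.0]])
Y = X[:, :2]
print(raw_stress(X, Y))
```

This direct evaluation touches all n(n-1)/2 pairs, so its time and memory grow quadratically with the number of points. That is the cost Glimmer's multilevel approach is designed to avoid.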

Installation

PyGlimmerMDS is available on PyPI and can be installed with pip.

pip install PyGlimmerMDS

To install a specific commit instead, use:

pip install git+https://github.com/hageldave/PyGlimmerMDS@<commit_hash>
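To confirm which version ended up in the current environment, a small check with the standard library can help. This sketch assumes the distribution is registered under the name `PyGlimmerMDS`, as used in the pip command above; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name="PyGlimmerMDS"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version())
```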

How to use

Very briefly

Performing Glimmer on a data set works like this:

mds = Glimmer(decimation_factor=2, stress_ratio_tol=1-1e-5, rng=rng, gpu=False)
projection = mds.fit_transform(data) # alternative: projection, stress = execute_glimmer(data)
print(f"final stress={mds.stress}")

Complete example

The following example jitters the Iris data set to produce 38,400 points (150 points doubled 8 times) and then runs Glimmer on the result.

from pyglimmermds import Glimmer, execute_glimmer
from sklearn import preprocessing as prep
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0xBA0BAB)

# get iris data
dataset = datasets.load_iris()
data = dataset.data
labels = dataset.target
# duplicate data with added noise
for _ in range(8):
  data = np.vstack((data,data+(rng.random((data.shape[0], data.shape[1]))*0.2-.1)))
  labels = np.append(labels,labels)
print(data.shape)
print(labels.shape)

# perform MDS
data = prep.StandardScaler().fit_transform(data)
mds = Glimmer(decimation_factor=2, stress_ratio_tol=1-1e-5, rng=rng)
projection = mds.fit_transform(data) # alternative: projection, stress = execute_glimmer(data)
print(f"final stress={mds.stress}")

# show scatter plot
fig, ax = plt.subplots()
scatter = ax.scatter(projection[:, 0], projection[:, 1], c=labels, s=0.02)
ax.axis('equal')
plt.show()

[Figure: glimmer_iris, scatter plot of the resulting projection]

This video shows the layout evolving per level and iteration:

https://github.com/user-attachments/assets/aa9f7a8c-1c03-46a3-8ee1-19b3d2d4033e
