Python Implementation of UMATO (Uniform Manifold Approximation with Two-Phase Optimization)

UMATO

Uniform Manifold Approximation with Two-phase Optimization


Uniform Manifold Approximation with Two-phase Optimization (UMATO) is a dimensionality reduction technique that preserves both the global and the local structure of high-dimensional data. Most existing dimensionality reduction algorithms focus on only one of these two aspects, which can lead to overlooking or misinterpreting important global patterns in the data. Moreover, existing algorithms suffer from instability. To address these issues, UMATO uses a two-phase optimization: global optimization followed by local optimization. First, we obtain the global structure by selecting and optimizing the hub points. Next, we initialize and optimize the remaining points using the nearest neighbor graph. Our experiments with one synthetic and three real-world datasets show that UMATO can outperform baseline algorithms such as PCA, t-SNE, Isomap, UMAP, LAMP, and PaCMAP in terms of accuracy, stability, and scalability.

System Requirements

  • Python 3.9 or greater
  • scikit-learn
  • numpy
  • scipy
  • numba

Installation

UMATO is available via pip:

pip install umato

After installation, import the package and project a dataset:

import umato
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
emb = umato.UMATO(hub_num=50).fit_transform(X)

For detailed information on the algorithm and parameter usage, check the API documentation listed under the Wiki.

Findings

Detailed statistical data supporting our findings in accuracy and scalability analyses are presented in the figures below:

Figure 1: Accuracy Analysis between Dimensionality Reduction Techniques

The average scores that nine DR techniques obtain in the accuracy analysis. For each quality metric, DR techniques ranked first to fourth are highlighted in blue, with higher opacity assigned to the better techniques. Similarly, techniques ranked sixth to ninth are highlighted in red, with higher opacity assigned to the worse techniques. UMATO substantially outperforms the baselines on global metrics, with a slight sacrifice in local metric scores. Note that we standardize both the original data and the projections to minimize the impact of scaling.
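
The caption notes that the original data and the projections are standardized before scoring. The sketch below shows one common reading of that step, column-wise z-scoring. The exact procedure used in the analysis is not stated here, so treat this as an assumption; the function name `standardize` and the sample data are hypothetical.

```python
import statistics


def standardize(rows):
    """Column-wise z-score: subtract each column's mean, divide by its std.

    Assumption: population standard deviation; the source only says
    "standardize" without giving details.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Constant columns have zero spread; use 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical data: two features on very different scales.
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
scaled = standardize(data)
```

After scaling, both columns have zero mean and unit spread, so a distance-based quality metric is no longer dominated by the feature with the larger raw range.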

Figure 2: Local and Global Metric Rankings

Ranking of DR techniques determined by local and global quality metrics in the accuracy analysis. Among the nine techniques we compared, UMATO achieved the highest accuracy on global metrics and ranked fourth on local metrics. The error bars depict 95% confidence intervals.

Figure 3: Scalability with Large Datasets

The results of the scalability analysis with large datasets. The number of points (size) and dimensionality (dim.) are shown to the left of each dataset's name. Runtimes of each DR technique are given in mm:ss format. UMATO outperformed every competitor except PCA, with an average speedup of 14.3× over UMAP.
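
As a rough illustration of how a speedup figure can be derived from runtimes reported in mm:ss, the sketch below converts each runtime to seconds and averages the per-dataset ratios. Averaging per-dataset ratios is an assumption about how the reported mean was formed, and the runtimes used here are hypothetical, not values read from the figure.

```python
def to_seconds(mmss):
    """Convert a runtime string in mm:ss format to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return to_seconds(baseline) / to_seconds(candidate)


def average_speedup(pairs):
    """Arithmetic mean of per-dataset speedups (an assumed aggregation)."""
    ratios = [speedup(baseline, candidate) for baseline, candidate in pairs]
    return sum(ratios) / len(ratios)


# Hypothetical (baseline, candidate) runtimes for two datasets.
example_pairs = [("10:00", "00:40"), ("02:00", "00:10")]
example_average = average_speedup(example_pairs)
```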

Figure 4: Projection Subset Analysis

A subset of the projections generated in our accuracy analysis. Colors depict the class labels of each dataset. The results verify that UMATO outperforms competitors in accurately preserving global structure while maintaining competitive performance in depicting local structure.

Figure 5: Scalability with Small Datasets

The results of the scalability analysis with small datasets. Note that LAMP has been removed from the figure because it requires substantially longer computation time, which makes the runtimes of all other techniques look similar. UMATO takes about three seconds on average to generate projections, outperforming all other nonlinear DR techniques. The error bars depict 95% confidence intervals.
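
For readers who want to reproduce error bars of this kind on their own timing runs, the sketch below computes a mean and a 95% confidence interval using the normal approximation (a multiplier of 1.96). How the intervals in the figure were computed is not stated here, so this is an assumption, and the runtimes are hypothetical.

```python
import math
import statistics


def mean_ci95(samples):
    """Return (mean, lower, upper) for a 95% CI under a normal approximation."""
    mean = statistics.fmean(samples)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical runtimes in seconds from five repeated runs.
runtimes = [2.8, 3.1, 3.0, 2.9, 3.2]
mean, lower, upper = mean_ci95(runtimes)
```

With only a handful of runs, a Student's t multiplier would give a wider and more appropriate interval than 1.96.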

Citation

UMATO can be cited as follows:

@inproceedings{jeon2022vis,
  title={Uniform Manifold Approximation with Two-phase Optimization},
  author={Jeon, Hyeon and Ko, Hyung-Kwon and Lee, Soohyun and Jo, Jaemin and Seo, Jinwook},
  booktitle={2022 IEEE Visualization and Visual Analytics (VIS)},
  pages={80--84},
  year={2022},
  organization={IEEE}
}

Jeon, H., Ko, H. K., Lee, S., Jo, J., & Seo, J. (2022, October). Uniform Manifold Approximation with Two-phase Optimization. In 2022 IEEE Visualization and Visual Analytics (VIS) (pp. 80-84). IEEE.
