Python Implementation of UMATO (Uniform Manifold Approximation with Two-Phase Optimization)

Uniform Manifold Approximation with Two-phase Optimization

Notice

We appreciate all the interest in UMATO at VIS 2022! We will soon clean up the code and resolve the known bugs (by early 2023). Thank you!


Uniform Manifold Approximation with Two-phase Optimization (UMATO) is a dimensionality reduction technique that preserves both the global and the local structure of high-dimensional data. Most existing dimensionality reduction algorithms focus on only one of these two aspects, which can lead to overlooking or misinterpreting important patterns in the data. To address this, we propose a two-phase optimization: global optimization followed by local optimization. First, we obtain the global structure by selecting and optimizing the hub points. Next, we initialize and optimize the remaining points using the nearest neighbor graph. Our experiments with one synthetic and three real-world datasets show that UMATO can outperform baseline algorithms such as PCA, t-SNE, Isomap, UMAP, Topological Autoencoders, and Anchor t-SNE in terms of global measures and qualitative projection results.
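To make the two-phase idea concrete, here is a toy sketch of "lay out the hubs first, then place everything else relative to them". It is not UMATO's algorithm: random hub selection, PCA as a stand-in for the global optimization, and inverse-distance averaging as a stand-in for the local optimization are all assumptions made for illustration, and the function name `two_phase_layout_sketch` is hypothetical.

```python
import numpy as np

def two_phase_layout_sketch(X, hub_num=20, n_neighbors=3, seed=0):
    """Toy two-phase layout (illustration only, not the UMATO optimizer)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    hub_num = min(hub_num, n)
    # Assumption: hubs are picked uniformly at random.
    hub_idx = rng.choice(n, size=hub_num, replace=False)
    hubs = X[hub_idx]

    # Phase 1 (global): project the hubs onto their top two principal axes.
    centered = hubs - hubs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    hub_emb = centered @ vt[:2].T

    # Phase 2 (local): place each remaining point at the
    # inverse-distance-weighted mean of its nearest hubs' positions.
    emb = np.zeros((n, 2))
    emb[hub_idx] = hub_emb
    rest = np.setdiff1d(np.arange(n), hub_idx)
    dist = np.linalg.norm(X[rest, None, :] - hubs[None, :, :], axis=2)
    k = min(n_neighbors, hub_num)
    nearest = np.argsort(dist, axis=1)[:, :k]
    weights = 1.0 / (np.take_along_axis(dist, nearest, axis=1) + 1e-12)
    weights /= weights.sum(axis=1, keepdims=True)
    emb[rest] = np.einsum("ik,ikd->id", weights, hub_emb[nearest])
    return emb, hub_idx

X_demo = np.random.default_rng(1).normal(size=(200, 10))
emb_demo, hub_idx_demo = two_phase_layout_sketch(X_demo)
```

The point of the sketch is the ordering: a small set of points fixes the overall shape, and the rest are positioned only with respect to nearby points, so they cannot distort the global arrangement.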

System Requirements

  • Python 3.6 or greater
  • scikit-learn
  • numpy
  • scipy
  • numba
  • pandas (to read csv files)

Installation

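UMATO is available via pip.

pip install umato

After installation, UMATO can be used as follows:

import umato
from sklearn.datasets import load_iris

# Load the Iris dataset as a feature matrix X and labels y
X, y = load_iris(return_X_y=True)
# Embed the data; hub_num sets the number of hub points
emb = umato.UMATO(hub_num=20).fit_transform(X)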

Evaluation

Training models & Generating embedding result

We generate an embedding for each algorithm so that the results can be compared.

We can run each method separately, or all of them at once.

# run all datasets
bash run-benchmark.sh

# run specific dataset (e.g., MNIST dataset)
bash run-benchmark.sh mnist

This covers PCA, t-SNE, UMAP, and Topological Autoencoders. Running Anchor t-SNE additionally requires CUDA and a GPU; please refer to the Anchor t-SNE documentation for the specification.

Quantitative evaluation

We also compare the embedding results quantitatively, using measures such as Distance to a Measure (DTM) and the KL divergence between density distributions.
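As a rough illustration of a density-based KL measure, the sketch below estimates a density for every point in the input space and in the embedding, then compares the two. It assumes a Gaussian kernel over max-normalized pairwise distances with bandwidth `sigma`; the function names are hypothetical, and the exact definition used by `evaluation.comparison` may differ.

```python
import numpy as np

def point_density(A, sigma):
    """Normalized Gaussian-kernel density at each point of A."""
    diff = A[:, None, :] - A[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    dist = dist / dist.max()  # scale distances to [0, 1]; assumes distinct points
    dens = np.exp(-(dist ** 2) / sigma).sum(axis=1)
    return dens / dens.sum()  # normalize so the densities sum to 1

def density_kl(X, Z, sigma=0.1):
    """KL divergence between the density of X and that of its embedding Z."""
    p = point_density(X, sigma)
    q = point_density(Z, sigma)
    return float((p * np.log(p / q)).sum())

X_demo = np.random.default_rng(0).normal(size=(100, 10))
Z_demo = X_demo[:, :2]  # a crude "embedding": keep the first two coordinates
kl_demo = density_kl(X_demo, Z_demo, sigma=0.1)
```

A small `sigma` makes the density sensitive to local neighborhoods, while a large `sigma` emphasizes global structure, which is why a measure of this kind is typically reported at several values of `sigma`.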

To print the quantitative result:

# print table result
python -m evaluation.comparison --algo=all --data=spheres --measure=all

Result for the Spheres dataset

Measure               PCA     Isomap  t-SNE   UMAP    TopoAE  At-SNE  UMATO (ours)
DTM                   0.9950  0.7784  0.9116  0.9209  0.6619  0.9448  0.3849
KL-Div (sigma=0.01)   0.7568  0.4492  0.6070  0.6100  0.1865  0.6584  0.1569
KL-Div (sigma=0.1)    0.6525  0.4267  0.5365  0.5383  0.3007  0.5712  0.1333
KL-Div (sigma=1.)     0.0153  0.0095  0.0128  0.0134  0.0057  0.0138  0.0008
Cont                  0.7983  0.9041  0.8903  0.8760  0.8317  0.8721  0.7884
Trust                 0.6088  0.6266  0.7073  0.6499  0.6339  0.6433  0.6558
MRRE_X                0.7985  0.9039  0.9032  0.8805  0.8317  0.8768  0.7887
MRRE_Z                0.6078  0.6268  0.7261  0.6494  0.6326  0.6424  0.6557

  • DTM & KL divergence: lower is better.
  • The winner and runner-up are in bold.
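
For readers unfamiliar with the local measures in the table, the following sketch computes trustworthiness and continuity from neighbor ranks. It assumes that Trust and Cont denote these two standard measures, that k < n/2, and that there are no duplicate points; the helper names are hypothetical, and this is not the evaluation code used for the table above.

```python
import numpy as np

def neighbor_ranks(A):
    """ranks[i, j] = position of j in i's distance-sorted neighbor list (self = 0)."""
    dist = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(len(A))[:, None]
    ranks[rows, order] = np.arange(len(A))[None, :]
    return ranks

def trustworthiness(X, Z, k=5):
    """Penalize points that are k-nearest neighbors in Z but not in X."""
    n = len(X)
    rx, rz = neighbor_ranks(X), neighbor_ranks(Z)
    intruders = (rz >= 1) & (rz <= k) & (rx > k)
    penalty = ((rx - k) * intruders).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty

def continuity(X, Z, k=5):
    """Same measure with the roles of the input and the embedding swapped."""
    return trustworthiness(Z, X, k)

X_demo = np.random.default_rng(0).normal(size=(100, 10))
Z_demo = X_demo[:, :2]  # a crude "embedding" for demonstration
trust_demo = trustworthiness(X_demo, Z_demo)
cont_demo = continuity(X_demo, Z_demo)
```

Both functions return 1.0 when the k-nearest neighborhoods are identical in the two spaces and decrease as neighborhoods are disturbed.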

References

  • van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. JMLR, 9(Nov), 2579-2605.
  • McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
  • Moor, M., Horn, M., Rieck, B., & Borgwardt, K. (2020). Topological autoencoders. ICML.
  • Fu, C., Zhang, Y., Cai, D., & Ren, X. (2019, July). AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 176-186).
