Implementation of UMATO (Uniform Manifold Approximation with Two-Phase Optimization)
Uniform Manifold Approximation with Two-phase Optimization
Uniform Manifold Approximation with Two-phase Optimization (UMATO) is a dimensionality reduction technique that can preserve both the global and the local structure of high-dimensional data. Most existing dimensionality reduction algorithms focus on only one of these two aspects, which can lead to overlooking or misinterpreting important patterns in the data. To address this, we propose a two-phase optimization: global optimization followed by local optimization. First, we obtain the global structure by selecting and optimizing the hub points. Next, we initialize and optimize the other points using the nearest neighbor graph. Our experiments with one synthetic and three real-world datasets show that UMATO can outperform baseline algorithms such as PCA, t-SNE, Isomap, UMAP, Topological Autoencoders, and Anchor t-SNE in terms of global measures and qualitative projection results.
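The two-phase idea can be illustrated with a small toy sketch. This is not the UMATO implementation: the function name `two_phase_embedding` and its parameters are hypothetical, hubs are picked uniformly at random, a PCA projection stands in for the global optimization, and the local phase only initializes the remaining points from their nearest hubs without the subsequent optimization on the nearest neighbor graph.

```python
import numpy as np


def two_phase_embedding(X, n_hubs=20, n_neighbors=3, seed=0):
    """Toy two-phase layout: embed hubs globally, then place the rest locally."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Phase 1 (global): pick hub points and embed them on their own.
    # Random hub selection and a PCA projection are simplifying assumptions
    # that stand in for the hub selection and global optimization steps.
    hub_idx = rng.choice(n, size=n_hubs, replace=False)
    hubs = X[hub_idx]
    mean = hubs.mean(axis=0)
    _, _, vt = np.linalg.svd(hubs - mean, full_matrices=False)
    hub_emb = (hubs - mean) @ vt[:2].T

    # Phase 2 (local): initialize every other point from its nearest hubs.
    # A full method would go on to refine these positions using the
    # nearest neighbor graph; this sketch stops after the initialization.
    emb = np.zeros((n, 2))
    emb[hub_idx] = hub_emb
    rest = np.setdiff1d(np.arange(n), hub_idx)
    dists = np.linalg.norm(X[rest, None, :] - hubs[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :n_neighbors]
    emb[rest] = hub_emb[nearest].mean(axis=1)
    return emb, hub_idx


# Synthetic data purely for illustration
X = np.random.default_rng(42).normal(size=(200, 10))
embedding, hub_idx = two_phase_embedding(X)
print(embedding.shape)
```

Because the hubs are laid out first and every other point is placed relative to them, the overall arrangement is fixed by the global phase, which is the intuition behind preserving global structure.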
System Requirements
- Python 3.6 or greater
- scikit-learn
- numpy
- scipy
- numba
- pandas (to read csv files)
Run
You can try the following code to see the result:
# install requirements
pip install scikit-learn numpy scipy numba pandas
# download specific (e.g., MNIST) datasets
bash download.sh mnist
# run UMATO
python test.py --data=mnist
Evaluation
Training models & Generating embedding result
We generate embedding results for each algorithm so that they can be compared. The algorithms are the following:
- PCA
- t-SNE
- UMAP
- Topological Autoencoder
- Anchor t-SNE
- UMATO (ours)
We can run each method separately, or all of them at once.
# run all datasets
bash run-benchmark.sh
# run specific dataset (e.g., MNIST dataset)
bash run-benchmark.sh mnist
This covers PCA, t-SNE, UMAP, and Topological Autoencoders. To run Anchor t-SNE, you need CUDA and a GPU; please refer to the AtSNE project (Fu et al., 2019, listed under References) for its requirements.
Qualitative evaluation
For the qualitative evaluation, we compare the 2D visualizations produced by each algorithm. We used the Svelte web framework and D3 for the visualization.
# see visualization
cd visualization
# install requirements
npm install
# run svelte app
npm run dev
Figure: Embedding results of the Spheres dataset for each algorithm (2D visualization).
Quantitative evaluation
Likewise, we compare the embedding results quantitatively, using measures such as distance to a measure (DTM) and the KL divergence between density distributions.
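As an illustration of the KL divergence between density distributions, the sketch below follows one common formulation (the one described by Moor et al., 2020): each point gets a Gaussian-kernel density from its normalized pairwise distances, and the densities of the input and the embedding are compared. The function names are hypothetical, and the repository's own evaluation code may normalize differently.

```python
import numpy as np


def pairwise_distances(A):
    """Euclidean distance matrix between all rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def density(A, sigma):
    """Kernel density estimate at each point, normalized to sum to 1."""
    d = pairwise_distances(A)
    d = d / d.max()  # rescale distances to [0, 1] so sigma is comparable
    f = np.exp(-(d ** 2) / sigma).sum(axis=1)
    return f / f.sum()


def kl_density(X, Z, sigma):
    """KL divergence between the densities of input X and embedding Z."""
    p = density(X, sigma)
    q = density(Z, sigma)
    return float(np.sum(p * np.log(p / q)))


# Synthetic data; the "embedding" is just the first two coordinates
X = np.random.default_rng(0).normal(size=(100, 5))
Z = X[:, :2]
for sigma in (0.01, 0.1, 1.0):
    print(sigma, kl_density(X, Z, sigma))
```

Smaller values of `sigma` make the density sensitive to local neighborhoods, while larger values emphasize the global distribution of points.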
To print the quantitative result:
# print table result
python -m evaluation.comparison --algo=all --data=spheres --measure=all
Result for the Spheres dataset
Measure | PCA | Isomap | t-SNE | UMAP | TopoAE | At-SNE | UMATO (ours) |
---|---|---|---|---|---|---|---|
DTM | 0.9950 | 0.7784 | 0.9116 | 0.9209 | 0.6619 | 0.9448 | 0.3849 |
KL-Div (sigma=0.01) | 0.7568 | 0.4492 | 0.6070 | 0.6100 | 0.1865 | 0.6584 | 0.1569 |
KL-Div (sigma=0.1) | 0.6525 | 0.4267 | 0.5365 | 0.5383 | 0.3007 | 0.5712 | 0.1333 |
KL-Div (sigma=1.) | 0.0153 | 0.0095 | 0.0128 | 0.0134 | 0.0057 | 0.0138 | 0.0008 |
Cont | 0.7983 | 0.9041 | 0.8903 | 0.8760 | 0.8317 | 0.8721 | 0.7884 |
Trust | 0.6088 | 0.6266 | 0.7073 | 0.6499 | 0.6339 | 0.6433 | 0.6558 |
MRRE_X | 0.7985 | 0.9039 | 0.9032 | 0.8805 | 0.8317 | 0.8768 | 0.7887 |
MRRE_Z | 0.6078 | 0.6268 | 0.7261 | 0.6494 | 0.6326 | 0.6424 | 0.6557 |
- DTM & KL divergence: Lower is better
- The winner and runner-up are in bold.
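Assuming the Cont and Trust rows refer to the standard continuity and trustworthiness measures, a minimal rank-based sketch looks like the following. The function names and the neighborhood size `k` are illustrative choices, and the evaluation module in this repository may use different settings.

```python
import numpy as np


def rank_matrix(A):
    """ranks[i, j] = position of j among i's neighbors (0 = i itself)."""
    diff = A[:, None, :] - A[None, :, :]
    d = (diff ** 2).sum(axis=2)
    np.fill_diagonal(d, -1.0)  # force each point to rank first for itself
    order = np.argsort(d, axis=1)
    n = A.shape[0]
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks


def trustworthiness(X, Z, k):
    """Penalize points that are k-nearest neighbors in Z but not in X."""
    n = X.shape[0]
    rx = rank_matrix(X)
    rz = rank_matrix(Z)
    intruders = (rz >= 1) & (rz <= k) & (rx > k)
    penalty = ((rx - k) * intruders).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


def continuity(X, Z, k):
    """Same measure with the roles of input and embedding swapped."""
    return trustworthiness(Z, X, k)


# Synthetic data; the "embedding" is just the first two coordinates
X = np.random.default_rng(1).normal(size=(120, 6))
Z = X[:, :2]
print(trustworthiness(X, Z, k=5), continuity(X, Z, k=5))
```

Both measures equal 1 when the k-nearest neighborhoods are identical in the input and the embedding, so higher values indicate better local structure preservation. The normalization used here is valid for `k < n / 2`.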
References
- Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. JMLR, 9(Nov), 2579-2605.
- McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
- Moor, M., Horn, M., Rieck, B., & Borgwardt, K. (2020). Topological autoencoders. ICML.
- Fu, C., Zhang, Y., Cai, D., & Ren, X. (2019, July). AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 176-186).