The official implementation of PaCMAP: Pairwise Controlled Manifold Approximation Projection

PaCMAP

PaCMAP (Pairwise Controlled Manifold Approximation Projection) is a dimensionality reduction method that can be used for visualization, preserving both the local and global structure of the data in the original space. PaCMAP optimizes the low-dimensional embedding using three kinds of pairs of points: neighbor pairs (pair_neighbors), mid-near pairs (pair_MN), and further pairs (pair_FP), whose numbers are n_neighbors, n_MN, and n_FP respectively.

Previous dimensionality reduction techniques focus on either local structure (e.g. t-SNE, LargeVis and UMAP) or global structure (e.g. TriMAP), but not both, although the balance between the two can be shifted by carefully tuning the parameter in each algorithm that controls it, which mainly adjusts the number of neighbors considered. Instead of attracting more neighbors to preserve global structure, PaCMAP dynamically uses a special group of pairs, the mid-near pairs, to first capture global structure and then refine local structure, thereby preserving both.
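
To make the three kinds of pairs concrete, here is a minimal pure-Python sketch that builds them for a toy 2-D dataset. The selection rules are illustrative assumptions and not necessarily what the package does internally: neighbors are found by brute-force Euclidean distance, each mid-near pair is the second-closest of six random candidates, and further pairs are drawn at random from the non-neighbors. The function name `sample_pairs` is hypothetical.

```python
import math
import random

def sample_pairs(points, n_neighbors, n_MN, n_FP, seed=0):
    """Toy construction of neighbor, mid-near, and further pairs (assumed rules)."""
    rng = random.Random(seed)
    pair_neighbors, pair_MN, pair_FP = [], [], []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Rank every other point by Euclidean distance to point i.
        ranked = sorted(others, key=lambda j: math.dist(p, points[j]))
        nearest, rest = ranked[:n_neighbors], ranked[n_neighbors:]

        # Neighbor pairs: the n_neighbors closest points.
        pair_neighbors += [(i, j) for j in nearest]

        # Mid-near pairs (ASSUMED rule): draw 6 random candidates
        # and keep the second closest of them.
        for _ in range(n_MN):
            candidates = rng.sample(others, min(6, len(others)))
            candidates.sort(key=lambda j: math.dist(p, points[j]))
            pair_MN.append((i, candidates[1]))

        # Further pairs (ASSUMED rule): random points outside the nearest neighbors.
        pair_FP += [(i, j) for j in rng.sample(rest, n_FP)]
    return pair_neighbors, pair_MN, pair_FP

# Toy data: 20 random points in the unit square.
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(20)]
nbrs, mn, fp = sample_pairs(points, n_neighbors=4, n_MN=2, n_FP=8)
print(len(nbrs), len(mn), len(fp))  # 80 40 160
```

Each point contributes n_neighbors, n_MN, and n_FP pairs, so the totals scale linearly with the number of points.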

Installation

Requirements:

  • numpy
  • scikit-learn (imported as sklearn)
  • annoy
  • numba

To install PaCMAP, you can use pip:

pip install pacmap
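
After installing, one quick way to confirm that the requirements listed above can be found in the current environment is the small check below. It uses only the standard library; the helper name `missing_modules` is hypothetical, and the list assumes the import names match the requirement names as written above.

```python
import importlib.util

# Import names of the requirements listed above, plus the package itself.
REQUIRED_MODULES = ["numpy", "sklearn", "annoy", "numba", "pacmap"]

def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```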

Benchmarks

The following images are visualizations of two datasets, MNIST and Mammoth, generated by PaCMAP. They demonstrate PaCMAP's ability to preserve local and global structure, respectively.

MNIST

Mammoth

Parameters

The most important parameters are listed below. Changing these values significantly affects the result of the dimensionality reduction.

  • n_neighbors: the number of neighbors considered in the k-Nearest Neighbor graph

  • MN_ratio: the ratio of the number of mid-near pairs to the number of neighbors, $\text{n\_MN} = \lfloor \text{n\_neighbors} \times \text{MN\_ratio} \rfloor$

  • FP_ratio: the ratio of the number of further pairs to the number of neighbors, $\text{n\_FP} = \lfloor \text{n\_neighbors} \times \text{FP\_ratio} \rfloor$
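
The two floor formulas above can be checked with a few lines of Python. This is a small sketch: the helper name `pair_counts` is hypothetical, and the input values are chosen only for illustration, not taken from the package's defaults.

```python
import math

def pair_counts(n_neighbors, MN_ratio, FP_ratio):
    """Return (n_MN, n_FP) computed with the floor formulas above."""
    n_MN = math.floor(n_neighbors * MN_ratio)
    n_FP = math.floor(n_neighbors * FP_ratio)
    return n_MN, n_FP

print(pair_counts(10, 0.5, 2.0))  # (5, 20)
print(pair_counts(15, 0.5, 2.0))  # (7, 30)
```

The second call shows the effect of the floor: 15 × 0.5 = 7.5 is rounded down to 7 mid-near pairs per point.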

Reproducing the experiments

We have provided the code we used to run the experiments, for better reproducibility. The code is separated into three parts, in three folders:

  • data, which includes all the datasets we used, preprocessed into the file format each DR method uses
  • experiments, which includes all the scripts we used to produce DR results
  • evaluation, which includes all the scripts we used to evaluate DR results, as described in Section 8 of our paper

After downloading the code, you may need to update the paths in the scripts to match where you stored the files to make them fully functional.

Citation

If you use PaCMAP in a publication, or use the implementation in this repository, please cite our preprint:

@article{
    #TODO
}

License

Please see the license file.
