Hierarchical Uniform Manifold Approximation and Projection

.. -*- mode: rst -*-

|conda_version|_ |conda_downloads|_ |pypi_version|_ |pypi_downloads|_

.. |pypi_version| image:: https://img.shields.io/pypi/v/humap.svg
.. _pypi_version: https://pypi.python.org/pypi/humap/

.. |pypi_downloads| image:: https://pepy.tech/badge/humap
.. _pypi_downloads: https://pepy.tech/project/humap

.. |conda_version| image:: https://anaconda.org/conda-forge/humap/badges/version.svg
.. _conda_version: https://anaconda.org/conda-forge/humap

.. |conda_downloads| image:: https://anaconda.org/conda-forge/humap/badges/downloads.svg
.. _conda_downloads: https://anaconda.org/conda-forge/humap

.. image:: images/humap-2M.gif
    :alt: HUMAP exploration on Fashion MNIST dataset

=====
HUMAP
=====

Hierarchical Uniform Manifold Approximation and Projection (HUMAP) is a technique based on `UMAP <https://github.com/lmcinnes/umap/>`_ for hierarchical dimensionality reduction. HUMAP lets you:

  1. Focus on important information while reducing the visual burden of exploring huge datasets;
  2. Drill down the hierarchy as more detail is needed.

The details of the algorithm can be found in our paper on `arXiv <https://arxiv.org/abs/2106.07718>`_. This repository also features a C++ UMAP implementation.


Installation
------------

HUMAP is written in C++ for performance and provides an intuitive Python interface. It depends on common machine learning libraries such as scikit-learn and NumPy, and it requires pybind11 for the interface between C++ and Python.

Requirements:

  • Python 3.6 or greater
  • numpy
  • scipy
  • scikit-learn
  • pybind11
  • pynndescent (for reproducible results)
  • Eigen (C++)

If you have these requirements installed, use PyPI:

.. code:: bash

    pip install humap

Alternatively (and preferably), install with conda:

.. code:: bash

    conda install humap

If you install with pip, note that HUMAP depends on `Eigen <https://eigen.tuxfamily.org/>`_, so make sure to place the Eigen headers in ``/usr/local/include`` on Unix or ``C:\Eigen`` on Windows.
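Before building with pip, it can help to confirm that the Eigen headers are where the build expects them. The following is a minimal sketch, not part of HUMAP: ``find_eigen`` is a hypothetical helper, and it assumes the headers follow the usual Eigen layout of an ``Eigen`` directory containing a ``Dense`` header directly under the chosen location.

```python
from pathlib import Path

# Header locations named in this README; adjust them for your machine.
DEFAULT_EIGEN_DIRS = ["/usr/local/include", r"C:\Eigen"]


def find_eigen(candidates=DEFAULT_EIGEN_DIRS):
    """Return the first candidate directory that appears to hold Eigen headers, or None."""
    for directory in candidates:
        root = Path(directory)
        # Assumption: the headers are laid out as <root>/Eigen/Dense,
        # as in the Eigen source tree.
        if (root / "Eigen" / "Dense").is_file():
            return root
    return None


if __name__ == "__main__":
    location = find_eigen()
    if location is not None:
        print(f"Eigen headers found in: {location}")
    else:
        print("Eigen headers not found in the expected locations")
```

If the helper reports nothing, the headers may simply be installed under a different layout, so treat a negative result as a hint rather than a diagnosis.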

Manual installation:

To install HUMAP manually, download the project and run the following from the project root:

.. code:: bash

    python setup.py bdist_wheel
    pip install dist/humap*.whl
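Whichever installation route you choose, a quick check that the Python requirements and ``humap`` itself are importable can save debugging time. This is a small sketch using only the standard library; ``missing_modules`` is a hypothetical helper, not part of HUMAP, and the list uses import names (scikit-learn is imported as ``sklearn``).

```python
from importlib.util import find_spec

# Import names of the Python requirements listed above, plus humap itself.
# Note that scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["numpy", "scipy", "sklearn", "pybind11", "pynndescent", "humap"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print(f"Missing modules: {missing}")
    else:
        print("All modules found.")
```

The check only confirms that each module can be located; it does not verify versions or that the compiled extension loads correctly.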

Usage examples
--------------

The simplest usage of HUMAP is as follows:

Fitting the hierarchy

.. code:: python

    import humap
    from sklearn.datasets import fetch_openml

    # load MNIST as NumPy arrays (as_frame=False avoids a pandas DataFrame)
    X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

    # build a hierarchy with three levels
    hUmap = humap.HUMAP([0.2, 0.2])
    hUmap.fit(X, y)

    # embed level 2
    embedding2 = hUmap.transform(2)

Refer to notebooks/ for complete examples.
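To get a feel for what ``[0.2, 0.2]`` means in the example above, the arithmetic can be sketched as follows. This assumes, which this README does not state explicitly, that each entry is the fraction of the previous level's points kept at the next level. ``level_sizes`` is a hypothetical helper for illustration, not part of the HUMAP API, and the actual sizes chosen by the library may differ slightly.

```python
def level_sizes(n_points, fractions):
    """Return the approximate number of points at each hierarchy level.

    Assumption (not stated in this README): each entry of `fractions` is the
    share of the previous level's points kept at the next level.
    """
    sizes = [n_points]
    for fraction in fractions:
        sizes.append(round(sizes[-1] * fraction))
    return sizes


# The mnist_784 dataset has 70,000 samples.
print(level_sizes(70_000, [0.2, 0.2]))  # [70000, 14000, 2800]
```

Under that assumption, the level-2 embedding in the example covers roughly 2,800 points instead of all 70,000, which is what makes the top of the hierarchy light enough to explore.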

C++ UMAP implementation
-----------------------

You can also fit a one-level HUMAP hierarchy, which is essentially a UMAP projection.

.. code:: python

    import humap

    # X is the data matrix loaded in the previous example
    umap_reducer = humap.UMAP()
    embedding = umap_reducer.fit_transform(X)

Citation
--------

Please use the following reference to cite HUMAP in your work:

.. code:: bibtex

    @misc{marciliojr_humap2021,
      title={HUMAP: Hierarchical Uniform Manifold Approximation and Projection},
      author={Wilson E. Marcílio-Jr and Danilo M. Eler and Fernando V. Paulovich and Rafael M. Martins},
      year={2021},
      eprint={2106.07718},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
    }

License
-------

HUMAP is released under the 3-clause BSD license. It uses the open-source NNDescent implementation from `EFANNA <https://github.com/ZJULearning/efanna>`_ and a C++ implementation of `UMAP <http://github.com/lmcinnes/umap>`_ for embedding hierarchy levels.

E-mail me (wilson_jr at outlook.com) if you would like to contribute.
