Fast, accurate learning of data topology with self-adaptive metrics, graphs and layouts

TopOMetry - Topologically Optimized geoMetry

Topological metrics, bases, graphs and layouts

TopOMetry is a high-level Python library for exploring data topology. It learns topological metrics, dimensionality-reduced bases and graphs from data, and visualizes them with different layout optimization algorithms. The main objective is to approximate the Laplace-Beltrami Operator, a natural way to describe data geometry and its high-dimensional topology.
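To make the link to the Laplace-Beltrami Operator concrete, here is a minimal NumPy sketch of a classic diffusion map. It is a textbook construction and not TopOMetry's implementation: the function name `diffusion_map`, the median bandwidth heuristic and the toy circle data are assumptions made for illustration only.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=None, alpha=1.0):
    """Toy diffusion map (illustrative sketch, not TopOMetry's code)."""
    # Pairwise squared Euclidean distances between all rows of X
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    if epsilon is None:
        epsilon = np.median(d2)  # simple bandwidth heuristic (assumption)
    K = np.exp(-d2 / epsilon)    # Gaussian kernel
    # alpha-normalization: alpha=1 discounts sampling density, which is the
    # setting associated with approximating the Laplace-Beltrami Operator
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q, q) ** alpha
    # Symmetric matrix sharing its eigenvalues with the Markov matrix D^-1 K_alpha
    d = K_alpha.sum(axis=1)
    S = K_alpha / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]          # sort eigenpairs in descending order
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]        # right eigenvectors of the Markov matrix
    keep = slice(1, n_components + 1)        # drop the trivial constant eigenvector
    return evals[keep], psi[:, keep] * evals[keep]

# Toy data: noisy points on a circle
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=200)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.01 * rng.normal(size=(200, 2))
evals, coords = diffusion_map(X, n_components=2)
print(evals, coords.shape)
```

The leading non-trivial eigenvectors act as coordinates that follow the geometry of the data rather than its raw feature axes.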

TopOMetry is designed to handle large-scale data matrices containing extreme topological diversity, such as those generated from single-cell omics, and can be used to perform topology-preserving visualizations.

TopOMetry's main class is the TopOGraph object. In a TopOGraph, topological metrics are recovered with diffusion harmonics or Continuous-k-Nearest-Neighbors, and used to obtain topological bases (multiscale Diffusion Maps and/or diffuse or continuous versions of Laplacian Eigenmaps).
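For readers unfamiliar with Continuous-k-Nearest-Neighbors (CkNN), the sketch below shows its usual definition: points i and j are connected when d(i, j) < delta * sqrt(d_k(i) * d_k(j)), where d_k is the distance to the k-th nearest neighbor. The function name `cknn_adjacency` and its defaults are hypothetical, and the brute-force distance computation is only suitable for small data; it does not reproduce how TopOMetry computes this internally.

```python
import numpy as np

def cknn_adjacency(X, k=5, delta=1.0):
    """Boolean CkNN adjacency matrix (illustrative brute-force sketch)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of the sorted distances is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    d_k = np.sort(dist, axis=1)[:, k]
    adjacency = dist < delta * np.sqrt(np.outer(d_k, d_k))
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
A = cknn_adjacency(points, k=5, delta=1.0)
print(A.sum(axis=1)[:10])  # number of neighbors of the first ten points
```

Unlike a plain k-nearest-neighbors graph, the result is symmetric by construction, and the neighborhood radius adapts to the local sampling density.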

On top of these bases, new graphs can be learned using k-nearest-neighbors graphs or additional topological operators. The learned metrics, bases and graphs are stored as different attributes of the TopOGraph object.

Finally, different visualizations of the learned topology can be optimized with pyMDE by solving a Minimum-Distortion Embedding problem. TopOMetry also implements an adapted, non-uniform version of the seminal Uniform Manifold Approximation and Projection (UMAP) for graph layout optimization (we call it MAP for short).

Alternatively, you can use TopOMetry to add topological information to your favorite workflow by computing k-nearest-neighbors graphs on its dimensionality-reduced bases instead of on PCA.
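As a rough sketch of that idea, the snippet below computes k-nearest-neighbor indices on a low-dimensional basis with plain NumPy. The array `basis` is a random placeholder standing in for any learned (n_samples, n_components) basis, and `knn_indices` is a hypothetical helper; in practice you would pass the basis to whatever neighbor search your workflow already uses.

```python
import numpy as np

def knn_indices(basis, k=10):
    """Brute-force Euclidean k-nearest-neighbor indices on the rows of `basis`."""
    sq = np.sum(basis ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * basis @ basis.T
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbor list
    return np.argsort(d2, axis=1)[:, :k]

rng = np.random.default_rng(0)
basis = rng.normal(size=(300, 20))  # placeholder for a learned basis (assumption)
neighbors = knn_indices(basis, k=10)
print(neighbors.shape)
```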

Installation and dependencies

TopOMetry relies on a few existing libraries for its scalability and flexibility. It is implemented in Python and builds complex, high-level models that inherit from scikit-learn's BaseEstimator, which makes it easy to apply and to combine with different workflows in virtually any domain.

  • scikit-learn - for general algorithms
  • ANNOY - for optimized neighbor index search
  • nmslib - for fast and accurate k-nearest-neighbors
  • kneed - for finding nice cutoffs
  • pyMDE - for optimizing layouts

Prior to installing TopOMetry, make sure you have cmake, scikit-build and setuptools available on your system. If using Linux:

sudo apt-get install cmake
pip3 install scikit-build setuptools

TopOMetry also needs nmslib for fast approximate nearest-neighbor search across different distance metrics. If your CPU supports advanced instructions, we recommend installing nmslib separately (built from source) for better performance:

pip3 install --no-binary :all: nmslib

Then, you can install TopOMetry and its other requirements with pip:

pip3 install numpy pandas annoy scipy numba torch scikit-learn kneed pymde
pip3 install topometry

Alternatively, clone this repo and build from source:

git clone https://github.com/davisidarta/topometry
cd topometry
pip3 install .
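After either installation route, a quick way to confirm that the dependencies are importable is sketched below. It uses only the standard library. The list of import names is an assumption derived from the pip commands above (for example, scikit-learn is imported as sklearn, and the quick-start imports TopOMetry as topo); `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names, which do not always match the pip package names.
MODULES = ["numpy", "pandas", "scipy", "numba", "torch", "sklearn",
           "annoy", "nmslib", "kneed", "pymde", "topo"]

def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_modules(MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```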

Quick-start

Given a large data matrix, data (an np.ndarray, pd.DataFrame or sp.csr_matrix), you can set up a TopOGraph with default parameters:

import topo.models as tp

# Learn topological metrics and basis from data. The default is to use diffusion harmonics.
tg = tp.TopOGraph()
tg = tg.fit(data)

Note: topo.models is the high-level model module which contains the TopOGraph object.

After learning a topological basis, we can access the topological metrics and bases stored in the TopOGraph object, and build different topological graphs.

# Learn a topological graph. Again, the default is to use diffusion harmonics.
tgraph = tg.transform(data) 

Then, it is possible to optimize the topological graph layout. The first option is to do so with our adaptation of UMAP (MAP), which will minimize the cross-entropy between the topological basis and its graph:

# Graph layout optimization with MAP
map_emb, aux = tp.MAP(tg.MSDiffMaps, tgraph)
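To illustrate what a cross-entropy objective between two graphs looks like, here is the fuzzy-set cross-entropy used by UMAP, written for edge membership strengths p (from the high-dimensional graph) and q (from the layout). Whether MAP uses exactly this form is an assumption; `fuzzy_cross_entropy` is a hypothetical helper, not part of TopOMetry.

```python
import numpy as np

def fuzzy_cross_entropy(p, q, eps=1e-12):
    """Fuzzy-set cross-entropy between edge memberships p and q (values in [0, 1])."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    q = np.clip(np.asarray(q, dtype=float), eps, 1 - eps)
    attract = p * np.log(p / q)                  # penalizes edges missing in the layout
    repel = (1 - p) * np.log((1 - p) / (1 - q))  # penalizes spurious edges in the layout
    return float(np.sum(attract + repel))

p = np.array([0.9, 0.1, 0.5])
q = np.array([0.5, 0.5, 0.5])
print(fuzzy_cross_entropy(p, p), fuzzy_cross_entropy(p, q))
```

The value is zero when the layout reproduces the graph memberships exactly and grows as the two disagree, which is what a layout optimizer drives down.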

The second, and arguably more interesting, option is to use pyMDE to find a Minimum Distortion Embedding. TopOMetry implements some custom MDE problems within the TopOGraph model:

# Set up MDE problem
mde = tg.MDE(tgraph)
mde_emb = mde.embed()
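For intuition about what a Minimum Distortion Embedding solves, the simplest case can be written in a few lines of NumPy: with quadratic distortions, sum over edges of W_ij * ||x_i - x_j||^2, and a centered, standardized embedding (X^T X / n = I), the minimizer is given by the bottom non-trivial eigenvectors of the graph Laplacian. This is a sketch of that special case only. It is neither pyMDE's solver nor one of TopOMetry's custom MDE problems, and `quadratic_mde` is a hypothetical name.

```python
import numpy as np

def quadratic_mde(W, dim=2):
    """Quadratic MDE with a standardization constraint, for a connected graph W."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W     # graph Laplacian
    evals, evecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    # Skip the constant eigenvector (eigenvalue 0) and keep the next `dim` ones,
    # rescaled so that X^T X / n is the identity.
    return np.sqrt(n) * evecs[:, 1:dim + 1]

# Toy graph: a cycle of 30 nodes, each connected to its two neighbors
n = 30
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx + 1) % n] = 1.0
W[(idx + 1) % n, idx] = 1.0

emb = quadratic_mde(W, dim=2)
print(emb.shape)
```

On this cycle graph the embedding places the nodes evenly on a circle, which is the layout with the least quadratic distortion under the constraint.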

Tutorials and examples

Coming soon! For now, see our extended documentation at ReadTheDocs.

Contributing

Contributions are very welcome! If you're interested in adding a new feature, just let me know in the Issues section.

License

MIT License

Copyright (c) 2021 Davi Sidarta-Oliveira, davisidarta(at)gmail.com
