Fast, accurate learning of data topology with self-adaptive metrics, graphs and layouts
TopOMetry - Topologically Optimized geoMetry
Topological metrics, basis, graphs and layouts
TopOMetry is a high-level python library to explore data topology. It learns topological metrics, dimensionality-reduced bases and graphs from data, and visualizes them with different layout optimization algorithms. Its main objective is to approximate the Laplace-Beltrami Operator, a natural way to describe data geometry and its high-dimensional topology.
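To make the Laplace-Beltrami idea concrete, the following minimal sketch uses the classic diffusion-maps construction (a Gaussian kernel with Coifman-Lafon alpha = 1 density normalization), not TopOMetry's own diffusion harmonics. On points spaced evenly along the unit circle, the Laplace-Beltrami eigenvalues are known to be k² (0, 1, 1, 4, 4, ...), so the estimates should come out close to those values. The function name and the bandwidth `eps` are illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import cdist


def laplace_beltrami_spectrum(X, eps, n_eigs=5):
    """Estimate the smallest Laplace-Beltrami eigenvalues from samples X.

    With a Gaussian kernel and alpha = 1 normalization, the Markov matrix P
    behaves like exp(eps / 4 * Delta), where Delta is the Laplace-Beltrami operator.
    """
    K = np.exp(-cdist(X, X, "sqeuclidean") / eps)   # Gaussian affinities
    q = K.sum(axis=1)
    K = K / np.outer(q, q)                           # alpha = 1: divide out sampling density
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))                  # symmetric conjugate of P = D^-1 K
    evals = np.linalg.eigvalsh(S)[::-1][:n_eigs]     # largest eigenvalues of P
    # Invert P ~ exp(-eps / 4 * k^2) to recover the eigenvalues k^2 of -Delta
    return -4.0 * np.log(evals) / eps


theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
print(np.round(laplace_beltrami_spectrum(circle, eps=0.01), 2))
```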
TopOMetry is designed to handle large-scale data matrices containing extreme topological diversity, such as those generated from single-cell omics, and can be used to perform topology-preserving visualizations.
TopOMetry's main class is the TopOGraph object. In a TopOGraph, topological metrics are recovered with diffusion harmonics or Continuous-k-Nearest-Neighbors, and are used to obtain topological bases (multiscale Diffusion Maps and/or diffuse or continuous versions of Laplacian Eigenmaps). On top of these bases, new graphs can be learned using k-nearest-neighbors graphs or additional topological operators. The learned metrics, bases and graphs are stored as different attributes of the TopOGraph object.
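As an illustration of the Continuous-k-Nearest-Neighbors idea, here is a brute-force sketch of the CkNN rule described by Berry and Sauer: points i and j are linked when their distance is below delta * sqrt(d_k(i) * d_k(j)), where d_k is the distance to the k-th nearest neighbor. It is not TopOMetry's implementation: TopOMetry relies on ANNOY/nmslib for neighbor search, whereas this sketch computes all pairwise distances and only suits small data. `cknn_graph`, `k` and `delta` are hypothetical names.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cdist
from sklearn.neighbors import NearestNeighbors


def cknn_graph(X, k=10, delta=1.0):
    """Unweighted CkNN adjacency matrix (hypothetical helper, brute force)."""
    # k + 1 neighbors because each point is returned as its own nearest neighbor
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    rho = dist[:, -1]                               # d_k(i): local length scale
    D = cdist(X, X)                                 # dense pairwise distances
    A = D < delta * np.sqrt(np.outer(rho, rho))     # CkNN connection rule
    np.fill_diagonal(A, False)                      # no self-loops
    return csr_matrix(A.astype(float))


rng = np.random.default_rng(0)
# Two clusters with very different densities
X = np.vstack([rng.normal(0, 0.1, (100, 2)), rng.normal(3, 1.0, (100, 2))])
A = cknn_graph(X, k=10)
print(A.shape, A.nnz)
```

Because the threshold scales with the local neighbor distances of both endpoints, the dense and the sparse cluster are each connected at their own length scale.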
Finally, different visualizations of the learned topology can be optimized with pyMDE by solving a Minimum-Distortion Embedding problem. TopOMetry also implements an adapted, non-uniform version of the seminal Uniform Manifold Approximation and Projection (UMAP) for graph layout optimization (we call it MAP for short).
Alternatively, you can use TopOMetry to add topological information to your favorite workflow by computing k-nearest-neighbors on its dimensionality-reduced bases instead of on PCA.
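A minimal sketch of that swap, assuming scikit-learn's `SpectralEmbedding` (Laplacian Eigenmaps) as a stand-in for a TopOMetry basis: the k-nearest-neighbors graph that would normally be built on principal components is instead built on spectral coordinates. The Swiss roll data and the parameter values are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Usual workflow: kNN graph computed on principal components
pca_basis = PCA(n_components=2).fit_transform(X)
pca_knn = kneighbors_graph(pca_basis, n_neighbors=15, mode="distance")

# Alternative: kNN graph computed on a Laplacian Eigenmaps basis instead
le_basis = SpectralEmbedding(n_components=2, n_neighbors=15, random_state=0).fit_transform(X)
le_knn = kneighbors_graph(le_basis, n_neighbors=15, mode="distance")
print(pca_knn.shape, le_knn.shape)
```

Either sparse graph can then be passed to whatever clustering or layout step the workflow already uses.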
Installation and dependencies
TopOMetry is implemented in python and builds complex, high-level models that inherit from the scikit-learn BaseEstimator, making it flexible and easy to apply and/or combine with different workflows in virtually any domain. It relies on some pre-existing libraries to power its scalability and flexibility:
- scikit-learn - for general algorithms
- ANNOY - for optimized neighbor index search
- nmslib - for fast and accurate k-nearest-neighbors
- kneed - for finding nice cutoffs
- pyMDE - for optimizing layouts
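One plausible use of a cutoff finder is choosing how many eigencomponents to keep from a decaying spectrum. The sketch below is a simplified, Kneedle-like heuristic in plain NumPy (pick the point farthest from the straight line joining the curve's endpoints); it is not the kneed API, and the spectrum is a synthetic toy example.

```python
import numpy as np


def knee_index(y):
    """Index of the point farthest from the chord joining the first and last points.

    A simplified stand-in for a knee detector on a monotone curve,
    such as a decaying eigenvalue spectrum.
    """
    y = np.asarray(y, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))
    y = (y - y.min()) / (y.max() - y.min())   # rescale values to [0, 1]
    chord = y[0] + (y[-1] - y[0]) * x          # straight line between the endpoints
    return int(np.argmax(np.abs(chord - y)))


# Five large eigenvalues followed by a slowly decaying tail
spectrum = np.concatenate([np.linspace(10, 6, 5), np.linspace(1.0, 0.5, 45)])
print(knee_index(spectrum))  # 5: keep the 5 leading components
```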
Prior to installing TopOMetry, make sure you have cmake, scikit-build and setuptools available in your system. If using Linux:
sudo apt-get install cmake
pip3 install scikit-build setuptools
We're also going to need nmslib for fast approximate nearest-neighbor search across different distance metrics. If your CPU supports advanced instructions, we recommend installing nmslib separately, built from source, for better performance:
pip3 install --no-binary :all: nmslib
Then, you can install TopOMetry and its other requirements with pip:
pip3 install numpy pandas annoy scipy numba torch scikit-learn kneed pymde
pip3 install topometry
Alternatively, clone this repo and build from source:
git clone https://github.com/davisidarta/topometry
cd topometry
pip3 install .
Quick-start
From a large data matrix data (np.ndarray, pd.DataFrame or sp.csr_matrix), you can set up a TopOGraph with default parameters:
import topo.models as tp
# Learn topological metrics and basis from data. The default is to use diffusion harmonics.
tg = tp.TopOGraph()
tg = tg.fit(data)
Note: topo.models, imported above as tp, is the high-level model module that contains the TopOGraph object.
After learning a topological basis, we can access the topological metrics and basis stored in the TopOGraph object and build different topological graphs from them.
# Learn a topological graph. Again, the default is to use diffusion harmonics.
tgraph = tg.transform(data)
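The quick-start passes `tg.MSDiffMaps` to MAP; judging by its name, this is the multiscale Diffusion Maps basis. The sketch below shows one common way to build such a basis, assuming a fixed-bandwidth Gaussian kernel (TopOMetry's adaptive kernels are not reproduced) and weighting each nontrivial eigenvector by lambda / (1 - lambda), which sums the diffusion over all time steps t >= 1. It is illustrative only, not the library's code.

```python
import numpy as np
from scipy.spatial.distance import cdist


def multiscale_diffusion_maps(X, eps, n_components=3):
    """Multiscale diffusion-map coordinates from a fixed-bandwidth Gaussian kernel.

    Each nontrivial right eigenvector psi_k of the Markov matrix P is scaled by
    lambda_k / (1 - lambda_k), i.e. the sum of lambda_k**t over all t >= 1.
    """
    K = np.exp(-cdist(X, X, "sqeuclidean") / eps)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))                       # symmetric conjugate of P = D^-1 K
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1][1:n_components + 1]   # skip the trivial eigenvalue 1
    lam = evals[order]
    psi = evecs[:, order] / np.sqrt(d)[:, None]           # right eigenvectors of P
    return psi * (lam / (1.0 - lam))


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
Y = multiscale_diffusion_maps(X, eps=5.0, n_components=3)
print(Y.shape)  # (300, 3)
```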
Then, we can optimize the layout of the topological graph. The first option is our adaptation of UMAP (MAP), which minimizes the cross-entropy between the topological basis and its graph:
# Graph layout optimization MAP
map_emb, aux = tp.MAP(tg.MSDiffMaps, tgraph)
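MAP's objective can be illustrated with the fuzzy-set cross-entropy that UMAP minimizes. The sketch below only evaluates the loss for two fixed layouts of a small path graph and does not optimize anything. The layout similarity 1 / (1 + a * d^(2b)) follows UMAP, with a = b = 1 as an illustrative assumption (UMAP fits a and b from its `min_dist` and `spread` parameters). None of this is TopOMetry's MAP code.

```python
import numpy as np


def fuzzy_cross_entropy(P, Q, tiny=1e-12):
    """UMAP-style cross-entropy between two fuzzy graphs with weights in [0, 1]."""
    P = np.clip(P, tiny, 1 - tiny)
    Q = np.clip(Q, tiny, 1 - tiny)
    return float(np.sum(P * np.log(P / Q) + (1 - P) * np.log((1 - P) / (1 - Q))))


def layout_similarities(Y, a=1.0, b=1.0):
    """Layout similarities q_ij = 1 / (1 + a * ||y_i - y_j||^(2b))."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    Q = 1.0 / (1.0 + a * sq ** b)
    np.fill_diagonal(Q, 0.0)                  # no self-similarity
    return Q


# Path graph 0-1-2-3 as the high-dimensional fuzzy graph
P = np.zeros((4, 4))
for i in range(3):
    P[i, i + 1] = P[i + 1, i] = 1.0

ordered = np.array([[0.0], [1.0], [2.0], [3.0]])
scrambled = np.array([[0.0], [2.0], [1.0], [3.0]])   # points 1 and 2 swapped
ce_ordered = fuzzy_cross_entropy(P, layout_similarities(ordered))
ce_scrambled = fuzzy_cross_entropy(P, layout_similarities(scrambled))
print(ce_ordered < ce_scrambled)  # True: the order-preserving layout has lower loss
```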
The second, and arguably most interesting, option is to use pyMDE to find a Minimum-Distortion Embedding. TopOMetry implements some custom MDE problems within the TopOGraph model:
# Set up MDE problem
mde = tg.MDE(tgraph)
mde_emb = mde.embed()
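To give a feel for what an MDE problem is, here is a NumPy sketch of the simplest case: a quadratic distortion w_ij * ||x_i - x_j||^2 minimized under the standardization constraint (a centered embedding with X^T X / n = I). This special case has a closed-form solution given by graph Laplacian eigenvectors. It is not pyMDE's API (pyMDE solves general distortion functions iteratively), and TopOMetry's custom MDE problems are not reproduced here; the ring graph is a toy example.

```python
import numpy as np


def quadratic_mde(W, dim=2):
    """Minimize sum_{i<j} W_ij * ||x_i - x_j||^2 s.t. X^T 1 = 0 and X^T X / n = I.

    For nonnegative weights on a connected graph, the minimizer is sqrt(n) times
    the Laplacian eigenvectors with the smallest nonzero eigenvalues.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian
    _, evecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    return np.sqrt(n) * evecs[:, 1:dim + 1]   # skip the constant eigenvector


def average_distortion(W, X):
    """Mean of W_ij * ||x_i - x_j||^2 over the edges i < j with W_ij > 0."""
    i, j = np.nonzero(np.triu(W))
    d2 = np.sum((X[i] - X[j]) ** 2, axis=1)
    return float(np.mean(W[i, j] * d2))


# Ring graph on 50 nodes: each node linked to its two neighbors
n = 50
W = np.zeros((n, n))
for k in range(n):
    W[k, (k + 1) % n] = W[(k + 1) % n, k] = 1.0

X = quadratic_mde(W, dim=2)
print(average_distortion(W, X))
```

The constraint rules out the trivial solution of collapsing every point to the origin; on the ring, the optimal standardized embedding lays the nodes out on a circle.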
Tutorials and examples
Coming soon! For now, please refer to our extended documentation at ReadTheDocs.
Contributing
Contributions are very welcome! If you're interested in adding a new feature, just let me know in the Issues section.
License
Copyright (c) 2021 Davi Sidarta-Oliveira, davisidarta(at)gmail.com
Hashes for topometry-0.0.2.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 173b65361b063863db83733d794d07395378df838f3de64011fad4440db4a72a |
| MD5 | 7ede7d171033039b6848b27f3ec2948f |
| BLAKE2b-256 | 02190166ebb721b3fed0a6b421d5750de25143d66b9a6414c6749c9e6cfffb68 |