Skip to main content

Fast, pointwise graph curvature

Project description

Diffusion Curvature

[!INFO] This code is currently in early beta. Some features, particularly those relating to dimension estimation and the construction of comparison spaces, are experimental and will likely change. Please report any issues you encounter to the Github Issues page.

Diffusion curvature is a pointwise extension of Ollivier-Ricci curvature, designed specifically for the often messy world of pointcloud data. Its advantages include:

  1. Unaffected by density fluctuations in data: it inherits the diffusion operator’s denoising properties.
  2. Fast, and scalable to millions of points: it depends only on matrix powering - no optimal transport required.

Install

To install with pip (or better yet, poetry),

pip install diffusion-curvature

or

poetry add diffusion-curvature

Conda releases are pending.

Usage

To compute diffusion curvature, first create a graphtools graph with your data. Graphtools offers extensive support for different kernel types (if creating from a pointcloud), and can also work with graphs in the PyGSP format. We recommend using anistropy=1, and verifying that the supplied knn value encompasses a reasonable portion of the graph.

from diffusion_curvature.datasets import torus
import graphtools
X_torus, torus_gaussian_curvature = torus(n=5000)
G_torus = graphtools.Graph(X_torus, anisotropy=1, knn=30)

Graphtools offers many additional options. For large graphs, you can speed up the powering of the diffusion matrix with landmarking: simply pass n_landmarks=1000 (e.g) when creating the graphtools graph. If you enable landmarking, diffusion-curvature will automatically use it.

Next, instantiate a DiffusionCurvature operator.

from diffusion_curvature.graphtools import DiffusionCurvature
DC = DiffusionCurvature(t=12)

source

DiffusionCurvature

 DiffusionCurvature (t:int, distance_type='PHATE', dimest=None,
                     use_entropy:bool=False, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Type Default Details
t int Number of diffusion steps to use when measuring curvature. TODO: Heuristics
distance_type str PHATE
dimest NoneType None Dimensionality estimator to use. If None, defaults to KNN with default params
use_entropy bool False If true, uses KL Divergence instead of Wasserstein Distances. Faster, seems empirically as good, but less proven.
kwargs

And, finally, pass your graph through it. The DiffusionCurvature operator will store everything it computes – the powered diffusion matrix, the estimated manifold distances, and the curvatures – as attributes of your graph. To get the curvatures, you can run G.ks.

G_torus = DC.curvature(G_torus, dimension=2) # note: this is the intrinsic dimension of the data
plot_3d(X_torus, G_torus.ks, colorbar=True, title="Diffusion Curvature on the torus")

Using on a predefined graph

If you have an adjacency matrix but no pointcloud, diffusion curvature may still be useful. The caveat, currently, is that our intrinsic dimension estimation doesn’t yet support graphs, so you’ll have to compute & provide the dimension yourself – if you want a signed curvature value.

If you’re only comparing relative magnitudes of curvature, you can skip this step.

For predefined graphs, we use our own ManifoldGraph class. You can create one straight from an adjacency matrix:

from diffusion_curvature.manifold_graph import ManifoldGraph, diffusion_curvature, diffusion_entropy_curvature, entropy_of_diffusion, wasserstein_spread_of_diffusion, power_diffusion_matrix, phate_distances, flattened_facsimile_of_graph
# pretend we've computed the adjacency matrix elsewhere in the code
A = G_torus.K.toarray()
# initialize the manifold graph; input your computed dimension along with the adjacency matrix
G_pure = ManifoldGraph(A, dimension=2)
G_pure = diffusion_curvature(G_pure, t=12)
plot_3d(X_torus, G_pure.ks, title = "Diffusion Curvature on Graph")

Alternately, to compute just the relative magnitudes of the pointwise curvatures (without signs), we can directly use either the wasserstein_spread_of_diffusion (which computes the $W_1$ distance from a dirac to its t-step diffusion), or the entropy_of_diffusion function (which computes the entropy of each t-step diffusion). The latter is nice when the manifold’s geodesic distances are hard to estimate – it corresponds to replacing the wasserstein distance with the KL divergence.

# for the wasserstein version, we need manifold distances
G_pure = power_diffusion_matrix(G_pure)
G_pure = phate_distances(G_pure)
ks_wasserstein = wasserstein_spread_of_diffusion(G_pure)
# for the entropic version, we need only power the diffusion operator
G_pure = power_diffusion_matrix(G_pure, t=12)
ks_entropy = entropy_of_diffusion(G_pure)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

diffusion_curvature-0.0.3.tar.gz (17.2 kB view hashes)

Uploaded Source

Built Distribution

diffusion_curvature-0.0.3-py3-none-any.whl (17.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page