Skip to main content

Fast, pointwise graph curvature

Project description

Diffusion Curvature

[!INFO] This code is currently in early beta. Some features, particularly those relating to dimension estimation and the construction of comparison spaces, are experimental and will likely change. Please report any issues you encounter to the Github Issues page.

Diffusion curvature is a pointwise extension of Ollivier-Ricci curvature, designed specifically for the often messy world of pointcloud data. Its advantages include:

  1. Unaffected by density fluctuations in data: it inherits the diffusion operator’s denoising properties.
  2. Fast, and scalable to millions of points: it depends only on matrix powering - no optimal transport required.

Install

To install with pip (or better yet, poetry),

pip install diffusion-curvature

or

poetry add diffusion-curvature

Conda releases are pending.

Usage

To compute diffusion curvature, first create a graphtools graph with your data. Graphtools offers extensive support for different kernel types (if creating from a pointcloud), and can also work with graphs in the PyGSP format. We recommend using anistropy=1, and verifying that the supplied knn value encompasses a reasonable portion of the graph.

from diffusion_curvature.datasets import torus
import graphtools
X_torus, torus_gaussian_curvature = torus(n=5000)
G_torus = graphtools.Graph(X_torus, anisotropy=1, knn=30)

Graphtools offers many additional options. For large graphs, you can speed up the powering of the diffusion matrix with landmarking: simply pass n_landmarks=1000 (e.g) when creating the graphtools graph. If you enable landmarking, diffusion-curvature will automatically use it.

Next, instantiate a DiffusionCurvature operator.

from diffusion_curvature.graphtools import DiffusionCurvature
DC = DiffusionCurvature(t=12)

source

DiffusionCurvature

 DiffusionCurvature (t:int, distance_type='PHATE', dimest=None,
                     use_entropy:bool=False, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Type Default Details
t int Number of diffusion steps to use when measuring curvature. TODO: Heuristics
distance_type str PHATE
dimest NoneType None Dimensionality estimator to use. If None, defaults to KNN with default params
use_entropy bool False If true, uses KL Divergence instead of Wasserstein Distances. Faster, seems empirically as good, but less proven.
kwargs

And, finally, pass your graph through it. The DiffusionCurvature operator will store everything it computes – the powered diffusion matrix, the estimated manifold distances, and the curvatures – as attributes of your graph. To get the curvatures, you can run G.ks.

G_torus = DC.curvature(G_torus, dimension=2) # note: this is the intrinsic dimension of the data
plot_3d(X_torus, G_torus.ks, colorbar=True, title="Diffusion Curvature on the torus")

Using on a predefined graph

If you have an adjacency matrix but no pointcloud, diffusion curvature may still be useful. The caveat, currently, is that our intrinsic dimension estimation doesn’t yet support graphs, so you’ll have to compute & provide the dimension yourself – if you want a signed curvature value.

If you’re only comparing relative magnitudes of curvature, you can skip this step.

For predefined graphs, we use our own ManifoldGraph class. You can create one straight from an adjacency matrix:

from diffusion_curvature.manifold_graph import ManifoldGraph, diffusion_curvature, diffusion_entropy_curvature, entropy_of_diffusion, wasserstein_spread_of_diffusion, power_diffusion_matrix, phate_distances, flattened_facsimile_of_graph
# pretend we've computed the adjacency matrix elsewhere in the code
A = G_torus.K.toarray()
# initialize the manifold graph; input your computed dimension along with the adjacency matrix
G_pure = ManifoldGraph(A, dimension=2)
G_pure = diffusion_curvature(G_pure, t=12)
plot_3d(X_torus, G_pure.ks, title = "Diffusion Curvature on Graph")

Alternately, to compute just the relative magnitudes of the pointwise curvatures (without signs), we can directly use either the wasserstein_spread_of_diffusion (which computes the $W_1$ distance from a dirac to its t-step diffusion), or the entropy_of_diffusion function (which computes the entropy of each t-step diffusion). The latter is nice when the manifold’s geodesic distances are hard to estimate – it corresponds to replacing the wasserstein distance with the KL divergence.

# for the wasserstein version, we need manifold distances
G_pure = power_diffusion_matrix(G_pure)
G_pure = phate_distances(G_pure)
ks_wasserstein = wasserstein_spread_of_diffusion(G_pure)
# for the entropic version, we need only power the diffusion operator
G_pure = power_diffusion_matrix(G_pure, t=12)
ks_entropy = entropy_of_diffusion(G_pure)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

diffusion_curvature-0.0.3.tar.gz (17.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

diffusion_curvature-0.0.3-py3-none-any.whl (17.4 kB view details)

Uploaded Python 3

File details

Details for the file diffusion_curvature-0.0.3.tar.gz.

File metadata

  • Download URL: diffusion_curvature-0.0.3.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for diffusion_curvature-0.0.3.tar.gz
Algorithm Hash digest
SHA256 3806fad5d10b608b47fb2f29b235de87ef694554253e7393934a8bac75f955e9
MD5 f0672e9f2fa771789ebc4315b9975cab
BLAKE2b-256 600a1ed404400df5d2cc932f84c981278d8369fcb5f0a48719935bd4c9a9cc5e

See more details on using hashes here.

File details

Details for the file diffusion_curvature-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for diffusion_curvature-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 954f31acf1e5cc2fd75d48a004c6abe440710ce31fd6d3ea8e1cac1d5ccc4a13
MD5 3f7ffbe941f1376442e56d3f5e2b2f48
BLAKE2b-256 24800022745e59ef1eb76f6c6fa8737235bfdd401b7c73ddae4e075379cfc884

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page