
dGH: computing the Gromov–Hausdorff distance

Given the distance matrices of metric spaces $X$ and $Y$, the package estimates the Gromov–Hausdorff distance

$$d_\text{GH}(X, Y) = \frac{1}{2}\min_{f:X\to Y, g:Y\to X} \text{dis}\Bigg(\Big\{\big(x, f(x)\big): x \in X\Big\} \cup \Big\{\big(g(y), y\big): y \in Y\Big\}\Bigg),$$

where

$$\text{dis}(R) = \max_{(x, y), (x', y') \in R} \left|d_X(x, x') - d_Y(y, y')\right|$$

is the distortion of a relation $R \subseteq X \times Y$.
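For intuition, the definition above can be evaluated directly on very small spaces by enumerating all mapping pairs. The sketch below is purely illustrative (the names `dis` and `dgh_brute` are not part of the package), and its runtime grows as $m^n n^m$, so it is only feasible for a handful of points:

```python
import itertools
import numpy as np

def dis(R, X, Y):
    # Distortion of a relation R, given distance matrices X and Y:
    # max |d_X(x, x') - d_Y(y, y')| over all pairs (x, y), (x', y') in R.
    return max(abs(X[i, k] - Y[j, l]) for (i, j) in R for (k, l) in R)

def dgh_brute(X, Y):
    # Exact dGH by brute force over all mapping pairs (f, g) -- tiny spaces only.
    n, m = len(X), len(Y)
    best = np.inf
    for f in itertools.product(range(m), repeat=n):
        for g in itertools.product(range(n), repeat=m):
            # R = {(x, f(x)): x in X} ∪ {(g(y), y): y in Y}
            R = [(i, f[i]) for i in range(n)] + [(g[j], j) for j in range(m)]
            best = min(best, dis(R, X, Y))
    return best / 2

# Two-point spaces with diameters 2 and 1: dGH = |2 - 1| / 2 = 0.5.
X = np.array([[0., 2.], [2., 0.]])
Y = np.array([[0., 1.], [1., 0.]])
print(dgh_brute(X, Y))  # 0.5
```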

The distance is estimated from above by solving its parametric relaxation, whose minima are guaranteed to deliver $d_\text{GH}(X, Y)$ for a sufficiently large value of the parameter $c>1$. The indefinite quadratic relaxation is solved using conditional gradient descent in $O(n^3)$ time per iteration, where $n$ is the total number of points in $X$ and $Y$. The retrieved approximate minimum is an upper bound of (and in many cases equal to) the Gromov–Hausdorff distance $d_\text{GH}(X, Y)$.
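To illustrate the family of methods involved (this is a generic sketch, not the package's actual relaxation or code), a conditional gradient (Frank–Wolfe) iteration for a quadratic objective over the probability simplex looks like:

```python
import numpy as np

# Generic conditional gradient (Frank-Wolfe) descent on x^T A x over the
# probability simplex. Purely illustrative: dgh's actual objective and
# feasible set differ, but the iteration has the same shape.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A + A.T  # symmetric and possibly indefinite, like an indefinite quadratic

x = np.full(5, 0.2)  # feasible start: uniform point of the simplex
for t in range(100):
    grad = 2 * A @ x
    s = np.zeros(5)
    s[np.argmin(grad)] = 1.0       # linear minimization oracle: best vertex
    x += 2 / (t + 2) * (s - x)     # standard diminishing step size

# Every iterate is a convex combination of feasible points, so x stays feasible.
```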

A detailed description of the relaxation (including its optimality guarantees, optimization landscape, and the minimization algorithm), along with guidance for selecting the relaxation parameter $c$, can be found in Computing the Gromov–Hausdorff distance using gradient methods.

Installation

To install the package from Python Package Index:

$ pip install dgh

You also need to install jax because dgh relies on JAX for optimized matrix computations:

$ pip install jax

Make sure to choose the version of JAX that makes use of your hardware accelerator. For example, if you have a CUDA GPU, replace jax with jax[gpu] in the above.

Quickstart

Consider $X$ consisting of the vertices of a $1 \times 10$ rectangle, and $Y$ consisting of the vertices of a unit equilateral triangle together with a point at distance 10 from each of them (see illustration).

Illustration of the example

To create their distance matrices, whose $(i,j)$-th entry stores the distance between the $i$-th and $j$-th points:

>>> import numpy as np
>>> X = np.array([[0, 1, 10, 10],
...               [0, 0, 10, 10],
...               [0, 0, 0, 1],
...               [0, 0, 0, 0]])
>>> X += X.T
>>> Y = np.array([[0, 1, 1, 10],
...               [0, 0, 1, 10],
...               [0, 0, 0, 10],
...               [0, 0, 0, 0]])
>>> Y += Y.T

To compute (an upper bound of) their Gromov–Hausdorff distance $d_\text{GH}(X, Y)$:

>>> import dgh
>>> dGH = dgh.upper(X, Y)
>>> dGH
0.5

In this case, the distance $d_\text{GH}(X, Y)=\frac{1}{2}$ is computed exactly.

Iteration budget

By default, the algorithm is allocated 100 iterations of conditional gradient descent. Each time it converges to an approximate solution (i.e. a stationary point), the algorithm restarts from a random point, until the iteration budget is depleted. A bigger budget generally means a longer run time and better accuracy.

To set the iteration budget:

>>> dGH = dgh.upper(X, Y, iter_budget=20)
>>> dGH
0.5

Optimal mapping pair

Every solution is a mapping pair $(f:X\to Y, g:Y\to X)$. To access the mapping pair delivering the retrieved minimum:

>>> dGH, f, g = dgh.upper(X, Y, return_fg=True)
>>> f
[2, 2, 3, 3]
>>> g
[1, 1, 1, 2]

The $i$-th entry of either mapping stores the (index of the) image of the $i$-th point of its domain, with 0-based indexing. For example, here $g(y_3)=x_2$.
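The returned pair can be checked directly: the distortion of the relation it induces is twice the reported upper bound. Using the distance matrices from the quickstart and the `f`, `g` shown above:

```python
import numpy as np

# Distance matrices from the quickstart example.
X = np.array([[0, 1, 10, 10], [0, 0, 10, 10], [0, 0, 0, 1], [0, 0, 0, 0]])
X += X.T
Y = np.array([[0, 1, 1, 10], [0, 0, 1, 10], [0, 0, 0, 10], [0, 0, 0, 0]])
Y += Y.T

# The mapping pair reported above (0-based indices).
f = [2, 2, 3, 3]  # f: X -> Y
g = [1, 1, 1, 2]  # g: Y -> X

# Build R = {(x, f(x)): x in X} ∪ {(g(y), y): y in Y} and compute dis(R).
R = [(i, f[i]) for i in range(len(f))] + [(g[j], j) for j in range(len(g))]
dis = max(abs(X[i, k] - Y[j, l]) for (i, j) in R for (k, l) in R)
print(dis / 2)  # 0.5, matching the reported dGH
```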

Relaxation parameter $c>1$

Explicitly specifying the relaxation parameter $c$ can improve the performance of the algorithm. Smaller $c$ makes the relaxation easier to minimize, while larger $c$ makes its solutions more likely to deliver the Gromov–Hausdorff distance.

By default, the method allocates half of the iteration budget to select the best value of $c$ from $1+10^{-4}, 1+10^{-2},\ldots,1+10^8$, and then spends the remaining half on refining the Gromov–Hausdorff distance using this $c$. You can specify $c$ explicitly to see if it results in better accuracy and/or to save iterations on the search.

To see the value of $c$ selected after the search (along with the run summary):

>>> dgh.upper(X, Y, verbose=1)
iteration budget 100 | c=auto | dGH≥0
spent 49 iterations to choose c=1.0001
proved dGH≤0.5 after 40 restarts

To specify $c$ explicitly:

>>> dGH = dgh.upper(X, Y, c=1000)
>>> dGH
0.5

Contributing

If you find a bug or want to suggest an enhancement, you can create a GitHub Issue. Alternatively, you can email vlad.oles (at) proton (dot) me.

Once approved, you can follow the steps in CONTRIBUTING.md to contribute your changes.

License

dGH is released under the MIT license.

Research

To cite dGH, you can use the following:

Oles, V. (2023). Computing the Gromov–Hausdorff distance using gradient methods. arXiv preprint arXiv:2307.13660.

@article{oles2023computing,
  title={Computing the Gromov--Hausdorff distance using gradient methods},
  author={Oles, Vladyslav},
  journal={arXiv preprint arXiv:2307.13660},
  year={2023}
}
