Skip to main content

Computing the Gromov–Hausdorff distance

Project description

dGH

Given the distance matrices of metric spaces $X$ and $Y$, estimates the Gromov–Hausdorff distance $$d_\text{GH}(X, Y) = \frac{1}{2}\inf_{f:X\to Y, g:Y\to X} \text{dis}\Bigg(\Big\{\big(x, f(x)\big): x \in X\Big\} \cup \Big\{\big(g(y), y\big): y \in Y\Big\}\Bigg),$$ where $$\text{dis}(R) = \sup_{(x, y), (x', y') \in R} \big|d_X(x, x') - d_Y(y, y')\big|$$ is the distortion of a relation $R \subseteq X \times Y$.

The distance is estimated from above by minimizing its parametric relaxation whose solutions are guaranteed to deliver $d_\text{GH}(X, Y)$ for a sufficiently large value of the parameter $c>1$. The quadratic relaxation with affine constraints is minimized using conditional gradient descent in $O(n^3)$ time per iteration, where $n = \max\big\{|X|, |Y|\big\}$. The retrieved minimum is an upper bound of (and in many cases equals to) the Gromov–Hausdorff distance $d_\text{GH}(X, Y)$.

A detailed description of the relaxation, its optimality guarantees and optimization landscape, and the approach to minimizing it can be found in Computing the Gromov–Hausdorff distance using first-order methods.

Installation

To install the package from Python Package Index:

$ pip install dgh

Quickstart

Consider $X$ comprised by the vertices of a $1 \times 10$ rectangle and $Y$ — by the vertices of a unit equilateral triangle together with a point that is 10 away from each of them (see illustration).

Illustration of the example

To create their distance matrices (the $(i,j)$-th entry stores the distance between the $i$-th and $j$-th points):

>>> import numpy as np
>>> X = np.array([[0, 1, 10, 10],
...               [0, 0, 10, 10],
...               [0, 0, 0, 1],
...               [0, 0, 0, 0]])
>>> X += X.T
>>> Y = np.array([[0, 1, 1, 10],
...               [0, 0, 1, 10],
...               [0, 0, 0, 10],
...               [0, 0, 0, 0]])
>>> Y += Y.T

To compute (an upper bound of) their Gromov–Hausdorff distance $d_\text{GH}(X, Y)$:

>>> import dgh
>>> dGH = dgh.upper(X, Y)
>>> dGH
0.5

In this case, the distance $d_\text{GH}(X, Y)=\frac{1}{2}$ is computed exactly.

Iteration budget

By default, the algorithm is allocated 100 iterations of conditional gradient descent. The algorithm restarts from a random point every time after converging to an approximate solution (i.e. a stationary point) until the iteration budget is depleted. Bigger budget generally means longer run time and better accuracy.

To set the iteration budget:

>>> dGH = dgh.upper(X, Y, iter_budget=20)
>>> dGH
0.5

Optimal mapping pair

Every solution is a mapping pair $(f:X\to Y, g:Y\to X)$. To access the mapping pair delivering the retrieved minimum:

>>> dGH, f, g = dgh.upper(X, Y, return_fg=True)
>>> f
[2, 2, 3, 3]
>>> g
[1, 1, 1, 2]

The $i$-th entry in either mapping stores (the index of) the image of its codomain's $i$-th point. For example, here $g(y_3)=x_2$.

Relaxation parameter $c>1$

Explicitly specifying $c$ can improve the performance of the algorithm. Small $c$ makes the relaxation easier to minimize, but its solutions are more likely to deliver the Gromov–Hausdorff distance when $c$ is large.

By default, the method allocates half of the iteration budget to select the best value of $c$ from $1+10^{-4}, 1+10^{-2},\ldots,1+10^8$, and then spends the remaining half on refining the Gromov–Hausdorff distance using this $c$. You can specify $c$ explicitly to see if it results in better accuracy and/or to save iterations on the search.

To see the value of $c$ selected after the search (along with the run summary):

>>> dgh.upper(X, Y, verbose=1)
iteration budget 100 | c=auto | dGH≥0
spent 49 iterations to choose c=1.0001
proved dGH≤0.5 after 40 restarts

To specify $c$ explicitly:

>>> dGH = dgh.upper(X, Y, c=1000)
>>> dGH
0.5

Contributing

If you found a bug or want to suggest an enhancement, you can create a GitHub Issue. Alternatively, you can email vlad.oles (at) proton (dot) me.

License

dGH is released under the MIT license.

Research

To cite dGH, you can use the following:

Oles, V. (2023). Computing the Gromov–Hausdorff distance using first-order methods. arXiv preprint arXiv:2307.13660.

@article{oles2023computing,
  title={Computing the Gromov--Hausdorff distance using first-order methods},
  author={Oles, Vladyslav},
  journal={arXiv preprint arXiv:2307.13660},
  year={2023}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dgh-0.1.3.tar.gz (11.8 kB view details)

Uploaded Source

Built Distribution

dgh-0.1.3-py3-none-any.whl (10.1 kB view details)

Uploaded Python 3

File details

Details for the file dgh-0.1.3.tar.gz.

File metadata

  • Download URL: dgh-0.1.3.tar.gz
  • Upload date:
  • Size: 11.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for dgh-0.1.3.tar.gz
Algorithm Hash digest
SHA256 78257ddc6e5ff4dba766564aee26900d154b55913ab27ab1be60afb48b66873d
MD5 991ba042901c89497d9aa5738d52be91
BLAKE2b-256 08fe6999bb5098964d0f113f1ca3f55d9771175844a1638369a71d86cede2c36

See more details on using hashes here.

File details

Details for the file dgh-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: dgh-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 10.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for dgh-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 1b6b5af8d14121356f0ded2509597c6cf0e319ff70ee19b8ba5f85b8010ca1ef
MD5 953f41c80a6d845ed5d671ee17d155c0
BLAKE2b-256 bc2f40c19af3b5da4a9d7b546aaad1e7e5175270af8a5f672d7633fd150668f1

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page