Statistical hypothesis testing for persistence diagrams and barcodes

These details have not been verified by PyPI

Project description

TDA-PHANTOM

Topological data analysis - Persistent Homology Analysis via Null Testing On Manifolds (TDA-PHANTOM) is a tool for statistically analysing significance of persistence diagrams and barcodes.

This project implements hypothesis tests from:

Confidence Sets for Persistence Diagrams Fasy et al. (2014) DOI: https://doi.org/10.1214/14-AOS1252
A Universal Null-Distribution for Topological Data Analysis Bobrowski and Skraba (2023) DOI: https://doi.org/10.1038/s41598-023-37842-2

Installation

Via PyPI:

pip install tdaphantom

Or you can clone this repository - TDA-PHANTOM and install it manually:

python setup.py install

Overview

This tool can build a Vietoris-Rips complex from either a point cloud or distance matrix. It can then be used to visualise the persistence diagram for that complex, and run various hypothesis tests for it. The results of these hypothesis tests can be analysed via a return results array, or visualised in a signifiance persistence diagram.

Example Usage

Generate data

def _make_circle(n=2000, noise=0.03, seed=1):
    rng   = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, n)
    pts   = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return pts + rng.normal(0, noise, pts.shape)
pc_circle = _make_circle()

Init Phantom class

phantom_circle = Phantom(pc_circle)

Calculate persistence diagram

Here we go up to homological dimension 1

phantom_circle.calculate_dgms_from_point_cloud_ripser(k=1)

Display persistence diagram

phantom_circle.display_dgms()

![Persistence diagram for a circle using phatom]()

Run hypothesis test

alpha      = 0.01
correction = None
methods    = ["universal_null"]
phantom_circle.hypothesis_test(alpha, methods=methods, k=1)

Display signifiance persistence diagram

phantom_circle.display_results()

![signifiance Persistence diagram for a circle using phatom]()

Basic useage

The general workflow is:

1. Initalise Phantom

Initalise with either a point cloud or distance matrix. If using a distance matrix, set is_distance_matrix=True.

from tdaphantom.tdaphantom import Phantom

# from a point cloud
phantom = Phantom(point_cloud)

# from a distance matrix
phantom = Phantom(distance_matrix, is_distance_matrix=True)

2. Generate persistence diagrams

Two backends are available — GUDHI and Ripser. Ripser is faster for Vietoris-Rips persistence. Both take k as a convenience argument to set the maximum homological dimension to compute.

This will work even if you have inputted a distance matrix

# gudhi backend
phantom.calculate_dgms_from_point_cloud(k=1)

# ripser backend
phantom.calculate_dgms_from_point_cloud_ripser(k=1)

3. Display persistence diagrams

Call display_dgms to plot the persistence diagram and/or barcode. Each homological dimension is shown in a different colour. You may also pass in a pre-made diagram dict of the form {0: dgm_0, 1: dgm_1, ...}.

phantom.display_dgms()                  # both diagram and barcode
phantom.display_dgms(plot="diagram")    # persistence diagram only
phantom.display_dgms(plot="barcode")    # barcode only

4. Run hypothesis tests

Run one or more hypothesis tests on the persistence diagram for a given dimension k. You may optionally pass per-method options as a list of dicts aligned to the methods list. Note: bottleneck methods require the point cloud or distance matrix to be available.

The below code snippet shows the default values

results = phantom.hypothesis_test(
    alpha   = 0.05,
    methods = ["universal_null", "bottleneck"],
    k = 1,
)

With custom options:

results = phantom.hypothesis_test(
    alpha   = 0.05,
    methods = ["universal_null", "bottleneck"],
    options = [
        {"correction_strategy": "Bonferroni"},   # options for universal_null
        {"max_depth": 100, "b_multiplier": 3.5}, # options for bottleneck
    ],
    k = 1,
)

5. Display results

Call display_results to visualise which bars are significant. Significant bars/points are shown in red, noise in blue. A threshold line is drawn on the signifiance persistence diagram showing the significance boundary.

phantom.display_results()                        # all methods, both plots
phantom.display_results(method="universal_null") # specific method
phantom.display_results(plot="diagram")          # signifiance persistence diagram only

6. Save and load

At any point you can save the full state of Phantom to disk and reload it later.

phantom.save("./saved/circle.phantom")
phantom = Phantom.load("./saved/circle.phantom")

Avaliable hypothesis methods

Method	Alias	Requires
`universal_null:median`	`universal_null`	diagram only
`universal_null:mean`	—	diagram only
`bottleneck:subsample`	`bottleneck`	point cloud or distance matrix

Universal null median

Useage

phantom.hypothesis_test(alpha=0.05, methods=["universal_null"], k=1)

With options:

phantom.hypothesis_test(
    alpha   = 0.05,
    methods = ["universal_null"],
    options = [{"correction_strategy": "Bonferroni", "max_threshold": 10.0}],
    k       = 1,
)

Available options:

Option	Default	Description
`correction_strategy`	`None`	Multiple testing correction. `"Bonferroni"` or `"BH"` for Benjamini hochberg or `None`.
`max_threshold`	10 * max finite death	Threshold substituted for infinite death values.
`max_depth`	`1000`	Maximum iterations for the infinite cycle threshold algorithm.

Theory

Bobrowski and Skraba (2023) show that the death/birth ratio $\pi = \text{death}/\text{birth}$ for noise bars follows a universal limiting distribution — the left-skewed Gumbel (LGumbel) — regardless of the underlying point cloud distribution or geometry.

The normalised statistic for each bar is:

\ell(p) = A \cdot \log \log \pi(p) + B

where:

\pi(p) = \frac{\text{death}}{\text{birth}}, \quad
A = 1 \text{ (Vietoris-Rips)}, \quad
B = -\gamma - A \cdot \hat{L}, \quad
\hat{L} = \text{median}(\log \log \pi) \text{ over all finite bars}, \quad
\gamma \approx 0.5772  \text{ (Euler-Mascheroni constant)},

Under the null hypothesis (noise), $\ell(p)$ follows $\text{LGumbel}(0,1)$. The p-value for each bar is the LGumbel survival function:

p_i = \exp(-\exp(\ell_i))

Significance is assessed after Bonferroni or BH correction across all bars. The threshold line on the persistence diagram is $\text{death} = \text{birth} \cdot \pi^$ where $\pi^$ is the minimum $\pi$ such that a bar would be rejected at the corrected alpha level.

Infinite bars are handled with Bobrowski and Skraba (2023) algorithm 1

τ ← τ₀
do
    D ← dgmₖ(τ)
    I ← { b : (b, d) ∈ D, d = ∞, and τ/b < π_min(α/|D|) }
    τ ← τ_new
while |I| > 0
return τ

where:

\tau_{\text{new}} = \begin{cases} \min(I) \cdot \pi_{\min}(\alpha / |D|) & I \neq \emptyset \\ \tau & I = \emptyset \end{cases}

where $\tau_0$ is the initial threshold, $\text{dgm}k(\tau)$ is the persistence diagram truncated at $\tau$, and $\pi{\min}(\alpha/|D|)$ is the minimum death/birth ratio such that a bar is significant at the Bonferroni-corrected level $\alpha/|D|$.

Universal null mean

Useage

phantom.hypothesis_test(alpha=0.05, methods=["universal_null:mean"], k=1)

Theory

Identical to universal null median, but $\hat{L}$ is estimated using the mean of $\log \log \pi$ rather than the median. The paper's exact formula uses the mean. The median is more robust to contamination from signal bars, but the mean is theoretically motivated by the paper's universality result.

Bottleneck subsampling

Useage

phantom.hypothesis_test(alpha=0.05, methods=["bottleneck"], k=1)

With options:

phantom.hypothesis_test(
    alpha   = 0.05,
    methods = ["bottleneck"],
    options = [{"max_depth": 100, "b_multiplier": 3.5}],
    k       = 1,
)

Available options:

Option	Default	Description
`max_depth`	`50`	Number of bootstrap subsamples N.
`b_multiplier`	`3.5`	The constant used for $O(\frac{n}{\log(n)})$. Larger values give smaller c_n and more power but looser theoretical guarantees.

Theory

Fasy et al. (2014) 4.1 derive a confidence set for the persistence diagram of the form:

\mathcal{C} = \{ Q : W_\infty(\hat{P}, P) \leq c_n \}

where $W_\infty$ is the bottleneck distance and $c_n$ is estimated by subsampling.

For each of $N$ subsamples $S^*_b$ of size $b$ drawn without replacement from $S_n$:

T_j = H(S_n, S^*_b) \quad \text{(Hausdorff distance between point sets)}

c_n = \text{quantile}(\{T_j\},\, 1 - \alpha)

By the bottleneck stability theorem, $W_\infty(\hat{P}, P) \leq H(S_n, M)$, so bars with persistence $> 2 c_n$ are significant at level $\alpha$.

The p-value for bar $i$ is the fraction of bootstrap distances exceeding half its persistence:

p_i = \frac{1}{N} \sum_j \mathbf{1}\!\left(T_j \geq \frac{\ell_i}{2}\right)

The error bound from the paper is:

P(H(S_n, M) > c_n) \leq \alpha + O\!\left(\left(\frac{b}{n}\right)^{1/4}\right)

so the bias term shrinks as $b/n \to 0$. In practice, larger $b$ gives more power at the cost of strict theoretical guarantees on the type I error. The default

b = int(3.5*(n / np.log(n)))

is a practical choice.

Bottleneck shells

Useage

TODO: not yet implemented.

# phantom.hypothesis_test(alpha=0.05, methods=["bottleneck:shells"], k=1)

Available options:

Option	Default	Description
`max_depth`	`50`	Number of bootstrap subsamples N.

Theory

TODO: not yet implemented.

Fasy et al. (2014) 4.3 Note: the method of shells requires that $P$ satisfies a uniform lower bound on local density within each shell. It does not apply to distributions with arbitrarily sparse regions.

Bottleneck density

Useage

TODO: not yet implemented.

# phantom.hypothesis_test(alpha=0.05, methods=["bottleneck:density"], k=1)

Available options:

Option	Default	Description
`max_depth`	`50`	Number of bootstrap subsamples N.
`bandwidth`	`None`	KDE bandwidth h. Estimated from data if not supplied.

Theory

TODO: not yet implemented.

Fasy et al. (2014) 4.4 estimates the underlying density $p$ using a kernel density estimator $\hat{p}_h$ and computes the persistence diagram of the upper level sets of $\hat{p}_h$.

Let $\hat{p}_h$ be a KDE with bandwidth $h$:

The density method is robust to outliers because the KDE 'smooths them out', unlike the distance-function methods which are sensitive to isolated points far from $M$.

Bottleneck concentration

Useage

TODO: not yet implemented.

# phantom.hypothesis_test(alpha=0.05, methods=["bottleneck:concentration"], k=1)

Available options:

Option	Default	Description
`max_depth`	`50`	Number of bootstrap subsamples N.
`rho`	`None`	Minimum local density of P on M. Estimated from data if not supplied.
`d`	`None`	Intrinsic dimension of M. Inferred from point cloud if not supplied.

Theory

TODO: not yet implemented. Fasy et al. (2014) 4.2

TODO

Add bottleneck shells
Add bottleneck density
Add bottleneck concentration
Add more integeration tests
Add more unit tests

Project details

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Mathematics

Release history Release notifications | RSS feed

1.0.6

May 18, 2026

1.0.5

May 18, 2026

This version

1.0.4

May 18, 2026

1.0.3

May 18, 2026

1.0.2

May 17, 2026

1.0.1

May 17, 2026

1.0.0

May 17, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tdaphantom-1.0.4.tar.gz (20.9 kB view details)

Uploaded May 18, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

tdaphantom-1.0.4-py3-none-any.whl (19.2 kB view details)

Uploaded May 18, 2026 Python 3

File details

Details for the file tdaphantom-1.0.4.tar.gz.

File metadata

Download URL: tdaphantom-1.0.4.tar.gz
Upload date: May 18, 2026
Size: 20.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for tdaphantom-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`002d3a35c4d6e1d52c71eefd346b603d7727ded7ed4a87ab4858908f80927db1`
MD5	`deb4dbe14550f9d614a5b92df2e30116`
BLAKE2b-256	`9142dd0adcf39c38dafb28aa25ce929d9d199d7f359ebc26e8ae4e66513c5783`

See more details on using hashes here.

File details

Details for the file tdaphantom-1.0.4-py3-none-any.whl.

File metadata

Download URL: tdaphantom-1.0.4-py3-none-any.whl
Upload date: May 18, 2026
Size: 19.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for tdaphantom-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`05bcd8635d13906b313489591887a881fef36bdb2a6317c340d62d175c1a5491`
MD5	`7705f538db6127b9cb35c50b317e3235`
BLAKE2b-256	`c3acb158406d785c422b70049d5358eaef38b15f7f7221d013c4215b42bf09b3`

See more details on using hashes here.

tdaphantom 1.0.4

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

TDA-PHANTOM

Installation

Overview

Example Usage

Generate data

Init Phantom class

Calculate persistence diagram

Display persistence diagram

Run hypothesis test

Display signifiance persistence diagram

Basic useage

1. Initalise Phantom

2. Generate persistence diagrams

3. Display persistence diagrams

4. Run hypothesis tests

5. Display results

6. Save and load

Avaliable hypothesis methods

Universal null median

Useage

Theory

Universal null mean

Useage

Theory

Bottleneck subsampling

Useage

Theory

Bottleneck shells

Useage

Theory

Bottleneck density

Useage

Theory

Bottleneck concentration

Useage

Theory

TODO

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes