Persistent homology on Fisher information distances for probability manifolds

fisher-homology

A pure-Python, zero-dependency implementation of topological data analysis (TDA) designed for probability trajectory analysis. It uses the Fisher information arc length as the filtration metric, which expands distances near the tails of probability distributions (p close to 0 or 1). That is where the critical events in fraud detection, medical diagnostics, physics experiments, and financial risk tend to live.

Python 3.8+ · License: MIT · Zero dependencies


Why Fisher distances?

Standard topological data analysis uses Euclidean distance. For probability trajectories, this is the wrong metric.

Euclidean distance treats p=0.001 and p=0.002 the same as p=0.499 and p=0.500 — both have |Δp| = 0.001. But informationally these are completely different: the first pair are both rare events with a large relative difference (the second is twice the first), while the second pair are near-average values with negligible relative difference.

The Fisher arc length is the geodesic distance on the statistical manifold of Bernoulli distributions, equipped with the Fisher information metric g(p) = 1/(p(1−p)):

d_F(p, q) = 2 · |arcsin(√p) − arcsin(√q)|

Properties:

  • Range [0, π] — the full manifold diameter
  • d_F(0, 1) = π — maximum separation
  • Symmetric and satisfies the triangle inequality (true metric)
  • At p = 0.01, Fisher expands distances ~10× vs Euclidean
  • Produces topologically meaningful features for probability trajectories
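
The formula and the tail-expansion property can be checked with the standard library alone. The following is a minimal sketch, not the package's own code; `arc_length` is a hypothetical helper name used only for illustration.

```python
import math

def arc_length(p, q):
    """Fisher arc length between two Bernoulli parameters p, q in [0, 1]."""
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))

# Same Euclidean gap (0.001), very different Fisher distances.
tail_gap = arc_length(0.001, 0.002)   # both values are rare events
mid_gap = arc_length(0.499, 0.500)    # both values are near 0.5

# Local expansion factor relative to Euclidean distance: 1 / sqrt(p(1 - p)).
expansion_at_001 = 1.0 / math.sqrt(0.01 * 0.99)

print(f"tail gap: {tail_gap:.5f}, mid gap: {mid_gap:.5f}")
print(f"tail/mid ratio: {tail_gap / mid_gap:.1f}")
print(f"diameter d_F(0, 1): {arc_length(0.0, 1.0):.5f}")
print(f"local expansion at p=0.01: {expansion_at_001:.2f}")
```

The two pairs have identical Euclidean separation, yet the tail pair ends up more than ten times further apart under the arc length, which is the behaviour the properties above describe.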

Installation

pip install fisher-homology

No dependencies required. Works on any Python 3.8+ installation.

Optional dependencies

# Quotes keep shells such as zsh from treating the brackets as a glob pattern
pip install "fisher-homology[numpy]"    # faster distance computation
pip install "fisher-homology[plot]"     # matplotlib visualization helpers
pip install "fisher-homology[dev]"      # pytest for running the test suite

From source

git clone https://github.com/williamrwilliamson/fisher-homology
cd fisher-homology
pip install -e .

Quick Start

from fisher_homology import FisherHomology
import numpy as np  # used here only to generate example data; not required by fisher_homology

# Three-phase trajectory: normal → stress → crisis
np.random.seed(42)
states = []
for t in range(15):
    if   t < 5:  p = [0.10 + 0.01*np.random.randn(), 0.12 + 0.01*np.random.randn()]
    elif t < 10: p = [0.45 + 0.03*np.random.randn(), 0.50 + 0.03*np.random.randn()]
    else:        p = [0.85 + 0.02*np.random.randn(), 0.88 + 0.02*np.random.randn()]
    states.append([float(np.clip(x, 0.01, 0.99)) for x in p])

ph     = FisherHomology()
result = ph.fit(states)

print(result.summary())
# Persistence Diagram (fisher metric)
#   States:            15
#   Max epsilon:       4.284
#   β₀ features:       12
#   Bottleneck width:  1.471
#   Phase gaps at ε:   ['1.249', '1.353']
#   Estimated phases:  3
#   Has cycles (β₁):   False

Core API

FisherHomology

ph = FisherHomology(
    n_steps=50,        # filtration resolution
    max_epsilon=None,  # auto-determined from max pairwise distance
)

fit(states, metric='fisher') → PersistenceDiagram

Compute persistent homology of a probability trajectory.

# states: list of T probability vectors, each of length n
# All probabilities must be in (0, 1)
result = ph.fit(states)

compare_trajectories(states_a, states_b) → dict

Compare two trajectories using bottleneck distance.

comparison = ph.compare_trajectories(trajectory_1, trajectory_2)
print(comparison['bottleneck_b0'])      # topological distance
print(comparison['interpretation'])     # human-readable verdict

rips_at_epsilon(states, epsilon) → dict

Snapshot of the Vietoris-Rips complex at a specific ε.

snapshot = ph.rips_at_epsilon(states, epsilon=0.5)
print(snapshot['beta_0'], snapshot['beta_1'])
print(snapshot['euler_characteristic'])

fit_transform(states, return_both_metrics=False) → dict

Compute Fisher and optionally Euclidean diagrams for comparison.

both = ph.fit_transform(states, return_both_metrics=True)
fisher_diag    = both['fisher']
euclidean_diag = both['euclidean']

PersistenceDiagram

Result container returned by FisherHomology.fit().

Attribute          Type                  Description
persistence_b0     list[(birth, death)]  β₀ (component) lifetime pairs
persistence_b1     list[dict]            β₁ (loop) birth/death events
betti_curve        dict[ε → (β₀, β₁)]    Betti numbers at each scale
bottleneck_width   float                 Max β₀ lifetime (signal strength)
phase_gaps         list[float]           ε values of phase transitions
max_epsilon        float                 Filtration range used
n_states           int                   Number of input states
metric             str                   'fisher' or 'euclidean'

Methods:

result.n_phases()     # estimated number of distinct regimes
result.has_cycles()   # True if trajectory contains loops (trapped states)
result.summary()      # human-readable summary string

Distance functions

from fisher_homology import fisher_arc, fisher_distance_matrix
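# tail_expansion_ratio is used below; its location in this module is an assumption
from fisher_homology.distances import (
    fisher_arc_position,
    fisher_gradient,
    tail_expansion_ratio,
)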

# Scalar arc distance
d = fisher_arc(0.1, 0.9)                 # float in [0, π]

# Arc position (maps probability to manifold position)
pos = fisher_arc_position(0.5)           # = π/2

# Gradient of arc position w.r.t. p (the square root of the Fisher information)
info = fisher_gradient(0.3)              # = 1/√(0.3 × 0.7)

# Full pairwise distance matrix
states = [[0.1, 0.2], [0.5, 0.6], [0.9, 0.8]]
D = fisher_distance_matrix(states)       # 3×3 symmetric matrix

# Tail expansion: how much Fisher expands vs Euclidean at p=0.01
ratio = tail_expansion_ratio(0.01)       # ≈ 10.0
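
To make the shape of the pairwise matrix concrete, here is a standard-library sketch. How the package combines per-coordinate arcs into a distance between probability vectors is not stated in this README; the sketch assumes an L2 (root-sum-of-squares) combination, and `arc_length` and `distance_matrix_sketch` are hypothetical names, not the package's functions.

```python
import math

def arc_length(p, q):
    """Scalar Fisher arc length between Bernoulli parameters p and q."""
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))

def distance_matrix_sketch(states):
    """Pairwise distances between probability vectors.

    ASSUMPTION: per-coordinate arcs are combined with an L2 norm.
    The real package may aggregate coordinates differently.
    """
    n = len(states)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum(arc_length(a, b) ** 2
                              for a, b in zip(states[i], states[j])))
            D[i][j] = D[j][i] = d   # fill both triangles: the matrix is symmetric
    return D

example_states = [[0.1, 0.2], [0.5, 0.6], [0.9, 0.8]]
D_sketch = distance_matrix_sketch(example_states)
for row in D_sketch:
    print(["%.3f" % x for x in row])
```

Whatever aggregation is used, the result should be symmetric with a zero diagonal, which is all the topology functions need.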

Topology functions

from fisher_homology.topology import (
    b0_persistence,
    betti_curve,
    persistence_diagram,
    bottleneck_distance,
    vietoris_rips_betti,
    UnionFind,
)

# β₀ persistence from a distance matrix
pairs = b0_persistence(D, max_epsilon=5.0)

# Betti curve: (β₀, β₁) at each filtration step
curve = betti_curve(D, n_steps=50)

# Full persistence diagram
diag = persistence_diagram(D, n_steps=50)

# Bottleneck distance between two diagrams
dist = bottleneck_distance(diag_a['persistence_b0'],
                           diag_b['persistence_b0'])

# Vietoris-Rips complex at one ε
rips = vietoris_rips_betti(D, epsilon=1.0)

Utils

from fisher_homology.utils import (
    validate_state_sequence,
    normalize_states,
    trajectory_summary,
)

# Validate and clip probability vectors
clean = validate_state_sequence(raw_states)

# Normalize to (0, 1)
normed = normalize_states(states, method='clip')   # or 'scale'

# Descriptive statistics
stats = trajectory_summary(states)
# {'n_states': 15, 'n_dims': 2, 'mean_probs': [...],
#  'std_probs': [...], 'trajectory_length': 4.28}

Interpretation Guide

Phase transitions

A phase gap in the persistence diagram marks an ε value at which a large connected-component merge occurs: two topologically distinct regimes that were previously separate become reachable from each other.

Large gap → significant phase transition
Small gap → gradual drift, no sharp regime change

Cyclic trapping

A β₁ feature (loop) indicates the trajectory returned to a previously visited region of probability space without escaping. In protein folding this is a misfolding intermediate. In fraud detection it is a network that almost cascades but recovers. In clinical monitoring it is a patient oscillating between two states.

Bottleneck width interpretation

Let r = bottleneck_width / max_ε:

r < 0.05          →  all states are in one continuous cloud
0.05 ≤ r < 0.15   →  weak phase structure
0.15 ≤ r ≤ 0.40   →  moderate phase separation
r > 0.40          →  strong, well-separated phases
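
The bands above can be applied mechanically. The helper below is a hypothetical sketch (it is not part of the package) and assumes every threshold is a fraction of max_ε, which is how the first band is written.

```python
def classify_bottleneck(bottleneck_width, max_epsilon):
    """Map bottleneck_width / max_epsilon to the qualitative bands above."""
    if max_epsilon <= 0:
        raise ValueError("max_epsilon must be positive")
    ratio = bottleneck_width / max_epsilon
    if ratio < 0.05:
        return "one continuous cloud"
    if ratio < 0.15:
        return "weak phase structure"
    if ratio <= 0.40:
        return "moderate phase separation"
    return "strong, well-separated phases"

# Values taken from the Quick Start summary output.
verdict = classify_bottleneck(1.471, 4.284)
print(verdict)
```

For the Quick Start values the ratio is about 0.34, so the trajectory falls in the moderate band even though three phases are detected.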

Fisher vs Euclidean comparison

both = ph.fit_transform(states, return_both_metrics=True)
fisher_phases    = both['fisher'].n_phases()
euclidean_phases = both['euclidean'].n_phases()

if fisher_phases > euclidean_phases:
    print("Fisher detects additional phase structure at the tails.")
    print("Tail events are driving regime separation.")

Applications

The Fisher metric is particularly valuable for probability trajectories where:

  • Rare events matter: fraud detection (p ≈ 0.001), medical diagnosis, gravitational wave detection (p ≈ 0.9999)
  • Phase transitions are critical: protein folding intermediates, market regime shifts, clinical state changes
  • Cycle detection is needed: misfolding loops, oscillating fraud networks, treatment resistance patterns

Running the Tests

# Install with dev dependencies
pip install "fisher-homology[dev]"

# Run tests
pytest tests/ -v

# Or directly
python tests/test_fisher_homology.py

All 44 tests are non-tautological — each verifies a mathematically provable property against analytical ground truth.


Mathematical Background

Fisher Information Metric

For a Bernoulli distribution parameterised by p, the Fisher information is:

I(p) = 1 / (p(1-p))

The geodesic distance under this Riemannian metric (the Fisher-Rao distance, the arc-length counterpart of the Hellinger distance) is:

d_F(p, q) = 2 · |arcsin(√p) − arcsin(√q)|
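
The closed form comes from integrating √I(t) = 1/√(t(1−t)) along the interval between p and q. The sketch below checks that numerically with a midpoint rule; the function names are illustrative and not part of the package.

```python
import math

def closed_form(p, q):
    """d_F(p, q) = 2 * |arcsin(sqrt(p)) - arcsin(sqrt(q))|."""
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))

def numeric_arc_length(p, q, n=100_000):
    """Midpoint-rule integral of sqrt(I(t)) = 1 / sqrt(t(1 - t)) from p to q."""
    lo, hi = min(p, q), max(p, q)
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h          # midpoint of the k-th sub-interval
        total += h / math.sqrt(t * (1.0 - t))
    return total

exact = closed_form(0.1, 0.9)
approx = numeric_arc_length(0.1, 0.9)
print(f"closed form: {exact:.8f}")
print(f"numeric:     {approx:.8f}")
```

The two values should agree to many decimal places for endpoints strictly inside (0, 1); at the endpoints themselves the integrand is singular and the midpoint rule converges more slowly.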

Equivalently, d_F is twice the angle between the square-root vectors (√p, √(1−p)) and (√q, √(1−q)) on the unit circle, which gives the distance a natural geometric interpretation.

Vietoris-Rips Filtration

Given T states with pairwise Fisher distances D[i,j], the Vietoris-Rips complex VR_ε includes all simplices whose diameter is at most ε:

  • ε = 0: T isolated points, β₀ = T, β₁ = 0
  • As ε grows: components merge (β₀ decreases), loops form and fill (β₁ varies)
  • ε = ∞: fully connected, β₀ = 1, β₁ = 0
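
Because β₀ depends only on which pairs of points are within ε of each other, it can be read off the threshold graph directly. The following is a minimal sketch using a depth-first search; `beta0_at_epsilon` is a hypothetical helper, not the package's `vietoris_rips_betti`, and it does not compute β₁.

```python
def beta0_at_epsilon(D, epsilon):
    """Count connected components of the graph with edges D[i][j] <= epsilon."""
    n = len(D)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1              # found a vertex not reached so far
        stack = [start]
        seen[start] = True
        while stack:
            u = stack.pop()
            for v in range(n):
                if not seen[v] and D[u][v] <= epsilon:
                    seen[v] = True
                    stack.append(v)
    return components

# Toy distance matrix: two tight pairs that are far from each other.
toy_D = [
    [0.0, 0.2, 1.5, 1.6],
    [0.2, 0.0, 1.4, 1.5],
    [1.5, 1.4, 0.0, 0.3],
    [1.6, 1.5, 0.3, 0.0],
]
for eps in (0.0, 0.25, 0.5, 2.0):
    print(eps, beta0_at_epsilon(toy_D, eps))
```

On this toy matrix β₀ falls from 4 to 1 as ε grows, with the long plateau at β₀ = 2 reflecting the two well-separated pairs.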

Persistent Homology

Features are tracked as (birth_ε, death_ε) pairs. Long-lived features (large death − birth) are robust signal. Short-lived features are noise.
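
For β₀ the (birth, death) pairs have a simple description: every point is born at ε = 0, and a component dies at the length of the edge that merges it into another component, which is exactly an edge of the minimum spanning tree. The sketch below uses Kruskal's algorithm with a union-find; it illustrates the idea and is not the package's `b0_persistence`. Reporting the surviving component with death `None` is a choice made here.

```python
def b0_pairs_sketch(D):
    """Return (birth, death) pairs for connected components of a Rips filtration."""
    n = len(D)
    if n == 0:
        return []
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All edges sorted by length: the order in which they enter the filtration.
    edges = sorted((D[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    pairs = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # this edge merges two components
            parent[ri] = rj
            pairs.append((0.0, w))   # one component dies at scale w
    pairs.append((0.0, None))        # the last component never dies
    return pairs

toy_D = [
    [0.0, 0.2, 1.5, 1.6],
    [0.2, 0.0, 1.4, 1.5],
    [1.5, 1.4, 0.0, 0.3],
    [1.6, 1.5, 0.3, 0.0],
]
toy_pairs = b0_pairs_sketch(toy_D)
print(toy_pairs)
```

In the toy example two components die early (within-cluster merges) and one dies late (the merge between clusters); the late death is the long-lived feature that signals two regimes.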

The stability theorem (Cohen-Steiner et al. 2007) guarantees robustness to noise: if every pairwise Fisher distance changes by at most δ, the persistence diagram changes by at most δ in bottleneck distance.
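
Stability can be observed empirically for β₀. The sketch below perturbs each pairwise distance by at most δ and compares the sorted β₀ death values (minimum spanning tree edge lengths, computed here with Prim's algorithm). The largest change among the sorted deaths is an upper bound on the bottleneck distance between the two β₀ diagrams. All names are illustrative, and the random points are made-up test data.

```python
import math
import random

def mst_weights(D):
    """Sorted minimum-spanning-tree edge lengths (the finite β₀ death values)."""
    n = len(D)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n            # cheapest known edge into the tree
    best[0] = 0.0
    weights = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        weights.append(best[u])
        for v in range(n):
            if not in_tree[v] and D[u][v] < best[v]:
                best[v] = D[u][v]
    return sorted(weights[1:])       # drop the 0.0 recorded for the root

def perturb(D, delta, rng):
    """Shift every off-diagonal distance by at most delta, keeping symmetry."""
    n = len(D)
    P = [row[:] for row in D]
    for i in range(n):
        for j in range(i + 1, n):
            noisy = max(0.0, D[i][j] + rng.uniform(-delta, delta))
            P[i][j] = P[j][i] = noisy
    return P

rng = random.Random(0)
points = [rng.uniform(0.01, 0.99) for _ in range(12)]
D = [[2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))
      for q in points] for p in points]

delta = 0.05
before = mst_weights(D)
after = mst_weights(perturb(D, delta, rng))
shift = max(abs(a - b) for a, b in zip(before, after))
print(f"largest change in a death value: {shift:.4f} (delta = {delta})")
```

The observed shift should never exceed δ (up to floating-point rounding), regardless of the random seed.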


References

  • Edelsbrunner, H. & Harer, J. (2010). Computational Topology: An Introduction. AMS.
  • Rao, C. R. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37, 81–91.
  • Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10(4), 507–521.
  • Cohen-Steiner, D., Edelsbrunner, H. & Harer, J. (2007). Stability of persistence diagrams. Discrete & Computational Geometry 37(1), 103–120.
  • Chazal, F. & Michel, B. (2021). An introduction to topological data analysis. Frontiers in Artificial Intelligence 4.

License

MIT License — see LICENSE.
