Persistent homology on Fisher information distances for probability manifolds

fisher-homology

Persistent homology on Fisher information distances for probability manifolds.

A pure-Python, zero-dependency implementation of topological data analysis (TDA) designed specifically for probability trajectory analysis. Uses the Fisher information arc length as the filtration metric, which correctly expands distances at the tails of probability distributions — exactly where critical events in fraud detection, medical diagnostics, physics experiments, and financial risk live.

Python 3.8+ | License: MIT | Zero dependencies


Why Fisher distances?

Standard topological data analysis uses Euclidean distance. For probability trajectories, this is the wrong metric.

Euclidean distance treats the pair p=0.001, p=0.002 the same as the pair p=0.499, p=0.500: both have |Δp| = 0.001. Informationally they are very different. In the first pair both values are rare-event probabilities and one is twice the other, while in the second pair both values sit near 0.5 and the relative difference is negligible.

The Fisher arc length is the geodesic distance on the statistical manifold of Bernoulli distributions, equipped with the Fisher information metric g(p) = 1/(p(1−p)):

d_F(p, q) = 2 · |arcsin(√p) − arcsin(√q)|

Properties:

  • Range [0, π] — the full manifold diameter
  • d_F(0, 1) = π — maximum separation
  • Symmetric and satisfies the triangle inequality (true metric)
  • At p = 0.01, Fisher expands distances ~10× vs Euclidean
  • Produces topologically meaningful features for probability trajectories
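The formula and the tail-expansion claim can be checked with the standard library alone. The sketch below is illustrative: `fisher_arc_sketch` is a hypothetical stand-in written for this example, not the package's own `fisher_arc`.

```python
import math

def fisher_arc_sketch(p, q):
    """Fisher arc length between two Bernoulli parameters p, q in [0, 1]."""
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))

# Same Euclidean gap |Δp| = 0.001, very different Fisher distances.
tail_gap = fisher_arc_sketch(0.001, 0.002)
mid_gap = fisher_arc_sketch(0.499, 0.500)
print(tail_gap > 10 * mid_gap)        # True

# Local expansion factor relative to Euclidean distance near p = 0.01,
# i.e. the derivative of the arc position, 1 / sqrt(p (1 - p)).
local_expansion = 1.0 / math.sqrt(0.01 * 0.99)
print(round(local_expansion, 2))      # 10.05

# The two endpoints of the manifold are π apart.
print(math.isclose(fisher_arc_sketch(0.0, 1.0), math.pi))  # True
```

The local factor 1/√(p(1−p)) is what the "~10×" figure at p = 0.01 refers to; it falls to 2 at p = 0.5, where the Fisher distance is smallest relative to Euclidean distance.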

Installation

pip install fisher-homology

No dependencies required. Works on any Python 3.8+ installation.

Optional dependencies

pip install fisher-homology[numpy]    # faster distance computation
pip install fisher-homology[plot]     # matplotlib visualization helpers
pip install fisher-homology[dev]      # pytest for running the test suite

From source

git clone https://github.com/williamrwilliamson/fisher-homology
cd fisher-homology
pip install -e .

Quick Start

from fisher_homology import FisherHomology
import numpy as np  # used here only to synthesize the demo data

# Three-phase trajectory: normal → stress → crisis
np.random.seed(42)
states = []
for t in range(15):
    if   t < 5:  p = [0.10 + 0.01*np.random.randn(), 0.12 + 0.01*np.random.randn()]
    elif t < 10: p = [0.45 + 0.03*np.random.randn(), 0.50 + 0.03*np.random.randn()]
    else:        p = [0.85 + 0.02*np.random.randn(), 0.88 + 0.02*np.random.randn()]
    states.append([float(np.clip(x, 0.01, 0.99)) for x in p])

ph     = FisherHomology()
result = ph.fit(states)

print(result.summary())
# Persistence Diagram (fisher metric)
#   States:            15
#   Max epsilon:       4.284
#   β₀ features:       12
#   Bottleneck width:  1.471
#   Phase gaps at ε:   ['1.249', '1.353']
#   Estimated phases:  3
#   Has cycles (β₁):   False

Core API

FisherHomology

ph = FisherHomology(
    n_steps=50,        # filtration resolution
    max_epsilon=None,  # auto-determined from max pairwise distance
)

fit(states, metric='fisher') → PersistenceDiagram

Compute persistent homology of a probability trajectory.

# states: list of T probability vectors, each of length n
# All probabilities must be in (0, 1)
result = ph.fit(states)

compare_trajectories(states_a, states_b) → dict

Compare two trajectories using bottleneck distance.

comparison = ph.compare_trajectories(trajectory_1, trajectory_2)
print(comparison['bottleneck_b0'])      # topological distance
print(comparison['interpretation'])     # human-readable verdict

rips_at_epsilon(states, epsilon) → dict

Snapshot of the Vietoris-Rips complex at a specific ε.

snapshot = ph.rips_at_epsilon(states, epsilon=0.5)
print(snapshot['beta_0'], snapshot['beta_1'])
print(snapshot['euler_characteristic'])

fit_transform(states, return_both_metrics=False) → dict

Compute Fisher and optionally Euclidean diagrams for comparison.

both = ph.fit_transform(states, return_both_metrics=True)
fisher_diag    = both['fisher']
euclidean_diag = both['euclidean']

PersistenceDiagram

Result container returned by FisherHomology.fit().

Attribute         Type                   Description
persistence_b0    list[(birth, death)]   β₀ (component) lifetime pairs
persistence_b1    list[dict]             β₁ (loop) birth/death events
betti_curve       dict[ε → (β₀, β₁)]     Betti numbers at each scale
bottleneck_width  float                  Max β₀ lifetime (signal strength)
phase_gaps        list[float]            ε values of phase transitions
max_epsilon       float                  Filtration range used
n_states          int                    Number of input states
metric            str                    'fisher' or 'euclidean'

result.n_phases()     # estimated number of distinct regimes
result.has_cycles()   # True if trajectory contains loops (trapped states)
result.summary()      # human-readable summary string

Distance functions

from fisher_homology import fisher_arc, fisher_distance_matrix
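# tail_expansion_ratio is used below; it is assumed here to live in fisher_homology.distances
from fisher_homology.distances import fisher_arc_position, fisher_gradient, tail_expansion_ratio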

# Scalar arc distance
d = fisher_arc(0.1, 0.9)                 # float in [0, π]

# Arc position (maps probability to manifold position)
pos = fisher_arc_position(0.5)           # = π/2

# Gradient of the arc position w.r.t. p (the square root of the Fisher information)
info = fisher_gradient(0.3)              # = 1/√(0.3 × 0.7)

# Full pairwise distance matrix
states = [[0.1, 0.2], [0.5, 0.6], [0.9, 0.8]]
D = fisher_distance_matrix(states)       # 3×3 symmetric matrix

# Tail expansion: how much Fisher expands vs Euclidean at p=0.01
ratio = tail_expansion_ratio(0.01)       # ≈ 10.0
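For intuition about what a pairwise matrix over probability vectors might contain, here is a minimal pure-Python sketch. It assumes each coordinate is an independent Bernoulli parameter, in which case the product-manifold geodesic distance is the root-sum-of-squares of the per-coordinate arcs. Whether `fisher_distance_matrix` aggregates coordinates this way is not stated here, so both function names below are hypothetical.

```python
import math

def fisher_arc_sketch(p, q):
    """Fisher arc length between two Bernoulli parameters."""
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))

def fisher_distance_matrix_sketch(states):
    """Symmetric matrix of distances between probability vectors.

    ASSUMPTION: coordinates are independent Bernoulli parameters, so the
    per-coordinate arcs are combined in quadrature (product manifold).
    """
    n = len(states)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            squared = sum(fisher_arc_sketch(a, b) ** 2
                          for a, b in zip(states[i], states[j]))
            D[i][j] = D[j][i] = math.sqrt(squared)
    return D

states = [[0.1, 0.2], [0.5, 0.6], [0.9, 0.8]]
D = fisher_distance_matrix_sketch(states)
for row in D:
    print([round(x, 3) for x in row])
```

Only the upper triangle is computed and then mirrored, which keeps the matrix exactly symmetric and the diagonal exactly zero.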

Topology functions

from fisher_homology.topology import (
    b0_persistence,
    betti_curve,
    persistence_diagram,
    bottleneck_distance,
    vietoris_rips_betti,
    UnionFind,
)

# β₀ persistence from a distance matrix
pairs = b0_persistence(D, max_epsilon=5.0)

# Betti curve: (β₀, β₁) at each filtration step
curve = betti_curve(D, n_steps=50)

# Full persistence diagram
diag = persistence_diagram(D, n_steps=50)

# Bottleneck distance between two diagrams
# (diag_a and diag_b are two persistence_diagram(...) results)
dist = bottleneck_distance(diag_a['persistence_b0'],
                           diag_b['persistence_b0'])

# Vietoris-Rips complex at one ε
rips = vietoris_rips_betti(D, epsilon=1.0)

Utils

from fisher_homology.utils import (
    validate_state_sequence,
    normalize_states,
    trajectory_summary,
)

# Validate and clip probability vectors (raw_states is your unvalidated input)
clean = validate_state_sequence(raw_states)

# Normalize to (0, 1)
normed = normalize_states(states, method='clip')   # or 'scale'

# Descriptive statistics
stats = trajectory_summary(states)
# {'n_states': 15, 'n_dims': 2, 'mean_probs': [...],
#  'std_probs': [...], 'trajectory_length': 4.28}

Interpretation Guide

Phase transitions

A phase gap in the persistence diagram marks an ε value at which a large connected-component merge occurs: two regimes that were separate at smaller scales become reachable from each other.

Large gap → significant phase transition
Small gap → gradual drift, no sharp regime change
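One way to make "large gap" concrete is to compare each β₀ death value against the typical merge scale. The sketch below is only a heuristic written for illustration: the function name, the `factor` parameter, and the median rule are all assumptions, and the library's own rule for `phase_gaps` may differ.

```python
import statistics

def phase_gaps_sketch(deaths, factor=3.0):
    """Return the finite β₀ death values that dwarf the typical merge scale.

    ASSUMPTION: a merge counts as a phase gap when its ε exceeds
    `factor` times the median finite death value.
    """
    finite = sorted(d for d in deaths if d != float("inf"))
    if not finite:
        return []
    typical = statistics.median(finite)
    return [d for d in finite if d > factor * typical]

# Made-up death values: many small within-regime merges, two large ones.
deaths = [0.04, 0.05, 0.05, 0.06, 0.07, 0.08, 1.25, 1.35]
gaps = phase_gaps_sketch(deaths)
print(gaps)            # [1.25, 1.35]
print(len(gaps) + 1)   # 3 regimes under this heuristic
```

Counting regimes as "number of gaps plus one" follows from single-linkage reasoning: each large merge joins two clusters that were separate at every smaller scale.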

Cyclic trapping

A β₁ feature (loop) indicates the trajectory returned to a previously visited region of probability space without escaping. In protein folding this is a misfolding intermediate. In fraud detection it is a network that almost cascades but recovers. In clinical monitoring it is a patient oscillating between two states.

Bottleneck width interpretation

Thresholds apply to the ratio bottleneck_width / max_ε:

< 0.05       →  all states are in one continuous cloud
0.05–0.15    →  weak phase structure
0.15–0.40    →  moderate phase separation
> 0.40       →  strong, well-separated phases
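The bands above can be applied mechanically. This small helper is a sketch, not part of the package: it reads every threshold as a fraction of max_ε and treats each band as half-open, both of which are assumptions.

```python
def describe_bottleneck(bottleneck_width, max_epsilon):
    """Map bottleneck_width / max_epsilon onto the interpretation bands.

    ASSUMPTION: all thresholds are fractions of max_epsilon and each
    band includes its lower bound.
    """
    if max_epsilon <= 0:
        raise ValueError("max_epsilon must be positive")
    ratio = bottleneck_width / max_epsilon
    if ratio < 0.05:
        return "single continuous cloud"
    if ratio < 0.15:
        return "weak phase structure"
    if ratio <= 0.40:
        return "moderate phase separation"
    return "strong, well-separated phases"

# Values from the Quick Start summary: width 1.471, max epsilon 4.284.
label = describe_bottleneck(1.471, 4.284)
print(label)  # moderate phase separation
```

For the Quick Start numbers the ratio is about 0.34, which lands in the moderate band.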

Fisher vs Euclidean comparison

both = ph.fit_transform(states, return_both_metrics=True)
fisher_phases    = both['fisher'].n_phases()
euclidean_phases = both['euclidean'].n_phases()

if fisher_phases > euclidean_phases:
    print("Fisher detects additional phase structure at the tails.")
    print("Tail events are driving regime separation.")

Applications

The Fisher metric is particularly valuable for probability trajectories where:

  • Rare events matter: fraud detection (p ≈ 0.001), medical diagnosis, gravitational wave detection (p ≈ 0.9999)
  • Phase transitions are critical: protein folding intermediates, market regime shifts, clinical state changes
  • Cycle detection is needed: misfolding loops, oscillating fraud networks, treatment resistance patterns

Running the Tests

# Install with dev dependencies
pip install fisher-homology[dev]

# Run tests
pytest tests/ -v

# Or directly
python tests/test_fisher_homology.py

All 44 tests are non-tautological — each verifies a mathematically provable property against analytical ground truth.


Mathematical Background

Fisher Information Metric

For a Bernoulli distribution parameterised by p, the Fisher information is:

I(p) = 1 / (p(1-p))

The geodesic distance in this Riemannian metric is the Hellinger arc length:

d_F(p, q) = 2 · |arcsin(√p) − arcsin(√q)|

This equals twice the angle between the square-root-transformed probability vectors (√p, √(1−p)) and (√q, √(1−q)) on the unit sphere (here a circle), which gives the distance a natural geometric interpretation.
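The closed form follows from integrating the square root of the Fisher information along the parameter interval. A short derivation, using only the definitions above:

```latex
d_F(p, q)
  = \left| \int_p^q \sqrt{I(t)}\, dt \right|
  = \left| \int_p^q \frac{dt}{\sqrt{t(1-t)}} \right|
  = 2\,\bigl| \arcsin\sqrt{q} - \arcsin\sqrt{p} \bigr|,
\qquad\text{since}\qquad
\frac{d}{dt}\Bigl[ 2\arcsin\sqrt{t} \Bigr]
  = \frac{2}{\sqrt{1-t}} \cdot \frac{1}{2\sqrt{t}}
  = \frac{1}{\sqrt{t(1-t)}}.
```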

Vietoris-Rips Filtration

Given T states with pairwise Fisher distances D[i,j], the Vietoris-Rips complex VR_ε includes all simplices whose diameter is at most ε:

  • ε = 0: T isolated points, β₀ = T, β₁ = 0
  • As ε grows: components merge (β₀ decreases), loops form and fill (β₁ varies)
  • ε = ∞: fully connected, β₀ = 1, β₁ = 0
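The β₀ part of this picture can be sketched with a union-find structure: connect every pair of states whose distance is at most ε and count what remains. The function below is an illustrative stand-in written for this README, not the package's `vietoris_rips_betti`, and the distance matrix is a made-up toy example.

```python
def count_components(D, epsilon):
    """β₀ of the Vietoris-Rips complex at scale epsilon (toy sketch)."""
    n = len(D)
    parent = list(range(n))

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if D[i][j] <= epsilon:
                ri, rj = find(i), find(j)
                if ri != rj:          # this edge merges two components
                    parent[ri] = rj
                    components -= 1
    return components

# Toy matrix: two tight pairs, far apart from each other.
D = [
    [0.0, 0.2, 1.5, 1.6],
    [0.2, 0.0, 1.4, 1.5],
    [1.5, 1.4, 0.0, 0.3],
    [1.6, 1.5, 0.3, 0.0],
]
print([count_components(D, eps) for eps in (0.0, 0.25, 0.5, 2.0)])
# [4, 3, 2, 1]
```

β₀ depends only on the edges of the complex, which is why no triangles are needed here; β₁ does require the higher simplices.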

Persistent Homology

Features are tracked as (birth_ε, death_ε) pairs. Long-lived features (large death − birth) are robust signal. Short-lived features are noise.
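For β₀ the (birth, death) pairs can be read off a Kruskal-style sweep: every point is born at ε = 0, and each edge that merges two components kills one of them at that edge's length. The sketch below illustrates the idea on a toy matrix; `b0_persistence_sketch` is a hypothetical name and the package's `b0_persistence` may filter or report pairs differently.

```python
def b0_persistence_sketch(D):
    """(birth, death) pairs for connected components of a Rips filtration."""
    n = len(D)
    edges = sorted((D[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, dist))   # one component dies at this scale
    pairs.append((0.0, float("inf")))   # the last component never dies
    return pairs

# Toy matrix: two tight pairs, far apart from each other.
D = [
    [0.0, 0.2, 1.5, 1.6],
    [0.2, 0.0, 1.4, 1.5],
    [1.5, 1.4, 0.0, 0.3],
    [1.6, 1.5, 0.3, 0.0],
]
pairs = b0_persistence_sketch(D)
print(pairs)
# [(0.0, 0.2), (0.0, 0.3), (0.0, 1.4), (0.0, inf)]
```

The finite lifetimes 0.2 and 0.3 are within-cluster merges, while 1.4 is the long-lived feature that separates the two clusters.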

The stability theorem (Cohen-Steiner et al. 2007) bounds the effect of perturbations: if every pairwise Fisher distance changes by at most δ, every simplex enters the Rips filtration at most δ earlier or later, so the persistence diagram moves by at most δ in bottleneck distance.


References

  • Edelsbrunner, H. & Harer, J. (2010). Computational Topology: An Introduction. AMS.
  • Rao, C. R. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37, 81–91.
  • Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10(4), 507–521.
  • Cohen-Steiner, D., Edelsbrunner, H. & Harer, J. (2007). Stability of persistence diagrams. Discrete & Computational Geometry 37(1), 103–120.
  • Chazal, F. & Michel, B. (2021). An introduction to topological data analysis. Frontiers in Artificial Intelligence 4.

License

MIT License — see LICENSE.
