PHATE Manifold Metrics

Python 3.9+ | License: MIT

Manifold-aware semantic and relational affinity metrics using PHATE.

Overview

Compute Semantic Affinity (SA) and Relational Affinity (RA) metrics that leverage manifold geometry to capture non-Euclidean structure in embedding spaces.

Key Features

  • Multi-scale Analysis: Compare metrics at t=1 (baseline) vs t=5-6 (manifold)
  • Multiple RA Variants: Euclidean, Geodesic, Diffusion
  • Clustering-free SA: Distribution-based (no labels required)
  • Analogy Support: Specialized 4-word analogy methods
  • Optional Loaders: FastText, LaBSE, Ollama, OpenRouter
  • Dataset Utilities: CSV loading and parsing

Installation

# Core metrics only
pip install phate-manifold-metrics

# With embedding loaders (quotes keep shells such as zsh from expanding the brackets)
pip install "phate-manifold-metrics[embeddings]"

# Development (includes pytest, black, mypy)
pip install "phate-manifold-metrics[all]"

Quick Start

import numpy as np
from phate_manifold_metrics import PhateManifoldMetrics

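# Load/generate embeddings (random placeholder: 100 vectors of dimension 384)
rng = np.random.default_rng(0)  # seeded so repeated runs give the same scores
embeddings = rng.standard_normal((100, 384))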

# Initialize & fit
metrics = PhateManifoldMetrics(knn=5, t=6)
metrics.fit(embeddings)

# Define word pairs as (index, index) tuples into the rows of `embeddings`
pairs = [(0, 1), (2, 3), (4, 5)]

# Compute SA
sa = metrics.compute_semantic_affinity(pairs)
print(f"SA: {sa['sa_score']:.3f}")

# Compute RA variants
ra_euc = metrics.compute_relational_affinity_euc(pairs)
ra_geo = metrics.compute_relational_affinity_geo(pairs)
ra_dif = metrics.compute_relational_affinity_dif(pairs)

print(f"RA_euc: {ra_euc['ra_euc_score']:.3f}")
print(f"RA_geo: {ra_geo['ra_geo_score']:.3f}")
print(f"RA_dif: {ra_dif['ra_dif_score']:.3f}")

CLI Usage

# Basic test
phate-metrics --knn 5 --t 6

# Dual-scale analysis
phate-metrics --dual-scale

# Euclidean metric
phate-metrics --metric euclidean

Metrics Explained

Semantic Affinity (SA)

Clustering quality in manifold space:

SA = 1 / (1 + CV)
where CV = std(distances) / mean(distances)
  • Range: (0, 1]; SA = 1 when all distances are equal (CV = 0) and falls toward 0 as CV grows
  • Higher = better clustering
  • No labels required
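
The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the package's implementation: the function name `semantic_affinity` is hypothetical, and the choice of population standard deviation (ddof=0) and the handling of an all-zero input are assumptions.

```python
import numpy as np

def semantic_affinity(distances):
    """Return SA = 1 / (1 + CV), where CV = std(distances) / mean(distances)."""
    d = np.asarray(distances, dtype=float)
    mean = d.mean()
    if mean == 0.0:
        # Assumption: all-zero distances count as perfectly uniform.
        return 1.0
    cv = d.std() / mean  # population std (ddof=0); the package may use a different convention
    return 1.0 / (1.0 + cv)

uniform = semantic_affinity([2.0, 2.0, 2.0])  # identical distances, so CV = 0
spread = semantic_affinity([1.0, 3.0])        # mean 2, std 1, so CV = 0.5
print(uniform, spread)
```

Identical distances give SA = 1, and the score drops as the distances become more dispersed relative to their mean.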

Relational Affinity (RA)

Directional alignment of relational vectors:

Statistical RA (word pairs):

  • RA_euc: Euclidean (flat space baseline)
  • RA_geo: Geodesic (k-NN graph shortest paths)
  • RA_dif: Diffusion (PHATE manifold)
  • Range: [-1, 1], higher = stronger alignment
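
One way to read "directional alignment" in flat space is the mean pairwise cosine similarity between the offset vectors of the word pairs, which naturally falls in [-1, 1]. The sketch below works under that assumption; the function name `relational_affinity_euclidean` is hypothetical, and the package's `compute_relational_affinity_euc` may aggregate differently.

```python
import numpy as np

def relational_affinity_euclidean(embeddings, pairs):
    """Mean pairwise cosine similarity between offset vectors x[b] - x[a]."""
    X = np.asarray(embeddings, dtype=float)
    idx = np.asarray(pairs)
    offsets = X[idx[:, 1]] - X[idx[:, 0]]
    # Normalize each offset to unit length (assumes no pair has identical endpoints).
    units = offsets / np.linalg.norm(offsets, axis=1, keepdims=True)
    cos = units @ units.T
    # Average over distinct pairs of offsets only (strict upper triangle).
    upper = np.triu_indices(len(idx), k=1)
    return float(cos[upper].mean())

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
aligned = relational_affinity_euclidean(X, [(0, 1), (2, 3)])  # parallel offsets
opposed = relational_affinity_euclidean(X, [(0, 1), (3, 2)])  # opposite offsets
print(aligned, opposed)
```

Parallel offsets score 1 and opposite offsets score -1. At least two pairs are needed for the average to be defined.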

Analogy RA (4-word test cases a:b::c:d):

  • RA_euc_analogy: Euclidean parallelogram
  • RA_geo_analogy: Geodesic parallelogram
  • Range: [0, 1], higher = stronger analogy
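
For intuition, a Euclidean parallelogram test for a:b::c:d can compare the offset b - a with the offset d - c. The sketch below maps their cosine similarity into [0, 1]. Both that mapping and the function name `analogy_score` are assumptions made for illustration, so the package's analogy methods may score analogies differently.

```python
import numpy as np

def analogy_score(a, b, c, d):
    """Score a:b::c:d as (1 + cos(b - a, d - c)) / 2, which lies in [0, 1]."""
    a, b, c, d = (np.asarray(v, dtype=float) for v in (a, b, c, d))
    u, v = b - a, d - c
    # Assumes both offsets are non-zero.
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (1.0 + cos) / 2.0

perfect = analogy_score([0, 0], [1, 0], [0, 1], [1, 1])     # offsets are parallel
orthogonal = analogy_score([0, 0], [1, 0], [0, 0], [0, 1])  # offsets at a right angle
reversed_ = analogy_score([0, 0], [1, 0], [1, 1], [0, 1])   # offsets point opposite ways
print(perfect, orthogonal, reversed_)
```

A perfect parallelogram scores 1, orthogonal offsets score 0.5, and reversed offsets score 0.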

Parameters

Parameter   Description           Recommendation
knn         k-nearest neighbors   5-10 (start with 5)
t           Diffusion time        1 (baseline), 6 (manifold)
metric      Distance metric       'cosine' (normalized), 'euclidean'
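
To see roughly what knn and t control, the sketch below builds a simplified diffusion operator: a Gaussian kernel whose bandwidth is the distance to the knn-th neighbor, row-normalized into a Markov matrix and raised to the power t. This is a simplification for intuition only. PHATE itself uses an alpha-decay kernel and potential distances, and the function name `diffusion_operator` is hypothetical.

```python
import numpy as np

def diffusion_operator(X, knn=5, t=1):
    """Return P^t for a row-stochastic matrix P built from an adaptive Gaussian kernel."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Per-point bandwidth: distance to the knn-th nearest neighbor (column 0 is the point itself).
    sigma = np.sort(D, axis=1)[:, knn]
    K = np.exp(-(D / sigma[:, None]) ** 2)
    K = (K + K.T) / 2.0                    # symmetrize the kernel
    P = K / K.sum(axis=1, keepdims=True)   # each row becomes a probability distribution
    return np.linalg.matrix_power(P, t)    # t random-walk steps

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))
P1 = diffusion_operator(X, knn=5, t=1)  # baseline: one-step transitions
P6 = diffusion_operator(X, knn=5, t=6)  # manifold scale: six-step transitions
print(P1.shape, P6.sum(axis=1)[:3])
```

At t=1 each row describes one-step transitions dominated by near neighbors, while larger t lets probability mass travel along multi-step paths, which is why the recommendations above treat t=1 as the baseline and t=6 as the manifold scale.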

Optional: Embedding Loaders

FastText

from phate_manifold_metrics.embeddings import load_fasttext_from_extracted

embeddings = load_fasttext_from_extracted(["cat", "dog"], lang='en')

LaBSE

from phate_manifold_metrics.embeddings import load_labse_embeddings

embeddings = load_labse_embeddings(["hello", "你好", "hola"])

Ollama

from phate_manifold_metrics.embeddings.ollama import get_ollama_embeddings_fixed

embeddings = get_ollama_embeddings_fixed(
    ["cat", "dog"],
    model_name="snowflake-arctic-embed2"
)

OpenRouter API

import os
from phate_manifold_metrics.embeddings.openrouter import load_openrouter_embeddings

# Placeholder key; in practice, set OPENROUTER_API_KEY in your shell instead of hard-coding it
os.environ['OPENROUTER_API_KEY'] = 'your-key'
embeddings = load_openrouter_embeddings(
    ["hello", "world"],
    model_path="qwen/qwen3-embedding-8b",
    model_name="Qwen3-8B"
)

Documentation

Full API documentation is available in the docstrings:

from phate_manifold_metrics import PhateManifoldMetrics
help(PhateManifoldMetrics)

Citation

@software{phate_manifold_metrics,
  title = {PHATE Manifold Metrics},
  author = {Digital Duck},
  year = {2026},
  url = {https://github.com/digital-duck/phate-manifold-metrics}
}

References

  • PHATE: Moon et al., Nature Biotechnology 2019
  • Diffusion Distance: Coifman & Lafon, Applied and Computational Harmonic Analysis 2006

License

MIT License - Copyright (c) 2026 Digital Duck

Authors

Digital Duck (Wen + Claude Sonnet 4.5 + Google Gemini 2.5)
