Information geometry dynamics for knowledge fields on curved manifolds.
kaft
Geometric dynamics for knowledge fields on curved manifolds.
Standard embedding models project knowledge into flat Euclidean space, where every document looks equidistant and every domain indistinguishable. kaft replaces flat distance with Fisher-Rao information geometry, letting domain structure emerge from the manifold itself.
pip install kaft
What it does
kaft computes a K-density field over an embedding corpus using a pluggable Riemannian metric, then evolves that field through time using wave dynamics and Jordan boundary detection. Two things become measurable that flat geometry cannot see:
- Domain structure — K-density varies meaningfully across documents; Softmax (flat) returns uniform density
- Soliton locking — the field self-organises into a stable attractor; geometry and noise anneal together until the field holds
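To make the contrast with flat distance concrete, here is a minimal sketch of the closed-form Fisher-Rao distance between two categorical distributions, d(p, q) = 2 arccos(sum_i sqrt(p_i q_i)). It is not kaft's implementation: the function name is hypothetical, and how kaft turns embeddings into distributions is not described here, so the sketch takes probability vectors directly.

```python
import numpy as np


def fisher_rao_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Fisher-Rao geodesic distance between two probability vectors.

    Hypothetical helper for illustration; not part of the kaft API.
    """
    # Bhattacharyya coefficient: inner product of the square-root vectors.
    bc = float(np.sum(np.sqrt(p * q)))
    # Clip guards against rounding pushing bc slightly outside [0, 1].
    return 2.0 * float(np.arccos(np.clip(bc, 0.0, 1.0)))


# The same Euclidean shift, once near the centre of the simplex
# and once near its edge.
centre_a, centre_b = np.array([0.5, 0.5]), np.array([0.6, 0.4])
edge_a, edge_b = np.array([0.9, 0.1]), np.array([1.0, 0.0])

euclid_centre = float(np.linalg.norm(centre_a - centre_b))
euclid_edge = float(np.linalg.norm(edge_a - edge_b))

d_centre = fisher_rao_distance(centre_a, centre_b)
d_edge = fisher_rao_distance(edge_a, edge_b)
```

Both pairs are the same Euclidean distance apart, but the Fisher-Rao distance is larger for the pair near the edge of the simplex. Flat distance discards that position dependence.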
Quickstart
from kaft.ingest.router import ArxivRouter
from kaft.ingest.embedder import Embedder
from kaft.geometry import get as get_metric
from kaft.dynamics.kstate import KDensity
from kaft.dynamics.softmax_dynamics import SoftmaxDynamics
# 1. Ingest
router = ArxivRouter()
records = router.fetch("Fisher-Rao information geometry", max_results=20)
# 2. Embed + manifold
embedder = Embedder()
embeddings = embedder.encode([r["text"] for r in records])
metric = get_metric("fisher_rao")
kd = KDensity(embeddings, metric)
kd.compute()
# 3. Flat baseline
flat = SoftmaxDynamics(temperature=1.0).run(embeddings)
# 4. Compare
div = SoftmaxDynamics().distribution_divergence(kd.density, flat.K)
print(f"KAFT K_var : {kd.density.var():.6f}")
print(f"Softmax K_var: {flat.K_var:.6f}")
print(f"Divergence : {div:.4f}")
Output on a real 80-paper, 4-domain arXiv corpus:
| | KAFT (Fisher-Rao) | Softmax (Euclidean) |
|---|---|---|
| K variance | 0.037222 | 0.000000 |
| Divergence | 0.1802 | — |
Softmax returns K=1.000 for every document. Fisher-Rao geometry ranges 0.000–1.000, resolving domain boundaries invisible to flat distance.
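The two summary numbers in the table can be illustrated on toy data. The sketch below is an assumption-laden stand-in: the source does not say which divergence distribution_divergence computes, so Jensen-Shannon is used here purely as an example, and the field values are made up rather than taken from the arXiv corpus.

```python
import numpy as np


def normalise(field: np.ndarray) -> np.ndarray:
    """Scale a non-negative field so it sums to 1."""
    field = np.asarray(field, dtype=float)
    return field / field.sum()


def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence (natural log) between two distributions.

    Illustrative choice only; kaft's own divergence may differ.
    """
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up K fields for four documents.
flat_K = np.ones(4)                        # every document gets K = 1
curved_K = np.array([0.0, 0.2, 0.8, 1.0])  # K varies across documents

flat_var = float(flat_K.var())
curved_var = float(curved_K.var())
divergence = js_divergence(normalise(curved_K), normalise(flat_K))
```

A constant field has zero variance by construction, so any spread in K is already a signal that the geometry distinguishes documents. The divergence then measures how far the normalised field sits from the uniform one.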
Simulation dynamics
from kaft.dynamics.kstate import KDensity, KState
from kaft.dynamics.jordan import JordanBoundary, JordanNoise
from kaft.dynamics.metric_evolution import MetricEvolution
from kaft.simulate.crucible import SimulationCrucible
from kaft.simulate.runner import SimulationRunner
import numpy as np
noise = JordanNoise(sigma=0.05, rng=np.random.default_rng(42))
metric_evolver = MetricEvolution(temperature=1.0, base_step=0.1)
# Continues from the Quickstart: reuses embeddings, records, kd and SoftmaxDynamics.
ks = KState()
ks.update(kd.density)
jb = JordanBoundary(kd)
jb.detect()
runner = SimulationRunner(
    embeddings=embeddings, records=records,
    k_density=kd, k_state=ks, jordan=jb,
    crucible=SimulationCrucible("demo", "fisher_rao", "output"),
    metric_evolver=metric_evolver,
    softmax=SoftmaxDynamics(temperature=1.0),
    noise=noise,
)
SIGMA_MIN = 0.005
for _ in range(50):
    frame = runner.step()
    if frame.phase == "locked":
        metric_evolver.base_step = 0.0  # freeze geometry
        noise.sigma = SIGMA_MIN         # hold the field
    else:
        # Anneal noise and metric step together as K_mean rises.
        noise.sigma = max(SIGMA_MIN, 0.05 * (1.0 - frame.K_mean))
        metric_evolver.base_step = max(0.001, 0.1 * (1.0 - frame.K_mean))
    print(f"step={frame.step:>3} K_mean={frame.K_mean:.4f}"
          f" phase={frame.phase:<12} locked={frame.soliton_locked}")
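Excerpt of the output on the 80-paper, 4-domain corpus (logged through step 90, so from a run longer than the 50 iterations in the loop above):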
step= 0 K_mean=0.3887 phase=clustering locked=False
step= 10 K_mean=0.5458 phase=clustering locked=False
step= 20 K_mean=0.6004 phase=locked locked=False
step= 24 K_mean=0.6153 phase=locked locked=True
step= 30 K_mean=0.6153 phase=locked locked=True
...
step= 90 K_mean=0.6153 phase=locked locked=True
The soliton locks at step 24, and K_mean stays at 0.6153 in every frame shown from step 24 through step 90, which indicates a stable attractor.
The adaptive schedule reflects the physics: soliton formation requires a frozen medium. Once the phase gates to locked, geometry stops evolving and the field settles.
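The schedule in the loop above can be pulled out into a pure function, which makes it easy to inspect and test away from the simulation. This is a sketch: the function name anneal and the constant names other than SIGMA_MIN are hypothetical, while the numeric values are the ones used in the loop.

```python
from typing import Tuple

SIGMA_MIN = 0.005  # noise floor, held once the field is locked
SIGMA_MAX = 0.05   # noise scale at K_mean = 0
STEP_MIN = 0.001   # smallest metric step while still evolving
STEP_MAX = 0.1     # metric step at K_mean = 0


def anneal(k_mean: float, locked: bool) -> Tuple[float, float]:
    """Return (noise sigma, metric base step) for the next frame.

    Hypothetical helper that mirrors the schedule in the loop above.
    """
    if locked:
        # Frozen medium: geometry stops, noise sits at its floor.
        return SIGMA_MIN, 0.0
    cooling = 1.0 - k_mean  # shrinks as the field organises
    sigma = max(SIGMA_MIN, SIGMA_MAX * cooling)
    step = max(STEP_MIN, STEP_MAX * cooling)
    return sigma, step


# Values at the first logged frame (K_mean=0.3887, not yet locked).
sigma_0, step_0 = anneal(0.3887, locked=False)
# Values once the phase reports "locked".
sigma_locked, step_locked = anneal(0.6153, locked=True)
```

Inside the loop, the two branches would collapse to noise.sigma, metric_evolver.base_step = anneal(frame.K_mean, frame.phase == "locked").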
Metric zoo
from kaft.geometry import get, list_metrics
list_metrics()
# ['fisher_rao', 'kl_divergence', 'jensen_shannon',
# 'alpha_divergence', 'euclidean', 'minkowski', 'alpha_hellinger',
# 'alpha_bhattacharyya', 'bregman_kl', 'bregman_euclidean',
# 'bregman_itakura_saito', 'bregman_tsallis_2'
# 'gaussian_curved', 'wasserstein']
metric = get("jensen_shannon") # drop-in replacement
All metrics implement AbstractMetric — swap one line to run the same experiment on any geometry.
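A name-keyed registry behind a shared abstract base class is one common way to get this kind of drop-in swap. The sketch below shows the pattern only: the Metric class, its distance method, and the register decorator are hypothetical stand-ins, since the actual AbstractMetric interface is not shown here.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type

import numpy as np


class Metric(ABC):
    """Hypothetical stand-in for kaft's AbstractMetric."""

    @abstractmethod
    def distance(self, p: np.ndarray, q: np.ndarray) -> float:
        """Return the dissimilarity between p and q."""


_REGISTRY: Dict[str, Type[Metric]] = {}


def register(name: str) -> Callable[[Type[Metric]], Type[Metric]]:
    """Class decorator that files a metric under a string key."""
    def decorator(cls: Type[Metric]) -> Type[Metric]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("euclidean")
class Euclidean(Metric):
    def distance(self, p: np.ndarray, q: np.ndarray) -> float:
        return float(np.linalg.norm(p - q))


@register("kl_divergence")
class KLDivergence(Metric):
    def distance(self, p: np.ndarray, q: np.ndarray) -> float:
        mask = p > 0  # assumes q > 0 wherever p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def get(name: str) -> Metric:
    """Instantiate the metric registered under name."""
    return _REGISTRY[name]()


def list_metrics() -> List[str]:
    return sorted(_REGISTRY)


# Swapping geometry is a one-string change for the caller.
p, q = np.array([0.5, 0.5]), np.array([0.9, 0.1])
results = {name: get(name).distance(p, q) for name in list_metrics()}
```

Because callers only depend on the base-class method, experiment code stays identical when the metric name changes.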
Library structure
kaft/
├── geometry/ Riemannian metrics — Fisher-Rao, KL, Jensen-Shannon,
│ Alpha, Bregman, Wasserstein, Euclidean, Minkowski
├── dynamics/ K-density field, KState, JordanBoundary, WaveEngine,
│ MetricEvolution, GradientFlow, SoftmaxDynamics
├── simulate/ SimulationRunner, SimulationCrucible, Frame log,
│ SimulationVisualiser
├── navigate/ GeodesicNavigator, GeodesicPath, DomainSeeder
└── ingest/ ArxivRouter, Embedder
kaft is domain-agnostic. The same primitives that resolve arXiv domain boundaries work on protein sequence corpora, patent claim graphs, or financial time series: any corpus that embeds into a vector space.
Experiments
Full experiment scripts and the N-scaling study (soliton lock step as a function of corpus size) are in the GitHub repository.
License
MIT
Download files
File details
Details for the file kaft-0.3.0.tar.gz.
File metadata
- Download URL: kaft-0.3.0.tar.gz
- Upload date:
- Size: 49.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2ae95ac456c545ab0faaa3c22205ea63754a01fa4b35d47f00b6659203f1477 |
| MD5 | 9a72a2b2dfc680969f3297ba539fd02d |
| BLAKE2b-256 | 70a96a76abd241ff2b870ca1b7946d5abd61ca5e7b48e23a490f1d27ba7b6cd0 |
File details
Details for the file kaft-0.3.0-py3-none-any.whl.
File metadata
- Download URL: kaft-0.3.0-py3-none-any.whl
- Upload date:
- Size: 56.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf1ff4973bd2cb1419698460a68418db5964bd8b76e594e16db8c943ac77a032 |
| MD5 | 9cd8876a6324ebde77676a6e64c09302 |
| BLAKE2b-256 | 4cedbf53b029e19e4d9dd7d3dad5fb7fd747c762b5162fc4fc228e4b6765ad8f |