Information geometry dynamics for knowledge fields on curved manifolds.

kaft

Geometric dynamics for knowledge fields on curved manifolds.

Standard embedding models project knowledge into flat Euclidean space, where every document appears equidistant and every domain indistinguishable. kaft replaces flat distance with Fisher-Rao information geometry, letting domain structure emerge from the manifold itself.

pip install kaft

What it does

kaft computes a K-density field over an embedding corpus using a pluggable Riemannian metric, then evolves that field through time using wave dynamics and Jordan boundary detection. Two things become measurable that flat geometry cannot see:

  1. Domain structure — K-density varies meaningfully across documents; Softmax (flat) returns uniform density
  2. Soliton locking — the field self-organises into a stable attractor; geometry and noise anneal together until the field holds

Quickstart

from kaft.ingest.router import ArxivRouter
from kaft.ingest.embedder import Embedder
from kaft.geometry import get as get_metric
from kaft.dynamics.kstate import KDensity
from kaft.dynamics.softmax_dynamics import SoftmaxDynamics

# 1. Ingest
router     = ArxivRouter()
records    = router.fetch("Fisher-Rao information geometry", max_results=20)

# 2. Embed + manifold
embedder   = Embedder()
embeddings = embedder.encode([r["text"] for r in records])
metric     = get_metric("fisher_rao")
kd         = KDensity(embeddings, metric)
kd.compute()

# 3. Flat baseline
flat       = SoftmaxDynamics(temperature=1.0).run(embeddings)

# 4. Compare
div = SoftmaxDynamics().distribution_divergence(kd.density, flat.K)
print(f"KAFT K_var  : {kd.density.var():.6f}")
print(f"Softmax K_var: {flat.K_var:.6f}")
print(f"Divergence  : {div:.4f}")

Output on a real 80-paper, 4-domain arXiv corpus:

                               KAFT (Fisher-Rao)   Softmax (Euclidean)
K variance                     0.037222            0.000000
Divergence (KAFT vs. Softmax)  0.1802

Softmax returns K=1.000 for every document. Fisher-Rao geometry ranges 0.000–1.000, resolving domain boundaries invisible to flat distance.
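
To make the contrast with flat distance concrete, here is a minimal sketch of the Fisher-Rao geodesic distance between two categorical distributions, d(p, q) = 2 arccos(sum_i sqrt(p_i q_i)). It assumes embeddings are first mapped onto the probability simplex with a softmax; that mapping and the helper names `to_simplex` and `fisher_rao_distance` are illustrative assumptions, not kaft's internals.

```python
import numpy as np

def to_simplex(x: np.ndarray) -> np.ndarray:
    """Map each row of x to a probability vector with a numerically stable softmax."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fisher_rao_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Geodesic distance on the simplex: 2 * arccos(sum_i sqrt(p_i * q_i))."""
    bc = np.sum(np.sqrt(p * q))              # Bhattacharyya coefficient, in [0, 1]
    return float(2.0 * np.arccos(np.clip(bc, 0.0, 1.0)))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))                # toy stand-in for document embeddings
probs = to_simplex(emb)

d_fr = fisher_rao_distance(probs[0], probs[1])
d_eu = float(np.linalg.norm(emb[0] - emb[1]))
print(f"Fisher-Rao: {d_fr:.4f}  Euclidean: {d_eu:.4f}")
```

The Fisher-Rao distance is bounded by pi (reached when two distributions have disjoint support), whereas the Euclidean distance between raw embeddings has no such bound.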


Simulation dynamics

# Continues from the Quickstart: reuses embeddings, records, and kd.
import numpy as np

from kaft.dynamics.kstate import KDensity, KState
from kaft.dynamics.jordan import JordanBoundary, JordanNoise
from kaft.dynamics.metric_evolution import MetricEvolution
from kaft.dynamics.softmax_dynamics import SoftmaxDynamics
from kaft.simulate.crucible import SimulationCrucible
from kaft.simulate.runner import SimulationRunner

noise          = JordanNoise(sigma=0.05, rng=np.random.default_rng(42))
metric_evolver = MetricEvolution(temperature=1.0, base_step=0.1)

ks = KState()
ks.update(kd.density)
jb = JordanBoundary(kd)
jb.detect()

runner = SimulationRunner(
    embeddings=embeddings, records=records,
    k_density=kd, k_state=ks, jordan=jb,
    crucible=SimulationCrucible("demo", "fisher_rao", "output"),
    metric_evolver=metric_evolver,
    softmax=SoftmaxDynamics(temperature=1.0),
    noise=noise,
)

SIGMA_MIN = 0.005

for _ in range(50):
    frame = runner.step()
    if frame.phase == "locked":
        metric_evolver.base_step = 0.0       # freeze geometry
        noise.sigma              = SIGMA_MIN  # hold the field
    else:
        noise.sigma              = max(SIGMA_MIN, 0.05 * (1.0 - frame.K_mean))
        metric_evolver.base_step = max(0.001, 0.1  * (1.0 - frame.K_mean))

    print(f"step={frame.step:>3}  K_mean={frame.K_mean:.4f}"
          f"  phase={frame.phase:<12}  locked={frame.soliton_locked}")

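Selected steps from a run on the 80-paper, 4-domain corpus (the log extends to step 90, so this run was longer than the 50 iterations in the loop above):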

step=  0  K_mean=0.3887  phase=clustering   locked=False
step= 10  K_mean=0.5458  phase=clustering   locked=False
step= 20  K_mean=0.6004  phase=locked       locked=False
step= 24  K_mean=0.6153  phase=locked       locked=True
step= 30  K_mean=0.6153  phase=locked       locked=True
...
step= 90  K_mean=0.6153  phase=locked       locked=True

The soliton locks at step 24, and K_mean stays at 0.6153 at every logged step from there through step 90, which indicates a stable attractor.
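
One way to read "the field holds" is as a plateau test on the K_mean history. The sketch below is an assumption for illustration, not kaft's lock criterion: `is_plateau`, `window`, and `tol` are hypothetical names, and the sample values are the K_mean figures from the log above at its logged steps.

```python
from collections.abc import Sequence

def is_plateau(history: Sequence[float], window: int = 5, tol: float = 1e-4) -> bool:
    """Return True when the last `window` values differ by at most `tol`."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) <= tol

# K_mean at logged steps 0, 10, 20, 24, 30, ... (values from the log above)
k_means = [0.3887, 0.5458, 0.6004, 0.6153, 0.6153, 0.6153, 0.6153, 0.6153]

print(is_plateau(k_means[:4]))   # history shorter than the window
print(is_plateau(k_means))       # last five logged values agree
```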

The adaptive schedule reflects the physics: soliton formation requires a frozen medium. Once the phase gates to locked, geometry stops evolving and the field settles.
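
The schedule in the loop above can be isolated as a pure function, which makes its behaviour easy to inspect. This is a sketch that restates the constants from the example; the function name `schedule` and the constant names other than `SIGMA_MIN` are introduced here for illustration.

```python
SIGMA_MAX, SIGMA_MIN = 0.05, 0.005   # noise bounds used in the loop above
STEP_MAX, STEP_MIN = 0.1, 0.001      # metric step bounds used in the loop above

def schedule(k_mean: float, locked: bool) -> tuple[float, float]:
    """Return (noise sigma, metric base_step) for the next simulation step."""
    if locked:
        return SIGMA_MIN, 0.0            # hold the field, freeze the geometry
    cooling = 1.0 - k_mean               # shrinks as K_mean rises
    sigma = max(SIGMA_MIN, SIGMA_MAX * cooling)
    step = max(STEP_MIN, STEP_MAX * cooling)
    return sigma, step

for k in (0.3887, 0.5458, 0.6004):
    sigma, step = schedule(k, locked=False)
    print(f"K_mean={k:.4f}  sigma={sigma:.4f}  base_step={step:.4f}")
```

Both quantities decay linearly in 1 - K_mean and are floored, so noise never drops below SIGMA_MIN, while the metric step reaches exactly zero only once the phase is locked.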


Metric zoo

from kaft.geometry import get, list_metrics

list_metrics()
# ['fisher_rao', 'kl_divergence', 'jensen_shannon',
#  'alpha_divergence', 'euclidean', 'minkowski', 'alpha_hellinger',
#  'alpha_bhattacharyya', 'bregman_kl', 'bregman_euclidean',
#  'bregman_itakura_saito', 'bregman_tsallis_2',
#  'gaussian_curved', 'wasserstein']

metric = get("jensen_shannon")   # drop-in replacement

All metrics implement AbstractMetric — swap one line to run the same experiment on any geometry.
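
As an illustration of how a name-based registry with a shared interface can support this kind of one-line swap, here is a minimal sketch. It is not kaft's implementation: `SketchMetric`, `register`, `get_sketch`, and `list_sketch_metrics` are hypothetical names, and the real AbstractMetric interface may look different.

```python
from abc import ABC, abstractmethod
import numpy as np

class SketchMetric(ABC):
    """Hypothetical stand-in for a pluggable metric interface."""
    @abstractmethod
    def distance(self, p: np.ndarray, q: np.ndarray) -> float: ...

_REGISTRY: dict[str, type[SketchMetric]] = {}

def register(name: str):
    """Class decorator that records a metric under a string key."""
    def wrap(cls):
        _REGISTRY[name] = cls
        return cls
    return wrap

@register("euclidean")
class Euclidean(SketchMetric):
    def distance(self, p, q):
        return float(np.linalg.norm(p - q))

@register("jensen_shannon")
class JensenShannon(SketchMetric):
    def distance(self, p, q):
        m = 0.5 * (p + q)
        def kl(a, b):
            mask = a > 0                      # 0 * log(0) is treated as 0
            return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
        js = 0.5 * kl(p, m) + 0.5 * kl(q, m)
        return float(np.sqrt(max(js, 0.0)))   # square root of the JS divergence

def get_sketch(name: str) -> SketchMetric:
    return _REGISTRY[name]()

def list_sketch_metrics() -> list[str]:
    return sorted(_REGISTRY)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.7])
for name in list_sketch_metrics():
    print(name, round(get_sketch(name).distance(p, q), 4))
```

Because callers only depend on `distance`, the experiment code does not change when the registry key does.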


Library structure

kaft/
├── geometry/     Riemannian metrics — Fisher-Rao, KL, Jensen-Shannon,
│                 Alpha, Bregman, Wasserstein, Euclidean, Minkowski
├── dynamics/     K-density field, KState, JordanBoundary, WaveEngine,
│                 MetricEvolution, GradientFlow, SoftmaxDynamics
├── simulate/     SimulationRunner, SimulationCrucible, Frame log,
│                 SimulationVisualiser
├── navigate/     GeodesicNavigator, GeodesicPath, DomainSeeder
└── ingest/       ArxivRouter, Embedder

kaft is domain-agnostic. The same primitives that resolve arXiv domain boundaries work on protein sequence corpora, patent claim graphs, financial time series, or any other corpus that embeds into a vector space.


Experiments

Full experiment scripts and the N-scaling study (soliton lock step as a function of corpus size) are in the GitHub repository.


License

MIT


Download files

Source Distribution

kaft-0.3.0.tar.gz (49.5 kB)

Uploaded Source

Built Distribution

kaft-0.3.0-py3-none-any.whl (56.0 kB)

Uploaded Python 3

File details

Details for the file kaft-0.3.0.tar.gz.

File metadata

  • Download URL: kaft-0.3.0.tar.gz
  • Upload date:
  • Size: 49.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for kaft-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       e2ae95ac456c545ab0faaa3c22205ea63754a01fa4b35d47f00b6659203f1477
MD5          9a72a2b2dfc680969f3297ba539fd02d
BLAKE2b-256  70a96a76abd241ff2b870ca1b7946d5abd61ca5e7b48e23a490f1d27ba7b6cd0
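
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; the local path `kaft-0.3.0.tar.gz` in the current directory is an assumption about where the file was saved, and `sha256_of` is a helper name introduced here.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# SHA256 digest published for kaft-0.3.0.tar.gz
EXPECTED = "e2ae95ac456c545ab0faaa3c22205ea63754a01fa4b35d47f00b6659203f1477"

sdist = Path("kaft-0.3.0.tar.gz")    # assumed local download location
if sdist.exists():
    print("match" if sha256_of(sdist) == EXPECTED else "MISMATCH")
else:
    print(f"{sdist} not found; download it first")
```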

File details

Details for the file kaft-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: kaft-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 56.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for kaft-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       bf1ff4973bd2cb1419698460a68418db5964bd8b76e594e16db8c943ac77a032
MD5          9cd8876a6324ebde77676a6e64c09302
BLAKE2b-256  4cedbf53b029e19e4d9dd7d3dad5fb7fd747c762b5162fc4fc228e4b6765ad8f
