
A calculus over embedding sequences: weighted folds, smoothers, differential operators, and trajectory distance.

Project description

embflow

A calculus over embedding sequences. Weight generators, smoothers, differential operators, motion statistics, null models, and trajectory distance for paths through R^d. Reference implementation of the calculus in the embedding-dynamics paper.

Install

pip install embflow

Runtime deps are numpy and scikit-learn only; the OpenAI and Ollama embedding backends are optional extras (pip install "embflow[openai]" / pip install "embflow[ollama]").
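If neither extra is installed, the quick start notes that any callable (list[str]) -> (n, d) ndarray can serve as emb_fn. Below is a minimal offline stand-in, useful for wiring and tests. It assumes a simple feature-hashing scheme. The name `hashing_embed_fn` is hypothetical and not part of embflow, and its vectors carry no semantics.

```python
import hashlib

import numpy as np


def hashing_embed_fn(texts, dim=64):
    """Hypothetical offline stand-in for an embedding backend.

    Hashes each lowercase whitespace token into one of `dim` buckets with a
    +/-1 sign (feature hashing), then unit-normalizes each row. SHA-256 is used
    instead of Python's built-in hash() so results are stable across runs.
    Texts whose tokens all cancel (or empty texts) map to the zero vector.
    """
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            h = int.from_bytes(hashlib.sha256(tok.encode("utf-8")).digest()[:8], "little")
            sign = 1.0 if (h >> 63) & 1 else -1.0  # top bit picks the sign
            out[i, h % dim] += sign                # low bits pick the bucket
        norm = np.linalg.norm(out[i])
        if norm > 0:
            out[i] /= norm
    return out
```

It would be passed exactly like the backend functions, e.g. `ef.prefix_experiment(conversations, hashing_embed_fn)`. Any results reflect token overlap only, not meaning.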

The lens convention (read this first)

Everything alpha-shaped in embflow uses one convention:

w(j, k) = alpha^(k-j) is the weight of element j in the state at step k (j <= k). Higher alpha = longer memory. As alpha -> 1 the weights flatten and the fold approaches the running mean; the half-life is log(0.5)/log(alpha) steps (ef.alpha_to_half_life(0.85) is about 4.3).
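The arithmetic of the convention can be checked in a few lines of NumPy. This is a minimal sketch that assumes nothing about embflow's internals. `exp_weights` and `half_life` are hypothetical helpers, not the embflow API, though they are meant to agree with the formulas stated for ef.exponential_weights and ef.alpha_to_half_life.

```python
import math

import numpy as np


def exp_weights(n: int, alpha: float) -> np.ndarray:
    """Hypothetical helper: w(j, k) = alpha**(k - j) for j = 0..k, reading out at k = n - 1.

    Oldest element first; the newest element gets weight alpha**0 = 1.
    """
    k = n - 1
    j = np.arange(n)
    return alpha ** (k - j)


def half_life(alpha: float) -> float:
    """Number of steps h with alpha**h == 0.5."""
    return math.log(0.5) / math.log(alpha)


w = exp_weights(5, 0.85)
print(w)                          # newest element has weight 1.0
print(round(half_life(0.85), 2))  # 4.27

# As alpha -> 1 the normalized weights flatten toward the running mean's 1/n.
near_one = exp_weights(5, 0.9999)
print(near_one / near_one.sum())
```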

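The exponential lens is derived, not chosen: the accumulation s_k = alpha*s_{k-1} + m_k*e_k is the exact one-step (exponential-Euler) discretization of the overdamped ODE dx/dt = -lambda*x + f(t), with the forcing held constant over each step of length dt. That discretization gives alpha = e^(-lambda*dt), up to a constant gain on the input; plain forward Euler gives 1 - lambda*dt, its first-order approximation. The state is linear in R^d; unit normalization x_k = s_k/||s_k|| is a readout for cosine comparison, not part of the dynamics. Fitted on real conversations, adaptive alpha lands between about 0.78 (ChatGPT) and 0.84 (Claude Code), and this ordering replicates across embedding models.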
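The recurrence and its ODE reading can be checked numerically. This is a sketch under stated assumptions: `leaky_accumulate` and `unit_readout` are hypothetical helpers, not embflow's implementation, and the state starts at zero. The first check unrolls the recurrence into the alpha^(k-j) lens weights. The second integrates dx/dt = -lambda*x + f over one step with f held constant and compares against the closed-form update alpha*x0 + (1 - alpha)/lambda*f with alpha = e^(-lambda*dt).

```python
import numpy as np


def leaky_accumulate(E, alpha, masses=None):
    """Hypothetical sketch: s_k = alpha * s_{k-1} + m_k * e_k, with s_{-1} = 0."""
    n, d = E.shape
    m = np.ones(n) if masses is None else np.asarray(masses, dtype=float)
    S = np.zeros((n, d))
    s = np.zeros(d)
    for k in range(n):
        s = alpha * s + m[k] * E[k]
        S[k] = s
    return S


def unit_readout(S):
    """Readout x_k = s_k / ||s_k||; used for cosine comparison only."""
    return S / np.linalg.norm(S, axis=1, keepdims=True)


rng = np.random.default_rng(0)
E = rng.standard_normal((20, 8))
S = leaky_accumulate(E, 0.85)
X = unit_readout(S)

# Unrolled, the recurrence is the lens fold: s_k = sum_j alpha**(k - j) * m_j * e_j.
k = 19
closed = ((0.85 ** (k - np.arange(k + 1)))[:, None] * E[: k + 1]).sum(axis=0)
print(np.allclose(S[k], closed))  # True

# ODE dx/dt = -lam * x + f over one step dt, f held constant, x(0) = 1.
lam, dt, f = 0.5, 0.3, 2.0
alpha = np.exp(-lam * dt)
x, h = 1.0, dt / 100_000
for _ in range(100_000):          # fine-grained forward Euler as a reference
    x += h * (-lam * x + f)
exact = alpha * 1.0 + (1 - alpha) / lam * f
print(abs(x - exact) < 1e-4)      # True
```

The input gain (1 - alpha)/lambda is a constant, so it only rescales the masses m_k.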

Quick start

import numpy as np
import embflow as ef

# A sequence of embedding vectors from any source.
vectors = np.random.randn(50, 256)

# Linear state and normalized readout (token counts as masses).
states = ef.leaky_state(vectors, alpha=0.85)          # raw dynamics
traj = ef.trajectory(vectors, alpha=0.85)             # unit readout
traj = ef.trajectory(vectors, 0.85, masses=np.ones(len(vectors)))  # explicit per-message masses

# Fold with a weighted mean; weights compose with numpy *.
w = ef.exponential_weights(len(vectors), 0.85)
emb = ef.weighted_mean(vectors, w)

# Derivatives and motion statistics.
v = ef.velocity(traj)              # first differences
s = ef.speed(traj)                 # |velocity|
t = ef.turning_cosines(vectors)    # cos between consecutive velocities
sig = ef.motion_signature(vectors) # per-sequence "gait" dict
alpha = ef.adaptive_alpha(vectors) # fitted memory length

# Null-correct order statistics (composition vs order).
real, null, diff = ef.null_corrected(
    lambda E: ef.motion_signature(E, with_alpha=False), vectors
)

# Segment at changepoints; compare trajectories.
segments = ef.auto_segment(vectors, alpha=0.85)
dist = ef.trajectory_distance(traj, traj, method="dtw")
G = ef.velocity_gram(vectors)      # rotation/translation-invariant geometry

# Validate an embedding model against the prefix path.
conversations = [
    [{"role": "user", "content": "embeddings as paths"},
     {"role": "assistant", "content": "trajectories, lenses, motion"}],
    [{"role": "user", "content": "an unrelated topic"},
     {"role": "assistant", "content": "entirely different content"}],
]
emb_fn = ef.openai_embed_fn(cache_path="emb.sqlite")   # or ollama_embed_fn
result = ef.prefix_experiment(conversations, emb_fn)   # gate + curves
# (openai_embed_fn needs the openai extra, pip install "embflow[openai]",
#  plus OPENAI_API_KEY; any callable (list[str]) -> (n, d) ndarray works as emb_fn.)
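The comments in the quick start define velocity as first differences, speed as |velocity|, and velocity_gram as rotation/translation-invariant geometry. A plain-NumPy sketch of those definitions follows. It assumes velocities are row-wise first differences and that the Gram matrix is V V^T. The helpers are hypothetical and not embflow's implementation. The check applies a random orthogonal map Q (from a QR decomposition) plus a translation t and confirms the Gram matrix does not change.

```python
import numpy as np


def velocity(X):
    """First differences v_k = x_{k+1} - x_k (one row per step)."""
    return np.diff(X, axis=0)


def speed(X):
    """|v_k| for each step."""
    return np.linalg.norm(velocity(X), axis=1)


def velocity_gram(X):
    """Gram matrix of velocities, G[i, j] = v_i . v_j."""
    V = velocity(X)
    return V @ V.T


rng = np.random.default_rng(1)
X = rng.standard_normal((10, 6))

# Random rotation/reflection Q and translation t: y_i = Q x_i + t.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
t = rng.standard_normal(6)
Y = X @ Q.T + t

# Translation cancels in the differences; Q cancels because Q^T Q = I.
print(np.allclose(velocity_gram(X), velocity_gram(Y)))  # True
```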

Why null models?

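Composition alone induces structure in motion statistics. For exchangeable unit vectors the expected inner product of consecutive velocities is exactly -1/2 of the expected squared speed (independent of anisotropy), so turning cosines sit near -1/2 even when order carries no information. Message eccentricity likewise creates positive speed autocorrelation even in shuffled sequences. Raw motion statistics therefore conflate composition and order; ef.null_corrected separates them by comparing each statistic on the real sequence with shuffled copies that keep the composition but destroy the order. Validated order effects on 1,768 real conversations (paired Cohen's d, real vs shuffled): speed -2.03, tortuosity +1.39, adaptive alpha -1.83.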
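The real-vs-shuffled comparison can be sketched with a permutation null. This is an illustration, not embflow's implementation. `turning_cosines`, `shuffle_corrected` and `mean_turn` are hypothetical helpers, and the (real, null, diff) triple only mirrors the shape of ef.null_corrected's return value as an assumption. Both inputs are synthetic. One is i.i.d. unit vectors, where order carries nothing. The other is a random walk, whose consecutive steps are independent, so its real turning cosines are near 0 while shuffling makes them strongly negative.

```python
import numpy as np


def turning_cosines(X):
    """Cosine between consecutive velocities v_k = x_{k+1} - x_k."""
    V = np.diff(X, axis=0)
    a, b = V[:-1], V[1:]
    return (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))


def shuffle_corrected(stat, X, n_shuffles=200, seed=0):
    """Hypothetical sketch: statistic on the real order vs. its mean over row
    permutations, which keep the composition and destroy the order."""
    rng = np.random.default_rng(seed)
    real = stat(X)
    null = np.mean([stat(X[rng.permutation(len(X))]) for _ in range(n_shuffles)])
    return real, null, real - null


def mean_turn(X):
    return turning_cosines(X).mean()


rng = np.random.default_rng(0)

# Exchangeable: i.i.d. unit vectors in 256-d; real and null both land near -0.5.
U = rng.standard_normal((200, 256))
U /= np.linalg.norm(U, axis=1, keepdims=True)
real_u, null_u, diff_u = shuffle_corrected(mean_turn, U)

# Ordered: a random walk; real is near 0, so the order effect diff_w is positive.
W = np.cumsum(rng.standard_normal((200, 256)), axis=0)
real_w, null_w, diff_w = shuffle_corrected(mean_turn, W)

print(round(real_u, 2), round(diff_u, 2))
print(round(real_w, 2), round(diff_w, 2))
```

Only diff isolates order. Reading real alone would suggest the i.i.d. sequence "turns back" systematically, although that comes from composition alone.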

Development

pip install -e ".[dev]"
pytest

License

MIT (see LICENSE). Citation metadata in CITATION.cff.



Download files


Source Distribution

embflow-0.3.0.tar.gz (40.6 kB)

Uploaded Source

Built Distribution


embflow-0.3.0-py3-none-any.whl (31.0 kB)

Uploaded Python 3

File details

Details for the file embflow-0.3.0.tar.gz.

File metadata

  • Download URL: embflow-0.3.0.tar.gz
  • Upload date:
  • Size: 40.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for embflow-0.3.0.tar.gz
Algorithm Hash digest
SHA256 e4d8392e6b2a6853504584ecc1a6d58433e0233099ac331e2a1135e1d061a20d
MD5 804f3ce5edf904c852c9a7ef4ef93e80
BLAKE2b-256 60ccad6fa2c5e12c1b397a4ddb715ec1fa3593a7ebb11a39c1e6f6c124dd2b31
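A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch. `sha256_of` and `matches` are hypothetical helper names, and the usage line assumes the sdist was saved to the current directory. For installs, pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file, enforced with `--require-hashes`) does the same check automatically.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA-256 equals the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was saved to the current directory:
# matches("embflow-0.3.0.tar.gz",
#         "e4d8392e6b2a6853504584ecc1a6d58433e0233099ac331e2a1135e1d061a20d")
```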


File details

Details for the file embflow-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: embflow-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 31.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for embflow-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c8adfb8cb550f72a1a841bde03c2afa2cf2af5d625ca0103799278aee37e250e
MD5 716537f402b0ce2aa20f67acf7d96ca3
BLAKE2b-256 ca09d91ac3f8f5ad6b28a334bab4591a6fc9e55ff60825e9c5dafe7087c7dfc7

