Semvec — patent-pending persistent semantic state engine

Patent-pending persistent semantic state engine with a Rust core and Python bindings.

Semvec replaces unbounded conversation history with a constant-size semantic state per dialogue. Input embeddings are blended into the state by a patent-pending update equation that tracks phase, resonance, and drift over time, without growing with the number of turns. The result: fixed input cost per LLM call, regardless of how long the conversation has been running.

Patent-pending — EP 25 188 105.8 (CIP in progress) · novelty acknowledged

Why Semvec?
How it works
Installation
Choose your use case
Quickstart 1 — Core semantic state
Quickstart 2 — Token-reduced LLM context
Quickstart 3 — Drop-in chat proxy
Quickstart 4 — Multi-agent coordination (Cortex)
Quickstart 5 — Coding-agent compaction
Quickstart 6 — REST API server
Concepts you should know
Configuration & environment variables
Error handling
Licensing
Benchmarks
Limitations & non-goals
FAQ
Migration from pss
Telemetry
Support
License

Why Semvec?

Conversational LLM apps face a fundamental scaling problem: each new turn appends to the context, costs grow linearly with conversation length, and at some point you either truncate (forgetting things), summarise (losing fidelity), or hit a hard window limit.

The usual workarounds each have a sharp edge:

Approach	Problem
Sliding window	Forgets older context; bad on long projects
Recursive summarisation	Information loss compounds; fidelity decays each call
Vector RAG over full history	Retrieval cost grows; embedding store grows; quality depends on chunking
Larger context windows	Per-call cost grows linearly with conversation length

Semvec takes a different angle. The persistent semantic state (PSS) is a single fixed-size vector — typically 384 or 768 floats — that absorbs every input via a non-linear update equation. After 10 turns, after 10,000 turns, the state has the same size and the same per-call cost.

What grows with conversation length	What doesn't
Number of consolidated long-term memory clusters (capped)	The semantic state itself
Optional verbatim cache (capped)	Per-LLM-call input tokens
	Update latency per turn

Semvec is complementary to vector RAG, not a replacement: many users layer Semvec on top of an existing retrieval pipeline to compress the conversational signal while keeping document retrieval intact.

How it works

Per turn, SemvecState.update(embedding, text) runs the update equation:

S(t+1) = β(t) · S(t)  +  α(t) · normalise(embedding ⊕ memory_pull(t))

where every coefficient adapts to the current conversation:

β (beta) — adaptive momentum. Stable conversation → β rises, state freezes more. New topic → β drops, state absorbs aggressively.
memory_pull — attention-weighted retrieval over the three-tier memory.
phase detection — Markov × rule FSM emits one of six phases (initialization, exploration, convergence, resonance, stability, instability). Each phase shapes downstream behaviour.
drift anchors — register reference embeddings; if the state drifts too far, gradient realignment kicks in.
resonance triggers — keyword/embedding patterns that force the state to absorb specific inputs at full strength.
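The shape of the update can be illustrated with a toy version. This is only a sketch under loud assumptions: β is held fixed, α is taken as 1 − β, memory_pull is left out, and plain Python lists stand in for the Rust kernel. None of these choices is confirmed as Semvec's actual equation, and toy_update is a hypothetical name.

```python
import math

def normalise(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]

def toy_update(state, embedding, beta):
    """One step of S(t+1) = beta * S(t) + alpha * normalise(embedding).

    Assumption: alpha = 1 - beta and no memory pull; the real coefficients
    adapt per turn and are not reproduced here.
    """
    alpha = 1.0 - beta
    unit = normalise(embedding)
    return [beta * s + alpha * e for s, e in zip(state, unit)]

state = [0.0, 0.0, 0.0]
for turn_embedding in ([1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]):
    state = toy_update(state, turn_embedding, beta=0.5)

# The state keeps the same length no matter how many turns are absorbed.
print(len(state), state)
```

With a high β the earlier turns dominate the state; with a low β the latest embedding does, which is the behaviour the β bullet describes.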

The memory tier promotes salient inputs from short-term (default 20 slots) to medium-term (100) to long-term (k-means++ consolidated clusters, up to 500). Eviction is selective forgetting — a composite score (0.4 · importance + 0.35 · recency + 0.25 · access_count) picks the victim, not FIFO.
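The composite eviction score can be sketched in a few lines. The weights come from the paragraph above; the MemorySlot class, the assumption that all three inputs are already scaled to [0, 1], and the function names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

@dataclass
class MemorySlot:
    text: str
    importance: float    # assumed already scaled to [0, 1]
    recency: float       # assumed 1.0 = just written, 0.0 = oldest
    access_count: float  # assumed normalised to [0, 1]

def retention_score(slot):
    """Composite score: 0.4 * importance + 0.35 * recency + 0.25 * access_count."""
    return 0.4 * slot.importance + 0.35 * slot.recency + 0.25 * slot.access_count

def pick_victim(slots):
    """Return the slot with the lowest retention score (the one to evict)."""
    return min(slots, key=retention_score)

slots = [
    MemorySlot("old but often used", importance=0.9, recency=0.1, access_count=1.0),
    MemorySlot("new but never used", importance=0.2, recency=1.0, access_count=0.0),
]
victim = pick_victim(slots)
print(victim.text)
```

Under FIFO the older slot would be dropped; under the composite score the newer, never-accessed slot scores lower and is evicted instead.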

Everything above runs in Rust behind a stable PyO3 boundary. Same wheel, multiple platforms, fixed-cost.

Diagnostic metrics on `SemvecState`

The Field Stability Metric, the metrics aggregator, and the advanced metrics block are exposed as methods on SemvecState:

state = SemvecState(config=SemvecConfig(dimension=384))

fsm     = state.calculate_fsm(state.norm_history)
metrics = state.calculate_metrics(
    state.norm_history,
    state.similarity_history,
    state.beta_history,
    state.memory,
    state.semantic_clusters,
)
advanced = state.calculate_advanced_metrics(
    metrics, state.semantic_clusters, state.phase_history,
)

SemvecState.to_dict() carries an integrity checksum over the serialised state; from_dict() rejects any snapshot without a valid checksum, so tampering with interaction_count / semantic_state / the rolling history arrays raises ValueError.
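How a checksum-protected snapshot rejects tampering can be sketched with the standard library. This is an illustration of the general technique, not Semvec's serialisation code: the "checksum" field name, the canonical JSON encoding, and the seal/unseal helpers are assumptions.

```python
import hashlib
import json

def _digest(payload):
    """SHA-256 over a canonical (sorted-key, compact) JSON encoding."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def seal(payload):
    """Return a copy of payload with a hypothetical 'checksum' field added."""
    return {**payload, "checksum": _digest(payload)}

def unseal(snapshot):
    """Verify and strip the checksum; raise ValueError on any mismatch."""
    body = {k: v for k, v in snapshot.items() if k != "checksum"}
    if snapshot.get("checksum") != _digest(body):
        raise ValueError("snapshot checksum missing or invalid")
    return body

snapshot = seal({"interaction_count": 3, "semantic_state": [0.1, 0.2]})
restored_body = unseal(snapshot)

snapshot["interaction_count"] = 99   # tamper with the snapshot
try:
    unseal(snapshot)
    tamper_detected = False
except ValueError:
    tamper_detected = True
print(tamper_detected)
```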

Installation

# Core only
pip install semvec

# With multi-agent coordination
pip install "semvec[cortex]"

# With coding-agent compaction (FastMCP server, Claude Code hooks)
pip install "semvec[coding]"

# REST API server
pip install "semvec[api]"
semvec serve --host 0.0.0.0 --port 8080

# Benchmark runners + optional Mem0 baseline
pip install "semvec[benchmarks,mem0]"

# Everything the developers use
pip install "semvec[cortex,coding,api,benchmarks,dev]"

One wheel covers Python 3.10 and newer via the stable ABI (abi3-py310). Pre-built wheels ship for Linux (x86_64 + aarch64), macOS (x86_64 + arm64), and Windows (x86_64).

Extra	Pulls in	When you need it
`[cortex]`	(marker only)	multi-agent coordination — Rust primitives are always available; the extra marks intent
`[coding]`	`fastmcp>=2.0`	MCP server + Claude Code lifecycle hooks
`[api]`	`fastapi`, `uvicorn[standard]`, `slowapi`, `sqlalchemy`, `prometheus-client`, `pydantic`	REST API server (`semvec serve`)
`[benchmarks]`	`sentence-transformers>=3.0`, `datasets>=2.14`, `psutil>=5.9`	running any harness under `semvec.benchmarks.longmemeval`
`[mem0]`	`mem0ai>=0.1`, `faiss-cpu>=1.7`	head-to-head Mem0 comparison
`[dev]`	`ruff`, `mypy`, `pre-commit`, `pytest`, `httpx`	contributing

Embedder requirement

Semvec refuses to silently fall back to hash-based pseudo-embeddings — you bring your own embedder. Any object exposing get_embedding(text) -> np.ndarray and get_dimension() -> int works.

pip install sentence-transformers

Choose the embedder dimension carefully — Semvec's retrieval quality is bounded by what the embedder can separate. Measured on 80 mixed-domain notes:

Embedder	dimension	precision@3	usable for
`all-MiniLM-L6-v2`	384	66.67 %	English-only, tight-domain prototypes only
`paraphrase-multilingual-mpnet-base-v2`	768	86.11 %	German / multilingual mixed-domain (recommended)

The 384-dimensional MiniLM model is the common default after pip install sentence-transformers, but on multilingual or mixed-domain text it confuses generic terms (e.g. "filter": coffee filter vs. data filter), and Semvec propagates that confusion 1:1 into retrieval. For German content, mixed-domain corpora, or anything where you need ≥ 80 % precision@3, use at least the 768-dimensional multilingual mpnet model.

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)
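A SentenceTransformer object exposes encode() and get_sentence_embedding_dimension() rather than the get_embedding() / get_dimension() pair described above, so a thin adapter is one way to bridge the two. The EncoderAdapter class below is a hypothetical helper, not part of Semvec; a stand-in encoder keeps the sketch runnable without downloading a model.

```python
class EncoderAdapter:
    """Expose get_embedding()/get_dimension() on top of any encode callable."""

    def __init__(self, encode, dimension):
        self._encode = encode
        self._dimension = int(dimension)

    def get_embedding(self, text):
        vector = self._encode(text)
        if len(vector) != self._dimension:
            raise ValueError(
                f"expected {self._dimension} dimensions, got {len(vector)}"
            )
        return vector

    def get_dimension(self):
        return self._dimension

# Stand-in encoder so the sketch runs without a real model.
def fake_encode(text):
    return [float(len(text)), 0.0, 1.0]

adapter = EncoderAdapter(fake_encode, dimension=3)
print(adapter.get_dimension(), adapter.get_embedding("hello"))
```

With the model loaded above, the adapter could presumably be built as EncoderAdapter(lambda t: embedder.encode(t, normalize_embeddings=True), embedder.get_sentence_embedding_dimension()); encode() then returns an np.ndarray, which matches the expected return type.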

Choose your use case

You want to…	Read
Compress conversation history for any LLM	Quickstart 2 — Token-reduced context
Drop-in replacement for `openai.chat.completions`	Quickstart 3 — Chat proxy
Coordinate many agents (analyst + planner + critic …)	Quickstart 4 — Cortex
Give Claude Code persistent memory across sessions	Quickstart 5 — Coding compaction
Run as a service, talk to it over HTTP	Quickstart 6 — REST API
Just understand the math first	Quickstart 1 — Core state

Quickstart 1 — Core semantic state

import numpy as np
from semvec import SemvecState, SemvecConfig

state = SemvecState(config=SemvecConfig(dimension=384))

# `conversation` is your own iterable of (text, embedding) pairs; each
# embedding is a 384-dim np.ndarray (e.g. from sentence-transformers).
# Semvec returns a constant-size metrics dict per turn.
for text, embedding in conversation:
    result = state.update(embedding, text)
    print(
        f"phase={result['phase']:14}  "
        f"similarity={result['similarity']:.3f}  "
        f"beta={result['beta']:.3f}  "
        f"norm={result['norm']:.3f}"
    )

# Serialise the full state (SHA-256-checksummed, JSON-safe).
checkpoint = state.to_dict()

# Later: restore without replaying the conversation.
restored = SemvecState.from_dict(checkpoint)

Every update returns these signals (also exposed as rolling histories on the state object):

Metric	What it means	Typical range
`similarity`	Cosine similarity of new input vs current state, before update	`[-1, 1]`
`beta` (β)	Adaptive momentum — `1` freezes the state, `0` replaces it	`[0.05, 0.95]`
`pattern_strength`	How strongly retrieved memories pull the state	`[0, ~1.5]`
`norm`	L2 norm of the new state after adaptive normalisation	`[0, 1.2]`
`phase`	One of six phases (see Concepts)	enum
`novelty_score`	How surprising the input is vs system-vector history	`[0, 1]`
`topic_switch`	Magnitude of detected topic switch (0 = none)	`[0, 1]`

Activating the USP — anchors, topic-switches, and Resonance

Quickstart 1 above is the passive vector-store path. The advanced features only kick in once you wire them up:

from semvec import SemvecState, SemvecConfig
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)

def embed(text):
    return embedder.encode(text, normalize_embeddings=True)

state = SemvecState(config=SemvecConfig(
    dimension=768,
    enable_topic_switch=True,
    auto_anchor_on_topic_switch=True,   # opt-in (default off)
))

# 1) Drift anchors — bias retrieval toward your known domains.
for prototype in [
    "SAP Business One Service Layer OData REST API",
    "Python MCP Model Context Protocol Server",
    "italienische Kueche Kochen Pasta Pizza",
    "Kaffee Espresso Roesterei Brewing",
]:
    state.add_anchor(embed(prototype))

# 2) Resonance triggers — fire on keyword OR vector similarity.
trigger = state.create_resonance_trigger(
    keyword="security review",
    embedding=embed("security audit threat model"),
    threshold=0.7,
)

# 3) Ingest. detect_topic_switch fires automatically; with
#    auto_anchor_on_topic_switch=True each switch snapshots the
#    current semantic_state as a fresh anchor (capped at
#    max_auto_anchors).
for text, vec in conversation:
    state.update(vec, text)

# 4) Inspect what happened.
print(f"anchors registered: {state.anchor_count}")
print(f"topic-switch events: {len(state.topic_switch_history)}")
for ev in state.topic_switch_history[:5]:
    print(f"  ts={ev['timestamp']} mag={ev['magnitude']:.2f} "
          f"phase={ev['phase']} auto_anchored={ev['auto_anchored']}")

# 5) Retrieval is now anchor-biased: candidates aligned with one of
#    your domain anchors win the tie-break against generic phrases.
top = state.memory.get_relevant_memories(embed("OData filter syntax"), top_k=3)

What each piece adds (measured on mpnet 768 d, 80 mixed German notes):

Variant	precision@3
passive `update()` only	86.11 %
+ 4 domain anchors	91.67 % (+ 5.56 pp)
+ 4 resonance triggers	86.11 %
anchors + triggers	91.67 %

Without anchors, anchor_retrieval_boost is a no-op and you stay on the passive path, so leaving the boost configured costs nothing when you do not need it.

Quickstart 2 — Token-reduced LLM context

Use the serializer to compress an arbitrarily long conversation into a 150–350-token system prompt:

import openai
from semvec.token_reduction import SemvecStateSerializer

# `state` is the SemvecState built up in Quickstart 1.

serializer = SemvecStateSerializer()
context = serializer.serialize(state, query_text="what did we decide about auth?")
# context is a compact string suitable for the system prompt of any LLM.

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": context},
        {"role": "user",   "content": "what did we decide about auth?"},
    ],
)

Compared to the raw history, the compressed context does not grow with conversation length — your input cost converges to a constant.

Quickstart 3 — Drop-in chat proxy

SemvecChatProxy wraps any callable LLM behind PSS-compressed context and tracks both compressed and full-history token counts per turn:

from semvec.token_reduction import SemvecChatProxy, create_llm_client

llm = create_llm_client("openai")  # reads OPENAI_BASE_URL/MODEL/API_KEY from env
proxy = SemvecChatProxy(
    llm_call=llm,
    system_prompt="You are a helpful assistant.",
    embedding_service=my_embedder,   # required — see Installation
)

for question in ["summarise Q3", "compare with Q2", "what was the biggest miss?"]:
    result = proxy.chat(question)
    print(f"turn {result.turn_number} ({result.phase}): {result.response}")
    print(f"  compressed tokens: {result.tokens.compressed}")
    print(f"  full-history tokens: {result.tokens.full_history}")

print(proxy.get_summary())

Built-in clients: OpenAIClient (works with the OpenAI API and any compatible endpoint such as vLLM, LiteLLM, OpenRouter) and OllamaClient. You can also pass any callable with the signature (list[ChatMessage]) -> str.

Break-even is around ten turns. The compressed prompt carries a constant ~110-token header (phase prompt + state metrics + relevant memories). For very short conversations (≤ 5 turns) plain history concatenation is cheaper; from ~10 turns onward SemvecChatProxy undercuts naive concatenation, and the gap widens linearly with conversation length. Measured on a 48-turn run: ~76 % token reduction vs. full-history.

Quickstart 4 — Multi-agent coordination (Cortex)

from semvec.cortex import SemvecAgentNetwork, AttentionAggregation

# `dimension=` propagates to every agent's inner SemvecState.
# Use 768 for mpnet (recommended floor for German/multilingual);
# 384 for MiniLM-style embedders. Mismatched embeddings raise
# ValueError.
network = SemvecAgentNetwork(
    aggregation_strategy=AttentionAggregation(dimension=768),
    dimension=768,
)
network.add_local_instance("analyst")
network.add_local_instance("planner")

network.process_input("analyst", "quarterly revenue is up 23%")
network.process_input("planner", "we should redirect Q4 spend to retention")

state = network.get_network_state()
print(f"active agents: {state['active_instances']}/{state['total_instances']}")

# Pull per-agent feedback for the next turn (consensus-aware)
feedback = network.get_feedback_for_agent("analyst")

ConsensusEngine adds proposal voting with five levels (SIMPLE_MAJORITY, QUALIFIED_MAJORITY, UNANIMOUS, WEIGHTED_VOTE, ADAPTIVE_THRESHOLD). Quorum is measured against the registered voter pool, not just votes-cast-so-far. StateVectorPacket round-trips bit-exactly via serialize()/deserialize() and verify_integrity() confirms byte equality.

Aggregation strategies: WeightedAverageAggregation, AttentionAggregation. See docs/api/cortex.md for the full surface and SemvecCortexService (async-coroutine facade — register_agent is sync, the others are async).
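The quorum rule mentioned above (measured against the registered voter pool, not the votes cast so far) can be sketched as follows. This is a hypothetical stand-alone function for the simple-majority case, not the ConsensusEngine API; the 0.5 quorum fraction is an assumption.

```python
def simple_majority(votes, registered_voters, quorum=0.5):
    """Decide a proposal by simple majority with a pool-based quorum.

    votes: dict voter_id -> True (approve) / False (reject)
    registered_voters: every voter entitled to vote
    quorum: assumed minimum fraction of the registered pool that must vote
    Returns "approved", "rejected", or "no_quorum".
    """
    pool = set(registered_voters)
    cast = {v: choice for v, choice in votes.items() if v in pool}
    if not pool or len(cast) / len(pool) < quorum:
        return "no_quorum"
    approvals = sum(1 for choice in cast.values() if choice)
    return "approved" if approvals > len(cast) / 2 else "rejected"

pool = ["analyst", "planner", "critic", "reviewer"]
early = simple_majority({"analyst": True}, pool)
later = simple_majority({"analyst": True, "planner": True, "critic": False}, pool)
print(early, later)
```

A single approving vote is unanimous among votes cast, yet it does not pass because only one of four registered voters has voted.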

Quickstart 5 — Coding-agent compaction

A compaction engine tuned for Claude Code, Cursor, and Aider workflows, with code pointers, anti-resonance error patterns, and a prompt builder.

from semvec.coding import CodingEngine

engine = CodingEngine(state_dir="~/.semvec/project-x", embedder=my_embedder)
engine.ingest_transcript("path/to/claude_code_session.jsonl")

context = engine.get_compacted_context(
    "implement password reset flow",
    invariants=["never log plaintext passwords"],
)

Multi-session memory via `LiteralCache`

Below the high-level CodingEngine, state.literal_cache is a structured memory of design decisions, error patterns, invariants, and per-checkpoint test results. Use it directly when you want fine-grained control over what survives across sessions:

import semvec

state = semvec.SemvecState(config=semvec.SemvecConfig(dimension=768))
cache = state.literal_cache

# Track decisions, errors, invariants as the agent works
cache.record_decision("Use mpnet 768d for German content", checkpoint=1)
cache.record_error_pattern(
    pattern="catastrophic recency bias on blocked-domain ingest",
    example="500-note 4-domain blocked sequence",
    fix="raise long_term_size and use tier weights 1.0/0.95/0.9",
    checkpoint=1,
)
cache.add_invariant("State must round-trip via to_dict/from_dict")
cache.record_test_results(
    checkpoint=1,
    passed_tests=["test_a", "test_b", "test_c"],
    failed_tests=[],
)

# Build the LLM hand-off context for the next session
ctx = cache.build_handoff_context(next_checkpoint=2)
# ### INVARIANTS — Do NOT break these:
# - State must round-trip via to_dict/from_dict
#
# ### Test Status (CP1: 100%, 3/3)
#
# ### Known Error Patterns
# - `catastrophic recency bias on blocked-domain ingest` (x1): raise long_term_size...
#
# ### Design Decisions
# - [CP1] Use mpnet 768d for German content

# Persist the full cache (including decisions, error_patterns,
# invariants, test_history, code_structures) — the round-trip
# preserves everything `to_dict()` writes.
blob = state.to_bytes()
restored = semvec.SemvecState.from_bytes(blob)
assert restored.literal_cache.build_handoff_context(2) == ctx

build_handoff_context() produces a Markdown block ready for the system prompt of the next session — it is the multi-session-memory USP for coding agents. See docs/api/coding.md for the full LiteralCache API surface.

Claude Code integration (MCP + hooks)

Wire it directly into Claude Code via the bundled FastMCP server and two lifecycle hooks. Add to .claude/settings.json:

{
  "mcpServers": {
    "semvec": {
      "command": "python",
      "args": ["-m", "semvec.coding.mcp_server"],
      "env": {
        "SEMVEC_STATE_DIR": ".semvec",
        "SEMVEC_EMBED_MODEL": "all-MiniLM-L6-v2"
      }
    }
  },
  "hooks": {
    "PreCompact":  [{"command": "python -m semvec.coding.hooks.pre_compact",  "timeout": 30000}],
    "SessionStart":[{"command": "python -m semvec.coding.hooks.session_start", "timeout": 10000}]
  }
}

The MCP server exposes six tools — pss_get_context, pss_update, pss_check_anti_resonance, pss_register_code, pss_record_error, pss_save. FastMCP is installed automatically via the [coding] extra.

The same FastMCP server plugs into Cursor via .cursor/mcp.json plus a Cursor Rule that replaces Claude Code's lifecycle hooks. Full step-by-step in docs/guides/cursor.md.

Quickstart 6 — REST API server

pip install "semvec[api]"

# Dev mode — anonymous community-tier auth, in-memory SQLite
SEMVEC_ALLOW_ANONYMOUS=1 semvec serve --host 0.0.0.0 --port 8080

# Production — license JWT required, Postgres-backed metadata
export SEMVEC_LICENSE_KEY="eyJhbGciOiJFZERTQSI..."
export DATABASE_URL="postgresql://user:pw@host/semvec"
semvec serve --host 0.0.0.0 --port 8080

Talk HTTP:

# Health check (no auth)
curl http://localhost:8080/v1/health

# Single turn
curl -X POST http://localhost:8080/v1/run \
  -H "Authorization: Bearer $SEMVEC_LICENSE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"session_id": "demo", "query": "what was the Q3 miss?"}'

# Retrieve compressed context
curl "http://localhost:8080/v1/state/context?session_id=demo&top_k=5" \
  -H "Authorization: Bearer $SEMVEC_LICENSE_KEY"

Endpoint groups: Layer 1 session CRUD + run/store/context, Layer 1b session-control (resonance triggers, drift anchors, isolation, export/import/verify), Layer 2 clusters, Layer 3 regions with consensus, Layer 4 global observer, Layer 5 network (delta-vector transfer, user partitioning, trust-based consensus), literal cache, Prometheus /metrics.

Auth is via Authorization: Bearer <jwt> or X-API-Key: <jwt> — same Ed25519-signed JWT as the in-process licensing system.
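The same call can be prepared from Python with the standard library. The sketch below only builds the request object for POST /v1/run with either header style and does not send it; build_run_request is a hypothetical helper, and the token value is a placeholder.

```python
import json
import urllib.request

def build_run_request(base_url, token, session_id, query, use_api_key=False):
    """Build (but do not send) a POST /v1/run request with either auth header."""
    headers = {"Content-Type": "application/json"}
    if use_api_key:
        headers["X-API-Key"] = token
    else:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"session_id": session_id, "query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/run", data=body, headers=headers, method="POST"
    )

request = build_run_request(
    "http://localhost:8080", "my-jwt", "demo", "what was the Q3 miss?"
)
print(request.get_method(), request.full_url)
# To send it against a running server: urllib.request.urlopen(request)
```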

Concepts you should know

The 6 conversation phases

Phase detection is a hybrid Markov × rule-based FSM on top of the core metrics. Every update may emit a phase transition; downstream code can dispatch on it.

Phase	Emitted when
`initialization`	first N turns, state still warming up
`exploration`	high novelty + low resonance — new topic
`convergence`	similarity rising, β rising, memories aligning
`resonance`	stable alignment; most inputs short-circuit against memory
`stability`	long steady resonance without drift
`instability`	phase oscillation; drift anchors trigger realignment
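Dispatching on the emitted phase might look like the sketch below. The phase names are the six from the table; the policy itself (a wider retrieval budget while the topic is unsettled) and the retrieval_budget function are purely illustrative assumptions.

```python
PHASES = (
    "initialization", "exploration", "convergence",
    "resonance", "stability", "instability",
)

def retrieval_budget(phase):
    """Map a phase name to a hypothetical top_k for memory retrieval."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    # Illustrative policy only: widen retrieval while the topic is unsettled.
    budgets = {"exploration": 8, "instability": 8, "initialization": 5}
    return budgets.get(phase, 3)

for phase in ("exploration", "stability"):
    print(phase, retrieval_budget(phase))
```

In application code the phase string would come from the metrics dict returned by update(), e.g. result['phase'].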

Drift anchors and resonance triggers

Drift — when the semantic state migrates away from a set of reference embeddings. Measured as cosine against drift anchors registered via add_anchor(embedding). When the score falls below drift_threshold the state is gradually realigned (gradient over several turns, not a snap).
Anchor score — the mean cosine between the current state and all registered anchors.
Anchor retrieval boost — every get_relevant_memories candidate's score is multiplied by (1 + α · max_anchor_sim) where α is anchor_retrieval_boost (default 0.6) and max_anchor_sim is the largest non-negative cosine of the candidate to any registered anchor. With no anchors registered or α=0 this is a no-op. This is how add_anchor() changes retrieval ranking.
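The anchor boost formula is easy to check by hand. The sketch below reimplements it in pure Python for illustration; it follows the formula as stated above, but the function names are hypothetical and the real computation runs inside the Rust kernel.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def anchor_boosted_score(base_score, candidate, anchors, alpha=0.6):
    """Apply score * (1 + alpha * max_anchor_sim), clamping negatives to 0."""
    max_anchor_sim = max(
        (max(0.0, cosine(candidate, anchor)) for anchor in anchors), default=0.0
    )
    return base_score * (1.0 + alpha * max_anchor_sim)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = anchor_boosted_score(0.5, [1.0, 0.0], anchors)   # aligned with an anchor
unanchored = anchor_boosted_score(0.5, [1.0, 0.0], [])     # no anchors: unchanged
print(aligned, unanchored)
```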

Choosing between anchors and triggers

Anchors and triggers solve different jobs and compose via max(), not addition. Quick heuristic:

Goal	Use
Bias retrieval toward known domains, prototype-style	Anchors (one per domain)
Boost memories on a specific keyword or hard-match phrase	Triggers (keyword)
Boost memories whose embeddings are near a reference point when no specific keyword exists	Triggers (embedding + threshold)
Both anchor-style and keyword-style signals on the same workload	Anchors + Triggers — the kernel takes `max(α · anchor_sim, γ · trigger_strength)` per candidate, so redundant signals do not double-count

When in doubt: start with anchors only at α=0.6, then add triggers at γ=0.3 if you have a clear keyword/embedding cue separate from your anchor prototypes. Anchors are usually the dominant signal because they cover the full cosine range; triggers shine when you can spell out the cue exactly.

Tuning rule of thumb: keep anchor_retrieval_boost ≥ trigger_retrieval_boost, both in the [0.1, 0.6] range. Pushing either past 0.7 mostly stops moving the needle (the max() composition saturates at the strongest signal regardless), so spend your budget on choosing better anchor prototypes or sharper trigger thresholds rather than dialling the boosts higher.
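The max() composition can be made concrete with a small calculation. The sketch assumes a trigger match counts as strength 1.0 and takes the anchor similarity as a precomputed input; combined_boost is a hypothetical name, not a Semvec function.

```python
def combined_boost(base_score, anchor_sim, trigger_matched, alpha=0.6, gamma=0.3):
    """Score * (1 + max(alpha * anchor_sim, gamma * trigger_match)).

    anchor_sim: largest non-negative cosine to any anchor (0.0 if none)
    trigger_matched: True if any resonance trigger matched the memory
    """
    anchor_term = alpha * max(0.0, anchor_sim)
    trigger_term = gamma * (1.0 if trigger_matched else 0.0)
    return base_score * (1.0 + max(anchor_term, trigger_term))

strong_anchor = combined_boost(0.5, anchor_sim=0.9, trigger_matched=True)
weak_anchor = combined_boost(0.5, anchor_sim=0.2, trigger_matched=True)
no_signal = combined_boost(0.5, anchor_sim=0.0, trigger_matched=False)
print(strong_anchor, weak_anchor, no_signal)
```

In the first call the anchor term (0.54) wins over the trigger term (0.3); in the second the trigger wins. In neither case are the two added together, which is why redundant signals do not double-count.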

Resonance trigger — a keyword or embedding that, when matched, forces β = beta_basis for that turn and pins importance = 1.0, making the state absorb the input aggressively.
Topic-switch detection — when enable_topic_switch=True the kernel watches consecutive-turn embedding similarity for sudden drops. Every detected switch is logged on state.topic_switch_history (always-on observability). With auto_anchor_on_topic_switch=True (opt-in) the current semantic_state is also snapshotted as a fresh anchor — useful when you want topic boundaries to feed back into retrieval.
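Detection based on drops in consecutive-turn similarity can be sketched as below. The fixed threshold of 0.3, the definition of magnitude as 1 − cosine, and the event layout are all assumptions made for the illustration; the kernel's actual rule is not documented here.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def detect_topic_switches(embeddings, drop_threshold=0.3):
    """Flag turns whose similarity to the previous turn falls below a threshold.

    Assumptions: a fixed threshold and magnitude = 1 - cosine, clamped to [0, 1].
    """
    events = []
    for turn in range(1, len(embeddings)):
        sim = cosine(embeddings[turn - 1], embeddings[turn])
        if sim < drop_threshold:
            magnitude = min(1.0, max(0.0, 1.0 - sim))
            events.append({"turn": turn, "magnitude": magnitude})
    return events

turns = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
events = detect_topic_switches(turns)
print(events)
```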

Memory tiers

SemvecState.memory is a three-tier MultiResolutionMemory:

Tier	Default capacity	Promotion
Short-term	15 slots	every turn lands here
Medium-term	50 slots	promoted on access + importance
Long-term	200 slots	k-means++ consolidated clusters at 80% fill

Capacities are configurable via SemvecConfig(short_term_size=…, medium_term_size=…, long_term_size=…).

Retrieval, recency, and forgetting

get_relevant_memories(query, top_k) scores every candidate as cosine(query, memory) · tier_weight, optionally multiplied by a per-anchor boost (see Drift anchors).

Knob	Default	What it does
`short_term_weight`	`1.0`	scoring weight for the most recent tier
`medium_term_weight`	`0.95`	mid-term tier — almost flat with short-term
`long_term_weight`	`0.9`	long-term tier — kept competitive on purpose so older domains stay reachable
`cluster_fallback_threshold`	`0.85`	when the best long-term cluster centroid cosine is below this, the kernel scans long-term in full instead of the top-3 clusters — keeps older domain memories reachable
`anchor_retrieval_boost` (α)	`0.6`	each candidate score is multiplied by `(1 + α · max_anchor_sim)`; with no anchors registered this is a no-op
`trigger_retrieval_boost` (γ)	`0.3`	each candidate score is multiplied by `1 + γ` if any registered `ResonanceTrigger` matches (keyword substring in `text` OR cosine of `memory.embedding` to `trigger.trigger_embedding` ≥ `trigger.threshold`). When both anchor and trigger boosts apply to the same memory, the kernel takes `max(α · anchor_sim, γ · trigger_match)` (not the product) — anchors and triggers compete for the same boost slot rather than stacking, so redundant matches do not double-count and high-α/γ combinations cannot regress below baseline

Recency / forgetting behaviour you should know about:

No tier is purely FIFO. When a tier overflows, Semvec keeps memories with higher retention score (a mix of importance, recency, access count, and Fisher-protection — see unit_retention_score in bindings.rs). A frequently-accessed older memory survives over a never-accessed newer one.
Long-term consolidation merges. Once long-term hits 80% capacity, k-means++ clusters its members and merges each cluster into a single MemoryUnit whose text is the cluster members joined by |. Original texts are not preserved across consolidation — your post-consolidation lookup logic should split on | if it relies on the original strings.
Recency bias is mitigated, not eliminated. With the default tier weights (1.0 / 0.95 / 0.9), older domains stay competitive on mixed-domain workloads. On extreme blocked workloads (e.g. 500 memories in 4 domain blocks) precision@3 still favours the most-recently-inserted block — raise long_term_size (e.g. to 300) and use flatter weights (1.0 / 1.0 / 1.0) to restore cross-domain recall to ~100 %.
add_anchor() actively biases retrieval. Set 4 domain anchors before ingestion and the boost re-ranks candidates toward those domains; on a mixed-domain workload this lifts precision@3 from 86 % to 91.7 %.
auto_anchor_on_topic_switch=True snapshots semantic_state as a fresh anchor whenever detect_topic_switch fires (capped by max_auto_anchors, default 8). Off by default because it tends to capture per-turn noise rather than domain prototypes on real-world embeddings — flip it on if your domain genuinely has clean topic boundaries you want surfaced as anchors.
topic_switch_history (always on when enable_topic_switch=True) is a bounded list of {timestamp, magnitude, phase, auto_anchored} events — useful for diagnostics regardless of whether you opt in to auto-anchoring.
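For lookup logic that depends on the original strings, splitting a consolidated unit's text might look like this. The helper is hypothetical, and it assumes member texts do not themselves contain the | character; if they can, the split is ambiguous and the originals should be stored elsewhere.

```python
def split_consolidated_text(text, separator="|"):
    """Recover member strings from a merged unit's text."""
    return [part.strip() for part in text.split(separator) if part.strip()]

merged = "OData filter syntax | espresso grind size | MCP server setup"
members = split_consolidated_text(merged)
print(members)
```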

Persistence — JSON `to_dict()` and binary `to_bytes()`

state.to_dict() is a JSON-safe checkpoint with embedded SHA-256 checksum — best when the snapshot has to round-trip through systems that only speak JSON.

state.to_bytes(compress=True) is the compact binary equivalent (gzip-compressed JSON with a magic header + SHA-256 corruption check) — best for cold-storage checkpoints. state.to_bytes(compress=False) is the speed-optimised variant: same byte footprint as JSON, but kept as a self-describing binary blob with corruption check — best for hot-path persistence. Both paths preserve the full state on round-trip:

semantic state, all four rolling histories, three memory tiers
drift anchors and topic_switch_history
the complete LiteralCache: entities, decisions, error patterns, invariants, test history, code structures

Restore with SemvecState.from_bytes(blob); the version byte distinguishes the two to_bytes modes automatically.
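The general idea of a self-describing blob (magic header, mode byte, corruption check, optional gzip) can be sketched with the standard library. The byte layout, the DEMO magic value, and the mode numbers below are invented for the illustration and are not Semvec's wire format.

```python
import gzip
import hashlib
import json

MAGIC = b"DEMO"   # hypothetical magic header, not Semvec's
RAW, GZIP = 1, 2  # hypothetical mode byte values

def pack(payload, compress=True):
    """Frame JSON as: magic | mode byte | SHA-256 of body | body."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    if compress:
        body = gzip.compress(body)
    mode = GZIP if compress else RAW
    return MAGIC + bytes([mode]) + hashlib.sha256(body).digest() + body

def unpack(blob):
    """Validate the frame and return the payload; ValueError on corruption."""
    if blob[:4] != MAGIC or len(blob) < 37:
        raise ValueError("not a recognised blob")
    mode, digest, body = blob[4], blob[5:37], blob[37:]
    if hashlib.sha256(body).digest() != digest:
        raise ValueError("corruption check failed")
    if mode == GZIP:
        body = gzip.decompress(body)
    elif mode != RAW:
        raise ValueError(f"unknown mode byte: {mode}")
    return json.loads(body)

payload = {"semantic_state": [0.0] * 768, "anchors": []}
small = pack(payload, compress=True)
fast = pack(payload, compress=False)
print(len(small), len(fast))
```

Because the mode byte travels with the blob, a single unpack() handles both variants, which mirrors how from_bytes() is described as distinguishing the two to_bytes modes.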

Practical sizing on mpnet 768 d:

Memories	JSON	`to_bytes(compress=True)`	`to_bytes(compress=False)`
110 (small)	18 ms / 8.8 kB / memory	157 ms / 3.7 kB / memory	36 ms / 8.8 kB / memory
1 000 (extrapolated)	~ 0.2 s / 9 MB	~ 1.4 s / 3.7 MB	~ 0.3 s / 9 MB
100 000	~ 17 s / 0.9 GB	~ 2.5 min / 400 MB	~ 30 s / 0.9 GB

Pick the variant by use case:

Cold-storage checkpoint (occasional, durability matters) → compress=True. ~ 2.4× smaller than JSON; pay the gzip cost once.
Hot-path persistence (every-turn or per-request) → compress=False. Same size as JSON, only ~ 1.9× slower than json.dumps, but kept as a self-describing binary blob with SHA-256 corruption check.

For very large footprints (> 100 k memories) wrap your own NPZ/Parquet around the embedding payload to save another factor.

The "PSS / Semvec" naming

You will still see pss in some source paths, tests, and deprecated class aliases. Mental model:

PSS (Persistent Semantic State) — the algorithm. Originally published as the pss Python package.
Semvec (semantic vector) — the productised, Rust-backed library.

Legacy aliases (PSS_State_V4, PSSConfig, PSSChatProxy, …) remain importable with DeprecationWarning; scheduled for removal in 1.0. The pre-0.2.0 free _core.calculate_* functions follow the same path — kept for byte-identical pss-port behaviour, but the recommended path is the state-bound methods on SemvecState introduced in 0.2.0a1.

Configuration & environment variables

Variable	Default	Used by
`SEMVEC_LICENSE_KEY`	—	Pro/Enterprise gates; REST API auth
`SEMVEC_ALLOW_ANONYMOUS`	unset	REST API: bypass auth (dev only)
`SEMVEC_STATE_DIR`	`.semvec`	`CodingEngine` state persistence
`SEMVEC_EMBED_MODEL`	`all-MiniLM-L6-v2`	MCP server / hooks default embedder (consider overriding to `paraphrase-multilingual-mpnet-base-v2` for German/multilingual)
`SEMVEC_EMBED_DEVICE`	`cpu`	MCP server / hooks: `cpu` or `cuda`
`DATABASE_URL`	`sqlite:///semvec.db`	REST API persistence (also accepts `postgresql://…`)
`METRICS_USER` / `METRICS_PASSWORD`	—	Basic Auth on Prometheus `/metrics`
`OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL`	—	`OpenAIClient`
`OLLAMA_BASE_URL`, `OLLAMA_MODEL`	`http://localhost:11434`, —	`OllamaClient`

Error handling

import logging
import time
from semvec import RateLimitError, LicenseExpiredError, ConfigurationError

logger = logging.getLogger(__name__)

try:
    result = state.update(embedding, text)
except RateLimitError as e:
    # e.retry_after is a datetime.timedelta; e.upgrade_url is set
    time.sleep(e.retry_after.total_seconds())
    result = state.update(embedding, text)
except LicenseExpiredError as e:
    # Hard fail — re-import won't help. Renew at e.upgrade_url.
    logger.error("semvec license expired — renew at %s", e.upgrade_url)
    raise
except ConfigurationError as e:
    # Wrong dimension, missing embedder, malformed config, etc.
    raise

All semvec exceptions inherit from SemvecError. The license-related exceptions RateLimitError and LicenseExpiredError inherit from LicenseError, which in turn inherits from SemvecError.
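A single retry, as in the snippet above, can be generalised into a bounded retry helper. The sketch uses a stand-in exception class so that it runs without Semvec installed; call_with_retry and DemoRateLimitError are hypothetical names, and only the retry_after attribute (a datetime.timedelta) is taken from the description above.

```python
import datetime
import time

class DemoRateLimitError(Exception):
    """Stand-in for semvec.RateLimitError so the sketch runs on its own."""

    def __init__(self, retry_after):
        super().__init__("rate limit exceeded")
        self.retry_after = retry_after  # a datetime.timedelta

def call_with_retry(func, max_attempts=3,
                    rate_limit_error=DemoRateLimitError, sleep=time.sleep):
    """Call func(), sleeping for retry_after between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except rate_limit_error as exc:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            sleep(exc.retry_after.total_seconds())

calls = {"n": 0}

def flaky_update():
    """Fail twice with a rate-limit error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise DemoRateLimitError(datetime.timedelta(milliseconds=10))
    return {"phase": "exploration"}

result = call_with_retry(flaky_update)
print(result, calls["n"])
```

In real code one would pass rate_limit_error=RateLimitError (imported from semvec) and wrap the state.update(...) call in a lambda.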

Licensing

Three tiers; Community works without a key, Pro and Enterprise require a signed Ed25519 JWT:

Tier	Rate limit	Backends	Retrieval modes
Community (no key)	5 QPS sustained / 50 burst	In-memory only	Base retrieval
Pro	200 / 2000 QPS	All	Extended
Enterprise	Unthrottled	All	All

JWTs are Ed25519-signed with a 30-day TTL. Expiry is a hard fail — the next gated call raises LicenseExpiredError with the renewal URL in the message. Rate-limit exhaustion raises RateLimitError with a retry_after (a datetime.timedelta) and the upgrade URL pointing to https://www.semvec.io.

export SEMVEC_LICENSE_KEY="eyJhbGciOiJFZERTQSI..."

Benchmarks

Semvec ships with a 500-entry LongMemEval harness (LLM-as-judge scoring) and a Mem0 head-to-head baseline:

pip install "semvec[benchmarks]"

python -m semvec.benchmarks.longmemeval \
    --variant S --multi-pss --temperature 0.0 \
    --per-type 10 --n-judges 3 \
    --output results/semvec_multipss.json

Flags: --variant, --local-file, --max-entries, --skip-entries, --per-type, --question-types, --temperature, --n-judges, --embed-model, --embed-device, --multi-pss. The harness writes JSON results compatible with the pss reference for direct comparison.

Limitations & non-goals

Honest list of what Semvec does not do:

Not a vector database. Long-term memory is bounded to 500 clusters by default; if you need recall over a million documents, run a dedicated vector store and treat Semvec as a conversational compressor on top.
Not a drop-in for stateless completion. The whole point is persistent state; if you only do single-shot prompts, you do not need Semvec.
No silent embedder fallback. If you do not pass an embedder, methods that need one raise a descriptive RuntimeError. This is intentional — silent hash-based fallbacks gave the original pss package surprising failure modes.
License gate is a licensing feature, not a hard security boundary. Use it to enforce your subscription tiers, not to keep determined adversaries out.
No mobile / WASM build today. abi3-py310 Linux/macOS/Windows only.
REST API persistence is metadata-only. Hot semantic state lives in-memory per process; only session/cluster/member/region/audit metadata is persisted. Plan accordingly for restarts.
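
One way to plan for restarts is to snapshot the in-memory state to disk yourself. The sketch below covers only the file-handling half: it takes an opaque bytes blob (for instance from the binary `to_bytes()` serialization described in the persistence section; how you obtain and restore it is outside this sketch) and writes it atomically. `write_snapshot` and `read_snapshot` are hypothetical helper names.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_snapshot(path: Path, blob: bytes) -> None:
    """Atomically replace `path` with `blob` so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(blob)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_snapshot(path: Path) -> Optional[bytes]:
    """Return the last snapshot, or None if no snapshot has been written yet."""
    try:
        return path.read_bytes()
    except FileNotFoundError:
        return None
```

Writing to a sibling temp file and renaming means a crash mid-write leaves the previous snapshot intact.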

FAQ

Q: Is this RAG? Not in the usual sense. RAG retrieves documents at query time. Semvec compresses the conversation itself into a fixed-size state. They compose well — many users run Semvec for conversational signal + a vector DB for document retrieval.

Q: Does the state ever grow? No, the semantic state vector itself is fixed-size. The associated memory tiers are bounded by the configured capacities (defaults 20 / 100 / 500 slots) — when full, the lowest-scoring entry is evicted, not the oldest.
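
The difference between score-based and age-based eviction can be sketched with a min-heap keyed on score. This is an illustration of the policy as described, not Semvec's implementation: the scoring function, tie-breaking, and whether a weak newcomer is admitted at all are assumptions, and `BoundedTier` is a hypothetical name.

```python
import heapq
from typing import Any, List, Tuple


class BoundedTier:
    """Fixed-capacity store that evicts the lowest-scoring entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._heap: List[Tuple[float, int, Any]] = []  # min-heap keyed on score
        self._counter = 0  # insertion counter; breaks ties without comparing items

    def add(self, score: float, item: Any) -> Any:
        """Insert an entry; return the evicted item, or None if nothing was evicted."""
        entry = (score, self._counter, item)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return None
        # heappushpop pushes, then pops the smallest, so a new entry that scores
        # below everything stored is itself the one that gets dropped.
        return heapq.heappushpop(self._heap, entry)[2]

    def items(self) -> List[Any]:
        """Stored items, highest score first."""
        return [item for _, _, item in sorted(self._heap, reverse=True)]
```

With capacity 2, adding scores 0.9, 0.2, and then 0.5 evicts the 0.2 entry even though the 0.9 entry is older.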

Q: Can I run it offline / air-gapped? Yes for Community tier. Pro/Enterprise tiers verify Ed25519 JWT signatures locally — no network call to a license server at runtime. Contact support@versino.de for offline-issued JWTs with custom TTLs.

Q: How fast is it? Per-turn update() is sub-millisecond on a recent x86_64 CPU at dimension 384, dominated by NumPy/Rust matrix ops, not Python overhead. The whole point of the Rust port was to keep the math out of the GIL.

Q: Why is the source on a private GitHub repo? Compiled wheels are public on PyPI; the source is held closed for now.

Q: GPU support? Embedders run on whatever device you configure (cuda, mps, cpu); the Semvec core itself is CPU-only — the math is small enough that GPU offload would lose more in transfer than it gains.

Q: Is the patent granted? Patent-pending. EP 25 188 105.8, CIP in progress, novelty acknowledged.

Migration from `pss`

Most code needs only an import rewrite — from pss.X import Y → from semvec.X import Y. Two subsystems were renamed:

pss namespace	semvec namespace
`pss.network`	`semvec.cortex`
`pss.compaction`	`semvec.coding`

All class names inside these namespaces are unchanged except CompactionEngine → CodingEngine. Legacy class names (PSS_State_V4, PSSChatProxy, PSSStateSerializer) remain importable with DeprecationWarning — slated for removal in 0.2.0. The full import table and numerical-fidelity envelope are bundled in MIGRATION.md inside the source distribution; for a copy, email support@versino.de.
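
The import rewrite is mechanical enough to script. Below is a minimal regex-based sketch built only from the renames listed above; `rewrite_imports` is a hypothetical helper, it handles plain `import pss...` and `from pss... import ...` lines at the start of a line, and it renames `CompactionEngine` wherever the identifier appears (including inside strings and comments). Review the diff before committing.

```python
import re

# Subsystem renames from the table above; other modules keep their names.
NAMESPACE_RENAMES = {"network": "cortex", "compaction": "coding"}

# Matches "from pss", "import pss", and an optional first submodule segment.
_IMPORT_RE = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)pss(?:\.(\w+))?(?=[\s.]|$)",
    re.MULTILINE,
)


def rewrite_imports(source: str) -> str:
    """Rewrite pss imports to semvec, applying the two namespace renames."""

    def _module(match: "re.Match[str]") -> str:
        prefix, sub = match.group(1), match.group(2)
        if sub is None:
            return f"{prefix}semvec"
        return f"{prefix}semvec.{NAMESPACE_RENAMES.get(sub, sub)}"

    rewritten = _IMPORT_RE.sub(_module, source)
    # The only class rename called out in the migration notes.
    return re.sub(r"\bCompactionEngine\b", "CodingEngine", rewritten)
```

Running the test suite with `python -W error::DeprecationWarning` afterwards is one way to surface any remaining legacy class names.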

Telemetry

Semvec sends one anonymous init ping per Python process — and nothing else. No heartbeat, no per-call event, no inference data, no licensing JWT contents. Default-on; opt out with SEMVEC_TELEMETRY=0.

The ping contains:

the semvec version
a pseudonymous machine identifier (no IP, no hostname)
OS, architecture, Python version

The full schema and retention policy are documented at https://www.semvec.io/privacy.

Variable	Effect
(unset)	Telemetry is on, one ping on first import, stderr notice prints once
`SEMVEC_TELEMETRY=0`	Telemetry is off, no ping, no notice
`SEMVEC_TELEMETRY_QUIET=1`	Keep telemetry on but silence the stderr notice
`SEMVEC_TELEMETRY_ENDPOINT=https://your.host/init`	Route the ping to a self-hosted endpoint (air-gapped enterprise)
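
The table can be restated as a small resolver, which is handy for asserting the expected behaviour in deployment tests. This is a reading of the table, not the library's code: `TelemetrySettings` and `resolve_telemetry` are hypothetical names, and treating only the literal value `0` as "off" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TelemetrySettings:
    enabled: bool
    show_notice: bool
    endpoint: Optional[str]  # None means "use the library's default endpoint"


def resolve_telemetry(env: Mapping[str, str] = os.environ) -> TelemetrySettings:
    """Restate the variable table as code; not the library's implementation."""
    enabled = env.get("SEMVEC_TELEMETRY") != "0"
    quiet = env.get("SEMVEC_TELEMETRY_QUIET") == "1"
    return TelemetrySettings(
        enabled=enabled,
        show_notice=enabled and not quiet,
        endpoint=env.get("SEMVEC_TELEMETRY_ENDPOINT") if enabled else None,
    )
```

Because the ping fires on first import, any opt-out set from Python (for example `os.environ["SEMVEC_TELEMETRY"] = "0"`) has to happen before `import semvec` runs.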

Support

Pricing & licensing: https://www.semvec.io
Pro / Enterprise support: support@versino.de (priority response)
Security disclosures: security@versino.de — please do not open public issues for vulnerabilities; coordinated disclosure with 48 h acknowledgement, fix-or-mitigation in 30 days for high-severity issues

License

Release history

0.5.6

May 5, 2026

0.5.5

May 4, 2026

0.5.4

May 3, 2026

0.5.3

May 3, 2026

0.5.2 yanked

May 3, 2026

0.5.1 yanked

May 3, 2026

0.5.0 yanked

May 3, 2026

0.4.5 yanked

May 2, 2026

0.4.4 yanked

May 2, 2026

0.4.3 yanked

May 2, 2026

0.4.2 yanked

May 1, 2026

0.4.1 yanked

May 1, 2026

This version

0.4.0 yanked

May 1, 2026

0.3.8 yanked

Apr 30, 2026

0.3.7 yanked

Apr 29, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

File	Size	Uploaded	Interpreter	Platform
semvec-0.4.0-cp310-abi3-win_amd64.whl	1.2 MB	May 1, 2026	CPython 3.10+	Windows x86-64
semvec-0.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl	1.4 MB	May 1, 2026	CPython 3.10+	manylinux: glibc 2.17+ x86-64
semvec-0.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl	1.3 MB	May 1, 2026	CPython 3.10+	manylinux: glibc 2.17+ ARM64
semvec-0.4.0-cp310-abi3-macosx_11_0_arm64.whl	1.2 MB	May 1, 2026	CPython 3.10+	macOS 11.0+ ARM64
semvec-0.4.0-cp310-abi3-macosx_10_12_x86_64.whl	1.3 MB	May 1, 2026	CPython 3.10+	macOS 10.12+ x86-64

File details

Details for the file semvec-0.4.0-cp310-abi3-win_amd64.whl.

File metadata

Download URL: semvec-0.4.0-cp310-abi3-win_amd64.whl
Upload date: May 1, 2026
Size: 1.2 MB
Tags: CPython 3.10+, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for semvec-0.4.0-cp310-abi3-win_amd64.whl
Algorithm	Hash digest
SHA256	`792e45839c7b44ed988627c91b11dd34627f3813cfac48ea75370bf77f3c41c3`
MD5	`ab59ef9492333923a03f2adf76c2c5b8`
BLAKE2b-256	`8471575afbcb3a64ec649a684c4712d05eed309e1f45410511ad1190fa45fad2`
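
A downloaded wheel can be checked against the SHA256 digest listed above with the standard library alone. The helper names below (`sha256_of`, `verify_wheel`) are hypothetical; the file is read in chunks so memory use stays flat regardless of file size.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify_wheel(Path("semvec-0.4.0-cp310-abi3-win_amd64.whl"), "792e45839c7b44ed988627c91b11dd34627f3813cfac48ea75370bf77f3c41c3")` should return True for an intact download of that file. pip can enforce the same check at install time through its `--require-hashes` mode.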

Provenance

The following attestation bundles were made for semvec-0.4.0-cp310-abi3-win_amd64.whl:

Publisher: release.yml on MichaelNeuberger/semvec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.4.0-cp310-abi3-win_amd64.whl
- Subject digest: 792e45839c7b44ed988627c91b11dd34627f3813cfac48ea75370bf77f3c41c3
- Sigstore transparency entry: 1417810170
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: MichaelNeuberger/semvec@1207fd3bfbbbb864672736174594a3fa5f583855
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/MichaelNeuberger
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1207fd3bfbbbb864672736174594a3fa5f583855
- Trigger Event: push

File details

Details for the file semvec-0.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: semvec-0.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: May 1, 2026
Size: 1.4 MB
Tags: CPython 3.10+, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for semvec-0.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`e2fb75ad1d2c37aa4fac8c7e4c7bea48cba589f5d97ba97dfbcc0842cc663b22`
MD5	`244d4316312d040d9297775a18013b27`
BLAKE2b-256	`13fabe4e03718f00fbb6fa58ff35c2a548ff6c09e65605aa816fb0e2dcfd1480`

Provenance

The following attestation bundles were made for semvec-0.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on MichaelNeuberger/semvec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: e2fb75ad1d2c37aa4fac8c7e4c7bea48cba589f5d97ba97dfbcc0842cc663b22
- Sigstore transparency entry: 1417810166
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: MichaelNeuberger/semvec@1207fd3bfbbbb864672736174594a3fa5f583855
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/MichaelNeuberger
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1207fd3bfbbbb864672736174594a3fa5f583855
- Trigger Event: push

File details

Details for the file semvec-0.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: semvec-0.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: May 1, 2026
Size: 1.3 MB
Tags: CPython 3.10+, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for semvec-0.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`247a9bcbc86692a6100a3538085d62cbf770c23a89808576311f24fd6fbb4bc7`
MD5	`f334291ef431ae0061bf490c2d2450e2`
BLAKE2b-256	`502dd89337b004c134a36b0dc71aeccd418e0ab8e5dd8d6fed5ed05d3cb95bea`

Provenance

The following attestation bundles were made for semvec-0.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on MichaelNeuberger/semvec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Subject digest: 247a9bcbc86692a6100a3538085d62cbf770c23a89808576311f24fd6fbb4bc7
- Sigstore transparency entry: 1417810152
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: MichaelNeuberger/semvec@1207fd3bfbbbb864672736174594a3fa5f583855
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/MichaelNeuberger
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1207fd3bfbbbb864672736174594a3fa5f583855
- Trigger Event: push

File details

Details for the file semvec-0.4.0-cp310-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: semvec-0.4.0-cp310-abi3-macosx_11_0_arm64.whl
Upload date: May 1, 2026
Size: 1.2 MB
Tags: CPython 3.10+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for semvec-0.4.0-cp310-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`192c083bc398303ca0da06536b489ff80404661995f20c1f4140653d94aa6627`
MD5	`d49156d3ed3a785e220f788ed43da24b`
BLAKE2b-256	`b8b10003ea6cabb719d8509d94eec1c23d0934129c4e5c0e57058692e8cc77c8`

Provenance

The following attestation bundles were made for semvec-0.4.0-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on MichaelNeuberger/semvec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.4.0-cp310-abi3-macosx_11_0_arm64.whl
- Subject digest: 192c083bc398303ca0da06536b489ff80404661995f20c1f4140653d94aa6627
- Sigstore transparency entry: 1417810162
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: MichaelNeuberger/semvec@1207fd3bfbbbb864672736174594a3fa5f583855
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/MichaelNeuberger
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1207fd3bfbbbb864672736174594a3fa5f583855
- Trigger Event: push

File details

Details for the file semvec-0.4.0-cp310-abi3-macosx_10_12_x86_64.whl.

File metadata

Download URL: semvec-0.4.0-cp310-abi3-macosx_10_12_x86_64.whl
Upload date: May 1, 2026
Size: 1.3 MB
Tags: CPython 3.10+, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for semvec-0.4.0-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`45b5ddec787b8ab8437acfdd181a8a1f02d0c120aed7db1bfae0263c1b6ee668`
MD5	`9b07cb788d6df8916885d96f733e9fea`
BLAKE2b-256	`f7c34ae0b63ab39ee6bed5d953eedc13c604535c5bcbc846dd2eefe3e9756f63`

Provenance

The following attestation bundles were made for semvec-0.4.0-cp310-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on MichaelNeuberger/semvec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.4.0-cp310-abi3-macosx_10_12_x86_64.whl
- Subject digest: 45b5ddec787b8ab8437acfdd181a8a1f02d0c120aed7db1bfae0263c1b6ee668
- Sigstore transparency entry: 1417810155
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: MichaelNeuberger/semvec@1207fd3bfbbbb864672736174594a3fa5f583855
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/MichaelNeuberger
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1207fd3bfbbbb864672736174594a3fa5f583855
- Trigger Event: push
