Semvec — patent-pending persistent semantic state engine
Semvec
Constant-cost semantic memory for LLM agents — drop-in alternative to mem0, Letta, and LangChain Memory.
Semvec replaces unbounded conversation history with a fixed-size 384-d semantic state vector plus a tiered, content-aware memory. The cost of every LLM call stays constant — turn 10 and turn 10 000 carry the same input footprint — and the agent still has structured access to decisions, invariants, error patterns, and prior context across sessions.
pip install semvec
from semvec import SemvecState, SemvecConfig
from semvec.token_reduction import SemvecStateSerializer
state = SemvecState(config=SemvecConfig(dimension=768))
for text, embedding in conversation:
    state.update(embedding, text)
context = SemvecStateSerializer().serialize(state, query_text="what did we decide?")
# `context` is a 150–350-token block — paste it into any LLM system prompt.
Patent-pending technology — EP 25 188 105.8 - novelty acknowledged
Architectural differences vs. mem0, Letta, LangChain Memory
This table compares architectural properties, not measured performance. The benchmarks below were run head-to-head against mem0 only.
| Property | semvec | mem0 | Letta (MemGPT) | LangChain Memory |
|---|---|---|---|---|
| Per-turn input footprint | O(1) — fixed-size state | O(retrieved records) | O(in-context blocks) | depends on class (buffer ≈ O(n); summary ≈ bounded) |
| LLM calls during ingest | 0 (deterministic EMA) | LLM fact-extraction per turn | LLM-managed page-in/out | varies (none for buffer/vector; LLM for summary classes) |
| Recall procedure | Deterministic (vector + literal cache) | LLM-extracted facts | LLM-managed swap | Deterministic retrieval (when vector-based) |
| Numeric / exact-value safety | Verbatim cache with Decimal | Embedded → lossy | Embedded → lossy | Not addressed by the framework |
| Self-hosted / air-gap | Yes (proprietary, on-prem) | Yes (OSS) | Yes (OSS) | Yes (OSS) |
| Patent-pending core | EP 25 188 105.8 | — | — | — |
| Multi-agent coordination | Built-in (Cortex) | Manual | Manual | Manual |
→ Deep-dive comparisons: vs mem0 · vs Letta · vs LangChain Memory
Benchmarks (where we measured head-to-head)
- LongMemEval-S vs. mem0 (gpt-oss-120b on H100): 42.8 % vs. 36.2 %, a +6.6 pp accuracy gain at 17× shorter wall-clock (2.77 h vs. 47.04 h). McNemar p = 0.020. Wins 4 of 6 question categories; strongest at single-session-assistant (+34 pp) and temporal-reasoning (+10.6 pp).
- LongBench v2 vs. stateless full-history baseline (503 turns over 100k-token conversations): 82× fewer input tokens per turn (627 vs. 51 292 mean). The stateless baseline collapses into confusion-loops past ~50 turns; semvec stays coherent.
- SlopCodeBench vs. plain continue-style coding agent (16 problems, Kimi K2, just-solve prompt): −20.1 % cost at higher pass rate (+5.5 pp) and lower cyclomatic complexity (−18.0 %).
We have not benchmarked against Letta or LangChain Memory directly; the comparison pages above describe the architectural differences, not measured performance gaps.
Table of contents
- What you get
- Installation
- Choose your use case
- Token-reduced LLM context
- Drop-in chat proxy
- Multi-agent coordination
- Coding-agent compaction
- REST API server
- Persistence
- Configuration & environment variables
- Error handling
- Licensing
- Limitations & non-goals
- FAQ
- Telemetry
- Support
- License
What you get
| Capability | What it solves |
|---|---|
| Constant-size compressed context | Per-call LLM input cost stops growing with conversation length. ~76 % token reduction on 48-turn runs. |
| Tiered memory with selective forgetting | Three tiers (short / medium / long term) with retention scoring — frequently-accessed older memories outlive never-touched newer ones. |
| Domain anchors + resonance triggers | Bias retrieval toward known domains or specific keywords without re-training. Lifts precision@3 from 86 % → 91.7 % on mixed-domain workloads. |
| Drop-in chat proxy | Wrap any OpenAI-compatible LLM and get compressed context for free. Works with vLLM, LiteLLM, OpenRouter, Ollama out of the box. |
| Multi-agent coordination (Cortex) | Run several agents that share an aggregated view, vote on proposals, and exchange checksummed state vectors. |
| Coding-agent compaction | Persistent memory across coding sessions — design decisions, invariants, error patterns, code-pointer index, anti-resonance checks. MCP server for Claude Code & Cursor included. |
| REST API server | semvec serve exposes the full surface over FastAPI: sessions, clusters, regions, observer, network, literal cache, Prometheus metrics. |
| Compliance pack | Append-only event store, deterministic replay, GDPR Art. 17 forget with signed certificates, HMAC request signing, RS256 user JWTs. |
| Bring-your-own embedder | Anything exposing get_embedding(text) → np.ndarray and get_dimension() → int works. SentenceTransformers, OpenAI, ONNX int8 — see the embedders guide. |
| One wheel, all platforms | Python 3.10–3.14 via stable ABI. Pre-built wheels for Linux glibc + Alpine musl (x86_64 + aarch64), macOS (x86_64 + arm64), Windows (x86_64). |
New in 0.5.0 — production-stable
| Surface | What's new |
|---|---|
| Per-memory provenance | state.update(emb, text, meta={"confidence": 0.9, "source": "kg"}) — Source / Confidence dicts now travel with every memory and survive every snapshot. The compliance event store carries them too, identically. |
| Retrieval policy filter | state.memory.get_relevant_memories(query, top_k, meta_filter=lambda u: u.meta["confidence"] >= 0.7) — Python predicate over the new per-memory meta runs after sort, before truncation. |
| Per-trigger weight | ResonanceTrigger(keyword="SAP", weight=5.0) — one trigger can outrank topic-default triggers; weight=0 silences boost without touching input-isolation. |
| Anti-resonance in standard retrieval | state.add_negative_attractor(error_vector, description=…) — was coding-only; now influences get_relevant_memories via a configurable score penalty. |
| DELETE / forget triggers async rebuild | Configure set_compliance_dependencies(rebuild_worker=…) — the running SemvecState no longer carries deleted memories until the next process restart. Same wiring on RetentionSweeper (rebuild_worker=, on_before_delete=, on_after_delete= hooks). |
| Ed25519 deletion certificates | sign_certificate / verify_certificate auto-detect RSA-PSS-SHA256 vs Ed25519 from the loaded key — 64-byte sigs on Ed25519, ~256-byte on RSA-3072. |
| Per-user embedding encryption | SqliteEventStore(encryption_seed=…) — opt-in AES-GCM with per-user HKDF-derived keys. Backup leak no longer hands an attacker the raw vectors for cosine re-identification. |
| Snapshot privacy toggles | to_dict(include_adaptive_params=False) joins the existing include_memory_text=False / include_literal_cache_text=False so a third-party-facing snapshot can be redacted in three independent dimensions. |
| musllinux wheels | pip install semvec now works on Alpine / k8s-slim / Lambda-custom-runtime without a compiler toolchain. |
| [jwt] extra | pip install "semvec[jwt]" for user-JWT issuance (verify-only path stays cryptography-only). |
Backwards-compatible — every 0.4.x call site keeps working untouched. See the new Correcting Memories guide for the full playbook.
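The retrieval policy filter is described above as running after the sort and before truncation. The ordering matters: a filtered-out memory should not use up one of the top_k slots. The following is a minimal sketch of that ordering in plain Python. The `Memory` record and the `top_k_with_filter` helper are hypothetical stand-ins for illustration and are not part of the semvec API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Memory:
    """Hypothetical stand-in for a stored memory; not a semvec class."""
    text: str
    score: float
    meta: dict = field(default_factory=dict)


def top_k_with_filter(
    memories: list[Memory],
    top_k: int,
    meta_filter: Optional[Callable[[Memory], bool]] = None,
) -> list[Memory]:
    # 1. Rank every candidate by relevance score, best first.
    ranked = sorted(memories, key=lambda m: m.score, reverse=True)
    # 2. Apply the predicate to the ranked list.
    if meta_filter is not None:
        ranked = [m for m in ranked if meta_filter(m)]
    # 3. Truncate last, so rejected items never occupy a top_k slot.
    return ranked[:top_k]


memories = [
    Memory("a", 0.9, {"confidence": 0.5}),
    Memory("b", 0.8, {"confidence": 0.9}),
    Memory("c", 0.7, {"confidence": 0.8}),
    Memory("d", 0.6, {"confidence": 0.95}),
]
result = top_k_with_filter(
    memories, top_k=2, meta_filter=lambda m: m.meta["confidence"] >= 0.7
)
print([m.text for m in result])  # → ['b', 'c']
```

Had truncation run before the filter, the low-confidence entry "a" would have taken a slot and only "b" would have come back.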
Installation
# Core only
pip install semvec
# With multi-agent coordination
pip install "semvec[cortex]"
# With coding-agent compaction (FastMCP server, Claude Code hooks)
pip install "semvec[coding]"
# Compliance pack (event store, retention, GDPR forget, HMAC, RS256)
pip install "semvec[compliance]"
# When you also want the FastAPI compliance routes + middleware:
pip install "semvec[api,compliance]"
# REST API server
pip install "semvec[api]"
semvec serve --host 0.0.0.0 --port 8080
# Benchmark runners + optional Mem0 baseline
pip install "semvec[benchmarks,mem0]"
# Everything the developers use
pip install "semvec[cortex,coding,api,benchmarks,dev]"
| Extra | Pulls in | When you need it |
|---|---|---|
| [cortex] | (marker only) | Multi-agent coordination; primitives are always available, the extra marks intent |
| [coding] | fastmcp>=2.0 | MCP server + Claude Code lifecycle hooks |
| [compliance] | cryptography>=42 | Event store, retention sweeper, deletion certificate signer, HMAC + RS256 signing. FastAPI routes need [api] on top. See the Compliance guide. |
| [api] | fastapi, uvicorn[standard], slowapi, sqlalchemy, prometheus-client, pydantic | REST API server (semvec serve) |
| [benchmarks] | sentence-transformers>=3.0, datasets>=2.14, psutil>=5.9 | Running the LongMemEval harness or other benchmarks |
| [mem0] | mem0ai>=0.1, faiss-cpu>=1.7 | Head-to-head Mem0 comparison |
| [dev] | ruff, mypy, pre-commit, pytest, httpx | Contributing |
Embedder requirement
Semvec is embedder-agnostic and refuses silent hash-based fallbacks — you bring your own. Any object exposing get_embedding(text) → np.ndarray and get_dimension() → int works.
pip install sentence-transformers
Choose the embedder dimension carefully — Semvec's retrieval quality is bounded by what the embedder can separate. Measured on 80 mixed-domain notes:
| Embedder | dimension | precision@3 | usable for |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 66.67 % | English-only, tight-domain prototypes only |
| paraphrase-multilingual-mpnet-base-v2 | 768 | 86.11 % | German / multilingual mixed-domain (recommended) |
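The README does not spell out how precision@3 was computed. The sketch below uses the standard definition (the share of the top-k retrieved items that are relevant, averaged over queries) and should be read as an assumption about the metric, not as the benchmark harness itself. The note IDs are made up for the example.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 3) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(
    runs: list[tuple[list[str], set[str]]], k: int = 3
) -> float:
    """Average precision@k over (retrieved, relevant) pairs, one per query."""
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


runs = [
    (["n1", "n2", "n3"], {"n1", "n2", "n3"}),  # 3 of 3 relevant
    (["n4", "n9", "n5"], {"n4", "n5"}),        # 2 of 3 relevant
]
score = mean_precision_at_k(runs)
print(f"{score:.4f}")  # → 0.8333
```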
The 384-dim MiniLM is the easy default but on multilingual or domain-mixed text it confuses generic terms (e.g. "filter" → coffee filter vs. data filter). For German content, mixed-domain corpora, or anything where you need ≥ 80 % precision@3, use multilingual mpnet 768 d minimum.
from sentence_transformers import SentenceTransformer
embedder = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)
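A SentenceTransformer model exposes encode() and get_sentence_embedding_dimension(), not the get_embedding() / get_dimension() pair named above. Whether Semvec accepts the raw model is not stated here, so a thin adapter is one cautious option. The class name `SentenceTransformerAdapter` is hypothetical, and the fake model exists only so the sketch runs without downloading weights.

```python
class SentenceTransformerAdapter:
    """Hypothetical adapter exposing get_embedding() and get_dimension()."""

    def __init__(self, model):
        self._model = model

    def get_embedding(self, text: str):
        # SentenceTransformer.encode returns a NumPy array by default.
        return self._model.encode(text)

    def get_dimension(self) -> int:
        return self._model.get_sentence_embedding_dimension()


class _FakeModel:
    """Stand-in for a SentenceTransformer so the sketch runs offline."""

    def encode(self, text):
        return [float(len(text)), 0.0, 0.0]

    def get_sentence_embedding_dimension(self):
        return 3


my_embedder = SentenceTransformerAdapter(_FakeModel())
print(my_embedder.get_dimension())  # → 3
```

In real use you would pass the loaded SentenceTransformer instance instead of `_FakeModel()`.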
Choose your use case
| You want to… | Jump to |
|---|---|
| Compress conversation history for any LLM | Token-reduced LLM context |
| Drop-in replacement for openai.chat.completions | Drop-in chat proxy |
| Coordinate many agents (analyst + planner + critic …) | Multi-agent coordination |
| Give Claude Code / Cursor persistent memory across sessions | Coding-agent compaction |
| Run as a service, talk to it over HTTP | REST API server |
| Process regulated data (GDPR, audit, retention) | Compliance pack |
Token-reduced LLM context
The single most-used path: produce a compact system-prompt block from any conversation, regardless of length.
import openai

from semvec import SemvecState, SemvecConfig
from semvec.token_reduction import SemvecStateSerializer

state = SemvecState(config=SemvecConfig(dimension=768))
# `conversation` is your own iterable of (text, embedding) pairs.
for text, embedding in conversation:
    state.update(embedding, text)

serializer = SemvecStateSerializer()
context = serializer.serialize(state, query_text="what did we decide about auth?")

# Requires the openai package and OPENAI_API_KEY in the environment.
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": context},
        {"role": "user", "content": "what did we decide about auth?"},
    ],
)
Compared to raw history concatenation, the compressed context does not grow with conversation length — input cost converges to a constant. The serializer fits prior context into a 150–350-token block sized for a system prompt.
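The comparison table describes ingest as a "deterministic EMA". To show why such a state stays the same size no matter how many turns are ingested, here is a minimal sketch of an exponential moving average over embedding vectors. It is only an illustration of the general technique: the function name `ema_update`, the smoothing factor `alpha`, and the 4-dimensional toy vectors are assumptions, and Semvec's actual update rule is not documented in this README.

```python
def ema_update(
    state: list[float], embedding: list[float], alpha: float = 0.1
) -> list[float]:
    """Blend a new embedding into a fixed-size state vector."""
    if len(state) != len(embedding):
        raise ValueError("dimension mismatch")
    return [(1.0 - alpha) * s + alpha * e for s, e in zip(state, embedding)]


state = [0.0, 0.0, 0.0, 0.0]
for _ in range(10_000):
    embedding = [1.0, 0.0, 0.5, 0.0]  # toy embedding, same every turn
    state = ema_update(state, embedding)

# The state has as many components after 10 000 turns as after 1.
print(len(state))  # → 4
```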
Lift retrieval quality with anchors and triggers
The passive ingest above gives you retrieval that already beats sliding-window concatenation. To bias retrieval toward known domains or specific cues, register anchors and resonance triggers:
from semvec import SemvecState, SemvecConfig

# `embed` is your own function mapping text to an embedding vector.
state = SemvecState(config=SemvecConfig(
    dimension=768,
    enable_topic_switch=True,
    auto_anchor_on_topic_switch=True,  # opt-in (default off)
))

# Anchors: bias retrieval toward your known domains.
for prototype in [
    "SAP Business One Service Layer OData REST API",
    "Python MCP Model Context Protocol Server",
    "italienische Kueche Kochen Pasta Pizza",
    "Kaffee Espresso Roesterei Brewing",
]:
    state.add_anchor(embed(prototype))

# Triggers: boost memories on a keyword OR vector match.
state.create_resonance_trigger(
    keyword="security review",
    embedding=embed("security audit threat model"),
    threshold=0.7,
)

for text, vec in conversation:
    state.update(vec, text)

# Retrieval is now anchor-biased: candidates aligned with one of
# your domain anchors win the tie-break against generic phrases.
top = state.memory.get_relevant_memories(embed("OData filter syntax"), top_k=3)
What each piece adds (measured on mpnet 768 d, 80 mixed German notes):
| Variant | precision@3 |
|---|---|
| passive update() only | 86.11 % |
| + 4 domain anchors | 91.67 % (+ 5.56 pp) |
| + 4 resonance triggers | 86.11 % |
| anchors + triggers | 91.67 % |
Without anchors, the retrieval boost is a no-op — flipping these features on costs nothing if you do not need them. Anchors and triggers compete for the same boost slot (max(...), not addition), so redundant signals do not double-count.
Tuning rule of thumb: keep anchor_retrieval_boost ≥ trigger_retrieval_boost, both in the [0.1, 0.6] range. Pushing either past 0.7 mostly stops moving the needle — spend your budget on better anchor prototypes or sharper trigger thresholds rather than dialling the boosts higher.
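The "same boost slot" behaviour can be pictured with a small sketch. The parameter names `anchor_retrieval_boost` and `trigger_retrieval_boost` come from the tuning note above; everything else (the function name, the additive combination with the similarity score, the default values) is an assumption made for illustration and not Semvec's scoring formula.

```python
def boosted_score(
    similarity: float,
    anchor_match: float,
    trigger_match: float,
    anchor_retrieval_boost: float = 0.3,
    trigger_retrieval_boost: float = 0.2,
) -> float:
    """Apply the larger of the two boosts, never their sum."""
    boost = max(
        anchor_retrieval_boost * anchor_match,
        trigger_retrieval_boost * trigger_match,
    )
    return similarity + boost


# Both signals fire: only the stronger one (the anchor) counts.
both = boosted_score(0.5, anchor_match=1.0, trigger_match=1.0)
# Neither fires: the score is the plain similarity.
none = boosted_score(0.5, anchor_match=0.0, trigger_match=0.0)
print(round(both, 2), round(none, 2))  # → 0.8 0.5
```

With max(...) a memory that matches both an anchor and a trigger scores the same as one that matches only the anchor, which is consistent with the table above where adding triggers on top of anchors leaves precision@3 unchanged.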
Drop-in chat proxy
SemvecChatProxy wraps any callable LLM behind compressed context and tracks both compressed and full-history token counts per turn:
from semvec.token_reduction import SemvecChatProxy, create_llm_client
llm = create_llm_client("openai") # reads OPENAI_BASE_URL/MODEL/API_KEY from env
proxy = SemvecChatProxy(
    llm_call=llm,
    system_prompt="You are a helpful assistant.",
    embedding_service=my_embedder,
)
for question in ["summarise Q3", "compare with Q2", "biggest miss?"]:
    result = proxy.chat(question)
    print(f"turn {result.turn_number}: {result.response}")
    print(f"  compressed tokens: {result.tokens.compressed}")
    print(f"  full-history tokens: {result.tokens.full_history}")
print(proxy.get_summary())
Built-in clients: OpenAIClient (works with the OpenAI API and any compatible endpoint such as vLLM, LiteLLM, OpenRouter), OllamaClient. You can pass any callable (list[ChatMessage]) -> str.
Break-even is around ten turns. The compressed prompt carries a constant ~110-token header. For very short conversations (≤ 5 turns) plain history concatenation is cheaper; from ~10 turns onward the proxy undercuts naive concatenation, and the gap widens linearly with conversation length. Measured on a 48-turn run: ~76 % token reduction vs. full-history.
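The break-even argument is simple arithmetic: full-history input grows by roughly one turn's worth of tokens per turn, while the compressed prompt stays flat. The sketch below works that out. The figures passed in (360 compressed tokens, 36 tokens per turn) are illustrative values picked for the example, not measurements, and the helper names are hypothetical.

```python
import math


def break_even_turn(compressed_tokens: int, tokens_per_turn: int) -> int:
    """First turn at which full history is at least as large as the compressed prompt."""
    return math.ceil(compressed_tokens / tokens_per_turn)


def reduction(compressed_tokens: int, tokens_per_turn: int, turns: int) -> float:
    """Relative input-token saving versus sending the full history."""
    full_history = tokens_per_turn * turns
    return 1.0 - compressed_tokens / full_history


# Illustrative inputs only: a flat 360-token prompt vs. 36 tokens added per turn.
print(break_even_turn(360, 36))          # → 10
print(round(reduction(360, 36, 48), 3))  # → 0.792
```

Before the break-even turn the reduction is negative, which is the "plain concatenation is cheaper" regime described above.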
Multi-agent coordination
Run several agents (analyst, planner, critic, …) that share an aggregated view, vote on proposals, and exchange checksummed state vectors.
from semvec.cortex import SemvecAgentNetwork, AttentionAggregation
network = SemvecAgentNetwork(
    aggregation_strategy=AttentionAggregation(dimension=768),
    dimension=768,
)
network.add_local_instance("analyst")
network.add_local_instance("planner")
network.process_input("analyst", "quarterly revenue is up 23%")
network.process_input("planner", "we should redirect Q4 spend to retention")
state = network.get_network_state()
print(f"active agents: {state['active_instances']}/{state['total_instances']}")
# Pull per-agent feedback for the next turn (consensus-aware)
feedback = network.get_feedback_for_agent("analyst")
Aggregation strategies: WeightedAverageAggregation, AttentionAggregation. ConsensusEngine adds proposal voting with five levels (SIMPLE_MAJORITY, QUALIFIED_MAJORITY, UNANIMOUS, WEIGHTED_VOTE, ADAPTIVE_THRESHOLD); quorum is measured against the registered voter pool, not just votes-cast-so-far. StateVectorPacket round-trips bit-exactly via serialize()/deserialize() and verify_integrity() confirms byte equality.
See the Cortex API reference for the full surface.
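Measuring quorum against the registered voter pool, as described above, changes outcomes when only a few agents have voted. The sketch below contrasts the two ways of counting for a simple majority. Both helper functions are hypothetical and the strict "more than half" threshold is an assumption; ConsensusEngine's real thresholds are documented in the Cortex reference, not here.

```python
def passes_against_pool(yes_votes: int, registered_voters: int) -> bool:
    """Simple majority measured against every registered voter."""
    if registered_voters <= 0:
        raise ValueError("registered_voters must be positive")
    return yes_votes * 2 > registered_voters


def passes_against_cast(yes_votes: int, votes_cast: int) -> bool:
    """Simple majority measured only against votes cast so far."""
    return votes_cast > 0 and yes_votes * 2 > votes_cast


# 3 of 10 registered agents have voted, 2 of them in favour.
print(passes_against_cast(2, 3))   # → True
print(passes_against_pool(2, 10))  # → False
```

Counting against the pool prevents a proposal from passing early just because most agents have not responded yet.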
Coding-agent compaction
Persistent memory across coding sessions for Claude Code, Cursor, Aider — code pointers, anti-resonance error patterns, structured handoff context.
from semvec.coding import CodingEngine
engine = CodingEngine(state_dir="~/.semvec/project-x", embedder=my_embedder)
engine.ingest_transcript("path/to/claude_code_session.jsonl")
context = engine.get_compacted_context(
    "implement password reset flow",
    invariants=["never log plaintext passwords"],
)
Multi-session memory via LiteralCache
Below the high-level CodingEngine, state.literal_cache is a structured memory of design decisions, error patterns, invariants, and per-checkpoint test results — anything you want to survive across sessions verbatim:
import semvec
state = semvec.SemvecState(semvec.SemvecConfig(dimension=768))
cache = state.literal_cache
cache.record_decision("Use mpnet 768d for German content", checkpoint=1)
cache.record_error_pattern(
    pattern="catastrophic recency bias on blocked-domain ingest",
    example="500-note 4-domain blocked sequence",
    fix="raise long_term_size and use tier weights 1.0/0.95/0.9",
    checkpoint=1,
)
cache.add_invariant("State must round-trip via to_dict/from_dict")
cache.record_test_results(
    checkpoint=1,
    passed_tests=["test_a", "test_b", "test_c"],
    failed_tests=[],
)
# Build the LLM hand-off context for the next session
ctx = cache.build_handoff_context(next_checkpoint=2)
# ### INVARIANTS — Do NOT break these:
# - State must round-trip via to_dict/from_dict
#
# ### Test Status (CP1: 100%, 3/3)
#
# ### Known Error Patterns
# - `catastrophic recency bias on blocked-domain ingest` (x1): raise long_term_size...
#
# ### Design Decisions
# - [CP1] Use mpnet 768d for German content
# Persist + restore — round-trip preserves decisions, error_patterns,
# invariants, test_history, code_structures.
blob = state.to_bytes()
restored = semvec.SemvecState.from_bytes(blob)
assert restored.literal_cache.build_handoff_context(2) == ctx
build_handoff_context() produces a Markdown block ready for the system prompt of the next session. See the Coding API reference for the full surface.
Claude Code integration (MCP + hooks)
Wire it directly into Claude Code via the bundled FastMCP server and two lifecycle hooks. Add to .claude/settings.json:
{
  "mcpServers": {
    "semvec": {
      "command": "python",
      "args": ["-m", "semvec.coding.mcp_server"],
      "env": {
        "SEMVEC_STATE_DIR": ".semvec",
        "SEMVEC_EMBED_MODEL": "all-MiniLM-L6-v2"
      }
    }
  },
  "hooks": {
    "PreCompact": [{"command": "python -m semvec.coding.hooks.pre_compact", "timeout": 30000}],
    "SessionStart": [{"command": "python -m semvec.coding.hooks.session_start", "timeout": 10000}]
  }
}
The MCP server exposes six tools — pss_get_context, pss_update, pss_check_anti_resonance, pss_register_code, pss_record_error, pss_save. FastMCP is installed automatically via the [coding] extra.
The same FastMCP server plugs into Cursor via .cursor/mcp.json plus a Cursor Rule that replaces Claude Code's lifecycle hooks. Full step-by-step in the Cursor guide.
REST API server
pip install "semvec[api]"
# Dev mode — anonymous community-tier auth, in-memory SQLite
SEMVEC_ALLOW_ANONYMOUS=1 semvec serve --host 0.0.0.0 --port 8080
# Production — license JWT required, Postgres-backed metadata
export SEMVEC_LICENSE_KEY="eyJhbGciOiJFZERTQSI..."
export DATABASE_URL="postgresql://user:pw@host/semvec"
semvec serve --host 0.0.0.0 --port 8080
Talk HTTP:
# Health check (no auth)
curl http://localhost:8080/v1/health
# Single turn
curl -X POST http://localhost:8080/v1/run \
-H "Authorization: Bearer $SEMVEC_LICENSE_KEY" \
-H "Content-Type: application/json" \
-d '{"session_id": "demo", "query": "what was the Q3 miss?"}'
# Retrieve compressed context
curl "http://localhost:8080/v1/state/context?session_id=demo&top_k=5" \
-H "Authorization: Bearer $SEMVEC_LICENSE_KEY"
Endpoint groups: sessions (CRUD + run/store/context), session-control (resonance triggers, anchors, isolation, export/import/verify), clusters, regions (consensus-driven realignment), global observer (anomaly detection across regions), network (state transfer, user partitioning, trust-based consensus), literal cache, Prometheus /metrics.
Auth is via Authorization: Bearer <jwt> or X-API-Key: <jwt> — same Ed25519-signed JWT as the in-process licensing system.
See the REST API reference for every endpoint and the CLI reference for semvec serve flags.
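For callers that prefer Python over curl, the single-turn request shown above can be built with the standard library alone. This sketch only constructs the request; the endpoint path, JSON fields, and Bearer header are taken from the curl example, while the helper name `build_run_request` and the placeholder token are hypothetical. The response format is not described in this README, so the sketch does not assume one.

```python
import json
import urllib.request


def build_run_request(
    base_url: str, token: str, session_id: str, query: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/run request."""
    body = json.dumps({"session_id": session_id, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/run",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_run_request(
    "http://localhost:8080", "example-token", "demo", "what was the Q3 miss?"
)
print(req.get_method(), req.full_url)

# To actually send it against a running server:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```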
Persistence
state.to_dict() is a JSON-safe checkpoint with embedded SHA-256 checksum — best when the snapshot has to round-trip through systems that only speak JSON.
state.to_bytes(compress=True) is the compact binary equivalent (gzip-compressed JSON, magic header, SHA-256 corruption check) — best for cold-storage checkpoints. state.to_bytes(compress=False) is the speed-optimised variant: same byte footprint as JSON, but kept as a self-describing binary blob with corruption check — best for hot-path persistence. Both paths preserve the full state on round-trip:
- the semantic state and its rolling histories
- all three memory tiers
- domain anchors and topic-switch history
- the complete LiteralCache: entities, decisions, error patterns, invariants, test history, code structures
Restore with SemvecState.from_bytes(blob); the version byte distinguishes the two to_bytes modes automatically.
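To make the ingredients named above concrete (magic header, version byte, SHA-256 corruption check, optional gzip), here is a minimal sketch of a container format with those properties. It is not Semvec's on-disk format: the magic value, the byte layout, and the function names `pack_snapshot` / `unpack_snapshot` are all assumptions chosen for the example.

```python
import gzip
import hashlib
import json

MAGIC = b"DEMO"  # hypothetical 4-byte magic header


def pack_snapshot(payload: dict, compress: bool = True) -> bytes:
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(raw).digest()        # 32-byte checksum of the JSON
    version = b"\x01" if compress else b"\x00"   # version byte selects the mode
    body = gzip.compress(raw) if compress else raw
    return MAGIC + version + digest + body


def unpack_snapshot(blob: bytes) -> dict:
    if blob[:4] != MAGIC:
        raise ValueError("bad magic header")
    version, digest, body = blob[4:5], blob[5:37], blob[37:]
    raw = gzip.decompress(body) if version == b"\x01" else body
    if hashlib.sha256(raw).digest() != digest:
        raise ValueError("checksum mismatch")
    return json.loads(raw)


snapshot = {"state": [0.1, 0.2], "decisions": ["use mpnet 768d"]}
assert unpack_snapshot(pack_snapshot(snapshot, compress=True)) == snapshot
assert unpack_snapshot(pack_snapshot(snapshot, compress=False)) == snapshot
```

Because the version byte travels with the blob, the reader never needs to be told which mode produced it.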
Practical sizing on mpnet 768 d:
| Memories | JSON | to_bytes(compress=True) | to_bytes(compress=False) |
|---|---|---|---|
| 110 (small) | 18 ms / 8.8 kB / memory | 157 ms / 3.7 kB / memory | 36 ms / 8.8 kB / memory |
| 1 000 (extrapolated) | ~ 0.2 s / 9 MB | ~ 1.4 s / 3.7 MB | ~ 0.3 s / 9 MB |
| 100 000 | ~ 17 s / 1.7 GB | ~ 2.5 min / 400 MB | ~ 30 s / 1.7 GB |
Pick the variant by use case:
- Cold-storage checkpoint (occasional, durability matters) → compress=True. ~ 2.4× smaller than JSON; pay the gzip cost once.
- Hot-path persistence (every-turn or per-request) → compress=False. Same size as JSON, only ~ 1.9× slower than json.dumps, but kept as a self-describing binary blob with corruption check.
For very large footprints (> 100 k memories) wrap your own NPZ/Parquet around the embedding payload to save another factor.
Configuration & environment variables
| Variable | Default | Used by |
|---|---|---|
| SEMVEC_LICENSE_KEY | (none) | Pro/Enterprise gates; REST API auth |
| SEMVEC_ALLOW_ANONYMOUS | unset | REST API: bypass auth (dev only) |
| SEMVEC_STATE_DIR | .semvec | CodingEngine state persistence |
| SEMVEC_EMBED_MODEL | all-MiniLM-L6-v2 | MCP server / hooks default embedder (consider overriding to paraphrase-multilingual-mpnet-base-v2 for German/multilingual) |
| SEMVEC_EMBED_DEVICE | cpu | MCP server / hooks: cpu or cuda |
| DATABASE_URL | sqlite:///semvec.db | REST API persistence (also accepts postgresql://…) |
| METRICS_USER / METRICS_PASSWORD | (none) | Basic Auth on Prometheus /metrics |
| OPENAI_BASE_URL, OPENAI_API_KEY, OPENAI_MODEL | (none) | OpenAIClient |
| OLLAMA_BASE_URL, OLLAMA_MODEL | http://localhost:11434, (none) | OllamaClient |
Error handling
import logging
import time

from semvec import RateLimitError, LicenseExpiredError, ConfigurationError

logger = logging.getLogger(__name__)

try:
    result = state.update(embedding, text)
except RateLimitError as e:
    # e.retry_after is a datetime.timedelta; e.upgrade_url is set
    time.sleep(e.retry_after.total_seconds())
    result = state.update(embedding, text)
except LicenseExpiredError as e:
    # Hard fail: re-import won't help. Renew at e.upgrade_url.
    logger.error("semvec license expired, renew at %s", e.upgrade_url)
    raise
except ConfigurationError:
    # Wrong dimension, missing embedder, malformed config, etc.
    raise
All Semvec exceptions inherit from SemvecError. The license-related exceptions RateLimitError and LicenseExpiredError derive from LicenseError, which itself derives from SemvecError.
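A single retry after sleeping can itself hit the rate limit again. One way to bound that, sketched here under assumptions, is a small helper that retries a fixed number of times and honours the timedelta carried by the exception. `DemoRateLimitError` is a local stand-in so the sketch runs without semvec installed, and `call_with_retry` is a hypothetical helper, not a semvec function; in real code you would pass `retry_on=RateLimitError`.

```python
import time
from datetime import timedelta


class DemoRateLimitError(Exception):
    """Stand-in with the same retry_after attribute described above."""

    def __init__(self, retry_after: timedelta):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(fn, max_attempts=3, sleep=time.sleep, retry_on=DemoRateLimitError):
    """Call fn(), sleeping for retry_after between attempts; re-raise on the last one."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise
            sleep(exc.retry_after.total_seconds())


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DemoRateLimitError(timedelta(seconds=0.5))
    return "ok"


waits = []  # record the requested sleeps instead of actually sleeping
result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)  # → ok [0.5, 0.5]
```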
Licensing
Three tiers; Community works without a key, Pro and Enterprise require a signed Ed25519 JWT:
| Tier | Rate limit | Retrieval modes |
|---|---|---|
| Community (no key) | 5 QPS sustained / 50 burst | Base retrieval |
| Pro | 200 QPS sustained / 2000 burst | Extended |
| Enterprise | Unthrottled | All |
JWTs have a 30-day TTL. Expiry is a hard fail — the next gated call raises LicenseExpiredError with the renewal URL in the message. Rate-limit exhaustion raises RateLimitError with a retry_after (a datetime.timedelta) and the upgrade URL.
export SEMVEC_LICENSE_KEY="eyJhbGciOiJFZERTQSI..."
Limitations & non-goals
Honest list of what Semvec does not do:
- Not a vector database. Long-term memory is bounded; if you need recall over a million documents, run a dedicated vector store and treat Semvec as a conversational compressor on top.
- Not a drop-in for stateless completion. The whole point is persistent state; if you only do single-shot prompts, you do not need Semvec.
- No silent embedder fallback. If you do not pass an embedder, methods that need one raise a descriptive RuntimeError. This is intentional: silent hash fallbacks gave surprising failure modes in earlier iterations.
- License gate is a licensing feature, not a hard security boundary. Use it to enforce subscription tiers, not to keep determined adversaries out.
- No mobile / WASM build today. abi3-py310 Linux/macOS/Windows only.
- REST API persistence is metadata-only. Hot semantic state lives in-memory per process; only session/cluster/member/region/audit metadata is persisted. Plan accordingly for restarts.
FAQ
Is this RAG? Not in the usual sense. RAG retrieves documents at query time. Semvec compresses the conversation itself into a fixed-size state. They compose well — many users run Semvec for conversational signal + a vector DB for document retrieval.
Does the state ever grow? No, the state vector itself is fixed-size. The associated memory tiers are bounded by configured capacities — when full, the lowest-scoring entry is evicted (not the oldest).
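Score-based eviction, as opposed to oldest-first, can be sketched with a small bounded container. The class `BoundedTier` and the scores used below are hypothetical; how Semvec actually computes retention scores is not described in this README.

```python
from typing import Optional


class BoundedTier:
    """Hypothetical bounded memory tier that evicts the lowest-scoring entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: dict[str, float] = {}  # text -> retention score

    def add(self, text: str, score: float) -> Optional[str]:
        """Store an entry; return the evicted text if the tier overflowed."""
        self.entries[text] = score
        if len(self.entries) > self.capacity:
            worst = min(self.entries, key=self.entries.get)
            del self.entries[worst]
            return worst
        return None


tier = BoundedTier(capacity=2)
tier.add("old but frequently used", 0.9)
tier.add("recent but never touched", 0.2)
evicted = tier.add("new entry", 0.6)
print(evicted)  # → recent but never touched
```

An oldest-first policy would have dropped "old but frequently used" instead.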
Can I run it offline / air-gapped? Yes for Community tier. Pro/Enterprise tiers verify Ed25519 JWT signatures locally — no network call to a license server at runtime. Contact support@versino.de for offline-issued JWTs with custom TTLs.
How fast is it? Per-turn update() is sub-millisecond on a recent x86_64 CPU at dimension 384, dominated by NumPy/Rust matrix ops, not Python overhead. The whole point of the Rust port was to keep the math out of the GIL.
Is the source available? Compiled wheels are public on PyPI; the Rust source is held closed. Source access is available under Enterprise terms; contact support@versino.de.
GPU support? Embedders run on whatever device you configure (cuda, mps, cpu); the Semvec core itself is CPU-only — the math is small enough that GPU offload would lose more in transfer than it gains.
Telemetry
Semvec sends one anonymous init ping per Python process — and nothing else. No heartbeat, no per-call event, no inference data, no licensing JWT contents. Default-on; opt out with SEMVEC_TELEMETRY=0.
The ping contains:
- the semvec version
- a pseudonymous machine identifier (no IP, no hostname)
- OS, architecture, Python version
The full schema and retention policy are documented at https://www.semvec.io/privacy.
| Variable | Effect |
|---|---|
| (unset) | Telemetry is on, one ping on first import, stderr notice prints once |
| SEMVEC_TELEMETRY=0 | Telemetry is off, no ping, no notice |
| SEMVEC_TELEMETRY_QUIET=1 | Keep telemetry on but silence the stderr notice |
| SEMVEC_TELEMETRY_ENDPOINT=https://your.host/init | Route the ping to a self-hosted endpoint (air-gapped enterprise) |
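Since the ping is sent on first import, opting out from inside Python presumably requires setting the variable before semvec is imported. That ordering is an inference from the table above, not something this README states explicitly. A minimal sketch:

```python
import os

# Set the opt-out before the first `import semvec` in this process.
os.environ["SEMVEC_TELEMETRY"] = "0"

# import semvec  # import only after the variable is set
print(os.environ["SEMVEC_TELEMETRY"])  # → 0
```

Setting the variable in the shell or in the service's environment file achieves the same thing without touching code.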
Support
- Documentation: https://semvec-docs.pages.dev
- Pricing & licensing: https://www.semvec.io
- Pro / Enterprise support: support@versino.de (priority response)
- Security disclosures: security@versino.de. Please do not open public issues for vulnerabilities; coordinated disclosure with 48 h acknowledgement, fix-or-mitigation in 30 days for high-severity issues
License
Proprietary — all rights reserved. Commercial use requires a Pro or Enterprise license. The full license text ships inside the wheel as LICENSE; for procurement, see https://www.semvec.io.
Copyright © 2026 Michael Neuberger · Versino PsiOmega.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file semvec-0.5.1-cp310-abi3-win_amd64.whl.
File metadata
- Download URL: semvec-0.5.1-cp310-abi3-win_amd64.whl
- Upload date:
- Size: 1.3 MB
- Tags: CPython 3.10+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
19b2dcf6cf1e322d69698321695706295e05716478d901200c6246f0f33a9f43
|
|
| MD5 |
6a73c7fd79763e64643d5e6348104b12
|
|
| BLAKE2b-256 |
0869e329588be586d5fd2a224bd97262c4368c4786fb028fe7c1871d82ca8e40
|
Provenance
The following attestation bundles were made for semvec-0.5.1-cp310-abi3-win_amd64.whl:
Publisher:
release.yml on MichaelNeuberger/semvec
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
semvec-0.5.1-cp310-abi3-win_amd64.whl -
Subject digest:
19b2dcf6cf1e322d69698321695706295e05716478d901200c6246f0f33a9f43 - Sigstore transparency entry: 1433432012
- Sigstore integration time:
-
Permalink:
MichaelNeuberger/semvec@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4 -
Branch / Tag:
refs/tags/v0.5.1 - Owner: https://github.com/MichaelNeuberger
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4 -
Trigger Event:
push
-
Statement type:
File details
Details for the file semvec-0.5.1-cp310-abi3-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: semvec-0.5.1-cp310-abi3-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.10+, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
22a82f8b6411167a623a2c13a99a49e356709cd22adf38f76be6aa71cc9eeecb
|
|
| MD5 |
06e2b8f28f8d952a4718a8eda49ee779
|
|
| BLAKE2b-256 |
c3d53b5149db4a20963d43003cc5ad8faced6def3de0417de6b970cf72dcea31
|
Provenance
The following attestation bundles were made for semvec-0.5.1-cp310-abi3-musllinux_1_2_x86_64.whl:
Publisher:
release.yml on MichaelNeuberger/semvec
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
semvec-0.5.1-cp310-abi3-musllinux_1_2_x86_64.whl -
Subject digest:
22a82f8b6411167a623a2c13a99a49e356709cd22adf38f76be6aa71cc9eeecb - Sigstore transparency entry: 1433432298
- Sigstore integration time:
-
Permalink:
MichaelNeuberger/semvec@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4 -
Branch / Tag:
refs/tags/v0.5.1 - Owner: https://github.com/MichaelNeuberger
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4 -
Trigger Event:
push
-
Statement type:
File details
Details for the file semvec-0.5.1-cp310-abi3-musllinux_1_2_aarch64.whl.
File metadata
- Download URL: semvec-0.5.1-cp310-abi3-musllinux_1_2_aarch64.whl
- Upload date:
- Size: 1.3 MB
- Tags: CPython 3.10+, musllinux: musl 1.2+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
421039fac5e5d5c2f42d9fd33bcb22cba600849df1708a4bdf36c0e4be244d20
|
|
| MD5 |
cbb21b8d66145ccb684b06f10451125a
|
|
| BLAKE2b-256 |
870b3325401b6770fc9f16c1fb581e6e5ee5bca8167a3e9f07b595dd95408f3a
|
Provenance
The following attestation bundles were made for semvec-0.5.1-cp310-abi3-musllinux_1_2_aarch64.whl:
Publisher: release.yml on MichaelNeuberger/semvec
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.5.1-cp310-abi3-musllinux_1_2_aarch64.whl
- Subject digest: 421039fac5e5d5c2f42d9fd33bcb22cba600849df1708a4bdf36c0e4be244d20
- Sigstore transparency entry: 1433431943
- Sigstore integration time:
- Permalink: MichaelNeuberger/semvec@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/MichaelNeuberger
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Trigger Event: push
File details
Details for the file semvec-0.5.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: semvec-0.5.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 527a86cfa7fe87b5722865e4db5d06c25e69c2f53d7b9f8e2bffaa4934a3fd73 |
| MD5 | 14388c05541e6de5e492689f1f5dfc67 |
| BLAKE2b-256 | 0c8b6bd442f6eb846c4bfe07048cc6f0ded3b42e5e0dce31e85f41e4fb9913ea |
Provenance
The following attestation bundles were made for semvec-0.5.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: release.yml on MichaelNeuberger/semvec
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.5.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 527a86cfa7fe87b5722865e4db5d06c25e69c2f53d7b9f8e2bffaa4934a3fd73
- Sigstore transparency entry: 1433432487
- Sigstore integration time:
- Permalink: MichaelNeuberger/semvec@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/MichaelNeuberger
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Trigger Event: push
File details
Details for the file semvec-0.5.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: semvec-0.5.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 1.3 MB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c15b8281f53b53375e5270cabe179f3f069402ee3d38abc7350fc77474182c8f |
| MD5 | f1d068e6f8cae19bd9e2bcfc4ed76657 |
| BLAKE2b-256 | b9c1db58666b09e43fbb4d34d9006785c9227faa323f1417c36aea93e0075f4f |
Provenance
The following attestation bundles were made for semvec-0.5.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:
Publisher: release.yml on MichaelNeuberger/semvec
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.5.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Subject digest: c15b8281f53b53375e5270cabe179f3f069402ee3d38abc7350fc77474182c8f
- Sigstore transparency entry: 1433432383
- Sigstore integration time:
- Permalink: MichaelNeuberger/semvec@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/MichaelNeuberger
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Trigger Event: push
File details
Details for the file semvec-0.5.1-cp310-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: semvec-0.5.1-cp310-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.10+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2bc90fa161c370b42b5f39550407bd970435f4c77524ba786f6471884141bbfa |
| MD5 | 40b39c936ec84891ca5211f14ed19191 |
| BLAKE2b-256 | 05f4439e86328d324704429b603b2b6fe5ba6daf8dc751fe159f160e83e55a5b |
Provenance
The following attestation bundles were made for semvec-0.5.1-cp310-abi3-macosx_11_0_arm64.whl:
Publisher: release.yml on MichaelNeuberger/semvec
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.5.1-cp310-abi3-macosx_11_0_arm64.whl
- Subject digest: 2bc90fa161c370b42b5f39550407bd970435f4c77524ba786f6471884141bbfa
- Sigstore transparency entry: 1433432344
- Sigstore integration time:
- Permalink: MichaelNeuberger/semvec@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/MichaelNeuberger
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Trigger Event: push
File details
Details for the file semvec-0.5.1-cp310-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: semvec-0.5.1-cp310-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.10+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 517f386d2688414ebfe9b67fbc7cc245e7ecd1eeaa6de3f8fbd71319a3587b7c |
| MD5 | 14d87a2c256c8c466c67d0e35b17200e |
| BLAKE2b-256 | 6665b3003c3358add82be64dfe95151616b153169fdef34422d7abac510b59df |
Provenance
The following attestation bundles were made for semvec-0.5.1-cp310-abi3-macosx_10_12_x86_64.whl:
Publisher: release.yml on MichaelNeuberger/semvec
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semvec-0.5.1-cp310-abi3-macosx_10_12_x86_64.whl
- Subject digest: 517f386d2688414ebfe9b67fbc7cc245e7ecd1eeaa6de3f8fbd71319a3587b7c
- Sigstore transparency entry: 1433432761
- Sigstore integration time:
- Permalink: MichaelNeuberger/semvec@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/MichaelNeuberger
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ad7df5ca8f41e47fe8d11eda5a47a08f1df44bb4
- Trigger Event: push