Trustworthy memory and fabrication detection for LLM companions
Mistikguard
Keep your LLM companion's memory honest.
Mistikguard is a small, dependency-light library that stops an LLM's fabrications from becoming permanent, trusted, defended memory. It is the extracted memory-integrity core of Mistik, a local-first AI companion, generalized for reuse.
Most "long-term memory" for LLM apps works like this: the model periodically summarizes the conversation and writes "facts" about the user into a store, which are then injected back into future prompts as established truth. The flaw is that the model's output is treated as ground truth the moment it is written. A hallucinated detail becomes indistinguishable from something the user actually said — recalled with the same confidence, defended with the same conviction, and reinforced every time it is injected.
Mistikguard governs that boundary. The model proposes; Mistikguard decides.
The benchmark
The grounding judge — the component that decides whether a memory-claim in a reply is actually supported — was measured on a 44-case adversarial benchmark (true recollections, outright fabrications, and deliberately hard borderline inferences):
| Metric | Value |
|---|---|
| Precision | 1.000 |
| Recall | 0.909 |
| F1 | 0.952 |
| False-positive rate | 0.000 |
The judge is deliberately safety-biased: on uncertainty or error it defaults to grounded, which is why it raised no false alarms against true statements in this benchmark. The price of that 0% false-positive rate is a handful of soft inferences it declines to flag. The trade is chosen on purpose: for a companion, a missed soft inference costs one sentence, while a false alarm could make the system deny something real about the user.
The benchmark is reproducible: python tests/benchmark.py (needs an
OpenAI-compatible API key). It is a measurement of the judge in isolation on a
small constructed set, not a claim about end-to-end fabrication rates in
production. Forty-four cases is modest; treat the figures as indicative.
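The four figures in the table follow from a confusion matrix in which "positive" means "flagged as fabricated". The sketch below shows only the arithmetic. The counts are hypothetical: the benchmark reports rates, not raw counts, and 20 caught fabrications, 2 missed, and 22 true statements is merely one split of 44 cases that reproduces them.

```python
def judge_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1 and false-positive rate, rounded to 3 places.

    'Positive' means 'flagged as fabricated':
      tp = fabrications flagged, fn = fabrications missed,
      fp = true statements flagged, tn = true statements left alone.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": round(precision, 3),
        "recall": round(recall, 3),
        "f1": round(f1, 3),
        "false_positive_rate": round(fpr, 3),
    }


# Hypothetical counts (an assumption, not published data).
print(judge_metrics(tp=20, fp=0, fn=2, tn=22))
```

With zero false positives, precision is 1.0 by construction, so F1 is driven entirely by recall.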
What it does
Four cooperating pieces:
- Provenance. Every stored fact carries a source: confirmed (the user stated or ratified it) or inferred (the model generated it). Model writes default to inferred. This single distinction is the keystone everything else builds on.
- The write-gate. A deterministic check every model-proposed write must pass. It rejects self-narration (the assistant describing its own state), contradictions of confirmed fact, and previously-corrected material.
- Tombstones. When the user corrects something, it is removed and a tombstone is recorded. The gate consults tombstones so corrected material cannot be silently re-introduced later. A correction, once made, stays made.
- The grounding audit. After a reply, a cheap pattern detector finds memory-claims ("you mentioned…", "I remember that you…"); each is then checked by an LLM grounding judge against actual stored memory. Unsupported claims are surfaced, not silently rewritten, so a human stays the authority.
The deterministic pieces (provenance, gate, tombstones, the claim detector) have zero external dependencies — pure standard library. Only the grounding judge needs an LLM client.
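How provenance and tombstones interact can be shown with a toy model. This is a sketch of the idea only, not Mistikguard's implementation: the class ToyStore and its methods are hypothetical names, and substring matching for tombstones is an assumption chosen for brevity. Contradiction and self-narration checks are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for a governed fact store."""

    facts: dict = field(default_factory=dict)        # fact text -> source
    tombstones: list = field(default_factory=list)   # corrected keywords

    def propose(self, text: str) -> bool:
        """Model-proposed write: stored as 'inferred' unless tombstoned."""
        lowered = text.lower()
        if any(stone in lowered for stone in self.tombstones):
            return False  # corrected material must not come back
        # setdefault never downgrades a fact the user already confirmed.
        self.facts.setdefault(text, "inferred")
        return True

    def confirm(self, text: str) -> None:
        """The user stated or ratified the fact."""
        self.facts[text] = "confirmed"

    def correct(self, keyword: str) -> None:
        """User correction: drop matching facts and record a tombstone."""
        key = keyword.strip().lower()
        if not key:
            return  # an empty keyword would match every fact
        self.facts = {
            text: source
            for text, source in self.facts.items()
            if key not in text.lower()
        }
        self.tombstones.append(key)


store = ToyStore()
store.confirm("User lives in Berlin")
store.propose("User probably likes jazz")   # accepted, held as 'inferred'
store.correct("jazz")                       # removed and tombstoned
store.propose("User loves jazz clubs")      # rejected by the tombstone
```

The point of the design is that the model can only ever write at the inferred level, and only the user can promote or permanently remove a fact.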
Install
pip install mistikguard # core only, no external dependencies
pip install "mistikguard[llm]" # adds the OpenAI-compatible client for the judge (quoted so shells such as zsh do not expand the brackets)
Usage
Governed facts with provenance:
from mistikguard.long_memory import LongTermMemory
mem = LongTermMemory(storage_path="./user_memory.json")
# A fact the user stated directly is trusted.
mem.add_fact("User lives in Berlin", source="confirmed")
# A fact the model inferred is held lightly.
mem.add_fact("User probably likes jazz", source="inferred")
# Corrections remove and tombstone — they don't just contradict.
mem.forget_fact("jazz")
print(mem.fact_texts())
The write-gate (deterministic, no API key needed):
from mistikguard import memory_gate as gate
# Configure once for your assistant and user.
gate.configure(
    assistant_name="Aria",
    user_name="Sarah",
    corrections_log_path="./corrections.json",
)
confirmed = ["User lives in Berlin"]
gate.gate_fact("User lives in Lisbon", confirmed) # (False, 'contradicts confirmed: ...')
gate.gate_fact("Aria feels calm today", confirmed) # (False, 'noise/self-narration')
gate.gate_fact("User enjoys hiking", confirmed) # (True, 'ok')
Auditing a reply for fabricated memory-claims:
from mistikguard.memory_audit import audit_reply
reply = "Of course — I remember that time we went skydiving together!"
memory_texts = ["User lives in Berlin", "User has a dog named Pixel"]
recent = ["what should I do this weekend?"]
flagged = audit_reply(reply, memory_texts, recent)
# -> [{'phrase': 'i remember that', 'sentence': '... skydiving together!'}]
# Empty list means no fabricated memory-claims were detected.
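Because unsupported claims are surfaced rather than rewritten, the caller decides what to do with the returned list. A minimal sketch, assuming only the result shape shown above (dicts with 'phrase' and 'sentence' keys); format_flags is a hypothetical helper, not part of Mistikguard.

```python
def format_flags(flagged: list) -> str:
    """Render audit results as a short notice for a human reviewer.

    Returns an empty string when nothing was flagged.
    """
    if not flagged:
        return ""
    lines = ["Possible unsupported memory-claims:"]
    for item in flagged:
        phrase = item.get("phrase", "?")
        sentence = item.get("sentence", "")
        lines.append(f'- "{sentence}" (trigger: {phrase})')
    return "\n".join(lines)


example = [{"phrase": "i remember that",
            "sentence": "I remember that time we went skydiving together!"}]
print(format_flags(example))
```

Showing the notice next to the reply, instead of editing the reply, keeps the human as the authority on what is true.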
For the LLM-backed judge directly:
from mistikguard.audit_judge import judge_claim
from openai import OpenAI
client = OpenAI(api_key="...", base_url="https://api.groq.com/openai/v1")
grounded, reason = judge_claim(
    client, "model-name",
    "I remember you said your sister is named Olia",
    memory_texts=["User's sister is named Olia"],
    recent_user_msgs=[],
)
# grounded == True
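The safety bias described in the benchmark section (default to grounded on uncertainty or error) can also be enforced at the call site. The sketch below assumes only that a judge call returns a (grounded, reason) pair as in the example above. The name judge_or_grounded is hypothetical, and the sketch does not rely on whether judge_claim already handles its own errors.

```python
from typing import Callable, Tuple


def judge_or_grounded(judge: Callable[[], Tuple[bool, str]]) -> Tuple[bool, str]:
    """Run a zero-argument judge call; on any failure, fall back to grounded."""
    try:
        grounded, reason = judge()
        return bool(grounded), str(reason)
    except Exception as exc:  # network error, malformed response, and so on
        return True, f"judge unavailable, defaulted to grounded: {exc}"


# Intended use (assumes the client and arguments from the example above):
#   judge_or_grounded(lambda: judge_claim(
#       client, "model-name", claim,
#       memory_texts=memory_texts, recent_user_msgs=recent))
print(judge_or_grounded(lambda: (False, "no supporting memory")))
```

Failing open to grounded means an outage can hide a fabrication but can never make the system deny something true about the user.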
What it does not do
In the spirit of being honest about its limits:
- It does not stop the underlying model from producing a false sentence in the moment. That is a property of the model, outside the reach of any surrounding structure. What Mistikguard guarantees is narrower and harder: that a false sentence does not become permanent, trusted, defended memory.
- The claim detector is pattern-based. A fabrication phrased in a sufficiently novel way can slip the first stage. Coverage is an expanding approximation, not a complete solution; the upstream gate is the deeper net.
- The judge borrows intelligence it does not own. The grounding decision is made by whatever LLM you point it at. Mistikguard's contribution is the structure and discipline around the model, not model capability.
- It is not a content-safety or mental-health tool. It governs what the system treats as true about the user. It is not a clinical resource and not a substitute for human or professional support.
The honest description is the useful one: Mistikguard closes the specific failure that makes most companion memory untrustworthy — model fabrication quietly becoming durable fact — and is candid about everything it does not close.
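To make the pattern-based limitation concrete, here is a toy first-stage detector. The trigger phrases and the function name find_memory_claims are illustrative assumptions, not Mistikguard's actual patterns; the output shape mirrors the audit example in Usage.

```python
import re

# Illustrative trigger phrases only (an assumption); a real list is longer.
_TRIGGERS = re.compile(
    r"\b(i remember (?:that|when)|you mentioned|you told me|as you said)\b",
    re.IGNORECASE,
)


def find_memory_claims(reply: str) -> list:
    """Return one entry per sentence that contains a trigger phrase."""
    # Naive sentence split: break on whitespace after ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    found = []
    for sentence in sentences:
        match = _TRIGGERS.search(sentence)
        if match:
            found.append({"phrase": match.group(1).lower(),
                          "sentence": sentence})
    return found


# Caught: uses a known trigger phrase.
print(find_memory_claims("Of course! I remember that time we went skydiving together!"))
# Missed: the same fabrication, phrased without any trigger.
print(find_memory_claims("We went skydiving last spring, didn't we?"))
```

The second call returns an empty list even though the sentence asserts a shared memory, which is exactly the gap the bullet above describes.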
Development
pip install -e ".[dev]"
python -m pytest # run the test suite
python tests/benchmark.py # reproduce the benchmark (needs an API key)
Security & Limitations
Mistikguard is designed to improve memory integrity in LLM companions, but it is not a security tool and has important limitations you should understand.
Prompt Injection
The LLM-based components (audit_judge and corrections) send user messages and memory content into prompts. Like any system that uses LLMs this way, they are potentially vulnerable to prompt injection attacks.
An attacker could potentially craft input that influences:
- What the grounding judge accepts or rejects
- How corrections are interpreted and applied
Recommendation: If you are building a public-facing or high-stakes system, treat Mistikguard's output as advisory rather than authoritative, or add further guardrails at a higher level.
What Mistikguard Does NOT Protect Against
- Prompt injection / jailbreaks against your main model
- Context window overflow attacks
- Malicious or adversarial user input in general
- Data exfiltration through memory
What Mistikguard Helps With
- Reducing accidental memory fabrication by the model
- Making corrections more reliable through tombstones
- Giving you visibility into when the model makes memory claims
General Advice
- Always validate important facts with the user when possible
- Consider using a smaller, cheaper, or more restricted model for the judge component
- Do not rely on Mistikguard alone if memory accuracy is critical for safety or trust
We are transparent about these limitations because we believe honest documentation leads to better systems.
License
Apache License 2.0 — see LICENSE.
Mistikguard was originally developed as part of Mistik, a local-first AI companion. See NOTICE for attribution.
Status
Alpha (0.1.0). The core is extracted, generalized, tested, and benchmarked. APIs may change.