
Trustworthy memory and fabrication detection for LLM companions


Mistikguard

Keep your LLM companion's memory honest.

Mistikguard is a small, dependency-light library that stops an LLM's fabrications from becoming permanent, trusted, defended memory. It is the extracted memory-integrity core of Mistik, a local-first AI companion, generalized for reuse.

Most "long-term memory" for LLM apps works like this: the model periodically summarizes the conversation and writes "facts" about the user into a store, which are then injected back into future prompts as established truth. The flaw is that the model's output is treated as ground truth the moment it is written. A hallucinated detail becomes indistinguishable from something the user actually said — recalled with the same confidence, defended with the same conviction, and reinforced every time it is injected.

Mistikguard governs that boundary. The model proposes; Mistikguard decides.


The benchmark

The grounding judge — the component that decides whether a memory-claim in a reply is actually supported — was measured on a 44-case adversarial benchmark (true recollections, outright fabrications, and deliberately hard borderline inferences):

Metric                 Value
Precision              1.000
Recall                 0.909
F1                     0.952
False-positive rate    0.000

The judge is deliberately safety-biased: on uncertainty or error it defaults to grounded, so doubt never turns into an accusation against a true statement (on this benchmark it raised no false alarms at all). The price of that 0% false-positive rate is a handful of soft inferences it declines to flag. The trade is intentional: for a companion, a missed soft inference costs one sentence, while a false alarm could make the system deny something real about the user.
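The fail-safe default can be sketched as a thin wrapper around any judge. This is a minimal illustration, not Mistikguard's internal code: it assumes a judge callable that returns a `(grounded, reason)` tuple, like the `judge_claim` example later in this README, and the wrapper name is hypothetical.

```python
from typing import Callable, Tuple

Verdict = Tuple[bool, str]


def judge_with_safe_default(judge: Callable[[str], Verdict], claim: str) -> Verdict:
    """Run a grounding judge; fall back to 'grounded' on any error or odd output.

    `judge` is a hypothetical callable returning (grounded, reason).
    """
    try:
        result = judge(claim)
    except Exception as exc:  # e.g. network failure, timeout, parse error inside the judge
        return True, f"defaulted to grounded: judge error ({exc.__class__.__name__})"
    # Anything other than a well-formed (bool, str) verdict counts as uncertainty.
    if not (isinstance(result, tuple) and len(result) == 2 and isinstance(result[0], bool)):
        return True, "defaulted to grounded: unparseable verdict"
    return result
```

With this shape, only an explicit, well-formed "not grounded" verdict can raise a flag; every failure mode collapses to the safe side.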

The benchmark is reproducible: python tests/benchmark.py (needs an OpenAI-compatible API key). It is a measurement of the judge in isolation on a small constructed set, not a claim about end-to-end fabrication rates in production. Forty-four cases is modest; treat the figures as indicative.
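For reading the table, here is a sketch of how such figures are typically computed from labelled cases, treating "flagged as fabricated" as the positive class. It is not the contents of `tests/benchmark.py`, and the counts in the usage example are hypothetical. Note that F1 = 2PR / (P + R), so precision 1.000 and recall 0.909 give F1 ≈ 0.952, consistent with the table.

```python
from typing import Iterable, Tuple


def judge_metrics(results: Iterable[Tuple[bool, bool]]) -> dict:
    """Compute precision, recall, F1 and false-positive rate.

    Each item is (is_fabrication, flagged): the ground-truth label and whether
    the judge flagged the claim as not grounded. Positive = flagged.
    """
    tp = fp = fn = tn = 0
    for is_fabrication, flagged in results:
        if flagged and is_fabrication:
            tp += 1  # fabrication correctly caught
        elif flagged:
            fp += 1  # false alarm against a true statement
        elif is_fabrication:
            fn += 1  # fabrication missed
        else:
            tn += 1  # true statement correctly left alone
    # Undefined ratios (empty denominators) are reported as 0.0 here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Hypothetical toy set: 10 fabrications caught, 1 missed, 5 true claims untouched.
toy = [(True, True)] * 10 + [(True, False)] + [(False, False)] * 5
print({k: round(v, 3) for k, v in judge_metrics(toy).items()})
```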


What it does

Four cooperating pieces:

  1. Provenance. Every stored fact carries a source — confirmed (the user stated or ratified it) or inferred (the model generated it). Model writes default to inferred. This single distinction is the keystone everything else builds on.

  2. The write-gate. A deterministic check every model-proposed write must pass. It rejects self-narration (the assistant describing its own state), contradictions of confirmed facts, and previously corrected material.

  3. Tombstones. When the user corrects something, it is removed and a tombstone is recorded. The gate consults tombstones so corrected material cannot be silently re-introduced later. A correction, once made, stays made.

  4. The grounding audit. After a reply, a cheap pattern detector finds memory-claims ("you mentioned…", "I remember that you…"); each is then checked by an LLM grounding judge against actual stored memory. Unsupported claims are surfaced — not silently rewritten — so a human stays the authority.
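To make the first three pieces concrete, here is a hypothetical, self-contained toy model of how provenance, a deterministic write-gate and tombstones can fit together. It is not Mistikguard's implementation or API (see Usage for that). It assumes exact, whitespace- and case-normalized matching for tombstones and a crude name-prefix rule for self-narration, and it omits the contradiction check.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Set, Tuple

Source = Literal["confirmed", "inferred"]


@dataclass
class Fact:
    text: str
    source: Source = "inferred"  # model writes default to inferred


@dataclass
class ToyMemory:
    """Hypothetical illustration only; not the library's data model."""
    assistant_name: str
    facts: List[Fact] = field(default_factory=list)
    tombstones: Set[str] = field(default_factory=set)  # normalized corrected texts

    @staticmethod
    def _norm(text: str) -> str:
        return " ".join(text.lower().split())

    def gate(self, text: str) -> Tuple[bool, str]:
        norm = self._norm(text)
        # Crude rule: a fact that starts with the assistant's name is self-narration.
        if norm.startswith(self.assistant_name.lower() + " "):
            return False, "self-narration"
        # Corrected material cannot be silently re-introduced.
        if norm in self.tombstones:
            return False, "previously corrected"
        return True, "ok"

    def propose(self, text: str) -> Tuple[bool, str]:
        """A model-proposed write: it must pass the gate and is stored as inferred."""
        ok, reason = self.gate(text)
        if ok:
            self.facts.append(Fact(text))
        return ok, reason

    def confirm(self, text: str) -> None:
        """The user stated or ratified a fact: upgrade it or store it as confirmed."""
        norm = self._norm(text)
        for fact in self.facts:
            if self._norm(fact.text) == norm:
                fact.source = "confirmed"
                return
        self.facts.append(Fact(text, source="confirmed"))

    def correct(self, text: str) -> None:
        """A user correction removes the fact and records a tombstone."""
        norm = self._norm(text)
        self.facts = [f for f in self.facts if self._norm(f.text) != norm]
        self.tombstones.add(norm)
```

The key property is the ordering in `propose`: the gate runs before anything is written, so a tombstoned text is refused even if the model rephrases only its casing or spacing. Real rephrasings would need fuzzier matching than this sketch provides.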

The deterministic pieces (provenance, gate, tombstones, the claim detector) have zero external dependencies — pure standard library. Only the grounding judge needs an LLM client.
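The first stage of the audit, the cheap claim detector, can be approximated with the standard library's `re` module alone. This is a minimal sketch with a hypothetical, deliberately short phrase list, not Mistikguard's actual patterns; its output mirrors the `{'phrase': ..., 'sentence': ...}` shape shown in the `audit_reply` example below. Each hit would then go to the grounding judge.

```python
import re
from typing import Dict, List

# Hypothetical phrase list; a production detector would need many more patterns.
CLAIM_PATTERNS = [
    r"\byou (?:mentioned|told me|said)\b",
    r"\bi remember (?:that|when|you)\b",
    r"\blast time we\b",
]
_CLAIM_RE = re.compile("|".join(CLAIM_PATTERNS), re.IGNORECASE)
# Naive sentence split: break after ., ! or ? followed by whitespace.
_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def find_memory_claims(reply: str) -> List[Dict[str, str]]:
    """Return one entry per sentence containing a memory-claim phrase."""
    hits = []
    for sentence in _SENTENCE_SPLIT.split(reply.strip()):
        match = _CLAIM_RE.search(sentence)
        if match:
            hits.append({"phrase": match.group(0).lower(), "sentence": sentence})
    return hits


print(find_memory_claims("Sure! I remember that you love hiking. Want ideas?"))
# A novel phrasing slips past the patterns entirely:
print(find_memory_claims("Back when we talked about your trip, you loved Lisbon."))
```

The second call returns an empty list, which is exactly the limitation noted under "What it does not do": pattern coverage is an approximation, so the write-gate remains the deeper net.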


Install

pip install mistikguard            # core only, no external dependencies
pip install "mistikguard[llm]"     # adds the OpenAI-compatible client for the judge (quotes stop zsh expanding the brackets)

Usage

Governed facts with provenance:

from mistikguard.long_memory import LongTermMemory

mem = LongTermMemory(storage_path="./user_memory.json")

# A fact the user stated directly is trusted.
mem.add_fact("User lives in Berlin", source="confirmed")

# A fact the model inferred is held lightly.
mem.add_fact("User probably likes jazz", source="inferred")

# Corrections remove and tombstone — they don't just contradict.
mem.forget_fact("jazz")

print(mem.fact_texts())

The write-gate (deterministic, no API key needed):

from mistikguard import memory_gate as gate

# Configure once for your assistant and user.
gate.configure(assistant_name="Aria", user_name="Sarah",
               corrections_log_path="./corrections.json")

confirmed = ["User lives in Berlin"]

gate.gate_fact("User lives in Lisbon", confirmed)   # (False, 'contradicts confirmed: ...')
gate.gate_fact("Aria feels calm today", confirmed)  # (False, 'noise/self-narration')
gate.gate_fact("User enjoys hiking", confirmed)     # (True, 'ok')
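Since `gate.gate_fact` returns an `(allowed, reason)` tuple, as shown above, every model-proposed fact can be routed through it before it reaches the store. A minimal sketch follows; the helper name is hypothetical, and it takes the gate and the write function as parameters so it can be exercised without the library or an API key. Facts that pass are written as `inferred`, in line with the provenance rule.

```python
from typing import Callable, Iterable, List, Tuple

GateFn = Callable[[str, List[str]], Tuple[bool, str]]


def commit_proposed_facts(
    proposed: Iterable[str],
    confirmed: List[str],
    gate_fn: GateFn,
    add_fn: Callable[[str], None],
) -> List[Tuple[str, str]]:
    """Gate each model-proposed fact; write only those that pass.

    Returns the rejected facts with the gate's reasons, for logging or review.
    """
    rejected = []
    for text in proposed:
        allowed, reason = gate_fn(text, confirmed)
        if allowed:
            add_fn(text)
        else:
            rejected.append((text, reason))
    return rejected


# Wiring it to Mistikguard, per the examples above (not run here):
#   rejected = commit_proposed_facts(
#       model_facts, confirmed, gate.gate_fact,
#       lambda t: mem.add_fact(t, source="inferred"),
#   )
```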

Auditing a reply for fabricated memory-claims:

from mistikguard.memory_audit import audit_reply

reply = "Of course — I remember that time we went skydiving together!"
memory_texts = ["User lives in Berlin", "User has a dog named Pixel"]
recent = ["what should I do this weekend?"]

flagged = audit_reply(reply, memory_texts, recent)
# -> [{'phrase': 'i remember that', 'sentence': '... skydiving together!'}]
# Empty list means no fabricated memory-claims were detected.
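Because flagged claims are surfaced rather than rewritten, one option is to show them to the user or a reviewer alongside the reply. A minimal sketch, assuming only the list-of-dicts shape with `phrase` and `sentence` keys shown in the example output above; the function name and note wording are hypothetical.

```python
from typing import Dict, List


def review_note(flagged: List[Dict[str, str]]) -> str:
    """Format audit results as a note for a human; the reply itself is left untouched."""
    if not flagged:
        return ""  # nothing to surface
    lines = ["Possible unsupported memory-claims (not found in stored memory):"]
    for item in flagged:
        lines.append(f'- "{item["sentence"]}" (matched: {item["phrase"]})')
    return "\n".join(lines)


print(review_note([
    {"phrase": "i remember that",
     "sentence": "I remember that time we went skydiving together!"},
]))
```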

For the LLM-backed judge directly:

from mistikguard.audit_judge import judge_claim
from openai import OpenAI

client = OpenAI(api_key="...", base_url="https://api.groq.com/openai/v1")
grounded, reason = judge_claim(
    client, "model-name",
    "I remember you said your sister is named Olia",
    memory_texts=["User's sister is named Olia"],
    recent_user_msgs=[],
)
# grounded == True

What it does not do

In the spirit of being honest about its limits:

  • It does not stop the underlying model from producing a false sentence in the moment. That is a property of the model, outside the reach of any surrounding structure. What Mistikguard guarantees is narrower and harder: that a false sentence does not become permanent, trusted, defended memory.
  • The claim detector is pattern-based. A fabrication phrased in a sufficiently novel way can slip past the first stage. Coverage is an expanding approximation, not a complete solution; the upstream gate is the deeper net.
  • The judge borrows intelligence it does not own. The grounding decision is made by whatever LLM you point it at. Mistikguard's contribution is the structure and discipline around the model, not model capability.
  • It is not a content-safety or mental-health tool. It governs what the system treats as true about the user. It is not a clinical resource and not a substitute for human or professional support.

The honest description is the useful one: Mistikguard closes the specific failure that makes most companion memory untrustworthy — model fabrication quietly becoming durable fact — and is candid about everything it does not close.


Development

pip install -e ".[dev]"
python -m pytest                 # run the test suite
python tests/benchmark.py        # reproduce the benchmark (needs an API key)

Security & Limitations

Mistikguard is designed to improve memory integrity in LLM companions, but it is not a security tool and has important limitations you should understand.

Prompt Injection

The LLM-based components (audit_judge and corrections) send user messages and memory content into prompts. Like any system that uses LLMs this way, they are potentially vulnerable to prompt injection attacks.

An attacker could potentially craft input that influences:

  • What the grounding judge accepts or rejects
  • How corrections are interpreted and applied

Recommendation: If you are building a public-facing or high-stakes system, you should treat the output of Mistikguard as advisory rather than authoritative, or add additional guardrails at a higher level.

What Mistikguard Does NOT Protect Against

  • Prompt injection / jailbreaks against your main model
  • Context window overflow attacks
  • Malicious or adversarial user input in general
  • Data exfiltration through memory

What Mistikguard Helps With

  • Reducing accidental memory fabrication by the model
  • Making corrections more reliable through tombstones
  • Giving you visibility into when the model makes memory claims

General Advice

  • Always validate important facts with the user when possible
  • Consider using a smaller, cheaper, or more restricted model for the judge component
  • Do not rely on Mistikguard alone if memory accuracy is critical for safety or trust

We are transparent about these limitations because we believe honest documentation leads to better systems.

License

Apache License 2.0 — see LICENSE.

Mistikguard was originally developed as part of Mistik, a local-first AI companion. See NOTICE for attribution.

Status

Alpha (0.1.0). The core is extracted, generalized, tested, and benchmarked. APIs may change.



Download files


Source Distribution

mistikguard-0.1.1.tar.gz (26.3 kB)

Uploaded Source

Built Distribution


mistikguard-0.1.1-py3-none-any.whl (23.7 kB)

Uploaded Python 3

File details

Details for the file mistikguard-0.1.1.tar.gz.

File metadata

  • Download URL: mistikguard-0.1.1.tar.gz
  • Upload date:
  • Size: 26.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for mistikguard-0.1.1.tar.gz
Algorithm Hash digest
SHA256 fb0ed62f0d804d8dc20c5306cb4e2200a72c972ffb6d5d562a75aecbcc4ad77e
MD5 6ae00aa4df2c13151ce2beabdf0d78e6
BLAKE2b-256 343a04a37dffb79d6d4d8421cf6d1814c2d973ff6f7ca3ff6907da47a9292dd9


File details

Details for the file mistikguard-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: mistikguard-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 23.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for mistikguard-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 187e0f93927495f5048af5a0ec95ef8e7ffd10b99eb8a194c144c5a2bcb42fcf
MD5 044c77527be0a3bf50de1ffee464c7d0
BLAKE2b-256 cff8f402ba30b082f9816f6f8a0c4cde461efe752d20906177583c29afc75c3d

