Trustworthy memory and fabrication detection for LLM companions

Mistikguard

Keep your LLM companion's memory honest.

Mistikguard is a small, dependency-light library that stops an LLM's fabrications from becoming permanent, trusted, defended memory. It is the extracted memory-integrity core of Mistik, a local-first AI companion, generalized for reuse.

Most "long-term memory" for LLM apps works like this: the model periodically summarizes the conversation and writes "facts" about the user into a store, which are then injected back into future prompts as established truth. The flaw is that the model's output is treated as ground truth the moment it is written. A hallucinated detail becomes indistinguishable from something the user actually said — recalled with the same confidence, defended with the same conviction, and reinforced every time it is injected.

Mistikguard governs that boundary. The model proposes; Mistikguard decides.


The benchmark

The grounding judge — the component that decides whether a memory-claim in a reply is actually supported — was measured on a 44-case adversarial benchmark (true recollections, outright fabrications, and deliberately hard borderline inferences):

Metric               Value
Precision            1.000
Recall               0.909
F1                   0.952
False-positive rate  0.000

The judge is deliberately safety-biased: on uncertainty or error it defaults to grounded, so on this benchmark it raised no false alarm against a true statement. The price of that 0% false-positive rate is a handful of soft inferences it declines to flag (hence the 0.909 recall). That trade was chosen on purpose: for a companion, a missed soft inference costs one sentence, while a false alarm could make the system deny something real about the user.
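The "default to grounded" behaviour can be pictured as a thin wrapper around whatever call produces the verdict. The sketch below is an illustration only, not Mistikguard's code: `judge_with_default`, `RawJudge`, and the convention that a raw judge returns `None` when unsure are all hypothetical.

```python
from typing import Callable, Optional, Tuple

# Hypothetical convention: a raw judge returns True/False, or None when unsure.
RawJudge = Callable[[str], Optional[bool]]


def judge_with_default(raw_judge: RawJudge, claim: str) -> Tuple[bool, str]:
    """Return (grounded, reason), defaulting to grounded on doubt or error."""
    try:
        verdict = raw_judge(claim)
    except Exception as exc:  # e.g. network failure or malformed response
        return True, f"judge error, defaulted to grounded: {exc}"
    if verdict is None:
        return True, "judge uncertain, defaulted to grounded"
    if verdict:
        return True, "judge: grounded"
    return False, "judge: not grounded"


def failing_judge(claim: str) -> Optional[bool]:
    """Stand-in for a judge whose backend is unreachable."""
    raise TimeoutError("no response")


print(judge_with_default(failing_judge, "you mentioned a trip to Rome"))
print(judge_with_default(lambda c: None, "you mentioned a trip to Rome"))
print(judge_with_default(lambda c: False, "we went skydiving together"))
```

Only an explicit negative verdict produces a flag; every other path resolves to grounded, which is what keeps false alarms rare at the cost of some missed flags.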

The benchmark is reproducible: python tests/benchmark.py (needs an OpenAI-compatible API key). It is a measurement of the judge in isolation on a small constructed set, not a claim about end-to-end fabrication rates in production. Forty-four cases is modest; treat the figures as indicative.
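To see how the four figures relate, here is the arithmetic as a short script. The README reports only the ratios, not the underlying counts, so the counts below are an assumption: one split of 44 cases that happens to reproduce the table, with "positive" meaning "flagged as ungrounded".

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard confusion-matrix metrics; positive = claim flagged as ungrounded."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_positive_rate = fp / (fp + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


# ASSUMED counts (not published): 22 ungrounded claims, of which 20 were
# flagged and 2 missed, plus 22 grounded claims, none of them flagged.
m = metrics(tp=20, fp=0, fn=2, tn=22)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# precision: 1.000
# recall: 0.909
# f1: 0.952
# false_positive_rate: 0.000
```

Other splits give the same ratios (for example 10 of 11 ungrounded claims flagged, with 33 grounded cases), so the script shows that the table is internally consistent, not what the real case mix was.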


What it does

Four cooperating pieces:

  1. Provenance. Every stored fact carries a source — confirmed (the user stated or ratified it) or inferred (the model generated it). Model writes default to inferred. This single distinction is the keystone everything else builds on.

  2. The write-gate. A deterministic check every model-proposed write must pass. It rejects self-narration (the assistant describing its own state), contradictions of confirmed fact, and previously-corrected material.

  3. Tombstones. When the user corrects something, the corrected fact is removed and a tombstone is recorded. The gate consults tombstones so corrected material cannot be silently re-introduced later. A correction, once made, stays made.

  4. The grounding audit. After a reply, a cheap pattern detector finds memory-claims ("you mentioned…", "I remember that you…"); each is then checked by an LLM grounding judge against actual stored memory. Unsupported claims are surfaced — not silently rewritten — so a human stays the authority.

The deterministic pieces (provenance, gate, tombstones, the claim detector) have zero external dependencies — pure standard library. Only the grounding judge needs an LLM client.
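How provenance, the gate, and tombstones fit together can be shown in a few lines of standard-library Python. This is a toy model written for illustration, not Mistikguard's implementation: `Fact`, `ToyMemory`, and the substring matching used for tombstones are all assumptions, and the sketch leaves out the gate's self-narration and contradiction checks.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Fact:
    text: str
    source: str = "inferred"  # model writes default to inferred


@dataclass
class ToyMemory:
    facts: List[Fact] = field(default_factory=list)
    tombstones: List[str] = field(default_factory=list)

    def gate(self, text: str) -> Tuple[bool, str]:
        """Deterministic check: reject text that matches a tombstone."""
        lowered = text.lower()
        for stone in self.tombstones:
            if stone in lowered:  # assumed matching rule: plain substring
                return False, f"previously corrected: {stone}"
        return True, "ok"

    def propose(self, text: str) -> Tuple[bool, str]:
        """Model-proposed write: must pass the gate, stored as inferred."""
        ok, reason = self.gate(text)
        if ok:
            self.facts.append(Fact(text))
        return ok, reason

    def confirm(self, text: str) -> None:
        """User-stated fact: stored as confirmed."""
        self.facts.append(Fact(text, source="confirmed"))

    def forget(self, keyword: str) -> None:
        """User correction: drop matching facts and record a tombstone."""
        key = keyword.lower()
        self.facts = [f for f in self.facts if key not in f.text.lower()]
        self.tombstones.append(key)


mem = ToyMemory()
mem.confirm("User lives in Berlin")
print(mem.propose("User probably likes jazz"))  # (True, 'ok')
mem.forget("jazz")
print(mem.propose("User is a big jazz fan"))    # (False, 'previously corrected: jazz')
print([(f.text, f.source) for f in mem.facts])  # [('User lives in Berlin', 'confirmed')]
```

The point of the sketch is the ordering: the model can only reach the store through `propose`, so a correction recorded by `forget` blocks later attempts to write the same material back.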


Install

# Install from source (PyPI release coming soon):
pip install git+https://github.com/obscuraknight/mistikguard.git

# With the optional grounding-judge client (OpenAI-compatible):
pip install "mistikguard[llm] @ git+https://github.com/obscuraknight/mistikguard.git"

Usage

Governed facts with provenance:

from mistikguard.long_memory import LongTermMemory

mem = LongTermMemory(storage_path="./user_memory.json")

# A fact the user stated directly is trusted.
mem.add_fact("User lives in Berlin", source="confirmed")

# A fact the model inferred is held lightly.
mem.add_fact("User probably likes jazz", source="inferred")

# Corrections remove and tombstone — they don't just contradict.
mem.forget_fact("jazz")

print(mem.fact_texts())

The write-gate (deterministic, no API key needed):

from mistikguard import memory_gate as gate

# Configure once for your assistant and user.
gate.configure(assistant_name="Aria", user_name="Sarah",
               corrections_log_path="./corrections.json")

confirmed = ["User lives in Berlin"]

gate.gate_fact("User lives in Lisbon", confirmed)   # (False, 'contradicts confirmed: ...')
gate.gate_fact("Aria feels calm today", confirmed)  # (False, 'noise/self-narration')
gate.gate_fact("User enjoys hiking", confirmed)     # (True, 'ok')

Auditing a reply for fabricated memory-claims:

from mistikguard.memory_audit import audit_reply

reply = "Of course — I remember that time we went skydiving together!"
memory_texts = ["User lives in Berlin", "User has a dog named Pixel"]
recent = ["what should I do this weekend?"]

flagged = audit_reply(reply, memory_texts, recent)
# -> [{'phrase': 'i remember that', 'sentence': '... skydiving together!'}]
# Empty list means no fabricated memory-claims were detected.

For the LLM-backed judge directly:

from mistikguard.audit_judge import judge_claim
from openai import OpenAI

client = OpenAI(api_key="...", base_url="https://api.groq.com/openai/v1")
grounded, reason = judge_claim(
    client, "model-name",
    "I remember you said your sister is named Olia",
    memory_texts=["User's sister is named Olia"],
    recent_user_msgs=[],
)
# grounded == True

What it does not do

In the spirit of being honest about its limits:

  • It does not stop the underlying model from producing a false sentence in the moment. That is a property of the model, outside the reach of any surrounding structure. What Mistikguard guarantees is narrower and harder: that a false sentence does not become permanent, trusted, defended memory.
  • The claim detector is pattern-based. A fabrication phrased in a sufficiently novel way can slip the first stage. Coverage is an expanding approximation, not a complete solution; the upstream gate is the deeper net.
  • The judge borrows intelligence it does not own. The grounding decision is made by whatever LLM you point it at. Mistikguard's contribution is the structure and discipline around the model, not model capability.
  • It is not a content-safety or mental-health tool. It governs what the system treats as true about the user. It is not a clinical resource and not a substitute for human or professional support.
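The pattern-based limitation is easy to demonstrate with a toy detector. The sketch below is not Mistikguard's detector: the function name `find_memory_claims` and the two trigger patterns are hypothetical, chosen to resemble the phrases quoted earlier in this README.

```python
import re
from typing import Dict, List

# Hypothetical trigger phrases; a real detector would carry a longer list.
MEMORY_CLAIM_PATTERNS = [
    r"\byou (?:mentioned|told me|said)\b",
    r"\bi remember (?:that|when)\b",
]
_CLAIM_RE = re.compile("|".join(MEMORY_CLAIM_PATTERNS), re.IGNORECASE)


def find_memory_claims(reply: str) -> List[Dict[str, str]]:
    """Return one entry per sentence that contains a trigger phrase."""
    hits = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", reply.strip()):
        match = _CLAIM_RE.search(sentence)
        if match:
            hits.append({"phrase": match.group(0).lower(), "sentence": sentence})
    return hits


caught = find_memory_claims("Sure! I remember that time we went skydiving together!")
missed = find_memory_claims("Our skydiving trip last spring was unforgettable.")
print(caught)  # one hit, phrase 'i remember that'
print(missed)  # [] : same fabrication, no trigger phrase
```

The second reply asserts the same shared history without any trigger phrase, so the first stage never hands it to the judge. That is the gap the bullet above describes.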

The honest description is the useful one: Mistikguard closes the specific failure that makes most companion memory untrustworthy — model fabrication quietly becoming durable fact — and is candid about everything it does not close.


Development

pip install -e ".[dev]"
python -m pytest                 # run the test suite
python tests/benchmark.py        # reproduce the benchmark (needs an API key)

Security & Limitations

Mistikguard is designed to improve memory integrity in LLM companions, but it is not a security tool and has important limitations you should understand.

Prompt Injection

The LLM-based components (audit_judge and corrections) send user messages and memory content into prompts. Like any system that uses LLMs this way, they are potentially vulnerable to prompt injection attacks.

An attacker could potentially craft input that influences:

  • What the grounding judge accepts or rejects
  • How corrections are interpreted and applied

Recommendation: If you are building a public-facing or high-stakes system, treat Mistikguard's output as advisory rather than authoritative, or add further guardrails at a higher level.

What Mistikguard Does NOT Protect Against

  • Prompt injection / jailbreaks against your main model
  • Context window overflow attacks
  • Malicious or adversarial user input in general
  • Data exfiltration through memory

What Mistikguard Helps With

  • Reducing accidental memory fabrication by the model
  • Making corrections more reliable through tombstones
  • Giving you visibility into when the model makes memory claims

General Advice

  • Always validate important facts with the user when possible
  • Consider using a smaller, cheaper, or more restricted model for the judge component
  • Do not rely on Mistikguard alone if memory accuracy is critical for safety or trust

We are transparent about these limitations because we believe honest documentation leads to better systems.

License

Apache License 2.0 — see LICENSE.

Mistikguard was originally developed as part of Mistik, a local-first AI companion. See NOTICE for attribution.

Status

Alpha (0.1.0). The core is extracted, generalized, tested, and benchmarked. APIs may change.
