Trustworthy memory and fabrication detection for LLM companions

Mistikguard

Keep your LLM companion's memory honest.

Mistikguard is a small, dependency-light library that stops an LLM's fabrications from becoming permanent, trusted, defended memory. It is the extracted memory-integrity core of Mistik, a local-first AI companion, generalized for reuse.

Most "long-term memory" for LLM apps works like this: the model periodically summarizes the conversation and writes "facts" about the user into a store, which are then injected back into future prompts as established truth. The flaw is that the model's output is treated as ground truth the moment it is written. A hallucinated detail becomes indistinguishable from something the user actually said — recalled with the same confidence, defended with the same conviction, and reinforced every time it is injected.

Mistikguard governs that boundary. The model proposes; Mistikguard decides.

The benchmark

The grounding judge — the component that decides whether a memory-claim in a reply is actually supported — was measured on a 44-case adversarial benchmark (true recollections, outright fabrications, and deliberately hard borderline inferences):

Metric	Value
Precision	1.000
Recall	0.909
F1	0.952
False-positive rate	0.000

The judge is deliberately safety-biased: on uncertainty or error it defaults to grounded, so it is built to avoid raising a false alarm against a true statement, and on this benchmark it raised none. The price of that 0% false-positive rate is a handful of soft inferences it declines to flag. The trade is chosen on purpose: for a companion, a missed soft inference costs one sentence, while a false alarm could make the system deny something real about the user.

The benchmark is reproducible: run python tests/benchmark.py (needs an OpenAI-compatible API key). It measures the judge in isolation on a small constructed set; it is not a claim about end-to-end fabrication rates in production. Forty-four cases is modest, so treat the figures as indicative.
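
As a quick cross-check on the table above, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two reported figures. The helper name f1_score is illustrative and is not part of Mistikguard.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the benchmark table.
reported_precision = 1.000
reported_recall = 0.909

f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.3f}")  # prints 0.952
```

The result agrees with the F1 row of the table to three decimal places.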

What it does

Four cooperating pieces:

Provenance. Every stored fact carries a source — confirmed (the user stated or ratified it) or inferred (the model generated it). Model writes default to inferred. This single distinction is the keystone everything else builds on.
The write-gate. A deterministic check every model-proposed write must pass. It rejects self-narration (the assistant describing its own state), contradictions of confirmed fact, and previously-corrected material.
Tombstones. When the user corrects something, it is removed and a tombstone is recorded. The gate consults tombstones so corrected material cannot be silently re-introduced later. A correction, once made, stays made.
The grounding audit. After a reply, a cheap pattern detector finds memory-claims ("you mentioned…", "I remember that you…"); each is then checked by an LLM grounding judge against actual stored memory. Unsupported claims are surfaced — not silently rewritten — so a human stays the authority.

The deterministic pieces (provenance, gate, tombstones, the claim detector) have zero external dependencies — pure standard library. Only the grounding judge needs an LLM client.
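
To make the interplay of provenance, the gate, and tombstones concrete, here is a minimal standalone sketch in pure standard library. It is not Mistikguard's implementation: Fact, TinyStore, propose, and correct are hypothetical names, the tombstone check is a crude substring match, and the self-narration and contradiction checks are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    text: str
    source: str = "inferred"  # model writes default to "inferred"


@dataclass
class TinyStore:
    """Hypothetical store: facts plus tombstones for corrected material."""
    facts: list[Fact] = field(default_factory=list)
    tombstones: set[str] = field(default_factory=set)

    def propose(self, text: str, source: str = "inferred") -> tuple[bool, str]:
        """Gate a write: refuse anything that matches a tombstone."""
        lowered = text.lower()
        for stone in self.tombstones:
            if stone in lowered:
                return False, f"previously corrected: {stone}"
        self.facts.append(Fact(text, source))
        return True, "ok"

    def correct(self, keyword: str) -> int:
        """Remove facts mentioning keyword and record a tombstone for it."""
        key = keyword.lower()
        before = len(self.facts)
        self.facts = [f for f in self.facts if key not in f.text.lower()]
        self.tombstones.add(key)
        return before - len(self.facts)


store = TinyStore()
store.propose("User lives in Berlin", source="confirmed")
store.propose("User probably likes jazz")    # stored as inferred
store.correct("jazz")                         # removed and tombstoned
print(store.propose("User loves jazz clubs"))  # (False, 'previously corrected: jazz')
print([f.text for f in store.facts])           # ['User lives in Berlin']
```

The point of the sketch is the ordering: the correction is recorded before any later write is considered, so the rejected write never reaches the store.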

Install

pip install mistikguard          # core only, no external dependencies
pip install "mistikguard[llm]"   # adds the OpenAI-compatible client for the judge (quoted so shells like zsh don't expand the brackets)

Usage

Governed facts with provenance:

from mistikguard.long_memory import LongTermMemory

mem = LongTermMemory(storage_path="./user_memory.json")

# A fact the user stated directly is trusted.
mem.add_fact("User lives in Berlin", source="confirmed")

# A fact the model inferred is held lightly.
mem.add_fact("User probably likes jazz", source="inferred")

# Corrections remove and tombstone — they don't just contradict.
mem.forget_fact("jazz")

print(mem.fact_texts())

The write-gate (deterministic, no API key needed):

from mistikguard import memory_gate as gate

# Configure once for your assistant and user.
gate.configure(assistant_name="Aria", user_name="Sarah",
               corrections_log_path="./corrections.json")

confirmed = ["User lives in Berlin"]

gate.gate_fact("User lives in Lisbon", confirmed)   # (False, 'contradicts confirmed: ...')
gate.gate_fact("Aria feels calm today", confirmed)  # (False, 'noise/self-narration')
gate.gate_fact("User enjoys hiking", confirmed)     # (True, 'ok')

Auditing a reply for fabricated memory-claims:

from mistikguard.memory_audit import audit_reply

reply = "Of course — I remember that time we went skydiving together!"
memory_texts = ["User lives in Berlin", "User has a dog named Pixel"]
recent = ["what should I do this weekend?"]

flagged = audit_reply(reply, memory_texts, recent)
# -> [{'phrase': 'i remember that', 'sentence': '... skydiving together!'}]
# Empty list means no fabricated memory-claims were detected.
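
The first stage of the audit is described above as a cheap pattern detector. A rough sketch of that idea, assuming a small hand-written phrase list, might look like the following. CLAIM_PATTERNS and find_memory_claims are hypothetical names, and the patterns are illustrative rather than the ones Mistikguard ships.

```python
import re

# Illustrative phrase list; a real detector would need broader coverage.
CLAIM_PATTERNS = [
    r"\bi remember (?:that|when|you)\b",
    r"\byou (?:mentioned|told me|said)\b",
    r"\blast time (?:we|you)\b",
]
CLAIM_RE = re.compile("|".join(CLAIM_PATTERNS), re.IGNORECASE)


def find_memory_claims(reply: str) -> list[dict]:
    """Return one entry per sentence that contains a memory-claim phrase."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    hits = []
    for sentence in sentences:
        match = CLAIM_RE.search(sentence)
        if match:
            hits.append({"phrase": match.group(0).lower(), "sentence": sentence})
    return hits


reply = "Sounds fun. I remember that time we went skydiving together!"
for hit in find_memory_claims(reply):
    print(hit["phrase"], "->", hit["sentence"])
# prints: i remember that -> I remember that time we went skydiving together!
```

A detector like this only nominates candidate sentences. Whether a candidate is actually unsupported is a separate decision, which is what the grounding judge is for.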

For the LLM-backed judge directly:

from mistikguard.audit_judge import judge_claim
from openai import OpenAI

client = OpenAI(api_key="...", base_url="https://api.groq.com/openai/v1")
grounded, reason = judge_claim(
    client, "model-name",
    "I remember you said your sister is named Olia",
    memory_texts=["User's sister is named Olia"],
    recent_user_msgs=[],
)
# grounded == True
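
The safety-biased default described in the benchmark section (fall back to grounded on uncertainty or error) can be sketched as a thin wrapper around any model call. This is an assumption-laden illustration, not how judge_claim is implemented: safe_judge, ask_model, and the GROUNDED/UNGROUNDED answer format are all hypothetical.

```python
from typing import Callable


def safe_judge(ask_model: Callable[[str], str], claim: str) -> tuple[bool, str]:
    """Return (grounded, reason); fall back to grounded on any uncertainty."""
    try:
        answer = ask_model(claim).strip().upper()
    except Exception as exc:  # network error, timeout, bad key, ...
        return True, f"judge error, defaulting to grounded: {exc}"
    if answer.startswith("UNGROUNDED"):
        return False, "judge found no support"
    if answer.startswith("GROUNDED"):
        return True, "judge found support"
    return True, "unparseable verdict, defaulting to grounded"


def flaky_model(prompt: str) -> str:
    """Stand-in for a model call that fails."""
    raise TimeoutError("simulated timeout")


claim = "I remember you love sailing"
print(safe_judge(flaky_model, claim))
print(safe_judge(lambda p: "ungrounded", claim))
print(safe_judge(lambda p: "maybe?", claim))
```

Only an explicit UNGROUNDED verdict produces a flag; an error or an unparseable answer is treated as grounded, which is the trade the benchmark section describes.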

What it does not do

In the spirit of being honest about its limits:

It does not stop the underlying model from producing a false sentence in the moment. That is a property of the model, outside the reach of any surrounding structure. What Mistikguard guarantees is narrower and harder: that a false sentence does not become permanent, trusted, defended memory.
The claim detector is pattern-based. A fabrication phrased in a sufficiently novel way can slip the first stage. Coverage is an expanding approximation, not a complete solution; the upstream gate is the deeper net.
The judge borrows intelligence it does not own. The grounding decision is made by whatever LLM you point it at. Mistikguard's contribution is the structure and discipline around the model, not model capability.
It is not a content-safety or mental-health tool. It governs what the system treats as true about the user. It is not a clinical resource and not a substitute for human or professional support.

The honest description is the useful one: Mistikguard closes the specific failure that makes most companion memory untrustworthy — model fabrication quietly becoming durable fact — and is candid about everything it does not close.

Development

pip install -e ".[dev]"
python -m pytest                 # run the test suite
python tests/benchmark.py        # reproduce the benchmark (needs an API key)

Security & Limitations

Mistikguard is designed to improve memory integrity in LLM companions, but it is not a security tool and has important limitations you should understand.

Prompt Injection

The LLM-based components (audit_judge and corrections) send user messages and memory content into prompts. Like any system that uses LLMs this way, they are potentially vulnerable to prompt injection attacks.

An attacker could potentially craft input that influences:

What the grounding judge accepts or rejects
How corrections are interpreted and applied

Recommendation: If you are building a public-facing or high-stakes system, you should treat the output of Mistikguard as advisory rather than authoritative, or add additional guardrails at a higher level.
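
One way to read "advisory rather than authoritative" is to route flagged claims to a review step instead of letting them change memory or the reply automatically. The sketch below assumes flagged entries shaped like the audit example earlier in this document; ReviewQueue and record are hypothetical names, not part of Mistikguard.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    """Hypothetical holding area for flagged claims awaiting a human decision."""
    pending: list[dict] = field(default_factory=list)

    def record(self, flagged: list[dict]) -> int:
        """Store flags for review; never edits memory or the reply."""
        self.pending.extend(flagged)
        return len(flagged)


queue = ReviewQueue()
flagged = [
    {"phrase": "i remember that",
     "sentence": "I remember that time we went skydiving together!"},
]
queue.record(flagged)
print(len(queue.pending))  # prints 1
```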

What Mistikguard Does NOT Protect Against

Prompt injection / jailbreaks against your main model
Context window overflow attacks
Malicious or adversarial user input in general
Data exfiltration through memory

What Mistikguard Helps With

Reducing accidental memory fabrication by the model
Making corrections more reliable through tombstones
Giving you visibility into when the model makes memory claims

General Advice

Always validate important facts with the user when possible
Consider using a smaller, cheaper, or more restricted model for the judge component
Do not rely on Mistikguard alone if memory accuracy is critical for safety or trust

We are transparent about these limitations because we believe honest documentation leads to better systems.

License

Apache License 2.0 — see LICENSE.

Mistikguard was originally developed as part of Mistik, a local-first AI companion. See NOTICE for attribution.

Status

Alpha (0.1.0). The core is extracted, generalized, tested, and benchmarked. APIs may change.

Release history

0.1.1 (this version): Jun 27, 2026
0.1.0: Jun 27, 2026

Download files

Source Distribution

mistikguard-0.1.1.tar.gz (26.3 kB)

Uploaded Jun 27, 2026 Source

Built Distribution

mistikguard-0.1.1-py3-none-any.whl (23.7 kB)

Uploaded Jun 27, 2026 Python 3

File details

Details for the file mistikguard-0.1.1.tar.gz.

File metadata

Download URL: mistikguard-0.1.1.tar.gz
Upload date: Jun 27, 2026
Size: 26.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for mistikguard-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`fb0ed62f0d804d8dc20c5306cb4e2200a72c972ffb6d5d562a75aecbcc4ad77e`
MD5	`6ae00aa4df2c13151ce2beabdf0d78e6`
BLAKE2b-256	`343a04a37dffb79d6d4d8421cf6d1814c2d973ff6f7ca3ff6907da47a9292dd9`

File details

Details for the file mistikguard-0.1.1-py3-none-any.whl.

File metadata

Download URL: mistikguard-0.1.1-py3-none-any.whl
Upload date: Jun 27, 2026
Size: 23.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for mistikguard-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`187e0f93927495f5048af5a0ec95ef8e7ffd10b99eb8a194c144c5a2bcb42fcf`
MD5	`044c77527be0a3bf50de1ffee464c7d0`
BLAKE2b-256	`cff8f402ba30b082f9816f6f8a0c4cde461efe752d20906177583c29afc75c3d`
