The intelligence layer between your agents and oblivion
Mnemon
Stop paying for work your agent already did. Watch it get better every run.
Install · Quickstart · Benchmarks · API
Mnemon gives your agent two things:
Token and latency savings — repeated tasks skip the LLM entirely. 2.66ms instead of 20 seconds. Zero tokens instead of 1,250. The more your agent runs, the more it saves.
A learning loop — every outcome is observed, every pattern is detected, every failure is quarantined. Your agent doesn't just cache work — it accumulates intelligence that makes the next run cheaper and better than the last.
One line of code. No infrastructure. No changes to your existing agent.
import mnemon
m = mnemon.init()
The Problem
Every agent framework — LangChain, CrewAI, AutoGen, LangGraph — is stateless by default.
Your agent generates a security report for Acme Corp every Monday. Every Monday it starts from zero: re-reads the same context, re-reasons through the same structure, re-generates the same plan. You pay full LLM price each time. It never gets faster. It never gets smarter.
You built a smart agent. You got an amnesiac that invoices you twice.
Two Things That Fix This
1. Execution Memory Engine — save tokens and time
The EME is a generalised execution cache for any expensive recurring computation. After the first run, Mnemon fingerprints the plan and stores it. Every subsequent run with the same — or semantically similar — goal is served from cache.
First run: 20,000ms · 1,250 tokens · full cost
Every repeat: 2.66ms · 0 tokens · $0.00
It works in two modes:
- System 1 — exact fingerprint match. Sub-millisecond. Zero LLM calls.
- System 2 — partial segment match. Only the changed parts go to the LLM. You pay for the delta, not the whole plan.
Failed segments are quarantined by the Retrospector — bad patterns can't recycle into future plans.
Ships with 100 pre-warmed segments from real enterprise runs so the cache starts warm on day one.
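The two modes and the quarantine rule above can be illustrated with a small sketch. This is not Mnemon's implementation: `fingerprint`, `SegmentCache`, and the way a plan is split into segments are hypothetical names and choices made for illustration, and semantic similarity matching is left out entirely (only exact hashes are compared).

```python
import hashlib
import json


def fingerprint(goal, inputs):
    """Stable SHA-256 over a normalised goal and its inputs (key order is ignored)."""
    canonical = json.dumps(
        {"goal": goal.strip().lower(), "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SegmentCache:
    """Hypothetical two-level cache: whole plans first, then single segments."""

    def __init__(self):
        self.plans = {}           # plan fingerprint -> list of segment fingerprints
        self.segments = {}        # segment fingerprint -> cached output
        self.quarantined = set()  # segment fingerprints that must not be reused

    def quarantine(self, segment):
        self.quarantined.add(fingerprint(segment, {}))

    def run(self, goal, segments, generate):
        plan_key = fingerprint(goal, {"segments": segments})
        seg_keys = [fingerprint(seg, {}) for seg in segments]
        cached = self.plans.get(plan_key)
        if cached is not None and not self.quarantined.intersection(cached):
            # Exact match: generate() is never called.
            return {"cache_level": "system1", "generated": 0,
                    "output": [self.segments[k] for k in cached]}
        generated = 0
        for seg, key in zip(segments, seg_keys):
            if key not in self.segments or key in self.quarantined:
                self.segments[key] = generate(seg)  # pay only for this segment
                self.quarantined.discard(key)
                generated += 1
        self.plans[plan_key] = seg_keys
        level = "miss" if generated == len(segments) else "system2"
        return {"cache_level": level, "generated": generated,
                "output": [self.segments[k] for k in seg_keys]}


cache = SegmentCache()
generate = lambda seg: f"plan for {seg}"
first = cache.run("weekly audit", ["scan ports", "summarise"], generate)
repeat = cache.run("weekly audit", ["scan ports", "summarise"], generate)
partial = cache.run("monthly audit", ["scan ports", "check certs"], generate)
print(first["cache_level"], repeat["cache_level"], partial["cache_level"])
# miss system1 system2
```

In this sketch a quarantined segment is regenerated on its next use, so a plan that contains it drops from a System 1 hit to a System 2 hit until the segment is replaced.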
2. Experience Bus — a learning loop that never stops
The Bus is a passive observer. Every computation outcome — success, failure, latency, pattern — is recorded and analysed in the background. You never call it directly. It's always running.
What it detects:
| Signal | What it means |
|---|---|
| DEGRADATION | latency spike vs rolling baseline |
| PATTERN_FOUND | a task type is failing at >30%, before you notice |
| ANOMALY | sudden failure after a string of successes |
| RECOVERY | the agent is stable again after a failure streak |
What it does with that intelligence: feeds it back to the EME. Success patterns strengthen the fragment library. Failure patterns trigger quarantine. The cache gets smarter on every run — not just bigger.
This is the loop: EME saves the work. Bus learns from it. Both get better.
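To make the four signals concrete, here is a minimal sketch of how they could be derived from a rolling window of outcomes. Only the 30% failure threshold comes from the table above. The class name `SignalDetector`, the window size, the 2x latency factor, and the streak length are assumptions for illustration, not Mnemon's actual detection logic.

```python
from collections import deque
from statistics import mean


class SignalDetector:
    """Hypothetical detector over a rolling window of (success, latency_ms)."""

    def __init__(self, window=20, latency_factor=2.0, failure_rate=0.30, streak=5):
        self.outcomes = deque(maxlen=window)
        self.latency_factor = latency_factor  # assumed spike multiplier
        self.failure_rate = failure_rate      # the >30% rule from the table
        self.streak = streak                  # assumed streak length

    def observe(self, success, latency_ms):
        signals = []
        history = list(self.outcomes)
        if history:
            baseline = mean(latency for _, latency in history)
            if latency_ms > self.latency_factor * baseline:
                signals.append("DEGRADATION")
            recent = [ok for ok, _ in history[-self.streak:]]
            if len(recent) == self.streak:
                if not success and all(recent):
                    signals.append("ANOMALY")   # failure after a run of successes
                if success and not any(recent):
                    signals.append("RECOVERY")  # success after a run of failures
        self.outcomes.append((success, latency_ms))
        failures = sum(1 for ok, _ in self.outcomes if not ok)
        enough_data = len(self.outcomes) >= self.streak
        if enough_data and failures / len(self.outcomes) > self.failure_rate:
            signals.append("PATTERN_FOUND")
        return signals


detector = SignalDetector(window=10, streak=3)
for _ in range(3):
    detector.observe(True, 100.0)
spike = detector.observe(True, 500.0)
sudden_failure = detector.observe(False, 100.0)
print(spike, sudden_failure)
# ['DEGRADATION'] ['ANOMALY']
```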
The Numbers
Execution cache — EME benchmarks
| Metric | Result |
|---|---|
| System 1 hit (exact match) | 2.66ms |
| Fresh LLM generation | ~20,000ms |
| Speedup | 7,500× |
| 50 concurrent agents, burst | 0 LLM calls · 0.18s total |
| Tokens saved (50 agents) | 62,500 |
| Cost saved (50 agents) | $0.94 |
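The speedup and token rows follow directly from the per-run figures quoted earlier (20,000ms and 1,250 tokens for a fresh generation, 2.66ms for a System 1 hit). A quick check of that arithmetic, using only those numbers:

```python
fresh_ms = 20_000        # fresh LLM generation, from the table
hit_ms = 2.66            # System 1 hit, from the table
tokens_per_plan = 1_250  # tokens per fresh generation
agents = 50

speedup = fresh_ms / hit_ms
tokens_saved = agents * tokens_per_plan

print(f"speedup: {speedup:,.0f}x")        # speedup: 7,519x (the table rounds to 7,500)
print(f"tokens saved: {tokens_saved:,}")  # tokens saved: 62,500
```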
At scale (80% System 1 + 15% System 2 hit rate)
| Daily plans | Monthly cost saved |
|---|---|
| 100 | $56 |
| 1,000 | $503 |
| 10,000 | $5,034 |
| 100,000 | $50,344 |
What your session looks like
First run (cache miss — plan is stored):
Mnemon: 1 plan(s) cached → next run saves ~1,250 tokens (~$0.0038)
Every run after (cache hit):
Mnemon: ~1,250 tokens saved · ~$0.0038 · 20.0s faster
Full runs, methodology, and raw data: reports/
Zero Code Changes
Mnemon patches your installed frameworks at the call level. Import it and call init() once, nothing else:
import mnemon
m = mnemon.init()
# everything below is unchanged
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-sonnet-4-6")
response = llm.invoke("Generate weekly security report for Acme Corp")
Supported frameworks:
| Framework | What gets patched |
|---|---|
| Anthropic SDK | client.messages.create |
| OpenAI SDK | client.chat.completions.create |
| LangChain | BaseChatModel.invoke / ainvoke |
| LangGraph | CompiledGraph.invoke / ainvoke |
| CrewAI | crew kickoff via event bus hook |
| AutoGen | ConversableAgent.generate_reply |
Framework notes:
- LangGraph: call mnemon.init() before compiling your graph.
- CrewAI: import crewai before calling mnemon.init().
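For readers unfamiliar with call-level patching, the general technique looks roughly like the sketch below: a method on a class is replaced with a wrapper that consults a cache before delegating to the original. `FakeClient` and `patch_with_cache` are hypothetical stand-ins, not part of Mnemon or any SDK, and the cache here is keyed on the exact prompt only.

```python
import functools


class FakeClient:
    """Stand-in for an SDK client; not a real library class."""

    calls = 0  # counts how often the "real" method runs

    def create(self, prompt):
        FakeClient.calls += 1
        return f"LLM answer to: {prompt}"


def patch_with_cache(cls, method_name):
    """Replace cls.method_name with a caching wrapper and return the original."""
    original = getattr(cls, method_name)
    cache = {}

    @functools.wraps(original)
    def wrapper(self, prompt):
        if prompt not in cache:
            cache[prompt] = original(self, prompt)  # first call goes through
        return cache[prompt]                        # repeats are served locally

    setattr(cls, method_name, wrapper)
    return original


patch_with_cache(FakeClient, "create")

client = FakeClient()
first = client.create("Generate weekly security report for Acme Corp")
second = client.create("Generate weekly security report for Acme Corp")
print(FakeClient.calls)  # 1
```

Because the patch is applied to the class, code that already calls `client.create(...)` needs no changes, which is the property the table above relies on.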
vs. Everything Else
| | Mnemon | Mem0 | LangMem | Roll your own |
|---|---|---|---|---|
| Execution caching (skip LLM entirely) | ✅ | ❌ | ❌ | ❌ |
| System learning loop | ✅ | ❌ | ❌ | ❌ |
| Zero-code auto-instrumentation | ✅ | ❌ | ❌ | ❌ |
| Runs fully local (no cloud, no API) | ✅ | ❌ | ❌ | ✅ |
| Drift detection | ✅ | ❌ | ❌ | ❌ |
| Multi-tenant isolation | ✅ | ✅ | ❌ | ⚠️ |
| One-line setup | ✅ | ❌ | ❌ | ❌ |
Every other library makes your prompt slightly better. Mnemon eliminates the LLM call on repeated work and makes the next run cheaper than the last.
Install
pip install mnemon-ai
pip install mnemon-ai[embeddings] # sentence-transformers — recommended for production
pip install mnemon-ai[full] # embeddings + all LLM providers
Set one environment variable (used only for gap-fill — retrieval never calls the LLM):
export GROQ_API_KEY=gsk_... # pip install mnemon-ai[groq] ← free tier, start here
export ANTHROPIC_API_KEY=sk-... # pip install mnemon-ai[anthropic]
export OPENAI_API_KEY=sk-... # pip install mnemon-ai[openai]
export GOOGLE_API_KEY=AIza... # pip install mnemon-ai[google]
Mnemon detects the key automatically.
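Automatic detection of this kind usually amounts to checking environment variables in a fixed order. The sketch below shows the idea using the four variable names listed above. The function name `detect_provider` and the priority order are assumptions, since the source does not say which key wins when several are set.

```python
import os

# Environment variable -> provider name, in the order the README lists them.
# The priority order is an assumption for this sketch.
PROVIDER_KEYS = [
    ("GROQ_API_KEY", "groq"),
    ("ANTHROPIC_API_KEY", "anthropic"),
    ("OPENAI_API_KEY", "openai"),
    ("GOOGLE_API_KEY", "google"),
]


def detect_provider(env=None):
    """Return the first provider whose key is set and non-empty, else None."""
    env = os.environ if env is None else env
    for variable, provider in PROVIDER_KEYS:
        if env.get(variable):
            return provider
    return None


print(detect_provider({"OPENAI_API_KEY": "sk-example"}))  # openai
```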
No API key? Try the demo:
mnemon demo
Quickstart
Path 1 — zero code changes (recommended)
Already using Anthropic SDK, OpenAI, LangChain, LangGraph, CrewAI, or AutoGen? Add two lines. Everything else stays the same.
import mnemon
mnemon.init() # auto-detects installed frameworks and patches them
# your existing code — completely unchanged
from anthropic import Anthropic
client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Generate weekly security report for Acme Corp"}]
)
Mnemon intercepts the call, checks its cache, and returns a cached response instantly on repeat runs — no API call made, no tokens spent.
Path 2 — explicit caching with m.run()
For any expensive recurring computation that isn't a direct framework call.
generation_fn is your real logic — only called on a cache miss.
import mnemon
from anthropic import Anthropic
client = Anthropic()
m = mnemon.init()
def generate_report(goal, inputs, context, capabilities, constraints):
    # only runs on a cache miss; put your real LLM call here
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": goal}],
    )
    return response.content[0].text
result = m.run(
    goal="weekly security audit for Acme Corp",
    inputs={"client": "Acme Corp", "week": "Apr 21-25"},
    generation_fn=generate_report,
)
print(result["output"]) # your actual result — the return value of generation_fn
print(result["cache_level"]) # "system1" | "system2" | "miss"
print(result["tokens_saved"]) # 1250 on a cache hit, 0 on first run
print(result["latency_saved_ms"]) # 20000.0 on a cache hit
generation_fn also accepts async def — both work.
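One common way for a wrapper to accept both kinds of function is to call it and await the result only if it is awaitable. The sketch below shows that pattern with the standard library. `call_generation_fn` is a hypothetical helper, not Mnemon's internal code.

```python
import asyncio
import inspect


async def call_generation_fn(generation_fn, *args, **kwargs):
    """Call a sync or async function and always hand back a plain result."""
    result = generation_fn(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result  # async def returns a coroutine; await it
    return result


def sync_fn(goal):
    return f"sync:{goal}"


async def async_fn(goal):
    await asyncio.sleep(0)  # stands in for a real async LLM call
    return f"async:{goal}"


sync_out = asyncio.run(call_generation_fn(sync_fn, "audit"))
async_out = asyncio.run(call_generation_fn(async_fn, "audit"))
print(sync_out, async_out)  # sync:audit async:audit
```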
See what you've saved
m = mnemon.get() # retrieve the running instance from anywhere
print(m.waste_report) # repeated queries and their cumulative cost
print(m.get_stats()) # EME stats, bus signals, DB stats
API
Init
m = mnemon.init() # global singleton
m = mnemon.init(tenant_id="acme_corp") # explicit tenant
m = mnemon.init(silent=True) # suppress session summary
m = mnemon.init(eme_enabled=False) # bus + MOTH only
m = mnemon.init(bus_enabled=False) # EME + MOTH only
m = mnemon.get() # retrieve the running instance
Diagnostics
report = m.drift_report() # cross-session degradation analysis
stats = m.get_stats() # EME, bus, watchdog, DB stats
print(m.waste_report) # repeated queries + cost
CLI:
mnemon doctor # health check
mnemon demo # live demo
Async
import asyncio
from mnemon import Mnemon

async def main():
    async with Mnemon(tenant_id="my_company") as m:
        result = await m.run(
            goal="weekly security audit for Acme Corp",
            inputs={"client": "Acme Corp", "week": "Apr 21-25"},
            generation_fn=my_planning_function,  # your own sync or async function
        )
        print(result["output"]) # your actual result
        print(result["cache_level"])
        print(result["tokens_saved"])

asyncio.run(main())
Production — multi-tenant
from mnemon import Mnemon
from mnemon.security.manager import TenantSecurityConfig
m = Mnemon(
    tenant_id="acme_corp",
    security_config=TenantSecurityConfig(
        tenant_id="acme_corp",
        blocked_categories=["pii", "medical_records"],
        encrypt_privileged=True,
    ),
    enable_watchdog=True,
    enable_telemetry=True,
)
Each tenant_id gets an isolated SQLite database — no cross-tenant leakage.
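The isolation model described here, one SQLite file per tenant, can be sketched with the standard library alone. The file naming scheme, the `plans` table, and the helper names below are assumptions for illustration and say nothing about how Mnemon lays out its own storage.

```python
import re
import sqlite3
import tempfile
from pathlib import Path


def tenant_db_path(base_dir, tenant_id):
    """Map a tenant id to its own database file, rejecting unsafe ids."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", tenant_id):
        raise ValueError(f"invalid tenant_id: {tenant_id!r}")
    return Path(base_dir) / f"{tenant_id}.db"


def open_tenant_db(base_dir, tenant_id):
    """Open (and initialise if needed) the database that belongs to one tenant."""
    conn = sqlite3.connect(tenant_db_path(base_dir, tenant_id))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS plans (fingerprint TEXT PRIMARY KEY, output TEXT)"
    )
    return conn


base = Path(tempfile.mkdtemp())
acme = open_tenant_db(base, "acme_corp")
globex = open_tenant_db(base, "globex")

acme.execute("INSERT INTO plans VALUES (?, ?)", ("abc123", "security report"))
acme.commit()

acme_rows = acme.execute("SELECT COUNT(*) FROM plans").fetchone()[0]
globex_rows = globex.execute("SELECT COUNT(*) FROM plans").fetchone()[0]
print(acme_rows, globex_rows)  # 1 0

acme.close()
globex.close()
```

Validating the tenant id before building the path matters in a design like this, because an id such as `../other` would otherwise point at a file outside the tenant's own database.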
From config
m = Mnemon.from_config("./mnemon.config.json")
Fail-Safe
Mnemon never crashes the system it wraps.
| What fails | What happens |
|---|---|
| EME cache | generation_fn called directly |
| Experience bus | agent continues unmonitored |
| Database unavailable | in-memory fallback |
All failures are logged, never raised. You can't break your agent by adding Mnemon.
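The first row of the table describes a fail-open pattern: if the cache layer raises, log it and fall through to the real function. A minimal sketch of that pattern follows. `run_fail_open` and its callback parameters are hypothetical names, not Mnemon's API.

```python
import logging

logger = logging.getLogger("failsafe_sketch")


def run_fail_open(cache_lookup, cache_store, generation_fn, key):
    """Serve from cache when possible; never let a cache error reach the caller."""
    try:
        hit = cache_lookup(key)
        if hit is not None:
            return hit
    except Exception:
        logger.exception("cache lookup failed; calling generation_fn directly")
    result = generation_fn(key)
    try:
        cache_store(key, result)
    except Exception:
        logger.exception("cache store failed; result returned uncached")
    return result


def broken_lookup(key):
    raise RuntimeError("cache backend is down")


def broken_store(key, value):
    raise RuntimeError("cache backend is down")


output = run_fail_open(broken_lookup, broken_store, lambda k: f"fresh:{k}", "audit")
print(output)  # fresh:audit
```

Errors raised by `generation_fn` itself are deliberately not swallowed in this sketch, since those belong to the wrapped agent and not to the caching layer.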
Why This Exists
These aren't hypothetical — we filed these issues before writing a line of Mnemon:
- CrewAI #4415 — context pollution and DB write contention in multi-agent runs
- Dify #32306 — redundant reasoning tax in agent nodes
- Kimi CLI #1058 — context saturation in 100-agent swarms
- E2B #1207 — environmental amnesia across sandbox restarts
License
MIT. Free to use, free to build on.
Your agents have a Mnemon now.