mnemon-ai

Cut LLM agent token costs by 93%. Execution cache for LangChain, CrewAI, AutoGen, LangGraph — zero tokens on repeat runs, 2.66ms latency vs 20s.


Mnemon

Stop paying for work your agent already did. Watch it get better every run.

Quickstart · Benchmarks · How it works · Install · API

Mnemon gives your agent two things:

Token and latency savings — repeated tasks skip the LLM entirely. 2.66ms instead of 20 seconds. Zero tokens instead of 1,250. The more your agent runs, the more it saves.

A learning loop — every outcome is observed, every pattern is detected, every failure is quarantined. Your agent doesn't just cache work — it accumulates intelligence that makes the next run cheaper and better than the last.

One import and one `init()` call. No infrastructure. No changes to your existing agent.

Built for: agentic workflows that repeat · production AI agents with high API costs · teams optimizing LLM infrastructure spend · LangChain, CrewAI, AutoGen, and LangGraph pipelines running at scale.

import mnemon
m = mnemon.init()

The Problem

Every agent framework — LangChain, CrewAI, AutoGen, LangGraph — is stateless by default.

Your agent generates a security report for Acme Corp every Monday. Every Monday it starts from zero: re-reads the same context, re-reasons through the same structure, re-generates the same plan. You pay full LLM price each time. It never gets faster. It never gets smarter.

This is the core problem with agentic workflow cost optimization: every run is treated as the first run. Token usage doesn't decrease. API costs don't decrease. Latency doesn't decrease. You scale your agent, you scale your bill.

You built a smart agent. You got an amnesiac that invoices you twice.

Quickstart

Path 1 — zero code changes (recommended)

Already using Anthropic SDK, OpenAI, LangChain, LangGraph, CrewAI, or AutoGen? Add two lines. Everything else stays the same.

import mnemon
mnemon.init()   # auto-detects installed frameworks and patches them

# your existing code — completely unchanged
from anthropic import Anthropic
client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Generate weekly security report for Acme Corp"}]
)

Mnemon intercepts the call, checks its cache, and returns a cached response instantly on repeat runs — no API call made, no tokens spent.

Path 2 — explicit caching with `m.run()`

Use this path for any expensive recurring computation that isn't a direct framework call. `generation_fn` holds your real logic and is only called on a cache miss.

import mnemon
from anthropic import Anthropic

client = Anthropic()
m = mnemon.init()

def generate_report(goal, inputs, context, capabilities, constraints):
    # only runs on a cache miss — put your real LLM call here
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": goal}],
    )
    return response.content[0].text

result = m.run(
    goal="weekly security audit for Acme Corp",
    inputs={"client": "Acme Corp", "week": "Apr 21-25"},
    generation_fn=generate_report,
)

print(result["output"])            # your actual result — the return value of generation_fn
print(result["cache_level"])       # "system1" | "system2" | "miss"
print(result["tokens_saved"])      # 1250 on a cache hit, 0 on first run
print(result["latency_saved_ms"])  # 20000.0 on a cache hit

`generation_fn` can be a regular function or an `async def` coroutine function; both work.
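One common way for a runner to accept both kinds of callable is to call the function and await the result only when it is awaitable. The sketch below illustrates that pattern with the standard library; `call_generation_fn`, `sync_fn`, and `async_fn` are hypothetical names, and this is not necessarily how Mnemon handles it internally.

```python
import asyncio
import inspect


async def call_generation_fn(generation_fn, *args, **kwargs):
    """Call generation_fn and await the result only if it is awaitable."""
    result = generation_fn(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result


def sync_fn(goal):
    # Plain function: returns its value directly.
    return f"sync:{goal}"


async def async_fn(goal):
    await asyncio.sleep(0)  # stand-in for an awaited LLM call
    return f"async:{goal}"


sync_out = asyncio.run(call_generation_fn(sync_fn, "audit"))
async_out = asyncio.run(call_generation_fn(async_fn, "audit"))
print(sync_out, async_out)
```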

See what you've saved

m = mnemon.get()             # retrieve the running instance from anywhere
print(m.waste_report)        # repeated queries and their cumulative cost
print(m.get_stats())         # EME stats, bus signals, DB stats

The Numbers

Execution cache — EME benchmarks


Metric	Result
System 1 hit (exact match)	2.66ms
Fresh LLM generation	~20,000ms
Speedup	~7,500×
50 concurrent agents, burst	0 LLM calls · 0.18s total
Tokens saved (50 agents)	62,500
Cost saved (50 agents)	$0.94
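The speedup and token rows follow directly from the other figures in this README. A short check of that arithmetic, using only numbers quoted here (2.66ms per hit, about 20,000ms per fresh generation, 1,250 tokens per plan, 50 agents); the variable names are ours:

```python
system1_hit_ms = 2.66          # cached response latency
fresh_generation_ms = 20_000   # approximate fresh LLM generation latency
tokens_per_plan = 1_250        # tokens a single fresh plan costs
agents = 50                    # concurrent agents in the burst test

speedup = fresh_generation_ms / system1_hit_ms
tokens_saved = tokens_per_plan * agents

print(f"speedup: about {speedup:,.0f}x")
print(f"tokens saved: {tokens_saved:,}")
```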

At scale (80% System 1 + 15% System 2 hit rate)

Daily plans	Monthly cost saved
100	$56
1,000	$503
10,000	$5,034
100,000	$50,344

What your session looks like

First run (cache miss — plan is stored):

Mnemon: 1 plan(s) cached → next run saves ~1,250 tokens (~$0.0038)

Every run after (cache hit):

Mnemon: ~1,250 tokens saved · ~$0.0038 · 20.0s faster

Full runs, methodology, and raw data: reports/

Academic validation: Stanford researchers published Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents at NeurIPS 2025, measuring 50.31% cost reduction with the same approach. Mnemon is the production implementation of this idea — one import, works today.

Two Things That Fix This

1. Execution Memory Engine — save tokens and time

The EME is a generalised execution cache for any expensive recurring computation. After the first run, Mnemon fingerprints the plan and stores it. Every subsequent run with the same — or semantically similar — goal is served from cache.

First run:  20,000ms · 1,250 tokens · full cost
Every repeat:  2.66ms · 0 tokens   · $0.00

It works in two modes:

System 1 — exact fingerprint match. About 2.66ms in the benchmarks above. Zero LLM calls.
System 2 — partial segment match. Only the changed parts go to the LLM. You pay for the delta, not the whole plan.
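The two modes can be illustrated with a small, self-contained sketch. It assumes a SHA-256 hash over the goal and inputs as the fingerprint, and a plan split into named segments that are cached individually; `TwoLevelCache`, `fingerprint`, and the segment layout are hypothetical, and Mnemon's real fingerprinting and semantic matching are not shown in this README.

```python
import hashlib
import json


def fingerprint(goal, inputs):
    """Stable hash of goal plus inputs (an assumed fingerprint scheme)."""
    payload = json.dumps({"goal": goal, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TwoLevelCache:
    def __init__(self):
        self.plans = {}     # whole-plan fingerprint -> dict of segment outputs
        self.segments = {}  # per-segment fingerprint -> segment output

    def run(self, goal, segment_inputs, generate_segment):
        """segment_inputs maps a segment name to the inputs for that segment."""
        plan_key = fingerprint(goal, segment_inputs)
        if plan_key in self.plans:
            # System 1: the whole plan was seen before, nothing is generated.
            return {"output": self.plans[plan_key], "cache_level": "system1", "generated": 0}

        output, generated = {}, 0
        for name, inputs in segment_inputs.items():
            seg_key = fingerprint(f"{goal}/{name}", inputs)
            if seg_key not in self.segments:
                # Only segments that changed reach the (expensive) generator.
                self.segments[seg_key] = generate_segment(name, inputs)
                generated += 1
            output[name] = self.segments[seg_key]

        self.plans[plan_key] = output
        level = "miss" if generated == len(segment_inputs) else "system2"
        return {"output": output, "cache_level": level, "generated": generated}


calls = []


def generate_segment(name, inputs):
    calls.append(name)  # records which segments were actually generated
    return f"{name} for {inputs}"


cache = TwoLevelCache()
week1 = {"scope": {"client": "Acme Corp"}, "findings": {"week": "Apr 21-25"}}
week2 = {"scope": {"client": "Acme Corp"}, "findings": {"week": "Apr 28-May 2"}}

first = cache.run("weekly security audit", week1, generate_segment)
repeat = cache.run("weekly security audit", week1, generate_segment)
partial = cache.run("weekly security audit", week2, generate_segment)
print(first["cache_level"], repeat["cache_level"], partial["cache_level"])
```

In this toy version the second week regenerates only the `findings` segment, which is the "pay for the delta" behaviour described above.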

Failed segments are quarantined by the Retrospector — bad patterns can't recycle into future plans.

Ships with 100 pre-warmed segments from real enterprise runs so the cache starts warm on day one.

2. Experience Bus — a learning loop that never stops

The Bus is a passive observer. Every computation outcome — success, failure, latency, pattern — is recorded and analysed in the background. You never call it directly. It's always running.

What it detects:

Signal	What it means
`DEGRADATION`	latency spike vs rolling baseline
`PATTERN_FOUND`	a task type is failing at >30% — before you notice
`ANOMALY`	sudden failure after a string of successes
`RECOVERY`	the agent is stable again after a failure streak

What it does with that intelligence: feeds it back to the EME. Success patterns strengthen the fragment library. Failure patterns trigger quarantine. The cache gets smarter on every run — not just bigger.

This is the loop: EME saves the work. Bus learns from it. Both get better.
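To make the four signals concrete, here is a minimal sketch of an outcome observer. Only the 30% failure-rate threshold comes from the table above; the window size, the 2× latency factor, the streak length of 3, and the minimum sample count are assumptions chosen for illustration, and `OutcomeMonitor` is a hypothetical class, not part of Mnemon's API.

```python
from collections import deque


class OutcomeMonitor:
    """Toy outcome observer; every threshold except the 30% failure rate is assumed."""

    def __init__(self, window=20, latency_factor=2.0, streak=3, min_samples=5):
        self.latencies = deque(maxlen=window)
        self.outcomes = deque(maxlen=window)
        self.latency_factor = latency_factor
        self.streak = streak
        self.min_samples = min_samples
        self.success_run = 0
        self.recovering = False  # True between a failure and the next RECOVERY

    def record(self, success, latency_ms):
        signals = []
        # DEGRADATION: latency well above the rolling mean of earlier runs.
        if self.latencies:
            baseline = sum(self.latencies) / len(self.latencies)
            if latency_ms > self.latency_factor * baseline:
                signals.append("DEGRADATION")
        self.latencies.append(latency_ms)

        if success:
            self.success_run += 1
            # RECOVERY: enough consecutive successes after a failure.
            if self.recovering and self.success_run >= self.streak:
                signals.append("RECOVERY")
                self.recovering = False
        else:
            # ANOMALY: a failure that interrupts a run of successes.
            if self.success_run >= self.streak:
                signals.append("ANOMALY")
            self.success_run = 0
            self.recovering = True

        self.outcomes.append(success)
        # PATTERN_FOUND: failure rate over the window exceeds 30%.
        if len(self.outcomes) >= self.min_samples:
            failure_rate = self.outcomes.count(False) / len(self.outcomes)
            if failure_rate > 0.30:
                signals.append("PATTERN_FOUND")
        return signals


monitor = OutcomeMonitor()
warmup = [monitor.record(True, 100.0) for _ in range(5)]
spike = monitor.record(True, 500.0)
failure = monitor.record(False, 100.0)
recovery = [monitor.record(True, 100.0) for _ in range(3)]
print(spike, failure, recovery[-1])
```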

Zero Code Changes

Mnemon patches your installed frameworks at the call level. One import, nothing else:

import mnemon
m = mnemon.init()

# everything below is unchanged
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-sonnet-4-6")
response = llm.invoke("Generate weekly security report for Acme Corp")

Supported frameworks:

Framework	What gets patched
Anthropic SDK	`client.messages.create`
OpenAI SDK	`client.chat.completions.create`
LangChain	`BaseChatModel.invoke` / `ainvoke`
LangGraph	`CompiledGraph.invoke` / `ainvoke`
CrewAI	crew kickoff via event bus hook
AutoGen	`ConversableAgent.generate_reply`

Framework notes:

LangGraph — call mnemon.init() before compiling your graph.
CrewAI — import crewai before calling mnemon.init().
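Patching "at the call level" generally means replacing a method on a class with a wrapper that consults a cache first. The sketch below shows the general technique on a made-up class; `FakeChatClient` and `patch_with_cache` are hypothetical, and the real patches listed in the table are certainly more involved (real message payloads, for example, are lists and would need a serialised cache key).

```python
import functools


class FakeChatClient:
    """Hypothetical stand-in for an SDK client; not a real library class."""

    calls = 0  # counts how often the underlying "API" is really hit

    def create(self, model, prompt):
        FakeChatClient.calls += 1
        return f"[{model}] response to: {prompt}"


def patch_with_cache(cls, method_name):
    """Replace cls.method_name with a caching wrapper; return the original."""
    original = getattr(cls, method_name)
    cache = {}

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        # Arguments must be hashable in this simplified version.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = original(self, *args, **kwargs)
        return cache[key]

    setattr(cls, method_name, wrapper)
    return original  # kept so the patch can be undone with setattr


original_create = patch_with_cache(FakeChatClient, "create")

client = FakeChatClient()
first = client.create(model="demo-model", prompt="weekly report")
second = client.create(model="demo-model", prompt="weekly report")
print(FakeChatClient.calls)
```

Because the wrapper is installed on the class, code that creates clients after the patch keeps calling `client.create(...)` unchanged, which is consistent with the ordering notes above.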

vs. Everything Else

Feature	Mnemon	Mem0	LangMem	Roll your own
Execution caching (skip LLM entirely)	✅	❌	❌	❌
System learning loop	✅	❌	❌	❌
Zero-code auto-instrumentation	✅	❌	❌	❌
Runs fully local (no cloud, no API)	✅	❌	❌	✅
Drift detection	✅	❌	❌	❌
Multi-tenant isolation	✅	✅	❌	⚠️
One-line setup	✅	❌	❌	❌

Every other library makes your prompt slightly better. Mnemon eliminates the LLM call on repeated work and makes the next run cheaper than the last.

Install

pip install mnemon-ai

No API key needed to start:

mnemon demo     # see it working in 30 seconds
mnemon doctor   # health check

pip install "mnemon-ai[embeddings]"   # sentence-transformers, recommended for production
pip install "mnemon-ai[full]"         # embeddings + all LLM providers (quotes keep zsh from expanding the brackets)

Optional: set an API key to enable System 2 gap-fill (only needed for partial segment regeneration):

export GROQ_API_KEY=gsk_...      # pip install "mnemon-ai[groq]"   ← free tier, start here
export ANTHROPIC_API_KEY=sk-...  # pip install "mnemon-ai[anthropic]"
export OPENAI_API_KEY=sk-...     # pip install "mnemon-ai[openai]"
export GOOGLE_API_KEY=AIza...    # pip install "mnemon-ai[google]"

Mnemon detects the key automatically.
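Automatic detection of this kind usually amounts to checking a fixed list of environment variables. A minimal sketch, assuming the four variable names above are checked in the order listed; the precedence order and the `detect_provider` helper are assumptions, since this README does not say which key wins when several are set.

```python
import os

# Variable names come from the list above; the order is an assumption.
PROVIDER_ENV_VARS = [
    ("groq", "GROQ_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
]


def detect_provider(env=None):
    """Return the first provider whose key is set and non-empty, else None."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_ENV_VARS:
        if env.get(var):
            return provider
    return None


print(detect_provider({"OPENAI_API_KEY": "sk-test"}))
```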

API

Init

m = mnemon.init()                             # global singleton
m = mnemon.init(tenant_id="acme_corp")        # explicit tenant
m = mnemon.init(silent=True)                  # suppress session summary
m = mnemon.init(eme_enabled=False)            # bus + MOTH only
m = mnemon.init(bus_enabled=False)            # EME + MOTH only
m = mnemon.get()                              # retrieve the running instance

Diagnostics

report = m.drift_report()   # cross-session degradation analysis
stats  = m.get_stats()      # EME, bus, watchdog, DB stats
print(m.waste_report)       # repeated queries + cost

CLI:

mnemon doctor   # health check
mnemon demo     # live demo

Async

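import asyncio

from mnemon import Mnemon

async def main():
    # my_planning_function is your own generation_fn (sync or async), defined elsewhere
    async with Mnemon(tenant_id="my_company") as m:
        result = await m.run(
            goal="weekly security audit for Acme Corp",
            inputs={"client": "Acme Corp", "week": "Apr 21-25"},
            generation_fn=my_planning_function,
        )
        print(result["output"])         # your actual result
        print(result["cache_level"])
        print(result["tokens_saved"])

asyncio.run(main())  # `async with` must run inside a coroutine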

Production — multi-tenant

from mnemon import Mnemon
from mnemon.security.manager import TenantSecurityConfig

m = Mnemon(
    tenant_id="acme_corp",
    security_config=TenantSecurityConfig(
        tenant_id="acme_corp",
        blocked_categories=["pii", "medical_records"],
        encrypt_privileged=True,
    ),
    enable_watchdog=True,
    enable_telemetry=True,
)

Each tenant_id gets an isolated SQLite database — no cross-tenant leakage.
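One database file per tenant is a simple way to get this kind of isolation. The sketch below shows the idea with the standard `sqlite3` module; the file naming, the `plans` table, and the `open_tenant_db` helper are hypothetical and do not describe Mnemon's actual storage layout.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_tenant_db(base_dir, tenant_id):
    """Open (or create) the SQLite file that belongs to one tenant."""
    # Keep only safe characters so a tenant_id cannot escape base_dir.
    safe = "".join(ch for ch in tenant_id if ch.isalnum() or ch in "-_")
    if not safe:
        raise ValueError("tenant_id must contain at least one safe character")
    conn = sqlite3.connect(Path(base_dir) / f"{safe}.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS plans (fingerprint TEXT PRIMARY KEY, output TEXT)"
    )
    return conn


with tempfile.TemporaryDirectory() as base_dir:
    acme = open_tenant_db(base_dir, "acme_corp")
    globex = open_tenant_db(base_dir, "globex")

    # A write for one tenant lands only in that tenant's file.
    acme.execute("INSERT INTO plans VALUES (?, ?)", ("abc123", "Acme audit plan"))
    acme.commit()

    acme_rows = acme.execute("SELECT COUNT(*) FROM plans").fetchone()[0]
    globex_rows = globex.execute("SELECT COUNT(*) FROM plans").fetchone()[0]
    acme.close()
    globex.close()

print(acme_rows, globex_rows)
```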

From config

from mnemon import Mnemon
m = Mnemon.from_config("./mnemon.config.json")

Fail-Safe

Mnemon never crashes the system it wraps.

What fails	What happens
EME cache	`generation_fn` called directly
Experience bus	agent continues unmonitored
Database unavailable	in-memory fallback

All failures are logged, never raised. You can't break your agent by adding Mnemon.
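The "logged, never raised" behaviour for the cache row of the table can be sketched as a wrapper that catches cache errors and falls back to calling `generation_fn` directly. `BrokenCache` and `run_fail_safe` are hypothetical names used only to demonstrate the pattern; errors raised by `generation_fn` itself still propagate in this sketch, since they come from your code rather than the cache.

```python
import logging

logger = logging.getLogger("failsafe_sketch")


class BrokenCache:
    """Hypothetical cache whose operations always fail, to exercise the fallback."""

    def get(self, key):
        raise ConnectionError("cache backend unavailable")

    def put(self, key, value):
        raise ConnectionError("cache backend unavailable")


def run_fail_safe(cache, key, generation_fn):
    """Try the cache; on any cache error, log it and generate directly."""
    try:
        hit = cache.get(key)
        if hit is not None:
            return hit
    except Exception:
        logger.exception("cache lookup failed; calling generation_fn directly")

    result = generation_fn()

    try:
        cache.put(key, result)
    except Exception:
        logger.exception("cache store failed; result returned uncached")
    return result


result = run_fail_safe(BrokenCache(), "weekly-audit", lambda: "fresh plan")
print(result)
```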

Why This Exists

These aren't hypothetical — we filed these issues before writing a line of Mnemon:

CrewAI #4415 — context pollution and DB write contention in multi-agent runs
Dify #32306 — redundant reasoning tax in agent nodes
Kimi CLI #1058 — context saturation in 100-agent swarms
E2B #1207 — environmental amnesia across sandbox restarts

License

MIT. Free to use, free to build on.

Mnemon was Alexander the Great's personal historian — the one whose only job was to ensure nothing was ever forgotten, so every campaign built on the total accumulated knowledge of every campaign before it.
Your agents have a Mnemon now.


Release history

Version	Released
1.0.9	May 22, 2026
1.0.7 (this version)	May 21, 2026
1.0.6	May 14, 2026
1.0.5	Apr 17, 2026
1.0.4	Apr 17, 2026
1.0.3	Mar 26, 2026
1.0.2	Mar 25, 2026
1.0.0	Mar 21, 2026

