Dhee
Cognition layer for AI agents — persistent memory, performance tracking, and insight synthesis
Stop burning tokens on context your agent doesn't need this turn.
Keep your CLAUDE.md, your skills, your AGENTS.md — exactly as they are.
Dhee selects what's relevant per prompt and injects only that.
The Problem
Every AI coding agent today dumps your entire CLAUDE.md into every LLM call. 200 lines of project rules, coding conventions, testing guidelines — loaded into the prompt whether you're running tests or writing a docstring. Every turn. Full price.
Over a 20-turn session on Opus, that's 40,000+ tokens of mostly-irrelevant context. You're paying for the model to read your git commit conventions while it's fixing an auth bug.
And it gets worse over time. After 6 months, your CLAUDE.md is 500 lines. Your skills directory has 12 files. Your AGENTS.md has grown. But the agent still loads all of it, every turn, at full token cost. The markdown files that were supposed to make your agent smarter are now your biggest line item.
How Dhee Fixes It
Before Dhee: CLAUDE.md (2000 tokens) → loaded every turn → 40K tokens/session
With Dhee: CLAUDE.md → chunked + vectorized → ~300 tokens of relevant rules per turn
Dhee sits between your documentation and the LLM. It chunks your markdown files into heading-scoped pieces, embeds them once, and on each prompt selects only the chunks that match what the user is actually asking about.
"How do I run the tests?" → Dhee injects your Testing Guidelines section (292 tokens), not your entire CLAUDE.md (2000 tokens). An ~85% reduction, zero information loss.
"Explain dark matter to me" → Dhee injects nothing. No project docs are relevant. 100% reduction.
Your files stay exactly where they are. You maintain them the same way. Dhee just makes the delivery intelligent.
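The per-prompt selection step can be pictured as a cosine-similarity filter over pre-embedded chunks. This is a minimal illustration with made-up names and toy 2-D vectors, not Dhee's actual internals:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def select_chunks(prompt_vec, chunks, threshold=0.75):
    """Return only the doc chunks relevant to this prompt.

    `chunks` is a list of (text, embedding) pairs produced once at ingest;
    an off-topic prompt matches nothing and injects zero tokens.
    """
    scored = [(cosine(prompt_vec, vec), text) for text, vec in chunks]
    return [text for score, text in sorted(scored, reverse=True) if score >= threshold]

# Toy example: a "testing" prompt matches the testing chunk only.
chunks = [("Testing Guidelines: always run pytest", [1.0, 0.0]),
          ("Git commit conventions", [0.0, 1.0])]
print(select_chunks([0.9, 0.1], chunks))
```

An off-topic prompt vector (far from every chunk) returns an empty list, which is the "injects nothing" case above.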
Quick Start
One command. No venv. No config. No pasting into settings.json.
```sh
curl -fsSL https://raw.githubusercontent.com/Sankhya-AI/Dhee/main/install.sh | sh
```
That's it. The installer creates ~/.dhee with a hidden venv, installs the dhee package, and wires Claude Code hooks automatically. Next time you open Claude Code in any project, cognition is on.
Other install options
Via pip (if you manage your own venv):
```sh
pip install dhee
dhee install   # configure Claude Code hooks
```
From source (contributors):
```sh
git clone https://github.com/Sankhya-AI/Dhee.git
cd Dhee
./scripts/bootstrap_dev_env.sh
source .venv-dhee/bin/activate
dhee install
```
After install, Dhee auto-ingests project docs (CLAUDE.md, AGENTS.md, etc.) on the first session. Run dhee ingest manually any time to re-chunk.
The Lifecycle
Dhee manages information through a complete lifecycle — not just storage and retrieval, but learning, decay, and promotion.
```
        ┌─────────────────────────┐
        │   Your Documentation    │
        │  CLAUDE.md, AGENTS.md   │
        │     SKILL.md, etc.      │
        └──────────┬──────────────┘
                   │
              dhee ingest
            (chunk + embed)
                   │
                   ▼
┌──────────────────────────────────────────────────────────────┐
│                      Dhee Vector Store                       │
│                                                              │
│  Doc Chunks (high strength, heading-scoped)                  │
│  Short-term memories (facts, file edits, failures)           │
│  Long-term memories (promoted by access + strength)          │
│  Typed cognition (insights, beliefs, policies, intentions)   │
└──────────────────────────────────────────────────────────────┘
                               │
                    ┌──────────┴──────────┐
                    │                     │
              Session Start          Each Prompt
             (full assembly)      (doc chunks only)
                    │                     │
                    ▼                     ▼
           ┌─────────────────┐   ┌─────────────────┐
           │ Relevant docs   │   │ Matching rules  │
           │ + insights      │   │ above threshold │
           │ + performance   │   │ or nothing      │
           │ + warnings      │   │                 │
           └────────┬────────┘   └────────┬────────┘
                    │                     │
                    ▼                     ▼
            ┌─────────────────────────────────────┐
            │   Token-budgeted XML injection      │
            │   <dhee>                            │
            │     <r s="0.83">Always run pytest...│
            │     </r>                            │
            │   </dhee>                           │
            └─────────────────────────────────────┘
                               │
                        LLM sees only
                  what it needs this turn
```
During the session
| Event | What Dhee does | Token cost |
|---|---|---|
| Session opens | Auto-ingests stale docs, assembles relevant doc chunks + typed cognition | ~300-900 tokens (vs 2000+ for raw files) |
| Each user prompt | Searches doc chunks for THIS specific question. Injects matching rules above threshold. | 0-300 tokens (0 when nothing matches) |
| Tool use (Edit/Write) | Records which files were touched (for session context) | 0 tokens (storage only) |
| Tool failure (Bash) | Stores the failure + error message as a learnable signal | 0 tokens (storage only) |
| Session ends | Checkpoints outcomes, what worked/failed → becomes insights for next session | 0 tokens (storage only) |
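The token-budgeted injection in the table above can be pictured as a greedy fill: take the highest-scoring matches until the budget is spent. A toy sketch with invented field names and token counts, not the actual assembler:

```python
def pack_under_budget(matches, budget_tokens=300):
    """Greedily keep the best-scoring chunks that fit the token budget.

    `matches` is a list of dicts with hypothetical `score` and `tokens`
    fields; any chunk that would overflow the budget is skipped.
    """
    picked, used = [], 0
    for m in sorted(matches, key=lambda m: m["score"], reverse=True):
        if used + m["tokens"] <= budget_tokens:
            picked.append(m)
            used += m["tokens"]
    return picked, used

matches = [{"score": 0.83, "tokens": 120, "text": "Always run pytest..."},
           {"score": 0.71, "tokens": 250, "text": "Commit message rules..."},
           {"score": 0.65, "tokens": 90,  "text": "Lint before push..."}]
picked, used = pack_under_budget(matches)
# The 250-token chunk is skipped (120 + 250 > 300), but 120 + 90 fits.
```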
Between sessions
| Phase | What happens |
|---|---|
| Short-term memory | Facts from the session sit in SML with natural decay |
| Promotion | Frequently-accessed memories promote to long-term (LML) automatically |
| Decay | Unused memories fade on an Ebbinghaus curve. Your 50th memory costs the same as your 50,000th. |
| Insight synthesis | what_worked / what_failed from checkpoints become transferable learnings |
| Intentions | "Remember to run auth tests after login.py changes" fires when the trigger matches |
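Decay and promotion from the table above can be illustrated with the classic Ebbinghaus retention curve, R = e^(−t/S), where strength S grows with each access. The constants and the promotion rule below are assumed for illustration, not Dhee's actual tuning:

```python
import math

def retention(days_since_access, strength):
    """Ebbinghaus forgetting curve: retention decays exponentially,
    and higher-strength memories decay more slowly."""
    return math.exp(-days_since_access / strength)

def should_promote(access_count, strength, min_accesses=5, min_strength=10.0):
    """Hypothetical promotion rule: frequently accessed, strong memories
    move from short-term (SML) to long-term (LML)."""
    return access_count >= min_accesses and strength >= min_strength

# A memory untouched for 30 days with strength 10 has faded hard...
print(round(retention(30, 10.0), 3))   # → 0.05
# ...while a reinforced memory (strength 60) is still mostly intact.
print(round(retention(30, 60.0), 3))   # → 0.607
```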
The result
Your documentation stays fat and thorough — that's your team's knowledge base. But the LLM only sees the slice it needs, when it needs it. After a year, you have 50 files and 10,000 memories. The per-turn injection is still ~300 tokens.
Why Not Just CLAUDE.md?
Markdown files work great at first. 50 lines, manually curated, loaded fresh every session. But they don't scale:
| | Markdown files | Dhee |
|---|---|---|
| Token cost | Linear with file size. 500 lines = 5000 tokens every turn. | Constant ~300 tokens regardless of total knowledge. |
| Relevance | Everything loaded, always. Git commit rules injected while fixing auth. | Only matching chunks. Off-topic turns cost 0 tokens. |
| Staleness | Equal weight forever. A rule from 6 months ago sits next to today's. | Natural decay. Unused knowledge fades. Fresh knowledge surfaces. |
| Scale | Hits context limits. You start deleting old rules to make room. | 50,000 memories, same injection cost as 50. |
| Learning | Static. Agent makes the same mistakes next session. | Captures what worked/failed. Synthesizes transferable insights. |
| Cross-session | Cold start every time unless you manually update the file. | Session handoff, performance trends, prospective memory. |
Dhee doesn't replace your markdown files. It makes them work at scale. Keep writing CLAUDE.md the way you always have. Dhee handles the delivery.
The 4-Operation API
Every interface — hooks, MCP, Python, CLI — exposes the same 4 operations.
remember(content) — Store a fact
0 LLM calls, 1 embedding (~$0.0002). Stored immediately. Echo enrichment (paraphrases, keywords for better recall) runs at checkpoint.
```python
d.remember("User prefers FastAPI over Flask")
```
recall(query) — Search memory
0 LLM calls, 1 embedding. Pure vector search with echo-boosted re-ranking.
```python
results = d.recall("what framework does the project use?")
# [{"memory": "User prefers FastAPI over Flask", "score": 0.94}]
```
context(task_description) — Session bootstrap
Returns everything the agent needs: last session state, performance trends, insights, intentions, warnings, and relevant memories.
```python
ctx = d.context("fixing the auth bug")
# ctx["insights"] → [{"content": "What worked: git blame → found breaking commit"}]
# ctx["warnings"] → ["Performance on 'bug_fix' declining"]
```
checkpoint(summary, ...) — End-of-session cognition
Where the learning happens. 1 LLM call per ~10 memories.
```python
d.checkpoint(
    "Fixed auth bug",
    what_worked="git blame showed the exact breaking commit",
    what_failed="grep was too slow on the monorepo",
    outcome_score=1.0,
)
```
Integration
Claude Code — Native Hooks
```sh
pip install dhee
dhee install   # installs lifecycle hooks
dhee ingest    # chunks project docs into vector memory
```
That's it. Six hooks fire automatically at the right moments. No SKILL.md, no plugin directories. The agent doesn't even know Dhee is there — it just gets better context.
MCP Server (Claude Code, Cursor, any MCP client)
```json
{
  "mcpServers": {
    "dhee": { "command": "dhee-mcp" }
  }
}
```
4 tools exposed. The agent uses them as needed.
Python SDK
```python
from dhee import Dhee

d = Dhee()
d.remember("User prefers dark mode")
d.recall("what theme does the user like?")
d.context("fixing auth bug")
d.checkpoint("Fixed it", what_worked="git blame first")
```
CLI
```sh
dhee remember "User prefers Python"
dhee recall "programming language"
dhee ingest CLAUDE.md AGENTS.md   # chunk specific files
dhee ingest                       # auto-scan project
dhee docs                         # show ingested manifest
dhee checkpoint "Fixed auth bug" --what-worked "checked logs"
dhee install                      # install Claude Code hooks
dhee uninstall-hooks              # remove them
```
Docker
```sh
docker compose up -d   # uses OPENAI_API_KEY from env
```
Cost
| Operation | LLM calls | Embed calls | Cost |
|---|---|---|---|
| `remember` | 0 | 1 | ~$0.0002 |
| `recall` | 0 | 1 | ~$0.0002 |
| `context` | 0 | 0-1 | ~$0.0002 |
| `checkpoint` | 1 per ~10 memories | 0 | ~$0.001 |
| Typical session | 1 | ~15 | ~$0.004 |
The Dhee overhead per session is ~$0.004. The token savings from selective injection on a 20-turn Opus session are ~$0.50+. >100x ROI.
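The ROI claim follows from simple arithmetic on the figures above. The per-token price here is an assumption (roughly Opus-class input pricing), so treat the exact ratio as illustrative:

```python
overhead = 0.004                   # Dhee per-session cost: ~1 LLM call + ~15 embeddings
tokens_saved = 20 * (2000 - 300)   # 20 turns: full CLAUDE.md vs ~300-token injection
price_per_token = 15 / 1_000_000   # assumed input price: $15 per 1M tokens
dollars_saved = tokens_saved * price_per_token
roi = dollars_saved / overhead
print(f"saved ~${dollars_saved:.2f} per session, ROI > 100x: {roi > 100}")
```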
Under the Hood
Memory Store (Engram)
SQLite + vector index. On the hot path (remember/recall), zero LLM calls — just embedding. At checkpoint, unified enrichment runs in one batched LLM call:
- Echo encoding — paraphrases, keywords, question-forms so "User likes dark mode" matches "what theme?"
- Category inference — auto-tags for filtering
- Strength-based decay — Ebbinghaus curve. Frequently accessed → promoted to long-term. Unused → fades.
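Echo encoding can be pictured with a toy sketch: alternate phrasings and keywords stored next to the original so a differently worded query still surfaces it. The field names and the 0.05 boost are invented for illustration; the real echoes are LLM-generated in the batched checkpoint call:

```python
memory = {
    "content": "User likes dark mode",
    # Hypothetical echoes, generated once at checkpoint time:
    "echoes": ["what theme does the user prefer?", "UI appearance preference"],
    "keywords": ["dark", "theme", "mode"],
}

def echo_boost(query, memory, base_score):
    """Boost a vector-search score when the query overlaps stored
    keywords, so 'what theme?' still surfaces 'dark mode'."""
    hits = sum(1 for kw in memory["keywords"] if kw in query.lower())
    return base_score + 0.05 * hits

print(round(echo_boost("what theme does the user like?", memory, 0.80), 2))  # → 0.85
```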
Cognition Engine (Buddhi)
Parallel intelligence layer that builds meta-knowledge from the memory pipeline:
- Performance tracking — outcomes per task type, trend detection, regression warnings
- Insight synthesis — causal hypotheses from what worked/failed, with confidence scores
- Prospective memory — future triggers with keyword matching
- Belief store — confidence-tracked facts with contradiction detection (experimental)
- Policy store — condition→action rules from task completions (experimental)
Zero LLM calls on hot path. Pure pattern matching + statistics.
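Regression warnings like the one shown in the context example can indeed come from pure statistics: for instance, a least-squares slope over recent outcome scores. This is a toy version, not the actual Buddhi logic:

```python
def trend_slope(scores):
    """Least-squares slope of outcome scores over session index.
    A clearly negative slope suggests performance is declining."""
    n = len(scores)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

bug_fix_scores = [1.0, 0.9, 0.7, 0.5, 0.4]   # hypothetical checkpoint outcomes
if trend_slope(bug_fix_scores) < -0.1:       # slope here is -0.16, so this fires
    print("Performance on 'bug_fix' declining")
```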
Doc Pipeline (v3.3.1)
- Chunker — heading-scoped splits that respect code fences, paragraph boundaries, size limits
- Ingest — SHA-tracked. Re-ingesting unchanged files is a no-op. Changed files get atomic chunk replacement.
- Assembler — vector similarity search filtered by `kind=doc_chunk`, score threshold, token budget
- Renderer — Caveman-compressed XML: `<dhee><r s="0.83">...</r></dhee>`. No header, no wrapper tags, no indentation — every byte earns its place. ~40% fewer structural tokens vs v3.3.
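A heading-scoped chunker along the lines described can be sketched in a few lines. This simplification ignores code fences and size limits, both of which the real pipeline handles:

```python
import re

def chunk_by_heading(markdown: str):
    """Split a markdown document into heading-scoped chunks: each chunk
    is a heading plus its body, up to the next heading."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6} ", line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# Testing\nAlways run pytest.\n# Commits\nUse conventional commits."
print(chunk_by_heading(doc))
# → ['# Testing\nAlways run pytest.', '# Commits\nUse conventional commits.']
```

Each chunk is then embedded once; the SHA tracking mentioned above means unchanged files never re-embed.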
Provider Options
```sh
pip install dhee[openai,mcp]   # OpenAI (recommended, cheapest embeddings)
pip install dhee[gemini,mcp]   # Google Gemini
pip install dhee[ollama,mcp]   # Ollama (local, no API costs)
```
Contributing
```sh
git clone https://github.com/Sankhya-AI/Dhee.git
cd Dhee
./scripts/bootstrap_dev_env.sh
source .venv-dhee/bin/activate
pytest   # 978 tests
```
Your docs stay fat. Your token bill stays thin.
GitHub · PyPI · Issues
MIT License — Sankhya AI