Cognition layer for AI agents — persistent memory, performance tracking, and insight synthesis

Dhee

Stop burning tokens on context your agent doesn't need this turn.

Keep your CLAUDE.md, your skills, your AGENTS.md — exactly as they are.
Dhee selects what's relevant per prompt and injects only that.

Python 3.9+ · MIT License


The Problem

Every AI coding agent today dumps your entire CLAUDE.md into every LLM call. 200 lines of project rules, coding conventions, testing guidelines — loaded into the prompt whether you're running tests or writing a docstring. Every turn. Full price.

Over a 20-turn session on Opus, that's 40,000+ tokens of mostly-irrelevant context. You're paying for the model to read your git commit conventions while it's fixing an auth bug.

And it gets worse over time. After 6 months, your CLAUDE.md is 500 lines. Your skills directory has 12 files. Your AGENTS.md has grown. But the agent still loads all of it, every turn, at full token cost. The markdown files that were supposed to make your agent smarter are now your biggest line item.

How Dhee Fixes It

Before Dhee:  CLAUDE.md (2000 tokens) → loaded every turn → 40K tokens/session
With Dhee:    CLAUDE.md → chunked + vectorized → ~300 tokens of relevant rules per turn

Dhee sits between your documentation and the LLM. It chunks your markdown files into heading-scoped pieces, embeds them once, and on each prompt selects only the chunks that match what the user is actually asking about.

"How do I run the tests?" → Dhee injects your Testing Guidelines section (292 tokens), not your entire CLAUDE.md (2000 tokens). 67% reduction, zero information loss.

"Explain dark matter to me" → Dhee injects nothing. No project docs are relevant. 100% reduction.

Your files stay exactly where they are. You maintain them the same way. Dhee just makes the delivery intelligent.
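
To make the mechanism concrete, here is a minimal sketch of the embed-once, select-per-prompt idea. It is illustrative only, not Dhee's actual code: the toy bag-of-words embed() stands in for a real embedding model, and the chunks, threshold, and selection logic are simplified stand-ins.

import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Ingest once: one chunk per heading-scoped section, embedded up front.
chunks = {
    "## Testing Guidelines": "Always run pytest before committing. Use -x to stop early.",
    "## Git Conventions": "Commit messages follow Conventional Commits.",
}
index = {heading: embed(body) for heading, body in chunks.items()}

def select(prompt: str, threshold: float = 0.1) -> list[str]:
    # Per prompt: inject only the chunks whose similarity clears the threshold.
    q = embed(prompt)
    scored = [(cosine(q, vec), heading) for heading, vec in index.items()]
    return [chunks[h] for score, h in sorted(scored, reverse=True) if score >= threshold]

print(select("How do I run the tests?"))  # testing chunk only
print(select("Explain dark matter"))      # [] -- nothing relevant, nothing injected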


Quick Start

One command. No venv. No config. No pasting into settings.json.

curl -fsSL https://raw.githubusercontent.com/Sankhya-AI/Dhee/main/install.sh | sh

That's it. The installer creates ~/.dhee with a hidden venv, installs the dhee package, and wires Claude Code hooks automatically. Next time you open Claude Code in any project, cognition is on.

Other install options

Via pip (if you manage your own venv):

pip install dhee
dhee install       # configure Claude Code hooks

From source (contributors):

git clone https://github.com/Sankhya-AI/Dhee.git
cd Dhee
./scripts/bootstrap_dev_env.sh
source .venv-dhee/bin/activate
dhee install

After install, Dhee auto-ingests project docs (CLAUDE.md, AGENTS.md, etc.) on the first session. Run dhee ingest manually any time to re-chunk.


The Lifecycle

Dhee manages information through a complete lifecycle — not just storage and retrieval, but learning, decay, and promotion.

                        ┌──────────────────────────┐
                        │   Your Documentation     │
                        │   CLAUDE.md, AGENTS.md   │
                        │   SKILL.md, etc.         │
                        └──────────┬───────────────┘
                                   │
                            dhee ingest
                          (chunk + embed)
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────┐
│                      Dhee Vector Store                       │
│                                                              │
│  Doc chunks (high strength, heading-scoped)                  │
│  Short-term memories (facts, file edits, failures)           │
│  Long-term memories (promoted by access + strength)          │
│  Typed cognition (insights, beliefs, policies, intentions)   │
└──────────────────────────────────────────────────────────────┘
                                   │
                        ┌──────────┴──────────┐
                        │                     │
                   Session Start         Each Prompt
                   (full assembly)     (doc chunks only)
                        │                     │
                        ▼                     ▼
              ┌─────────────────┐   ┌──────────────────┐
              │ Relevant docs   │   │ Matching rules   │
              │ + insights      │   │ above threshold  │
              │ + performance   │   │ or nothing       │
              │ + warnings      │   │                  │
              └────────┬────────┘   └────────┬─────────┘
                       │                     │
                       ▼                     ▼
              ┌──────────────────────────────────────┐
              │  Token-budgeted XML injection        │
              │  <dhee>                              │
              │    <r s="0.83">Always run pytest...  │
              │    </r>                              │
              │  </dhee>                             │
              └──────────────────────────────────────┘
                                   │
                             LLM sees only
                          what it needs this turn

During the session

| Event | What Dhee does | Token cost |
|---|---|---|
| Session opens | Auto-ingests stale docs, assembles relevant doc chunks + typed cognition | ~300-900 tokens (vs 2000+ for raw files) |
| Each user prompt | Searches doc chunks for *this* specific question and injects matching rules above threshold (sketched below) | 0-300 tokens (0 when nothing matches) |
| Tool use (Edit/Write) | Records which files were touched (for session context) | 0 tokens (storage only) |
| Tool failure (Bash) | Stores the failure + error message as a learnable signal | 0 tokens (storage only) |
| Session ends | Checkpoints outcomes and what worked/failed, which become insights for next session | 0 tokens (storage only) |
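
To picture what happens on each prompt, here is a hypothetical sketch of a prompt-time hook. Dhee wires the real hooks for you; the payload field name and the stdout-becomes-context behavior are assumptions about the hook convention, and search_doc_chunks is a made-up stand-in.

#!/usr/bin/env python3
# Hypothetical prompt-time hook: read the user's prompt from stdin JSON,
# search the vector store, and print only the matching rules for injection.
import json
import sys

def search_doc_chunks(prompt: str) -> list[str]:
    # Stand-in for the real vector search over ingested doc chunks.
    return []

payload = json.load(sys.stdin)
rules = search_doc_chunks(payload.get("prompt", ""))
if rules:
    # Anything printed here is injected into the turn's context.
    print("<dhee>" + "".join(f"<r>{r}</r>" for r in rules) + "</dhee>")
# If nothing matches, stay silent: the turn costs 0 extra tokens.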

Between sessions

| Phase | What happens |
|---|---|
| Short-term memory | Facts from the session sit in SML with natural decay |
| Promotion | Frequently-accessed memories promote to long-term (LML) automatically |
| Decay | Unused memories fade on an Ebbinghaus curve (sketched below). Your 50th memory costs the same as your 50,000th. |
| Insight synthesis | what_worked / what_failed from checkpoints become transferable learnings |
| Intentions | "Remember to run auth tests after login.py changes" fires when the trigger matches |

The result

Your documentation stays fat and thorough — that's your team's knowledge base. But the LLM only sees the slice it needs, when it needs it. After a year, you have 50 files and 10,000 memories. The per-turn injection is still ~300 tokens.


Why Not Just CLAUDE.md?

Markdown files work great at first. 50 lines, manually curated, loaded fresh every session. But they don't scale:

| | Markdown files | Dhee |
|---|---|---|
| Token cost | Linear with file size. 500 lines = 5000 tokens every turn. | Constant ~300 tokens regardless of total knowledge. |
| Relevance | Everything loaded, always. Git commit rules injected while fixing auth. | Only matching chunks. Off-topic turns cost 0 tokens. |
| Staleness | Equal weight forever. A rule from 6 months ago sits next to today's. | Natural decay. Unused knowledge fades; fresh knowledge surfaces. |
| Scale | Hits context limits. You start deleting old rules to make room. | 50,000 memories, same injection cost as 50. |
| Learning | Static. Agent makes the same mistakes next session. | Captures what worked/failed. Synthesizes transferable insights. |
| Cross-session | Cold start every time unless you manually update the file. | Session handoff, performance trends, prospective memory. |

Dhee doesn't replace your markdown files. It makes them work at scale. Keep writing CLAUDE.md the way you always have. Dhee handles the delivery.


The 4-Operation API

Every interface — hooks, MCP, Python, CLI — exposes the same 4 operations.

remember(content) — Store a fact

0 LLM calls, 1 embedding (~$0.0002). Stored immediately. Echo enrichment (paraphrases, keywords for better recall) runs at checkpoint.

d.remember("User prefers FastAPI over Flask")

recall(query) — Search memory

0 LLM calls, 1 embedding. Pure vector search with echo-boosted re-ranking.

results = d.recall("what framework does the project use?")
# [{"memory": "User prefers FastAPI over Flask", "score": 0.94}]

context(task_description) — Session bootstrap

Returns everything the agent needs: last session state, performance trends, insights, intentions, warnings, and relevant memories.

ctx = d.context("fixing the auth bug")
# ctx["insights"] → [{"content": "What worked: git blame → found breaking commit"}]
# ctx["warnings"] → ["Performance on 'bug_fix' declining"]

checkpoint(summary, ...) — End-of-session cognition

Where the learning happens. 1 LLM call per ~10 memories.

d.checkpoint(
    "Fixed auth bug",
    what_worked="git blame showed the exact breaking commit",
    what_failed="grep was too slow on the monorepo",
    outcome_score=1.0,
)

Integration

Claude Code — Native Hooks

pip install dhee
dhee install    # installs lifecycle hooks
dhee ingest     # chunks project docs into vector memory

That's it. Six hooks fire automatically at the right moments. No SKILL.md, no plugin directories. The agent doesn't even know Dhee is there — it just gets better context.

MCP Server (Claude Code, Cursor, any MCP client)

{
  "mcpServers": {
    "dhee": { "command": "dhee-mcp" }
  }
}

4 tools exposed: remember, recall, context, checkpoint. The agent uses them as needed.

Python SDK

from dhee import Dhee

d = Dhee()
d.remember("User prefers dark mode")
d.recall("what theme does the user like?")
d.context("fixing auth bug")
d.checkpoint("Fixed it", what_worked="git blame first")

CLI

dhee remember "User prefers Python"
dhee recall "programming language"
dhee ingest CLAUDE.md AGENTS.md    # chunk specific files
dhee ingest                        # auto-scan project
dhee docs                          # show ingested manifest
dhee checkpoint "Fixed auth bug" --what-worked "checked logs"
dhee install                       # install Claude Code hooks
dhee uninstall-hooks               # remove them

Docker

docker compose up -d   # uses OPENAI_API_KEY from env

Cost

| Operation | LLM calls | Embed calls | Cost |
|---|---|---|---|
| remember | 0 | 1 | ~$0.0002 |
| recall | 0 | 1 | ~$0.0002 |
| context | 0 | 0-1 | ~$0.0002 |
| checkpoint | 1 per ~10 memories | 0 | ~$0.001 |
| Typical session | 1 | ~15 | ~$0.004 |

The Dhee overhead per session is ~$0.004. The token savings from selective injection on a 20-turn Opus session are ~$0.50+. >100x ROI.


Under the Hood

Memory Store (Engram)

SQLite + vector index. On the hot path (remember/recall), zero LLM calls — just embedding. At checkpoint, unified enrichment runs in one batched LLM call:

  • Echo encoding — paraphrases, keywords, question-forms so "User likes dark mode" matches "what theme?" (see the sketch after this list)
  • Category inference — auto-tags for filtering
  • Strength-based decay — Ebbinghaus curve. Frequently accessed → promoted to long-term. Unused → fades.
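
A rough sketch of that echo-boosted matching, as mentioned in the list above. StoredMemory and the scoring rule are illustrative assumptions, not Engram's actual implementation.

import math
from dataclasses import dataclass, field

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class StoredMemory:
    vec: list[float]                                        # embedding of the memory text itself
    echo_vecs: list[list[float]] = field(default_factory=list)  # paraphrase/keyword/question-form embeddings
    strength: float = 1.0                                   # decayed strength at query time

def echo_boosted_score(query_vec: list[float], m: StoredMemory) -> float:
    # "User likes dark mode" can match "what theme?" through a question-form echo:
    # score against the memory and all of its echoes, keep the best, weight by strength.
    best = max(cosine(query_vec, v) for v in [m.vec, *m.echo_vecs])
    return best * m.strength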

Cognition Engine (Buddhi)

Parallel intelligence layer that builds meta-knowledge from the memory pipeline:

  • Performance tracking — outcomes per task type, trend detection, regression warnings (see the sketch below)
  • Insight synthesis — causal hypotheses from what worked/failed, with confidence scores
  • Prospective memory — future triggers with keyword matching
  • Belief store — confidence-tracked facts with contradiction detection (experimental)
  • Policy store — condition→action rules from task completions (experimental)

Zero LLM calls on hot path. Pure pattern matching + statistics.
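
A toy example of the kind of statistics involved. The window size and the 15% drop threshold are illustrative assumptions, not Buddhi's real heuristics; the warning string mirrors the context() example above.

from statistics import mean
from typing import Optional

def trend_warning(scores: list[float], task_type: str, window: int = 5) -> Optional[str]:
    # Compare the recent window of outcome scores against the prior baseline;
    # flag a regression when the recent average drops noticeably.
    if len(scores) < 2 * window:
        return None
    baseline = mean(scores[:-window])
    recent = mean(scores[-window:])
    if baseline > 0 and recent < 0.85 * baseline:
        return f"Performance on '{task_type}' declining"
    return None

print(trend_warning([1.0] * 5 + [0.4] * 5, "bug_fix"))
# Performance on 'bug_fix' declining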

Doc Pipeline (v3.3.1)

  • Chunker — heading-scoped splits that respect code fences, paragraph boundaries, size limits
  • Ingest — SHA-tracked. Re-ingesting unchanged files is a no-op. Changed files get atomic chunk replacement.
  • Assembler — vector similarity search filtered by kind=doc_chunk, score threshold, token budget
  • Renderer — Caveman-compressed XML: <dhee><r s="0.83">...</r></dhee>. No header, no wrapper tags, no indentation — every byte earns its place. ~40% fewer structural tokens vs v3.3.
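
A minimal sketch of that compressed rendering under a budget. It is illustrative: a character budget stands in for the real token budget, and the actual renderer's details differ.

def render(scored_chunks: list[tuple[float, str]], budget_chars: int = 1200) -> str:
    # Emit <dhee><r s="0.83">rule text</r>...</dhee>, best score first,
    # stopping at the budget. No header, no indentation.
    parts, used = [], 0
    for score, text in sorted(scored_chunks, reverse=True):
        piece = f'<r s="{score:.2f}">{text}</r>'
        if used + len(piece) > budget_chars:
            break
        parts.append(piece)
        used += len(piece)
    return f"<dhee>{''.join(parts)}</dhee>" if parts else ""

print(render([(0.83, "Always run pytest before committing")]))
# -> <dhee><r s="0.83">Always run pytest before committing</r></dhee>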

Provider Options

pip install dhee[openai,mcp]     # OpenAI (recommended, cheapest embeddings)
pip install dhee[gemini,mcp]     # Google Gemini
pip install dhee[ollama,mcp]     # Ollama (local, no API costs)

Contributing

git clone https://github.com/Sankhya-AI/Dhee.git
cd Dhee
./scripts/bootstrap_dev_env.sh
source .venv-dhee/bin/activate
pytest    # 978 tests

Your docs stay fat. Your token bill stays thin.

GitHub · PyPI · Issues

MIT License — Sankhya AI
