Cognition layer for AI agents — persistent memory, performance tracking, and insight synthesis

Dhee

Stop burning tokens on context your agent doesn't need this turn.

Keep your CLAUDE.md, your skills, your AGENTS.md — exactly as they are.
Dhee selects what's relevant per prompt and injects only that.

Python 3.9+ · MIT License


The Problem

Every AI coding agent today dumps your entire CLAUDE.md into every LLM call. 200 lines of project rules, coding conventions, testing guidelines — loaded into the prompt whether you're running tests or writing a docstring. Every turn. Full price.

Over a 20-turn session on Opus, that's 40,000+ tokens of mostly-irrelevant context. You're paying for the model to read your git commit conventions while it's fixing an auth bug.

And it gets worse over time. After 6 months, your CLAUDE.md is 500 lines. Your skills directory has 12 files. Your AGENTS.md has grown. But the agent still loads all of it, every turn, at full token cost. The markdown files that were supposed to make your agent smarter are now your biggest line item.

How Dhee Fixes It

Before Dhee:  CLAUDE.md (2000 tokens) → loaded every turn → 40K tokens/session
With Dhee:    CLAUDE.md → chunked + vectorized → ~300 tokens of relevant rules per turn

Dhee sits between your documentation and the LLM. It chunks your markdown files into heading-scoped pieces, embeds them once, and on each prompt selects only the chunks that match what the user is actually asking about.

"How do I run the tests?" → Dhee injects your Testing Guidelines section (292 tokens), not your entire CLAUDE.md (2000 tokens). 67% reduction, zero information loss.

"Explain dark matter to me" → Dhee injects nothing. No project docs are relevant. 100% reduction.

Your files stay exactly where they are. You maintain them the same way. Dhee just makes the delivery intelligent.
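
To make the mechanism concrete, here is a minimal sketch of the embed-once, select-per-prompt idea. It is illustrative only, not Dhee's actual code: the toy bag-of-words embed() stands in for a real embedding model, and the chunks, threshold, and selection logic are simplified stand-ins.

import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Ingest once: one chunk per heading-scoped section, embedded up front.
chunks = {
    "## Testing Guidelines": "Always run pytest before committing. Use -x to stop early.",
    "## Git Conventions": "Commit messages follow Conventional Commits.",
}
index = {heading: embed(body) for heading, body in chunks.items()}

def select(prompt: str, threshold: float = 0.1) -> list[str]:
    # Per prompt: inject only the chunks whose similarity clears the threshold.
    q = embed(prompt)
    scored = [(cosine(q, vec), heading) for heading, vec in index.items()]
    return [chunks[h] for score, h in sorted(scored, reverse=True) if score >= threshold]

print(select("How do I run the tests?"))  # testing chunk only
print(select("Explain dark matter"))      # [] -- nothing relevant, nothing injected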


Quick Start

One command. No venv. No config. No pasting into settings.json.

curl -fsSL https://raw.githubusercontent.com/Sankhya-AI/Dhee/main/install.sh | sh

That's it. The installer creates ~/.dhee with a hidden venv, installs the dhee package, and wires Claude Code hooks automatically. Next time you open Claude Code in any project, cognition is on.

Other install options

Via pip (if you manage your own venv):

pip install dhee
dhee install       # configure Claude Code hooks

From source (contributors):

git clone https://github.com/Sankhya-AI/Dhee.git
cd Dhee
./scripts/bootstrap_dev_env.sh
source .venv-dhee/bin/activate
dhee install

After install, Dhee auto-ingests project docs (CLAUDE.md, AGENTS.md, etc.) on the first session. Run dhee ingest manually any time to re-chunk.


The Lifecycle

Dhee manages information through a complete lifecycle — not just storage and retrieval, but learning, decay, and promotion.

                        ┌──────────────────────────┐
                        │   Your Documentation     │
                        │   CLAUDE.md, AGENTS.md   │
                        │   SKILL.md, etc.         │
                        └──────────┬───────────────┘
                                   │
                            dhee ingest
                          (chunk + embed)
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────┐
│                      Dhee Vector Store                       │
│                                                              │
│  Doc chunks (high strength, heading-scoped)                  │
│  Short-term memories (facts, file edits, failures)           │
│  Long-term memories (promoted by access + strength)          │
│  Typed cognition (insights, beliefs, policies, intentions)   │
└──────────────────────────────────────────────────────────────┘
                                   │
                        ┌──────────┴──────────┐
                        │                     │
                   Session Start         Each Prompt
                   (full assembly)     (doc chunks only)
                        │                     │
                        ▼                     ▼
              ┌─────────────────┐   ┌──────────────────┐
              │ Relevant docs   │   │ Matching rules   │
              │ + insights      │   │ above threshold  │
              │ + performance   │   │ or nothing       │
              │ + warnings      │   │                  │
              └────────┬────────┘   └────────┬─────────┘
                       │                     │
                       ▼                     ▼
              ┌──────────────────────────────────────┐
              │  Token-budgeted XML injection        │
              │  <dhee>                              │
              │    <r s="0.83">Always run pytest...  │
              │    </r>                              │
              │  </dhee>                             │
              └──────────────────────────────────────┘
                                   │
                             LLM sees only
                          what it needs this turn

During the session

| Event | What Dhee does | Token cost |
|---|---|---|
| Session opens | Auto-ingests stale docs, assembles relevant doc chunks + typed cognition | ~300-900 tokens (vs 2000+ for raw files) |
| Each user prompt | Searches doc chunks for *this* specific question and injects matching rules above threshold (sketched below) | 0-300 tokens (0 when nothing matches) |
| Tool use (Edit/Write) | Records which files were touched (for session context) | 0 tokens (storage only) |
| Tool failure (Bash) | Stores the failure + error message as a learnable signal | 0 tokens (storage only) |
| Session ends | Checkpoints outcomes and what worked/failed, which become insights for next session | 0 tokens (storage only) |
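
To picture what happens on each prompt, here is a hypothetical sketch of a prompt-time hook. Dhee wires the real hooks for you; the payload field name and the stdout-becomes-context behavior are assumptions about the hook convention, and search_doc_chunks is a made-up stand-in.

#!/usr/bin/env python3
# Hypothetical prompt-time hook: read the user's prompt from stdin JSON,
# search the vector store, and print only the matching rules for injection.
import json
import sys

def search_doc_chunks(prompt: str) -> list[str]:
    # Stand-in for the real vector search over ingested doc chunks.
    return []

payload = json.load(sys.stdin)
rules = search_doc_chunks(payload.get("prompt", ""))
if rules:
    # Anything printed here is injected into the turn's context.
    print("<dhee>" + "".join(f"<r>{r}</r>" for r in rules) + "</dhee>")
# If nothing matches, stay silent: the turn costs 0 extra tokens.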

Between sessions

| Phase | What happens |
|---|---|
| Short-term memory | Facts from the session sit in SML with natural decay |
| Promotion | Frequently-accessed memories promote to long-term (LML) automatically |
| Decay | Unused memories fade on an Ebbinghaus curve (sketched below). Your 50th memory costs the same as your 50,000th. |
| Insight synthesis | what_worked / what_failed from checkpoints become transferable learnings |
| Intentions | "Remember to run auth tests after login.py changes" fires when the trigger matches |

The result

Your documentation stays fat and thorough — that's your team's knowledge base. But the LLM only sees the slice it needs, when it needs it. After a year, you have 50 files and 10,000 memories. The per-turn injection is still ~300 tokens.


Why Not Just CLAUDE.md?

Markdown files work great at first. 50 lines, manually curated, loaded fresh every session. But they don't scale:

| | Markdown files | Dhee |
|---|---|---|
| Token cost | Linear with file size. 500 lines = 5000 tokens every turn. | Constant ~300 tokens regardless of total knowledge. |
| Relevance | Everything loaded, always. Git commit rules injected while fixing auth. | Only matching chunks. Off-topic turns cost 0 tokens. |
| Staleness | Equal weight forever. A rule from 6 months ago sits next to today's. | Natural decay. Unused knowledge fades; fresh knowledge surfaces. |
| Scale | Hits context limits. You start deleting old rules to make room. | 50,000 memories, same injection cost as 50. |
| Learning | Static. Agent makes the same mistakes next session. | Captures what worked/failed. Synthesizes transferable insights. |
| Cross-session | Cold start every time unless you manually update the file. | Session handoff, performance trends, prospective memory. |

Dhee doesn't replace your markdown files. It makes them work at scale. Keep writing CLAUDE.md the way you always have. Dhee handles the delivery.


The 4-Operation API

Every interface — hooks, MCP, Python, CLI — exposes the same 4 operations.

remember(content) — Store a fact

0 LLM calls, 1 embedding (~$0.0002). Stored immediately. Echo enrichment (paraphrases, keywords for better recall) runs at checkpoint.

d.remember("User prefers FastAPI over Flask")

recall(query) — Search memory

0 LLM calls, 1 embedding. Pure vector search with echo-boosted re-ranking.

results = d.recall("what framework does the project use?")
# [{"memory": "User prefers FastAPI over Flask", "score": 0.94}]

context(task_description) — Session bootstrap

Returns everything the agent needs: last session state, performance trends, insights, intentions, warnings, and relevant memories.

ctx = d.context("fixing the auth bug")
# ctx["insights"] → [{"content": "What worked: git blame → found breaking commit"}]
# ctx["warnings"] → ["Performance on 'bug_fix' declining"]

checkpoint(summary, ...) — End-of-session cognition

Where the learning happens. 1 LLM call per ~10 memories.

d.checkpoint(
    "Fixed auth bug",
    what_worked="git blame showed the exact breaking commit",
    what_failed="grep was too slow on the monorepo",
    outcome_score=1.0,
)

Integration

Claude Code — Native Hooks

pip install dhee
dhee install    # installs lifecycle hooks
dhee ingest     # chunks project docs into vector memory

That's it. Six hooks fire automatically at the right moments. No SKILL.md, no plugin directories. The agent doesn't even know Dhee is there — it just gets better context.

MCP Server (Claude Code, Cursor, any MCP client)

{
  "mcpServers": {
    "dhee": { "command": "dhee-mcp" }
  }
}

4 tools exposed: remember, recall, context, checkpoint. The agent uses them as needed.

Python SDK

from dhee import Dhee

d = Dhee()
d.remember("User prefers dark mode")
d.recall("what theme does the user like?")
d.context("fixing auth bug")
d.checkpoint("Fixed it", what_worked="git blame first")

CLI

dhee remember "User prefers Python"
dhee recall "programming language"
dhee ingest CLAUDE.md AGENTS.md    # chunk specific files
dhee ingest                        # auto-scan project
dhee docs                          # show ingested manifest
dhee checkpoint "Fixed auth bug" --what-worked "checked logs"
dhee install                       # install Claude Code hooks
dhee uninstall-hooks               # remove them

Docker

docker compose up -d   # uses OPENAI_API_KEY from env

Cost

| Operation | LLM calls | Embed calls | Cost |
|---|---|---|---|
| remember | 0 | 1 | ~$0.0002 |
| recall | 0 | 1 | ~$0.0002 |
| context | 0 | 0-1 | ~$0.0002 |
| checkpoint | 1 per ~10 memories | 0 | ~$0.001 |
| Typical session | 1 | ~15 | ~$0.004 |

The Dhee overhead per session is ~$0.004. The token savings from selective injection on a 20-turn Opus session are ~$0.50+. >100x ROI.


Under the Hood

Memory Store (Engram)

SQLite + vector index. On the hot path (remember/recall), zero LLM calls — just embedding. At checkpoint, unified enrichment runs in one batched LLM call:

  • Echo encoding — paraphrases, keywords, question-forms so "User likes dark mode" matches "what theme?" (see the sketch after this list)
  • Category inference — auto-tags for filtering
  • Strength-based decay — Ebbinghaus curve. Frequently accessed → promoted to long-term. Unused → fades.
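
A rough sketch of that echo-boosted matching, as mentioned in the list above. StoredMemory and the scoring rule are illustrative assumptions, not Engram's actual implementation.

import math
from dataclasses import dataclass, field

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class StoredMemory:
    vec: list[float]                                        # embedding of the memory text itself
    echo_vecs: list[list[float]] = field(default_factory=list)  # paraphrase/keyword/question-form embeddings
    strength: float = 1.0                                   # decayed strength at query time

def echo_boosted_score(query_vec: list[float], m: StoredMemory) -> float:
    # "User likes dark mode" can match "what theme?" through a question-form echo:
    # score against the memory and all of its echoes, keep the best, weight by strength.
    best = max(cosine(query_vec, v) for v in [m.vec, *m.echo_vecs])
    return best * m.strength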

Cognition Engine (Buddhi)

Parallel intelligence layer that builds meta-knowledge from the memory pipeline:

  • Performance tracking — outcomes per task type, trend detection, regression warnings (see the sketch below)
  • Insight synthesis — causal hypotheses from what worked/failed, with confidence scores
  • Prospective memory — future triggers with keyword matching
  • Belief store — confidence-tracked facts with contradiction detection (experimental)
  • Policy store — condition→action rules from task completions (experimental)

Zero LLM calls on hot path. Pure pattern matching + statistics.
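
A toy example of the kind of statistics involved. The window size and the 15% drop threshold are illustrative assumptions, not Buddhi's real heuristics; the warning string mirrors the context() example above.

from statistics import mean
from typing import Optional

def trend_warning(scores: list[float], task_type: str, window: int = 5) -> Optional[str]:
    # Compare the recent window of outcome scores against the prior baseline;
    # flag a regression when the recent average drops noticeably.
    if len(scores) < 2 * window:
        return None
    baseline = mean(scores[:-window])
    recent = mean(scores[-window:])
    if baseline > 0 and recent < 0.85 * baseline:
        return f"Performance on '{task_type}' declining"
    return None

print(trend_warning([1.0] * 5 + [0.4] * 5, "bug_fix"))
# Performance on 'bug_fix' declining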

Doc Pipeline (v3.3.1)

  • Chunker — heading-scoped splits that respect code fences, paragraph boundaries, size limits
  • Ingest — SHA-tracked. Re-ingesting unchanged files is a no-op. Changed files get atomic chunk replacement.
  • Assembler — vector similarity search filtered by kind=doc_chunk, score threshold, token budget
  • Renderer — Caveman-compressed XML: <dhee><r s="0.83">...</r></dhee>. No header, no wrapper tags, no indentation — every byte earns its place. ~40% fewer structural tokens vs v3.3.
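
A minimal sketch of that compressed rendering under a budget. It is illustrative: a character budget stands in for the real token budget, and the actual renderer's details differ.

def render(scored_chunks: list[tuple[float, str]], budget_chars: int = 1200) -> str:
    # Emit <dhee><r s="0.83">rule text</r>...</dhee>, best score first,
    # stopping at the budget. No header, no indentation.
    parts, used = [], 0
    for score, text in sorted(scored_chunks, reverse=True):
        piece = f'<r s="{score:.2f}">{text}</r>'
        if used + len(piece) > budget_chars:
            break
        parts.append(piece)
        used += len(piece)
    return f"<dhee>{''.join(parts)}</dhee>" if parts else ""

print(render([(0.83, "Always run pytest before committing")]))
# -> <dhee><r s="0.83">Always run pytest before committing</r></dhee>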

Provider Options

pip install dhee[openai,mcp]     # OpenAI (recommended, cheapest embeddings)
pip install dhee[gemini,mcp]     # Google Gemini
pip install dhee[ollama,mcp]     # Ollama (local, no API costs)

Contributing

git clone https://github.com/Sankhya-AI/Dhee.git
cd Dhee
./scripts/bootstrap_dev_env.sh
source .venv-dhee/bin/activate
pytest    # 978 tests

Your docs stay fat. Your token bill stays thin.

GitHub · PyPI · Issues

MIT License — Sankhya AI
