Reusable memory runtime for AI agents

These details have not been verified by PyPI

Project links

Project description

agent-memory

A universal memory and knowledge runtime for AI agents.

agent-memory is an open-source memory layer for multi-agent and multi-harness systems. It is designed to work with Hermes, Codex-like runtimes, Claude-style runtimes, and any other agent harness that can emit events and call a retrieval API.

Important repository convention:

.dev/ contains AI-authored draft documents, design spikes, research notes, and unapproved plans.
docs/ is reserved for human-reviewed, promoted, approved documentation.

Product thesis

Most agent systems are still weak at memory because they treat memory as one of these:

raw session logs
a flat key-value note store
one-shot RAG over loosely related documents

agent-memory takes a different approach:

separate memory into working, episodic, semantic, and procedural layers
preserve provenance and confidence for every memory item
connect memories into a graph instead of only storing chunks
combine lexical search, graph traversal, metadata filters, and optional embedding recall
curate durable knowledge instead of stuffing every transcript into prompt context

Non-goals

replacing the host agent runtime
owning the user's entire wiki lifecycle
forcing one storage engine or one embedding vendor
pretending every transcript line is durable knowledge

Initial scope

Event ingestion from external harnesses
Memory normalization and storage
Retrieval API for prompt-time context
Curation lifecycle: raw -> candidate -> approved -> deprecated
Graph links between entities, episodes, concepts, tasks, and rules
Thin adapters for Hermes and other harnesses

CLI quick start

Current release posture:

npm is the shortest onboarding path for Hermes / Claude Code / Codex style CLI users
PyPI is the canonical Python runtime package for direct installs, CI, and power users

Chosen distribution names:

npm package: @cafitac/agent-memory
PyPI package: cafitac-agent-memory
installed CLI command on both surfaces: agent-memory

Shortest onboarding path:

npm install -g @cafitac/agent-memory
agent-memory bootstrap
agent-memory doctor

Fastest Hermes-oriented path:

install via npm
run agent-memory bootstrap
verify with agent-memory doctor
inspect installed hooks with hermes hooks list

The npm launcher is intentionally thin:

bootstrap maps to the Python CLI command hermes-bootstrap
doctor maps to the Python CLI command hermes-doctor
runtime resolution prefers AGENT_MEMORY_PYTHON_EXECUTABLE, then uvx, then pipx

Published install smoke recipes live in docs/install-smoke.md.

Alternative Python-first install paths:

pipx install cafitac-agent-memory
agent-memory bootstrap
agent-memory doctor

uv tool install cafitac-agent-memory
agent-memory bootstrap
agent-memory doctor

Source / development flow:

Initialize a SQLite memory database. For real use, prefer one global user-level database and let scopes/provenance separate projects:

uv run agent-memory init ~/.agent-memory/memory.db

If you want the shortest real Hermes onboarding path, hermes-bootstrap is the primary one-line command. It initializes the database if missing, writes or merges the Hermes hook config, and keeps existing Hermes hooks intact.

Fresh-install and upgrade notes:

brand-new Hermes users can run uv run agent-memory hermes-bootstrap first; if ~/.hermes/config.yaml does not exist yet, the command creates it with the agent-memory hook installed.
existing Hermes users with their own pre_llm_call / on_session_end hooks can also use hermes-bootstrap; the installer preserves the existing hook list and appends the agent-memory hook without rewriting the whole config.
after bootstrap, run uv run agent-memory hermes-doctor and hermes hooks doctor once to confirm the config path, hook install, and runtime allowlist state.
the first real Hermes run may still require --accept-hooks (or an interactive approval prompt) because Hermes itself will not fire unapproved shell hooks at runtime.

uv run agent-memory hermes-bootstrap

If you want a one-line health check for that setup:

uv run agent-memory hermes-doctor

If you want the same flow with explicit paths and budgets, hermes-install-hook remains available:

uv run agent-memory hermes-install-hook ~/.agent-memory/memory.db --config-path ~/.hermes/config.yaml --top-k 3 --max-prompt-lines 8 --max-prompt-chars 1200 --max-prompt-tokens 300 --max-alternatives 2 --timeout 12

For throwaway experiments, a temp database is fine:

uv run agent-memory init /tmp/agent-memory.db

Scope model:

user:default is the recommended durable default for memories that should travel with the user across projects and harnesses.
cwd:<hash> is used by the Hermes hook when no explicit --preferred-scope is provided. It is derived from the runtime cwd, but stores a hash instead of the raw folder path so local usernames and repository names do not leak into prompts or examples.
project:* / workspace:* scopes are still supported for explicit narrowing, but they are not the primary storage boundary.

Retrieve the raw MemoryPacket for a query:

uv run agent-memory retrieve ~/.agent-memory/memory.db "What does Project X use?" --preferred-scope user:default

Evaluate retrieval fixtures against the current retrieval path:

uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval

Or include a simple lexical baseline for side-by-side comparison:

uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval --baseline-mode lexical

Or fail the command when any current task regresses:

uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval --fail-on-regression

Or fail the command only when the current retrieval path is worse than the lexical baseline:

uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval --baseline-mode lexical --fail-on-baseline-regression

Or compare against a source-linked lexical baseline that scores approved memories by the lexical overlap of their linked source content within the same preferred scope:

uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval --baseline-mode source-lexical

Or compare against a source-linked lexical baseline that ignores preferred scope and lets cross-scope source evidence compete in the baseline ranking:

uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval --baseline-mode source-global

Or compare against a lexical baseline that ignores preferred scope and lets cross-scope drift compete in the baseline ranking (this can make the baseline strictly worse than current retrieval, which is useful for drift-sensitive diagnostics but will not trip baseline-regression gates on its own):

uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval --baseline-mode lexical-global

Or fail the command only when the current retrieval path is worse than the lexical baseline for selected primary task types:

uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval --baseline-mode lexical --fail-on-baseline-regression-memory-type facts

Or emit non-fatal soft-gate advisories when current regressions exceed a threshold:

uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval --warn-on-regression-threshold 0
uv run agent-memory eval retrieval ~/.agent-memory/memory.db tests/fixtures/retrieval_eval --baseline-mode lexical --warn-on-baseline-regression-threshold 0

The retrieval evaluator accepts either one JSON fixture file or a fixture directory. Directory input is recursive, so fixture families can live under nested folders such as scope/, procedure/, drift/, staleness/, and episode/. Fixtures may use direct numeric IDs or top-level symbolic references that resolve against approved memories in the target database, which makes checked-in fixture families directly runnable from the CLI. Symbolic selectors now also support richer matching such as searchable_text_contains, step_contains, and tags_include when exact field equality is too brittle for checked-in fixtures. Each task may also carry optional human-authored rationale text and notes arrays; these are preserved verbatim in the JSON report so fixture reviews can explain why a hit matters without introducing LLM judging. The evaluator runs retrieve_memory_packet for each task and prints JSON with fixture paths, per-task rationale/notes, retrieved IDs, expected hits, missing expected IDs, avoid/drift hits, a derived per-task pass flag, any non-fatal soft-gate advisories, and an aggregate summary. Summary objects now also include top-level task counts (total_tasks, passed_tasks, failed_tasks), by_memory_type rollups for facts/procedures/episodes, and by_primary_task_type rollups keyed by each task's main target surface so regressions can be reviewed both by memory-slice participation and by per-task intent; the per-type summaries expose the same task counts plus hit/miss/avoid totals. With --baseline-mode lexical, the same output also includes per-task baseline metrics, per-task delta fields (expected_hit_delta, missing_expected_delta, avoid_hit_delta, pass_changed), plus baseline and delta summaries using a simpler lexical-only retrieval path scoped to the same preferred scope; --baseline-mode source-lexical keeps that preferred-scope restriction but scores approved memories by lexical overlap in their linked source content instead of normalized memory text; --baseline-mode source-global uses the same source-linked lexical scoring while ignoring preferred scope; and --baseline-mode lexical-global keeps normalized-text lexical scoring but ignores preferred scope. Soft-gate thresholds never change the per-task pass semantics or process exit code on their own; they only populate advisories when the observed current or baseline-relative regression count exceeds the requested threshold.

Export approved memories as a human-readable KB draft:

uv run agent-memory kb export ~/.agent-memory/memory.db ./kb-draft --scope user:default

The KB export writes markdown files for approved facts, procedures, and episodes. Candidate, disputed, and deprecated memories are intentionally excluded. Each exported memory includes its referenced source IDs; when source records exist, the markdown also includes source type, created timestamp, adapter/external reference, metadata, and a short source excerpt for human review. The CLI prints JSON with generated files, per-type counts, total exported items, and referenced source IDs. The SQLite database remains the source of truth; exported markdown is a reviewable artifact for humans and downstream wiki sync workflows.

Render a Hermes-consumable adapter context:

uv run agent-memory hermes-context ~/.agent-memory/memory.db "What does Project X use?" --preferred-scope user:default --top-k 3 --max-prompt-lines 8 --max-prompt-chars 1200 --max-prompt-tokens 300 --max-alternatives 2

The hermes-context output is JSON with:

context: HermesMemoryContext, including prompt_text, answer flags, blocking steps, and full adapter payload
outcome: null unless verification results are supplied

For Codex- or Claude-style CLI wrappers that just want a plain prompt string instead of the full JSON payload, use:

uv run agent-memory codex-prompt ~/.agent-memory/memory.db "What does Project X use?" --preferred-scope user:default --top-k 3 --max-prompt-lines 8 --max-prompt-chars 1200 --max-prompt-tokens 300 --max-alternatives 2
uv run agent-memory claude-prompt ~/.agent-memory/memory.db "What does Project X use?" --preferred-scope user:default --top-k 3 --max-prompt-lines 8 --max-prompt-chars 1200 --max-prompt-tokens 300 --max-alternatives 2

Both commands print only the rendered prompt text, including the normal response/verification guidance plus short snippets from the top retrieved facts, procedures, or episodes, so a wrapper can prepend it to the live user question before calling Codex or Claude Code.

If you want a reusable wrapper script instead of assembling the prompt yourself, use:

python scripts/run_codex_with_memory.py ~/.agent-memory/memory.db "What does Project X use?" --preferred-scope user:default --codex-model gpt-5.4-mini
python scripts/run_claude_with_memory.py ~/.agent-memory/memory.db "What does Project X use?" --preferred-scope user:default --max-turns 1

Both wrapper scripts call agent-memory codex-prompt / agent-memory claude-prompt internally, append the live user request, and then invoke the target CLI. Use --dry-run to inspect the final prompt and command without executing Codex or Claude Code. In this repository's current verified state, both wrappers have now been smoke-tested against the real target CLIs in this environment.

Apply harness-supplied verification results and print a HermesVerificationOutcome:

uv run agent-memory hermes-context ~/.agent-memory/memory.db "What does Project X use?" --verification-results-json '[{"step_action":"cross_check_hidden_alternatives","status":"passed","evidence_summary":"No approved alternative contradicted the primary memory.","target_memory_type":"fact","target_memory_id":1}]'

The CLI does not execute verification itself; it only applies result objects supplied by the calling harness.

Generate a mergeable Hermes hook config snippet without modifying any existing config file:

uv run agent-memory hermes-hook-config-snippet ~/.agent-memory/memory.db --top-k 3 --max-prompt-lines 8 --max-prompt-chars 1200 --max-prompt-tokens 300 --max-alternatives 2 --no-reason-codes

The snippet command only prints YAML. It does not read, write, or merge ~/.hermes/config.yaml.

Install the same hook explicitly into a Hermes config file. For the shortest onboarding flow, prefer uv run agent-memory hermes-bootstrap and only drop to hermes-install-hook when you want to pin explicit paths or budgets. hermes-bootstrap uses the same installer with user-level defaults:

uv run agent-memory hermes-bootstrap

The lower-level explicit form remains available:

uv run agent-memory hermes-install-hook ~/.agent-memory/memory.db --config-path ~/.hermes/config.yaml --top-k 3 --max-prompt-lines 8 --max-prompt-chars 1200 --max-prompt-tokens 300 --max-alternatives 2 --no-reason-codes

Recommended post-install verification for external users:

uv run agent-memory hermes-doctor ~/.agent-memory/memory.db --config-path ~/.hermes/config.yaml
hermes hooks list
hermes hooks doctor
# approve the hook on first real use if Hermes reports it is not allowlisted yet
hermes --accept-hooks chat -q 'Reply with OK only.' --quiet
hermes hooks test pre_llm_call

The bootstrap/install path has been smoke-tested both for a fresh config.yaml creation flow and for configs that already contain other Hermes shell hooks. If hermes hooks doctor still reports failures after bootstrap, they are usually pre-existing hook path/auth problems elsewhere in the user's Hermes setup rather than an agent-memory install failure.

Release and distribution notes

Current release surfaces in the repository:

Python package metadata in pyproject.toml
runtime module version in src/agent_memory/__init__.py
npm launcher metadata in package.json
release metadata checker in scripts/check_release_metadata.py
release-readiness smoke in scripts/smoke_release_readiness.py
GitHub Actions workflows in .github/workflows/ci.yml and .github/workflows/publish.yml
release checklist draft in .dev/release/release-checklist-v0.md

Release rule: keep the Python package version, npm package version, and module __version__ identical. CI and publish workflows validate that sync before building artifacts. The Python distribution name and npm distribution name differ intentionally (cafitac-agent-memory vs @cafitac/agent-memory), but both must point at the same runtime version. The publish workflow now also creates a GitHub Release on tag-driven runs after the package publishes finish. The explicit gate for switching the README to true npm-first quickstart is now documented in .dev/release/release-checklist-v0.md.

First publish checklist summary:

confirm GitHub Actions has NPM_TOKEN
confirm PyPI trusted publishing is enabled for this repository, or set PYPI_API_TOKEN in GitHub Actions secrets as the fallback path
run uv run python scripts/check_release_metadata.py
run uv run pytest tests/ -q
run uv run python scripts/smoke_release_readiness.py
run uvx --from build python -m build
run npm pack --dry-run
push a vX.Y.Z tag or trigger publish.yml manually

Recommended post-publish smoke on a clean machine/session:

npm install -g @cafitac/agent-memory
agent-memory bootstrap
agent-memory doctor

hermes-install-hook is intentionally conservative. It creates a missing config, initializes a missing database, backs up changed existing config files to *.agent-memory.bak, and no-ops if the hook command is already installed. hermes-bootstrap is just the one-line convenience wrapper over the same behavior with recommended defaults. hermes-doctor is the matching read-only validator: it checks whether the DB exists, whether the Hermes config exists, whether the hook command is present, and prints the exact one-line bootstrap command to run when setup is incomplete. If a top-level hooks: block already exists, the installer performs a simple structured merge: it preserves existing hook events, appends the agent-memory command to an existing pre_llm_call: list, or creates pre_llm_call: under hooks: when missing. After installing, validate with hermes hooks list, then run Hermes with hook consent enabled (for example hermes --accept-hooks ...) or approve the hook through Hermes's normal shell-hook consent flow. The merge is text-based and intended for ordinary Hermes YAML config; for unusual YAML anchors or multiline hook definitions, inspect the backup and generated snippet before relying on it.

Use agent-memory directly from a Hermes pre_llm_call shell hook:

hooks:
  pre_llm_call:
    - command: "uv run agent-memory hermes-pre-llm-hook ~/.agent-memory/memory.db --top-k 3 --max-prompt-lines 8 --max-prompt-chars 1200 --max-prompt-tokens 300"
      timeout: 10

Hermes passes a JSON hook payload on stdin. hermes-pre-llm-hook reads extra.user_message, retrieves memory, and prints either:

{"context":"<agent_memory_context>...rendered memory context...</agent_memory_context>"}

or {} for unsupported/non-pre_llm_call payloads. Hermes injects the returned context into the current user message as ephemeral context; it is not written back to Hermes session storage.

When --preferred-scope is omitted in a Hermes hook, agent-memory derives a privacy-preserving cwd:<hash> preferred scope from the hook payload's cwd. This makes one global user database behave differently per folder/project without embedding raw local paths in prompt context.

Prompt budgets are renderer-level and do not mutate the full adapter payload. --max-prompt-tokens is an approximate local estimate (ceil(rendered_chars / 4)) that preserves whole rendered lines; combine it with --max-prompt-chars when you want both model-ish and hard character caps.

Draft design documents

.dev/product/thesis-and-scope.md
.dev/architecture/architecture-v0.md
.dev/architecture/graph-vs-hybrid-retrieval.md
.dev/roadmap/roadmap-v0.md
.dev/research/brain-and-llm-memory-notes.md

Core idea

RAG is part of the story, but not the whole story.

The long-term goal is not just "retrieve similar text chunks". The long-term goal is memory that behaves more like a connected system:

an event can become an episode
an episode can produce facts
facts can update entities and concepts
entities can be linked by relations
repeated successful behaviors can become procedural memory
retrieval can walk these links and rank by relevance, recency, confidence, and task fit

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.14

Apr 30, 2026

0.1.13

Apr 29, 2026

0.1.12

Apr 29, 2026

0.1.11

Apr 29, 2026

This version

0.1.10

Apr 29, 2026

0.1.9

Apr 29, 2026

0.1.8

Apr 28, 2026

0.1.7

Apr 28, 2026

0.1.6

Apr 28, 2026

0.1.5

Apr 28, 2026

0.1.4

Apr 28, 2026

0.1.3

Apr 28, 2026

0.1.2

Apr 28, 2026

0.1.1

Apr 28, 2026

0.1.0

Apr 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cafitac_agent_memory-0.1.10.tar.gz (73.7 kB view details)

Uploaded Apr 29, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cafitac_agent_memory-0.1.10-py3-none-any.whl (44.2 kB view details)

Uploaded Apr 29, 2026 Python 3

File details

Details for the file cafitac_agent_memory-0.1.10.tar.gz.

File metadata

Download URL: cafitac_agent_memory-0.1.10.tar.gz
Upload date: Apr 29, 2026
Size: 73.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cafitac_agent_memory-0.1.10.tar.gz
Algorithm	Hash digest
SHA256	`5169270ac595e8f10e29dcc688712cf3ddd9a19e7aa5e7512c027f0d2f3a0696`
MD5	`c0b3bfc3e02cf6ce504ebbf6e95a68a5`
BLAKE2b-256	`a5258ccb123becab14bbc1c09e9529d816bc260ed0ec926cac839c5422b5a8e5`

See more details on using hashes here.

File details

Details for the file cafitac_agent_memory-0.1.10-py3-none-any.whl.

File metadata

Download URL: cafitac_agent_memory-0.1.10-py3-none-any.whl
Upload date: Apr 29, 2026
Size: 44.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cafitac_agent_memory-0.1.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`064655dea82eba0c39d2ef95b8fde7d425ec0d99d1bacffdc075a272a3f052a2`
MD5	`d0b2955d921ac84a43e488f2f5b7581e`
BLAKE2b-256	`aee442e5547d3877b766852f7b11f31d5d41b84082a6c578caca5374c1d08272`

See more details on using hashes here.

cafitac-agent-memory 0.1.10

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

agent-memory

Product thesis

Non-goals

Initial scope

CLI quick start

Release and distribution notes

Draft design documents

Core idea

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes