Capture, analyze, and mine Codex / Claude Code rollouts into durable local memory.
retro
Capture Codex and Claude Code rollouts into durable local artifacts, evaluate them with signals, mine them into prompt-time memory, and report on cost and behavior via a static dashboard. Local-first, no cloud, evidence-linked.
Project wiki and onboarding guides: https://sajjadgg.github.io/retro/
This project answers two questions that "another memory store" doesn't:
- How did this rollout actually go? (signals: activity, outcome, cost, risk)
- What can the next session reuse from it? (mining: skills, procedures, failure triggers)
The full design is split across specs in specs/. The headline ones:
- full_rollout_capture_feature_spec.md: the capture contract.
- rollout_signals_spec.md: what signals are and how they aggregate.
- rollout_mining_methods.md: the catalog of mining methods.
- memory_storage_backend_spec.md: SQLite memory index over flat files.
- rollout_dashboard_spec.md: the dashboard.
- ccusage_comparison_spec.md: gap analysis vs. ccusage.
Table of contents
- Install
- Architecture at a glance
- Storage layout
- Capture
- Signals
- Mining
- Memory backend
- Dashboard
- Pricing snapshot
- Release and publishing
- Environment
- Notes and guarantees
Install
Requires Python ≥ 3.10.
From PyPI:
pip install retro-agent-memory
That installs the retro CLI.
From a clone:
python3.13 -m venv .venv
.venv/bin/pip install -e .
The PyPI distribution is named retro-agent-memory because retro is already occupied on PyPI; the import package and CLI are still named retro.
Architecture at a glance
┌──────────────────────────────────────────────────────────┐
│ retro CLI │
├──────────────┬─────────────┬──────────────┬───────────┤
~/.claude ──▶ │ importers │ signals │ mining │ memory │
~/.codex ──▶ │ (claude, │ (heuristic │ (4 methods, │ index │
│ codex) │ + external)│ + filters) │ + weave │
└──────┬───────┴──────┬──────┴───────┬──────┴─────┬─────┘
▼ ▼ ▼ ▼
raw/ normalized/ mined/ memories/
(immutable) (events.jsonl) (per method) (files + SQLite)
│
▼
rendered/ ← markdown view (any time)
│
▼
dashboard/build_dashboard.py
(static HTML + KPIs + per-model cost + signals + drill-down)
Each stage is independent — you can re-render markdown, recompute signals, re-mine, rebuild the memory index, or rebuild the dashboard from whatever rollout-memory/ already has on disk.
Storage layout
Artifacts land in ./rollout-memory/ by default:
rollout-memory/
  raw/<host>/<session-id>/                      # immutable copies of source jsonl + metadata + sidecars
    transcript.jsonl                            (claude)
    rollout.jsonl                               (codex)
    import_meta.json                            (claude)  # source path, project_slug, claude_home, …
    thread.json                                 (codex)   # cwd, model_provider, source_kind, spawn_edges, …
    sidecars/                                   (claude todos/tasks if present)
  normalized/<host>/<session-id>.events.jsonl   # one normalized event per line
  rendered/<host>/<session-id>.md               # human-readable markdown view
  mined/<method>/<host>/<session-id>.json       # mined candidates (structured)
  mined/<method>/<host>/<session-id>.prompt.md  # paste-ready prompt block
  signals/
    readings.jsonl                              # one row per (session, signal)
    aggregates.json                             # rolled-up stats per signal
    summary.md                                  # human view
  memories/
    items.jsonl                                 # canonical appended memory records
    events.jsonl                                # memory lifecycle / utility log
    index.sqlite                                # derived SQLite + FTS index, rebuildable
raw/ is treated as immutable: re-importing the same session refuses unless --force is passed.
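The refusal rule above can be sketched in a few lines. This is a minimal illustration, not retro's importer: the function name capture_raw and its signature are hypothetical, and the real importer also writes metadata and sidecars.

```python
import shutil
from pathlib import Path


def capture_raw(source: Path, dest_dir: Path, force: bool = False) -> Path:
    """Copy one source file into a raw/ session directory (hypothetical helper).

    Refuses to touch an existing capture unless force=True, mirroring the
    documented --force behavior.
    """
    if dest_dir.exists():
        if not force:
            raise FileExistsError(f"{dest_dir} already captured; pass force=True to overwrite")
        shutil.rmtree(dest_dir)  # forced re-import starts from a clean directory
    dest_dir.mkdir(parents=True)
    target = dest_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target
```

Raising instead of silently skipping keeps an accidental re-import visible to the caller.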
Capture
Discovery merges both Claude default roots and honors comma-separated overrides — same behavior as ccusage.
| Host | Default roots | Env override |
|---|---|---|
| Claude Code | ~/.claude/projects/ and ~/.config/claude/projects/ | CLAUDE_CONFIG_DIR |
| Codex | ~/.codex (uses state_5.sqlite.threads.rollout_path) | CODEX_HOME |
Codex roots are auto-classified:
- sqlite_home: has state_5.sqlite → discovery via SQLite (preserves spawn-edge graph).
- sessions_dir: has a sessions/ subdir but no SQLite → scan *.jsonl.
- jsonl_dir: neither → treat the directory as a flat JSONL bag (e.g. codex exec --json output).
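The three classification rules can be read as a short decision function. The sketch below only mirrors the rules as stated here; the function name classify_codex_root is hypothetical and the actual importer may check more than file presence.

```python
from pathlib import Path


def classify_codex_root(root: Path) -> str:
    """Return 'sqlite_home', 'sessions_dir', or 'jsonl_dir' for a Codex root."""
    if (root / "state_5.sqlite").is_file():
        return "sqlite_home"   # discover sessions through SQLite
    if (root / "sessions").is_dir():
        return "sessions_dir"  # scan *.jsonl under sessions/
    return "jsonl_dir"         # flat bag of JSONL files
```

The SQLite check comes first, so a root that has both state_5.sqlite and sessions/ is treated as sqlite_home.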
Commands
# Discover what's available on this machine
retro list # both hosts, with retention warning for old Claude logs
retro list --host claude
retro list --host codex
retro list --limit 50
# Import a single session
retro import claude --session-id <session-id>
retro import codex --thread-id <thread-id>
retro import claude --latest # most-recent
retro import codex --latest
# Import every discoverable session
retro import all # both hosts
retro import claude --all
retro import codex --all
retro import all --limit-per-host 20
# Force overwrite an existing capture
retro import claude --latest --force
# Skip the markdown re-render step
retro import codex --latest --no-render
# Re-render markdown from an already-imported normalized stream
retro render claude <session-id>
retro render codex <thread-id>
# Inspect what got captured
retro show claude <session-id>
retro show codex <thread-id>
Multi-root example
# Pull in Claude logs from three locations:
CLAUDE_CONFIG_DIR="$HOME/.claude,$HOME/.config/claude,/backup/claude-archive" retro import claude --all
# Codex saved via `codex exec --json` (no SQLite present):
CODEX_HOME="$HOME/.codex,$HOME/codex-exec-logs" retro import codex --all
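One way to picture the override handling is a small resolver that splits the variable on commas and falls back to the defaults only when no override is set. This is a sketch under assumptions: resolve_roots is a hypothetical name, and whether retro trims whitespace or removes duplicate roots is not stated here.

```python
import os
from pathlib import Path


def resolve_roots(env_var: str, defaults: list[str], environ=None) -> list[Path]:
    """Resolve data roots: a comma-separated override wins over the defaults."""
    environ = os.environ if environ is None else environ
    raw = environ.get(env_var, "")
    entries = [part.strip() for part in raw.split(",") if part.strip()]
    if not entries:
        entries = defaults  # no override set: use the default roots
    roots: list[Path] = []
    for entry in entries:
        path = Path(entry).expanduser()
        if path not in roots:  # assumption: repeated roots are scanned once
            roots.append(path)
    return roots
```

For example, resolve_roots("CODEX_HOME", ["~/.codex"]) returns only the default root when CODEX_HOME is unset.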
Retention warning
When the oldest discoverable Claude transcript is more than 25 days old, retro list prints a yellow warning. Claude Code retains logs for ~30 days by default — capture older sessions before they age out, or raise cleanupPeriodDays in Claude settings.
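The 25-day check can be sketched as follows. How retro measures transcript age is not stated here; this sketch assumes a list of timezone-aware timestamps (for example file modification times), and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

WARN_AFTER = timedelta(days=25)  # threshold quoted above


def needs_retention_warning(transcript_times, now=None) -> bool:
    """True when the oldest transcript is more than 25 days old."""
    if not transcript_times:
        return False
    now = now or datetime.now(timezone.utc)
    return now - min(transcript_times) > WARN_AFTER
```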
Signals
Signals are evaluators that observe a captured rollout and emit a comparable reading (numeric / boolean / categorical). They answer: "how did this rollout go, by some objective measure?"
Signals are grouped by intent, not implementation:
| Group | Question answered | Examples |
|---|---|---|
| activity | What did the agent do? | command_count, unique_files_edited, web_search_count |
| outcome | Did the work land? | git_commits_made_during, failed_command_ratio, user_satisfaction_lexical |
| cost | What did it cost (effort/tokens/time)? | session_duration_seconds, user_correction_count, time_to_first_edit_seconds |
| risk | How trustworthy was the capture/work? | unknown_event_count, capture_gap_signal, secret_exposure_signal |
Methods: heuristic (pure event walk), regex (text patterns), external (touches the world — e.g. git log). llm_judge is reserved for later.
Commands
# Inventory of registered signals
retro signal list
retro signal list --group outcome
# Compute signals for every imported session
retro signal run
# Restrict scope
retro signal run --host claude
retro signal run --host codex --session-id <id>
retro signal run --signal git_commits_made_during,failed_command_ratio
# Inspect readings for one session
retro signal show claude <session-id>
retro signal show codex <thread-id>
Output
- rollout-memory/signals/readings.jsonl: one row per (session, signal) reading.
- rollout-memory/signals/aggregates.json: per-signal cross-session aggregates (mean / median / p90 / sum / histogram).
- rollout-memory/signals/summary.md: human view.
Each reading carries evidence_refs (event_ids), confidence, and a method tag so it can be traced back to the rollout. External signals (e.g. git_commits_made_during) emit a null reading with a reason field when their context is missing (cwd gone, not a git repo) rather than silently skipping.
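To make the cross-session aggregates concrete, here is a minimal sketch that rolls numeric readings up per signal. The row keys signal and value are assumptions about the readings.jsonl shape, the p90 uses a nearest-rank rule chosen for illustration, and the histogram is left out.

```python
import math
import statistics
from collections import defaultdict


def aggregate(readings):
    """Roll numeric readings up per signal; null readings are skipped."""
    by_signal = defaultdict(list)
    for row in readings:
        value = row.get("value")
        # Skip null, boolean, and categorical readings in this numeric sketch.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        by_signal[row["signal"]].append(value)

    result = {}
    for name, values in by_signal.items():
        values.sort()
        p90_index = max(0, math.ceil(0.9 * len(values)) - 1)  # nearest-rank p90
        result[name] = {
            "count": len(values),
            "sum": sum(values),
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "p90": values[p90_index],
        }
    return result
```

A null reading with a reason field stays in readings.jsonl for traceability but does not distort the numeric aggregates.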
Adding a new signal
from retro.signals.base import SessionContext, reading, register

@register("my_signal", group="outcome", kind="boolean",
          method="heuristic", description="…")
def _my_signal(ctx: SessionContext):
    # some_event is a placeholder: pass the event(s) that justify the reading
    return reading(ctx, _my_signal, value=True, evidence=[some_event])
retro signal list/run/show and the dashboard pick it up automatically.
Mining
Mining turns captured rollouts into prompt-time memory candidates — paste-ready blocks for the next session. Multiple methods are registered; each produces the same MemoryCandidate shape so they compose. Filters run after methods to prune unsafe / redundant candidates.
Registered methods
| Method | What it produces | Source paper |
|---|---|---|
| reme_refine_poc | Success patterns, failure triggers, tool lessons, risk rules. | ReMe |
| skill_pro | Reusable skills with explicit activation / steps / termination / verification blocks. | Skill-Pro |
| memp_procedural | Whole-session procedural memory: goal / preconditions / steps / warnings / outcome. | Memp |
| codex_headless | LLM-backed extraction through codex exec --json; saves the headless Codex JSONL capture. | local Codex CLI |
Registered filters
| Filter | What it does |
|---|---|
| risk_aware | Drops risk=high and low-confidence candidates, dedupes near-duplicates, caps at 8 items. |
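The pruning steps in the table can be sketched like this. It is not the shipped filter: the Candidate class is a stand-in for MemoryCandidate, the confidence floor (0.5), the similarity threshold (0.9), the use of difflib for near-duplicate detection, and ranking higher priority first are all assumptions. Only the cap of 8 comes from the description.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Candidate:  # stand-in for MemoryCandidate, reduced to the fields used here
    text: str
    risk: str
    confidence: float
    priority: int


def risk_aware(candidates, min_confidence=0.5, similarity=0.9, cap=8):
    """Drop risky or weak candidates, dedupe near-duplicates, keep at most `cap`."""
    kept = []
    # Assumption: higher priority, then higher confidence, is considered first.
    ranked = sorted(candidates, key=lambda c: (-c.priority, -c.confidence))
    for cand in ranked:
        if cand.risk == "high" or cand.confidence < min_confidence:
            continue
        is_duplicate = any(
            SequenceMatcher(None, cand.text.lower(), other.text.lower()).ratio() >= similarity
            for other in kept
        )
        if is_duplicate:
            continue
        kept.append(cand)
        if len(kept) == cap:
            break
    return kept
```

Ranking before deduplication means the stronger of two near-identical candidates is the one that survives.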
The default methods are deterministic and local. codex_headless is opt-in and calls codex exec --json on a compact, redacted rollout summary. It writes the Codex headless stream to rollout-memory/headless/codex_headless/<host>/<session-id>.codex.jsonl.
Commands
# Inventory
retro methods
# Single method, single session (default method is reme_refine_poc)
retro mine codex <thread-id>
retro mine claude <session-id> --method skill_pro
retro mine codex <thread-id> --method memp_procedural
retro mine codex <thread-id> --method codex_headless
# Run every registered method on one session
retro mine codex <thread-id> --method all
# Apply one or more filters after mining
retro mine codex <thread-id> --method skill_pro --filter risk_aware
retro mine codex <thread-id> --method all --filter risk_aware
# Bulk mine
retro mine '*' '*' --method all --filter risk_aware
retro mine codex '*' --method skill_pro
retro mine claude '*' --method memp_procedural --all
Output
For each (method, host, session_id):
- rollout-memory/mined/<method>/<host>/<id>.json: full structured result (candidates, structured payloads, evidence refs, notes, filters applied).
- rollout-memory/mined/<method>/<host>/<id>.prompt.md: paste-ready memory block tagged with <retro method="…">, sorted by priority and confidence.
Example skill_pro output
Skill: investigate and patch a bug
Activation: gi ahead and fix the issues found here
Steps:
1. TaskCreate (Scan both ~/.claude/projects/ and ~/.config/claude/projects/…)
2. read file (file_path=~/…/retro/importers/claude.py)
3. edit file (file_path=~/…/retro/importers/claude.py)
…
Termination: after writing ~/…/Mem/specs/ccusage_comparison_spec.md
Verification: (no explicit verification observed)
Adding a new method
# src/retro/mining/methods/my_method.py
from ..base import MiningContext, MiningResult, MemoryCandidate, register_method, memory_id

@register_method("my_method", description="…")
def mine_my_method(ctx: MiningContext) -> MiningResult:
    candidates = [
        MemoryCandidate(
            id=memory_id(ctx.session_id, "my_method", 1),
            method="my_method",
            kind="procedure",
            text="…",
            when_to_use="…",
            evidence_refs=[ev.event_id for ev in ctx.events[:3]],
            confidence=0.6,
            priority=4,
        )
    ]
    return MiningResult(
        session_id=ctx.session_id, host=ctx.host, method="my_method",
        task_summary=ctx.task_summary(), candidates=candidates,
    )
Import it in mining/methods/__init__.py. retro methods and --method pick it up.
Memory backend
The memory backend turns mined and authored memories into a local SQLite index. Flat files remain the source of truth; index.sqlite is a derived cache that can be deleted and rebuilt.
The index currently supports:
- SQLite schema bootstrap with FTS5 keyword recall.
- Reindexing from rollout-memory/memories/items.jsonl and existing mined artifacts.
- [[wiki-link]] edge extraction and one-hop linked-memory expansion.
- Authored markdown import with simple frontmatter.
- Keyword retrieval with scope/repo filtering and value-aware reranking.
- Utility updates through memories/events.jsonl.
- Prompt-time context weaving.
- Dashboard counts for indexed memory, status/scope/kind, top utility, and lifecycle events.
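As an illustration of the wiki-link items in the list above, the sketch below extracts [[wiki-link]] targets and expands a set of retrieved memories by one hop, collecting dangling links along the way. The in-memory dict stands in for the SQLite index, and both function names are hypothetical.

```python
import re

WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def extract_links(text: str) -> list[str]:
    """Return the targets of all [[wiki-link]] references in a memory body."""
    return [target.strip() for target in WIKI_LINK.findall(text)]


def expand_one_hop(hit_ids, memories):
    """Add directly linked memories to the hits; report links with no target.

    memories maps memory id -> body text (a stand-in for the index).
    """
    expanded = list(hit_ids)
    dangling = []
    for memory_id in hit_ids:  # one hop only: links of the added memories are not followed
        for target in extract_links(memories[memory_id]):
            if target not in memories:
                dangling.append(target)
            elif target not in expanded:
                expanded.append(target)
    return expanded, dangling
```

The dangling list corresponds to what a check such as retro memory doctor would need to report.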
Commands
# Create rollout-memory/memories/ and an empty index
retro memory init
# Rebuild index.sqlite from files and mined artifacts
retro memory reindex
# Inspect index counts and dangling wiki-links
retro memory doctor
# Import hand-authored markdown memories
retro memory import-authored ~/path/to/memory-notes
# Retrieve ranked memories; include candidates by default while promotion matures
retro memory retrieve --query "pytest retrieval" --cwd /path/to/repo
retro memory retrieve --query "pytest retrieval" --cwd /path/to/repo --accepted-only
# Emit a compact prompt block
retro memory weave --query "pytest retrieval" --cwd /path/to/repo
# Update observed utility after a memory is used
retro memory update-utility --memory-id <id> --reward 0.8 --session-id <session-id>
Authored markdown
Authored memories are normal markdown files. Frontmatter is optional; missing fields get conservative defaults.
---
id: pytest-policy
kind: tool_lesson
scope: global
status: accepted
risk: low
when_to_use: Use when editing tests.
---
Run pytest after changing retrieval. Link to [[debugging-policy]] when relevant.
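A file like the one above can be split into metadata and body with a few lines of standard-library code. This is a sketch of simple key: value frontmatter only; the function name is hypothetical and the default values shown are assumptions, since the actual conservative defaults are not listed here.

```python
# Assumed defaults for illustration; the real defaults are not documented here.
DEFAULTS = {"status": "candidate", "risk": "medium"}


def parse_authored(text: str):
    """Split an authored memory into (metadata, body)."""
    meta = dict(DEFAULTS)
    lines = text.splitlines()
    body = text
    if lines and lines[0].strip() == "---":
        for end in range(1, len(lines)):
            if lines[end].strip() == "---":  # closing delimiter found
                for line in lines[1:end]:
                    key, sep, value = line.partition(":")
                    if sep and key.strip():
                        meta[key.strip()] = value.strip()
                body = "\n".join(lines[end + 1:])
                break
    return meta, body.strip()
```

A file without a closing delimiter is treated as having no frontmatter, so nothing is silently swallowed.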
Accepted memories that contain prompt-injection markers, credential-looking strings, or invisible control characters are downgraded to needs_review on write.
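The downgrade rule could look roughly like the sketch below. The two regular expressions are illustrative examples only, not the patterns retro uses, and status_on_write is a hypothetical name.

```python
import re
import unicodedata

# Illustrative patterns only; the real marker lists are not documented here.
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)
CREDENTIAL = re.compile(r"sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16}")


def has_invisible_chars(text: str) -> bool:
    """True if text contains control or format characters other than newline/tab."""
    return any(
        unicodedata.category(ch) in {"Cc", "Cf"} and ch not in "\n\r\t"
        for ch in text
    )


def status_on_write(status: str, text: str) -> str:
    """Downgrade a suspicious accepted memory to needs_review."""
    if status != "accepted":
        return status
    if INJECTION.search(text) or CREDENTIAL.search(text) or has_invisible_chars(text):
        return "needs_review"
    return status
```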
Optional embeddings are still intentionally out of the core path for now; keyword retrieval works with no native dependencies.
Dashboard
A static, local HTML dashboard reads everything under rollout-memory/ and produces:
- KPI strip (sessions, days, events, tool calls, tokens, est. cost).
- Signals panel (color-coded by group, per-signal aggregates across sessions).
- Activity-by-day bars.
- Searchable / filterable session table.
- Per-session drill-down with tabs: Summary, Models, Signals, Transcript, Memory.
- Indexed memory sections for counts, top utility, lifecycle events, and source-session drill-downs when retro memory reindex has been run.
The Models tab shows the per-model token + cost breakdown (input / cache_create / cache_read / output / total / cost) for each session.
Build
# Default mode: auto (use embedded cost when present, else compute from tokens)
retro dashboard build
# Force recomputation from token counts
retro dashboard build --mode calculate
# Show only provider-embedded costUSD; None when missing
retro dashboard build --mode display
# Direct script still works, including via env var
RETRO_COST_MODE=calculate python dashboard/build_dashboard.py
Then open dashboard/index.html directly from disk.
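The three cost modes can be summarized as a small selection function. This is a sketch of the behavior described in the comments above, with hypothetical names; the embedded and calculated labels follow the cost-source labeling used by the dashboard.

```python
import os


def default_mode(environ=None) -> str:
    """Read the default mode from RETRO_COST_MODE, falling back to auto."""
    environ = os.environ if environ is None else environ
    return environ.get("RETRO_COST_MODE", "auto")


def resolve_cost(mode, embedded_cost, calculated_cost):
    """Return (cost, source) for one session under the given mode."""
    if mode == "display":
        # Only provider-embedded cost is shown; None when it is missing.
        return (embedded_cost, "embedded") if embedded_cost is not None else (None, None)
    if mode == "calculate":
        return calculated_cost, "calculated"
    if mode == "auto":
        if embedded_cost is not None:
            return embedded_cost, "embedded"
        return calculated_cost, "calculated"
    raise ValueError(f"unknown cost mode: {mode!r}")
```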
Cost computation
- LiteLLM pricing snapshot at dashboard/pricing/litellm-pricing.json is the authoritative source for per-model rates (USD per token). The hardcoded DEFAULT_RATES table is the fallback for any model not in the snapshot.
- Per-model attribution: each token delta is tagged with the active model (turn_context.model for Codex, per-message model for Claude). Multi-model sessions produce per-model rows.
- Codex token deltas prefer the explicit info.last_token_usage per-turn delta when present, falling back to cumulative subtraction for older Codex builds.
- Cached-input handling follows ccusage's behavior: subtract cached tokens from input for Codex (its input_tokens count includes cached input), don't subtract for Anthropic (its input_tokens is already fresh non-cached).
- Cost source labeling: each session records whether its cost was embedded (provider-supplied) or calculated (token math).
Numbers are always estimates, never billing truth.
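A small sketch of the token math may help. It covers the per-turn delta fallback and the cached-input rule. The dictionary keys (last_token_usage, total_token_usage, cached_input_tokens) and the rate table are assumptions for illustration; real rates come from the pricing snapshot.

```python
def codex_turn_delta(info, previous_total):
    """Prefer the explicit per-turn delta; otherwise subtract cumulative totals."""
    last = info.get("last_token_usage")
    if last is not None:
        return dict(last)
    total = info["total_token_usage"]  # assumed name of the cumulative counter
    return {key: total[key] - previous_total.get(key, 0) for key in total}


def billable_tokens(host, usage):
    """Split usage into fresh input, cached input, and output tokens."""
    cached = usage.get("cached_input_tokens", 0)
    if host == "codex":
        fresh = max(usage["input_tokens"] - cached, 0)  # input count includes cache
    else:
        fresh = usage["input_tokens"]                   # already non-cached
    return {"input": fresh, "cache_read": cached, "output": usage["output_tokens"]}


def estimate_cost(tokens, rates):
    """Multiply each token bucket by its USD-per-token rate."""
    return sum(tokens[bucket] * rates[bucket] for bucket in tokens)
```

With hypothetical rates of 2e-6 (input), 5e-7 (cache read), and 8e-6 (output), a Codex turn reporting 1000 input tokens of which 400 were cached plus 100 output tokens is billed as 600 fresh input tokens, not 1000.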
Pricing snapshot
The dashboard pulls rates from dashboard/pricing/litellm-pricing.json — a curated subset of LiteLLM's model_prices_and_context_window.json. Refresh it with:
python dashboard/pricing/refresh.py
The refresh script:
- Reads the existing snapshot's model list (top-level keys).
- Downloads LiteLLM's upstream JSON.
- Replaces pricing fields for known models from upstream; leaves the local set otherwise.
- Updates _meta.snapshot_taken.
To add a model: add an empty entry under its name in the snapshot, then re-run refresh.py.
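The merge step of the refresh can be sketched without any network access. The function below takes the local snapshot and an already-downloaded upstream mapping; the list of pricing fields and the ISO date format for snapshot_taken are assumptions, and merge_snapshot is a hypothetical name rather than the function in refresh.py.

```python
from datetime import date

# Assumed subset of LiteLLM pricing fields to refresh.
PRICING_FIELDS = (
    "input_cost_per_token",
    "output_cost_per_token",
    "cache_read_input_token_cost",
    "cache_creation_input_token_cost",
)


def merge_snapshot(local, upstream, today=None):
    """Refresh pricing for models already in the local snapshot."""
    merged = {}
    for name, entry in local.items():
        if name == "_meta":
            continue
        refreshed = dict(entry)
        for field in PRICING_FIELDS:
            if field in upstream.get(name, {}):
                refreshed[field] = upstream[name][field]
        merged[name] = refreshed  # models missing upstream keep their local values
    meta = dict(local.get("_meta", {}))
    meta["snapshot_taken"] = (today or date.today()).isoformat()
    merged["_meta"] = meta
    return merged
```

Because only existing top-level keys are visited, an empty local entry is enough to opt a new model in, and upstream models that are not listed locally are never added.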
Release and publishing
CI lives in .github/workflows/ci.yml. It runs on pushes and pull requests to main, installs the package across Python 3.10-3.13, compiles src/retro, smoke-tests the CLI, builds the wheel/sdist, and validates them with twine check.
Publishing lives in .github/workflows/publish.yml. It runs when a GitHub release is published, builds the package, publishes it to PyPI with trusted publishing, and attaches the wheel/sdist to the GitHub release.
One-time PyPI setup:
- Create or claim the PyPI project retro-agent-memory.
- Add a trusted publisher for repository sajjadGG/retro.
- Use workflow .github/workflows/publish.yml.
- Use environment pypi.
Release flow:
# bump version in pyproject.toml first
git tag v0.1.0
git push origin v0.1.0
Then publish a GitHub release for that tag. The release event triggers the PyPI publish job.
Environment
| Variable | Effect |
|---|---|
| CLAUDE_CONFIG_DIR | Override Claude data roots (comma-separated). Wins over the default ~/.claude + ~/.config/claude. |
| CODEX_HOME | Override Codex roots (comma-separated). Each root is classified into sqlite_home/sessions_dir/jsonl_dir. |
| RETRO_COST_MODE | Default --mode for the dashboard builder. One of auto, calculate, display. |
Notes and guarantees
- Local-only. Nothing is uploaded.
- Raw is immutable. Re-imports refuse to clobber raw/ unless --force is passed.
- Capture gaps are reported, not hidden. Unknown event types become unknown normalized events with the original payload preserved; counts surface in retro show, the unknown_event_count signal, and the dashboard.
- Everything is evidence-linked. Signal readings and mined memories carry the event_ids they were derived from, so the dashboard can trace any claim back to the source rollout.
- Composable. Capture, signals, mining, and the dashboard each consume the layer below from disk. Re-run any stage independently.
- Designed beyond ccusage on purpose. ccusage reports tokens and costs. This project captures the whole rollout, evaluates it with signals, and turns it into reusable behavioral memory. See specs/ccusage_comparison_spec.md for the full comparison.
Project layout
src/retro/
  schema.py                # NormalizedEvent + read/write helpers
  storage.py               # artifact layout
  cli.py                   # Typer CLI (`retro`)
  renderer.py              # Markdown view
  importers/
    claude.py              # Claude Code transcript reader (multi-root + env override)
    codex.py               # Codex SQLite + JSONL reader (multi-root)
  signals/
    base.py, runner.py, heuristics.py, external.py
  mining/
    base.py                # MemoryCandidate, MiningResult, registries
    render.py              # prompt-block markdown rendering + write helpers
    methods/
      reme_refine.py       # success patterns + failure triggers + risk rules
      skill_pro.py         # reusable skill blocks
      memp_procedural.py   # whole-session procedural memory
    filters/
      risk_aware.py        # prune high-risk / low-confidence / near-duplicates
dashboard/
  build_dashboard.py       # static HTML + rollouts.json builder
  pricing/
    litellm-pricing.json   # vendored LiteLLM rates
    refresh.py             # refresh the snapshot from upstream
  README.md
  index.html               # generated
  data/rollouts.json       # generated
specs/                     # design docs (see top of this README)
rollout-memory/            # captured artifacts (gitignored)