Skip to main content

Capture, analyze, and mine Codex / Claude Code rollouts into durable local memory.

Project description

retro

Retro logo

Capture Codex and Claude Code rollouts into durable local artifacts, evaluate them with signals, mine them into prompt-time memory, and report on cost and behavior via a static dashboard. Local-first, no cloud, evidence-linked.

Project wiki and onboarding guides: https://sajjadgg.github.io/retro/

This project answers two questions that "another memory store" doesn't:

  1. How did this rollout actually go? (signals: activity, outcome, cost, risk)
  2. What can the next session reuse from it? (mining: skills, procedures, failure triggers)

The full design is split across specs in specs/. The headline ones:


Table of contents


Install

Requires Python ≥ 3.10.

From PyPI:

pip install retro-agent-memory

That installs the retro CLI.

From a clone:

python3.13 -m venv .venv
.venv/bin/pip install -e .

The PyPI distribution is named retro-agent-memory because retro is already occupied on PyPI; the import package and CLI are still named retro.


Architecture at a glance

                  ┌──────────────────────────────────────────────────────────┐
                  │                       retro CLI                       │
                  ├──────────────┬─────────────┬──────────────┬───────────┤
   ~/.claude  ──▶ │  importers   │   signals   │    mining    │  memory   │
   ~/.codex   ──▶ │  (claude,    │  (heuristic │  (4 methods, │  index    │
                  │   codex)     │  + external)│  + filters)  │  + weave  │
                  └──────┬───────┴──────┬──────┴───────┬──────┴─────┬─────┘
                         ▼              ▼              ▼            ▼
                       raw/        normalized/      mined/      memories/
                  (immutable)    (events.jsonl)   (per method) (files + SQLite)
                         │
                         ▼
                     rendered/   ← markdown view (any time)
                         │
                         ▼
                    dashboard/build_dashboard.py
                  (static HTML + KPIs + per-model cost + signals + drill-down)

Each stage is independent — you can re-render markdown, recompute signals, re-mine, rebuild the memory index, or rebuild the dashboard from whatever rollout-memory/ already has on disk.


Storage layout

Artifacts land in ./rollout-memory/ by default:

rollout-memory/
  raw/<host>/<session-id>/                   # immutable copies of source jsonl + metadata + sidecars
    transcript.jsonl        (claude)
    rollout.jsonl           (codex)
    import_meta.json        (claude)         # source path, project_slug, claude_home, …
    thread.json             (codex)          # cwd, model_provider, source_kind, spawn_edges, …
    sidecars/               (claude todos/tasks if present)

  normalized/<host>/<session-id>.events.jsonl  # one normalized event per line

  rendered/<host>/<session-id>.md             # human-readable markdown view

  mined/<method>/<host>/<session-id>.json     # mined candidates (structured)
  mined/<method>/<host>/<session-id>.prompt.md  # paste-ready prompt block

  signals/
    readings.jsonl                            # one row per (session, signal)
    aggregates.json                           # rolled-up stats per signal
    summary.md                                # human view

  memories/
    items.jsonl                               # canonical appended memory records
    events.jsonl                              # memory lifecycle / utility log
    index.sqlite                              # derived SQLite + FTS index, rebuildable

raw/ is treated as immutable: re-importing the same session refuses unless --force is passed.


Capture

Discovery merges both Claude default roots and honors comma-separated overrides — same behavior as ccusage.

Host Default roots Env override
Claude Code ~/.claude/projects/ and ~/.config/claude/projects/ CLAUDE_CONFIG_DIR
Codex ~/.codex (uses state_5.sqlite.threads.rollout_path) CODEX_HOME

Codex roots are auto-classified:

  • sqlite_home: has state_5.sqlite → discovery via SQLite (preserves spawn-edge graph).
  • sessions_dir: has a sessions/ subdir but no SQLite → scan *.jsonl.
  • jsonl_dir: neither → treat the directory as a flat JSONL bag (e.g. codex exec --json output).

Commands

# Discover what's available on this machine
retro list                           # both hosts, with retention warning for old Claude logs
retro list --host claude
retro list --host codex
retro list --limit 50

# Import a single session
retro import claude --session-id <session-id>
retro import codex  --thread-id  <thread-id>
retro import claude --latest         # most-recent
retro import codex  --latest

# Import every discoverable session
retro import all                     # both hosts
retro import claude --all
retro import codex  --all
retro import all --limit-per-host 20

# Force overwrite an existing capture
retro import claude --latest --force

# Skip the markdown re-render step
retro import codex --latest --no-render

# Re-render markdown from an already-imported normalized stream
retro render claude <session-id>
retro render codex  <thread-id>

# Inspect what got captured
retro show claude <session-id>
retro show codex  <thread-id>

Multi-root example

# Pull in Claude logs from two locations:
CLAUDE_CONFIG_DIR="$HOME/.claude,$HOME/.config/claude,/backup/claude-archive" retro import claude --all

# Codex saved via `codex exec --json` (no SQLite present):
CODEX_HOME="$HOME/.codex,$HOME/codex-exec-logs" retro import codex --all

Retention warning

When the oldest discoverable Claude transcript is more than 25 days old, retro list prints a yellow warning. Claude Code retains logs for ~30 days by default — capture older sessions before they age out, or raise cleanupPeriodDays in Claude settings.


Signals

Signals are evaluators that observe a captured rollout and emit a comparable reading (numeric / boolean / categorical). They answer: "how did this rollout go, by some objective measure?"

Signals are grouped by intent, not implementation:

Group Question answered Examples
activity What did the agent do? command_count, unique_files_edited, web_search_count
outcome Did the work land? git_commits_made_during, failed_command_ratio, user_satisfaction_lexical
cost What did it cost (effort/tokens/time)? session_duration_seconds, user_correction_count, time_to_first_edit_seconds
risk How trustworthy was the capture/work? unknown_event_count, capture_gap_signal, secret_exposure_signal

Methods: heuristic (pure event walk), regex (text patterns), external (touches the world — e.g. git log). llm_judge is reserved for later.

Commands

# Inventory of registered signals
retro signal list
retro signal list --group outcome

# Compute signals for every imported session
retro signal run

# Restrict scope
retro signal run --host claude
retro signal run --host codex --session-id <id>
retro signal run --signal git_commits_made_during,failed_command_ratio

# Inspect readings for one session
retro signal show claude <session-id>
retro signal show codex  <thread-id>

Output

  • rollout-memory/signals/readings.jsonl — one row per (session, signal) reading.
  • rollout-memory/signals/aggregates.json — per-signal cross-session aggregates (mean / median / p90 / sum / histogram).
  • rollout-memory/signals/summary.md — human view.

Each reading carries evidence_refs (event_ids), confidence, and a method tag so it can be traced back to the rollout. External signals (e.g. git_commits_made_during) emit a null reading with a reason field when their context is missing (cwd gone, not a git repo) rather than silently skipping.

Adding a new signal

from retro.signals.base import SessionContext, reading, register

@register("my_signal", group="outcome", kind="boolean",
          method="heuristic", description="…")
def _my_signal(ctx: SessionContext):
    return reading(ctx, _my_signal, value=True, evidence=[some_event])

retro signal list/run/show and the dashboard pick it up automatically.


Mining

Mining turns captured rollouts into prompt-time memory candidates — paste-ready blocks for the next session. Multiple methods are registered; each produces the same MemoryCandidate shape so they compose. Filters run after methods to prune unsafe / redundant candidates.

Registered methods

Method What it produces Source paper
reme_refine_poc Success patterns, failure triggers, tool lessons, risk rules. ReMe
skill_pro Reusable skills with explicit activation / steps / termination / verification blocks. Skill-Pro
memp_procedural Whole-session procedural memory: goal / preconditions / steps / warnings / outcome. Memp
codex_headless LLM-backed extraction through codex exec --json; saves the headless Codex JSONL capture. local Codex CLI

Registered filters

Filter What it does
risk_aware Drops risk=high and low-confidence candidates, dedupes near-duplicates, caps at 8 items.

The default methods are deterministic and local. codex_headless is opt-in and calls codex exec --json on a compact, redacted rollout summary. It writes the Codex headless stream to rollout-memory/headless/codex_headless/<host>/<session-id>.codex.jsonl.

Commands

# Inventory
retro methods

# Single method, single session (default method is reme_refine_poc)
retro mine codex <thread-id>
retro mine claude <session-id> --method skill_pro
retro mine codex  <thread-id>  --method memp_procedural
retro mine codex  <thread-id>  --method codex_headless

# Run every registered method on one session
retro mine codex <thread-id> --method all

# Apply one or more filters after mining
retro mine codex <thread-id> --method skill_pro --filter risk_aware
retro mine codex <thread-id> --method all       --filter risk_aware

# Bulk mine
retro mine '*' '*' --method all --filter risk_aware
retro mine codex  '*' --method skill_pro
retro mine claude '*' --method memp_procedural --all

Output

For each (method, host, session_id):

  • rollout-memory/mined/<method>/<host>/<id>.json — full structured result (candidates, structured payloads, evidence refs, notes, filters applied).
  • rollout-memory/mined/<method>/<host>/<id>.prompt.md — paste-ready memory block tagged with <retro method="…">, sorted by priority and confidence.

Example skill_pro output

Skill: investigate and patch a bug
  Activation: gi ahead and fix the issues found here
  Steps:
    1. TaskCreate (Scan both ~/.claude/projects/ and ~/.config/claude/projects/…)
    2. read file (file_path=~/…/retro/importers/claude.py)
    3. edit file (file_path=~/…/retro/importers/claude.py)
    …
  Termination: after writing ~/…/Mem/specs/ccusage_comparison_spec.md
  Verification: (no explicit verification observed)

Adding a new method

# src/retro/mining/methods/my_method.py
from ..base import MiningContext, MiningResult, MemoryCandidate, register_method, memory_id

@register_method("my_method", description="…")
def mine_my_method(ctx: MiningContext) -> MiningResult:
    candidates = [
        MemoryCandidate(
            id=memory_id(ctx.session_id, "my_method", 1),
            method="my_method",
            kind="procedure",
            text="…",
            when_to_use="…",
            evidence_refs=[ev.event_id for ev in ctx.events[:3]],
            confidence=0.6,
            priority=4,
        )
    ]
    return MiningResult(
        session_id=ctx.session_id, host=ctx.host, method="my_method",
        task_summary=ctx.task_summary(), candidates=candidates,
    )

Import it in mining/methods/__init__.py. retro methods and --method pick it up.


Memory backend

The memory backend turns mined and authored memories into a local SQLite index. Flat files remain the source of truth; index.sqlite is a derived cache that can be deleted and rebuilt.

The index currently supports:

  • SQLite schema bootstrap with FTS5 keyword recall.
  • Reindexing from rollout-memory/memories/items.jsonl and existing mined artifacts.
  • [[wiki-link]] edge extraction and one-hop linked-memory expansion.
  • Authored markdown import with simple frontmatter.
  • Keyword retrieval with scope/repo filtering and value-aware reranking.
  • Utility updates through memories/events.jsonl.
  • Prompt-time context weaving.
  • Dashboard counts for indexed memory, status/scope/kind, top utility, and lifecycle events.

Commands

# Create rollout-memory/memories/ and an empty index
retro memory init

# Rebuild index.sqlite from files and mined artifacts
retro memory reindex

# Inspect index counts and dangling wiki-links
retro memory doctor

# Import hand-authored markdown memories
retro memory import-authored ~/path/to/memory-notes

# Retrieve ranked memories; include candidates by default while promotion matures
retro memory retrieve --query "pytest retrieval" --cwd /path/to/repo
retro memory retrieve --query "pytest retrieval" --cwd /path/to/repo --accepted-only

# Emit a compact prompt block
retro memory weave --query "pytest retrieval" --cwd /path/to/repo

# Update observed utility after a memory is used
retro memory update-utility --memory-id <id> --reward 0.8 --session-id <session-id>

Authored markdown

Authored memories are normal markdown files. Frontmatter is optional; missing fields get conservative defaults.

---
id: pytest-policy
kind: tool_lesson
scope: global
status: accepted
risk: low
when_to_use: Use when editing tests.
---
Run pytest after changing retrieval. Link to [[debugging-policy]] when relevant.

Accepted memories that contain prompt-injection markers, credential-looking strings, or invisible control characters are downgraded to needs_review on write.

Optional embeddings are still intentionally out of the core path for now; keyword retrieval works with no native dependencies.


Dashboard

A static, local HTML dashboard reads everything under rollout-memory/ and produces:

  • KPI strip (sessions, days, events, tool calls, tokens, est. cost).
  • Signals panel (color-coded by group, per-signal aggregates across sessions).
  • Activity-by-day bars.
  • Searchable / filterable session table.
  • Per-session drill-down with tabs: Summary, Models, Signals, Transcript, Memory.
  • Indexed memory sections for counts, top utility, lifecycle events, and source-session drill-downs when retro memory reindex has been run.

The Models tab shows the per-model token + cost breakdown (input / cache_create / cache_read / output / total / cost) for each session.

Preview

Retro dashboard overview

Retro dashboard session drill-down

Build

# Default mode: auto (use embedded cost when present, else compute from tokens)
retro dashboard build

# Force recomputation from token counts
retro dashboard build --mode calculate

# Show only provider-embedded costUSD; None when missing
retro dashboard build --mode display

# Direct script still works, including via env var
RETRO_COST_MODE=calculate python dashboard/build_dashboard.py

Then open dashboard/index.html directly from disk.

Cost computation

  • LiteLLM pricing snapshot at dashboard/pricing/litellm-pricing.json is the authoritative source for per-model rates (USD per token). The hardcoded DEFAULT_RATES table is the fallback for any model not in the snapshot.
  • Per-model attribution: each token delta is tagged with the active model (turn_context.model for Codex, per-message model for Claude). Multi-model sessions produce per-model rows.
  • Codex token deltas prefer the explicit info.last_token_usage per-turn delta when present, falling back to cumulative subtraction for older Codex builds.
  • Cached-input handling follows ccusage's behavior: subtract for Codex (its input_tokens is cumulative including cache), don't subtract for Anthropic (its input_tokens is already fresh non-cached).
  • Cost source labeling: each session records whether its cost was embedded (provider-supplied) or calculated (token math).

Numbers are always estimates, never billing truth.


Pricing snapshot

The dashboard pulls rates from dashboard/pricing/litellm-pricing.json — a curated subset of LiteLLM's model_prices_and_context_window.json. Refresh it with:

python dashboard/pricing/refresh.py

The refresh script:

  1. Reads the existing snapshot's model list (top-level keys).
  2. Downloads LiteLLM's upstream JSON.
  3. Replaces pricing fields for known models from upstream; leaves the local set otherwise.
  4. Updates _meta.snapshot_taken.

To add a model: add an empty entry under its name in the snapshot, then re-run refresh.py.


Release and publishing

CI lives in .github/workflows/ci.yml. It runs on pushes and pull requests to main, installs the package across Python 3.10-3.13, compiles src/retro, smoke-tests the CLI, builds the wheel/sdist, and validates them with twine check.

Publishing lives in .github/workflows/publish.yml. It runs when a GitHub release is published, builds the package, publishes it to PyPI with trusted publishing, and attaches the wheel/sdist to the GitHub release.

One-time PyPI setup:

  1. Create or claim the PyPI project retro-agent-memory.
  2. Add a trusted publisher for repository sajjadGG/retro.
  3. Use workflow .github/workflows/publish.yml.
  4. Use environment pypi.

Release flow:

# bump version in pyproject.toml first
git tag v0.1.0
git push origin v0.1.0

Then publish a GitHub release for that tag. The release event triggers the PyPI publish job.


Environment

Variable Effect
CLAUDE_CONFIG_DIR Override Claude data roots (comma-separated). Wins over the default ~/.claude + ~/.config/claude.
CODEX_HOME Override Codex roots (comma-separated). Each root is classified into sqlite_home/sessions_dir/jsonl_dir.
RETRO_COST_MODE Default --mode for the dashboard builder. One of auto, calculate, display.

Notes and guarantees

  • Local-only. Nothing is uploaded.
  • Raw is immutable. Re-imports refuse to clobber raw/ unless --force is passed.
  • Capture gaps are reported, not hidden. Unknown event types become unknown normalized events with the original payload preserved; counts surface in retro show, the unknown_event_count signal, and the dashboard.
  • Everything is evidence-linked. Signal readings and mined memories carry the event_ids they were derived from, so the dashboard can trace any claim back to the source rollout.
  • Composable. Capture, signals, mining, and the dashboard each consume the layer below from disk. Re-run any stage independently.
  • Designed beyond ccusage on purpose. ccusage reports tokens and costs. This project captures the whole rollout, evaluates it with signals, and turns it into reusable behavioral memory. See specs/ccusage_comparison_spec.md for the full comparison.

Project layout

src/retro/
  schema.py              # NormalizedEvent + read/write helpers
  storage.py             # artifact layout
  cli.py                 # Typer CLI (`retro`)
  renderer.py            # Markdown view
  importers/
    claude.py            # Claude Code transcript reader (multi-root + env override)
    codex.py             # Codex SQLite + JSONL reader (multi-root)
  signals/
    base.py, runner.py, heuristics.py, external.py
  mining/
    base.py              # MemoryCandidate, MiningResult, registries
    render.py            # prompt-block markdown rendering + write helpers
    methods/
      reme_refine.py     # success patterns + failure triggers + risk rules
      skill_pro.py       # reusable skill blocks
      memp_procedural.py # whole-session procedural memory
    filters/
      risk_aware.py      # prune high-risk / low-confidence / near-duplicates

dashboard/
  build_dashboard.py     # static HTML + rollouts.json builder
  pricing/
    litellm-pricing.json # vendored LiteLLM rates
    refresh.py           # refresh the snapshot from upstream
  README.md
  index.html             # generated
  data/rollouts.json     # generated

specs/                    # design docs (see top of this README)
rollout-memory/          # captured artifacts (gitignored)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

retro_ai-0.1.0.tar.gz (4.3 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

retro_ai-0.1.0-py3-none-any.whl (111.7 kB view details)

Uploaded Python 3

File details

Details for the file retro_ai-0.1.0.tar.gz.

File metadata

  • Download URL: retro_ai-0.1.0.tar.gz
  • Upload date:
  • Size: 4.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for retro_ai-0.1.0.tar.gz
Algorithm Hash digest
SHA256 92b877ec1191d3ff43f0efeeb601d29179a3d51614d39e19d09c7b7299b0e782
MD5 e77b246bf1be0fae6ba62c1c65b2abd8
BLAKE2b-256 2e8655e3764e38d5401105ca147bf6b2b89c95452d065b2e929fe53fb3a8fd64

See more details on using hashes here.

Provenance

The following attestation bundles were made for retro_ai-0.1.0.tar.gz:

Publisher: publish.yml on sajjadGG/retro

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file retro_ai-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: retro_ai-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 111.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for retro_ai-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 922148ac8e442e9bee972c7cd7cef79842745643f99ae142a14d3e2349092abe
MD5 1512dc92f82e40250e37c9cb5fa20089
BLAKE2b-256 ed2f39e671e4a150e66503528f5ff068976fadb0fe29dab4d752109ef087c867

See more details on using hashes here.

Provenance

The following attestation bundles were made for retro_ai-0.1.0-py3-none-any.whl:

Publisher: publish.yml on sajjadGG/retro

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page