
state-trace

Graph-native working memory for coding agents: typed memories, causal retrieval, bounded capacity, and compact briefs for small models.

state-trace is a bounded working-memory layer for coding and debugging agents that need the right file, failure, and next action under tight token budgets. It is not a replacement for a general-purpose temporal knowledge graph like Graphiti — see ARCHITECTURE.md for the honest comparison.

What it is optimized for:

  • artifact-first retrieval for coding agents
  • current-vs-stale task state (engine.current_state(), engine.failed_hypotheses())
  • compact harness-facing briefs for smaller models
  • online agent loops and post-hoc trajectory ingestion
  • bounded memory with decay, compression, and lifecycle retention
  • MCP-mountable, local-first deployment

Headline: SWE-bench-Verified localization — n=500

The credibility benchmark. Cold-start artifact localization on the full SWE-bench-Verified test split: given only the GitHub issue text and hints (no trajectory), rank the correct patch file in the top 1 and the top 5.

pip install -e ".[bench]"
python3 examples/swebench_verified_eval.py --limit 500 --backends no_memory bm25 state_trace graphiti
backend      n    Artifact@1 [95% CI]     Artifact@5 [95% CI]     AvgLatencyMs
no_memory    500  0.000 [0.000, 0.000]    0.000 [0.000, 0.000]            0.01
bm25         500  0.176 [0.144, 0.208]    0.300 [0.262, 0.338]            0.19
state_trace  500  0.216 [0.182, 0.252]    0.322 [0.284, 0.362]           27.43
graphiti     500  0.098 [0.072, 0.126]    0.216 [0.182, 0.254]         5427.39

What this says, plainly:

  • state_trace leads every baseline on both Artifact@1 and Artifact@5.
  • vs. Graphiti: non-overlapping 95% CIs on both metrics (0.216 vs 0.098 on A@1; 0.322 vs 0.216 on A@5). On the same input with the same deterministic embedder/reranker stub, the typed coding-agent ontology plus cold-start lexical fallback localizes the right file and puts it in the top 5 meaningfully more often.
  • vs. BM25: a real but narrower lead. A@1 0.216 vs 0.176 — 95% CIs just barely overlap (BM25 upper bound 0.208, state_trace lower bound 0.182), so it's a consistent directional win but not a statistical blowout. A@5 0.322 vs 0.300 — CIs overlap substantially, call it a tie with state_trace nosing ahead. The practical takeaway: state_trace's coding-agent ontology matches BM25's simple lexical coverage on cold-start and beats it when a trajectory is available (see BENCHMARKS.md).
  • Latency: state_trace retrieves in ~27ms vs BM25's ~0.2ms vs Graphiti's ~5,400ms. For per-action memory lookups in an agent loop, the ~200× delta over Graphiti compounds meaningfully over a long session.
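The interval claims above are easy to verify mechanically. A tiny sketch that checks overlap between the 95% CIs copied from the results table (overlap test only; it does not re-derive the intervals):

```python
# Overlap check for the 95% confidence intervals reported in the
# results table above (values copied verbatim from that table).
def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]

state_trace_a1 = (0.182, 0.252)
graphiti_a1 = (0.072, 0.126)
bm25_a1 = (0.144, 0.208)

print(overlaps(state_trace_a1, graphiti_a1))  # False: clear separation
print(overlaps(state_trace_a1, bm25_a1))      # True: narrow overlap
```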

The A@5 ≡ A@1 collapse that appeared in v0.2.0 is fixed in v0.2.1 by a lexical file-path fallback in retrieve_brief: when the graph has fewer than 5 file nodes, it pulls candidate paths from the query and from top-scored nodes' issue_text metadata, including paths embedded in GitHub blob URLs.
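The fallback is straightforward to picture. A minimal sketch of lexical file-path extraction from free text, including paths embedded in GitHub blob URLs (hypothetical helper, not the actual retrieve_brief internals):

```python
import re

# Hypothetical sketch of a lexical file-path fallback: scan free text
# (query plus issue_text metadata) for path-like tokens, including
# paths embedded in GitHub blob URLs. Not state-trace's real code.
_BLOB_RE = re.compile(r"github\.com/[\w.-]+/[\w.-]+/blob/[\w.-]+/([\w./-]+)")
_PATH_RE = re.compile(r"\b[\w./-]*\w+\.(?:py|ts|js|go|rs|java|c|cpp|h)\b")

def candidate_files(text: str, limit: int = 5) -> list[str]:
    seen: dict[str, None] = {}          # dict preserves insertion order
    for m in _BLOB_RE.finditer(text):
        seen.setdefault(m.group(1))
    # Strip URLs so the generic path scan does not re-match them.
    stripped = re.sub(r"https?://\S+", " ", text)
    for m in _PATH_RE.finditer(stripped):
        seen.setdefault(m.group(0))
    return list(seen)[:limit]
```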

Caveats

  • Graphiti is run with a deterministic hash-embedder and BM25 + cosine + BFS → RRF reranker (no LLM entity extraction). That's the same simplification graphiti_head_to_head_eval.py uses for reproducibility without API keys. A full Graphiti pipeline with GPT-4-class extraction might close some of the gap, at materially higher cost per ingest.
  • Cold-start localization from issue text is only one axis. Trajectory-informed retrieval (BENCHMARKS.md) is where state_trace's larger advantage lives.

What makes the architecture different

Typed coding-agent ontology, not generic Entity/Edge:

  • Nodes: task, observation, decision, file, goal, session, command, test, symbol, patch_hunk, error_signature, episode
  • Edges: patches_file, fails_in, verified_by, rejected_by, supersedes, contradicts, solves, derived_from, precedes, motivates, and more
  • Intent routing: the retrieval scorer re-prioritizes edge types per query intent (locate_file, failure_analysis, history, general).
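Intent routing can be pictured as per-intent weight tables over edge types. The intents match the list above; the weights and scoring function below are made up for illustration, not state-trace's actual scorer:

```python
# Illustrative intent-routing sketch: each query intent boosts the edge
# types most likely to answer it. Weights here are invented values.
EDGE_WEIGHTS = {
    "locate_file":      {"patches_file": 3.0, "fails_in": 2.0, "derived_from": 1.5},
    "failure_analysis": {"fails_in": 3.0, "contradicts": 2.5, "rejected_by": 2.0},
    "history":          {"precedes": 3.0, "supersedes": 2.5},
    "general":          {},  # no re-prioritization
}

def score_edge(intent: str, edge_type: str, base_score: float) -> float:
    weight = EDGE_WEIGHTS.get(intent, {}).get(edge_type, 1.0)
    return base_score * weight
```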

Bounded working memory as a first-class constraint:

  • enforce_capacity() runs decay, compression, and summarization on every step.
  • current_state(session) answers "what's live right now" directly — cheap for state-trace, expensive for a general-purpose knowledge graph.
  • failed_hypotheses(session) returns invalidated, superseded, or unrecovered-error nodes — the "don't propose this again" signal.
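A toy sketch of the capacity idea, assuming exponential recency decay and greedy eviction. This is illustrative only; the real enforce_capacity also does compression and summarization:

```python
import math

# Toy bounded working memory (not state-trace's actual enforce_capacity):
# weight each node by importance discounted by recency, then keep the
# strongest nodes greedily until the total weight fits the budget.
def effective_weight(importance: float, age_s: float, half_life_s: float = 600.0) -> float:
    return importance * math.exp(-math.log(2) * age_s / half_life_s)

def enforce_capacity(nodes: list[dict], budget: float, now: float) -> list[dict]:
    ranked = sorted(
        nodes,
        key=lambda n: effective_weight(n["importance"], now - n["created"]),
        reverse=True,
    )
    kept, used = [], 0.0
    for node in ranked:
        w = effective_weight(node["importance"], now - node["created"])
        if used + w <= budget:
            kept.append(node)
            used += w
    return kept
```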

Local-first, MCP-mountable:

  • The hot graph is an in-process networkx.MultiDiGraph. Cold storage is SQLite in WAL mode with an FTS5 index.
  • state-trace-mcp is a stdio MCP server you can mount in Claude Code / Cursor / Codex CLI.

See ARCHITECTURE.md for why these choices matter vs. Graphiti, and BENCHMARKS.md for the smaller repo-local benchmarks.

vs. Graphiti

Graphiti is the stronger general-purpose temporal knowledge graph for AI agents. state-trace is narrower: working memory for one coding/debugging session at a time. We're not claiming to replace Graphiti — we're claiming a specific lane where the tradeoffs land differently.

Each axis below is concrete and measured, not a vibe; "winner" is judged for the coding-agent use case.

  • Artifact@1 on SWE-bench-Verified (n=500): state-trace 0.216 [0.182, 0.252] vs Graphiti 0.098 [0.072, 0.126]. Winner: state-trace (non-overlapping 95% CIs).
  • Artifact@5 on SWE-bench-Verified (n=500): state-trace 0.322 [0.284, 0.362] vs Graphiti 0.216 [0.182, 0.254]. Winner: state-trace (non-overlapping 95% CIs).
  • Per-retrieval latency (same benchmark): state-trace 27 ms vs Graphiti 5,427 ms. Winner: state-trace (~200× faster).
  • Write path per agent step: state-trace does a typed insert with zero LLM calls; Graphiti runs add_episode → LLM entity extraction on each step. Winner: state-trace (cheaper, deterministic, no API key).
  • Default deploy: state-trace is pure Python with local SQLite/JSON and a state-trace-mcp stdio binary; Graphiti needs a Neo4j / Kuzu / FalkorDB graph DB plus an embedder and an LLM. Winner: state-trace (local-first, no external services).
  • Coding-agent ontology: state-trace has typed file, patch_hunk, error_signature, test, command, symbol, observation, decision, task, goal, session, and episode nodes; Graphiti has generic EntityNode / EntityEdge / EpisodicNode. Winner: state-trace (the retrieval scorer routes on these types).
  • "What's true right now in this session?": state-trace answers with engine.current_state(session), a direct O(graph) query; Graphiti infers it from temporal facts via Cypher or an LLM. Winner: state-trace (first-class API).
  • "What have I already tried and rejected?": state-trace answers with engine.failed_hypotheses(session), a direct query returning invalid_at, superseded, and unrecovered-error nodes; Graphiti has to infer it from invalid_at plus contradictions. Winner: state-trace (first-class API).
  • Working-memory capacity bound: state-trace runs enforce_capacity with decay, compression, and lifecycle retention (long-horizon pressure benchmark: Artifact@1 0.771 while staying within a 96-unit budget 100% of the time); Graphiti is unbounded by design and relies on the graph DB to scale. Winner: state-trace for long debugging sessions that need a memory ceiling.
  • Small-model brief: state-trace's retrieve_brief produces a ~220-token structured brief (patch_file, rerun_command, tests_to_rerun, failed_attempts, recommended_actions, …) that fits a tight budget; Graphiti returns raw nodes/facts and leaves compression to the caller. Winner: state-trace (built for small-model harnesses).
  • MCP-mountable: state-trace ships the state-trace-mcp stdio server in the [mcp] extra, 11 tools exposed, drop-in for ~/.claude/settings.json; Graphiti has no official MCP server and is library-first. Winner: state-trace (plugs straight into Claude Code / Cursor / Codex / opencode).
  • Long-lived temporal knowledge across weeks: state-trace is scoped to a session or repo namespace with no cross-namespace fact merging; Graphiti makes this first-class with bi-temporal validity, contradiction resolution, and fact supersession across episodes. Winner: Graphiti.
  • Multi-tenant SaaS scale: state-trace has a single-writer process model with the authoritative graph in-process in networkx; Graphiti is built for it on a Neo4j/Kuzu substrate. Winner: Graphiti.
  • Cross-session learning about users / orgs / policies: out of scope for state-trace; first-class in Graphiti. Winner: Graphiti.

When to pick which

Use state-trace when:

  • Your agent is editing code in a single debugging or refactoring session.
  • You talk to an MCP client (Claude Code, Cursor, Codex CLI, opencode) and want working memory without standing up a graph DB.
  • Per-action latency matters — you're calling memory on every tool invocation in an agent loop.
  • You run on small models where a 220-token structured brief beats a 1,000-token raw dump.
  • You need "what file should I patch / what did I already try" to be a direct query, not inferred.

Use Graphiti when:

  • You need a knowledge graph of facts about the world, users, or an organization that evolves across weeks.
  • Multi-tenant, multi-agent shared memory is part of the design.
  • You're willing to run Neo4j/Kuzu and pay the LLM-extraction cost per ingest for the ontological payoff.
  • Your retrieval patterns are richer than "which file, which test, which failed hypothesis."

They solve adjacent problems. The only reason a comparison is even interesting is that both ship as "memory for AI agents" — the honest answer is they're different products that happen to live on the same shelf.

Installation

uv sync                       # or: pip install -e .
pip install -e ".[mcp]"       # stdio MCP server for Claude Code / Cursor / Codex CLI
pip install -e ".[bench]"     # graphiti-core[kuzu] + datasets (for the headline benchmark)
pip install -e ".[llm]"       # OpenAI-backed live benchmarks + LLM ingestion
pip install -e ".[adapters]"  # LangGraph / LlamaIndex adapter shims
pip install -e ".[api]"       # FastAPI app

Distribution name: state-trace. Python import path: state_trace.

Quickstart

from state_trace import MemoryEngine

engine = MemoryEngine(capacity_limit=24.0, storage_path="memory.json")

task = engine.store(
    "Fix login by tracing the refresh token path",
    {"type": "task", "session": "auth-debug", "goal": "restore login", "file": "auth.ts", "importance": 0.92},
)
engine.store(
    "Login still returns 401 after refresh token exchange",
    {"type": "observation", "session": "auth-debug", "goal": "restore login", "file": "auth.ts",
     "blocks": [task.id], "importance": 0.88},
)
engine.store(
    "Authorization header is dropped before the retry request reaches auth.ts",
    {"type": "decision", "session": "auth-debug", "goal": "restore login",
     "related_to": [task.id], "file": "auth.ts", "importance": 0.91},
)

result = engine.retrieve("Why is login still broken?", {"session": "auth-debug", "goal": "restore login"})

Current state, live hypotheses, failed attempts

The architectural wedge. These APIs return a live view of the session without re-ranking:

state = engine.current_state(session="auth-debug", goal="restore login")
# → {"active_task": ..., "latest_observation": ..., "active_files": [...], ...}

failures = engine.failed_hypotheses(session="auth-debug")
# → [{"id": ..., "reason": ["superseded"], "content": "Login still returns 401 ..."}, ...]

current_state filters out invalidated and superseded nodes; failed_hypotheses surfaces them as "do not propose again" context. A general-purpose temporal graph has to infer this from fact updates; here it's a direct query.

MCP Server

pip install -e ".[mcp]"
state-trace-mcp

Environment config:

  • STATE_TRACE_STORAGE_PATH — durable path; .db/.sqlite uses the SQLite backend. Default: ~/.state-trace/memory.db.
  • STATE_TRACE_NAMESPACE — default namespace (e.g. the repo slug).
  • STATE_TRACE_CAPACITY_LIMIT — working-memory budget (default 256).

Tools exposed: store, retrieve, retrieve_brief, record_action, record_observation, record_test_result, ingest_agent_log_file, current_state, failed_hypotheses, list_namespaces, graph_snapshot.

Example Claude Code config (~/.claude/settings.json):

{
  "mcpServers": {
    "state-trace": {
      "command": "state-trace-mcp",
      "env": {
        "STATE_TRACE_STORAGE_PATH": "/Users/me/.state-trace/memory.db",
        "STATE_TRACE_NAMESPACE": "repo-x"
      }
    }
  }
}

Online agent loop

engine = MemoryEngine(capacity_limit=256.0)
ctx = {"session": "auth-debug", "goal": "restore login", "repo": "example/auth-service"}

engine.record_action('open "src/auth.ts"', {**ctx, "files": ["src/auth.ts"]})
engine.record_observation(
    "AttributeError: login still fails with a 401 in src/auth.ts",
    {**ctx, "files": ["src/auth.ts"], "status": "error"},
)
engine.record_action('edit "src/auth.ts"', {**ctx, "files": ["src/auth.ts"], "action_kind": "edit"})
engine.record_test_result(
    "pytest tests/test_auth.py::test_refresh_retry",
    "tests/test_auth.py::test_refresh_retry PASSED",
    {**ctx, "files": ["src/auth.ts", "tests/test_auth.py::test_refresh_retry"]},
)

brief = engine.retrieve_brief(
    "Which file should I patch and what test should I rerun?",
    {"session": "auth-debug", "goal": "restore login"},
    mode="small_model",
)

The brief fields: patch_file, rerun_command, target_files, tests_to_rerun, current_state, failed_attempts, recommended_actions, evidence, symbols, patch_hints, confidence, token_estimate.
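The brief is meant to drop straight into a small model's prompt. A sketch of rendering such a brief into a compact prompt block, using field names from the list above with made-up values:

```python
# Render a retrieve_brief-style dict into a compact prompt block.
# Field names match the documented brief; the values are invented.
brief = {
    "patch_file": "src/auth.ts",
    "rerun_command": "pytest tests/test_auth.py::test_refresh_retry",
    "failed_attempts": ["retry without Authorization header"],
    "recommended_actions": ["re-attach Authorization header before retry"],
}

def render_brief(brief: dict) -> str:
    lines = [f"patch_file: {brief['patch_file']}",
             f"rerun: {brief['rerun_command']}"]
    lines += [f"do-not-retry: {a}" for a in brief.get("failed_attempts", [])]
    lines += [f"next: {a}" for a in brief.get("recommended_actions", [])]
    return "\n".join(lines)
```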

Trajectory ingestion

engine = MemoryEngine(capacity_limit=256.0)
engine.store_agent_log_file("examples/data/agent_logs/marshmallow__marshmallow-1867.json")

Supported inputs: normalized agent_log JSON, raw SWE-agent .traj files, raw OpenHands event JSON logs.

Live solve-rate (next credibility step)

examples/swebench_verified_solve_rate.py scaffolds end-to-end solve-rate measurement: state-trace brief → LLM patch proposal → SWE-bench-Verified prediction JSONL. It does not run the SWE-bench docker harness; that step is documented in the script's header.

python3 examples/swebench_verified_solve_rate.py --limit 5 --model gpt-5.1-mini --dry-run

Storage backends

MemoryEngine(storage_path=...) picks the backend from the file extension:

  • .db / .sqlite / .sqlite3 — durable SQLite with WAL journal + FTS5 seed index. Recommended for long-running agent harnesses.
  • any other path — JSON blob (simple, single-writer, fine for benchmarks).
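The extension rule above is simple enough to sketch (illustrative only, not the engine's actual dispatch code):

```python
from pathlib import Path

# Sketch of extension-based backend selection, mirroring the rule above.
SQLITE_EXTS = {".db", ".sqlite", ".sqlite3"}

def backend_for(storage_path: str) -> str:
    return "sqlite" if Path(storage_path).suffix in SQLITE_EXTS else "json"
```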

See ARCHITECTURE.md for the "why networkx + SQLite, not Neo4j" explainer.

Namespaces

engine = MemoryEngine(storage_path="memory.db", namespace="payments-api")
engine.retrieve("why is login broken?")  # scoped to payments-api by default
engine.retrieve("...", include_all_namespaces=True)  # opt out

Nodes without a namespace remain visible in every view so pre-namespace data is not lost.
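The visibility rule can be sketched as a filter (illustrative, not the real implementation): nodes tagged with another namespace are hidden, while untagged pre-namespace nodes always pass.

```python
# Sketch of namespace visibility: keep nodes in the requested namespace
# plus nodes with no namespace at all; include_all opts out of filtering.
def visible(nodes: list[dict], namespace: str, include_all: bool = False) -> list[dict]:
    if include_all:
        return list(nodes)
    return [n for n in nodes if n.get("namespace") in (None, namespace)]
```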

Framework adapters

from state_trace.adapters import StateTraceLangGraphMemory, StateTraceLlamaIndexMemory

lg_memory = StateTraceLangGraphMemory(default_session="coding-session")
li_memory = StateTraceLlamaIndexMemory(session_id="agent-session")

Neither adapter imports the host framework; they satisfy the duck-typed memory contract used by each.

FastAPI

from state_trace.api import app  # POST /store, /retrieve, /retrieve_brief, GET /graph

Pass "explain": true on retrieve to include per-node score breakdowns.

Tests

python3 -m pytest -q

Benchmarks

Full set of repo-local benchmarks and their honest caveats lives in BENCHMARKS.md. The SWE-bench-Verified row above is the only one that's at a scale worth citing externally.

Positioning

See vs. Graphiti above for the head-to-head comparison and ARCHITECTURE.md for the architecture tradeoffs in detail. tl;dr: different products, adjacent problems — state-trace owns the narrow coding-agent working-memory lane; Graphiti owns weeks-of-history temporal knowledge graphs.
