state-trace
Graph-native working memory for coding agents: typed memories, causal retrieval, bounded capacity, and compact briefs for small models.
state-trace is not a generic graph database or a thin wrapper around embeddings. It is a bounded working-memory layer for coding and debugging agents that need the right file, failure, and next action under tight token budgets.
Instead of stuffing text chunks into a vector index and hoping cosine similarity finds the right context, state-trace stores typed memories, typed causal links, and capacity-aware state that can be traversed like working memory.
What it is optimized for:
- artifact-first retrieval for coding agents
- current-vs-stale task state
- compact harness-facing briefs for smaller models
- online agent loops and post-hoc trajectory ingestion
- bounded memory with decay, compression, and lifecycle retention
Current repo-local benchmark snapshot:
- Held-out live benchmark (12 trajectories / 4 task groups, small live mode): state_trace reaches Success 0.417 and StepAcc 0.502, ahead of hybrid (0.083/0.131) and graphiti (0.000/0.085).
- Live replay benchmark (4 tasks, default small mode): state_trace reaches Success 1.000 and StepAcc 1.000.
- Long-horizon pressure benchmark (capacity_limit=96): state_trace keeps Artifact@1 0.771 while staying within budget at every checkpoint.
If you want to validate the current benchmark surface quickly:
python3 examples/heldout_live_benchmark.py
python3 examples/codex_live_benchmark.py --dry-run
python3 examples/swebench_verified_eval.py --dry-run
Mount it inside an agent harness:
pip install -e ".[mcp]"
state-trace-mcp
Why this beats vector DBs for coding-agent memory
- It stores typed nodes such as task, observation, decision, file, goal, session, command, test, symbol, patch_hunk, and error_signature, not anonymous chunks.
- Retrieval follows graph structure and causal proximity, so agents can recover chains like observation -> task -> file.
- Memory is capacity-limited. Old, low-value memories decay, compress, or get removed instead of growing without bound.
- Retrieval is state-aware. Active session, goal, and task context directly influence what the engine returns.
- Agent trajectories retain provenance and temporal state, so the engine can distinguish current facts from superseded failures.
- Embeddings are optional for future seeding, but the core engine already works without them.
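To make the decay idea concrete, here is a minimal sketch of time-based down-weighting. The exponential schedule and the half_life parameter are assumptions for illustration; the engine's actual decay logic may differ:

```python
import math

def decayed_importance(importance: float, age_seconds: float,
                       half_life: float = 3600.0) -> float:
    # Illustrative exponential decay: importance halves every `half_life`
    # seconds. This only demonstrates the bounded-memory idea, not the
    # engine's real schedule.
    return importance * math.exp(-math.log(2.0) * age_seconds / half_life)
```

Under this sketch, a memory stored an hour ago with importance 1.0 would score 0.5 at retrieval time, letting fresher, higher-value memories win capacity.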
Core ideas
- Node: a typed memory unit with importance, recency, access_count, and capacity metadata.
- Edge: a typed causal relationship such as blocks, solves, depends_on, or related_to.
- Episode: a raw provenance node for an ingested agent-log step or issue context.
- GraphManager: an in-memory networkx graph with durable JSON or SQLite+FTS5 persistence.
- MemoryEngine: the main entrypoint for storing events, linking memories, applying decay, retrieving causal context, and scoping by namespace.
Installation
uv sync
Or with pip:
pip install -e .
Distribution name: state-trace
Python import path: state_trace
Benchmark/model label in scripts: state_trace
Optional extras:
pip install -e ".[mcp]" # stdio MCP server for Claude Code / Codex CLI
pip install -e ".[bench]" # graphiti + datasets (SWE-bench)
pip install -e ".[llm]" # OpenAI-backed live benchmarks + LLM ingestion
pip install -e ".[adapters]" # LangGraph / LlamaIndex adapter shims
To use the local Codex CLI harness instead of an API key, ensure the CLI is logged in:
codex login status
For API-backed runs, set:
export OPENAI_API_KEY=...
Quickstart
from state_trace import MemoryEngine
engine = MemoryEngine(capacity_limit=24.0, storage_path="memory.json")
task = engine.store(
"Fix login by tracing the refresh token path",
{"type": "task", "session": "auth-debug", "goal": "restore login", "file": "auth.ts", "importance": 0.92},
)
observation = engine.store(
"Login still returns 401 after refresh token exchange",
{
"type": "observation",
"session": "auth-debug",
"goal": "restore login",
"file": "auth.ts",
"blocks": [task.id],
"importance": 0.88,
},
)
engine.store(
"Authorization header is dropped before the retry request reaches auth.ts",
{
"type": "decision",
"session": "auth-debug",
"goal": "restore login",
"related_to": [task.id, observation.id],
"file": "auth.ts",
"importance": 0.91,
},
)
result = engine.retrieve("Why is login still broken?", {"session": "auth-debug", "goal": "restore login"})
print(result)
Example retrieval output:
{
"summary": "Most relevant memories: observation: Login still returns 401 after refresh token exchange | task: Fix login by tracing the refresh token path | file: auth.ts",
"nodes": [
{"id": "observation:login-still-returns-401-after-refresh", "type": "observation", "score": 0.82},
{"id": "task:fix-login-by-tracing-the-refresh-token", "type": "task", "score": 0.78},
{"id": "file:auth-ts", "type": "file", "score": 0.71},
],
"chains": [
["observation:login-still-returns-401-after-refresh", "task:fix-login-by-tracing-the-refresh-token", "file:auth-ts"]
],
"reasoning": "These nodes were selected because they matched the query, the active session, and the surrounding causal graph."
}
Real-world agent logs
state-trace can ingest released agent trajectories instead of only synthetic events.
from state_trace import MemoryEngine
engine = MemoryEngine(capacity_limit=256.0)
engine.store_agent_log_file("examples/data/agent_logs/marshmallow__marshmallow-1867.json")
result = engine.retrieve(
"Which file fixed the 344 vs 345 milliseconds bug?",
{"session": "swe-agent-marshmallow-1867", "repo": "marshmallow-code/marshmallow"},
)
print(result["nodes"][0]["content"])
Supported inputs:
- normalized agent_log JSON fixtures
- raw SWE-agent .traj files
- raw OpenHands event JSON logs
Bulk log ingestion keeps explicit sequence, file, and error structure:
- issue -> task node
- action step -> decision node
- raw command -> command node
- targeted validation -> test node
- touched identifiers -> symbol node
- emitted failure signature -> error_signature node
- submitted diff hunk -> patch_hunk node
- command/test output -> observation node
- raw step -> episode provenance node
- edited or inspected files -> stable file anchors
- typed edges: precedes, causes, motivates, validates, verified_by, rejected_by, contradicts, solves, supersedes, and derived_from
The retrieval payload now also carries provenance for ranked nodes, so a caller can trace a returned file or observation back to the concrete episode(s) that produced it.
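As an illustration of that provenance idea, walking derived_from edges back to the originating episodes can be sketched over plain edge tuples. This is a generic traversal for illustration, not the package's internal API, and the node IDs are hypothetical:

```python
def provenance_episodes(node_id, edges):
    """Collect every node reachable from `node_id` via derived_from edges.

    `edges` is a list of (src, relation, dst) tuples; in state-trace terms
    the dst side would be the episode nodes that produced the memory.
    """
    found, stack = [], [node_id]
    while stack:
        current = stack.pop()
        for src, relation, dst in edges:
            if src == current and relation == "derived_from":
                found.append(dst)
                stack.append(dst)
    return found

edges = [
    ("file:auth-ts", "derived_from", "episode:step-3"),
    ("episode:step-3", "derived_from", "episode:raw-0"),
    ("file:auth-ts", "blocks", "task:fix-login"),  # ignored: wrong relation
]
```

Calling provenance_episodes("file:auth-ts", edges) on this toy graph yields the two episode nodes, most recent first.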
Vector DB Failure Modes
Run the small pinned-log benchmark:
python3 examples/real_world_agent_log_benchmark.py
The benchmark compares:
- state-trace: typed nodes + edge-aware causal traversal + state-aware reranking
- flat vector baseline: chunked log text + cosine top-k retrieval
On the pinned real-world trajectories in examples/data/agent_logs, the graph engine wins on rank-1 actionability:
- Marshmallow TimeDelta bug: graph returns src/marshmallow/fields.py first; the vector baseline returns the issue chunk first because it contains all the right tokens but not the actionable artifact.
- Pydicom PixelRepresentation bug: graph returns pydicom/pixel_data_handlers/numpy_handler.py first and also surfaces the failing AttributeError observation; the vector baseline ranks the issue text and edit-noise chunks ahead of the patch target.
This is the key difference: vector search can retrieve text that mentions the answer, while causal memory retrieves the artifact and the causal path around it.
For a broader offline evaluation over released SWE-agent patch trajectories:
python3 examples/offline_retrieval_eval.py
This compares four retrieval styles:
- bm25: flat lexical chunk retrieval
- dense_cosine: local dense-style cosine retrieval over chunk embeddings
- hybrid: lexical + dense chunk reranking
- state-trace: typed nodes + edge-aware causal traversal + state-aware reranking
Current results on the local SWE-agent checkout at /tmp/SWE-agent:
12 patch trajectories, 4 unique tasks
The benchmark output uses state_trace as the model label because it mirrors the Python package identifier.
| Model | Recall@1 | Recall@3 | Recall@5 | MRR | Artifact@1 |
|---|---|---|---|---|---|
| bm25 | 0.750 | 1.000 | 1.000 | 0.875 | 0.000 |
| dense_cosine | 0.375 | 1.000 | 1.000 | 0.688 | 0.000 |
| hybrid | 0.750 | 1.000 | 1.000 | 0.875 | 0.000 |
| state_trace | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Interpretation:
- Flat chunk retrievers often find text that mentions the patch file somewhere in the log, so their mention-level recall can look strong.
- They still fail on Artifact@1: the top result is usually the issue chunk or an explanatory chunk, not the actual file node an agent should act on.
- state-trace is the only system here that consistently returns the actionable artifact itself at rank 1 while preserving the surrounding causal chain.
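For reference, the two headline metrics are simple to state per query. A sketch using standard definitions (the benchmark scripts may aggregate across tasks differently):

```python
def artifact_at_1(ranked_ids, gold_id):
    # 1.0 iff the top-ranked node is the actionable artifact itself.
    return 1.0 if ranked_ids and ranked_ids[0] == gold_id else 0.0

def mrr(ranked_ids, gold_id):
    # Reciprocal rank of the first correct hit; 0.0 if it never appears.
    for rank, node_id in enumerate(ranked_ids, start=1):
        if node_id == gold_id:
            return 1.0 / rank
    return 0.0
```

This is why a retriever can score well on Recall@3 (the gold file appears somewhere near the top) while scoring 0.000 on Artifact@1 (it is never the first result).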
Graphiti Gap Analysis
If you compare this repo to Graphiti, the right framing is not "replace Graphiti everywhere." Graphiti is the stronger general-purpose open-source temporal context graph for AI agents. The better bet for state-trace is a narrower wedge: bounded working memory for coding/debugging agents.
What Graphiti has that mattered most:
- first-class temporal state modeling
- explicit provenance episodes
- hybrid retrieval beyond naive keyword overlap
What this repo now adds or closes:
- temporal validity on nodes and edges via event_at, valid_at, and invalid_at
- explicit episode provenance nodes with derived_from links for every ingested agent-log step
- explicit supersedes transitions when newer observations or edits replace stale task state
- hybrid seeding that combines BM25-style lexical scoring, exact file/path matching, state signals, and edge-aware graph traversal
- bounded working memory with decay, compression, and summarization as a first-class design constraint
What still remains a real gap versus Graphiti:
- richer ontology/schema support
- stronger temporal reasoning over fact updates beyond coding-agent trajectories
- production-scale storage/backends and operational tooling
The intended lane for this repo is therefore:
- not "best general temporal graph memory"
- but "best local-first causal working memory for coding agents that need the right artifact at rank 1"
Small-Model Agent Brief
Smaller models benefit from explicit artifact-first memory instead of long freeform retrieval dumps. The engine now exposes a budgeted retrieval brief:
from state_trace import MemoryEngine
engine = MemoryEngine(capacity_limit=256.0)
engine.store_agent_log_file("examples/data/agent_logs/marshmallow__marshmallow-1867.json")
brief = engine.retrieve_brief(
"Which file should I patch and what failed before?",
{"session": "swe-agent-marshmallow-1867", "repo": "marshmallow-code/marshmallow"},
mode="small_model",
)
print(brief["target_files"])
print(brief["recommended_actions"])
The brief is intentionally compact and structured for agent harnesses:
- target_files
- current_state
- failed_attempts
- recommended_actions
- evidence
- token_estimate
small_model mode fits the brief into a tighter token budget, while large_model keeps more evidence around the same ranked memories.
The brief now also exposes:
- patch_file
- rerun_command
- tests_to_rerun
- symbols
- patch_hints
- confidence
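The budgeting behind the brief modes can be pictured as greedy packing of ranked evidence under a token ceiling. A minimal sketch; the engine's actual token accounting and evidence ordering are assumptions here:

```python
def fit_to_budget(evidence, budget):
    # evidence: list of (text, token_cost) pairs, ranked best-first.
    # Keep items in rank order until the next one would exceed the budget.
    kept, used = [], 0
    for text, cost in evidence:
        if used + cost > budget:
            break
        kept.append(text)
        used += cost
    return kept, used
```

A small_model brief simply gets a lower budget, so lower-ranked evidence is cut first while target files and failed attempts, which rank highest, survive.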
Online Agent Loop
state-trace is no longer limited to post-hoc log ingestion. You can record actions and observations inside an agent harness as the loop runs.
from state_trace import MemoryEngine
engine = MemoryEngine(capacity_limit=256.0)
context = {
"session": "auth-debug",
"goal": "restore login",
"issue_title": "Login still broken after refresh",
"repo": "example/auth-service",
}
engine.record_action('open "src/auth.ts"', {**context, "files": ["src/auth.ts"]})
engine.record_observation(
"AttributeError: login still fails with a 401 in src/auth.ts",
{**context, "files": ["src/auth.ts"], "status": "error"},
)
engine.record_action('edit "src/auth.ts"', {**context, "files": ["src/auth.ts"], "action_kind": "edit"})
engine.record_test_result(
"pytest tests/test_auth.py::test_refresh_retry",
"tests/test_auth.py::test_refresh_retry PASSED",
{**context, "files": ["src/auth.ts", "tests/test_auth.py::test_refresh_retry"]},
)
brief = engine.retrieve_brief(
"Which file should I patch and what test should I rerun?",
{"session": "auth-debug", "goal": "restore login"},
mode="small_model",
)
This path uses the same typed memory model as the batch log ingester, so live and post-hoc retrieval share the same graph.
Graphiti Head-To-Head
Run the local benchmark:
python3 examples/graphiti_head_to_head_eval.py
This benchmark compares:
- state-trace: current repo retrieval tuned for coding-agent artifact recovery
- graphiti_kuzu: Graphiti loaded into a local Kuzu graph from the same normalized agent logs
Important caveat:
- this isolates retrieval behavior, not Graphiti's full LLM-based ingestion pipeline
- Graphiti is populated manually from the same normalized trajectories so the comparison does not depend on external API keys
Current repo-local result on the unique-task SWE-agent patch subset (4 tasks):
The benchmark output uses state_trace as the model label because it mirrors the Python package identifier.
| Model | Recall@1 | Recall@3 | Recall@5 | MRR | Artifact@1 |
|---|---|---|---|---|---|
| graphiti_kuzu | 0.500 | 0.750 | 0.750 | 0.625 | 0.500 |
| state_trace | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
This is the current evidence for the repo's wedge:
- Graphiti remains the stronger general-purpose temporal graph memory project.
- On coding-agent file localization, this repo currently retrieves the actionable artifact more reliably.
- The advantage comes from task-specific typed nodes, causal retrieval, current-vs-stale state, and file-first ranking.
Budgeted Harness Proxy Eval
Run the small-model-oriented harness proxy benchmark:
python3 examples/small_model_harness_eval.py
Important caveat:
- this is a budgeted proxy benchmark, not a live LLM solve-rate benchmark
- it measures whether each memory layer can put the correct artifact into a constrained agent brief that a weaker policy can act on
Current repo-local result on the unique-task SWE-agent subset (4 tasks):
The benchmark output uses state_trace as the memory label because it mirrors the Python package identifier.
Small proxy (220-token brief, first-action bias)
| Memory | Success | AvgTokens |
|---|---|---|
| no_memory | 0.250 | 157.0 |
| state_trace | 1.000 | 218.5 |
| graphiti | 0.500 | 211.5 |
Large proxy (700-token brief, evidence aggregation)
| Memory | Success | AvgTokens |
|---|---|---|
| no_memory | 0.250 | 182.2 |
| state_trace | 1.000 | 433.5 |
| graphiti | 0.500 | 546.0 |
Interpretation:
- state-trace currently puts the correct patch artifact into a compact agent brief more reliably than the Graphiti retrieval setup used in this repo-local benchmark.
- The gap is largest for the small proxy, which mostly trusts the first explicit artifact and is easier to derail with noisy candidates.
- This is the strongest current evidence that the repo's architecture is a good fit for smaller coding models and agent harnesses.
Live Harness Eval
Run the stricter live replay harness:
python3 examples/live_agent_harness_eval.py
Important caveat:
- this is a live step-wise replay benchmark, not yet a real external small-model solve-rate run
- it is stricter than the brief-only proxy because each backend must keep choosing the correct next action over a trajectory prefix
- the default run now uses the smaller live brief budget so the Graphiti comparison can stay enabled by default
- --mode both is still available, but it is materially slower because it runs the full small-and-large pass:
python3 examples/live_agent_harness_eval.py --mode both
This benchmark is intended to measure whether state-trace can become the better working-memory layer inside a real harness, not just win one-shot retrieval.
Current repo-local result on the unique-task SWE-agent subset (4 tasks), default small live mode:
| Memory | Success | StepAcc | AvgBrief |
|---|---|---|---|
| no_memory | 0.000 | 0.086 | 156.9 |
| state_trace | 1.000 | 1.000 | 225.7 |
| graphiti | 0.000 | 0.214 | 183.3 |
Interpretation:
- this is the first benchmark in the repo that evaluates memory under a live step-wise replay instead of a one-shot proxy
- state-trace now beats both no_memory and the incremental Graphiti setup on task completion and step accuracy in the default run
- on the current local replay subset it completes all four tasks in the default small live mode
Held-out Live Benchmark
Run the grouped held-out live benchmark:
python3 examples/heldout_live_benchmark.py
Useful variants:
python3 examples/heldout_live_benchmark.py --backends state_trace --with-ablations
python3 examples/heldout_live_benchmark.py --policy openai --small-model gpt-5.4-mini --large-model gpt-5.4
python3 examples/heldout_live_benchmark.py --policy codex
python3 examples/heldout_live_benchmark.py --policy codex_api
python3 examples/codex_live_benchmark.py
What this benchmark adds:
- grouped held-out evaluation instead of a single local replay subset
- the same candidate actions, token budget, and evaluation loop for every backend
- flat baselines (bm25, dense_cosine, hybrid) alongside no_memory, graphiti, and state_trace
- bootstrap confidence intervals on success, step accuracy, and artifact-at-rank-1
- optional model-backed action selection through the OpenAI client if OPENAI_API_KEY is set
- a local Codex CLI policy mode that uses codex exec with your Codex login, so it can run without an API key
- a codex_api mode if you explicitly want the Responses API with Codex models
- local Codex CLI defaults of gpt-5.4-mini for the small tier and gpt-5.3-codex for the large tier
- codex_api defaults of gpt-5.1-codex-mini for the small tier and gpt-5.2-codex for the large tier
To validate the Codex configuration without making API calls:
python3 examples/codex_live_benchmark.py --dry-run
Current repo-local result on the local SWE-agent corpus (12 trajectories / 4 task groups), small live mode:
| Memory | Success | StepAcc | Artifact@1 | AvgBrief | AvgLatencyMs |
|---|---|---|---|---|---|
| no_memory | 0.000 [0.000, 0.000] | 0.059 [0.014, 0.111] | 0.417 [0.167, 0.667] | 153.2 | 1.7 |
| bm25 | 0.083 [0.000, 0.250] | 0.131 [0.017, 0.328] | 0.292 [0.042, 0.542] | 226.6 | 13.5 |
| dense_cosine | 0.000 [0.000, 0.000] | 0.119 [0.017, 0.292] | 0.271 [0.021, 0.521] | 226.3 | 19.9 |
| hybrid | 0.083 [0.000, 0.250] | 0.131 [0.017, 0.328] | 0.292 [0.042, 0.542] | 226.8 | 17.1 |
| state_trace | 0.417 [0.167, 0.667] | 0.502 [0.288, 0.718] | 0.396 [0.146, 0.708] | 228.6 | 1257.7 |
| graphiti | 0.000 [0.000, 0.000] | 0.085 [0.000, 0.242] | 0.250 [0.000, 0.500] | 198.0 | 1283.0 |
Interpretation:
- state-trace is ahead of every current baseline on grouped held-out step-wise completion, not just the earlier four-task replay subset
- the flat chunk baselines still recover some artifacts, but they do not preserve the correct action sequence often enough to finish tasks
- the benchmark can now be rerun with either the deterministic heuristic policy or a real OpenAI model policy against the same brief format
Long-horizon Memory Pressure
Run the long-horizon pressure benchmark:
python3 examples/long_horizon_memory_eval.py
This benchmark replays each target task while injecting distractor steps from other trajectories. It measures whether the right patch file stays at rank 1 under a fixed memory budget.
Current repo-local result on the local SWE-agent corpus, small mode, capacity_limit=96, noise_per_step=2:
| Memory | Artifact@1 | Capacity | WithinBudget |
|---|---|---|---|
| state_trace | 0.771 [0.646, 0.875] | 92.748 [91.707, 93.644] | 1.000 [1.000, 1.000] |
| state_trace_no_lifecycle | 0.979 [0.938, 1.000] | 232.128 [200.847, 263.441] | 0.125 [0.042, 0.229] |
| hybrid | 0.646 [0.521, 0.771] | 14.646 [12.833, 16.396] | 1.000 [1.000, 1.000] |
Interpretation:
- disabling lifecycle-aware retention retains higher artifact recall, but it does so by blowing past the configured memory budget most of the time
- state-trace keeps recall above the flat hybrid baseline while staying within the budget at every measured checkpoint
- this is the clearest current evidence that the repo behaves like a bounded working-memory system rather than an unbounded retrieval layer
Ranking Weight Tuning
Run the offline weight-tuning loop:
python3 examples/tune_ranking_weights.py
For a quick smoke run:
python3 examples/tune_ranking_weights.py --samples 0 --limit-cases 1
This searches scoring weights over the local SWE-agent corpus to optimize a combined objective of Artifact@1, MRR, and brief compactness. The script now also runs grouped holdout evaluation so you can see whether the selected weights transfer to unseen task groups.
Current repo-local grouped holdout run with the default config (--samples 0):
| Metric | Mean |
|---|---|
| Artifact@1 | 1.000 |
| MRR | 1.000 |
| AvgBrief | 224.0 |
With a small random search (--samples 6), the best train-time candidate currently ties the default objective on this tiny unique-task subset, which means the repo has the tuning path implemented but does not yet show a strong need to move away from the default weights on the current local corpus.
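The search loop itself is easy to picture: sample candidate weight vectors, score each against the combined objective, keep the best. A sketch under the assumption of a three-weight scoring vector; the script's actual sampling strategy and objective signature may differ:

```python
import random

def tune_weights(objective, n_samples, n_weights=3, seed=0):
    # Random search: sample weight vectors uniformly in [0, 1) and keep
    # the best-scoring candidate under the supplied objective.
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(n_samples):
        weights = [rng.random() for _ in range(n_weights)]
        score = objective(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

With a tiny corpus, many candidates tie the default objective, which matches the observation above that the tuning path works but is not yet needed locally.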
API
Create the FastAPI app:
from state_trace.api import app
Endpoints:
- POST /store
- POST /retrieve (supports explain, include_all_namespaces)
- POST /retrieve_brief (supports explain, include_all_namespaces, mode)
- GET /graph
Run it with:
uv run uvicorn state_trace.api:app --reload
Pass "explain": true in a retrieve request to include per-node score
breakdowns (relevance, state_match, causal_proximity, edge_semantic_bonus,
temporal_bonus, …) and the top contributing signals for each ranked memory.
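A retrieve request with explain enabled could be assembled like this. The host/port and the exact request field names are assumptions (shown with the standard library so the sketch stays dependency-free):

```python
import json
from urllib import request

# Payload for POST /retrieve; "explain" asks for per-node score breakdowns.
# Field names mirror the engine's retrieve(query, context) shape but are
# assumptions, not a verified API contract.
payload = {
    "query": "why is login broken?",
    "context": {"session": "auth-debug"},
    "explain": True,
}
body = json.dumps(payload).encode("utf-8")

# Assumed local dev address; uncomment to send against a running server.
# req = request.Request("http://127.0.0.1:8000/retrieve", data=body,
#                       headers={"Content-Type": "application/json"})
# print(request.urlopen(req).read())
```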
MCP Server
state-trace ships a stdio MCP server so Claude Code / Cursor / Codex CLI
can mount it as durable working memory:
pip install -e ".[mcp]"
state-trace-mcp
Config via environment:
- STATE_TRACE_STORAGE_PATH — durable path; .db/.sqlite uses the SQLite backend, otherwise JSON. Default: ~/.state-trace/memory.db.
- STATE_TRACE_NAMESPACE — default namespace (e.g. the repo slug).
- STATE_TRACE_CAPACITY_LIMIT — working-memory budget (default 256).
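Reading those variables with their documented defaults might look like this. A sketch only; the server's actual startup code is not shown here:

```python
import os

def load_mcp_config():
    # Mirrors the documented environment knobs and their stated defaults.
    return {
        "storage_path": os.environ.get(
            "STATE_TRACE_STORAGE_PATH",
            os.path.expanduser("~/.state-trace/memory.db"),
        ),
        "namespace": os.environ.get("STATE_TRACE_NAMESPACE"),
        "capacity_limit": float(os.environ.get("STATE_TRACE_CAPACITY_LIMIT", "256")),
    }
```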
Tools exposed:
- store, retrieve, retrieve_brief
- record_action, record_observation, record_test_result
- ingest_agent_log_file
- list_namespaces, graph_snapshot
Example Claude Code config (~/.claude/settings.json):
{
"mcpServers": {
"state-trace": {
"command": "state-trace-mcp",
"env": {
"STATE_TRACE_STORAGE_PATH": "/Users/me/.state-trace/memory.db",
"STATE_TRACE_NAMESPACE": "repo-x"
}
}
}
}
Storage Backends
MemoryEngine(storage_path=...) picks the backend from the file extension:
- .db / .sqlite / .sqlite3 — durable SQLite with an FTS5 seed index; WAL journal mode, incremental upserts, process-safe reads.
- any other path — JSON blob (simple, single-writer, fine for benchmarks).
SQLite is the recommended backend for long-running agent harnesses; JSON is kept for the repro-friendly benchmark scripts.
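The extension dispatch is straightforward to emulate. A sketch of the documented rule, not the engine's internal code:

```python
from pathlib import Path

SQLITE_SUFFIXES = {".db", ".sqlite", ".sqlite3"}

def pick_backend(storage_path: str) -> str:
    # Documented rule: SQLite-style extensions get the FTS5-backed store,
    # everything else falls back to the JSON blob backend.
    return "sqlite" if Path(storage_path).suffix.lower() in SQLITE_SUFFIXES else "json"
```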
Namespaces
Set MemoryEngine(namespace="repo-x") to scope writes and reads to a
namespace. Retrieval filters to the active namespace by default; pass
include_all_namespaces=True to opt out. Nodes without a namespace remain
visible in every view so pre-namespace data is not lost.
engine = MemoryEngine(storage_path="memory.db", namespace="payments-api")
engine.store("Login 401 after refresh", {"type": "observation", "file": "auth.ts"})
engine.retrieve("why is login broken?") # scoped to payments-api
Freeform Transcript Ingestion
ingest_text parses role-tagged or step-numbered transcripts into typed
nodes + edges using the same graph as the structured agent-log ingester:
from state_trace import MemoryEngine
engine = MemoryEngine(storage_path="memory.db", namespace="repo-x")
engine.ingest_text(
"""
Thought: inspect src/auth.ts
Action: edit "src/auth.ts"
Observation: tests/test_auth.py::test_refresh_retry PASSED
""",
{"session": "auth-debug", "goal": "restore login"},
)
For unstructured transcripts, swap in OpenAITranscriptExtractor:
from state_trace.extraction_llm import OpenAITranscriptExtractor
engine.ingest_text(
transcript,
context={"session": "auth-debug"},
extractor=OpenAITranscriptExtractor(model="gpt-5.4-mini"),
)
The LLM extractor falls back to the heuristic parser when openai or
OPENAI_API_KEY is unavailable.
Framework Adapters
Drop-in memory shims for LangGraph and LlamaIndex:
from state_trace.adapters import StateTraceLangGraphMemory, StateTraceLlamaIndexMemory
lg_memory = StateTraceLangGraphMemory(default_session="coding-session")
lg_memory.add_messages(state["messages"])
state["memory_brief"] = lg_memory.retrieve_brief("what file should I patch?")
li_memory = StateTraceLlamaIndexMemory(session_id="agent-session")
li_memory.put({"role": "tool", "content": "pytest PASSED"})
brief_text = li_memory.get("which file to patch?")
Neither adapter imports the host framework; they satisfy the duck-typed memory contract used by each.
SWE-bench-Verified Localization
A larger-scale held-out benchmark than the existing n=12 corpus:
python3 examples/swebench_verified_eval.py --dry-run
pip install -e ".[bench]"
python3 examples/swebench_verified_eval.py --limit 50
python3 examples/swebench_verified_eval.py --limit 500 --backends state_trace bm25 no_memory
Scope: this is an artifact-localization pass — ingest the issue text
(problem_statement + hints_text), ask "which file should I patch?", and
measure Artifact@1 and Artifact@5 against the golden patch files. It does
not run the patch through the swebench docker harness, so it is not a
solve-rate number — it isolates the memory-layer contribution at
SWE-bench-Verified scale.
Capacity management
The engine tracks a bounded working-memory budget.
- update_recency() decays recency over time with access-sensitive protection.
- decay_importance() weakens stale memories while preserving high-importance goals and decisions.
- enforce_capacity() compresses weak memories, summarizes dense low-value clusters, and removes stale nodes when necessary.
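The enforcement step can be pictured as budget-aware eviction: keep the highest-value memories that fit, drop the rest. A minimal sketch; the real method also compresses and summarizes rather than only evicting, and the scoring details are assumptions:

```python
def keep_within_budget(nodes, capacity_limit):
    # nodes: list of (node_id, importance, cost) tuples.
    # Greedily keep the most important nodes whose costs fit the budget.
    kept, used = [], 0.0
    for node_id, importance, cost in sorted(nodes, key=lambda n: n[1], reverse=True):
        if used + cost <= capacity_limit:
            kept.append(node_id)
            used += cost
    return kept, used
```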
Examples
- examples/basic_usage.py
- examples/coding_agent_demo.py
- examples/debugging_demo.py
- examples/real_world_agent_log_benchmark.py
- examples/offline_retrieval_eval.py
- examples/graphiti_head_to_head_eval.py
- examples/small_model_harness_eval.py
- examples/live_agent_harness_eval.py
- examples/heldout_live_benchmark.py
- examples/codex_live_benchmark.py
- examples/long_horizon_memory_eval.py
- examples/tune_ranking_weights.py
- examples/swebench_verified_eval.py
Tests
python3 -m pytest -q
Verification commands used in this repo:
python3 examples/offline_retrieval_eval.py
python3 examples/small_model_harness_eval.py
python3 examples/live_agent_harness_eval.py
python3 examples/tune_ranking_weights.py --samples 0 --limit-cases 1