
state-trace

Graph-native working memory for coding agents: typed memories, causal retrieval, bounded capacity, and compact briefs for small models.

state-trace is not a generic graph database or a thin wrapper around embeddings. It is a bounded working-memory layer for coding and debugging agents that need the right file, failure, and next action under tight token budgets.

Instead of stuffing text chunks into a vector index and hoping cosine similarity finds the right context, state-trace stores typed memories, typed causal links, and capacity-aware state that can be traversed like working memory.

What it is optimized for:

  • artifact-first retrieval for coding agents
  • current-vs-stale task state
  • compact harness-facing briefs for smaller models
  • online agent loops and post-hoc trajectory ingestion
  • bounded memory with decay, compression, and lifecycle retention

Current repo-local benchmark snapshot:

  • Held-out live benchmark (12 trajectories / 4 task groups, small live mode): state_trace reaches Success 0.417 and StepAcc 0.502, ahead of hybrid (0.083 / 0.131) and graphiti (0.000 / 0.085).
  • Live replay benchmark (4 tasks, default small mode): state_trace reaches Success 1.000 and StepAcc 1.000.
  • Long-horizon pressure benchmark (capacity_limit=96): state_trace keeps Artifact@1 0.771 while staying within budget at every checkpoint.

If you want to validate the current benchmark surface quickly:

python3 examples/heldout_live_benchmark.py
python3 examples/codex_live_benchmark.py --dry-run
python3 examples/swebench_verified_eval.py --dry-run

Mount it inside an agent harness:

pip install -e ".[mcp]"
state-trace-mcp

Why this beats vector DBs for coding-agent memory

  • It stores typed nodes such as task, observation, decision, file, goal, session, command, test, symbol, patch_hunk, and error_signature, not anonymous chunks.
  • Retrieval follows graph structure and causal proximity, so agents can recover chains like observation -> task -> file.
  • Memory is capacity-limited. Old, low-value memories decay, compress, or get removed instead of growing without bound.
  • Retrieval is state-aware. Active session, goal, and task context directly influence what the engine returns.
  • Agent trajectories retain provenance and temporal state, so the engine can distinguish current facts from superseded failures.
  • Embeddings are optional for future seeding, but the core engine already works without them.
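As a rough illustration of the retrieval idea (a conceptual sketch, not the library's internals), causal-proximity retrieval can be pictured as a bounded walk from a seed memory along typed edges, recovering chains like observation -> task -> file. The node ids and adjacency structure below are invented for illustration:

```python
# Conceptual sketch of causal-proximity traversal over typed memories.
# Node ids and the adjacency dict are illustrative, not state-trace internals.
from collections import deque

edges = {  # node -> list of (edge_type, target)
    "observation:login-401": [("blocks", "task:fix-login")],
    "task:fix-login": [("related_to", "file:auth-ts")],
}

def causal_chain(seed, max_hops=3):
    """Walk outgoing typed edges from a seed node, collecting the causal path."""
    chain, frontier, seen = [seed], deque([(seed, 0)]), {seed}
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _edge_type, target in edges.get(node, []):
            if target not in seen:
                seen.add(target)
                chain.append(target)
                frontier.append((target, depth + 1))
    return chain

print(causal_chain("observation:login-401"))
# ['observation:login-401', 'task:fix-login', 'file:auth-ts']
```

A flat vector index has no equivalent of this walk: it can only return chunks that lexically resemble the query, not the artifact at the end of the causal path.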

Core ideas

  • Node: a typed memory unit with importance, recency, access_count, and capacity metadata.
  • Edge: a typed causal relationship such as blocks, solves, depends_on, or related_to.
  • Episode: a raw provenance node for an ingested agent-log step or issue context.
  • GraphManager: an in-memory networkx graph with durable JSON or SQLite+FTS5 persistence.
  • MemoryEngine: the main entrypoint for storing events, linking memories, applying decay, retrieving causal context, and scoping by namespace.

Installation

uv sync

Or with pip:

pip install -e .

Naming summary:

  • Distribution name: state-trace
  • Python import path: state_trace
  • Benchmark/model label in scripts: state_trace

Optional extras:

pip install -e ".[mcp]"       # stdio MCP server for Claude Code / Codex CLI
pip install -e ".[bench]"     # graphiti + datasets (SWE-bench)
pip install -e ".[llm]"       # OpenAI-backed live benchmarks + LLM ingestion
pip install -e ".[adapters]"  # LangGraph / LlamaIndex adapter shims

To use the local Codex CLI harness instead of an API key, ensure the CLI is logged in:

codex login status

For API-backed runs, set:

export OPENAI_API_KEY=...

Quickstart

from state_trace import MemoryEngine

engine = MemoryEngine(capacity_limit=24.0, storage_path="memory.json")

task = engine.store(
    "Fix login by tracing the refresh token path",
    {"type": "task", "session": "auth-debug", "goal": "restore login", "file": "auth.ts", "importance": 0.92},
)

observation = engine.store(
    "Login still returns 401 after refresh token exchange",
    {
        "type": "observation",
        "session": "auth-debug",
        "goal": "restore login",
        "file": "auth.ts",
        "blocks": [task.id],
        "importance": 0.88,
    },
)

engine.store(
    "Authorization header is dropped before the retry request reaches auth.ts",
    {
        "type": "decision",
        "session": "auth-debug",
        "goal": "restore login",
        "related_to": [task.id, observation.id],
        "file": "auth.ts",
        "importance": 0.91,
    },
)

result = engine.retrieve("Why is login still broken?", {"session": "auth-debug", "goal": "restore login"})
print(result)

Example retrieval output:

{
    "summary": "Most relevant memories: observation: Login still returns 401 after refresh token exchange | task: Fix login by tracing the refresh token path | file: auth.ts",
    "nodes": [
        {"id": "observation:login-still-returns-401-after-refresh", "type": "observation", "score": 0.82},
        {"id": "task:fix-login-by-tracing-the-refresh-token", "type": "task", "score": 0.78},
        {"id": "file:auth-ts", "type": "file", "score": 0.71},
    ],
    "chains": [
        ["observation:login-still-returns-401-after-refresh", "task:fix-login-by-tracing-the-refresh-token", "file:auth-ts"]
    ],
    "reasoning": "These nodes were selected because they matched the query, the active session, and the surrounding causal graph."
}

Real-world agent logs

state-trace can ingest released agent trajectories instead of only synthetic events.

from state_trace import MemoryEngine

engine = MemoryEngine(capacity_limit=256.0)
engine.store_agent_log_file("examples/data/agent_logs/marshmallow__marshmallow-1867.json")

result = engine.retrieve(
    "Which file fixed the 344 vs 345 milliseconds bug?",
    {"session": "swe-agent-marshmallow-1867", "repo": "marshmallow-code/marshmallow"},
)
print(result["nodes"][0]["content"])

Supported inputs:

  • normalized agent_log JSON fixtures
  • raw SWE-agent .traj files
  • raw OpenHands event JSON logs

Bulk log ingestion keeps explicit sequence, file, and error structure:

  • issue -> task node
  • action step -> decision node
  • raw command -> command node
  • targeted validation -> test node
  • touched identifiers -> symbol node
  • emitted failure signature -> error_signature node
  • submitted diff hunk -> patch_hunk node
  • command/test output -> observation node
  • raw step -> episode provenance node
  • edited or inspected files -> stable file anchors
  • precedes, causes, motivates, validates, verified_by, rejected_by, contradicts, solves, supersedes, and derived_from edges
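The step-to-node mapping above can be sketched as a simple dispatch table. This is a hypothetical illustration of the shape of that mapping; the real ingester's names, step kinds, and fallback logic live in state_trace and are richer than this:

```python
# Hypothetical sketch of the step-to-node mapping described above.
# Step kinds and function names are invented for illustration.
STEP_TO_NODE = {
    "issue": "task",
    "action": "decision",
    "command": "command",
    "test": "test",
    "diff_hunk": "patch_hunk",
    "failure": "error_signature",
    "output": "observation",
}

def normalize_step(step):
    """Map one raw agent-log step onto a typed memory node dict.
    Unknown kinds fall back to an episode provenance node."""
    node_type = STEP_TO_NODE.get(step["kind"], "episode")
    return {"type": node_type, "content": step["content"]}

print(normalize_step({"kind": "failure", "content": "AttributeError: ..."}))
# {'type': 'error_signature', 'content': 'AttributeError: ...'}
```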

The retrieval payload now also carries provenance for ranked nodes, so a caller can trace a returned file or observation back to the concrete episode(s) that produced it.

Vector DB Failure Modes

Run the small pinned-log benchmark:

python3 examples/real_world_agent_log_benchmark.py

The benchmark compares:

  • state-trace: typed nodes + edge-aware causal traversal + state-aware reranking
  • flat vector baseline: chunked log text + cosine top-k retrieval

On the pinned real-world trajectories in examples/data/agent_logs, the graph engine wins on rank-1 actionability:

  • Marshmallow TimeDelta bug: graph returns src/marshmallow/fields.py first; the vector baseline returns the issue chunk first because it contains all the right tokens but not the actionable artifact.
  • Pydicom PixelRepresentation bug: graph returns pydicom/pixel_data_handlers/numpy_handler.py first and also surfaces the failing AttributeError observation; the vector baseline ranks the issue text and edit-noise chunks ahead of the patch target.

This is the key difference: vector search can retrieve text that mentions the answer, while causal memory retrieves the artifact and the causal path around it.

For a broader offline evaluation over released SWE-agent patch trajectories:

python3 examples/offline_retrieval_eval.py

This compares four retrieval styles:

  • bm25: flat lexical chunk retrieval
  • dense_cosine: local dense-style cosine retrieval over chunk embeddings
  • hybrid: lexical + dense chunk reranking
  • state-trace: typed nodes + edge-aware causal traversal + state-aware reranking

Current results on the local SWE-agent checkout at /tmp/SWE-agent:

12 patch trajectories, 4 unique tasks

The benchmark output uses state_trace as the model label because it mirrors the Python package identifier.

Model Recall@1 Recall@3 Recall@5 MRR Artifact@1
bm25 0.750 1.000 1.000 0.875 0.000
dense_cosine 0.375 1.000 1.000 0.688 0.000
hybrid 0.750 1.000 1.000 0.875 0.000
state_trace 1.000 1.000 1.000 1.000 1.000

Interpretation:

  • Flat chunk retrievers often find text that mentions the patch file somewhere in the log, so their mention-level recall can look strong.
  • They still fail on Artifact@1: the top result is usually the issue chunk or an explanatory chunk, not the actual file node an agent should act on.
  • state-trace is the only system here that consistently returns the actionable artifact itself at rank 1 while preserving the surrounding causal chain.

Graphiti Gap Analysis

If you compare this repo to Graphiti, the right framing is not "replace Graphiti everywhere." Graphiti is the stronger general-purpose open-source temporal context graph for AI agents. The better bet for state-trace is a narrower wedge: bounded working memory for coding/debugging agents.

What Graphiti has that mattered most:

  • first-class temporal state modeling
  • explicit provenance episodes
  • hybrid retrieval beyond naive keyword overlap

What this repo now adds or closes:

  • temporal validity on nodes and edges via event_at, valid_at, and invalid_at
  • explicit episode provenance nodes with derived_from links for every ingested agent-log step
  • explicit supersedes transitions when newer observations or edits replace stale task state
  • hybrid seeding that combines BM25-style lexical scoring, exact file/path matching, state signals, and edge-aware graph traversal
  • bounded working memory with decay, compression, and summarization as a first-class design constraint

What still remains a real gap versus Graphiti:

  • richer ontology/schema support
  • stronger temporal reasoning over fact updates beyond coding-agent trajectories
  • production-scale storage/backends and operational tooling

The intended lane for this repo is therefore:

  • not "best general temporal graph memory"
  • but "best local-first causal working memory for coding agents that need the right artifact at rank 1"

Small-Model Agent Brief

Smaller models benefit from explicit artifact-first memory instead of long freeform retrieval dumps. The engine now exposes a budgeted retrieval brief:

from state_trace import MemoryEngine

engine = MemoryEngine(capacity_limit=256.0)
engine.store_agent_log_file("examples/data/agent_logs/marshmallow__marshmallow-1867.json")

brief = engine.retrieve_brief(
    "Which file should I patch and what failed before?",
    {"session": "swe-agent-marshmallow-1867", "repo": "marshmallow-code/marshmallow"},
    mode="small_model",
)
print(brief["target_files"])
print(brief["recommended_actions"])

The brief is intentionally compact and structured for agent harnesses:

  • target_files
  • current_state
  • failed_attempts
  • recommended_actions
  • evidence
  • token_estimate

small_model mode fits the brief into a tighter token budget, while large_model keeps more evidence around the same ranked memories.

The brief now also exposes:

  • patch_file
  • rerun_command
  • tests_to_rerun
  • symbols
  • patch_hints
  • confidence
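One way to picture how the small and large modes differ is a greedy budget cut over ranked evidence. This is a hypothetical heuristic, not the shipped implementation; the ~4-characters-per-token estimate and the example strings are assumptions:

```python
def trim_evidence(evidence, token_budget, est=lambda s: max(1, len(s) // 4)):
    """Keep the highest-ranked evidence lines that fit a token budget.
    Rough sketch: ~4 chars per token, greedy from the top of the ranking."""
    kept, used = [], 0
    for line in evidence:
        cost = est(line)
        if used + cost > token_budget:
            break
        kept.append(line)
        used += cost
    return kept, used

evidence = [
    "file: src/marshmallow/fields.py",
    "observation: TimeDelta serialization returns 344 instead of 345",
    "decision: round instead of truncating milliseconds",
]
small, _ = trim_evidence(evidence, token_budget=20)   # small_model-style budget
large, _ = trim_evidence(evidence, token_budget=700)  # large_model-style budget
print(len(small), len(large))
# 1 3
```

The point the sketch makes: under a tight budget only the rank-1 artifact survives, which is why artifact-first ranking matters most for smaller models.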

Online Agent Loop

state-trace is no longer limited to post-hoc log ingestion. You can record actions and observations inside an agent harness as the loop runs.

from state_trace import MemoryEngine

engine = MemoryEngine(capacity_limit=256.0)
context = {
    "session": "auth-debug",
    "goal": "restore login",
    "issue_title": "Login still broken after refresh",
    "repo": "example/auth-service",
}

engine.record_action('open "src/auth.ts"', {**context, "files": ["src/auth.ts"]})
engine.record_observation(
    "AttributeError: login still fails with a 401 in src/auth.ts",
    {**context, "files": ["src/auth.ts"], "status": "error"},
)
engine.record_action('edit "src/auth.ts"', {**context, "files": ["src/auth.ts"], "action_kind": "edit"})
engine.record_test_result(
    "pytest tests/test_auth.py::test_refresh_retry",
    "tests/test_auth.py::test_refresh_retry PASSED",
    {**context, "files": ["src/auth.ts", "tests/test_auth.py::test_refresh_retry"]},
)

brief = engine.retrieve_brief(
    "Which file should I patch and what test should I rerun?",
    {"session": "auth-debug", "goal": "restore login"},
    mode="small_model",
)

This path uses the same typed memory model as the batch log ingester, so live and post-hoc retrieval share the same graph.

Graphiti Head-To-Head

Run the local benchmark:

python3 examples/graphiti_head_to_head_eval.py

This benchmark compares:

  • state-trace: current repo retrieval tuned for coding-agent artifact recovery
  • graphiti_kuzu: Graphiti loaded into a local Kuzu graph from the same normalized agent logs

Important caveat:

  • this isolates retrieval behavior, not Graphiti's full LLM-based ingestion pipeline
  • Graphiti is populated manually from the same normalized trajectories so the comparison does not depend on external API keys

Current repo-local result on the unique-task SWE-agent patch subset (4 tasks):

The benchmark output uses state_trace as the model label because it mirrors the Python package identifier.

Model Recall@1 Recall@3 Recall@5 MRR Artifact@1
graphiti_kuzu 0.500 0.750 0.750 0.625 0.500
state_trace 1.000 1.000 1.000 1.000 1.000

This is the current evidence for the repo's wedge:

  • Graphiti remains the stronger general-purpose temporal graph memory project.
  • On coding-agent file localization, this repo currently retrieves the actionable artifact more reliably.
  • The advantage comes from task-specific typed nodes, causal retrieval, current-vs-stale state, and file-first ranking.

Budgeted Harness Proxy Eval

Run the small-model-oriented harness proxy benchmark:

python3 examples/small_model_harness_eval.py

Important caveat:

  • this is a budgeted proxy benchmark, not a live LLM solve-rate benchmark
  • it measures whether each memory layer can put the correct artifact into a constrained agent brief that a weaker policy can act on

Current repo-local result on the unique-task SWE-agent subset (4 tasks):

The benchmark output uses state_trace as the memory label because it mirrors the Python package identifier.

Small proxy (220-token brief, first-action bias)

Memory Success AvgTokens
no_memory 0.250 157.0
state_trace 1.000 218.5
graphiti 0.500 211.5

Large proxy (700-token brief, evidence aggregation)

Memory Success AvgTokens
no_memory 0.250 182.2
state_trace 1.000 433.5
graphiti 0.500 546.0

Interpretation:

  • state-trace currently puts the correct patch artifact into a compact agent brief more reliably than the Graphiti retrieval setup used in this repo-local benchmark.
  • The gap is largest for the small proxy, which mostly trusts the first explicit artifact and is easier to derail by noisy candidates.
  • This is the strongest current evidence that the repo's architecture is a good fit for smaller coding models and agent harnesses.

Live Harness Eval

Run the stricter live replay harness:

python3 examples/live_agent_harness_eval.py

Important caveat:

  • this is a live step-wise replay benchmark, not yet a real external small-model solve-rate run
  • it is stricter than the brief-only proxy because each backend must keep choosing the correct next action over a trajectory prefix
  • the default run now uses the smaller live brief budget so the Graphiti comparison can stay enabled by default
  • --mode both is still available, but it is materially slower because it runs the full small-and-large pass

python3 examples/live_agent_harness_eval.py --mode both

This benchmark is intended to measure whether state-trace can become the better working-memory layer inside a real harness, not just win one-shot retrieval.

Current repo-local result on the unique-task SWE-agent subset (4 tasks), default small live mode:

Memory Success StepAcc AvgBrief
no_memory 0.000 0.086 156.9
state_trace 1.000 1.000 225.7
graphiti 0.000 0.214 183.3

Interpretation:

  • this is the first benchmark in the repo that evaluates memory under a live step-wise replay instead of a one-shot proxy
  • state-trace now beats both no_memory and the incremental Graphiti setup on task completion and step accuracy in the default run
  • on the current local replay subset it completes all four tasks in the default small live mode

Held-out Live Benchmark

Run the grouped held-out live benchmark:

python3 examples/heldout_live_benchmark.py

Useful variants:

python3 examples/heldout_live_benchmark.py --backends state_trace --with-ablations
python3 examples/heldout_live_benchmark.py --policy openai --small-model gpt-5.4-mini --large-model gpt-5.4
python3 examples/heldout_live_benchmark.py --policy codex
python3 examples/heldout_live_benchmark.py --policy codex_api
python3 examples/codex_live_benchmark.py

What this benchmark adds:

  • grouped held-out evaluation instead of a single local replay subset
  • the same candidate actions, token budget, and evaluation loop for every backend
  • flat baselines (bm25, dense_cosine, hybrid) alongside no_memory, graphiti, and state_trace
  • bootstrap confidence intervals on success, step accuracy, and artifact-at-rank-1
  • optional model-backed action selection through the OpenAI client if OPENAI_API_KEY is set
  • a local Codex CLI policy mode that uses codex exec with your Codex login, so it can run without an API key
  • a codex_api mode if you explicitly want the Responses API with Codex models
  • local Codex CLI defaults of gpt-5.4-mini for the small tier and gpt-5.3-codex for the large tier
  • codex_api defaults of gpt-5.1-codex-mini for the small tier and gpt-5.2-codex for the large tier

To validate the Codex configuration without making API calls:

python3 examples/codex_live_benchmark.py --dry-run

Current repo-local result on the local SWE-agent corpus (12 trajectories / 4 task groups), small live mode:

Memory Success StepAcc Artifact@1 AvgBrief AvgLatencyMs
no_memory 0.000 [0.000, 0.000] 0.059 [0.014, 0.111] 0.417 [0.167, 0.667] 153.2 1.7
bm25 0.083 [0.000, 0.250] 0.131 [0.017, 0.328] 0.292 [0.042, 0.542] 226.6 13.5
dense_cosine 0.000 [0.000, 0.000] 0.119 [0.017, 0.292] 0.271 [0.021, 0.521] 226.3 19.9
hybrid 0.083 [0.000, 0.250] 0.131 [0.017, 0.328] 0.292 [0.042, 0.542] 226.8 17.1
state_trace 0.417 [0.167, 0.667] 0.502 [0.288, 0.718] 0.396 [0.146, 0.708] 228.6 1257.7
graphiti 0.000 [0.000, 0.000] 0.085 [0.000, 0.242] 0.250 [0.000, 0.500] 198.0 1283.0

Interpretation:

  • state-trace is ahead of every current baseline on grouped held-out step-wise completion, not just the earlier four-task replay subset
  • the flat chunk baselines still recover some artifacts, but they do not preserve the correct action sequence often enough to finish tasks
  • the benchmark can now be rerun with either the deterministic heuristic policy or a real OpenAI model policy against the same brief format

Long-horizon Memory Pressure

Run the long-horizon pressure benchmark:

python3 examples/long_horizon_memory_eval.py

This benchmark replays each target task while injecting distractor steps from other trajectories. It measures whether the right patch file stays at rank 1 under a fixed memory budget.

Current repo-local result on the local SWE-agent corpus, small mode, capacity_limit=96, noise_per_step=2:

Memory Artifact@1 Capacity WithinBudget
state_trace 0.771 [0.646, 0.875] 92.748 [91.707, 93.644] 1.000 [1.000, 1.000]
state_trace_no_lifecycle 0.979 [0.938, 1.000] 232.128 [200.847, 263.441] 0.125 [0.042, 0.229]
hybrid 0.646 [0.521, 0.771] 14.646 [12.833, 16.396] 1.000 [1.000, 1.000]

Interpretation:

  • disabling lifecycle-aware retention can keep more artifact recall, but it does so by blowing past the configured memory budget most of the time
  • state-trace keeps recall above the flat hybrid baseline while staying within the budget in every measured checkpoint
  • this is the clearest current evidence that the repo is behaving like a bounded working-memory system rather than just an unbounded retrieval layer

Ranking Weight Tuning

Run the offline weight-tuning loop:

python3 examples/tune_ranking_weights.py

For a quick smoke run:

python3 examples/tune_ranking_weights.py --samples 0 --limit-cases 1

This searches scoring weights over the local SWE-agent corpus to optimize a combined objective of Artifact@1, MRR, and brief compactness. The script now also runs grouped holdout evaluation so you can see whether the selected weights transfer to unseen task groups.

Current repo-local grouped holdout run with the default config (--samples 0):

Metric Mean
Artifact@1 1.000
MRR 1.000
AvgBrief 224.0

With a small random search (--samples 6), the best train-time candidate currently ties the default objective on this tiny unique-task subset. In other words, the tuning path is implemented, but the current local corpus does not yet show a strong need to move away from the default weights.

API

Create the FastAPI app:

from state_trace.api import app

Endpoints:

  • POST /store
  • POST /retrieve (supports explain, include_all_namespaces)
  • POST /retrieve_brief (supports explain, include_all_namespaces, mode)
  • GET /graph

Run it with:

uv run uvicorn state_trace.api:app --reload

Pass "explain": true in a retrieve request to include per-node score breakdowns (relevance, state_match, causal_proximity, edge_semantic_bonus, temporal_bonus, …) and the top contributing signals for each ranked memory.
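To make the breakdown concrete, a weighted sum over the named signals might look like the following. The weights and formula here are invented for illustration; only the signal names come from the explain payload described above:

```python
# Hypothetical combination of explain signals into a final score.
# Weights are invented; only the signal names mirror the explain payload.
WEIGHTS = {
    "relevance": 0.4,
    "state_match": 0.25,
    "causal_proximity": 0.2,
    "edge_semantic_bonus": 0.1,
    "temporal_bonus": 0.05,
}

def combined_score(signals):
    """Weighted sum over the per-node score breakdown; missing signals count as 0."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

breakdown = {"relevance": 0.9, "state_match": 1.0, "causal_proximity": 0.5}
print(round(combined_score(breakdown), 3))
# 0.71
```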

MCP Server

state-trace ships a stdio MCP server so Claude Code / Cursor / Codex CLI can mount it as durable working memory:

pip install -e ".[mcp]"
state-trace-mcp

Config via environment:

  • STATE_TRACE_STORAGE_PATH — durable path; .db/.sqlite uses the SQLite backend, otherwise JSON. Default: ~/.state-trace/memory.db.
  • STATE_TRACE_NAMESPACE — default namespace (e.g. the repo slug).
  • STATE_TRACE_CAPACITY_LIMIT — working-memory budget (default 256).
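Reading those variables with the documented defaults can be sketched as below. This is a minimal sketch of the configuration surface, not the server's actual loader, and the function name is invented:

```python
import os
from pathlib import Path

def load_mcp_config(env=os.environ):
    """Read the documented STATE_TRACE_* variables with the defaults above.
    A sketch; the server's own config loading may differ."""
    return {
        "storage_path": env.get(
            "STATE_TRACE_STORAGE_PATH",
            str(Path.home() / ".state-trace" / "memory.db"),  # default: SQLite backend
        ),
        "namespace": env.get("STATE_TRACE_NAMESPACE"),  # e.g. the repo slug
        "capacity_limit": float(env.get("STATE_TRACE_CAPACITY_LIMIT", "256")),
    }

cfg = load_mcp_config({"STATE_TRACE_NAMESPACE": "repo-x"})
print(cfg["namespace"], cfg["capacity_limit"])
# repo-x 256.0
```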

Tools exposed:

  • store, retrieve, retrieve_brief
  • record_action, record_observation, record_test_result
  • ingest_agent_log_file
  • list_namespaces, graph_snapshot

Example Claude Code config (~/.claude/settings.json):

{
  "mcpServers": {
    "state-trace": {
      "command": "state-trace-mcp",
      "env": {
        "STATE_TRACE_STORAGE_PATH": "/Users/me/.state-trace/memory.db",
        "STATE_TRACE_NAMESPACE": "repo-x"
      }
    }
  }
}

Storage Backends

MemoryEngine(storage_path=...) picks the backend from the file extension:

  • .db / .sqlite / .sqlite3 — durable SQLite with an FTS5 seed index; WAL journal mode, incremental upserts, process-safe reads.
  • any other path — JSON blob (simple, single-writer, fine for benchmarks).

SQLite is the recommended backend for long-running agent harnesses; JSON is kept for the repro-friendly benchmark scripts.
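The extension-based selection rule can be mimicked with a small dispatcher. This is a sketch of the documented rule only; the engine's internal backend classes are not shown here:

```python
from pathlib import Path

# Suffixes the docs list as selecting the SQLite+FTS5 backend.
SQLITE_SUFFIXES = {".db", ".sqlite", ".sqlite3"}

def pick_backend(storage_path: str) -> str:
    """Choose a persistence backend from the file extension, mirroring
    the rule MemoryEngine(storage_path=...) documents."""
    suffix = Path(storage_path).suffix.lower()
    return "sqlite" if suffix in SQLITE_SUFFIXES else "json"

print(pick_backend("memory.db"), pick_backend("memory.json"))
# sqlite json
```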

Namespaces

Set MemoryEngine(namespace="repo-x") to scope writes and reads to a namespace. Retrieval filters to the active namespace by default; pass include_all_namespaces=True to opt out. Nodes without a namespace remain visible in every view so pre-namespace data is not lost.

engine = MemoryEngine(storage_path="memory.db", namespace="payments-api")
engine.store("Login 401 after refresh", {"type": "observation", "file": "auth.ts"})
engine.retrieve("why is login broken?")  # scoped to payments-api

Freeform Transcript Ingestion

ingest_text parses role-tagged or step-numbered transcripts into typed nodes + edges using the same graph as the structured agent-log ingester:

from state_trace import MemoryEngine

engine = MemoryEngine(storage_path="memory.db", namespace="repo-x")
engine.ingest_text(
    """
    Thought: inspect src/auth.ts
    Action: edit "src/auth.ts"
    Observation: tests/test_auth.py::test_refresh_retry PASSED
    """,
    {"session": "auth-debug", "goal": "restore login"},
)

For unstructured transcripts, swap in OpenAITranscriptExtractor:

from state_trace.extraction_llm import OpenAITranscriptExtractor

engine.ingest_text(
    transcript,
    context={"session": "auth-debug"},
    extractor=OpenAITranscriptExtractor(model="gpt-5.4-mini"),
)

The LLM extractor falls back to the heuristic parser when openai or OPENAI_API_KEY is unavailable.

Framework Adapters

Drop-in memory shims for LangGraph and LlamaIndex:

from state_trace.adapters import StateTraceLangGraphMemory, StateTraceLlamaIndexMemory

lg_memory = StateTraceLangGraphMemory(default_session="coding-session")
lg_memory.add_messages(state["messages"])
state["memory_brief"] = lg_memory.retrieve_brief("what file should I patch?")

li_memory = StateTraceLlamaIndexMemory(session_id="agent-session")
li_memory.put({"role": "tool", "content": "pytest PASSED"})
brief_text = li_memory.get("which file to patch?")

Neither adapter imports the host framework; they satisfy the duck-typed memory contract used by each.

SWE-bench-Verified Localization

A larger-scale held-out benchmark than the existing n=12 corpus:

python3 examples/swebench_verified_eval.py --dry-run
pip install -e ".[bench]"
python3 examples/swebench_verified_eval.py --limit 50
python3 examples/swebench_verified_eval.py --limit 500 --backends state_trace bm25 no_memory

Scope: this is an artifact-localization pass. It ingests the issue text (problem_statement + hints_text), asks "which file should I patch?", and measures Artifact@1 and Artifact@5 against the golden patch files. It does not run the patch through the swebench docker harness, so it is not a solve-rate number; it isolates the memory-layer contribution at SWE-bench-Verified scale.

Capacity management

The engine tracks a bounded working-memory budget.

  • update_recency() decays recency over time with access-sensitive protection.
  • decay_importance() weakens stale memories while preserving high-importance goals and decisions.
  • enforce_capacity() compresses weak memories, summarizes dense low-value clusters, and removes stale nodes when necessary.
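A toy model of the decay-and-evict loop can make the budget mechanics concrete. This is illustrative only: the real heuristics (compression, cluster summarization, goal protection) are richer, and the decay formula and field names below are assumptions:

```python
import math

def recency(age_steps, access_count, half_life=8.0):
    """Exponential recency decay, softened by access count so frequently
    used memories decay more slowly (a stand-in for access-sensitive protection)."""
    protection = 1.0 + math.log1p(access_count)
    return math.exp(-age_steps / (half_life * protection))

def enforce_capacity(memories, capacity_limit):
    """Keep the strongest memories until the total weight fits the budget.
    Each memory dict carries: id, importance, age, accesses, weight."""
    scored = sorted(
        memories,
        key=lambda m: m["importance"] * recency(m["age"], m["accesses"]),
        reverse=True,
    )
    kept, used = [], 0.0
    for m in scored:
        if used + m["weight"] <= capacity_limit:
            kept.append(m)
            used += m["weight"]
    return kept

mems = [
    {"id": "goal", "importance": 0.9, "age": 2, "accesses": 5, "weight": 1.0},
    {"id": "stale", "importance": 0.2, "age": 40, "accesses": 0, "weight": 1.0},
]
print([m["id"] for m in enforce_capacity(mems, capacity_limit=1.0)])
# ['goal']
```

The important property, which the pressure benchmark above measures for the real engine, is that the budget holds at every checkpoint while high-value memories survive.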

Examples

  • examples/basic_usage.py
  • examples/coding_agent_demo.py
  • examples/debugging_demo.py
  • examples/real_world_agent_log_benchmark.py
  • examples/offline_retrieval_eval.py
  • examples/graphiti_head_to_head_eval.py
  • examples/small_model_harness_eval.py
  • examples/live_agent_harness_eval.py
  • examples/heldout_live_benchmark.py
  • examples/codex_live_benchmark.py
  • examples/long_horizon_memory_eval.py
  • examples/tune_ranking_weights.py
  • examples/swebench_verified_eval.py

Tests

python3 -m pytest -q

Verification commands used in this repo:

python3 examples/offline_retrieval_eval.py
python3 examples/small_model_harness_eval.py
python3 examples/live_agent_harness_eval.py
python3 examples/tune_ranking_weights.py --samples 0 --limit-cases 1
