state-trace
Graph-native working memory for coding agents: typed memories, causal retrieval, bounded capacity, and compact briefs for small models.
state-trace is not a generic graph database or a thin wrapper around embeddings. It is a bounded working-memory layer for coding and debugging agents that need the right file, failure, and next action under tight token budgets.
Instead of stuffing text chunks into a vector index and hoping cosine similarity finds the right context, state-trace stores typed memories, typed causal links, and capacity-aware state that can be traversed like working memory.
What it is optimized for:
- artifact-first retrieval for coding agents
- current-vs-stale task state
- compact harness-facing briefs for smaller models
- online agent loops and post-hoc trajectory ingestion
- bounded memory with decay, compression, and lifecycle retention
Current repo-local benchmark snapshot:
- Held-out live benchmark (12 trajectories / 4 task groups, small live mode): state_trace reaches Success 0.417 and StepAcc 0.502, ahead of hybrid (0.083/0.131) and graphiti (0.000/0.085).
- Live replay benchmark (4 tasks, default small mode): state_trace reaches Success 1.000 and StepAcc 1.000.
- Long-horizon pressure benchmark (capacity_limit=96): state_trace keeps Artifact@1 0.771 while staying within budget at every checkpoint.
If you want to validate the current benchmark surface quickly:
python3 examples/heldout_live_benchmark.py
python3 examples/codex_live_benchmark.py --dry-run
python3 examples/swebench_verified_eval.py --dry-run
Mount it inside an agent harness:
pip install -e ".[mcp]"
state-trace-mcp
Why this beats vector DBs for coding-agent memory
- It stores typed nodes such as task, observation, decision, file, goal, session, command, test, symbol, patch_hunk, and error_signature, not anonymous chunks.
- Retrieval follows graph structure and causal proximity, so agents can recover chains like observation -> task -> file.
- Memory is capacity-limited. Old, low-value memories decay, compress, or get removed instead of growing without bound.
- Retrieval is state-aware. Active session, goal, and task context directly influence what the engine returns.
- Agent trajectories retain provenance and temporal state, so the engine can distinguish current facts from superseded failures.
- Embeddings are optional for future seeding, but the core engine already works without them.
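To make the decay idea concrete, here is a minimal sketch of time-based down-weighting. The exponential schedule and the half_life parameter are assumptions for illustration; the engine's actual decay logic may differ:

```python
import math

def decayed_importance(importance: float, age_seconds: float,
                       half_life: float = 3600.0) -> float:
    # Illustrative exponential decay: importance halves every `half_life`
    # seconds. This only demonstrates the bounded-memory idea, not the
    # engine's real schedule.
    return importance * math.exp(-math.log(2.0) * age_seconds / half_life)
```

Under this sketch, a memory stored an hour ago with importance 1.0 would score 0.5 at retrieval time, letting fresher, higher-value memories win capacity.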
Core ideas
- Node: a typed memory unit with importance, recency, access_count, and capacity metadata.
- Edge: a typed causal relationship such as blocks, solves, depends_on, or related_to.
- Episode: a raw provenance node for an ingested agent-log step or issue context.
- GraphManager: an in-memory networkx graph with durable JSON or SQLite+FTS5 persistence.
- MemoryEngine: the main entrypoint for storing events, linking memories, applying decay, retrieving causal context, and scoping by namespace.
Installation
uv sync
Or with pip:
pip install -e .
Distribution name: state-trace
Python import path: state_trace
Benchmark/model label in scripts: state_trace
Optional extras:
pip install -e ".[mcp]" # stdio MCP server for Claude Code / Codex CLI
pip install -e ".[bench]" # graphiti + datasets (SWE-bench)
pip install -e ".[llm]" # OpenAI-backed live benchmarks + LLM ingestion
pip install -e ".[adapters]" # LangGraph / LlamaIndex adapter shims
To use the local Codex CLI harness instead of an API key, ensure the CLI is logged in:
codex login status
For API-backed runs, set:
export OPENAI_API_KEY=...
Quickstart
from state_trace import MemoryEngine
engine = MemoryEngine(capacity_limit=24.0, storage_path="memory.json")
task = engine.store(
"Fix login by tracing the refresh token path",
{"type": "task", "session": "auth-debug", "goal": "restore login", "file": "auth.ts", "importance": 0.92},
)
observation = engine.store(
"Login still returns 401 after refresh token exchange",
{
"type": "observation",
"session": "auth-debug",
"goal": "restore login",
"file": "auth.ts",
"blocks": [task.id],
"importance": 0.88,
},
)
engine.store(
"Authorization header is dropped before the retry request reaches auth.ts",
{
"type": "decision",
"session": "auth-debug",
"goal": "restore login",
"related_to": [task.id, observation.id],
"file": "auth.ts",
"importance": 0.91,
},
)
result = engine.retrieve("Why is login still broken?", {"session": "auth-debug", "goal": "restore login"})
print(result)
Example retrieval output:
{
"summary": "Most relevant memories: observation: Login still returns 401 after refresh token exchange | task: Fix login by tracing the refresh token path | file: auth.ts",
"nodes": [
{"id": "observation:login-still-returns-401-after-refresh", "type": "observation", "score": 0.82},
{"id": "task:fix-login-by-tracing-the-refresh-token", "type": "task", "score": 0.78},
{"id": "file:auth-ts", "type": "file", "score": 0.71},
],
"chains": [
["observation:login-still-returns-401-after-refresh", "task:fix-login-by-tracing-the-refresh-token", "file:auth-ts"]
],
"reasoning": "These nodes were selected because they matched the query, the active session, and the surrounding causal graph."
}
Real-world agent logs
state-trace can ingest released agent trajectories instead of only synthetic events.
from state_trace import MemoryEngine
engine = MemoryEngine(capacity_limit=256.0)
engine.store_agent_log_file("examples/data/agent_logs/marshmallow__marshmallow-1867.json")
result = engine.retrieve(
"Which file fixed the 344 vs 345 milliseconds bug?",
{"session": "swe-agent-marshmallow-1867", "repo": "marshmallow-code/marshmallow"},
)
print(result["nodes"][0]["content"])
Supported inputs:
- normalized agent_log JSON fixtures
- raw SWE-agent .traj files
- raw OpenHands event JSON logs
Bulk log ingestion keeps explicit sequence, file, and error structure:
- issue -> task node
- action step -> decision node
- raw command -> command node
- targeted validation -> test node
- touched identifiers -> symbol node
- emitted failure signature -> error_signature node
- submitted diff hunk -> patch_hunk node
- command/test output -> observation node
- raw step -> episode provenance node
- edited or inspected files -> stable file anchors
- typed edges: precedes, causes, motivates, validates, verified_by, rejected_by, contradicts, solves, supersedes, and derived_from
The retrieval payload now also carries provenance for ranked nodes, so a caller can trace a returned file or observation back to the concrete episode(s) that produced it.
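As an illustration of that provenance idea, walking derived_from edges back to the originating episodes can be sketched over plain edge tuples. This is a generic traversal for illustration, not the package's internal API, and the node IDs are hypothetical:

```python
def provenance_episodes(node_id, edges):
    """Collect every node reachable from `node_id` via derived_from edges.

    `edges` is a list of (src, relation, dst) tuples; in state-trace terms
    the dst side would be the episode nodes that produced the memory.
    """
    found, stack = [], [node_id]
    while stack:
        current = stack.pop()
        for src, relation, dst in edges:
            if src == current and relation == "derived_from":
                found.append(dst)
                stack.append(dst)
    return found

edges = [
    ("file:auth-ts", "derived_from", "episode:step-3"),
    ("episode:step-3", "derived_from", "episode:raw-0"),
    ("file:auth-ts", "blocks", "task:fix-login"),  # ignored: wrong relation
]
```

Calling provenance_episodes("file:auth-ts", edges) on this toy graph yields the two episode nodes, most recent first.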
Vector DB Failure Modes
Run the small pinned-log benchmark:
python3 examples/real_world_agent_log_benchmark.py
The benchmark compares:
- state-trace: typed nodes + edge-aware causal traversal + state-aware reranking
- flat vector baseline: chunked log text + cosine top-k retrieval
On the pinned real-world trajectories in examples/data/agent_logs, the graph engine wins on rank-1 actionability:
- Marshmallow TimeDelta bug: graph returns src/marshmallow/fields.py first; the vector baseline returns the issue chunk first because it contains all the right tokens but not the actionable artifact.
- Pydicom PixelRepresentation bug: graph returns pydicom/pixel_data_handlers/numpy_handler.py first and also surfaces the failing AttributeError observation; the vector baseline ranks the issue text and edit-noise chunks ahead of the patch target.
This is the key difference: vector search can retrieve text that mentions the answer, while causal memory retrieves the artifact and the causal path around it.
For a broader offline evaluation over released SWE-agent patch trajectories:
python3 examples/offline_retrieval_eval.py
This compares four retrieval styles:
- bm25: flat lexical chunk retrieval
- dense_cosine: local dense-style cosine retrieval over chunk embeddings
- hybrid: lexical + dense chunk reranking
- state-trace: typed nodes + edge-aware causal traversal + state-aware reranking
Current results on the local SWE-agent checkout at /tmp/SWE-agent:
12 patch trajectories, 4 unique tasks
The benchmark output uses state_trace as the model label because it mirrors the Python package identifier.
| Model | Recall@1 | Recall@3 | Recall@5 | MRR | Artifact@1 |
|---|---|---|---|---|---|
| bm25 | 0.750 | 1.000 | 1.000 | 0.875 | 0.000 |
| dense_cosine | 0.375 | 1.000 | 1.000 | 0.688 | 0.000 |
| hybrid | 0.750 | 1.000 | 1.000 | 0.875 | 0.000 |
| state_trace | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Interpretation:
- Flat chunk retrievers often find text that mentions the patch file somewhere in the log, so their mention-level recall can look strong.
- They still fail on Artifact@1: the top result is usually the issue chunk or an explanatory chunk, not the actual file node an agent should act on.
- state-trace is the only system here that consistently returns the actionable artifact itself at rank 1 while preserving the surrounding causal chain.
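For reference, the two headline metrics are simple to state per query. A sketch using standard definitions (the benchmark scripts may aggregate across tasks differently):

```python
def artifact_at_1(ranked_ids, gold_id):
    # 1.0 iff the top-ranked node is the actionable artifact itself.
    return 1.0 if ranked_ids and ranked_ids[0] == gold_id else 0.0

def mrr(ranked_ids, gold_id):
    # Reciprocal rank of the first correct hit; 0.0 if it never appears.
    for rank, node_id in enumerate(ranked_ids, start=1):
        if node_id == gold_id:
            return 1.0 / rank
    return 0.0
```

This is why a retriever can score well on Recall@3 (the gold file appears somewhere near the top) while scoring 0.000 on Artifact@1 (it is never the first result).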
Graphiti Gap Analysis
If you compare this repo to Graphiti, the right framing is not "replace Graphiti everywhere." Graphiti is the stronger general-purpose open-source temporal context graph for AI agents. The better bet for state-trace is a narrower wedge: bounded working memory for coding/debugging agents.
What Graphiti has that mattered most:
- first-class temporal state modeling
- explicit provenance episodes
- hybrid retrieval beyond naive keyword overlap
What this repo now adds or closes:
- temporal validity on nodes and edges via event_at, valid_at, and invalid_at
- explicit episode provenance nodes with derived_from links for every ingested agent-log step
- explicit supersedes transitions when newer observations or edits replace stale task state
- hybrid seeding that combines BM25-style lexical scoring, exact file/path matching, state signals, and edge-aware graph traversal
- bounded working memory with decay, compression, and summarization as a first-class design constraint
What still remains a real gap versus Graphiti:
- richer ontology/schema support
- stronger temporal reasoning over fact updates beyond coding-agent trajectories
- production-scale storage/backends and operational tooling
The intended lane for this repo is therefore:
- not "best general temporal graph memory"
- but "best local-first causal working memory for coding agents that need the right artifact at rank 1"
Small-Model Agent Brief
Smaller models benefit from explicit artifact-first memory instead of long freeform retrieval dumps. The engine now exposes a budgeted retrieval brief:
from state_trace import MemoryEngine
engine = MemoryEngine(capacity_limit=256.0)
engine.store_agent_log_file("examples/data/agent_logs/marshmallow__marshmallow-1867.json")
brief = engine.retrieve_brief(
"Which file should I patch and what failed before?",
{"session": "swe-agent-marshmallow-1867", "repo": "marshmallow-code/marshmallow"},
mode="small_model",
)
print(brief["target_files"])
print(brief["recommended_actions"])
The brief is intentionally compact and structured for agent harnesses:
- target_files
- current_state
- failed_attempts
- recommended_actions
- evidence
- token_estimate
small_model mode fits the brief into a tighter token budget, while large_model keeps more evidence around the same ranked memories.
The brief now also exposes:
- patch_file
- rerun_command
- tests_to_rerun
- symbols
- patch_hints
- confidence
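The budgeting behind the brief modes can be pictured as greedy packing of ranked evidence under a token ceiling. A minimal sketch; the engine's actual token accounting and evidence ordering are assumptions here:

```python
def fit_to_budget(evidence, budget):
    # evidence: list of (text, token_cost) pairs, ranked best-first.
    # Keep items in rank order until the next one would exceed the budget.
    kept, used = [], 0
    for text, cost in evidence:
        if used + cost > budget:
            break
        kept.append(text)
        used += cost
    return kept, used
```

A small_model brief simply gets a lower budget, so lower-ranked evidence is cut first while target files and failed attempts, which rank highest, survive.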
Online Agent Loop
state-trace is no longer limited to post-hoc log ingestion. You can record actions and observations inside an agent harness as the loop runs.
from state_trace import MemoryEngine
engine = MemoryEngine(capacity_limit=256.0)
context = {
"session": "auth-debug",
"goal": "restore login",
"issue_title": "Login still broken after refresh",
"repo": "example/auth-service",
}
engine.record_action('open "src/auth.ts"', {**context, "files": ["src/auth.ts"]})
engine.record_observation(
"AttributeError: login still fails with a 401 in src/auth.ts",
{**context, "files": ["src/auth.ts"], "status": "error"},
)
engine.record_action('edit "src/auth.ts"', {**context, "files": ["src/auth.ts"], "action_kind": "edit"})
engine.record_test_result(
"pytest tests/test_auth.py::test_refresh_retry",
"tests/test_auth.py::test_refresh_retry PASSED",
{**context, "files": ["src/auth.ts", "tests/test_auth.py::test_refresh_retry"]},
)
brief = engine.retrieve_brief(
"Which file should I patch and what test should I rerun?",
{"session": "auth-debug", "goal": "restore login"},
mode="small_model",
)
This path uses the same typed memory model as the batch log ingester, so live and post-hoc retrieval share the same graph.
Graphiti Head-To-Head
Run the local benchmark:
python3 examples/graphiti_head_to_head_eval.py
This benchmark compares:
- state-trace: current repo retrieval tuned for coding-agent artifact recovery
- graphiti_kuzu: Graphiti loaded into a local Kuzu graph from the same normalized agent logs
Important caveat:
- this isolates retrieval behavior, not Graphiti's full LLM-based ingestion pipeline
- Graphiti is populated manually from the same normalized trajectories so the comparison does not depend on external API keys
Current repo-local result on the unique-task SWE-agent patch subset (4 tasks):
The benchmark output uses state_trace as the model label because it mirrors the Python package identifier.
| Model | Recall@1 | Recall@3 | Recall@5 | MRR | Artifact@1 |
|---|---|---|---|---|---|
| graphiti_kuzu | 0.500 | 0.750 | 0.750 | 0.625 | 0.500 |
| state_trace | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
This is the current evidence for the repo's wedge:
- Graphiti remains the stronger general-purpose temporal graph memory project.
- On coding-agent file localization, this repo currently retrieves the actionable artifact more reliably.
- The advantage comes from task-specific typed nodes, causal retrieval, current-vs-stale state, and file-first ranking.
Budgeted Harness Proxy Eval
Run the small-model-oriented harness proxy benchmark:
python3 examples/small_model_harness_eval.py
Important caveat:
- this is a budgeted proxy benchmark, not a live LLM solve-rate benchmark
- it measures whether each memory layer can put the correct artifact into a constrained agent brief that a weaker policy can act on
Current repo-local result on the unique-task SWE-agent subset (4 tasks):
The benchmark output uses state_trace as the memory label because it mirrors the Python package identifier.
Small proxy (220-token brief, first-action bias)
| Memory | Success | AvgTokens |
|---|---|---|
| no_memory | 0.250 | 157.0 |
| state_trace | 1.000 | 218.5 |
| graphiti | 0.500 | 211.5 |
Large proxy (700-token brief, evidence aggregation)
| Memory | Success | AvgTokens |
|---|---|---|
| no_memory | 0.250 | 182.2 |
| state_trace | 1.000 | 433.5 |
| graphiti | 0.500 | 546.0 |
Interpretation:
- state-trace currently puts the correct patch artifact into a compact agent brief more reliably than the Graphiti retrieval setup used in this repo-local benchmark.
- The gap is largest for the small proxy, which mostly trusts the first explicit artifact and is easier to derail with noisy candidates.
- This is the strongest current evidence that the repo's architecture is a good fit for smaller coding models and agent harnesses.
Live Harness Eval
Run the stricter live replay harness:
python3 examples/live_agent_harness_eval.py
Important caveat:
- this is a live step-wise replay benchmark, not yet a real external small-model solve-rate run
- it is stricter than the brief-only proxy because each backend must keep choosing the correct next action over a trajectory prefix
- the default run now uses the smaller live brief budget so the Graphiti comparison can stay enabled by default
- --mode both is still available, but it is materially slower because it runs the full small-and-large pass:
python3 examples/live_agent_harness_eval.py --mode both
This benchmark is intended to measure whether state-trace can become the better working-memory layer inside a real harness, not just win one-shot retrieval.
Current repo-local result on the unique-task SWE-agent subset (4 tasks), default small live mode:
| Memory | Success | StepAcc | AvgBrief |
|---|---|---|---|
| no_memory | 0.000 | 0.086 | 156.9 |
| state_trace | 1.000 | 1.000 | 225.7 |
| graphiti | 0.000 | 0.214 | 183.3 |
Interpretation:
- this is the first benchmark in the repo that evaluates memory under a live step-wise replay instead of a one-shot proxy
- state-trace now beats both no_memory and the incremental Graphiti setup on task completion and step accuracy in the default run
- on the current local replay subset it completes all four tasks in the default small live mode
Held-out Live Benchmark
Run the grouped held-out live benchmark:
python3 examples/heldout_live_benchmark.py
Useful variants:
python3 examples/heldout_live_benchmark.py --backends state_trace --with-ablations
python3 examples/heldout_live_benchmark.py --policy openai --small-model gpt-5.4-mini --large-model gpt-5.4
python3 examples/heldout_live_benchmark.py --policy codex
python3 examples/heldout_live_benchmark.py --policy codex_api
python3 examples/codex_live_benchmark.py
What this benchmark adds:
- grouped held-out evaluation instead of a single local replay subset
- the same candidate actions, token budget, and evaluation loop for every backend
- flat baselines (bm25, dense_cosine, hybrid) alongside no_memory, graphiti, and state_trace
- bootstrap confidence intervals on success, step accuracy, and artifact-at-rank-1
- optional model-backed action selection through the OpenAI client if OPENAI_API_KEY is set
- a local Codex CLI policy mode that uses codex exec with your Codex login, so it can run without an API key
- a codex_api mode if you explicitly want the Responses API with Codex models
- local Codex CLI defaults of gpt-5.4-mini for the small tier and gpt-5.3-codex for the large tier
- codex_api defaults of gpt-5.1-codex-mini for the small tier and gpt-5.2-codex for the large tier
To validate the Codex configuration without making API calls:
python3 examples/codex_live_benchmark.py --dry-run
Current repo-local result on the local SWE-agent corpus (12 trajectories / 4 task groups), small live mode:
| Memory | Success | StepAcc | Artifact@1 | AvgBrief | AvgLatencyMs |
|---|---|---|---|---|---|
| no_memory | 0.000 [0.000, 0.000] | 0.059 [0.014, 0.111] | 0.417 [0.167, 0.667] | 153.2 | 1.7 |
| bm25 | 0.083 [0.000, 0.250] | 0.131 [0.017, 0.328] | 0.292 [0.042, 0.542] | 226.6 | 13.5 |
| dense_cosine | 0.000 [0.000, 0.000] | 0.119 [0.017, 0.292] | 0.271 [0.021, 0.521] | 226.3 | 19.9 |
| hybrid | 0.083 [0.000, 0.250] | 0.131 [0.017, 0.328] | 0.292 [0.042, 0.542] | 226.8 | 17.1 |
| state_trace | 0.417 [0.167, 0.667] | 0.502 [0.288, 0.718] | 0.396 [0.146, 0.708] | 228.6 | 1257.7 |
| graphiti | 0.000 [0.000, 0.000] | 0.085 [0.000, 0.242] | 0.250 [0.000, 0.500] | 198.0 | 1283.0 |
Interpretation:
- state-trace is ahead of every current baseline on grouped held-out step-wise completion, not just the earlier four-task replay subset
- the flat chunk baselines still recover some artifacts, but they do not preserve the correct action sequence often enough to finish tasks
- the benchmark can now be rerun with either the deterministic heuristic policy or a real OpenAI model policy against the same brief format
Long-horizon Memory Pressure
Run the long-horizon pressure benchmark:
python3 examples/long_horizon_memory_eval.py
This benchmark replays each target task while injecting distractor steps from other trajectories. It measures whether the right patch file stays at rank 1 under a fixed memory budget.
Current repo-local result on the local SWE-agent corpus, small mode, capacity_limit=96, noise_per_step=2:
| Memory | Artifact@1 | Capacity | WithinBudget |
|---|---|---|---|
| state_trace | 0.771 [0.646, 0.875] | 92.748 [91.707, 93.644] | 1.000 [1.000, 1.000] |
| state_trace_no_lifecycle | 0.979 [0.938, 1.000] | 232.128 [200.847, 263.441] | 0.125 [0.042, 0.229] |
| hybrid | 0.646 [0.521, 0.771] | 14.646 [12.833, 16.396] | 1.000 [1.000, 1.000] |
Interpretation:
- disabling lifecycle-aware retention retains higher artifact recall, but it does so by blowing past the configured memory budget most of the time
- state-trace keeps recall above the flat hybrid baseline while staying within the budget at every measured checkpoint
- this is the clearest current evidence that the repo behaves like a bounded working-memory system rather than an unbounded retrieval layer
Ranking Weight Tuning
Run the offline weight-tuning loop:
python3 examples/tune_ranking_weights.py
For a quick smoke run:
python3 examples/tune_ranking_weights.py --samples 0 --limit-cases 1
This searches scoring weights over the local SWE-agent corpus to optimize a combined objective of Artifact@1, MRR, and brief compactness. The script now also runs grouped holdout evaluation so you can see whether the selected weights transfer to unseen task groups.
Current repo-local grouped holdout run with the default config (--samples 0):
| Metric | Mean |
|---|---|
| Artifact@1 | 1.000 |
| MRR | 1.000 |
| AvgBrief | 224.0 |
With a small random search (--samples 6), the best train-time candidate currently ties the default objective on this tiny unique-task subset, which means the repo has the tuning path implemented but does not yet show a strong need to move away from the default weights on the current local corpus.
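The search loop itself is easy to picture: sample candidate weight vectors, score each against the combined objective, keep the best. A sketch under the assumption of a three-weight scoring vector; the script's actual sampling strategy and objective signature may differ:

```python
import random

def tune_weights(objective, n_samples, n_weights=3, seed=0):
    # Random search: sample weight vectors uniformly in [0, 1) and keep
    # the best-scoring candidate under the supplied objective.
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(n_samples):
        weights = [rng.random() for _ in range(n_weights)]
        score = objective(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

With a tiny corpus, many candidates tie the default objective, which matches the observation above that the tuning path works but is not yet needed locally.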
API
Create the FastAPI app:
from state_trace.api import app
Endpoints:
- POST /store
- POST /retrieve (supports explain, include_all_namespaces)
- POST /retrieve_brief (supports explain, include_all_namespaces, mode)
- GET /graph
Run it with:
uv run uvicorn state_trace.api:app --reload
Pass "explain": true in a retrieve request to include per-node score
breakdowns (relevance, state_match, causal_proximity, edge_semantic_bonus,
temporal_bonus, …) and the top contributing signals for each ranked memory.
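A retrieve request with explain enabled could be assembled like this. The host/port and the exact request field names are assumptions (shown with the standard library so the sketch stays dependency-free):

```python
import json
from urllib import request

# Payload for POST /retrieve; "explain" asks for per-node score breakdowns.
# Field names mirror the engine's retrieve(query, context) shape but are
# assumptions, not a verified API contract.
payload = {
    "query": "why is login broken?",
    "context": {"session": "auth-debug"},
    "explain": True,
}
body = json.dumps(payload).encode("utf-8")

# Assumed local dev address; uncomment to send against a running server.
# req = request.Request("http://127.0.0.1:8000/retrieve", data=body,
#                       headers={"Content-Type": "application/json"})
# print(request.urlopen(req).read())
```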
MCP Server
state-trace ships a stdio MCP server so Claude Code / Cursor / Codex CLI
can mount it as durable working memory:
pip install -e ".[mcp]"
state-trace-mcp
Config via environment:
- STATE_TRACE_STORAGE_PATH — durable path; .db/.sqlite uses the SQLite backend, otherwise JSON. Default: ~/.state-trace/memory.db.
- STATE_TRACE_NAMESPACE — default namespace (e.g. the repo slug).
- STATE_TRACE_CAPACITY_LIMIT — working-memory budget (default 256).
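Reading those variables with their documented defaults might look like this. A sketch only; the server's actual startup code is not shown here:

```python
import os

def load_mcp_config():
    # Mirrors the documented environment knobs and their stated defaults.
    return {
        "storage_path": os.environ.get(
            "STATE_TRACE_STORAGE_PATH",
            os.path.expanduser("~/.state-trace/memory.db"),
        ),
        "namespace": os.environ.get("STATE_TRACE_NAMESPACE"),
        "capacity_limit": float(os.environ.get("STATE_TRACE_CAPACITY_LIMIT", "256")),
    }
```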
Tools exposed:
- store, retrieve, retrieve_brief
- record_action, record_observation, record_test_result
- ingest_agent_log_file
- list_namespaces, graph_snapshot
Example Claude Code config (~/.claude/settings.json):
{
"mcpServers": {
"state-trace": {
"command": "state-trace-mcp",
"env": {
"STATE_TRACE_STORAGE_PATH": "/Users/me/.state-trace/memory.db",
"STATE_TRACE_NAMESPACE": "repo-x"
}
}
}
}
Storage Backends
MemoryEngine(storage_path=...) picks the backend from the file extension:
- .db / .sqlite / .sqlite3 — durable SQLite with an FTS5 seed index; WAL journal mode, incremental upserts, process-safe reads.
- any other path — JSON blob (simple, single-writer, fine for benchmarks).
SQLite is the recommended backend for long-running agent harnesses; JSON is kept for the repro-friendly benchmark scripts.
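The extension dispatch is straightforward to emulate. A sketch of the documented rule, not the engine's internal code:

```python
from pathlib import Path

SQLITE_SUFFIXES = {".db", ".sqlite", ".sqlite3"}

def pick_backend(storage_path: str) -> str:
    # Documented rule: SQLite-style extensions get the FTS5-backed store,
    # everything else falls back to the JSON blob backend.
    return "sqlite" if Path(storage_path).suffix.lower() in SQLITE_SUFFIXES else "json"
```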
Namespaces
Set MemoryEngine(namespace="repo-x") to scope writes and reads to a
namespace. Retrieval filters to the active namespace by default; pass
include_all_namespaces=True to opt out. Nodes without a namespace remain
visible in every view so pre-namespace data is not lost.
engine = MemoryEngine(storage_path="memory.db", namespace="payments-api")
engine.store("Login 401 after refresh", {"type": "observation", "file": "auth.ts"})
engine.retrieve("why is login broken?") # scoped to payments-api
Freeform Transcript Ingestion
ingest_text parses role-tagged or step-numbered transcripts into typed
nodes + edges using the same graph as the structured agent-log ingester:
from state_trace import MemoryEngine
engine = MemoryEngine(storage_path="memory.db", namespace="repo-x")
engine.ingest_text(
"""
Thought: inspect src/auth.ts
Action: edit "src/auth.ts"
Observation: tests/test_auth.py::test_refresh_retry PASSED
""",
{"session": "auth-debug", "goal": "restore login"},
)
For unstructured transcripts, swap in OpenAITranscriptExtractor:
from state_trace.extraction_llm import OpenAITranscriptExtractor
engine.ingest_text(
transcript,
context={"session": "auth-debug"},
extractor=OpenAITranscriptExtractor(model="gpt-5.4-mini"),
)
The LLM extractor falls back to the heuristic parser when openai or
OPENAI_API_KEY is unavailable.
Framework Adapters
Drop-in memory shims for LangGraph and LlamaIndex:
from state_trace.adapters import StateTraceLangGraphMemory, StateTraceLlamaIndexMemory
lg_memory = StateTraceLangGraphMemory(default_session="coding-session")
lg_memory.add_messages(state["messages"])
state["memory_brief"] = lg_memory.retrieve_brief("what file should I patch?")
li_memory = StateTraceLlamaIndexMemory(session_id="agent-session")
li_memory.put({"role": "tool", "content": "pytest PASSED"})
brief_text = li_memory.get("which file to patch?")
Neither adapter imports the host framework; they satisfy the duck-typed memory contract used by each.
SWE-bench-Verified Localization
A larger-scale held-out benchmark than the existing n=12 corpus:
python3 examples/swebench_verified_eval.py --dry-run
pip install -e ".[bench]"
python3 examples/swebench_verified_eval.py --limit 50
python3 examples/swebench_verified_eval.py --limit 500 --backends state_trace bm25 no_memory
Scope: this is an artifact-localization pass — ingest the issue text
(problem_statement + hints_text), ask "which file should I patch?", and
measure Artifact@1 and Artifact@5 against the golden patch files. It does
not run the patch through the swebench docker harness, so it is not a
solve-rate number — it isolates the memory-layer contribution at
SWE-bench-Verified scale.
Capacity management
The engine tracks a bounded working-memory budget.
- update_recency() decays recency over time with access-sensitive protection.
- decay_importance() weakens stale memories while preserving high-importance goals and decisions.
- enforce_capacity() compresses weak memories, summarizes dense low-value clusters, and removes stale nodes when necessary.
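The enforcement step can be pictured as budget-aware eviction: keep the highest-value memories that fit, drop the rest. A minimal sketch; the real method also compresses and summarizes rather than only evicting, and the scoring details are assumptions:

```python
def keep_within_budget(nodes, capacity_limit):
    # nodes: list of (node_id, importance, cost) tuples.
    # Greedily keep the most important nodes whose costs fit the budget.
    kept, used = [], 0.0
    for node_id, importance, cost in sorted(nodes, key=lambda n: n[1], reverse=True):
        if used + cost <= capacity_limit:
            kept.append(node_id)
            used += cost
    return kept, used
```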
Examples
- examples/basic_usage.py
- examples/coding_agent_demo.py
- examples/debugging_demo.py
- examples/real_world_agent_log_benchmark.py
- examples/offline_retrieval_eval.py
- examples/graphiti_head_to_head_eval.py
- examples/small_model_harness_eval.py
- examples/live_agent_harness_eval.py
- examples/heldout_live_benchmark.py
- examples/codex_live_benchmark.py
- examples/long_horizon_memory_eval.py
- examples/tune_ranking_weights.py
- examples/swebench_verified_eval.py
Tests
python3 -m pytest -q
Verification commands used in this repo:
python3 examples/offline_retrieval_eval.py
python3 examples/small_model_harness_eval.py
python3 examples/live_agent_harness_eval.py
python3 examples/tune_ranking_weights.py --samples 0 --limit-cases 1