
Graph control for AI-agent workflows.


AgentProp




AgentProp studies AI-agent workflows as directed weighted graphs. Agents, tools, context packets, verifier calls, terminal commands, and failure states become nodes and edges in a graph that can be measured, simulated, and controlled.

The research wedge is simple:

  • Metric dimension is the core contribution: framing verifier placement as a resolving set makes failure localization a provable property. If resolving coverage is 1.0, every node has a unique signature (its vector of distances to the verifiers), so every distinct single-node failure produces a distinct signature and any single faulty node is uniquely identifiable. With fault-tolerant metric dimension, this holds even if one verifier itself fails. Heuristic placements based on node weights carry no such guarantee.
  • Quality cascade models how correctness and compression propagate, so context allocation follows the quality actually reaching each node.
  • Randomized Zero Forcing (RZF) is a secondary, scoped result: process-based RZF centrality helps on large workflows where static centrality misjudges reachability, while on small graphs (under ~15 nodes) classical centrality is competitive. We report it that way rather than as a universal win.
  • Runtime control turns those ideas into actions: verify, retry, stop, switch strategy, or send more context.

AgentProp is not another agent orchestrator. It wraps a workflow you already have: at each step your agent proposes work, and the controller inspects the accumulated ExecutionEvent history and decides what happens next.

   task ─► ┌─ AgentProp control loop ───────────────────────┐
           │  ┌────────┐  propose   ┌─────────────────────┐  │
           │  │  your  │ ─────────► │ Stopping Controller │  │
           │  │ agent  │ ◄───────── │ CONTINUE/VERIFY/    │  │ ─► result +
           │  └────────┘  decision  │ SWITCH/FINALIZE     │  │    decision trace
           │      └─ ExecutionEvent ┴─────────────────────┘  │
           │     (tokens, exit code, verifier_passed, ...)   │
           └─────────────────────────────────────────────────┘

Every decision is logged, so the trace is auditable. The only contract your harness must satisfy is emitting one ExecutionEvent per step. AgentProp ships dependency-light adapters for LangGraph, AutoGen, CrewAI, OpenAI Agents, and LlamaIndex (see framework integrations), and controls any other harness that can return an ExecutionEvent.
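The loop in the diagram can be illustrated with a small, self-contained sketch. It is not AgentProp's controller: `Event`, `decide`, and the stopping rules (verify unverified work, finalize on a verified pass or an exhausted token budget, switch strategy after repeated verifier failures) are hypothetical, and `Event` carries only a few of the `ExecutionEvent` fields that appear in the Python example later in this README. Only the decision names come from the diagram.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    # Decision names from the diagram; the rules in decide() are illustrative only.
    CONTINUE = "continue"
    VERIFY = "verify"
    SWITCH = "switch"
    FINALIZE = "finalize"


@dataclass
class Event:
    # Hypothetical stand-in for ExecutionEvent (not AgentProp's class).
    step: int
    tokens_used: int
    verifier_run: bool = False
    verifier_passed: Optional[bool] = None


def decide(history: list[Event], token_budget: int, max_failures: int = 2) -> Decision:
    """Toy stopping rule over the accumulated event history (an assumption)."""
    spent = sum(e.tokens_used for e in history)
    if spent >= token_budget:
        return Decision.FINALIZE  # budget exhausted: stop and report
    last = history[-1]
    if not last.verifier_run:
        return Decision.VERIFY  # unverified work: force a verifier call
    if last.verifier_passed:
        return Decision.FINALIZE  # verified pass: stop spending tokens
    failures = sum(1 for e in history if e.verifier_run and e.verifier_passed is False)
    if failures >= max_failures:
        return Decision.SWITCH  # repeated verifier failures: change strategy
    return Decision.CONTINUE  # a single failure: let the agent retry


# Feed events one at a time, as a harness would, and keep a decision trace.
history: list[Event] = []
trace: list[tuple[int, Decision]] = []
for event in [
    Event(step=1, tokens_used=18_000),
    Event(step=2, tokens_used=4_000, verifier_run=True, verifier_passed=False),
    Event(step=3, tokens_used=15_000, verifier_run=True, verifier_passed=False),
]:
    history.append(event)
    trace.append((event.step, decide(history, token_budget=120_000)))

print(trace)
```

Passing the full history rather than only the last event is what lets a rule like the failure count look back across steps, matching the description of a controller that inspects the accumulated event history.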

Why metric dimension matters (intuition): a failure is only actionable if you can tell which node failed. With badly placed verifiers, a bad planner output and a bad tester output can produce the same observable signature, so you cannot route a fix. A resolving set guarantees that each node's vector of distances to the verifiers is unique, giving every distinct failure a distinct fingerprint. See verifier semantics.
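A minimal standard-library sketch of the resolving-set check. It assumes a node's signature is its vector of shortest-path hop counts to each verifier along directed edges (infinite when a verifier is unreachable), takes "resolving coverage" to mean the fraction of nodes whose signature no other node shares, and uses the standard fault-tolerant definition (the set still resolves after removing any one verifier). The graph and all function names are hypothetical; AgentProp's own definitions may differ in detail.

```python
import math
from collections import Counter, deque

# Hypothetical adjacency-list workflow: node -> downstream nodes.
Graph = dict[str, list[str]]


def hop_distances(graph: Graph, source: str) -> dict[str, int]:
    """Breadth-first hop counts from `source` along directed edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def signatures(graph: Graph, verifiers: list[str]) -> dict[str, tuple[float, ...]]:
    """Each node's vector of distances to the verifiers (inf if unreachable)."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    out = {}
    for node in sorted(nodes):
        dist = hop_distances(graph, node)
        out[node] = tuple(dist.get(w, math.inf) for w in verifiers)
    return out


def resolving_coverage(graph: Graph, verifiers: list[str]) -> float:
    """Fraction of nodes whose signature is shared by no other node."""
    sigs = signatures(graph, verifiers)
    counts = Counter(sigs.values())
    return sum(1 for s in sigs.values() if counts[s] == 1) / len(sigs)


def is_fault_tolerant(graph: Graph, verifiers: list[str]) -> bool:
    """True if the set resolves every node, even after losing any one verifier."""
    return resolving_coverage(graph, verifiers) == 1.0 and all(
        resolving_coverage(graph, [w for w in verifiers if w != lost]) == 1.0
        for lost in verifiers
    )


workflow: Graph = {
    "planner": ["coder"],
    "coder": ["tester", "reviewer"],
    "tester": ["reviewer"],
    "reviewer": [],
}
print(resolving_coverage(workflow, ["reviewer"]))  # 0.5: coder and tester collide
print(resolving_coverage(workflow, ["tester", "reviewer"]))  # 1.0
print(is_fault_tolerant(workflow, ["coder", "tester", "reviewer"]))  # True
```

With only the reviewer as a verifier, coder and tester sit one hop away and are indistinguishable; adding the tester resolves every node, and adding the coder as well lets the placement survive the loss of any single verifier.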

Early Signal

On one Terminal-Bench 2.1 smoke task using Harbor's codex agent with gpt-5.5, the AgentProp A2 controller preserved success while reducing spend:

Task       Arm                   Result  Tokens   Cost (USD)  Wall time (s)
regex-log  A0 raw Codex          pass    123,731  0.333551    203.8
regex-log  A2 AgentProp control  pass     81,949  0.196834    173.6

That is 33.8% fewer tokens, 41.0% lower cost, and 14.8% less wall time on a pass-preserving comparison. This is a single-task early signal, not a benchmark claim; the point is that AgentProp can already act as a spend-aware controller around live coding-agent execution.
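The percentages follow directly from the table; a minimal check:

```python
# Values copied from the regex-log table (A0 raw Codex vs. A2 AgentProp control).
a0 = {"tokens": 123_731, "cost_usd": 0.333551, "seconds": 203.8}
a2 = {"tokens": 81_949, "cost_usd": 0.196834, "seconds": 173.6}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction of `after` versus `before`, in percent."""
    return 100.0 * (before - after) / before


reductions = {key: round(reduction_pct(a0[key], a2[key]), 1) for key in a0}
print(reductions)  # {'tokens': 33.8, 'cost_usd': 41.0, 'seconds': 14.8}
```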

What Is Implemented

  • Directed weighted AgentGraph with JSON validation, NetworkX conversion, and Graphviz export.
  • Propagation models: Independent Cascade, Linear Threshold, Bootstrap Percolation, deterministic Zero Forcing, Randomized Zero Forcing, learned propagation, and Quality Cascade.
  • Graph algorithms for seed selection, pruning, bottlenecks, k-core, bridges, articulation points, centrality, verifier placement, and resolving coverage.
  • Metric-dimension verifier placement, including fault-tolerant resolving coverage for single-verifier failure.
  • RZF process-based centrality for seed selection and scaling studies.
  • Runtime controllers for graph-node execution, terminal-loop control, verifier forcing, local-pass distrust, retry/stop/switch decisions, and category-conditioned bandit policies.
  • ControlSession, a small public facade that starts with graph analysis, observes real execution events, returns control decisions, and saves traces.
  • Optional ML/DL/RL baselines: learned seed scorers, torch GNNs, Q-learning, REINFORCE, PPO, and artifact/checkpoint tooling.
  • Coding-agent integration helpers for Codex, Claude Code, FastMCP tools, and framework adapters.
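Zero forcing may be unfamiliar outside graph theory, so here is a minimal sketch of the standard deterministic rule on a directed graph: a blue (informed) node with exactly one non-blue out-neighbor forces that neighbor blue, repeated until nothing changes. This is the textbook rule, not AgentProp's implementation, and the workflow graph and function name are hypothetical. Randomized variants in the literature replace the deterministic force with a probabilistic one; AgentProp's exact RZF rule is not described here.

```python
# Hypothetical adjacency-list workflow: node -> downstream nodes.
Graph = dict[str, list[str]]


def zero_forcing_closure(graph: Graph, seeds: set[str]) -> set[str]:
    """Apply the directed zero-forcing rule until no further force is possible."""
    blue = set(seeds)
    changed = True
    while changed:
        changed = False
        for node in list(blue):  # iterate over a snapshot while `blue` grows
            white = [n for n in graph.get(node, []) if n not in blue]
            if len(white) == 1:  # exactly one uninformed successor: force it
                blue.add(white[0])
                changed = True
    return blue


flow: Graph = {
    "planner": ["coder"],
    "coder": ["tester", "reviewer"],
    "tester": ["reviewer"],
    "reviewer": [],
}
print(sorted(zero_forcing_closure(flow, {"planner"})))  # ['coder', 'planner']
print(sorted(zero_forcing_closure(flow, {"planner", "tester"})))
```

Seeding only the planner stalls at the coder, which has two uninformed successors; seeding the tester as well lets the process finish, so {planner, tester} is a zero forcing set for this graph.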

Install

python -m pip install agentprop

For development:

python -m pip install -e ".[dev]"
python -m pytest

Optional extras:

python -m pip install -e ".[dl]"  # torch-backed graph models
python -m pip install -e ".[rl]"  # Gymnasium-compatible RL experiments
python -m pip install -e ".[mcp]" # FastMCP server for editor-agent tools

Quick Start

Analyze a built-in workflow:

agentprop analyze planner_coder_tester_reviewer

Recommend context seed nodes under the RZF propagation model:

agentprop optimize planner_coder_tester_reviewer \
  --budget 2 \
  --algorithm greedy \
  --model rzf
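Greedy seed selection, in the classic influence-maximization sense, repeatedly adds the node with the largest gain in expected spread until the budget is used. The sketch below shows that scheme with a different model swapped in: it uses Independent Cascade with Monte Carlo estimates rather than RZF, and the edge probabilities, graph, and function names are hypothetical. It only illustrates what a greedy search under a budget looks like; AgentProp's `greedy` algorithm may differ.

```python
import random

# Hypothetical weighted workflow: u -> {v: probability that u activates v}.
Edges = dict[str, dict[str, float]]


def simulate_ic(edges: Edges, seeds: set[str], rng: random.Random) -> int:
    """One Independent Cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:  # each newly active node gets one try per out-edge
            for v, p in edges.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def expected_spread(edges: Edges, seeds: set[str], trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)  # same stream for every candidate
    return sum(simulate_ic(edges, seeds, rng) for _ in range(trials)) / trials


def greedy_seeds(edges: Edges, budget: int, trials: int = 200) -> list[str]:
    """Add, one at a time, the node with the largest estimated spread."""
    nodes = sorted(set(edges) | {v for targets in edges.values() for v in targets})
    chosen: list[str] = []
    for _ in range(budget):
        best = max(
            (n for n in nodes if n not in chosen),
            key=lambda n: expected_spread(edges, set(chosen) | {n}, trials),
        )
        chosen.append(best)
    return chosen


workflow_edges: Edges = {
    "planner": {"coder": 0.9},
    "coder": {"tester": 0.7, "reviewer": 0.5},
    "tester": {"reviewer": 0.8},
}
print(greedy_seeds(workflow_edges, budget=2))
```

Reseeding the random generator for every candidate (common random numbers) keeps comparisons between candidates from being dominated by sampling noise.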

Compare graph propagation policies:

PYTHONPATH=src:. python experiments/run_benchmark.py \
  --workflows chain planner_coder_tester_reviewer research_writer_verifier \
  --algorithms rzf-centrality greedy betweenness pagerank random \
  --models quality-cascade independent-cascade \
  --budget 2 --trials 50 --decay --decay-seed 0 \
  --out-dir results/my_run

Generate verifier-placement evidence:

PYTHONPATH=src:. python experiments/verifier_placement_evidence.py

Run the RZF scaling study:

PYTHONPATH=src:. python experiments/rzf_scaling_study.py

Both scripts are deterministic, and each documents its expected output in a block at the top of the source file so you can confirm that you reproduced the published numbers (metric dimension reaching a resolving set at a lower budget k, and RZF leading on large graphs). The headline figures are summarized in reproducible results.

Run a key-free control-layer demo:

agentprop control-demo --demo terminal --out-dir reports/control-demo

The demo writes trace.jsonl, summary.json, and report.md. The trace starts with graph analysis, then records runtime events, features, decisions, and the final outcome.

Use the runtime control facade from Python:

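from agentprop.runtime import ControlSession, ExecutionEvent

# Start a session on a built-in workflow; the trace begins with graph analysis.
session = ControlSession.start(
    "planner_coder_tester_reviewer",
    task_id="task-123",
    category="implementation",
    token_budget=120_000,
    baseline_tokens=180_000,
)

# Report one step: a pytest verifier run that failed on an edge case.
decision = session.observe(
    ExecutionEvent(
        step=1,
        command="pytest -q",
        verifier_run=True,
        verifier_passed=False,
        error_signature="AssertionError:test_edge_case",
        tokens_used=18_000,
    )
)
print(decision)  # the controller's decision for this step

# Write the decision trace and related artifacts for auditing.
session.write_artifacts("reports/task-123")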

Coding-Agent Integration

AgentProp can be used with Codex CLI, Claude Code, or any MCP-capable editor agent as a workflow-analysis layer. It does not need model API keys to generate briefs or run local graph analysis; Codex can keep using codex login, and Claude Code can use the included skill/MCP-style integration.

agentprop agent-instructions planner_coder_tester_reviewer \
  --target codex \
  --out reports/codex_agent_brief.md

agentprop agent-instructions planner_coder_tester_reviewer \
  --target claude-code \
  --out reports/claude_code_agent_brief.md

Use these briefs for everyday implementation/review tasks, or run agentprop-mcp when a coding agent should call AgentProp tools directly while designing or debugging a multi-agent workflow.

python -m pip install "agentprop[mcp]"
agentprop-mcp

The MCP server uses FastMCP when the extra is installed and exposes both analysis tools and live control-session tools. See the control layer quickstart and coding-agent integration guide.

The installable agent skill lives at skills/agentprop-workflow-optimizer:

npx skills add https://github.com/aryan5v/AgentProp --skill agentprop-workflow-optimizer

Research Position

AgentProp sits between graph theory, diffusion models, and agent evaluation. The core hypothesis is that agent workflows should be optimized as communication graphs under quality, cost, and observability constraints, rather than treated as opaque prompt loops.

Key inspirations:

See the documentation index, research references, and the literature review for more detail.

Status

AgentProp is public alpha research software. The graph backbone, propagation models, runtime-control APIs, CLI, tests, and experiment scripts are usable, but the benchmark evidence is still early. Treat live-agent results as directional until larger repeated studies are published.

Development

ruff check .
mypy src
pytest

CI runs the same gates on pull requests. AgentProp is released under the Apache 2.0 license.



Download files


Source Distribution

agentprop-0.1.0a3.tar.gz (1.0 MB)

Uploaded Source

Built Distribution


agentprop-0.1.0a3-py3-none-any.whl (159.3 kB)

Uploaded Python 3

File details

Details for the file agentprop-0.1.0a3.tar.gz.

File metadata

  • Download URL: agentprop-0.1.0a3.tar.gz
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentprop-0.1.0a3.tar.gz
Algorithm Hash digest
SHA256 83da4368c79f6fac1033d5b2e033a365b6fac670a93ce16edcd996e597c6d123
MD5 b1775e2639079d1903759a0d2abf3540
BLAKE2b-256 60c9cdc8dd1e14ed17ff05c4d95fcfd24c5bb7a9969be6effdb314865eb8792c
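To check a downloaded file against the published digests, a minimal standard-library sketch (the function names are illustrative) hashes the file in chunks and compares the result with the SHA256 value listed in the table:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published digest (case-insensitive hex)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, call `verify(Path("agentprop-0.1.0a3.tar.gz"), expected)` with `expected` set to the SHA256 digest listed for that file. pip can also enforce digests at install time through its hash-checking mode (`--require-hashes`).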


File details

Details for the file agentprop-0.1.0a3-py3-none-any.whl.

File metadata

  • Download URL: agentprop-0.1.0a3-py3-none-any.whl
  • Upload date:
  • Size: 159.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentprop-0.1.0a3-py3-none-any.whl
Algorithm Hash digest
SHA256 0d8e2295a9f63d5b29fc204bda9cf4704617deb75814626b25d8f69f52d895d8
MD5 9e37be29935cd80a4d694f43c2b0ef96
BLAKE2b-256 ad805aa07f149f2857da42f00c0ddaed70b8403f2b38488aa72c49cdba6827bd

