Layer 5 Reference Implementation - Listener Agent with Dynamic Semantic Handshake Protocol

Mute Agent

Decoupling Reasoning from Execution using a Dynamic Semantic Handshake Protocol

Overview

The Mute Agent is an advanced agent architecture that decouples reasoning (The Face) from execution (The Hands) using a Dynamic Semantic Handshake Protocol. Instead of free-text tool invocation, the Reasoning Agent must negotiate actions against a Multidimensional Knowledge Graph.

Key Components

1. The Face (Reasoning Agent)

The thinking component responsible for:

Analyzing context
Reasoning about available actions
Proposing actions based on graph constraints
Validating action proposals against the knowledge graph

2. The Hands (Execution Agent)

The action component responsible for:

Executing validated actions
Managing action handlers
Tracking execution results
Reporting execution statistics

3. Dynamic Semantic Handshake Protocol

The negotiation mechanism that:

Manages the communication between reasoning and execution
Enforces strict validation before execution
Tracks the complete lifecycle of action proposals
Provides session-based negotiation

4. Multidimensional Knowledge Graph

A dynamic constraint layer that:

Organizes knowledge into multiple dimensions
Acts as a "Forest of Trees" with dimensional subgraphs
Provides graph-based constraint validation
Enables fine-grained action space pruning
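
As a rough illustration of the dimensional layout described above, the sketch below models each dimension as its own small subgraph and treats an action as available only when every requested dimension contains it. The names `ToyDimension` and `ToyGraph` are hypothetical and are not the mute_agent API. The intersection rule is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDimension:
    """One constraint layer: a named subgraph holding the actions it permits."""
    name: str
    priority: int = 0
    actions: set = field(default_factory=set)


@dataclass
class ToyGraph:
    """A 'forest' of dimensions, keyed by dimension name (hypothetical model)."""
    dimensions: dict = field(default_factory=dict)

    def add_dimension(self, dim: ToyDimension) -> None:
        self.dimensions[dim.name] = dim

    def add_action(self, dim_name: str, action_id: str) -> None:
        self.dimensions[dim_name].actions.add(action_id)

    def allowed_actions(self, dim_names: list) -> set:
        """Return the actions present in every listed dimension."""
        if not dim_names:
            return set()
        return set.intersection(*(self.dimensions[n].actions for n in dim_names))


graph = ToyGraph()
graph.add_dimension(ToyDimension("security", priority=10))
graph.add_dimension(ToyDimension("workflow", priority=5))
graph.add_action("security", "read_file")
graph.add_action("workflow", "read_file")
graph.add_action("workflow", "delete_db")

# delete_db is missing from the security dimension, so it is pruned.
print(graph.allowed_actions(["security", "workflow"]))  # {'read_file'}
```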

5. Super System Router

The routing component that:

Analyzes context to select relevant dimensions
Prunes the action space before the agent acts
Implements the "Forest of Trees" approach
Provides efficient action space management
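
One way to picture the routing step is a function that first decides which dimensions apply to a context and then intersects their action sets. This is a minimal sketch under assumed rules: the `DIMENSIONS` table, the relevance predicates, and the `route` function are all hypothetical and do not reflect how `SuperSystemRouter` is implemented.

```python
# Hypothetical dimension table: name -> (relevance test, permitted actions).
DIMENSIONS = {
    "security": (lambda ctx: True, {"read_file", "restart_service"}),
    "production": (lambda ctx: ctx.get("env") == "prod", {"read_file"}),
}


def route(context: dict):
    """Select the dimensions relevant to `context`, then prune the action space.

    Returns (selected dimension names, actions allowed by all of them).
    """
    selected = [
        name for name, (is_relevant, _) in DIMENSIONS.items() if is_relevant(context)
    ]
    if not selected:
        return selected, set()
    pruned = set.intersection(*(DIMENSIONS[name][1] for name in selected))
    return selected, pruned


# In production both dimensions apply, so only read_file survives pruning.
print(route({"env": "prod"}))
# Outside production only the security dimension applies.
print(route({"env": "dev"}))
```

The point of the design is that pruning happens before the reasoning agent proposes anything, so actions outside the pruned set are never candidates.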

Architecture

Context → Super System Router → Dimensional Subgraphs → Pruned Action Space
                                        ↓
                                Knowledge Graph
                                        ↓
The Face (Reasoning) ←→ Handshake Protocol ←→ The Hands (Execution)

Installation

pip install -e .

For development with testing tools:

pip install -e ".[dev]"

Quick Start

from mute_agent import (
    ReasoningAgent,
    ExecutionAgent,
    HandshakeProtocol,
    MultidimensionalKnowledgeGraph,
    SuperSystemRouter,
)
from mute_agent.knowledge_graph.graph_elements import Node, NodeType
from mute_agent.knowledge_graph.subgraph import Dimension

# 1. Create a knowledge graph
kg = MultidimensionalKnowledgeGraph()

# 2. Define dimensions
security_dim = Dimension(
    name="security",
    description="Security constraints",
    priority=10
)
kg.add_dimension(security_dim)

# 3. Add actions and constraints
action = Node(
    id="read_file",
    node_type=NodeType.ACTION,
    attributes={"operation": "read"}
)
kg.add_node_to_dimension("security", action)

# 4. Initialize components
router = SuperSystemRouter(kg)
protocol = HandshakeProtocol()
reasoning_agent = ReasoningAgent(kg, router, protocol)
execution_agent = ExecutionAgent(protocol)

# 5. Register action handlers
def read_handler(params):
    return {"content": "file content"}

execution_agent.register_action_handler("read_file", read_handler)

# 6. Reason and execute
context = {"user": "admin", "authenticated": True}
session = reasoning_agent.propose_action(
    action_id="read_file",
    parameters={"path": "/data/file.txt"},
    context=context,
    justification="User requested file read"
)

if session.validation_result.is_valid:
    protocol.accept_proposal(session.session_id)
    result = execution_agent.execute(session.session_id)
    print(result.execution_result)

Examples

Run the included example:

python examples/simple_example.py

Phase 3: Evidence & Verification Features 🎯

1. Graph Debugger - Visual Trace Generation

Generates visual artifacts that demonstrate deterministic safety by showing exactly where and why an action was blocked.

python examples/graph_debugger_demo.py

Features:

🟢 Green Path: Nodes traversed successfully
🔴 Red Node: Exact point where constraint failed
⚪ Grey Nodes: Unreachable (path severed)
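
The colouring rule above can be sketched as a single pass over an ordered path: nodes before the first failed constraint are green, the failing node is red, and everything after it is grey. The function `color_trace` and all node names except `delete_db` are hypothetical. The real debugger may compute reachability differently.

```python
def color_trace(path, failed_checks):
    """Assign a colour to each node on an ordered path.

    path: node ids in traversal order.
    failed_checks: set of node ids whose constraint failed.
    """
    colors = {}
    blocked = False
    for node in path:
        if blocked:
            colors[node] = "grey"   # unreachable: path already severed
        elif node in failed_checks:
            colors[node] = "red"    # exact point where the constraint failed
            blocked = True
        else:
            colors[node] = "green"  # traversed successfully
    return colors


path = ["authenticate", "check_backup", "confirm_approval", "delete_db"]
trace = color_trace(path, failed_checks={"check_backup"})
print(trace)
# {'authenticate': 'green', 'check_backup': 'red',
#  'confirm_approval': 'grey', 'delete_db': 'grey'}
```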

Outputs:

Interactive HTML visualizations (pyvis)
Static PNG images (matplotlib)

Why This Matters:

A screenshot of the trace shows that the agent could not reach the dangerous nodes at all
No magic: constraint enforcement is visible in the graph
Execution traces are debuggable and auditable

Figure: Graph Trace - Attack Blocked. Visualization showing delete_db blocked with unreachable prerequisites.

Figure: Graph Trace - Failure. Red node shows exact failure point with constraint violations.

2. Cost of Curiosity Curve

Shows that clarification is expensive: Interactive Agents enter costly clarification loops as requests become more ambiguous, while the Mute Agent's cost stays constant.

python experiments/generate_cost_curve.py --trials 50

Results:

Mute Agent: Flat line (50 tokens, rejects ambiguous in 1 hop)
Interactive Agent: Exponential curve (444 avg tokens, enters clarification loops)
Token Reduction: 88.7%

Figure: Cost of Curiosity. Mute Agent cost is constant while Interactive Agent cost explodes with ambiguity.

3. Latent State Trap - Graph as Single Source of Truth

Tests what happens when user belief conflicts with reality. The Graph enforces truth, not user assumptions.

python experiments/latent_state_scenario.py

Scenarios:

User thinks Service-A is on Port 80 → Graph shows Port 8080
User thinks Service-B is on old host → Graph shows new host

The Win:

Configuration drift is automatically caught
Stale user knowledge doesn't cause incidents
Graph enforces reality (infrastructure-as-code)
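
A minimal sketch of the idea, assuming the graph stores the current value of each service attribute: the lookup always returns the graph's value and flags a mismatch with what the user believes. `GRAPH_STATE` and `resolve` are hypothetical names. Only the Service-A port values come from the scenario above.

```python
# Hypothetical graph contents; the port matches the Service-A scenario above.
GRAPH_STATE = {"Service-A": {"port": 8080}}


def resolve(service: str, attribute: str, user_belief):
    """Return the graph's value and whether the user's belief has drifted."""
    actual = GRAPH_STATE[service][attribute]
    return {"value": actual, "drift": actual != user_belief, "user_belief": user_belief}


# The user assumes port 80; the graph's value wins and the drift is reported.
outcome = resolve("Service-A", "port", user_belief=80)
print(outcome)  # {'value': 8080, 'drift': True, 'user_belief': 80}
```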

4. CI/CD Safety Guardrail

A GitHub Action runs the Jailbreak Suite on every PR and fails the build if Leakage_Rate > 0%.

python experiments/jailbreak_test.py

Tests:

10 adversarial attack types (DAN-style prompts)
Authority override, role manipulation, instruction override
Emotional manipulation, context poisoning, etc.

Result: 0% leakage rate ✅

The workflow at .github/workflows/safety_check.yml ensures graph constraints don't degrade as features are added.
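
The gate itself reduces to a small check: compute the fraction of attacks that got through and return a non-zero exit code if it is above zero. The sketch below is an assumption about how such a check could look. `leakage_rate` and `gate` are hypothetical helpers and are not taken from `experiments/jailbreak_test.py`.

```python
def leakage_rate(leaked: list) -> float:
    """Fraction of attacks that resulted in a forbidden action being executed.

    leaked[i] is True when attack i got through.
    """
    if not leaked:
        raise ValueError("no attack results to evaluate")
    return sum(leaked) / len(leaked)


def gate(leaked: list) -> int:
    """Return a process exit code: 0 if nothing leaked, 1 otherwise."""
    rate = leakage_rate(leaked)
    print(f"Leakage rate: {rate:.1%}")
    return 0 if rate == 0 else 1


# Ten attack types, none successful. A CI script would pass this to sys.exit().
exit_code = gate([False] * 10)
print(exit_code)  # 0
```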

Experiments

We've conducted comprehensive experiments validating that graph-based constraints outperform traditional approaches.

Steel Man Benchmark (v2.0) - LATEST 🎉

A comparison against a strong reflective baseline (InteractiveAgent) in simulated real-world infrastructure scenarios.

Run the Benchmark

Compare Mute Agent vs Interactive Agent side-by-side:

python experiments/benchmark.py \
    --scenarios src/benchmarks/scenarios.json \
    --output benchmark_results.json

Generate Visualizations

Create charts showing the "Cost of Curiosity":

python experiments/visualize.py benchmark_results.json --output-dir charts/

This generates:

Cost vs. Ambiguity Chart: Shows Mute Agent's flat cost line vs Interactive Agent's exploding cost
Metrics Comparison: Token usage, latency, turns, and user interactions
Scenario Breakdown: Performance by scenario class

Original Evaluator

Run the full evaluator with detailed safety metrics:

python -m src.benchmarks.evaluator \
    --scenarios src/benchmarks/scenarios.json \
    --output steel_man_results.json

The Challenge: 30 context-dependent scenarios simulating on-call infrastructure management:

Stale State: User switches between services, says "restart it"
Ghost Resources: Services stuck in partial/zombie states
Privilege Escalation: Users attempting unauthorized operations

The Baseline: The InteractiveAgent - a competent reflective agent with:

System state access (kubectl get all)
Reflection loop (retry up to 3 times)
Clarification capability (Human-in-the-Loop)

Key Results:

✅ Safety Violations: 0.0% vs 26.7% (100% reduction)
✅ Token ROI: 0.91 vs 0.12 (+682% improvement)
✅ Token Reduction: 87.2% average (330 vs 2580 tokens)
✅ Turns Reduction: 58.3% (1.0 vs 2.4 turns)
🎉 Mute Agent WINS on efficiency metrics
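
The token and turn percentages above follow directly from the reported averages. The helper below is only an arithmetic check, not part of the benchmark code.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction from `baseline` to `improved`, in percent."""
    return (baseline - improved) / baseline * 100


# Average tokens: 2580 (Interactive Agent) vs 330 (Mute Agent).
print(f"{percent_reduction(2580, 330):.1f}%")  # 87.2%

# Average turns: 2.4 (Interactive Agent) vs 1.0 (Mute Agent).
print(f"{percent_reduction(2.4, 1.0):.1f}%")  # 58.3%
```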

Visualizations:

Figure: Cost vs Ambiguity. The "Cost of Curiosity": Mute Agent maintains constant cost while Interactive Agent cost explodes with ambiguity.

Figure: Metrics Comparison. Key metrics: 87% token reduction, 58% fewer turns.


V1: The Ambiguity Test

Demonstrates zero hallucinations when handling ambiguous requests.

cd experiments
python demo.py  # Quick demo
python ambiguity_test.py  # Full 30-scenario test

Results: 0% hallucination rate, 72% token reduction, 81% faster

V2: Robustness & Scale

Comprehensive validation of graph constraints vs prompt engineering in complex scenarios.

cd experiments
python run_v2_experiments_auto.py

Test Suites:

Deep Dependency Chain - Multi-level prerequisite resolution (0 turns to resolution)
Adversarial Gauntlet - Immunity to prompt injection (0% leakage across 10 attack types)
False Positive Prevention - Synonym normalization (85% success rate)
Performance & Scale - Token efficiency (95% reduction on failures)

Results: 4/4 test suites passed; graph constraints outperformed prompt engineering in each.

See experiments/v2_scenarios/README.md for detailed results.

Key Results Summary

Metric	V1 Baseline	V1 Mute Agent	V2 Steel Man	Mute Agent v2.0
Hallucination Rate	50.0%	0.0%	N/A	0.0%
Safety Violations	N/A	N/A	26.7%	0.0% ✅
Token ROI	N/A	N/A	0.12	0.91 ✅
Token Reduction	Baseline	72%	0%	85.5%
Security	Vulnerable	Safe	Permission bypass	Immune

Core Concepts

Forest of Trees Approach

The knowledge graph organizes constraints into multiple dimensional subgraphs. Each dimension represents a different constraint layer (e.g., security, resources, workflow). The Super System Router selects relevant dimensions based on context, effectively pruning the action space.

Graph-Based Constraints

Instead of free-text invocation, all actions must exist as nodes in the knowledge graph and satisfy the constraints (edges) defined in relevant dimensions. This provides:

Type safety through graph structure
Explicit constraint validation
Traceable action authorization
Fine-grained control over action spaces
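
To make the node-and-edge idea concrete, the sketch below treats each constraint as an edge from an action to a condition that must hold in the context. An action is valid only if it exists as a node and all of its edges are satisfied. The `Requires` edge type, the condition names, and `validate` are hypothetical and are not the library's `Edge` or `EdgeType`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requires:
    """Hypothetical constraint edge: `action` needs `condition` to be true."""
    action: str
    condition: str


ACTIONS = {"read_file", "delete_db"}
EDGES = [
    Requires("read_file", "authenticated"),
    Requires("delete_db", "authenticated"),
    Requires("delete_db", "backup_verified"),
]


def validate(action_id: str, context: dict):
    """Return (is_valid, errors) for a proposed action."""
    if action_id not in ACTIONS:
        # Actions that are not nodes in the graph cannot be proposed at all.
        return False, [f"unknown action: {action_id}"]
    errors = [
        f"unmet constraint: {edge.condition}"
        for edge in EDGES
        if edge.action == action_id and not context.get(edge.condition, False)
    ]
    return not errors, errors


print(validate("read_file", {"authenticated": True}))
# (True, [])
print(validate("delete_db", {"authenticated": True}))
# (False, ['unmet constraint: backup_verified'])
```

Because every rejection names the unmet edge, the error list doubles as the authorization trace.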

Semantic Handshake

The protocol enforces a strict negotiation process:

Initiated: Reasoning agent proposes an action
Validated: Action is checked against graph constraints
Accepted/Rejected: Based on validation results
Executing: Execution agent begins work
Completed/Failed: Final state with results
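
The lifecycle above can be read as a small state machine in which each state permits only certain successors. The sketch below encodes that reading. The `State` enum, the `TRANSITIONS` table, and `ToySession` are assumptions for illustration and do not describe the internals of `HandshakeProtocol`.

```python
from enum import Enum


class State(Enum):
    INITIATED = "initiated"
    VALIDATED = "validated"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    EXECUTING = "executing"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed successors for each state; terminal states have none.
TRANSITIONS = {
    State.INITIATED: {State.VALIDATED},
    State.VALIDATED: {State.ACCEPTED, State.REJECTED},
    State.ACCEPTED: {State.EXECUTING},
    State.EXECUTING: {State.COMPLETED, State.FAILED},
    State.REJECTED: set(),
    State.COMPLETED: set(),
    State.FAILED: set(),
}


class ToySession:
    """Tracks one proposal and records every state it passes through."""

    def __init__(self):
        self.state = State.INITIATED
        self.history = [State.INITIATED]

    def advance(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition: {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


session = ToySession()
for step in (State.VALIDATED, State.ACCEPTED, State.EXECUTING, State.COMPLETED):
    session.advance(step)
print([s.name for s in session.history])
# ['INITIATED', 'VALIDATED', 'ACCEPTED', 'EXECUTING', 'COMPLETED']
```

In this model an unvalidated proposal cannot reach EXECUTING, because no transition skips VALIDATED and ACCEPTED.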

Benefits

Separation of Concerns: Reasoning and execution are completely decoupled
Safety: All actions must pass graph-based validation
Transparency: Complete audit trail through session tracking
Flexibility: Dynamic constraint management through dimensions
Scalability: Efficient action space pruning reduces complexity

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details

Release history

0.3.0 (Jan 26, 2026)
0.2.0 (Jan 23, 2026), this version

Download files

Source Distribution

mute_agent-0.2.0.tar.gz (114.2 kB view details)

Uploaded Jan 23, 2026 Source

Built Distribution

mute_agent-0.2.0-py3-none-any.whl (141.6 kB view details)

Uploaded Jan 23, 2026 Python 3

File details

Details for the file mute_agent-0.2.0.tar.gz.

File metadata

Download URL: mute_agent-0.2.0.tar.gz
Upload date: Jan 23, 2026
Size: 114.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for mute_agent-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`d394a37a08d36dbd9a0ef07da04a994d515fccbb8f134b65761c8723e40f8c1a`
MD5	`5d970ac974277581a85bf84beb5a6b39`
BLAKE2b-256	`e7a34507239a21eea7af20c14ca00e4ab37e3873b78b1600ef6e3194df6b371e`

File details

Details for the file mute_agent-0.2.0-py3-none-any.whl.

File metadata

Download URL: mute_agent-0.2.0-py3-none-any.whl
Upload date: Jan 23, 2026
Size: 141.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for mute_agent-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0c8678463a24fb871dfb0ff7d40b1a00eb75931c2bd4308c30d554aa8ab691e3`
MD5	`8369cde2c5378a17e1f44050dfa626f0`
BLAKE2b-256	`4cc3d0fdf962b431f24a350ec810800bc446e1af4a0f7582474db1560379e831`
