XEPV Base Sequence Analysis Toolkit for AI Agent Execution Traces
Project description
Base Sequence Toolkit
XEPV Base Sequence Analysis Framework for AI Agent Execution Traces
What is this?
When an AI agent solves a task, it executes a sequence of actions: reading files, writing code, running tests, searching for information. Base Sequence Toolkit classifies each action into one of four base types:
| Base | Name | Description | Examples |
|---|---|---|---|
| X | eXplore | Information gathering | readFile, webSearch, ls, grep |
| E | Execute | State-changing actions | writeFile, edit, npm install |
| P | Plan | Pure reasoning/strategy | LLM thinking without tool calls |
| V | Verify | Validation & testing | pytest, tsc --noEmit, read-after-write |
A task execution becomes a base sequence like XEEVXEV — a compact fingerprint of the agent's behavioral strategy.
This toolkit provides:
- Classifier: Map tool calls → XEPV base types (stateful, context-aware)
- Analyzer: N-gram patterns, transition matrices, positional effects, risk profiles
- Governor: Three-layer adaptive intervention system (rule engine + statistics + self-adaptation)
- Adapters: Ready-made integrations for SWE-agent and DunCrew trace formats
- CLI: One-command analysis from terminal
Key Findings
Analysis of 2,000+ SWE-agent trajectories and 500+ DunCrew execution traces revealed:
| Finding | Detail |
|---|---|
| V-base deficit | V (Verify) comprises only ~3.3% of all bases in SWE-agent traces. Resolved tasks have significantly more V bases than unresolved ones. |
| E→V transition bottleneck | The probability of transitioning from E (Execute) to V (Verify) is only 0.6% — agents almost never verify after executing. |
| Exploration spirals | Consecutive X runs (XXX+) are strongly associated with task failure. |
| Late planning is harmful | P bases appearing in the second half of execution correlate with lower success rates. |
| E-V pairing predicts success | The "golden path" pattern (Execute then Verify) is the strongest positive predictor. |
Installation
pip install base-sequence-toolkit
# With SWE-agent HuggingFace support:
pip install base-sequence-toolkit[swe-agent]
# For development:
pip install base-sequence-toolkit[dev]
Or install from source:
git clone https://github.com/FatBy/base-sequence-toolkit.git
cd base-sequence-toolkit
pip install -e ".[dev,swe-agent]"
Quick Start
Analyze SWE-agent Trajectories
from base_sequence_toolkit.adapters.swe_agent import load_from_huggingface
from base_sequence_toolkit.core.analyzer import run_full_analysis
# Load and classify 500 SWE-agent trajectories
results = load_from_huggingface(max_records=500)
# Extract sequences and outcomes
sequences = [r.base_sequence for r in results]
outcomes = [r.resolved for r in results]
# Run comprehensive analysis
report = run_full_analysis(sequences, outcomes)
print(report.format_summary())
Classify Your Own Agent's Actions
from base_sequence_toolkit import classify_step, create_context, StepClassification, BaseType
ctx = create_context()
steps = [
StepClassification(tool_name="readFile", args={"path": "src/main.py"}),
StepClassification(tool_name="writeFile", args={"path": "src/main.py", "content": "..."}, status="success"),
StepClassification(tool_name="readFile", args={"path": "src/main.py"}), # read-after-write → V
StepClassification(tool_name="runCmd", shell_command="pytest tests/"),
]
for step in steps:
base = classify_step(step, ctx)
print(f"{step.tool_name:15s} → {base}")
# Output:
# readFile → X (first access to unknown file)
# writeFile → E (state-changing action)
# readFile → V (reading file we just wrote)
# runCmd → V (running tests after writes)
Analyze Custom Sequences
from base_sequence_toolkit.core.analyzer import (
compute_sequence_stats,
compute_transition_matrix,
extract_risk_profile,
find_discriminative_patterns,
)
# Compute stats for a single sequence
stats = compute_sequence_stats("XEEVXEV")
print(f"Length: {stats.length}")
print(f"EV pairs: {stats.ev_pairs}")
print(f"Has verification: {stats.has_verification}")
print(f"Switch rate: {stats.switch_rate}")
# Transition matrix across multiple sequences
matrix = compute_transition_matrix(["XEEVXEV", "XXXEEE", "XEVEV"])
print(matrix.format())
# Risk profile
risk = extract_risk_profile("XXXEEEEE")
print(f"Risk flags: {risk.flags()}")
# → ['consecutive_x≥3', 'no_verification']
# Find patterns that distinguish success from failure
patterns = find_discriminative_patterns(
resolved_sequences=["XEVEV", "XEEVE"],
unresolved_sequences=["XXXEE", "XXXXX"],
min_count=1,
)
for p in patterns[:5]:
print(f" {p.pattern}: lift={p.lift:.2f}")
Governor: Adaptive Intervention System
The Governor is a three-layer closed-loop regulator that monitors agent execution in real-time and injects corrective prompts when behavioral anti-patterns are detected. Zero LLM dependency — pure rule-based logic.
Architecture
| Layer | Name | Function | Latency |
|---|---|---|---|
| Layer 1 | Online Rule Engine | 7 rules + 8-dim feature vector → prompt injection | 0ms |
| Layer 2 | Statistical Accumulator | 72-bucket success tracking + A/B intervention records | O(1) |
| Layer 3 | Threshold Self-Adaptation | Chi-squared test → auto-adjust rule thresholds | Periodic |
7 Built-in Rules
| Rule | Description |
|---|---|
consecutive_x_brake |
Stops exploration spirals (XXX...) |
step_length_fuse |
Step budget guard (disabled in v4 — data showed >15 steps = 97.4% success) |
switch_rate_warning |
Detects erratic strategy switching |
diversity_collapse |
Catches strategy entropy collapse (all same type) |
late_planning_warning |
Penalizes re-planning past halfway |
missing_verification |
Enforces P→V golden path (96.9% success rate) |
explore_dominance |
Flags excessive X/(X+E) ratio |
8-Dimensional Feature Vector
consecutive_x — trailing consecutive X count
step_count — total steps so far
x_ratio_last5 — X ratio in last 5 steps
switch_rate — adjacent-different-base ratio
p_in_late_half — P appears in second half
last_p_followed_by_v — most recent P followed by V
max_e_run_length — longest consecutive E run
xe_ratio — X / (X + E)
Quick Start: Real-time Monitoring
from base_sequence_toolkit import BaseSequenceGovernor, InterventionRecord
governor = BaseSequenceGovernor()
# Layer 1: Evaluate during execution (after each tool call)
bases = ["X", "E", "E", "X", "X", "X", "X"]
signal = governor.evaluate(bases)
if signal.triggered:
print(f"Rules fired: {signal.triggered_rules}")
print(f"Inject into LLM: {signal.prompt_injection}")
# → inject signal.prompt_injection into LLM system prompt
# Access 8-dim features
print(f"Consecutive X: {signal.features.consecutive_x}")
print(f"XE ratio: {signal.features.xe_ratio}")
Quick Start: Post-Execution Learning
# Layer 2+3: After task completes, record outcome
adjustments = governor.record_trace(
base_sequence="X-E-E-X-X-X-X",
success=False,
interventions=[
InterventionRecord(
rule="consecutive_x_brake",
step_index=5,
features=signal.features,
)
],
)
# Layer 3 auto-adapts thresholds when enough data accumulates
if adjustments:
print(f"Threshold adjustments: {adjustments}")
# View statistics
summary = governor.get_stats_summary()
print(f"Total traces: {summary['total_traces']}")
print(f"Buckets: {summary['bucket_count']}")
Standalone Functions
from base_sequence_toolkit import extract_features, evaluate_sequence, RuleThresholds
# Extract features without full Governor
features = extract_features(["X", "E", "V", "E", "X"])
print(f"Switch rate: {features.switch_rate}")
print(f"XE ratio: {features.xe_ratio}")
# Evaluate with custom thresholds
custom = RuleThresholds(consecutive_x_brake=3, switch_rate_warning=0.5)
signal = evaluate_sequence(["X", "X", "X", "X"], thresholds=custom)
print(f"Triggered: {signal.triggered_rules}")
CLI Usage
# Analyze SWE-agent trajectories
bst-analyze swe-agent -n 500 -o results/
# Analyze DunCrew execution traces
bst-analyze duncrew /path/to/exec_traces/ -o results/
# Analyze pre-computed JSON data
bst-analyze json data/my_sequences.json
Writing Your Own Adapter
To integrate with a new agent framework, implement a function that converts your trace format into StepClassification objects:
from base_sequence_toolkit import classify_step, create_context, StepClassification
def classify_my_agent_trace(trace: dict) -> str:
"""Convert your agent's trace into an XEPV base sequence."""
ctx = create_context()
bases = []
for action in trace["actions"]:
step = StepClassification(
tool_name=action["tool"],
args=action.get("arguments", {}),
status="success" if action["succeeded"] else "error",
shell_command=action.get("command"),
)
base = classify_step(step, ctx)
bases.append(str(base))
return "".join(bases)
Project Structure
base-sequence-toolkit/
├── base_sequence_toolkit/
│ ├── __init__.py
│ ├── cli.py # CLI entry point
│ ├── core/
│ │ ├── classifier.py # Generic XEPV classifier
│ │ ├── analyzer.py # Analysis primitives
│ │ └── governor.py # Three-layer adaptive Governor
│ └── adapters/
│ ├── swe_agent.py # SWE-agent trajectory adapter
│ └── duncrew.py # DunCrew trace adapter
├── examples/
│ └── swe_agent_analysis.py # End-to-end example
├── tests/
│ ├── test_classifier.py
│ ├── test_analyzer.py
│ ├── test_governor.py
│ └── test_swe_agent.py
├── data/ # Pre-computed results (optional)
├── pyproject.toml
├── LICENSE
└── README.md
Citation
If you use this toolkit in your research, please cite:
@software{base_sequence_toolkit,
title = {Base Sequence Toolkit: XEPV Analysis Framework for AI Agent Execution Traces},
year = {2025},
url = {https://github.com/FatBy/base-sequence-toolkit}
}
License
MIT License. See LICENSE for details.
Related
- XEPV Framework: The base classification system described in our paper
- DunCrew: AI operating system with XEPV-based execution governance
- SWE-bench: Software engineering benchmark for AI agents
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file base_sequence_toolkit-0.2.0.tar.gz.
File metadata
- Download URL: base_sequence_toolkit-0.2.0.tar.gz
- Upload date:
- Size: 37.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fbd7cf54ed92663b020744ccbfa344c2bce7db260e33d801ff9772ac80ddab84
|
|
| MD5 |
6abfb9797fcc7ef5bfe36dfed1846c04
|
|
| BLAKE2b-256 |
dadc0ecce1081fc605ea6b808dcbcce2f0177d8b35905b466a06862988104dc0
|
File details
Details for the file base_sequence_toolkit-0.2.0-py3-none-any.whl.
File metadata
- Download URL: base_sequence_toolkit-0.2.0-py3-none-any.whl
- Upload date:
- Size: 32.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a40adfafb73d36997225c776668a8cba8ce16927c6cbdcceb99ad399f41e5cc3
|
|
| MD5 |
1aafc87b80488103df1d1445985764bc
|
|
| BLAKE2b-256 |
3a2992e268e1cfdc2d630d6ffe0a59fa11c2749d3806c1e616a7ed0deebe1954
|