XEPV Base Sequence Analysis Toolkit for AI Agent Execution Traces
Base Sequence Toolkit
XEPV Base Sequence Analysis Framework for AI Agent Execution Traces
What is this?
When an AI agent solves a task, it executes a sequence of actions: reading files, writing code, running tests, searching for information. Base Sequence Toolkit classifies each action into one of four base types:
| Base | Name | Description | Examples |
|---|---|---|---|
| X | eXplore | Information gathering | readFile, webSearch, ls, grep |
| E | Execute | State-changing actions | writeFile, edit, npm install |
| P | Plan | Pure reasoning/strategy | LLM thinking without tool calls |
| V | Verify | Validation & testing | pytest, tsc --noEmit, read-after-write |
A task execution becomes a base sequence like XEEVXEV — a compact fingerprint of the agent's behavioral strategy.
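To make the idea concrete, here is a minimal, stateless sketch of how a list of tool names could be turned into a base sequence. The lookup table `TOOL_TO_BASE` and the function `to_base_sequence` are hypothetical and only mirror the examples in the table above. The toolkit's own classifier is stateful and context-aware, so its output can differ (for example, a read-after-write counts as V).

```python
# Hypothetical lookup table built from the examples in the table above.
# It is NOT the toolkit's classifier, which also uses execution context.
TOOL_TO_BASE = {
    "readFile": "X", "webSearch": "X", "ls": "X", "grep": "X",
    "writeFile": "E", "edit": "E", "npm install": "E",
    "pytest": "V", "tsc --noEmit": "V",
}


def to_base_sequence(tool_names, default="?"):
    """Map each tool name to a base letter; unknown names become `default`."""
    return "".join(TOOL_TO_BASE.get(name, default) for name in tool_names)


trace = ["readFile", "edit", "writeFile", "pytest", "grep", "edit", "pytest"]
print(to_base_sequence(trace))  # XEEVXEV
```

P bases are left out of this sketch because they correspond to reasoning steps without a tool call, so a name-based lookup cannot see them.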
This toolkit provides:
- Classifier: Map tool calls → XEPV base types (stateful, context-aware)
- Analyzer: N-gram patterns, transition matrices, positional effects, risk profiles
- Adapters: Ready-made integrations for SWE-agent and DunCrew trace formats
- CLI: One-command analysis from terminal
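As an illustration of what the Analyzer's n-gram patterns refer to, the sketch below counts every length-n substring across a set of base sequences. The helper `count_ngrams` is a hypothetical stand-in written with the standard library only; it is not the toolkit's implementation.

```python
from collections import Counter


def count_ngrams(sequences, n=2):
    """Count every length-n substring across a list of base sequences."""
    counts = Counter()
    for seq in sequences:
        # Slide a window of width n over the sequence.
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


bigrams = count_ngrams(["XEEVXEV", "XXXEEE", "XEVEV"], n=2)
print(bigrams.most_common(3))
```

Sequences shorter than `n` contribute nothing, because the window never fits.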
Key Findings
Analysis of 2,000+ SWE-agent trajectories and 500+ DunCrew execution traces revealed:
| Finding | Detail |
|---|---|
| V-base deficit | V (Verify) comprises only ~3.3% of all bases in SWE-agent traces. Resolved tasks have significantly more V bases than unresolved ones. |
| E→V transition bottleneck | The probability of transitioning from E (Execute) to V (Verify) is only 0.6% — agents almost never verify after executing. |
| Exploration spirals | Consecutive X runs (XXX+) are strongly associated with task failure. |
| Late planning is harmful | P bases appearing in the second half of execution correlate with lower success rates. |
| E-V pairing predicts success | The "golden path" pattern (Execute then Verify) is the strongest positive predictor of task success. |
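The quantities behind these findings can be computed directly from base-sequence strings. The following is a rough sketch under our own definitions: the function names are hypothetical, "second half" is taken to mean positions from `len(sequence) // 2` onward, and the three toy sequences are not the data behind the table above.

```python
from itertools import groupby


def base_share(sequences, base):
    """Fraction of all bases across `sequences` that equal `base`."""
    total = sum(len(s) for s in sequences)
    return sum(s.count(base) for s in sequences) / total if total else 0.0


def transition_probability(sequences, src, dst):
    """Estimate P(next == dst | current == src) from adjacent pairs."""
    from_src = to_dst = 0
    for s in sequences:
        for a, b in zip(s, s[1:]):
            if a == src:
                from_src += 1
                to_dst += (b == dst)
    return to_dst / from_src if from_src else 0.0


def longest_run(sequence, base):
    """Length of the longest consecutive run of `base` in one sequence."""
    return max((len(list(g)) for k, g in groupby(sequence) if k == base), default=0)


def late_share(sequence, base):
    """Fraction of `base` occurrences that fall in the second half."""
    total = sequence.count(base)
    return sequence[len(sequence) // 2:].count(base) / total if total else 0.0


seqs = ["XEEVXEV", "XXXEEE", "XEVEV"]
print(f"V share:      {base_share(seqs, 'V'):.3f}")
print(f"P(E -> V):    {transition_probability(seqs, 'E', 'V'):.3f}")
print(f"Longest X run in XXXEEE: {longest_run('XXXEEE', 'X')}")
print(f"Late P share in XPEEEP:  {late_share('XPEEEP', 'P'):.2f}")
```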
Installation
pip install base-sequence-toolkit
# With SWE-agent HuggingFace support (quoted so shells such as zsh do not expand the brackets):
pip install "base-sequence-toolkit[swe-agent]"
# For development:
pip install "base-sequence-toolkit[dev]"
Or install from source:
git clone https://github.com/FatBy/base-sequence-toolkit.git
cd base-sequence-toolkit
pip install -e ".[dev,swe-agent]"
Quick Start
Analyze SWE-agent Trajectories
from base_sequence_toolkit.adapters.swe_agent import load_from_huggingface
from base_sequence_toolkit.core.analyzer import run_full_analysis
# Load and classify 500 SWE-agent trajectories
results = load_from_huggingface(max_records=500)
# Extract sequences and outcomes
sequences = [r.base_sequence for r in results]
outcomes = [r.resolved for r in results]
# Run comprehensive analysis
report = run_full_analysis(sequences, outcomes)
print(report.format_summary())
Classify Your Own Agent's Actions
from base_sequence_toolkit import classify_step, create_context, StepClassification
# Shared context so the classifier can take earlier steps into account
ctx = create_context()
steps = [
    StepClassification(tool_name="readFile", args={"path": "src/main.py"}),
    StepClassification(tool_name="writeFile", args={"path": "src/main.py", "content": "..."}, status="success"),
    StepClassification(tool_name="readFile", args={"path": "src/main.py"}),  # read-after-write → V
    StepClassification(tool_name="runCmd", shell_command="pytest tests/"),
]
# Classify the steps in order; the same ctx is passed to every call
for step in steps:
    base = classify_step(step, ctx)
    print(f"{step.tool_name:15s} → {base}")
# Output:
# readFile → X (first access to unknown file)
# writeFile → E (state-changing action)
# readFile → V (reading file we just wrote)
# runCmd → V (running tests after writes)
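The output above shows the same tool (`readFile`) landing on different bases depending on what happened earlier. One way such a rule could work is to remember which paths have been written and treat a later read of one of them as verification. The sketch below illustrates only that idea; `classify_with_context` and `written_paths` are hypothetical names, and the toolkit's actual rules may differ.

```python
def classify_with_context(tool_name, path, written_paths):
    """Toy stateful rule: a read of a previously written path counts as V.

    `written_paths` is a set that the caller keeps across steps; it is
    mutated here whenever a write is seen.
    """
    if tool_name == "writeFile":
        written_paths.add(path)
        return "E"
    if tool_name == "readFile":
        return "V" if path in written_paths else "X"
    return "?"  # anything else is out of scope for this sketch


written = set()
steps = [
    ("readFile", "src/main.py"),   # first look at the file
    ("writeFile", "src/main.py"),  # state change
    ("readFile", "src/main.py"),   # read-after-write
    ("readFile", "README.md"),     # unrelated file, still exploration
]
sequence = "".join(classify_with_context(t, p, written) for t, p in steps)
print(sequence)  # XEVX
```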
Analyze Custom Sequences
from base_sequence_toolkit.core.analyzer import (
    compute_sequence_stats,
    compute_transition_matrix,
    extract_risk_profile,
    find_discriminative_patterns,
)
# Compute stats for a single sequence
stats = compute_sequence_stats("XEEVXEV")
print(f"Length: {stats.length}")
print(f"EV pairs: {stats.ev_pairs}")
print(f"Has verification: {stats.has_verification}")
print(f"Switch rate: {stats.switch_rate}")
# Transition matrix across multiple sequences
matrix = compute_transition_matrix(["XEEVXEV", "XXXEEE", "XEVEV"])
print(matrix.format())
# Risk profile
risk = extract_risk_profile("XXXEEEEE")
print(f"Risk flags: {risk.flags()}")
# → ['consecutive_x≥3', 'no_verification']
# Find patterns that distinguish success from failure
patterns = find_discriminative_patterns(
    resolved_sequences=["XEVEV", "XEEVE"],
    unresolved_sequences=["XXXEE", "XXXXX"],
    min_count=1,
)
for p in patterns[:5]:
    print(f"  {p.pattern}: lift={p.lift:.2f}")
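This README does not define how `lift` is computed. One common reading is the ratio between the share of resolved sequences that contain a pattern and the share of unresolved sequences that contain it. The sketch below follows that assumption and adds add-one smoothing so that a pattern absent from one group does not cause a division by zero. The function `pattern_lift` and its `smoothing` parameter are hypothetical; the toolkit may use a different formula.

```python
def pattern_lift(pattern, resolved, unresolved, smoothing=1.0):
    """Ratio of smoothed containment rates: resolved vs. unresolved.

    Values above 1 mean the pattern shows up more often in resolved
    sequences; values below 1 mean the opposite.
    """
    hit_res = sum(pattern in s for s in resolved)
    hit_unres = sum(pattern in s for s in unresolved)
    rate_res = (hit_res + smoothing) / (len(resolved) + 2 * smoothing)
    rate_unres = (hit_unres + smoothing) / (len(unresolved) + 2 * smoothing)
    return rate_res / rate_unres


resolved = ["XEVEV", "XEEVE"]
unresolved = ["XXXEE", "XXXXX"]
for pattern in ("EV", "XXX"):
    print(pattern, round(pattern_lift(pattern, resolved, unresolved), 2))
```

With only two sequences per group the smoothing term dominates, so the numbers here show the direction of the effect rather than a meaningful magnitude.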
CLI Usage
# Analyze SWE-agent trajectories
bst-analyze swe-agent -n 500 -o results/
# Analyze DunCrew execution traces
bst-analyze duncrew /path/to/exec_traces/ -o results/
# Analyze pre-computed JSON data
bst-analyze json data/my_sequences.json
Writing Your Own Adapter
To integrate with a new agent framework, implement a function that converts your trace format into StepClassification objects:
from base_sequence_toolkit import classify_step, create_context, StepClassification
def classify_my_agent_trace(trace: dict) -> str:
    """Convert your agent's trace into an XEPV base sequence."""
    ctx = create_context()
    bases = []
    for action in trace["actions"]:
        step = StepClassification(
            tool_name=action["tool"],
            args=action.get("arguments", {}),
            status="success" if action["succeeded"] else "error",
            shell_command=action.get("command"),
        )
        base = classify_step(step, ctx)
        bases.append(str(base))
    return "".join(bases)
Project Structure
base-sequence-toolkit/
├── base_sequence_toolkit/
│   ├── __init__.py
│   ├── cli.py                  # CLI entry point
│   ├── core/
│   │   ├── classifier.py       # Generic XEPV classifier
│   │   └── analyzer.py         # Analysis primitives
│   └── adapters/
│       ├── swe_agent.py        # SWE-agent trajectory adapter
│       └── duncrew.py          # DunCrew trace adapter
├── examples/
│   └── swe_agent_analysis.py   # End-to-end example
├── tests/
│   ├── test_classifier.py
│   ├── test_analyzer.py
│   └── test_swe_agent.py
├── data/                       # Pre-computed results (optional)
├── pyproject.toml
├── LICENSE
└── README.md
Citation
If you use this toolkit in your research, please cite:
@software{base_sequence_toolkit,
  title = {Base Sequence Toolkit: XEPV Analysis Framework for AI Agent Execution Traces},
  year = {2025},
  url = {https://github.com/FatBy/base-sequence-toolkit}
}
License
MIT License. See LICENSE for details.
Related
- XEPV Framework: The base classification system described in our paper
- DunCrew: AI operating system with XEPV-based execution governance
- SWE-bench: Software engineering benchmark for AI agents
File details
Details for the file base_sequence_toolkit-0.1.0.tar.gz.
File metadata
- Download URL: base_sequence_toolkit-0.1.0.tar.gz
- Upload date:
- Size: 25.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fae5d9f8a26aa44641e26a735073b75313bfbea7b8bbe63715d80d0c9df793d6 |
| MD5 | e0e14d155e377d369fd256495058b4a4 |
| BLAKE2b-256 | 3cd6ffc248e2c80be1bf2e70b3b3e2098dfdc2340a84efade664d06038ac0089 |
File details
Details for the file base_sequence_toolkit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: base_sequence_toolkit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec8ad86ec62425b89634e28db201965ec3907e0162c166fd42290f5572bca22a |
| MD5 | 2fb38f3bb385d94a3df4ee18594ebc2c |
| BLAKE2b-256 | c9523413fe25759f428c979831a81000f312def5fe006f93d0c60cfbe39899d6 |