Base Sequence Toolkit

XEPV Base Sequence Analysis Framework for AI Agent Execution Traces

Python 3.10+ | License: MIT

What is this?

When an AI agent solves a task, it executes a sequence of actions: reading files, writing code, running tests, searching for information. Base Sequence Toolkit classifies each action into one of four base types:

Base  Name     Description              Examples
X     eXplore  Information gathering    readFile, webSearch, ls, grep
E     Execute  State-changing actions   writeFile, edit, npm install
P     Plan     Pure reasoning/strategy  LLM thinking without tool calls
V     Verify   Validation & testing     pytest, tsc --noEmit, read-after-write

A task execution becomes a base sequence like XEEVXEV — a compact fingerprint of the agent's behavioral strategy.
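
Because a base sequence is just a string over the alphabet X, E, P, V, simple statistics need nothing beyond the standard library. The following is a minimal sketch, not part of the toolkit; `base_composition` is a hypothetical helper name used only for illustration.

```python
from collections import Counter


def base_composition(sequence: str) -> dict[str, float]:
    """Return the share of each XEPV base in a sequence (hypothetical helper)."""
    counts = Counter(sequence)
    total = len(sequence)
    # Report all four bases, including those that never occur.
    return {base: (counts[base] / total if total else 0.0) for base in "XEPV"}


composition = base_composition("XEEVXEV")
for base, share in composition.items():
    print(f"{base}: {share:.2f}")
```

For XEEVXEV this reports E as the most frequent base (3 of 7 steps) and P as absent.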

This toolkit provides:

  • Classifier: Map tool calls → XEPV base types (stateful, context-aware)
  • Analyzer: N-gram patterns, transition matrices, positional effects, risk profiles
  • Adapters: Ready-made integrations for SWE-agent and DunCrew trace formats
  • CLI: One-command analysis from terminal

Key Findings

Analysis of 2,000+ SWE-agent trajectories and 500+ DunCrew execution traces revealed:

Finding                       Detail
V-base deficit                V (Verify) comprises only ~3.3% of all bases in SWE-agent traces. Resolved tasks have significantly more V bases than unresolved ones.
E→V transition bottleneck     The probability of transitioning from E (Execute) to V (Verify) is only 0.6%; agents almost never verify after executing.
Exploration spirals           Consecutive X runs (XXX+) are strongly associated with task failure.
Late planning is harmful      P bases appearing in the second half of execution correlate with lower success rates.
E-V pairing predicts success  The "golden path" pattern (Execute then Verify) is the strongest positive predictor.
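
The quantities behind these findings can be approximated directly on raw sequence strings. The sketch below is an illustration under stated assumptions, not the toolkit's implementation: the function names are hypothetical, a "spiral" is taken to be a run of at least three X bases, and "second half" is taken to mean positions from `len(sequence) // 2` onward.

```python
from collections import Counter


def transition_probability(sequences, src, dst):
    """Estimate P(next base == dst | current base == src) over adjacent pairs."""
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))
    from_src = sum(n for (a, _), n in pair_counts.items() if a == src)
    return pair_counts[(src, dst)] / from_src if from_src else 0.0


def has_exploration_spiral(sequence, min_run=3):
    """True if the sequence contains at least min_run consecutive X bases."""
    return "X" * min_run in sequence


def late_planning_count(sequence):
    """Count P bases in the second half of the sequence."""
    return sequence[len(sequence) // 2:].count("P")


sample = ["XEEVXEV", "XXXEEE", "XEVEV"]
p_ev = transition_probability(sample, "E", "V")
print(f"P(E->V) = {p_ev:.3f}")
print([has_exploration_spiral(s) for s in sample])
```

On this toy sample, four of the seven transitions that leave E go to V, which is far higher than the 0.6% reported for real SWE-agent traces.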

Installation

pip install base-sequence-toolkit

# With SWE-agent HuggingFace support (quoted so shells such as zsh do not expand the brackets):
pip install "base-sequence-toolkit[swe-agent]"

# For development:
pip install "base-sequence-toolkit[dev]"

Or install from source:

git clone https://github.com/FatBy/base-sequence-toolkit.git
cd base-sequence-toolkit
pip install -e ".[dev,swe-agent]"

Quick Start

Analyze SWE-agent Trajectories

from base_sequence_toolkit.adapters.swe_agent import load_from_huggingface
from base_sequence_toolkit.core.analyzer import run_full_analysis

# Load and classify 500 SWE-agent trajectories
results = load_from_huggingface(max_records=500)

# Extract sequences and outcomes
sequences = [r.base_sequence for r in results]
outcomes = [r.resolved for r in results]

# Run comprehensive analysis
report = run_full_analysis(sequences, outcomes)
print(report.format_summary())

Classify Your Own Agent's Actions

from base_sequence_toolkit import classify_step, create_context, StepClassification, BaseType

ctx = create_context()

steps = [
    StepClassification(tool_name="readFile", args={"path": "src/main.py"}),
    StepClassification(tool_name="writeFile", args={"path": "src/main.py", "content": "..."}, status="success"),
    StepClassification(tool_name="readFile", args={"path": "src/main.py"}),  # read-after-write → V
    StepClassification(tool_name="runCmd", shell_command="pytest tests/"),
]

for step in steps:
    base = classify_step(step, ctx)
    print(f"{step.tool_name:15s} → {base}")

# Output:
# readFile        → X  (first access to unknown file)
# writeFile       → E  (state-changing action)
# readFile        → V  (reading file we just wrote)
# runCmd          → V  (running tests after writes)
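
The same readFile call maps to X the first time and V after a write because the classifier carries state between steps. The toy sketch below shows one way such a read-after-write rule could work. It is an assumption made for illustration, not the toolkit's classifier: `make_toy_classifier` is a hypothetical name, and the sketch handles only `readFile` and `writeFile`.

```python
def make_toy_classifier():
    """Return a classifier that remembers which paths were written.

    Hypothetical stand-in for illustration; not the toolkit's classifier.
    """
    written_paths: set[str] = set()

    def classify(tool_name: str, path: str) -> str:
        if tool_name == "writeFile":
            written_paths.add(path)  # remember the write for later reads
            return "E"
        if tool_name == "readFile":
            # A read of a path we already wrote counts as verification.
            return "V" if path in written_paths else "X"
        raise ValueError(f"unsupported tool in this sketch: {tool_name}")

    return classify


classify = make_toy_classifier()
trace = [
    ("readFile", "src/main.py"),
    ("writeFile", "src/main.py"),
    ("readFile", "src/main.py"),
    ("readFile", "src/util.py"),
]
toy_sequence = "".join(classify(tool, path) for tool, path in trace)
print(toy_sequence)  # XEVX
```

Keeping the written paths in a closure means each trace needs a fresh classifier, which plays the same role as calling `create_context()` once per trace in the example above.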

Analyze Custom Sequences

from base_sequence_toolkit.core.analyzer import (
    compute_sequence_stats,
    compute_transition_matrix,
    extract_risk_profile,
    find_discriminative_patterns,
)

# Compute stats for a single sequence
stats = compute_sequence_stats("XEEVXEV")
print(f"Length: {stats.length}")
print(f"EV pairs: {stats.ev_pairs}")
print(f"Has verification: {stats.has_verification}")
print(f"Switch rate: {stats.switch_rate}")

# Transition matrix across multiple sequences
matrix = compute_transition_matrix(["XEEVXEV", "XXXEEE", "XEVEV"])
print(matrix.format())

# Risk profile
risk = extract_risk_profile("XXXEEEEE")
print(f"Risk flags: {risk.flags()}")
# → ['consecutive_x≥3', 'no_verification']

# Find patterns that distinguish success from failure
patterns = find_discriminative_patterns(
    resolved_sequences=["XEVEV", "XEEVE"],
    unresolved_sequences=["XXXEE", "XXXXX"],
    min_count=1,
)
for p in patterns[:5]:
    print(f"  {p.pattern}: lift={p.lift:.2f}")

CLI Usage

# Analyze SWE-agent trajectories
bst-analyze swe-agent -n 500 -o results/

# Analyze DunCrew execution traces
bst-analyze duncrew /path/to/exec_traces/ -o results/

# Analyze pre-computed JSON data
bst-analyze json data/my_sequences.json

Writing Your Own Adapter

To integrate with a new agent framework, implement a function that converts your trace format into StepClassification objects:

from base_sequence_toolkit import classify_step, create_context, StepClassification

def classify_my_agent_trace(trace: dict) -> str:
    """Convert your agent's trace into an XEPV base sequence."""
    ctx = create_context()
    bases = []

    for action in trace["actions"]:
        step = StepClassification(
            tool_name=action["tool"],
            args=action.get("arguments", {}),
            status="success" if action["succeeded"] else "error",
            shell_command=action.get("command"),
        )
        base = classify_step(step, ctx)
        bases.append(str(base))

    return "".join(bases)
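
Once an adapter function exists, its output usually has to be collected into the parallel lists of sequences and outcomes that the analysis functions take. The helper below is a sketch under assumptions: `collect_sequences` is a hypothetical name, the `resolved` key on each trace is assumed, and the stub classifier stands in for `classify_my_agent_trace` so the example runs without the toolkit installed.

```python
from typing import Callable, Iterable


def collect_sequences(
    traces: Iterable[dict], classify_trace: Callable[[dict], str]
) -> tuple[list[str], list[bool]]:
    """Build parallel lists of base sequences and outcomes from raw traces."""
    sequences: list[str] = []
    outcomes: list[bool] = []
    for trace in traces:
        if not trace.get("actions"):
            continue  # skip traces with no recorded actions
        sequences.append(classify_trace(trace))
        # "resolved" is an assumed field name; missing values count as False.
        outcomes.append(bool(trace.get("resolved", False)))
    return sequences, outcomes


def stub_classify_trace(trace: dict) -> str:
    # Stand-in so the sketch runs on its own; swap in your real adapter.
    return "".join("E" if a["tool"] == "writeFile" else "X" for a in trace["actions"])


traces = [
    {"actions": [{"tool": "readFile"}, {"tool": "writeFile"}], "resolved": True},
    {"actions": [], "resolved": False},
    {"actions": [{"tool": "readFile"}]},
]
sequences, outcomes = collect_sequences(traces, stub_classify_trace)
print(sequences, outcomes)
```

Passing the classifier in as an argument keeps the collection logic independent of any one trace format.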

Project Structure

base-sequence-toolkit/
├── base_sequence_toolkit/
│   ├── __init__.py
│   ├── cli.py                    # CLI entry point
│   ├── core/
│   │   ├── classifier.py         # Generic XEPV classifier
│   │   └── analyzer.py           # Analysis primitives
│   └── adapters/
│       ├── swe_agent.py          # SWE-agent trajectory adapter
│       └── duncrew.py            # DunCrew trace adapter
├── examples/
│   └── swe_agent_analysis.py     # End-to-end example
├── tests/
│   ├── test_classifier.py
│   ├── test_analyzer.py
│   └── test_swe_agent.py
├── data/                         # Pre-computed results (optional)
├── pyproject.toml
├── LICENSE
└── README.md

Citation

If you use this toolkit in your research, please cite:

@software{base_sequence_toolkit,
  title = {Base Sequence Toolkit: XEPV Analysis Framework for AI Agent Execution Traces},
  year = {2025},
  url = {https://github.com/FatBy/base-sequence-toolkit}
}

License

MIT License. See LICENSE for details.

Related

  • XEPV Framework: The base classification system described in our paper
  • DunCrew: AI operating system with XEPV-based execution governance
  • SWE-bench: Software engineering benchmark for AI agents
