Skip to main content

Agent session intelligence — behavioral analysis, grading, remediation, replay, and outcome testing for AI agent sessions

Project description

sesh

Agent session intelligence. Behavioral analysis, grading, replay, and outcome testing for AI coding agents.

sesh parses agent session transcripts, detects anti-patterns, grades sessions, replays sessions step by step, compares outcomes between sessions, and generates remediation patches. Built so agents can analyze themselves — and so humans can see what their agents are doing.

Three interfaces, one engine

Interface For How
CLI Developers in the terminal sesh reflect, sesh replay, sesh test, sesh fix
MCP Server Agents at runtime Add to your MCP config, agent calls sesh_reflect
Web Dashboard Humans who want observability sesh-web → browser dashboard on localhost

All three use the same analysis engine and database. Install once, use from anywhere.

Install

pip install agentsesh

Or from source:

git clone https://github.com/ateeples/agentsesh.git
cd agentsesh
pip install -e .

Quick start

# Initialize in your project
sesh init

# Auto-discover and ingest all Claude Code sessions
sesh watch --once

# See your most recent session analysis
sesh reflect

# Cross-session trends
sesh report

# Search past sessions
sesh search "authentication bug"

# Keep ingesting new sessions in the background
sesh watch

# Launch the dashboard
sesh-web

MCP Server (agent-native)

Add sesh to your agent's MCP configuration so it can self-analyze at runtime.

Claude Code (~/.claude/settings.json):

{
  "mcpServers": {
    "sesh": {
      "command": "sesh-mcp",
      "env": {
        "SESH_DB": "/path/to/your/project/.sesh/sesh.db"
      }
    }
  }
}

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "sesh": {
      "command": "sesh-mcp",
      "env": {
        "SESH_DB": "/path/to/your/project/.sesh/sesh.db"
      }
    }
  }
}

Once configured, the agent has access to these tools:

Tool What it does
sesh_reflect Full session analysis — grade, patterns, tool usage
sesh_report Cross-session trends — improving, stable, or declining
sesh_handoff Structured handoff doc for session continuity
sesh_search Full-text search across all sessions
sesh_list List recent sessions with grades
sesh_stats Lifetime aggregate statistics
sesh_log Ingest a new transcript
sesh_sync Auto-discover and ingest new sessions
sesh_patterns Detailed pattern breakdown for a session

An agent can call sesh_sync then sesh_reflect at session start to auto-ingest new transcripts and review its last session, or sesh_report to see if it's been improving or repeating the same mistakes.

Web Dashboard

sesh-web                # http://127.0.0.1:7433
sesh-web --port 8080    # custom port

The dashboard shows:

  • Session grades and score trends
  • Grade distribution across all sessions
  • Tool usage breakdown
  • Recurring anti-patterns
  • Full-text session search

Auto-refreshes every 30 seconds — leave it open while your agent works.

Session replay

Reconstruct exactly what happened, step by step.

# Replay most recent session
sesh replay

# Replay with inline pattern annotations
sesh replay --annotate

# Show only errors — where did things go wrong?
sesh replay --errors

# Show only tool calls (no user/assistant text)
sesh replay --tools

# Zoom into a specific range
sesh replay --range 15-30

# Filter to specific tools
sesh replay --tool Edit,Bash

# Compact mode (no output previews)
sesh replay --compact

# Full output for every step
sesh replay -v

Replay prefers the original JSONL source file for full fidelity (user messages, assistant text, thinking blocks, complete tool output). Falls back to the database if the source file is gone (tool calls only, 300-char output previews).

Outcome testing

Compare what actually happened between sessions — not just process quality, but results.

# Compare two most recent sessions
sesh test

# Compare specific sessions
sesh test <baseline_id> <candidate_id>

Measures: error-retry loops, files reworked, rework edits, terminal error state, success rate, test/build/lint pass rates. Verdict: improved, regressed, mixed, or unchanged.

Remediation

Turn session analysis into actionable fixes.

# Full remediation report
sesh fix

# Output a CLAUDE.md patch (ready to paste)
sesh fix --patch

Each anti-pattern maps to specific remediation actions and a CLAUDE.md snippet. --patch generates a combined patch you can paste directly into your agent's instruction file.

What it detects

sesh runs 9 pattern detectors on every session:

Pattern What it catches
repeated_searches Same search query run multiple times
write_without_read Editing a file that was never read
error_rate Overall error percentage above threshold
error_streak Consecutive errors (agent stuck in a loop)
low_read_ratio Not enough reading relative to writing
bash_overuse Using bash for cat/grep/find when dedicated tools exist
write_then_read Writing before understanding (acted before reading)
scattered_files Touching too many directories (unfocused session)
missed_parallelism Sequential reads that could have been parallel

Grading

Sessions are scored 0–100 and graded A+ through F:

  • Start at 100, apply deductions for anti-patterns
  • Bonuses for strong read/write ratios and good parallelism
  • A+ (95+), A (90+), B (75+), C (60+), D (45+), F (<45)
  • All weights configurable via .sesh/config.json

CLI reference

sesh init                    Initialize .sesh/ in current directory
sesh log <file>              Ingest a session transcript
sesh log --dir <dir>         Batch ingest all transcripts in a directory
sesh reflect [session_id]    Analyze a session (default: most recent)
sesh replay [session_id]     Step-by-step session replay
sesh test [a] [b]            Compare outcome metrics between sessions
sesh fix [session_id]        Generate remediation recommendations
sesh report [--last N]       Cross-session trend analysis
sesh handoff [session_id]    Generate handoff document
sesh search <query>          Full-text search
sesh list [--last N]         List sessions
sesh stats                   Aggregate statistics
sesh export <session_id>     Export session as JSON
sesh watch [dirs...]         Auto-ingest new sessions (polls continuously)
sesh watch --once [dirs...]  One-shot scan and ingest

Replay flags:
  --errors                   Show only error steps
  --tools                    Show only tool calls
  --range 5-15               Show specific step range
  --tool Edit,Bash           Filter to specific tool(s)
  --annotate                 Show inline pattern annotations
  --compact                  No output previews
  -v, --verbose              Show full output for each step
  --db-only                  Skip source file, use DB only

Fix flags:
  --patch                    Output CLAUDE.md patch only

Watch flags:
  --interval N               Poll interval in seconds (default: 30)
  --settle N                 Seconds since last modification (default: 60)

Global flags:
  --json                     Output as JSON
  --db <path>                Override database path
  --quiet                    Suppress non-essential output

Configuration

sesh init creates .sesh/config.json with sensible defaults. Everything is tunable:

{
  "patterns": {
    "thresholds": {
      "error_rate_concern": 0.15,
      "bash_overuse_min": 3,
      "error_streak_min": 3
    }
  },
  "grading": {
    "error_rate_max_deduction": 20,
    "blind_edit_deduction": 5,
    "read_ratio_bonus": 5
  }
}

Supported formats

  • Claude Code (.jsonl) — fully supported
  • OpenAI Codex CLI (.jsonl) — fully supported (auto-detected, maps exec_command→Bash, apply_patch→Edit)
  • Generic — planned

Requirements

  • Python 3.10+
  • mcp package (for MCP server only)
  • No other external dependencies

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentsesh-0.6.0.tar.gz (64.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentsesh-0.6.0-py3-none-any.whl (60.8 kB view details)

Uploaded Python 3

File details

Details for the file agentsesh-0.6.0.tar.gz.

File metadata

  • Download URL: agentsesh-0.6.0.tar.gz
  • Upload date:
  • Size: 64.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agentsesh-0.6.0.tar.gz
Algorithm Hash digest
SHA256 f671aefb996f3054a7f4dd0d0803c468957d2b5941e21e6e0054eea1f42d1a56
MD5 40db18303cc718e2dbb1e1e7d7355daf
BLAKE2b-256 f6dfe2d4fbbddab6a8f04d8c13425eed61a1e80ad179231b980c78aa1e5dd5b7

See more details on using hashes here.

File details

Details for the file agentsesh-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: agentsesh-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 60.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agentsesh-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 371df51bc450425ac57d51eaccccfc7e44dbb89f6ac8bdd8601b2f83e291b550
MD5 ae005c3364e238a445e45dabfeb4d656
BLAKE2b-256 b9d4e342e60c00b76633bf7b03632e9b20ce62b98bf10c9bb56782f5ed38b83a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page