Library for analyzing AI agent trajectories — extract actions, summarize, embed, and visualize.

Hodoscope

Analyze AI agent trajectories: extract actions, summarize them with LLMs, embed the summaries, and create interactive visualizations to identify behavioral patterns.

Installation

pip install hodoscope

For development (editable install with tests):

pip install -e ".[dev]"

Configuration

Create a .env file in the project root:

OPENAI_API_KEY=your-openai-key
GEMINI_API_KEY=your-gemini-key

# Optional: override defaults
# SUMMARIZE_MODEL=openai/gpt-5.2
# EMBEDDING_MODEL=gemini/gemini-embedding-001
# EMBED_DIM=768
# MAX_WORKERS=10
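
The optional overrides map one-to-one to environment variables. A minimal sketch of how such env-backed defaults are typically resolved (plain Python illustration, not the library's internals; `env_setting` is a hypothetical helper):

```python
import os

def env_setting(name: str, default: str) -> str:
    """Read a configuration value from the environment, falling back to a default."""
    return os.environ.get(name, default)

# Defaults mirror the values shown above.
summarize_model = env_setting("SUMMARIZE_MODEL", "openai/gpt-5.2")
embed_dim = int(env_setting("EMBED_DIM", "768"))
max_workers = int(env_setting("MAX_WORKERS", "10"))
```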

Quick Start

CLI

# Analyze a single .eval file
hodoscope analyze run.eval

# Analyze all .eval files in a directory
hodoscope analyze evals/

# Visualize results
hodoscope viz run.hodoscope.json

# Compare models
hodoscope viz model_a.hodoscope.json model_b.hodoscope.json --group-by model

# Show metadata
hodoscope info run.hodoscope.json

Python API

import hodoscope

# Load trajectories from .eval file
trajectories, fields = hodoscope.load_eval("run.eval", limit=5, sample=True)

# Or from a directory of trajectory JSONs
trajectories, fields = hodoscope.load_trajectory_dir("path/to/samples/")

# Summarize + embed (requires API keys)
summaries = hodoscope.process_trajectories(trajectories, summarize_model="openai/gpt-4o")

# Extract actions only (no LLM calls)
actions = hodoscope.extract_actions(trajectories[0]["messages"])

# Group and visualize in-memory summaries
grouped = hodoscope.group_summaries_from_list(summaries, group_by="score")
hodoscope.visualize_action_summaries(grouped, "plots/", methods=["tsne"])

# Or save to disk and use the file-based workflow
hodoscope.write_analysis_json("output.hodoscope.json", summaries, fields, source="run.eval")

CLI Reference

hodoscope analyze

Process source files (.eval, directories, Docent collections) into .hodoscope.json analysis files.

hodoscope analyze SOURCES [OPTIONS]

Options:
  --docent-id TEXT                Docent collection ID as source
  -o, --output TEXT               Output JSON path (single source only)
  --field TEXT                    KEY=VALUE metadata (repeatable)
  -l, --limit INTEGER             Limit trajectories per source
  --save-samples PATH             Save extracted trajectory JSONs to directory
  --embed-dim INTEGER             Embedding dimensionality (default: 768)
  -m, --model-name TEXT           Override auto-detected model name
  --summarize-model TEXT          LiteLLM model for summarization (default: openai/gpt-5.2)
  --embedding-model TEXT          LiteLLM model for embeddings (default: gemini/gemini-embedding-001)
  --sample / --no-sample          Randomly sample trajectories (use with --limit)
  --seed INTEGER                  Random seed for --sample reproducibility
  --resume / --no-resume          Resume from existing output (default: on)
  --reasoning-effort [low|medium|high]
                                  Reasoning effort for summarization model
  --max-workers INTEGER           Max parallel workers for LLM calls (default: 10)

Examples:

hodoscope analyze run.eval                             # .eval → analysis JSON
hodoscope analyze *.eval                               # batch: all .eval files
hodoscope analyze evals/                               # batch: dir of .eval files
hodoscope analyze run.eval -o my_output.json           # custom output path
hodoscope analyze run.eval --field env=prod            # add custom metadata
hodoscope analyze run.eval --save-samples ./samples/   # save extracted trajectories
hodoscope analyze --docent-id COLLECTION_ID            # docent source
hodoscope analyze path/to/samples/                     # directory of trajectory JSONs
hodoscope analyze run.eval --summarize-model gemini/gemini-2.0-flash
hodoscope analyze run.eval --limit 5 --sample --seed 42
hodoscope analyze run.eval --no-resume                 # overwrite existing output
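
The --max-workers option bounds how many summarization calls run concurrently. Conceptually it behaves like a bounded worker pool; a hypothetical sketch (not the library's code, with a stand-in for the real LLM call):

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_all(action_texts, summarize_fn, max_workers=10):
    """Fan summarization calls out across a bounded worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order even though calls run concurrently
        return list(pool.map(summarize_fn, action_texts))

def fake_summarize(text):
    """Stand-in for a real LLM call."""
    return f"summary of {text}"

results = summarize_all(["act-1", "act-2", "act-3"], fake_summarize, max_workers=2)
```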

hodoscope viz

Visualize analysis JSON files with interactive plots. Groups summaries by any metadata field.

hodoscope viz SOURCES [OPTIONS]

Options:
  --group-by TEXT     Field to group by (default: model)
  --plots TEXT        Plot types: pca, tsne, umap, trimap, pacmap, dynamic, density
  --output-dir TEXT   Directory for HTML output files
  --open              Open the generated HTML in the default browser

Examples:

hodoscope viz output.json                              # visualize (groups by model)
hodoscope viz output.json --group-by task              # group by task
hodoscope viz output.json --group-by score             # group by score field
hodoscope viz *.json                                   # batch: all JSONs
hodoscope viz a.json b.json --group-by model           # cross-file comparison
hodoscope viz output.json --plots tsne umap            # specific plot types
hodoscope viz output.json --open                       # open in default browser
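
The --group-by lookup resolves the key against each summary's own metadata first, then falls back to file-level fields. A minimal sketch of that resolution order (function name hypothetical, values illustrative):

```python
def resolve_group_key(summary_metadata: dict, file_fields: dict, key: str):
    """Per-summary metadata wins; file-level fields are the fallback."""
    if key in summary_metadata:
        return summary_metadata[key]
    return file_fields.get(key)

metadata = {"score": 1.0, "epoch": 1}
fields = {"model": "gpt-5", "task": "swe_bench"}

score_group = resolve_group_key(metadata, fields, "score")  # per-summary value
model_group = resolve_group_key(metadata, fields, "model")  # falls back to fields
```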

hodoscope info

Show metadata, summary counts, and API key status for analysis JSON files.

hodoscope info output.json
hodoscope info results/

Python API Reference

The library exposes composable building blocks as first-class public functions. The CLI is a thin wrapper on top.

Loading trajectories

import hodoscope

# From .eval file (Inspect AI format)
trajectories, fields = hodoscope.load_eval("run.eval", limit=10, sample=True, seed=42)

# From directory of trajectory JSONs
trajectories, fields = hodoscope.load_trajectory_dir("path/to/samples/")

# From Docent collection
trajectories, fields = hodoscope.load_docent("COLLECTION_ID")

All loaders return (trajectories, fields) where trajectories is a list of trajectory dicts (each with a messages key) and fields is auto-detected file-level metadata. For .eval files, fields include model, task, dataset_name, solver, run_id, accuracy, and more.
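
The return shape can be pictured with hand-written stand-in values (illustrative only, not real loader output):

```python
# Illustrative (trajectories, fields) pair matching the documented shape.
trajectories = [
    {
        "id": "django__django-12345_epoch_1",
        "messages": [
            {"role": "user", "content": "Fix the failing test."},
            {"role": "assistant", "content": "Updating the assertion..."},
        ],
        "metadata": {"score": 1.0, "epoch": 1},
    },
]
fields = {"model": "gpt-5", "task": "swe_bench", "accuracy": 0.8}

# Each trajectory dict carries a messages list.
message_counts = [len(t["messages"]) for t in trajectories]
```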

Processing

# Full pipeline: extract actions → summarize with LLM → embed (requires API keys)
summaries = hodoscope.process_trajectories(
    trajectories,
    summarize_model="openai/gpt-4o",       # optional, defaults from env/config
    embedding_model="gemini/gemini-embedding-001",
    embed_dim=768,
)

# Extract actions only (no LLM calls, pure data transform)
actions = hodoscope.extract_actions(trajectories[0]["messages"])

Grouping and visualization

# Group in-memory summaries by any metadata field
grouped = hodoscope.group_summaries_from_list(summaries, group_by="score")

# Or group from saved analysis files
doc = hodoscope.read_analysis_json("output.hodoscope.json")
grouped = hodoscope.group_summaries([doc], group_by="model")

# Visualize
hodoscope.visualize_action_summaries(grouped, "plots/", methods=["tsne", "pca"])

Saving results

hodoscope.write_analysis_json(
    "output.hodoscope.json",
    summaries=summaries,
    fields=fields,
    source="run.eval",
    embedding_model="gemini/gemini-embedding-001",
    embedding_dimensionality=768,
)

Output Format

Each hodoscope analyze run produces a .hodoscope.json file:

{
  "version": 1,
  "created_at": "2026-02-10T12:00:00Z",
  "source": "path/to/run.eval",
  "fields": {
    "model": "gpt-5",
    "task": "swe_bench",
    "dataset_name": "swe_bench_verified",
    "solver": "system_message, generate",
    "accuracy": 0.8
  },
  "embedding_model": "gemini-embedding-001",
  "embedding_dimensionality": 3072,
  "summaries": [
    {
      "trajectory_id": "django__django-12345_epoch_1",
      "turn_id": 3,
      "summary": "Update assertion to match expected output",
      "action_text": "...",
      "embedding": "<base85-encoded float32 array>",
      "metadata": {
        "score": 1.0,
        "epoch": 1,
        "instance_id": "django__django-12345",
        "target": "expected output",
        "input_tokens": 620,
        "output_tokens": 20,
        "total_tokens": 640
      }
    }
  ]
}

Key concepts:

  • fields: File-level metadata auto-detected from .eval header (model, task, dataset_name, solver, run_id, accuracy, etc.) plus custom --field values. Same for all summaries.
  • metadata: Per-trajectory metadata. All sample.metadata keys from .eval files are passed through, plus extracted keys (score, epoch, target, token usage, etc.). Varies per summary.
  • --group-by resolution: Checks per-summary metadata first, then file-level fields.
  • embedding: RFC 1924 base85-encoded float32 numpy array.
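
Because the alphabet is RFC 1924 base85, Python's base64.b85decode can recover the raw bytes. A round-trip sketch — the little-endian float32 layout is an assumption on my part, not confirmed above, and `decode_embedding` is a hypothetical helper:

```python
import base64
import struct

def decode_embedding(encoded: str, dim: int) -> list:
    """Decode an RFC 1924 base85 string into dim little-endian float32 values (assumed layout)."""
    raw = base64.b85decode(encoded)
    return list(struct.unpack(f"<{dim}f", raw))

# Round-trip with values that are exactly representable in float32.
vec = [0.5, -1.0, 2.0]
encoded = base64.b85encode(struct.pack("<3f", *vec)).decode("ascii")
decoded = decode_embedding(encoded, 3)
```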

Universal Trajectory Format

All trajectory sources are normalized to a canonical JSON schema before processing:

{
  "id": "unique-trajectory-id",
  "source": "eval",
  "model": "gpt-5",
  "input": "Task description...",
  "messages": [{"role": "user", "content": "..."}],
  "metadata": {
    "epoch": 1,
    "score": 1.0,
    "instance_id": "django__django-12345",
    "target": "expected output",
    "input_tokens": 620,
    "output_tokens": 20,
    "total_tokens": 640,
    "response_time": 1.23,
    "label_confidence": 0.89
  }
}
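
A quick shape check against this schema can be written in a few lines (an illustrative validator derived from the example above, not a function the library provides):

```python
REQUIRED_KEYS = {"id", "source", "model", "input", "messages", "metadata"}

def looks_like_trajectory(traj: dict) -> bool:
    """Check the top-level shape of the canonical schema shown above."""
    return REQUIRED_KEYS <= traj.keys() and isinstance(traj["messages"], list)

example = {
    "id": "unique-trajectory-id",
    "source": "eval",
    "model": "gpt-5",
    "input": "Task description...",
    "messages": [{"role": "user", "content": "..."}],
    "metadata": {"epoch": 1, "score": 1.0},
}
ok = looks_like_trajectory(example)
```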

Testing

# Unit tests (no API keys needed)
pytest tests/test_io.py tests/test_viz.py tests/test_api.py

# End-to-end tests (requires API keys)
pytest tests/test_analyze.py

Download files

Download the file for your platform.

Source Distribution

hodoscope-0.2.1.tar.gz (64.7 kB)

Uploaded Source

Built Distribution


hodoscope-0.2.1-py3-none-any.whl (60.4 kB)

Uploaded Python 3

File details

Details for the file hodoscope-0.2.1.tar.gz.

File metadata

  • Download URL: hodoscope-0.2.1.tar.gz
  • Size: 64.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hodoscope-0.2.1.tar.gz:

  SHA256       f444519b3367338845b79ed9d5f825fc17fde692337bfa0b010a94a8f256bc47
  MD5          60332750bbb3044c8bb51e8574136b18
  BLAKE2b-256  f8cbd43577b054b14f603eb6e6ae3f406e5214e2ef3fe63c7b33195b11c21548


Provenance

The following attestation bundles were made for hodoscope-0.2.1.tar.gz:

Publisher: publish.yml on AR-FORUM/hodoscope

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hodoscope-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: hodoscope-0.2.1-py3-none-any.whl
  • Size: 60.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hodoscope-0.2.1-py3-none-any.whl:

  SHA256       0d2988c4811d1e6df470a8679593f15fce4ccf347f7c7bb508cedad97cc994ed
  MD5          d5991b9cb962d2308db6081e4509f75a
  BLAKE2b-256  eb3b1b197d394ed3972656a6be7587e8bc2ed4bee5504dcda0f180f60b2fed46


Provenance

The following attestation bundles were made for hodoscope-0.2.1-py3-none-any.whl:

Publisher: publish.yml on AR-FORUM/hodoscope

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
