Hodoscope

Analyze AI agent trajectories: extract actions, summarize them with LLMs, embed the summaries, and create interactive visualizations to identify behavioral patterns.

Installation

pip install hodoscope

For development (editable install with tests):

pip install -e ".[dev]"

Configuration

Create a .env file in the project root:

OPENAI_API_KEY=your-openai-key
GEMINI_API_KEY=your-gemini-key

# Optional: override defaults
# SUMMARIZE_MODEL=openai/gpt-5.2
# EMBEDDING_MODEL=gemini/gemini-embedding-001
# EMBED_DIM=768
# MAX_WORKERS=10
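If you prefer not to depend on a loader such as python-dotenv, a `.env` file of this shape can be read with a few lines of stdlib Python. This is a standalone sketch, not part of hodoscope's API:

```python
import os

def load_env_file(path=".env"):
    """Minimal .env reader: KEY=VALUE lines; '#' comments and blanks skipped.

    Uses setdefault so real environment variables always win over the file.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```
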

Quick Start

CLI

# Analyze a single .eval file
hodoscope analyze run.eval

# Analyze all .eval files in a directory
hodoscope analyze evals/

# Visualize results
hodoscope viz run.hodoscope.json

# Compare models
hodoscope viz model_a.hodoscope.json model_b.hodoscope.json --group-by model

# Show metadata
hodoscope info run.hodoscope.json

Python API

import hodoscope

# Load trajectories from .eval file
trajectories, fields = hodoscope.load_eval("run.eval", limit=5, sample=True)

# Or from a directory of trajectory JSONs
trajectories, fields = hodoscope.load_trajectory_dir("path/to/samples/")

# Summarize + embed (requires API keys)
summaries = hodoscope.process_trajectories(trajectories, summarize_model="openai/gpt-4o")

# Extract actions only (no LLM calls)
actions = hodoscope.extract_actions(trajectories[0]["messages"])

# Group and visualize in-memory summaries
grouped = hodoscope.group_summaries_from_list(summaries, group_by="score")
hodoscope.visualize_action_summaries(grouped, "plots/", methods=["tsne"])

# Or save to disk and use the file-based workflow
hodoscope.write_analysis_json("output.hodoscope.json", summaries, fields, source="run.eval")

CLI Reference

hodoscope analyze

Process source files (.eval, directories, Docent collections) into .hodoscope.json analysis files.

hodoscope analyze SOURCES [OPTIONS]

Options:
  --docent-id TEXT                Docent collection ID as source
  -o, --output TEXT               Output JSON path (single source only)
  --field TEXT                    KEY=VALUE metadata (repeatable)
  -l, --limit INTEGER             Limit trajectories per source
  --save-samples PATH             Save extracted trajectory JSONs to directory
  --embed-dim INTEGER             Embedding dimensionality (default: 768)
  -m, --model-name TEXT           Override auto-detected model name
  --summarize-model TEXT          LiteLLM model for summarization (default: openai/gpt-5.2)
  --embedding-model TEXT          LiteLLM model for embeddings (default: gemini/gemini-embedding-001)
  --sample / --no-sample          Randomly sample trajectories (use with --limit)
  --seed INTEGER                  Random seed for --sample reproducibility
  --resume / --no-resume          Resume from existing output (default: on)
  --reasoning-effort [low|medium|high]  Reasoning effort for summarization model
  --max-workers INTEGER           Max parallel workers for LLM calls (default: 10)

Examples:

hodoscope analyze run.eval                             # .eval → analysis JSON
hodoscope analyze *.eval                               # batch: all .eval files
hodoscope analyze evals/                               # batch: dir of .eval files
hodoscope analyze run.eval -o my_output.json           # custom output path
hodoscope analyze run.eval --field env=prod            # add custom metadata
hodoscope analyze run.eval --save-samples ./samples/   # save extracted trajectories
hodoscope analyze --docent-id COLLECTION_ID            # docent source
hodoscope analyze path/to/samples/                     # directory of trajectory JSONs
hodoscope analyze run.eval --summarize-model gemini/gemini-2.0-flash
hodoscope analyze run.eval --limit 5 --sample --seed 42
hodoscope analyze run.eval --no-resume                 # overwrite existing output

hodoscope viz

Visualize analysis JSON files with interactive plots. Groups summaries by any metadata field.

hodoscope viz SOURCES [OPTIONS]

Options:
  --group-by TEXT     Field to group by (default: model)
  --plots TEXT        Plot types: pca, tsne, umap, trimap, pacmap, dynamic, density
  --output-dir TEXT   Directory for HTML output files
  --open              Open the generated HTML in the default browser

Examples:

hodoscope viz output.json                              # visualize (groups by model)
hodoscope viz output.json --group-by task              # group by task
hodoscope viz output.json --group-by score             # group by score field
hodoscope viz *.json                                   # batch: all JSONs
hodoscope viz a.json b.json --group-by model           # cross-file comparison
hodoscope viz output.json --plots tsne umap            # specific plot types
hodoscope viz output.json --open                       # open in default browser

hodoscope info

Show metadata, summary counts, and API key status for analysis JSON files.

hodoscope info output.json
hodoscope info results/

Python API Reference

The library exposes composable building blocks as first-class public functions. The CLI is a thin wrapper on top.

Loading trajectories

import hodoscope

# From .eval file (Inspect AI format)
trajectories, fields = hodoscope.load_eval("run.eval", limit=10, sample=True, seed=42)

# From directory of trajectory JSONs
trajectories, fields = hodoscope.load_trajectory_dir("path/to/samples/")

# From Docent collection
trajectories, fields = hodoscope.load_docent("COLLECTION_ID")

All loaders return (trajectories, fields), where trajectories is a list of trajectory dicts (each with a messages key) and fields is a dict of auto-detected file-level metadata. For .eval files, fields includes model, task, dataset_name, solver, run_id, accuracy, and more.
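To make that return shape concrete, here is a hand-built example of the (trajectories, fields) pair; the values are illustrative, not output from a real .eval file:

```python
# Illustrative data only: shows the structure loaders return.
trajectories = [
    {
        "id": "sample-1",
        "messages": [
            {"role": "user", "content": "Fix the failing test."},
            {"role": "assistant", "content": "Updating the assertion..."},
        ],
        "metadata": {"score": 1.0, "epoch": 1},
    }
]
fields = {"model": "gpt-5", "task": "swe_bench", "accuracy": 0.8}

# Every trajectory dict carries a `messages` key; `fields` applies file-wide.
for traj in trajectories:
    assert "messages" in traj
```
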

Processing

# Full pipeline: extract actions → summarize with LLM → embed (requires API keys)
summaries = hodoscope.process_trajectories(
    trajectories,
    summarize_model="openai/gpt-4o",       # optional, defaults from env/config
    embedding_model="gemini/gemini-embedding-001",
    embed_dim=768,
)

# Extract actions only (no LLM calls, pure data transform)
actions = hodoscope.extract_actions(trajectories[0]["messages"])

Grouping and visualization

# Group in-memory summaries by any metadata field
grouped = hodoscope.group_summaries_from_list(summaries, group_by="score")

# Or group from saved analysis files
doc = hodoscope.read_analysis_json("output.hodoscope.json")
grouped = hodoscope.group_summaries([doc], group_by="model")

# Visualize
hodoscope.visualize_action_summaries(grouped, "plots/", methods=["tsne", "pca"])

Saving results

hodoscope.write_analysis_json(
    "output.hodoscope.json",
    summaries=summaries,
    fields=fields,
    source="run.eval",
    embedding_model="gemini/gemini-embedding-001",
    embedding_dimensionality=768,
)

Output Format

Each hodoscope analyze run produces a .hodoscope.json file:

{
  "version": 1,
  "created_at": "2026-02-10T12:00:00Z",
  "source": "path/to/run.eval",
  "fields": {
    "model": "gpt-5",
    "task": "swe_bench",
    "dataset_name": "swe_bench_verified",
    "solver": "system_message, generate",
    "accuracy": 0.8
  },
  "embedding_model": "gemini-embedding-001",
  "embedding_dimensionality": 3072,
  "summaries": [
    {
      "trajectory_id": "django__django-12345_epoch_1",
      "turn_id": 3,
      "summary": "Update assertion to match expected output",
      "action_text": "...",
      "embedding": "<base85-encoded float32 array>",
      "metadata": {
        "score": 1.0,
        "epoch": 1,
        "instance_id": "django__django-12345",
        "target": "expected output",
        "input_tokens": 620,
        "output_tokens": 20,
        "total_tokens": 640
      }
    }
  ]
}
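Because the output is plain JSON, it can be inspected without hodoscope installed at all. For example, a quick stdlib sketch that counts summaries per score, using the field names documented above:

```python
import json
from collections import Counter

def score_counts(doc: dict) -> Counter:
    """Count summaries per `metadata.score` in a parsed .hodoscope.json doc."""
    return Counter(s["metadata"].get("score") for s in doc["summaries"])

# A trimmed-down document with only the keys this sketch needs:
doc = json.loads("""{
  "version": 1,
  "summaries": [
    {"trajectory_id": "a", "metadata": {"score": 1.0}},
    {"trajectory_id": "b", "metadata": {"score": 0.0}},
    {"trajectory_id": "c", "metadata": {"score": 1.0}}
  ]
}""")
```
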

Key concepts:

  • fields: File-level metadata auto-detected from .eval header (model, task, dataset_name, solver, run_id, accuracy, etc.) plus custom --field values. Same for all summaries.
  • metadata: Per-trajectory metadata. All sample.metadata keys from .eval files are passed through, plus extracted keys (score, epoch, target, token usage, etc.). Varies per summary.
  • --group-by resolution: Checks per-summary metadata first, then file-level fields.
  • embedding: RFC 1924 base85-encoded float32 numpy array.
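Since each embedding is an RFC 1924 base85 string wrapping raw float32 bytes, it can be decoded with the stdlib: Python's base64.b85decode uses the RFC 1924 alphabet. The little-endian byte order below is an assumption (it matches numpy's default on common platforms), not something the format documents:

```python
import base64
import struct

def decode_embedding(encoded: str) -> list:
    """Decode an RFC 1924 base85 string into a list of float32 values.

    Assumes little-endian float32 byte layout.
    """
    raw = base64.b85decode(encoded)
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip demonstration with a hand-made vector:
vec = [0.5, -1.25, 3.0]
encoded = base64.b85encode(struct.pack(f"<{len(vec)}f", *vec)).decode("ascii")
```
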

Universal Trajectory Format

All trajectory sources are normalized to a canonical JSON schema before processing:

{
  "id": "unique-trajectory-id",
  "source": "eval",
  "model": "gpt-5",
  "input": "Task description...",
  "messages": [{"role": "user", "content": "..."}],
  "metadata": {
    "epoch": 1,
    "score": 1.0,
    "instance_id": "django__django-12345",
    "target": "expected output",
    "input_tokens": 620,
    "output_tokens": 20,
    "total_tokens": 640,
    "response_time": 1.23,
    "label_confidence": 0.89
  }
}
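A minimal validity check against this schema might look like the sketch below; the required-key set is inferred from the example above, not taken from hodoscope's source:

```python
def is_valid_trajectory(traj: dict) -> bool:
    """Loose structural check against the canonical trajectory schema."""
    required = {"id", "source", "model", "input", "messages", "metadata"}
    if not required.issubset(traj):
        return False
    # Every message must be a dict with at least role and content.
    return all(
        isinstance(m, dict) and {"role", "content"} <= m.keys()
        for m in traj["messages"]
    )

good = {
    "id": "t-1", "source": "eval", "model": "gpt-5",
    "input": "Task...", "metadata": {"score": 1.0},
    "messages": [{"role": "user", "content": "hi"}],
}
```
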

Testing

# Unit tests (no API keys needed)
pytest tests/test_io.py tests/test_viz.py tests/test_api.py

# End-to-end tests (requires API keys)
pytest tests/test_analyze.py
