Hodoscope
Analyze AI agent trajectories: extract actions, summarize them with LLMs, embed the summaries, and create interactive visualizations to identify behavioral patterns.
Installation
pip install hodoscope
For development (editable install with tests):
pip install -e ".[dev]"
Configuration
Create a .env file in the project root:
OPENAI_API_KEY=your-openai-key
GEMINI_API_KEY=your-gemini-key
# Optional: override defaults
# SUMMARIZE_MODEL=openai/gpt-5.2
# EMBEDDING_MODEL=gemini/gemini-embedding-001
# EMBED_DIM=768
# MAX_WORKERS=10
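To see what loading this file amounts to, here is a minimal sketch of .env parsing using only the standard library. The package may well use python-dotenv or similar internally; the parser below is illustrative, not its actual loader.

```python
import os

def load_dotenv_text(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

dotenv = load_dotenv_text("""
OPENAI_API_KEY=your-openai-key
# SUMMARIZE_MODEL=openai/gpt-5.2
EMBED_DIM=768
""")
os.environ.update(dotenv)  # make the values visible to this process
print(dotenv)
```

Commented-out lines stay commented out, so the optional overrides above are only picked up once you uncomment them.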
Quick Start
CLI
# Analyze a single .eval file
hodoscope analyze run.eval
# Analyze all .eval files in a directory
hodoscope analyze evals/
# Visualize results
hodoscope viz run.hodoscope.json
# Compare models
hodoscope viz model_a.hodoscope.json model_b.hodoscope.json --group-by model
# Show metadata
hodoscope info run.hodoscope.json
Python API
import hodoscope
# Load trajectories from .eval file
trajectories, fields = hodoscope.load_eval("run.eval", limit=5, sample=True)
# Or from a directory of trajectory JSONs
trajectories, fields = hodoscope.load_trajectory_dir("path/to/samples/")
# Summarize + embed (requires API keys)
summaries = hodoscope.process_trajectories(trajectories, summarize_model="openai/gpt-4o")
# Extract actions only (no LLM calls)
actions = hodoscope.extract_actions(trajectories[0]["messages"])
# Group and visualize in-memory summaries
grouped = hodoscope.group_summaries_from_list(summaries, group_by="score")
hodoscope.visualize_action_summaries(grouped, "plots/", methods=["tsne"])
# Or save to disk and use the file-based workflow
hodoscope.write_analysis_json("output.hodoscope.json", summaries, fields, source="run.eval")
CLI Reference
hodoscope analyze
Process source files (.eval, directories, Docent collections) into .hodoscope.json analysis files.
hodoscope analyze SOURCES [OPTIONS]
Options:
--docent-id TEXT Docent collection ID as source
-o, --output TEXT Output JSON path (single source only)
--field TEXT KEY=VALUE metadata (repeatable)
-l, --limit INTEGER Limit trajectories per source
--save-samples PATH Save extracted trajectory JSONs to directory
--embed-dim INTEGER Embedding dimensionality (default: 768)
-m, --model-name TEXT Override auto-detected model name
--summarize-model TEXT LiteLLM model for summarization (default: openai/gpt-5.2)
--embedding-model TEXT LiteLLM model for embeddings (default: gemini/gemini-embedding-001)
--sample / --no-sample Randomly sample trajectories (use with --limit)
--seed INTEGER Random seed for --sample reproducibility
--resume / --no-resume Resume from existing output (default: on)
--reasoning-effort [low|medium|high] Reasoning effort for summarization model
--max-workers INTEGER Max parallel workers for LLM calls (default: 10)
Examples:
hodoscope analyze run.eval # .eval → analysis JSON
hodoscope analyze *.eval # batch: all .eval files
hodoscope analyze evals/ # batch: dir of .eval files
hodoscope analyze run.eval -o my_output.json # custom output path
hodoscope analyze run.eval --field env=prod # add custom metadata
hodoscope analyze run.eval --save-samples ./samples/ # save extracted trajectories
hodoscope analyze --docent-id COLLECTION_ID # docent source
hodoscope analyze path/to/samples/ # directory of trajectory JSONs
hodoscope analyze run.eval --summarize-model gemini/gemini-2.0-flash
hodoscope analyze run.eval --limit 5 --sample --seed 42
hodoscope analyze run.eval --no-resume # overwrite existing output
hodoscope viz
Visualize analysis JSON files with interactive plots. Groups summaries by any metadata field.
hodoscope viz SOURCES [OPTIONS]
Options:
--group-by TEXT Field to group by (default: model)
--plots TEXT Plot types: pca, tsne, umap, trimap, pacmap, dynamic, density
--output-dir TEXT Directory for HTML output files
--open Open the generated HTML in the default browser
Examples:
hodoscope viz output.json # visualize (groups by model)
hodoscope viz output.json --group-by task # group by task
hodoscope viz output.json --group-by score # group by score field
hodoscope viz *.json # batch: all JSONs
hodoscope viz a.json b.json --group-by model # cross-file comparison
hodoscope viz output.json --plots tsne umap # specific plot types
hodoscope viz output.json --open # open in default browser
hodoscope info
Show metadata, summary counts, and API key status for analysis JSON files.
hodoscope info output.json
hodoscope info results/
Python API Reference
The library exposes composable building blocks as first-class public functions. The CLI is a thin wrapper on top.
Loading trajectories
import hodoscope
# From .eval file (Inspect AI format)
trajectories, fields = hodoscope.load_eval("run.eval", limit=10, sample=True, seed=42)
# From directory of trajectory JSONs
trajectories, fields = hodoscope.load_trajectory_dir("path/to/samples/")
# From Docent collection
trajectories, fields = hodoscope.load_docent("COLLECTION_ID")
All loaders return (trajectories, fields) where trajectories is a list of trajectory dicts (each with a messages key) and fields is auto-detected file-level metadata. For .eval files, fields include model, task, dataset_name, solver, run_id, accuracy, and more.
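The shape described above can be mocked up directly. The values below are invented for illustration, but the structure (a list of dicts with a messages key, plus a flat fields dict) matches what the loaders document:

```python
# Hypothetical data in the shape every loader returns: a list of
# trajectory dicts (each with a "messages" key) plus file-level fields.
trajectories = [
    {
        "id": "sample-1",
        "messages": [
            {"role": "user", "content": "Fix the failing test."},
            {"role": "assistant", "content": "Looking at the traceback..."},
        ],
        "metadata": {"score": 1.0, "epoch": 1},
    },
]
fields = {"model": "gpt-5", "task": "swe_bench", "accuracy": 0.8}

# Iterate the way downstream steps do: one trajectory at a time.
n_messages = sum(len(t["messages"]) for t in trajectories)
print(n_messages, fields["model"])
```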
Processing
# Full pipeline: extract actions → summarize with LLM → embed (requires API keys)
summaries = hodoscope.process_trajectories(
trajectories,
summarize_model="openai/gpt-4o", # optional, defaults from env/config
embedding_model="gemini/gemini-embedding-001",
embed_dim=768,
)
# Extract actions only (no LLM calls, pure data transform)
actions = hodoscope.extract_actions(trajectories[0]["messages"])
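As a rough mental model of what a pure action-extraction pass does, the sketch below keeps only assistant turns from a message list. It is a hypothetical stand-in, not the library's extract_actions implementation, which may also handle tool calls and richer content types:

```python
def extract_actions_sketch(messages):
    """Illustrative stand-in for action extraction: keep the content
    of assistant turns, which is where an agent's actions live."""
    return [
        {"turn": i, "action": m["content"]}
        for i, m in enumerate(messages)
        if m["role"] == "assistant"
    ]

messages = [
    {"role": "user", "content": "Run the tests."},
    {"role": "assistant", "content": "pytest -x tests/"},
    {"role": "tool", "content": "1 failed"},
    {"role": "assistant", "content": "Patch the assertion in test_io.py"},
]
print(extract_actions_sketch(messages))
```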
Grouping and visualization
# Group in-memory summaries by any metadata field
grouped = hodoscope.group_summaries_from_list(summaries, group_by="score")
# Or group from saved analysis files
doc = hodoscope.read_analysis_json("output.hodoscope.json")
grouped = hodoscope.group_summaries([doc], group_by="model")
# Visualize
hodoscope.visualize_action_summaries(grouped, "plots/", methods=["tsne", "pca"])
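The group-by idea itself is just bucketing by a metadata key. A minimal sketch, assuming summaries shaped like the output format below (the real functions additionally fall back to file-level fields when a key is missing from per-summary metadata):

```python
from collections import defaultdict

def group_by_field(summaries, group_by):
    """Bucket summaries by a per-summary metadata key (illustrative only)."""
    groups = defaultdict(list)
    for s in summaries:
        groups[s["metadata"].get(group_by)].append(s)
    return dict(groups)

summaries = [
    {"summary": "Edit test", "metadata": {"score": 1.0}},
    {"summary": "Run pytest", "metadata": {"score": 1.0}},
    {"summary": "Give up", "metadata": {"score": 0.0}},
]
grouped = group_by_field(summaries, "score")
print({k: len(v) for k, v in grouped.items()})
```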
Saving results
hodoscope.write_analysis_json(
"output.hodoscope.json",
summaries=summaries,
fields=fields,
source="run.eval",
embedding_model="gemini/gemini-embedding-001",
embedding_dimensionality=768,
)
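Since the output is plain JSON, a stdlib round-trip shows what saving and reloading amounts to. The document below is a trimmed, made-up instance of the documented .hodoscope.json shape, not output from the library:

```python
import json
import os
import tempfile

# A minimal document in the documented .hodoscope.json shape
# (field values here are invented for illustration).
doc = {
    "version": 1,
    "source": "run.eval",
    "fields": {"model": "gpt-5", "task": "swe_bench"},
    "embedding_dimensionality": 768,
    "summaries": [{"trajectory_id": "t1", "turn_id": 0, "summary": "..."}],
}

path = os.path.join(tempfile.mkdtemp(), "output.hodoscope.json")
with open(path, "w") as f:
    json.dump(doc, f, indent=2)

with open(path) as f:
    loaded = json.load(f)
print(loaded["fields"]["model"], len(loaded["summaries"]))
```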
Output Format
Each hodoscope analyze run produces a .hodoscope.json file:
{
"version": 1,
"created_at": "2026-02-10T12:00:00Z",
"source": "path/to/run.eval",
"fields": {
"model": "gpt-5",
"task": "swe_bench",
"dataset_name": "swe_bench_verified",
"solver": "system_message, generate",
"accuracy": 0.8
},
"embedding_model": "gemini-embedding-001",
"embedding_dimensionality": 3072,
"summaries": [
{
"trajectory_id": "django__django-12345_epoch_1",
"turn_id": 3,
"summary": "Update assertion to match expected output",
"action_text": "...",
"embedding": "<base85-encoded float32 array>",
"metadata": {
"score": 1.0,
"epoch": 1,
"instance_id": "django__django-12345",
"target": "expected output",
"input_tokens": 620,
"output_tokens": 20,
"total_tokens": 640
}
}
]
}
Key concepts:
- fields: File-level metadata auto-detected from the .eval header (model, task, dataset_name, solver, run_id, accuracy, etc.) plus custom --field values. Identical across all summaries in a file.
- metadata: Per-trajectory metadata. All sample.metadata keys from .eval files are passed through, plus extracted keys (score, epoch, target, token usage, etc.). Varies per summary.
- --group-by resolution: checks per-summary metadata first, then falls back to file-level fields.
- embedding: RFC 1924 base85-encoded float32 numpy array.
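The embedding field can be round-tripped with the standard library: Python's base64.b85encode and b85decode use the RFC 1924 alphabet. The sketch below uses struct instead of numpy to stay dependency-free and assumes little-endian float32 layout, which matches numpy's native order on common platforms but is an assumption about the library's exact encoding:

```python
import base64
import struct

def encode_embedding(values):
    """Pack float32 values and base85-encode them (RFC 1924 alphabet)."""
    return base64.b85encode(struct.pack(f"<{len(values)}f", *values)).decode("ascii")

def decode_embedding(text):
    """Invert encode_embedding: base85-decode, then unpack float32 values."""
    raw = base64.b85decode(text)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

emb = encode_embedding([0.5, -1.0, 0.25])
print(decode_embedding(emb))  # [0.5, -1.0, 0.25]
```

The example values are exactly representable in float32, so they round-trip losslessly; arbitrary floats come back at float32 precision.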
Universal Trajectory Format
All trajectory sources are normalized to a canonical JSON schema before processing:
{
"id": "unique-trajectory-id",
"source": "eval",
"model": "gpt-5",
"input": "Task description...",
"messages": [{"role": "user", "content": "..."}],
"metadata": {
"epoch": 1,
"score": 1.0,
"instance_id": "django__django-12345",
"target": "expected output",
"input_tokens": 620,
"output_tokens": 20,
"total_tokens": 640,
"response_time": 1.23,
"label_confidence": 0.89
}
}
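A quick way to sanity-check data against this schema is to compare top-level keys. The required-key set below is read off the schema above; the check itself is an illustrative sketch, not part of the library:

```python
REQUIRED_KEYS = {"id", "source", "model", "input", "messages", "metadata"}

def check_trajectory(traj):
    """Return the top-level schema keys missing from a trajectory dict."""
    return sorted(REQUIRED_KEYS - traj.keys())

traj = {
    "id": "unique-trajectory-id",
    "source": "eval",
    "model": "gpt-5",
    "input": "Task description...",
    "messages": [{"role": "user", "content": "..."}],
    "metadata": {"epoch": 1, "score": 1.0},
}
print(check_trajectory(traj))         # []
print(check_trajectory({"id": "x"}))  # the five missing keys
```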
Testing
# Unit tests (no API keys needed)
pytest tests/test_io.py tests/test_viz.py tests/test_api.py
# End-to-end tests (requires API keys)
pytest tests/test_analyze.py
File details
Details for the file hodoscope-0.2.2.tar.gz.
File metadata
- Download URL: hodoscope-0.2.2.tar.gz
- Size: 64.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d3209ad5cd15b57e6dd32fb3464937f36cc8ce44ab583e7960bb6bfb1cd260c |
| MD5 | d30617e172a18b1a0550951df94c59c0 |
| BLAKE2b-256 | d94eec5f9351b7aa3eab066ca661cd3e6096aba4dc7b0cf108c3536dfd8b8702 |
Provenance
The following attestation bundles were made for hodoscope-0.2.2.tar.gz:
Publisher: publish.yml on AR-FORUM/hodoscope
- Permalink: AR-FORUM/hodoscope@179e07f0b78d52a78503f205525e4d520de7febc
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/AR-FORUM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@179e07f0b78d52a78503f205525e4d520de7febc
- Trigger Event: push
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hodoscope-0.2.2.tar.gz
- Subject digest: 6d3209ad5cd15b57e6dd32fb3464937f36cc8ce44ab583e7960bb6bfb1cd260c
- Sigstore transparency entry: 972509476
File details
Details for the file hodoscope-0.2.2-py3-none-any.whl.
File metadata
- Download URL: hodoscope-0.2.2-py3-none-any.whl
- Size: 60.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 81d9be4d5789239bcadc87bf0e999d285ed4a7c4f20a661f3c993d4aa7e15acf |
| MD5 | ffb47d1317380f73c96fcea0165dde18 |
| BLAKE2b-256 | 9c10aa054b619642e6ac6c60bece1f7e4e5aa0e6333716103290574f3d9cb8f1 |
Provenance
The following attestation bundles were made for hodoscope-0.2.2-py3-none-any.whl:
Publisher: publish.yml on AR-FORUM/hodoscope
- Permalink: AR-FORUM/hodoscope@179e07f0b78d52a78503f205525e4d520de7febc
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/AR-FORUM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@179e07f0b78d52a78503f205525e4d520de7febc
- Trigger Event: push
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hodoscope-0.2.2-py3-none-any.whl
- Subject digest: 81d9be4d5789239bcadc87bf0e999d285ed4a7c4f20a661f3c993d4aa7e15acf
- Sigstore transparency entry: 972509481