Postgres-backed eval scheduler for Harbor agent tasks — queuing, retries, and monitoring
Oddish CLI
Run Harbor tasks on local or hosted Oddish infrastructure.
oddish is a Python CLI for submitting Harbor tasks, running multi-trial sweeps,
monitoring experiments, and pulling logs and artifacts back to disk. If you
already use harbor run, Oddish adds persistent state, retries, queueing, and
better operational tooling around the same task format.
Python 3.12+ is required.
Quick Start
uv pip install oddish
# Hosted Oddish
export ODDISH_API_KEY="ok_..."
# For local/self-hosted Oddish instead:
# export ODDISH_API_URL="http://localhost:8000"
# Submit a run
oddish run -d swebench@1.0 -a codex -m openai/gpt-5.2 --n-trials 3
# Watch progress
oddish status
# Pull logs and artifacts locally
oddish pull <task_id> --watch
For hosted usage, the CLI targets Oddish Cloud by default and expects
ODDISH_API_KEY. For local/self-hosted usage, point the CLI at your API with
ODDISH_API_URL; localhost does not require auth by default.
Installation
uv pip install oddish
Common environment variables:
# Hosted Oddish
export ODDISH_API_KEY="ok_..."
# Local or self-hosted Oddish
export ODDISH_API_URL="http://localhost:8000"
# Optional dashboard override
export ODDISH_DASHBOARD_URL="https://www.oddish.app"
Need to deploy your own stack? See ../SELF_HOSTING.md.
Need package internals, architecture, or development notes? See AGENTS.md.
Commands
The installed console script is:
oddish --help
Available commands:
- oddish run submits a task, dataset, or sweep config
- oddish status shows system, task, or experiment status
- oddish pull downloads logs and artifact files locally
- oddish clean deletes task data or resets local infrastructure
oddish run
Use oddish run for:
- a single local Harbor task directory
- a local dataset directory containing multiple tasks
- a Harbor registry dataset via --dataset
- a YAML or JSON sweep config via --config
Examples:
# Local task
oddish run ./my-task -a claude-code -m anthropic/claude-sonnet-4-5
# Local dataset
oddish run ./my-dataset -a codex -m openai/gpt-5.2 --n-trials 3
# Harbor registry dataset
oddish run -d swebench@1.0 -a codex -m openai/gpt-5.2 --n-trials 3
# Filter a dataset
oddish run -d swebench@1.0 -t "django__*" -l 10 -a claude-code
# Submit in the background
oddish run ./my-task -a claude-code --background
Common flags:
- -a, --agent selects the agent
- -m, --model selects the model
- --n-trials runs multiple trials per task
- -d, --dataset pulls tasks from the Harbor registry
- -c, --config loads a YAML or JSON sweep config
- -t, --task-name and -x, --exclude-task-name filter tasks by glob
- -l, --n-tasks limits how many tasks run
- -e, --env selects the execution environment
- --experiment groups runs into an explicit experiment
- --background submits and returns immediately
- --run-analysis runs post-trial analysis and verdict generation
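To preview what a glob filter plus a task limit would select, the sketch below applies include globs, exclude globs, and a cap to a list of task names. It assumes fnmatch-style glob semantics and that the limit is applied after filtering; neither is confirmed by this README. `select_tasks` and the task names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_tasks(names, include=None, exclude=None, limit=None):
    """Keep names matching any include glob, drop excluded ones, then cap the count."""
    selected = []
    for name in names:
        if include and not any(fnmatchcase(name, pat) for pat in include):
            continue
        if exclude and any(fnmatchcase(name, pat) for pat in exclude):
            continue
        selected.append(name)
    return selected if limit is None else selected[:limit]


# Hypothetical task names, for illustration only.
tasks = ["django__a-1", "django__b-2", "sympy__c-3", "django__d-4"]
print(select_tasks(tasks, include=["django__*"], limit=2))
# prints ['django__a-1', 'django__b-2']
```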
Supported --env values:
- docker
- daytona
- e2b
- modal
- runloop
- gke
When --env is omitted:
- local API URLs default to docker
- hosted Oddish (*.modal.run) defaults to modal
- other remote APIs default to docker
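The default selection can be expressed as a small function. This is a sketch of the three rules above, not the CLI's implementation: `default_env` is a hypothetical name, and the set of hostnames treated as local is an assumption.

```python
from urllib.parse import urlparse

# Assumption: these hostnames count as "local" API URLs.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def default_env(api_url: str) -> str:
    """Pick the environment used when --env is omitted."""
    host = urlparse(api_url).hostname or ""
    if host in LOCAL_HOSTS:
        return "docker"  # local API URLs
    if host.endswith(".modal.run"):
        return "modal"  # hosted Oddish
    return "docker"  # any other remote API
```

Passing `-e` explicitly avoids depending on these defaults.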
Sweep Configs
oddish run -c sweep.yaml accepts YAML or JSON. A minimal config:
agents:
  - name: claude-code
    model_name: anthropic/claude-sonnet-4-5
    n_trials: 3
  - name: codex
    model_name: openai/gpt-5.2
    n_trials: 3
dataset: swebench@1.0
n_tasks: 10
priority: low
Per-agent overrides such as environment variables, kwargs, and timeouts are passed through Harbor agent config fields.
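Because JSON is also accepted, the same minimal config can be built in Python and serialized. The sketch assumes the JSON form uses the same keys as the YAML form, and that n_trials applies per task for each agent; `expected_trials` is a hypothetical helper for sizing a sweep before submitting it.

```python
import json

# Same fields as the minimal YAML config above.
sweep = {
    "agents": [
        {"name": "claude-code", "model_name": "anthropic/claude-sonnet-4-5", "n_trials": 3},
        {"name": "codex", "model_name": "openai/gpt-5.2", "n_trials": 3},
    ],
    "dataset": "swebench@1.0",
    "n_tasks": 10,
    "priority": "low",
}


def expected_trials(config: dict) -> int:
    """Upper bound on trials, assuming n_trials is per task for each agent."""
    return config["n_tasks"] * sum(agent["n_trials"] for agent in config["agents"])


sweep_json = json.dumps(sweep, indent=2)
print(expected_trials(sweep))  # prints 60
```

Writing `sweep_json` to a file such as sweep.json and passing it with `oddish run -c sweep.json` should then be equivalent to the YAML version, under the assumptions above.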
oddish status
Examples:
# System overview
oddish status
# System overview with extra queue details
oddish status --verbose
# Watch a task
oddish status <task_id> --watch
# Watch an experiment
oddish status --experiment <experiment_id> --watch
oddish pull
Examples:
# Pull one trial
oddish pull <trial_id>
# Keep syncing a task while it runs
oddish pull <task_id> --watch --interval 5
# Pull an entire experiment, including task files
oddish pull <experiment_id> --include-task-files
By default, pull output is written to ./oddish-pulls/<target>.
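After a pull, you may want to see what landed on disk. The sketch below lists every file under the default pull directory for a given target. The layout inside that directory is not documented here, so the helper just walks it; `list_pulled_files` is a hypothetical name.

```python
from pathlib import Path


def list_pulled_files(target: str, root: Path = Path("oddish-pulls")) -> list[Path]:
    """Return every file under <root>/<target>, relative to that directory, sorted."""
    base = root / target
    if not base.is_dir():
        return []
    return sorted(p.relative_to(base) for p in base.rglob("*") if p.is_file())
```

For example, `list_pulled_files("<task_id>")` returns an empty list until the first pull for that task has completed.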
oddish clean
Examples:
# Delete a task and its trials
oddish clean <task_id>
# Delete an entire experiment
oddish clean --experiment <experiment_id>
# Stop local infrastructure but keep data
oddish clean --stop-only
Typical Workflow
# 1. Submit a run
oddish run -d swebench@1.0 -a claude-code -m anthropic/claude-sonnet-4-5
# 2. Inspect or watch it
oddish status
oddish status <task_id> --watch
# 3. Pull outputs when you want them locally
oddish pull <task_id> --watch
More Technical Docs
- Package internals and implementation notes: AGENTS.md
- Self-hosting and deployment: ../SELF_HOSTING.md