Postgres-backed eval scheduler for Harbor agent tasks — queuing, retries, and monitoring

Oddish Core Library

This README is focused on implementation details for the oddish package.

Deep Technical Documentation

This section covers architecture, configuration, and operational details.

Architecture

┌─────────────────────────────────────────────────────────────┐
│  CLI (oddish run/status) or API Client                      │
│  - Uses env vars for API URL + auth                         │
│  - Submits tasks via HTTP                                   │
│  - Watches tasks or experiments via CLI status              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  FastAPI Server (python -m oddish.api)                      │
│  - POST /tasks/upload, /tasks/sweep                         │
│  - GET /tasks, /tasks/{id}, /trials/{id}/logs               │
│  - Auto-starts workers by default                           │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  Postgres                                                   │
│  - experiments table (grouping + sharing metadata)          │
│  - tasks table (task metadata + verdict)                    │
│  - trials table (runs + analysis)                           │
│  - pgqueuer tables (job queue)                              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  PGQueuer Workers                                           │
│  - Poll queue via SELECT FOR UPDATE SKIP LOCKED             │
│  - Queue-key concurrency limits                             │
│  - Execute trials, analyses, and verdict jobs               │
└─────────────────────────────────────────────────────────────┘

Local Development

Quick start (recommended)

cp env.example .env
docker compose up -d db
uv sync
uv run python -m oddish.db setup
uv run python -m oddish.api

This starts Postgres, runs the API with workers, and is a good baseline for local development.

CLI configuration

The CLI can talk to either:

  • Local API: http://localhost:8000 (self-hosted)
  • Hosted API (optional): https://abundant-ai--api.modal.run

For local use, point the CLI at your server:

export ODDISH_API_URL="http://localhost:8000"

The local API does not enforce auth by default. For the hosted API, set an API key:

export ODDISH_API_URL="https://abundant-ai--api.modal.run"
export ODDISH_API_KEY="ok_..."

CLI config precedence

The CLI resolves API settings in this order:

  1. ODDISH_API_URL / ODDISH_API_KEY / ODDISH_DASHBOARD_URL
  2. ODDISH_DEFAULT_API_URL / ODDISH_DEFAULT_DASHBOARD_URL
  3. Built-in defaults:
    • API: https://abundant-ai--api.modal.run
    • Dashboard: https://www.oddish.app
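
The precedence above can be sketched in a few lines of Python. This is an illustration of the lookup order only, not the CLI's actual code: the function name `resolve_api_url` is hypothetical, and treating an empty value as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Built-in default, taken from the precedence list above.
BUILTIN_API_URL = "https://abundant-ai--api.modal.run"


def resolve_api_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty API URL in precedence order (hypothetical helper)."""
    env = os.environ if env is None else env
    for name in ("ODDISH_API_URL", "ODDISH_DEFAULT_API_URL"):
        value = env.get(name, "").strip()
        if value:  # assumption: an empty string counts as "not set"
            return value
    return BUILTIN_API_URL


if __name__ == "__main__":
    print(resolve_api_url({"ODDISH_DEFAULT_API_URL": "http://localhost:8000"}))
```

The dashboard URL would presumably follow the same pattern with ODDISH_DASHBOARD_URL and ODDISH_DEFAULT_DASHBOARD_URL.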

Database setup commands

uv run python -m oddish.db setup            # Full setup (Alembic + PGQueuer)
uv run python -m oddish.db init             # Run Alembic migrations only
uv run python -m oddish.db install-pgqueuer # Install PGQueuer tables only
uv run python -m oddish.db reset            # Drop and recreate all tables
uv run python -m oddish.db purge            # Delete all data (preserves schema)

API flags

# Set queue-key concurrency
uv run python -m oddish.api --n-concurrent '{"openai/gpt-5.2": 8, "anthropic/claude-sonnet-4-5": 8}'

# Custom host/port
uv run python -m oddish.api --host 0.0.0.0 --port 9000

API Endpoints

Method  Endpoint                             Description
GET     /health                              Health check
GET     /docs                                Swagger UI
GET     /tasks                               List tasks
GET     /tasks/{task_id}                     Task details with trials
DELETE  /tasks/{task_id}                     Delete a task and its trials
DELETE  /experiments/{experiment_id}         Delete an experiment and its tasks
PATCH   /experiments/{experiment_id}         Update experiment name
POST    /tasks/upload                        Upload task tarball
POST    /tasks/sweep                         Create evaluation sweep
GET     /tasks/{task_id}/trials/{index}      Fetch trial by index
GET     /trials/{trial_id}/logs              Fetch trial logs
GET     /trials/{trial_id}/result            Fetch result.json
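
As a rough sketch, the read-only endpoints can be called from Python with the standard library alone. The helper names `endpoint_url` and `get_json` are hypothetical, the example assumes an unauthenticated local server (the local API does not enforce auth by default), and it assumes the endpoints return JSON bodies.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def endpoint_url(base_url: str, *segments: str) -> str:
    """Join path segments onto the API base URL, percent-escaping each segment."""
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    return f"{base_url.rstrip('/')}/{path}"


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    base = "http://localhost:8000"
    print(endpoint_url(base, "tasks", "abc123"))
    print(endpoint_url(base, "trials", "trial-1", "logs"))
    # get_json(endpoint_url(base, "health")) requires a running server.
```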

Docker Compose

The docker-compose.yml orchestrates local development:

# Database only (for local Python dev)
docker compose up -d db

# Full stack (containerized)
docker compose up -d db api worker

# One-time database initialization
docker compose run --rm db-init

Services:

Service   Description
db        Postgres 16
api       FastAPI server (python -m oddish.api)
worker    Standalone worker (python -m oddish.workers.queue.worker)
db-init   One-time setup: runs Alembic migrations + PGQueuer install

Configuration

Oddish loads environment variables from .env by default (via Pydantic Settings with the ODDISH_ prefix).

Database URL

Both variable names are supported:

DATABASE_URL=postgresql+asyncpg://oddish:oddish@localhost:5432/oddish
ODDISH_DATABASE_URL=postgresql+asyncpg://...  # Alternative

DATABASE_URL takes precedence over ODDISH_DATABASE_URL.
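
A minimal sketch of that precedence rule follows, together with a hypothetical helper that derives a libpq-style URL. Tools such as psql do not understand the SQLAlchemy "+asyncpg" driver suffix, so it has to be stripped before the URL is passed to them. Both function names are illustrative and are not part of the oddish package.

```python
import os
from typing import Mapping, Optional


def resolve_database_url(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """DATABASE_URL wins; ODDISH_DATABASE_URL is the fallback (hypothetical helper)."""
    env = os.environ if env is None else env
    return env.get("DATABASE_URL") or env.get("ODDISH_DATABASE_URL") or None


def to_libpq_url(url: str) -> str:
    """Drop a SQLAlchemy driver suffix such as '+asyncpg' from the URL scheme."""
    scheme, separator, rest = url.partition("://")
    return f"{scheme.split('+', 1)[0]}{separator}{rest}"


if __name__ == "__main__":
    url = resolve_database_url(
        {"DATABASE_URL": "postgresql+asyncpg://oddish:oddish@localhost:5432/oddish"}
    )
    print(to_libpq_url(url))
```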

Storage

Local (default):

  • Tasks: /tmp/oddish-tasks
  • Harbor artifacts: /tmp/harbor-jobs

S3/R2 (production):

ODDISH_S3_ENABLED=true
ODDISH_S3_BUCKET=data
ODDISH_S3_ACCESS_KEY=...
ODDISH_S3_SECRET_KEY=...
ODDISH_S3_ENDPOINT_URL=https://...

Task uploads land under tasks/<task_id>/. Trial artifacts are uploaded under tasks/<task_id>/trials/<trial_id>/ when possible, with a fallback of trials/<trial_id>/ for legacy IDs.
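
The key layout described above might look like the following sketch. It assumes that "when possible" means the owning task ID is known at upload time; the source does not spell out the exact condition, and the function names are hypothetical.

```python
from typing import Optional


def task_prefix(task_id: str) -> str:
    """Prefix under which a task upload is stored."""
    return f"tasks/{task_id}/"


def trial_prefix(trial_id: str, task_id: Optional[str] = None) -> str:
    """Prefix for trial artifacts, with the legacy fallback.

    Assumption: the nested layout is used whenever the task ID is known.
    """
    if task_id:
        return f"tasks/{task_id}/trials/{trial_id}/"
    return f"trials/{trial_id}/"


if __name__ == "__main__":
    print(trial_prefix("trial-1", task_id="abc123"))
    print(trial_prefix("legacy-trial"))
```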

Execution Environments

Oddish runs Harbor tasks in a sandboxed environment.

CLI behavior when --env is omitted:

  • Local API URL (localhost) defaults to docker
  • Hosted Modal API URL (*.modal.run) defaults to modal
  • Other remote API URLs default to docker

You can always override per task with: oddish run --env {docker|daytona|e2b|modal|runloop|gke}.
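
The default selection when --env is omitted can be sketched as a hostname check. This is an illustration of the three rules above, not the CLI's code; `default_env` is a hypothetical name.

```python
from urllib.parse import urlparse


def default_env(api_url: str) -> str:
    """Pick the execution environment used when --env is omitted (sketch)."""
    host = (urlparse(api_url).hostname or "").lower()
    if host.endswith(".modal.run"):
        return "modal"
    # Local and other remote API URLs both default to docker.
    return "docker"


if __name__ == "__main__":
    for url in (
        "http://localhost:8000",
        "https://abundant-ai--api.modal.run",
        "https://eval.example.com",
    ):
        print(url, "->", default_env(url))
```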

Queue-Key Routing

Oddish routes jobs to PGQueuer entrypoints by queue key, a normalized model string. Agent names still map to provider buckets for compatibility and attribution, but queueing uses get_queue_key_for_trial(agent, model), which falls back to the agent only when no model is provided.
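
The fallback rule can be illustrated with a small stand-in. This is not the real get_queue_key_for_trial: the source does not describe the normalization, so trimming and lower-casing here are assumptions, and `sketch_queue_key` is a hypothetical name.

```python
from typing import Optional


def sketch_queue_key(agent: str, model: Optional[str]) -> str:
    """Return a queue key: the model when given, otherwise the agent.

    Assumption: normalization means trimming whitespace and lower-casing.
    """
    if model and model.strip():
        return model.strip().lower()
    return agent.strip().lower()


if __name__ == "__main__":
    print(sketch_queue_key("codex", "openai/gpt-5.2"))
    print(sketch_queue_key("claude-code", None))
```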

Concurrency Control

Queue-key concurrency is fixed at API startup (not per job).

Order of precedence:

  1. Manual API startup: python -m oddish.api --n-concurrent '{"openai/gpt-5.2": 8}'
  2. Default: ODDISH_DEFAULT_MODEL_CONCURRENCY (with optional model overrides)

For self-hosted setups, set concurrency on API startup:

uv run python -m oddish.api --n-concurrent '{"openai/gpt-5.2": 8, "anthropic/claude-sonnet-4-5": 8}'

Changing concurrency requires restarting the API process.
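
As a sketch of how a --n-concurrent value could be interpreted, the snippet below parses the JSON object and looks up a per-key limit with a fallback. The function names are hypothetical, the validation rules are assumptions, and the default is passed in explicitly because the source does not state the value or format of ODDISH_DEFAULT_MODEL_CONCURRENCY.

```python
import json
from typing import Dict


def parse_n_concurrent(raw: str) -> Dict[str, int]:
    """Parse a JSON object mapping queue keys to positive integer limits."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping queue keys to limits")
    limits: Dict[str, int] = {}
    for key, value in data.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"limit for {key!r} must be a positive integer")
        limits[key] = value
    return limits


def limit_for(queue_key: str, limits: Dict[str, int], default: int) -> int:
    """Explicit per-key limit if present, otherwise the default."""
    return limits.get(queue_key, default)


if __name__ == "__main__":
    limits = parse_n_concurrent('{"openai/gpt-5.2": 8, "anthropic/claude-sonnet-4-5": 8}')
    print(limit_for("openai/gpt-5.2", limits, default=4))
    print(limit_for("some/other-model", limits, default=4))
```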

LLM API Keys

Only set keys for providers you use:

ANTHROPIC_API_KEY=sk-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

Sandbox Provider Keys

Set keys for the sandbox environments you use:

DAYTONA_API_KEY=...
MODAL_TOKEN_ID=...
MODAL_TOKEN_SECRET=...

Execution Pipeline

Tasks move through a multi-stage pipeline when run_analysis is enabled:

  1. Trials run for each agent/model pair (status: pending → queued → running → success/failed).
  2. Analyses run per trial after completion to classify outcomes.
  3. Verdict runs once per task to summarize analyses.

Each stage is a PGQueuer job routed through queue-key entrypoints. Task status progresses from pending → running → analyzing → verdict_pending → completed (or failed on terminal error).
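
The task status progression (pending, running, analyzing, verdict_pending, completed, with failed as the error outcome) can be modeled as a small transition table. This is a sketch, not Oddish's implementation: it assumes failed is reachable from every non-terminal status and that no stage is skipped when run_analysis is enabled.

```python
# Allowed task status transitions (assumed from the progression described above).
TASK_TRANSITIONS = {
    "pending": {"running", "failed"},
    "running": {"analyzing", "failed"},
    "analyzing": {"verdict_pending", "failed"},
    "verdict_pending": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in TASK_TRANSITIONS[current]:
        raise ValueError(f"illegal task transition: {current} -> {new}")
    return new


if __name__ == "__main__":
    status = "pending"
    for step in ("running", "analyzing", "verdict_pending", "completed"):
        status = advance(status, step)
    print(status)
```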

Experiments

Every task belongs to an experiment. If no experiment is provided, Oddish generates a short, human-friendly name (oddish.experiment.generate_experiment_name). The CLI can watch an experiment with oddish status --experiment <id>.

Workers

Auto-start behavior

By default, python -m oddish.api spawns workers in background threads.

For separate worker processes (for scaling), run:

uv run python -m oddish.workers.queue.worker

In Docker Compose, this maps to the dedicated worker service.

How PGQueuer works

Workers claim jobs atomically via Postgres:

SELECT * FROM pgqueuer
WHERE status = 'queued' AND entrypoint = 'openai/gpt-5.2'
ORDER BY priority DESC
LIMIT 1
FOR UPDATE SKIP LOCKED;

UPDATE pgqueuer SET status = 'processing' WHERE id = ?;

  • FOR UPDATE: Locks the row
  • SKIP LOCKED: Other workers skip locked rows
  • Result: Each job claimed by exactly one worker

Concurrency enforcement

PGQueuer checks processing count before claiming:

SELECT COUNT(*) FROM pgqueuer
WHERE entrypoint = 'openai/gpt-5.2' AND status = 'processing';

If the count is at or above the limit (e.g., 8), the worker waits. Concurrency is therefore enforced from database state, not from the number of workers.
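
The claim-and-limit behavior can be simulated in memory to see why the limit holds no matter how many workers poll. This is a toy model, not PGQueuer: a single threading.Lock stands in for Postgres row locking (FOR UPDATE SKIP LOCKED is finer-grained and non-blocking), and the class and field names are hypothetical.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    id: int
    entrypoint: str
    priority: int = 0
    status: str = "queued"


@dataclass
class InMemoryQueue:
    limits: Dict[str, int]
    jobs: List[Job] = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def claim(self, entrypoint: str) -> Optional[Job]:
        """Claim the highest-priority queued job unless the limit is reached."""
        with self._lock:  # coarse stand-in for row-level locking
            processing = sum(
                1 for j in self.jobs
                if j.entrypoint == entrypoint and j.status == "processing"
            )
            if processing >= self.limits.get(entrypoint, 1):
                return None  # at the limit: the worker has to wait
            queued = [
                j for j in self.jobs
                if j.entrypoint == entrypoint and j.status == "queued"
            ]
            if not queued:
                return None
            job = max(queued, key=lambda j: j.priority)
            job.status = "processing"
            return job


key = "openai/gpt-5.2"
queue = InMemoryQueue(limits={key: 2}, jobs=[Job(i, key, priority=i) for i in range(5)])
claimed: List[int] = []


def worker() -> None:
    job = queue.claim(key)
    if job is not None:
        claimed.append(job.id)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(claimed))  # two jobs claimed despite eight workers
```

Eight workers compete, but only two jobs move to processing, because the limit is checked against shared state rather than against the worker count.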

Job routing

Queue entrypoints are created per queue key (typically model identifiers). Each entrypoint handles jobs with job_type of trial, analysis, or verdict.

CLI Reference

# Point at local API
export ODDISH_API_URL="http://localhost:8000"

# Run a task
oddish run ./my-task -a claude-code -m claude-sonnet-4-5

# Run a sweep
oddish run -d terminalbench@2.0 -c sweep.yaml

# Optional run flags
oddish run ./my-task --run-analysis --env daytona --override-cpus 4

# Pass environment variables and kwargs to the agent
oddish run ./my-task -a claude-code -m claude-sonnet-4-5 \
  --ae AWS_REGION=us-east-1 --ak max_thinking_tokens=8000

# Force rebuild and collect artifacts
oddish run ./my-task -a claude-code --force-build --artifact /workspace/output.txt

# Monitor
oddish status
oddish status <task_id> --watch
oddish status --experiment <experiment_id> --watch

# Cleanup
oddish clean <task_id>
oddish clean --experiment <id>
oddish clean --all-experiments

Sweep Config Files

A sweep config file (YAML or JSON, passed via oddish run -c sweep.yaml) defines which agents and models to evaluate:

agents:
  - name: claude-code
    model_name: anthropic/claude-sonnet-4-5
    n_trials: 3
    env:                        # optional: agent env vars
      CUSTOM_VAR: "value"
    kwargs:                     # optional: agent kwargs
      max_thinking_tokens: 8000

  - name: codex
    model_name: openai/gpt-5.2
    n_trials: 3
    timeout_minutes: 120        # optional: per-agent timeout

# Task source (pick one):
path: ./my-task                 # local task or dataset directory
# dataset: swebench@1.0         # registry dataset (use instead of path)

# Optional filtering:
task_names: ["task-*"]
exclude_task_names: ["*-slow"]
n_tasks: 10

# Optional fields:
environment: daytona
priority: low
experiment_id: exp_123
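
To make the shape of a sweep concrete, the sketch below expands an equivalent config (written as a Python dict so that no YAML parser is needed) into one entry per trial and checks the "pick one" rule for the task source. The helper names, the default of one trial when n_trials is omitted, and the strict validation are assumptions, not documented behavior.

```python
from typing import Any, Dict, List, Tuple

sweep: Dict[str, Any] = {
    "agents": [
        {"name": "claude-code", "model_name": "anthropic/claude-sonnet-4-5", "n_trials": 3},
        {"name": "codex", "model_name": "openai/gpt-5.2", "n_trials": 3, "timeout_minutes": 120},
    ],
    "path": "./my-task",
}


def task_source(config: Dict[str, Any]) -> Tuple[str, str]:
    """Return ('path' | 'dataset', value); require exactly one of the two."""
    present = [key for key in ("path", "dataset") if config.get(key)]
    if len(present) != 1:
        raise ValueError("set exactly one of 'path' or 'dataset'")
    return present[0], config[present[0]]


def expand_trials(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """One entry per (agent, trial index) for a single task."""
    trials = []
    for agent in config["agents"]:
        for index in range(agent.get("n_trials", 1)):  # assumed default: 1
            trials.append(
                {"agent": agent["name"], "model": agent.get("model_name"), "index": index}
            )
    return trials


if __name__ == "__main__":
    print(task_source(sweep))
    print(len(expand_trials(sweep)), "trials per task")
```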

Harbor Execution Config

All Harbor execution settings (environment resources, verifier config, artifacts) are passed via a nested harbor field in the API. This directly uses Harbor's native EnvironmentConfig, VerifierConfig, and ArtifactConfig types:

{
  "task_id": "abc123",
  "configs": [{"agent": "claude-code", "model": "claude-sonnet-4-5", "n_trials": 3}],
  "user": "alice",
  "harbor": {
    "environment": {
      "override_cpus": 4,
      "override_memory_mb": 8192,
      "override_gpus": 1,
      "kwargs": {
        "network_block_all": false,
        "sandbox_timeout_secs": 86400
      }
    },
    "verifier": {
      "disable": true
    },
    "artifacts": ["/workspace/output.txt"],
    "docker_image": "my-registry/my-image:latest"
  }
}

Per-trial agent overrides (env vars, kwargs, timeouts) use agent_config:

{
  "configs": [
    {
      "agent": "claude-code",
      "model": "claude-sonnet-4-5",
      "n_trials": 3,
      "agent_config": {
        "env": {"CUSTOM_VAR": "value"},
        "kwargs": {"max_thinking_tokens": 8000},
        "override_timeout_sec": 7200
      }
    }
  ]
}
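
The two payload fragments above can be combined in one request body. The sketch below builds such a body in Python using only field names that appear in the examples; `build_submission` is a hypothetical helper, and the snippet deliberately stops short of sending the request because the source does not say which endpoint accepts this shape.

```python
import json
from typing import Any, Dict, List, Optional


def build_submission(
    task_id: str,
    configs: List[Dict[str, Any]],
    harbor: Optional[Dict[str, Any]] = None,
    user: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body; optional sections are omitted when empty."""
    payload: Dict[str, Any] = {"task_id": task_id, "configs": configs}
    if user:
        payload["user"] = user
    if harbor:
        payload["harbor"] = harbor
    return payload


payload = build_submission(
    task_id="abc123",
    user="alice",
    configs=[
        {
            "agent": "claude-code",
            "model": "claude-sonnet-4-5",
            "n_trials": 3,
            # Per-trial agent overrides
            "agent_config": {"env": {"CUSTOM_VAR": "value"}, "override_timeout_sec": 7200},
        }
    ],
    # Task-wide Harbor execution settings
    harbor={"environment": {"override_cpus": 4, "override_memory_mb": 8192}},
)
print(json.dumps(payload, indent=2))
```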

GitHub Actions Integration

The CLI supports JSON output for CI pipelines:

oddish run ./tasks/* -a codex --n-trials 1 --json

# Output:
# {
#   "experiment": "random-words-123",
#   "experiment_url": "...",
#   "total_trials": 3,
#   "tasks": [
#     {"id": "task-abc123", "trials_count": 1, "url": "..."},
#     ...
#   ]
# }

Environment variables for CI:

  • ODDISH_API_URL: API endpoint (your self-hosted URL, or the hosted API)
  • ODDISH_API_KEY: API token (required for the hosted API)
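
A CI step might consume the --json output along these lines. The sample string mirrors the output shown above, with the elided URLs left as "..." and only the one listed task included; `summarize` is a hypothetical helper.

```python
import json
from typing import List

sample = """{
  "experiment": "random-words-123",
  "experiment_url": "...",
  "total_trials": 3,
  "tasks": [
    {"id": "task-abc123", "trials_count": 1, "url": "..."}
  ]
}"""


def summarize(raw: str) -> List[str]:
    """Turn the CLI's JSON output into log-friendly lines."""
    data = json.loads(raw)
    lines = [f"experiment {data['experiment']}: {data['total_trials']} trials"]
    for task in data.get("tasks", []):
        lines.append(f"  {task['id']}: {task['trials_count']} trial(s) -> {task['url']}")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(sample)))
```

In a workflow, the raw string would come from the captured stdout of oddish run ... --json rather than from a literal.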

Repository Structure

oddish/
├── src/oddish/
│   ├── __init__.py          # Public API (lazy-loaded exports)
│   ├── config.py            # Settings, provider mapping
│   ├── schemas.py           # Pydantic request/response models (HarborConfig, TaskSubmission, etc.)
│   ├── queue.py             # Task creation, queue orchestration
│   ├── experiment.py        # Experiment name generation
│   ├── infra.py             # Docker/infrastructure helpers
│   ├── api/
│   │   ├── __init__.py      # FastAPI app + endpoint wiring
│   │   ├── endpoints.py     # Core endpoint logic
│   │   ├── helpers.py       # Response builders
│   │   ├── tasks.py         # Task upload handling
│   │   └── trial_io.py      # Trial logs/result reading
│   ├── cli/
│   │   ├── __init__.py      # Typer app entry point
│   │   ├── run.py           # Run command (task submission)
│   │   ├── status.py        # Status command (monitoring)
│   │   ├── clean.py         # Clean command (deletion)
│   │   ├── api.py           # HTTP client helpers
│   │   ├── config.py        # API URL/auth resolution
│   │   └── infra.py         # Local infrastructure helpers
│   ├── db/
│   │   ├── __init__.py      # DB exports
│   │   ├── __main__.py      # CLI: python -m oddish.db
│   │   ├── models.py        # SQLAlchemy models (Experiment, Task, Trial)
│   │   ├── connection.py    # Engine, session factory, pool management
│   │   └── storage.py       # S3/local storage client
│   └── workers/
│       ├── harbor_runner.py # Harbor task executor + artifact upload
│       └── queue/
│           ├── queue_manager.py    # PGQueuer setup + entrypoints
│           ├── worker.py           # Standalone worker entry point
│           ├── trial_handler.py    # Trial execution handler
│           ├── analysis_handler.py # Post-trial analysis handler
│           ├── verdict_handler.py  # Task-level verdict handler
│           ├── db_helpers.py       # Worker DB utilities
│           └── shared.py           # Shared worker utilities
│
├── alembic/                 # Database migrations
├── alembic.ini              # Alembic configuration
├── docker-compose.yml       # Local dev orchestration
├── env.example              # Example .env file
├── pyproject.toml           # Package config and dependencies
└── README.md

Using as a Library

Oddish can be imported as a library in your own services:

# Database models and sessions
from oddish.db import TaskModel, TrialModel, get_session, init_db

# Queue operations
from oddish.queue import create_task, get_task_with_trials, get_queue_stats

# Worker logic
from oddish.workers.queue import create_queue_manager

# Configuration
from oddish.config import settings

# Schemas
from oddish.schemas import TaskSubmission, TaskSweepSubmission, TrialSpec, HarborConfig

Database Migrations

Oddish uses Alembic for schema management. The version table is alembic_version_oddish (to avoid conflicts if you run your own Alembic migrations in the same database).

PGQueuer tables are managed separately via oddish.db install-pgqueuer.

# Run all migrations
uv run alembic upgrade head

# Check current version
uv run alembic current

# Full setup (migrations + PGQueuer)
uv run python -m oddish.db setup

Troubleshooting

Port conflicts

Service    Port
Postgres   5432
API        8000

If ports are in use, stop the conflicting process or change the port.

Database connection errors

# Verify Postgres is running
docker compose ps

# Test connection (psql rejects the SQLAlchemy "+asyncpg" driver suffix, so strip it; bash syntax)
psql "${DATABASE_URL/+asyncpg/}" -c "SELECT 1"

# Check migrations
uv run alembic current
uv run alembic upgrade head

Tasks stuck in "queued"

  1. Check that workers are running and the API is healthy:

    curl http://localhost:8000/health
    oddish status

  2. Check queue-key concurrency limits (set at API startup) and worker logs.
  3. Check for errors in the API logs.

Harbor execution failures

  1. Verify that the sandbox environment is available (Docker running, Daytona key set, etc.).
  2. Check that the LLM API key is set for the provider.
  3. Check the trial error message:

    curl http://localhost:8000/tasks/<task_id> | jq '.trials[].error_message'
