Postgres-backed eval scheduler for Harbor agent tasks — queuing, retries, and monitoring

Oddish Core Library

This README is focused on implementation details for the oddish package.

Deep Technical Documentation

This covers architecture, configuration, and operational details.

Architecture

┌─────────────────────────────────────────────────────────────┐
│  CLI (oddish run/status) or API Client                      │
│  - Uses env vars for API URL + auth                         │
│  - Submits tasks via HTTP                                   │
│  - Watches tasks or experiments via CLI status              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  FastAPI Server (python -m oddish.api)                      │
│  - POST /tasks/upload, /tasks/sweep                         │
│  - GET /tasks, /tasks/{id}, /trials/{id}/logs               │
│  - Auto-starts workers by default                           │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  Postgres                                                   │
│  - experiments table (grouping + sharing metadata)          │
│  - tasks table (task metadata + verdict)                    │
│  - trials table (runs + analysis)                           │
│  - pgqueuer tables (job queue)                              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  PGQueuer Workers                                           │
│  - Poll queue via SELECT FOR UPDATE SKIP LOCKED             │
│  - Queue-key concurrency limits                             │
│  - Execute trials, analyses, and verdict jobs               │
└─────────────────────────────────────────────────────────────┘

Local Development

Quick start (recommended)

cp env.example .env
docker compose up -d db
uv sync
uv run python -m oddish.db setup
uv run python -m oddish.api

This starts Postgres, applies the Alembic migrations, installs the PGQueuer tables, and runs the API with auto-started workers. It is a good baseline for local development.

CLI configuration

The CLI can talk to either:

Local API: http://localhost:8000 (self-hosted)
Hosted API (optional): https://abundant-ai--api.modal.run

For local use, point the CLI at your server:

export ODDISH_API_URL="http://localhost:8000"

The local API does not enforce auth by default. For the hosted API, set an API key:

export ODDISH_API_URL="https://abundant-ai--api.modal.run"
export ODDISH_API_KEY="ok_..."

CLI config precedence

The CLI resolves API settings in this order:

1. ODDISH_API_URL / ODDISH_API_KEY / ODDISH_DASHBOARD_URL
2. ODDISH_DEFAULT_API_URL / ODDISH_DEFAULT_DASHBOARD_URL
3. Built-in defaults:
   - API: https://abundant-ai--api.modal.run
   - Dashboard: https://www.oddish.app
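
The resolution order above can be sketched in a few lines of Python. This is an illustration only, not the code in oddish.cli.config: the function names are hypothetical, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Built-in defaults as listed in this README.
BUILTIN_API_URL = "https://abundant-ai--api.modal.run"
BUILTIN_DASHBOARD_URL = "https://www.oddish.app"


def resolve_setting(env: Mapping[str, str], explicit: str, default_var: str, builtin: str) -> str:
    """Return the first non-empty value: explicit var, default var, built-in."""
    # Assumption: an empty string counts as "not set".
    return env.get(explicit) or env.get(default_var) or builtin


def resolve_cli_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper mirroring the three-tier precedence."""
    env = os.environ if env is None else env
    return {
        "api_url": resolve_setting(
            env, "ODDISH_API_URL", "ODDISH_DEFAULT_API_URL", BUILTIN_API_URL
        ),
        "dashboard_url": resolve_setting(
            env, "ODDISH_DASHBOARD_URL", "ODDISH_DEFAULT_DASHBOARD_URL", BUILTIN_DASHBOARD_URL
        ),
        # Only the first tier lists an API key, so there is no fallback here.
        "api_key": env.get("ODDISH_API_KEY"),
    }
```

Passing a plain dict instead of os.environ makes the precedence easy to check in isolation.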

Database setup commands

uv run python -m oddish.db setup            # Full setup (Alembic + PGQueuer)
uv run python -m oddish.db init             # Run Alembic migrations only
uv run python -m oddish.db install-pgqueuer # Install PGQueuer tables only
uv run python -m oddish.db reset            # Drop and recreate all tables
uv run python -m oddish.db purge            # Delete all data (preserves schema)

API flags

# Set queue-key concurrency
uv run python -m oddish.api --n-concurrent '{"openai/gpt-5.2": 8, "anthropic/claude-sonnet-4-5": 8}'

# Custom host/port
uv run python -m oddish.api --host 0.0.0.0 --port 9000

API Endpoints

Method	Endpoint	Description
GET	`/health`	Health check
GET	`/docs`	Swagger UI
GET	`/tasks`	List tasks
GET	`/tasks/{task_id}`	Task details with trials
DELETE	`/tasks/{task_id}`	Delete a task and its trials
DELETE	`/experiments/{experiment_id}`	Delete an experiment and its tasks
PATCH	`/experiments/{experiment_id}`	Update experiment name
POST	`/tasks/upload`	Upload task tarball
POST	`/tasks/sweep`	Create evaluation sweep
GET	`/tasks/{task_id}/trials/{index}`	Fetch trial by index
GET	`/trials/{trial_id}/logs`	Fetch trial logs
GET	`/trials/{trial_id}/result`	Fetch `result.json`

Docker Compose

The docker-compose.yml orchestrates local development:

# Database only (for local Python dev)
docker compose up -d db

# Full stack (containerized)
docker compose up -d db api worker

# One-time database initialization
docker compose run --rm db-init

Services:

Service	Description
`db`	Postgres 16
`api`	FastAPI server (`python -m oddish.api`)
`worker`	Standalone worker (`python -m oddish.workers.queue.worker`)
`db-init`	One-time setup: runs Alembic migrations + PGQueuer install

Configuration

Oddish loads environment variables from .env by default (via Pydantic Settings with the ODDISH_ prefix).

Database URL

Both variable names are supported:

DATABASE_URL=postgresql+asyncpg://oddish:oddish@localhost:5432/oddish
ODDISH_DATABASE_URL=postgresql+asyncpg://...  # Alternative

DATABASE_URL takes precedence over ODDISH_DATABASE_URL.

Storage

Local (default):

Tasks: /tmp/oddish-tasks
Harbor artifacts: /tmp/harbor-jobs

S3/R2 (production):

ODDISH_S3_ENABLED=true
ODDISH_S3_BUCKET=data
ODDISH_S3_ACCESS_KEY=...
ODDISH_S3_SECRET_KEY=...
ODDISH_S3_ENDPOINT_URL=https://...

Task uploads land under tasks/<task_id>/. Trial artifacts are uploaded under tasks/<task_id>/trials/<trial_id>/ when possible, with a fallback of trials/<trial_id>/ for legacy IDs.
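
As a rough sketch, the key layout can be expressed as two small helpers. The function names are hypothetical, and reading "when possible" as "when the task ID is known" is an assumption about the fallback condition.

```python
from typing import Optional


def task_prefix(task_id: str) -> str:
    """Prefix under which a task upload is stored."""
    return f"tasks/{task_id}/"


def trial_prefix(trial_id: str, task_id: Optional[str] = None) -> str:
    """Prefix for trial artifacts.

    Assumption: the nested layout is used whenever the task ID is known;
    otherwise the legacy top-level trials/ layout applies.
    """
    if task_id:
        return f"tasks/{task_id}/trials/{trial_id}/"
    return f"trials/{trial_id}/"
```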

Execution Environments

Oddish runs Harbor tasks in a sandboxed environment.

CLI behavior when --env is omitted:

Local API URL (localhost) defaults to docker
Hosted Modal API URL (*.modal.run) defaults to modal
Other remote API URLs default to docker

You can always override per task with: oddish run --env {docker|daytona|e2b|modal|runloop|gke}.
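
The default selection can be sketched as a host check on the API URL. This is an illustrative approximation, not the CLI's code: the function name is hypothetical, and treating 127.0.0.1 and ::1 as local alongside localhost is an assumption.

```python
from urllib.parse import urlparse


def default_env(api_url: str) -> str:
    """Pick a default --env value from the API URL (hypothetical helper)."""
    host = (urlparse(api_url).hostname or "").lower()
    # Assumption: loopback addresses count as "local" in addition to localhost.
    if host in {"localhost", "127.0.0.1", "::1"}:
        return "docker"
    if host.endswith(".modal.run"):
        return "modal"
    # Any other remote API URL defaults to docker.
    return "docker"
```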

Queue-Key Routing

Oddish routes jobs to PGQueuer entrypoints by queue key, a normalized model string. Agent names still map to provider buckets for compatibility and attribution, but queueing uses get_queue_key_for_trial(agent, model), which falls back to the agent only when no model is provided.
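
A minimal sketch of the fallback rule follows. It is not the real get_queue_key_for_trial: the function name is hypothetical, and the normalization shown (trim whitespace, lowercase) is an assumption, since this README does not spell out how model strings are normalized.

```python
from typing import Optional


def queue_key_sketch(agent: str, model: Optional[str] = None) -> str:
    """Illustrative stand-in for queue-key selection."""
    # Assumption: normalization means trimming whitespace and lowercasing.
    if model and model.strip():
        return model.strip().lower()
    # Fall back to the agent only when no model is provided.
    return agent.strip().lower()
```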

Concurrency Control

Queue-key concurrency is fixed at API startup (not per job).

Order of precedence:

Manual API startup: python -m oddish.api --n-concurrent '{"openai/gpt-5.2": 8}'
Default: ODDISH_DEFAULT_MODEL_CONCURRENCY (with optional model overrides)

For self-hosted setups, set concurrency on API startup:

uv run python -m oddish.api --n-concurrent '{"openai/gpt-5.2": 8, "anthropic/claude-sonnet-4-5": 8}'

Changing concurrency requires restarting the API process.
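
To make the precedence concrete, here is a hedged sketch of parsing the --n-concurrent JSON and looking up a limit. The helper names are hypothetical, and falling back to the default for keys not listed in the flag is an assumption about how the two sources combine.

```python
import json
from typing import Dict


def parse_n_concurrent(raw: str) -> Dict[str, int]:
    """Parse a JSON object mapping queue keys to positive integer limits."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("--n-concurrent must be a JSON object")
    limits: Dict[str, int] = {}
    for key, value in data.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"limit for {key!r} must be a positive integer")
        limits[key] = value
    return limits


def limit_for(queue_key: str, limits: Dict[str, int], default: int) -> int:
    """Explicit startup limits win; otherwise use the default concurrency."""
    return limits.get(queue_key, default)
```

Here default stands in for the value of ODDISH_DEFAULT_MODEL_CONCURRENCY.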

LLM API Keys

Only set keys for providers you use:

ANTHROPIC_API_KEY=sk-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

Sandbox Provider Keys

Set keys for the sandbox environments you use:

DAYTONA_API_KEY=...
MODAL_TOKEN_ID=...
MODAL_TOKEN_SECRET=...

Execution Pipeline

Tasks move through a multi-stage pipeline when run_analysis is enabled:

Trials run for each agent/model pair (status: pending → queued → running → success/failed).
Analyses run per trial after completion to classify outcomes.
Verdict runs once per task to summarize analyses.

Each stage is a PGQueuer job routed through queue-key entrypoints. Task status progresses from pending → running → analyzing → verdict_pending → completed (or failed on terminal error).
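
The task status progression can be modeled as a small transition table. This is a sketch for reasoning about the pipeline, not the library's implementation: the names are hypothetical, and allowing failed from every non-terminal state is an assumption based on "failed on terminal error".

```python
from typing import Dict, Set

# Allowed task status transitions with run_analysis enabled.
TASK_TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"running", "failed"},
    "running": {"analyzing", "failed"},
    "analyzing": {"verdict_pending", "failed"},
    "verdict_pending": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from current to new follows the pipeline order."""
    return new in TASK_TRANSITIONS.get(current, set())


def advance(current: str, new: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if not can_transition(current, new):
        raise ValueError(f"invalid task transition: {current} -> {new}")
    return new
```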

Experiments

Every task belongs to an experiment. If no experiment is provided, Oddish generates a short, human-friendly name (oddish.experiment.generate_experiment_name). The CLI can watch an experiment with oddish status --experiment <id>.

Workers

Auto-start behavior

By default, python -m oddish.api spawns workers in background threads.

For separate worker processes (for scaling), run:

uv run python -m oddish.workers.queue.worker

In Docker Compose, this maps to the dedicated worker service.

How PGQueuer works

Workers claim jobs atomically via Postgres:

SELECT * FROM pgqueuer
WHERE status = 'queued' AND entrypoint = 'openai/gpt-5.2'
ORDER BY priority DESC
LIMIT 1
FOR UPDATE SKIP LOCKED;

UPDATE pgqueuer SET status = 'processing' WHERE id = ?;

FOR UPDATE: Locks the row
SKIP LOCKED: Other workers skip locked rows
Result: Each job claimed by exactly one worker

Concurrency enforcement

PGQueuer checks processing count before claiming:

SELECT COUNT(*) FROM pgqueuer
WHERE entrypoint = 'openai/gpt-5.2' AND status = 'processing';

If the count has reached the limit (e.g., 8), the worker waits. Concurrency is enforced through database state, not through the number of workers.
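
The claim logic described above (count the processing jobs, then take the highest-priority queued job) can be modeled in memory. This is a single-threaded sketch, so it does not reproduce the row locking that FOR UPDATE SKIP LOCKED provides. The Job class, claim_next, and the default limit of 1 for unknown entrypoints are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    id: int
    entrypoint: str
    priority: int
    status: str = "queued"


def claim_next(jobs: List[Job], entrypoint: str, limits: Dict[str, int]) -> Optional[Job]:
    """Claim the highest-priority queued job unless the limit is reached."""
    processing = sum(
        1 for j in jobs if j.entrypoint == entrypoint and j.status == "processing"
    )
    # Assumption: entrypoints without a configured limit allow one job at a time.
    if processing >= limits.get(entrypoint, 1):
        return None  # at the limit, so the worker waits
    queued = [j for j in jobs if j.entrypoint == entrypoint and j.status == "queued"]
    if not queued:
        return None
    job = max(queued, key=lambda j: j.priority)  # mirrors ORDER BY priority DESC
    job.status = "processing"
    return job
```

Because the limit is checked against job state rather than worker count, adding more workers to this model does not raise the number of jobs processed at once.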

Job routing

Queue entrypoints are created per queue key (typically model identifiers). Each entrypoint handles jobs with job_type of trial, analysis, or verdict.

CLI Reference

# Point at local API
export ODDISH_API_URL="http://localhost:8000"

# Run a task
oddish run ./my-task -a claude-code -m claude-sonnet-4-5

# Run a sweep
oddish run -d terminalbench@2.0 -c sweep.yaml

# Optional run flags
oddish run ./my-task --run-analysis --env daytona --override-cpus 4

# Pass environment variables and kwargs to the agent
oddish run ./my-task -a claude-code -m claude-sonnet-4-5 \
  --ae AWS_REGION=us-east-1 --ak max_thinking_tokens=8000

# Force rebuild and collect artifacts
oddish run ./my-task -a claude-code --force-build --artifact /workspace/output.txt

# Monitor
oddish status
oddish status <task_id> --watch
oddish status --experiment <experiment_id> --watch

# Cleanup
oddish clean <task_id>
oddish clean --experiment <id>
oddish clean --all-experiments

Sweep Config Files

A sweep config file (YAML or JSON, passed via oddish run -c sweep.yaml) defines which agents and models to evaluate:

agents:
  - name: claude-code
    model_name: anthropic/claude-sonnet-4-5
    n_trials: 3
    env:                        # optional: agent env vars
      CUSTOM_VAR: "value"
    kwargs:                     # optional: agent kwargs
      max_thinking_tokens: 8000

  - name: codex
    model_name: openai/gpt-5.2
    n_trials: 3
    timeout_minutes: 120        # optional: per-agent timeout

# Task source (pick one):
path: ./my-task                 # local task or dataset directory
# dataset: swebench@1.0         # registry dataset (alternative to path)

# Optional filtering:
task_names: ["task-*"]
exclude_task_names: ["*-slow"]
n_tasks: 10

# Optional fields:
environment: daytona
priority: low
experiment_id: exp_123
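
A sweep config loaded into a dict (for example from the JSON form) can be sanity-checked before submission. The sketch below is not part of oddish: the helper names are hypothetical, and defaulting n_trials to 1 and using zero-based trial indices are assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple


def task_source(config: Dict[str, Any]) -> str:
    """Return 'path' or 'dataset', requiring exactly one to be set."""
    sources = [key for key in ("path", "dataset") if config.get(key)]
    if len(sources) != 1:
        raise ValueError("set exactly one of 'path' or 'dataset'")
    return sources[0]


def expand_trials(config: Dict[str, Any]) -> List[Tuple[str, Optional[str], int]]:
    """List (agent, model, trial_index) for each trial requested per task."""
    trials: List[Tuple[str, Optional[str], int]] = []
    for agent in config.get("agents", []):
        # Assumption: n_trials defaults to 1 when omitted.
        for index in range(agent.get("n_trials", 1)):
            trials.append((agent["name"], agent.get("model_name"), index))
    return trials
```

For the example config above, expand_trials yields six entries per task: three for claude-code and three for codex.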

Harbor Execution Config

All Harbor execution settings (environment resources, verifier config, artifacts) are passed via a nested harbor field in the API. This directly uses Harbor's native EnvironmentConfig, VerifierConfig, and ArtifactConfig types:

{
  "task_id": "abc123",
  "configs": [{"agent": "claude-code", "model": "claude-sonnet-4-5", "n_trials": 3}],
  "user": "alice",
  "harbor": {
    "environment": {
      "override_cpus": 4,
      "override_memory_mb": 8192,
      "override_gpus": 1,
      "kwargs": {
        "network_block_all": false,
        "sandbox_timeout_secs": 86400
      }
    },
    "verifier": {
      "disable": true
    },
    "artifacts": ["/workspace/output.txt"],
    "docker_image": "my-registry/my-image:latest"
  }
}

Per-trial agent overrides (env vars, kwargs, timeouts) use agent_config:

{
  "configs": [
    {
      "agent": "claude-code",
      "model": "claude-sonnet-4-5",
      "n_trials": 3,
      "agent_config": {
        "env": {"CUSTOM_VAR": "value"},
        "kwargs": {"max_thinking_tokens": 8000},
        "override_timeout_sec": 7200
      }
    }
  ]
}
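
When building these request bodies from Python, it can help to assemble the nested harbor field only from the options that are actually set. The following is a sketch using the field names shown above. The build_submission helper is hypothetical and covers only a subset of the fields.

```python
from typing import Any, Dict, List, Optional


def build_submission(
    task_id: str,
    configs: List[Dict[str, Any]],
    override_cpus: Optional[int] = None,
    override_memory_mb: Optional[int] = None,
    artifacts: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a request body, omitting harbor settings that are unset."""
    environment: Dict[str, Any] = {}
    if override_cpus is not None:
        environment["override_cpus"] = override_cpus
    if override_memory_mb is not None:
        environment["override_memory_mb"] = override_memory_mb

    harbor: Dict[str, Any] = {}
    if environment:
        harbor["environment"] = environment
    if artifacts:
        harbor["artifacts"] = list(artifacts)

    payload: Dict[str, Any] = {"task_id": task_id, "configs": configs}
    if harbor:
        payload["harbor"] = harbor
    return payload
```

The result is a plain dict, so it can be serialized with json.dumps and sent with any HTTP client.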

GitHub Actions Integration

The CLI supports JSON output for CI pipelines:

oddish run ./tasks/* -a codex --n-trials 1 --json

# Output:
# {
#   "experiment": "random-words-123",
#   "experiment_url": "...",
#   "total_trials": 3,
#   "tasks": [
#     {"id": "task-abc123", "trials_count": 1, "url": "..."},
#     ...
#   ]
# }

Environment variables for CI:

ODDISH_API_URL: API endpoint (your self-hosted URL, or the hosted API)
ODDISH_API_KEY: API token (required for the hosted API)
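
In a CI step, the JSON output can be parsed to collect task IDs or to check the trial count. This is a minimal sketch that assumes the output has the shape shown above. The summarize_run helper is hypothetical.

```python
import json
from typing import Any, Dict


def summarize_run(output: str) -> Dict[str, Any]:
    """Extract the experiment name, task IDs, and trial counts from --json output."""
    data = json.loads(output)
    tasks = data.get("tasks", [])
    return {
        "experiment": data["experiment"],
        "task_ids": [task["id"] for task in tasks],
        "total_trials": data["total_trials"],
        # Sum of per-task counts, useful as a consistency check.
        "trials_from_tasks": sum(task.get("trials_count", 0) for task in tasks),
    }
```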

Repository Structure

oddish/
├── src/oddish/
│   ├── __init__.py          # Public API (lazy-loaded exports)
│   ├── config.py            # Settings, provider mapping
│   ├── schemas.py           # Pydantic request/response models (HarborConfig, TaskSubmission, etc.)
│   ├── queue.py             # Task creation, queue orchestration
│   ├── experiment.py        # Experiment name generation
│   ├── infra.py             # Docker/infrastructure helpers
│   ├── api/
│   │   ├── __init__.py      # FastAPI app + endpoint wiring
│   │   ├── endpoints.py     # Core endpoint logic
│   │   ├── helpers.py       # Response builders
│   │   ├── tasks.py         # Task upload handling
│   │   └── trial_io.py      # Trial logs/result reading
│   ├── cli/
│   │   ├── __init__.py      # Typer app entry point
│   │   ├── run.py           # Run command (task submission)
│   │   ├── status.py        # Status command (monitoring)
│   │   ├── clean.py         # Clean command (deletion)
│   │   ├── api.py           # HTTP client helpers
│   │   ├── config.py        # API URL/auth resolution
│   │   └── infra.py         # Local infrastructure helpers
│   ├── db/
│   │   ├── __init__.py      # DB exports
│   │   ├── __main__.py      # CLI: python -m oddish.db
│   │   ├── models.py        # SQLAlchemy models (Experiment, Task, Trial)
│   │   ├── connection.py    # Engine, session factory, pool management
│   │   └── storage.py       # S3/local storage client
│   └── workers/
│       ├── harbor_runner.py # Harbor task executor + artifact upload
│       └── queue/
│           ├── queue_manager.py    # PGQueuer setup + entrypoints
│           ├── worker.py           # Standalone worker entry point
│           ├── trial_handler.py    # Trial execution handler
│           ├── analysis_handler.py # Post-trial analysis handler
│           ├── verdict_handler.py  # Task-level verdict handler
│           ├── db_helpers.py       # Worker DB utilities
│           └── shared.py           # Shared worker utilities
│
├── alembic/                 # Database migrations
├── alembic.ini              # Alembic configuration
├── docker-compose.yml       # Local dev orchestration
├── env.example              # Example .env file
├── pyproject.toml           # Package config and dependencies
└── README.md

Using as a Library

Oddish can be imported as a library in your own services:

# Database models and sessions
from oddish.db import TaskModel, TrialModel, get_session, init_db

# Queue operations
from oddish.queue import create_task, get_task_with_trials, get_queue_stats

# Worker logic
from oddish.workers.queue import create_queue_manager

# Configuration
from oddish.config import settings

# Schemas
from oddish.schemas import TaskSubmission, TaskSweepSubmission, TrialSpec, HarborConfig

Database Migrations

Oddish uses Alembic for schema management. The version table is alembic_version_oddish (to avoid conflicts if you run your own Alembic migrations in the same database).

PGQueuer tables are managed separately via oddish.db install-pgqueuer.

# Run all migrations
uv run alembic upgrade head

# Check current version
uv run alembic current

# Full setup (migrations + PGQueuer)
uv run python -m oddish.db setup

Troubleshooting

Port conflicts

Service	Port
Postgres	5432
API	8000

If ports are in use, stop the conflicting process or change the port.

Database connection errors

# Verify Postgres is running
docker compose ps

# Test connection (psql does not accept the +asyncpg driver suffix, so strip it)
psql "${DATABASE_URL/+asyncpg/}" -c "SELECT 1"

# Check migrations
uv run alembic current
uv run alembic upgrade head

Tasks stuck in "queued"

Check that workers are running and the API is healthy:

curl http://localhost:8000/health
oddish status

Check queue-key concurrency limits (set at API startup) and the worker logs.
Check the API logs for errors.

Harbor execution failures

Verify the sandbox environment is available (Docker running, Daytona key set, etc.)
Check that the LLM API key is set for the provider.

Check trial error message:

curl -s "http://localhost:8000/tasks/<task_id>" | jq '.trials[].error_message'
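
The same check can be done in Python once the task response has been fetched and decoded. This sketch assumes the response has a trials list whose entries carry an error_message field, as the jq filter implies. The trial_errors helper is hypothetical, and unlike the jq filter it skips trials without an error.

```python
from typing import Any, Dict, List


def trial_errors(task: Dict[str, Any]) -> List[str]:
    """Return the non-empty error messages from a task's trials."""
    return [
        trial["error_message"]
        for trial in task.get("trials", [])
        if trial.get("error_message")
    ]
```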

