YAML-driven script orchestrator and scheduler with monitor telemetry helper

Gulag Chief Guide

This guide is the complete, user-focused reference for running gulag-chief and configuring chief.yaml.

1. What Chief Does

gulag-chief is a YAML-driven orchestrator for Python worker scripts.

It provides:

  • ordered script execution per job
  • a human-readable scheduling DSL
  • configuration validation with explicit errors
  • schedule preview with next run times
  • one-shot runs and daemon scheduling
  • cron export for cron-compatible schedules
  • optional monitor telemetry and worker instrumentation

2. Prerequisites

  • Python 3.9+
  • pip
  • a local chief.yaml

Install the package locally from the Chief source directory (the author's checkout is /Users/joshbeaver/Documents/Summit/gulag/chief; substitute your own path):

python -m pip install .

For development and tests:

python -m pip install -e ".[dev]"

3. Quick Start (5 Minutes)

Run from your Chief project directory.

  1. Validate config
gulag-chief validate --config chief.yaml
  2. Preview schedules
gulag-chief preview --config chief.yaml
  3. Run all enabled jobs once
gulag-chief run --config chief.yaml
  4. Run one job
gulag-chief run --config chief.yaml --job sample-etl-pipeline
  5. Run daemon scheduler
gulag-chief daemon --config chief.yaml --poll-seconds 10
  6. Export cron-compatible schedules
gulag-chief export-cron --config chief.yaml

4. CLI Commands and Arguments

Chief supports a global config flag plus command-specific flags.

Global Flag

  • --config PATH: path to config file (default chief.yaml)

You can place it before or after the subcommand:

gulag-chief --config chief.yaml validate
gulag-chief validate --config chief.yaml

validate

Purpose: validate YAML structure, script paths, schedule rules, and compilation mode.

gulag-chief validate [--config PATH]

preview

Purpose: print schedule description, compilation mode, next runs, and cron equivalent when available.

gulag-chief preview [--config PATH] [--job NAME] [--count N]

Flags:

  • --job NAME: preview one job
  • --count N: number of future runs (default 5, must be >= 1)

Examples:

gulag-chief preview
gulag-chief preview --job sample-etl-pipeline
gulag-chief preview --job sample-etl-pipeline --count 10

run

Purpose: execute selected jobs immediately.

gulag-chief run [--config PATH] [--job NAME] [--respect-schedule]

Behavior:

  • default: run all enabled jobs once in YAML order
  • --job NAME: run one enabled job
  • --respect-schedule: run only if the job is due now

Examples:

gulag-chief run
gulag-chief run --job sample-etl-pipeline
gulag-chief run --job sample-etl-pipeline --respect-schedule

daemon

Purpose: continuous scheduler loop.

gulag-chief daemon [--config PATH] [--poll-seconds N]

Flags:

  • --poll-seconds N: polling interval in seconds (default 10, must be >= 1)

export-cron

Purpose: print cron lines for cron-compatible schedules.

gulag-chief export-cron [--config PATH] [--job NAME]

Common CLI behavior:

  • unknown --job name returns an explicit error
  • run and daemon operate only on enabled jobs
  • preview can still inspect disabled jobs
  • run --respect-schedule is intended for cron-invoked runs with runtime guards

5. chief.yaml Configuration

Top-level keys:

  • version
  • defaults
  • monitor
  • jobs

Full shape example:

version: 1

defaults:
  working_dir: .
  stop_on_failure: true
  overlap: skip
  timezone: UTC


jobs:
  - name: sample-etl-pipeline
    enabled: true
    working_dir: .
    stop_on_failure: true
    overlap: skip
    schedule:
      frequency: daily
      time: "06:00"
      timezone: America/New_York
    scripts:
      - path: workers/sample/extract_demo.py
        args: ["--source", "orders"]
        timeout: 120
      - path: workers/sample/transform_demo.py
      - path: workers/sample/load_demo.py

defaults Keys

  • working_dir: default cwd for scripts
  • stop_on_failure: default per-job behavior
  • overlap: default overlap policy (skip | queue | parallel)
  • timezone: default schedule timezone

Job Keys

  • name (required, unique)
  • enabled (default true)
  • working_dir (inherits from defaults)
  • stop_on_failure (inherits from defaults)
  • overlap (inherits from defaults)
  • schedule (required)
  • scripts (required non-empty list)
  • monitor (optional job-level override)

Script Keys

  • path (required)
  • args (optional)
  • timeout (optional, seconds)

args supports both list and shell-style string forms.

List form:

scripts:
  - path: workers/sample/load_demo.py
    args:
      - --input
      - workers/sample/state/transformed_orders.json
      - --table
      - fact_orders

Shell-style string form:

scripts:
  - path: workers/sample/transform_demo.py
    args: --input in.json --output out.json --label "nightly run"
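
To make the two args forms concrete, here is a minimal Python sketch of how a shell-style string could be normalized into the same list that the list form produces. The helper name normalize_args is hypothetical, and using shlex for the splitting is an assumption; the guide does not say how Chief tokenizes the string form.

```python
import shlex
from typing import List, Optional, Union


def normalize_args(args: Optional[Union[str, List[object]]]) -> List[str]:
    """Return script args as a list of strings (hypothetical helper)."""
    if args is None:
        return []
    if isinstance(args, str):
        # Shell-style form: split on whitespace while honoring quotes.
        return shlex.split(args)
    # List form: YAML may parse bare values as numbers, so stringify each item.
    return [str(item) for item in args]


if __name__ == "__main__":
    as_list = normalize_args(
        ["--input", "in.json", "--output", "out.json", "--label", "nightly run"]
    )
    as_string = normalize_args(
        '--input in.json --output out.json --label "nightly run"'
    )
    print(as_list == as_string)
```

Under this assumption, a quoted value such as "nightly run" stays a single argument in both forms.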

Path behavior:

  • relative script paths resolve from job working_dir
  • script files must exist at validation/load time
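
The path rules above can be pictured with a short Python sketch. The function resolve_script_path is a hypothetical illustration, not Chief's code; it only shows one way to resolve a relative path against working_dir and fail early when the file is missing.

```python
from pathlib import Path


def resolve_script_path(working_dir: str, script_path: str) -> Path:
    """Resolve a script path against a job's working_dir (hypothetical helper)."""
    candidate = Path(script_path)
    if not candidate.is_absolute():
        # Relative paths are interpreted from the job's working directory.
        candidate = Path(working_dir) / candidate
    resolved = candidate.resolve()
    if not resolved.is_file():
        # Mirrors the rule that script files must exist at validation time.
        raise FileNotFoundError(f"script not found: {resolved}")
    return resolved
```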

6. Scheduling DSL (Friendly but Strict)

Every job must define exactly one schedule mode:

  • daily
  • weekly
  • monthly
  • yearly
  • interval
  • custom

Global Schedule Modifiers

Allowed on all frequencies:

  • timezone
  • start (ISO datetime)
  • end (ISO datetime)
  • exclude (list of YYYY-MM-DD)

Behavior:

  • if start/end are naive datetimes (no UTC offset), Chief interprets them in the schedule timezone
  • Chief will not run outside [start, end]
  • exclude dates are matched against the local date in the schedule timezone
  • named holiday shortcuts are not supported in v1
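
The modifier rules can be sketched in Python as a single check. This is an illustration under stated assumptions, not Chief's implementation: the function name within_window is hypothetical, both bounds are treated as inclusive, and a fixed-offset tzinfo stands in for a real IANA zone (a zoneinfo.ZoneInfo object could be passed the same way).

```python
from datetime import date, datetime, timedelta, timezone, tzinfo
from typing import Iterable, Optional


def within_window(
    moment: datetime,
    tz: tzinfo,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    exclude: Iterable[date] = (),
) -> bool:
    """Return True if an aware `moment` passes start/end/exclude (hypothetical)."""
    local = moment.astimezone(tz)

    def localize(bound: datetime) -> datetime:
        # Naive bounds are interpreted in the schedule timezone.
        return bound.replace(tzinfo=tz) if bound.tzinfo is None else bound

    if start is not None and local < localize(start):
        return False
    if end is not None and local > localize(end):
        return False
    # Exclusions compare against the local calendar date, not the UTC date.
    return local.date() not in set(exclude)


if __name__ == "__main__":
    eastern_standard = timezone(timedelta(hours=-5))  # stand-in for a named zone
    run_at = datetime(2026, 12, 26, 2, 0, tzinfo=timezone.utc)
    print(within_window(run_at, eastern_standard, exclude=[date(2026, 12, 25)]))
```

In the demo, 02:00 UTC on December 26 is still December 25 at a UTC-5 offset, so the exclusion applies even though the UTC date differs.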

daily

Required:

  • time (HH:MM, 24-hour)

Optional:

  • weekdays_only (true|false)

Example:

schedule:
  frequency: daily
  time: "14:30"
  weekdays_only: true

weekly

Required:

  • day
  • time

day supports:

  • single (monday)
  • comma list (monday,wednesday,friday)
  • named range (monday-friday)
  • YAML list ([monday, wednesday, friday])

Example:

schedule:
  frequency: weekly
  day: monday-friday
  time: "09:00"

monthly

Choose one style.

Style A: day of month.

schedule:
  frequency: monthly
  day_of_month: 15
  time: "08:00"

Style B: ordinal weekday.

schedule:
  frequency: monthly
  ordinal: last
  day: friday
  time: "18:00"

Valid ordinals:

  • first
  • second
  • third
  • fourth
  • last

Rule:

  • day_of_month and ordinal/day cannot be mixed
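
For Style B, the date that an ordinal weekday rule selects in a given month can be computed as in the following Python sketch. The helper ordinal_weekday is hypothetical and only illustrates the calendar arithmetic; it is not taken from Chief.

```python
import calendar
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "last": -1}


def ordinal_weekday(year: int, month: int, ordinal: str, day: str) -> date:
    """Return the date of e.g. the last friday of a month (hypothetical helper)."""
    weekday = WEEKDAYS.index(day.lower())
    days_in_month = calendar.monthrange(year, month)[1]
    # Collect every date in the month that falls on the requested weekday.
    matches = [
        date(year, month, d)
        for d in range(1, days_in_month + 1)
        if date(year, month, d).weekday() == weekday
    ]
    # Every month contains at least four of each weekday, so index 3 is safe.
    return matches[ORDINALS[ordinal]]


if __name__ == "__main__":
    print(ordinal_weekday(2026, 12, "last", "friday"))
```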

yearly

Required:

  • month
  • day_of_month
  • time

Example:

schedule:
  frequency: yearly
  month: january
  day_of_month: 1
  time: "00:00"

interval

Required:

  • every in <number><unit> form

Supported units:

  • m (minutes)
  • h (hours)
  • d (days)

Examples:

schedule:
  frequency: interval
  every: 5m

schedule:
  frequency: interval
  every: 2h

Rules:

  • time is forbidden in interval mode
  • seconds intervals (s) are intentionally unsupported in v1
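
As an illustration of the <number><unit> form, the Python sketch below parses an every value into a timedelta and rejects unsupported units such as s. The function parse_every and its error wording are hypothetical; Chief's own parser and messages may differ.

```python
import re
from datetime import timedelta

_EVERY = re.compile(r"(\d+)([mhd])")
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_every(value: str) -> timedelta:
    """Parse '5m', '2h' or '1d' into a timedelta (hypothetical helper)."""
    match = _EVERY.fullmatch(value.strip())
    if match is None:
        # Covers seconds ('30s'), missing units, and malformed input.
        raise ValueError(f"invalid interval {value!r}: expected <number><m|h|d>")
    amount = int(match.group(1))
    if amount < 1:
        raise ValueError(f"invalid interval {value!r}: number must be >= 1")
    return timedelta(**{_UNITS[match.group(2)]: amount})


if __name__ == "__main__":
    print(parse_every("90m"))
```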

custom

Labeled cron-like fields:

  • minute
  • hour
  • day_of_month
  • month
  • day_of_week

At least one field is required.

Example:

schedule:
  frequency: custom
  minute: 0
  hour: 9
  day_of_week: monday-friday

7. Compilation Modes and Runtime Semantics

Chief compiles schedules into one of three kinds:

  • pure_cron: fully representable by cron
  • hybrid: cron trigger + runtime guard
  • runtime_only: runtime scheduler logic required

Examples:

  • weekly friday 17:30 -> pure cron (30 17 * * 5)
  • monthly ordinal + day -> hybrid
  • interval every: 90m -> runtime only
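
The first and third examples can be illustrated with a small Python sketch. Both functions are hypothetical: weekly_to_cron shows how a single-day weekly schedule maps onto the five cron fields, and minutes_fit_cron_step shows one plausible reason a 90-minute interval cannot be a plain cron step. Chief's actual classification rules are not documented here, so treat this as an assumption.

```python
CRON_DOW = {
    "sunday": 0, "monday": 1, "tuesday": 2, "wednesday": 3,
    "thursday": 4, "friday": 5, "saturday": 6,
}


def weekly_to_cron(day: str, time: str) -> str:
    """Build 'minute hour * * dow' for a single-day weekly schedule."""
    hour, minute = time.split(":")
    return f"{int(minute)} {int(hour)} * * {CRON_DOW[day.lower()]}"


def minutes_fit_cron_step(minutes: int) -> bool:
    """True if 'every N minutes' is evenly expressible as a */N minute step."""
    # A */N minute step restarts each hour, so N must divide 60 evenly.
    return 1 <= minutes < 60 and 60 % minutes == 0


if __name__ == "__main__":
    print(weekly_to_cron("friday", "17:30"))  # 30 17 * * 5
    print(minutes_fit_cron_step(90))          # False
```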

Job Execution Semantics

  • scripts run sequentially in each job
  • args are passed as parsed from YAML
  • each script can have its own timeout
  • if a script fails and stop_on_failure: true, remaining scripts are skipped
  • if stop_on_failure: false, job continues to remaining scripts
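
A minimal Python sketch of these semantics, assuming each script is launched as a subprocess and that a non-zero exit code or a timeout counts as failure. The function run_job and its return shape are hypothetical and are not Chief's API.

```python
import subprocess
import sys
from typing import List, Optional, Sequence, Tuple

Command = Tuple[Sequence[str], Optional[float]]  # (argv, timeout in seconds)


def run_job(commands: List[Command], stop_on_failure: bool = True) -> List[bool]:
    """Run commands in order and return one success flag per command started."""
    results: List[bool] = []
    for argv, timeout in commands:
        try:
            completed = subprocess.run(list(argv), timeout=timeout)
            ok = completed.returncode == 0
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising on timeout.
            ok = False
        results.append(ok)
        if not ok and stop_on_failure:
            break  # remaining scripts are skipped
    return results


if __name__ == "__main__":
    failing = ([sys.executable, "-c", "import sys; sys.exit(1)"], 10)
    passing = ([sys.executable, "-c", "pass"], 10)
    print(run_job([failing, passing], stop_on_failure=True))
    print(run_job([failing, passing], stop_on_failure=False))
```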

Daemon Semantics

  • no startup catch-up (cron-like)
  • deterministic ordering by YAML job order
  • overlap behavior per job:
    • skip: drop trigger while running
    • queue: keep one pending trigger while running
    • parallel: allow same-job concurrent runs; global scheduling stays deterministic
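
The three overlap policies can be read as a small decision function. The Python sketch below is a hypothetical model of that decision, not Chief's scheduler code; the action names "start", "drop", and "enqueue" are made up for illustration.

```python
from typing import Tuple


def on_trigger(policy: str, running: bool, pending: bool) -> Tuple[str, bool]:
    """Decide what to do with a new trigger; returns (action, pending_after)."""
    if not running or policy == "parallel":
        # Nothing in flight, or concurrent runs are allowed.
        return "start", pending
    if policy == "skip":
        return "drop", pending
    if policy == "queue":
        # Only one pending trigger is kept; extra triggers are dropped.
        return ("drop", True) if pending else ("enqueue", True)
    raise ValueError(f"unknown overlap policy: {policy!r}")


if __name__ == "__main__":
    print(on_trigger("queue", running=True, pending=False))
    print(on_trigger("queue", running=True, pending=True))
```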

Timezone and DST Behavior

  • schedule matching is timezone-aware
  • wall-clock semantics are used
  • spring-forward nonexistent times are skipped
  • fall-back ambiguous times run once

8. Using Chief with Monitor

Chief Monitor is a companion service that receives telemetry events from Chief and workers, stores them, and exposes status/alerts in API and UI.

Chief emits lifecycle telemetry automatically when monitor is enabled. Worker scripts can also send custom messages via gulag_chief.monitor_client.

Telemetry delivery is best-effort and non-blocking, so job execution continues if monitor is down.

Start Monitor Service (example)

cd monitor
npm install
npm run db:migrate
npm run dev

Monitor Config in chief.yaml

monitor:
  enabled: true
  endpoint: http://127.0.0.1:7410
  api_key: "" # optional; use env/.env in most setups
  timeout_ms: 400
  heartbeat_seconds: 15
  buffer:
    max_events: 5000
    flush_interval_ms: 1000
    spool_file: .chief/telemetry_spool.jsonl

Top-level monitor fields:

  • enabled: global telemetry on/off
  • endpoint: monitor base URL (http:// or https://)
  • api_key: optional auth key sent as x-api-key
  • timeout_ms: HTTP send timeout
  • heartbeat_seconds: Chief heartbeat interval (chief.heartbeat)
  • buffer.max_events: in-memory queue cap
  • buffer.flush_interval_ms: flush cadence
  • buffer.spool_file: local JSONL fallback if endpoint is unavailable

API Key Without YAML Secrets

Chief resolves the API key in this order (first non-empty source wins):

  1. monitor.api_key in YAML if non-empty
  2. CHIEF_MONITOR_API_KEY environment variable
  3. MONITOR_API_KEY environment variable
  4. .env in same directory as chief.yaml

Examples:

export CHIEF_MONITOR_API_KEY=your-secret
# or
export MONITOR_API_KEY=your-secret

.env example:

CHIEF_MONITOR_API_KEY=your-secret
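
The lookup order can be sketched in Python as follows. This is an illustration only: resolve_api_key and read_dotenv are hypothetical helpers, the .env parser handles just simple KEY=value lines, and the assumption that .env honors both variable names is not confirmed by this guide.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional, Union

_ENV_NAMES = ("CHIEF_MONITOR_API_KEY", "MONITOR_API_KEY")


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(
    yaml_value: Optional[str],
    config_path: Union[str, Path],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty key: YAML, then env vars, then .env."""
    environ = os.environ if environ is None else environ
    if yaml_value:
        return yaml_value
    for name in _ENV_NAMES:
        if environ.get(name):
            return environ[name]
    # The .env file is looked up next to chief.yaml.
    dotenv = read_dotenv(Path(config_path).parent / ".env")
    for name in _ENV_NAMES:
        if dotenv.get(name):
            return dotenv[name]
    return None
```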

Per-Job Monitor Override

Job monitor settings inherit from the top-level monitor block and can then be overridden per job.

monitor:
  enabled: true
  endpoint: http://127.0.0.1:7410

jobs:
  - name: critical-pipeline
    monitor:
      enabled: true
      check:
        enabled: true
        grace_seconds: 120
        alert_on_failure: true
        alert_on_miss: true
    schedule:
      frequency: interval
      every: 5m
    scripts:
      - path: workers/critical.py

  - name: noisy-ad-hoc-job
    monitor:
      enabled: false
    schedule:
      frequency: daily
      time: "06:00"
    scripts:
      - path: workers/noisy.py

Override behavior:

  • jobs[].monitor.enabled defaults to top-level monitor.enabled
  • jobs[].monitor.check.* customizes alert/check behavior per job
  • you can disable noisy jobs while keeping telemetry enabled globally

Per-job key reference:

  • jobs[].monitor.enabled: controls whether Chief emits telemetry for this job and injects monitor env vars into worker scripts for this job. Default: inherits top-level monitor.enabled.
  • jobs[].monitor.check.enabled: controls whether the monitor check state for this job is evaluated (UP / LATE / DOWN) against expected next run timing. Default: inherits jobs[].monitor.enabled.
  • jobs[].monitor.check.grace_seconds: additional allowed delay after expected_next_at before the job is marked DOWN and considered missed. Default: 120 (minimum 0).
  • jobs[].monitor.check.alert_on_failure: if true, opens FAILURE alerts on failed job runs and closes them on recovery success (with RECOVERY alert). Default: true.
  • jobs[].monitor.check.alert_on_miss: if true, opens MISSED alerts when heartbeat is overdue beyond grace_seconds, and closes with RECOVERY when heartbeat resumes. Default: true.
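
The inheritance chain in the key reference can be sketched in Python as a merge of plain dictionaries. The function effective_monitor is hypothetical, and treating a missing top-level monitor.enabled as false is an assumption; the other defaults are taken from the list above.

```python
from typing import Any, Dict


def effective_monitor(top: Dict[str, Any], job: Dict[str, Any]) -> Dict[str, Any]:
    """Compute the monitor settings that apply to one job (hypothetical helper)."""
    job_monitor = job.get("monitor") or {}
    check = job_monitor.get("check") or {}
    # Job-level enabled falls back to the top-level value.
    enabled = job_monitor.get("enabled", top.get("enabled", False))
    return {
        "enabled": enabled,
        "check": {
            # check.enabled falls back to the job-level enabled value.
            "enabled": check.get("enabled", enabled),
            "grace_seconds": check.get("grace_seconds", 120),
            "alert_on_failure": check.get("alert_on_failure", True),
            "alert_on_miss": check.get("alert_on_miss", True),
        },
    }


if __name__ == "__main__":
    top_level = {"enabled": True, "endpoint": "http://127.0.0.1:7410"}
    noisy_job = {"name": "noisy-ad-hoc-job", "monitor": {"enabled": False}}
    print(effective_monitor(top_level, noisy_job))
```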

Worker Monitor Helper (gulag_chief.monitor_client)

Use inside workers:

from gulag_chief.monitor_client import monitor

monitor.info("worker started", step="extract")
monitor.warn("slow upstream response", latency_ms=1450)
monitor.error("load failed", table="fact_orders")

Available methods:

  • monitor.debug(message, **meta)
  • monitor.info(message, **meta)
  • monitor.warn(message, **meta)
  • monitor.error(message, **meta)
  • monitor.critical(message, **meta)

Chief injects context env vars into worker subprocesses:

  • CHIEF_MONITOR_ENDPOINT
  • CHIEF_MONITOR_API_KEY
  • CHIEF_RUN_ID
  • CHIEF_JOB_NAME
  • CHIEF_SCRIPT_PATH
  • CHIEF_SCHEDULED_FOR
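
A worker can read this context directly from its environment, as in the Python sketch below. The helper chief_context is hypothetical; it returns None for any variable that is not set (for example when the script is run by hand) and deliberately leaves the API key out so it is not logged by accident.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names come from the list above; the API key is omitted on purpose.
_CONTEXT_VARS = (
    "CHIEF_MONITOR_ENDPOINT",
    "CHIEF_RUN_ID",
    "CHIEF_JOB_NAME",
    "CHIEF_SCRIPT_PATH",
    "CHIEF_SCHEDULED_FOR",
)


def chief_context(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Collect Chief-injected context values from the environment."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in _CONTEXT_VARS}


if __name__ == "__main__":
    for name, value in chief_context().items():
        print(f"{name}={value}")
```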

Telemetry Event Types

Chief lifecycle events:

  • job.started
  • script.started
  • script.completed
  • job.completed
  • job.failed
  • job.next_scheduled
  • chief.heartbeat
  • daemon.dispatch
  • daemon.overlap_skipped
  • daemon.queued_pending

Worker custom event type:

  • worker.message

Levels:

  • DEBUG
  • INFO
  • WARN
  • ERROR
  • CRITICAL

9. export-cron Workflow

Generate cron lines:

gulag-chief export-cron --config chief.yaml

Output includes:

  • CRON_TZ=<timezone> lines
  • cron entries for pure/hybrid schedules
  • comments for runtime-only schedules

Important:

  • hybrid schedules still require runtime guard enforcement
  • use gulag-chief run --respect-schedule in exported cron commands

10. Tutorial: Add a New Job

  1. Create your script under workers/.
  2. Add a job block in chief.yaml with schedule and scripts.
  3. If needed, add helper telemetry in worker code.
  4. Validate and preview:
gulag-chief validate --config chief.yaml
gulag-chief preview --config chief.yaml --job your-job-name
  5. Execute once:
gulag-chief run --config chief.yaml --job your-job-name
  6. Move to daemon mode once stable:
gulag-chief daemon --config chief.yaml

11. Practical YAML Examples

Example 1: Daily ETL with args

jobs:
  - name: ga-daily
    enabled: true
    schedule:
      frequency: daily
      time: "06:00"
      timezone: America/New_York
    scripts:
      - path: scripts/google-analytics/google_analytics_to_supabase.py
        args:
          - --function
          - sessions_by_channel
          - --start-date
          - "2026-01-01"
          - --end-date
          - "2026-01-31"
        timeout: 1800

Example 2: Monthly last Friday with exclusion

jobs:
  - name: monthly-report
    enabled: true
    overlap: queue
    schedule:
      frequency: monthly
      ordinal: last
      day: friday
      time: "18:00"
      timezone: UTC
      exclude:
        - 2026-12-25
    scripts:
      - path: scripts/other/offer_report_to_supabase.py
        timeout: 1200

Example 3: Runtime-only interval

jobs:
  - name: rolling-check
    enabled: true
    schedule:
      frequency: interval
      every: 90m
    scripts:
      - path: scripts/weather/weather_to_supabase.py
        timeout: 600

12. Validation Rules and Common Errors

Chief enforces strict validation with explicit errors.

Key rules:

  • schedule frequency is required and must be valid
  • required fields must exist for selected frequency
  • conflicting fields are rejected
  • time must be valid HH:MM
  • timezone must be valid IANA timezone
  • interval mode cannot include time
  • monthly must be either day_of_month or ordinal + day
  • script files must exist

Representative error:

Error: "monthly" requires either "day_of_month" or "ordinal + day".
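
To show how a few of these rules fit together, here is a minimal Python sketch of a schedule check. The function validate_schedule is hypothetical and covers only three rules; apart from the monthly message quoted above, the error wording is invented for illustration.

```python
import re
from typing import Any, Dict, List

_TIME = re.compile(r"([01]\d|2[0-3]):[0-5]\d")
_TIMED = {"daily", "weekly", "monthly", "yearly"}


def validate_schedule(schedule: Dict[str, Any]) -> List[str]:
    """Return a list of error strings; an empty list means no errors found."""
    errors: List[str] = []
    frequency = schedule.get("frequency")

    if frequency == "interval" and "time" in schedule:
        errors.append('"interval" cannot include "time".')

    if frequency in _TIMED:
        time = schedule.get("time")
        if not isinstance(time, str) or _TIME.fullmatch(time) is None:
            errors.append('"time" must be a valid HH:MM string.')

    if frequency == "monthly":
        uses_dom = "day_of_month" in schedule
        uses_ordinal = "ordinal" in schedule or "day" in schedule
        complete_ordinal = "ordinal" in schedule and "day" in schedule
        if uses_dom and uses_ordinal:
            errors.append('"day_of_month" cannot be mixed with "ordinal"/"day".')
        elif not uses_dom and not complete_ordinal:
            errors.append(
                '"monthly" requires either "day_of_month" or "ordinal + day".'
            )
    return errors


if __name__ == "__main__":
    print(validate_schedule({"frequency": "monthly", "time": "18:00"}))
```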

13. Troubleshooting

ModuleNotFoundError: No module named 'pytest'

Use the same interpreter for install and run:

python -m pip install pytest
python -m pytest -q tests/test_chief.py

Do not run test files directly with python tests/test_chief.py.

Missing PyYAML or croniter

python -m pip install gulag-chief
# or from source
python -m pip install .

Config validates but nothing runs with run --respect-schedule

This is expected when the job is not due at the current time. Inspect the next runs:

gulag-chief preview --job <job-name>

Monitor warnings in chief.log

If telemetry send warnings appear, jobs still execute. The warnings usually indicate temporary monitor connectivity or authentication issues.

Quick checks:

  1. Ensure monitor is running.
  2. Verify monitor.endpoint and API key source.
  3. Confirm /v1/health is reachable.
