YAML-driven script orchestrator and scheduler with monitor telemetry helper
Gulag Chief Guide
This guide is the complete, user-focused reference for running gulag-chief and configuring chief.yaml.
1. What Chief Does
gulag-chief is a YAML-driven orchestrator for Python worker scripts.
It provides:
- ordered script execution per job
- a human-readable scheduling DSL
- configuration validation with explicit errors
- schedule preview with next run times
- one-shot runs and daemon scheduling
- cron export for cron-compatible schedules
- optional monitor telemetry and worker instrumentation
2. Prerequisites
- Python 3.9+
- pip
- a local chief.yaml
Install the package locally (from your Chief project directory):
python -m pip install .
For development and tests:
python -m pip install -e ".[dev]"
3. Quick Start (5 Minutes)
Run from your Chief project directory.
- Validate config
gulag-chief validate --config chief.yaml
- Preview schedules
gulag-chief preview --config chief.yaml
- Run all enabled jobs once
gulag-chief run --config chief.yaml
- Run one job
gulag-chief run --config chief.yaml --job sample-etl-pipeline
- Run daemon scheduler
gulag-chief daemon --config chief.yaml --poll-seconds 10
- Export cron-compatible schedules
gulag-chief export-cron --config chief.yaml
4. CLI Commands and Arguments
Chief supports a global config flag plus command-specific flags.
Global Flag
--config PATH: path to config file (default: chief.yaml)
You can place it before or after the subcommand:
gulag-chief --config chief.yaml validate
gulag-chief validate --config chief.yaml
validate
Purpose: validate YAML structure, script paths, schedule rules, and compilation mode.
gulag-chief validate [--config PATH]
preview
Purpose: print schedule description, compilation mode, next runs, and cron equivalent when available.
gulag-chief preview [--config PATH] [--job NAME] [--count N]
Flags:
- --job NAME: preview one job
- --count N: number of future runs (default: 5, must be >= 1)
Examples:
gulag-chief preview
gulag-chief preview --job sample-etl-pipeline
gulag-chief preview --job sample-etl-pipeline --count 10
run
Purpose: execute selected jobs immediately.
gulag-chief run [--config PATH] [--job NAME] [--respect-schedule]
Behavior:
- default: run all enabled jobs once, in YAML order
- --job NAME: run one enabled job
- --respect-schedule: run only if the job is due now
Examples:
gulag-chief run
gulag-chief run --job sample-etl-pipeline
gulag-chief run --job sample-etl-pipeline --respect-schedule
daemon
Purpose: continuous scheduler loop.
gulag-chief daemon [--config PATH] [--poll-seconds N]
Flags:
--poll-seconds N: polling interval in seconds (default: 10, must be >= 1)
export-cron
Purpose: print cron lines for cron-compatible schedules.
gulag-chief export-cron [--config PATH] [--job NAME]
Common CLI behavior:
- an unknown --job name returns an explicit error
- run and daemon operate only on enabled jobs
- preview can still inspect disabled jobs
- run --respect-schedule is intended for cron-invoked runs with runtime guards
5. chief.yaml Configuration
Top-level keys:
- version
- defaults
- monitor
- jobs
Full shape example:
```yaml
version: 1
defaults:
  working_dir: .
  stop_on_failure: true
  overlap: skip
  timezone: UTC
jobs:
  - name: sample-etl-pipeline
    enabled: true
    working_dir: .
    stop_on_failure: true
    overlap: skip
    schedule:
      frequency: daily
      time: "06:00"
      timezone: America/New_York
    scripts:
      - path: workers/sample/extract_demo.py
        args: ["--source", "orders"]
        timeout: 120
      - path: workers/sample/transform_demo.py
      - path: workers/sample/load_demo.py
```
defaults Keys
- working_dir: default working directory for scripts
- stop_on_failure: default for whether a failed script stops the rest of the job
- overlap: default overlap policy (skip | queue | parallel)
- timezone: default schedule timezone
Job Keys
- name (required, unique)
- enabled (default: true)
- working_dir (inherits from defaults)
- stop_on_failure (inherits from defaults)
- overlap (inherits from defaults)
- schedule (required)
- scripts (required, non-empty list)
- monitor (optional job-level override)
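The inheritance rules above can be sketched as a small merge step. This is a minimal, hypothetical helper (resolve_job, INHERITED_KEYS and DEFAULT_VALUES are illustrative names, not Chief's API) that assumes the YAML has already been loaded into plain dicts; it fills missing job keys from defaults and lets a schedule without its own timezone fall back to defaults.timezone.

```python
from typing import Any, Dict

# Keys a job inherits from the top-level `defaults` block (per the key lists above).
INHERITED_KEYS = ("working_dir", "stop_on_failure", "overlap")
# Documented per-job default that does not come from `defaults`.
DEFAULT_VALUES: Dict[str, Any] = {"enabled": True}


def resolve_job(defaults: Dict[str, Any], job: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `job` with missing keys filled from `defaults`.

    Hypothetical helper: mirrors the documented inheritance, not Chief's own code.
    """
    if "name" not in job or "schedule" not in job:
        raise ValueError("job requires 'name' and 'schedule'")
    if not job.get("scripts"):
        raise ValueError(f"job {job['name']!r} requires a non-empty 'scripts' list")
    resolved = dict(DEFAULT_VALUES)
    resolved.update({k: defaults[k] for k in INHERITED_KEYS if k in defaults})
    resolved.update(job)  # explicit job keys always win
    # A schedule without its own timezone falls back to defaults.timezone.
    schedule = dict(job["schedule"])
    if "timezone" not in schedule and "timezone" in defaults:
        schedule["timezone"] = defaults["timezone"]
    resolved["schedule"] = schedule
    return resolved
```

Copying the schedule dict keeps the original job mapping untouched, so the same loaded config can be resolved repeatedly.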
Script Keys
- path (required)
- args (optional)
- timeout (optional, seconds)
args supports both list and shell-style string forms.
List form:
```yaml
scripts:
  - path: workers/sample/load_demo.py
    args:
      - --input
      - workers/sample/state/transformed_orders.json
      - --table
      - fact_orders
```
Shell-style string form:
```yaml
scripts:
  - path: workers/sample/transform_demo.py
    args: --input in.json --output out.json --label "nightly run"
```
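Both args forms above end up as an argument list. A minimal sketch, assuming Chief splits the string form with POSIX shell rules like Python's shlex.split (normalize_args is a hypothetical name):

```python
import shlex
from typing import List, Optional, Union


def normalize_args(args: Optional[Union[str, List[object]]]) -> List[str]:
    """Turn either documented `args` form into an argv list (hypothetical helper)."""
    if args is None:
        return []
    if isinstance(args, str):
        # Shell-style string: POSIX quoting keeps "nightly run" as one argument.
        return shlex.split(args)
    # List form: each YAML item becomes exactly one argument.
    return [str(a) for a in args]
```

With these rules, the string form from the second example yields the same six-element list you would write out by hand in list form.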
Path behavior:
- relative script paths resolve from the job's working_dir
- script files must exist at validation/load time
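The path rules above amount to "join, then check existence". This sketch assumes a relative working_dir is itself resolved against the directory containing chief.yaml, which the guide does not state explicitly; resolve_script is a hypothetical helper.

```python
from pathlib import Path


def resolve_script(config_dir: Path, working_dir: str, script_path: str) -> Path:
    """Resolve a script path per the rules above (hypothetical helper).

    Assumption: a relative `working_dir` is relative to the directory holding chief.yaml.
    """
    cwd = Path(working_dir)
    if not cwd.is_absolute():
        cwd = config_dir / cwd
    script = Path(script_path)
    if not script.is_absolute():
        script = cwd / script  # relative script paths resolve from working_dir
    script = script.resolve()
    if not script.is_file():
        # Mirrors "script files must exist at validation/load time".
        raise FileNotFoundError(f"script not found: {script}")
    return script
```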
6. Scheduling DSL (Friendly but Strict)
Every job must define exactly one schedule mode:
- daily
- weekly
- monthly
- yearly
- interval
- custom
Global Schedule Modifiers
Allowed on all frequencies:
- timezone
- start (ISO datetime)
- end (ISO datetime)
- exclude (list of YYYY-MM-DD dates)
Behavior:
- if start/end are naive datetimes, Chief interprets them in the schedule timezone
- Chief will not run outside [start, end]
- exclude dates are enforced using the local date in the schedule timezone
- named holiday shortcuts are not supported in v1
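The modifier rules above can be sketched as one guard function. This is a hypothetical helper (allowed_at is not part of Chief); it takes any tzinfo so it runs without an IANA database, whereas a real schedule would pass something like zoneinfo.ZoneInfo("America/New_York").

```python
from datetime import date, datetime, tzinfo
from typing import Iterable, Optional


def allowed_at(now: datetime, tz: tzinfo, start: Optional[datetime] = None,
               end: Optional[datetime] = None, exclude: Iterable[str] = ()) -> bool:
    """Apply start/end/exclude modifiers to an aware `now` (hypothetical helper)."""
    local_now = now.astimezone(tz)
    # Naive bounds are interpreted in the schedule timezone.
    if start is not None and start.tzinfo is None:
        start = start.replace(tzinfo=tz)
    if end is not None and end.tzinfo is None:
        end = end.replace(tzinfo=tz)
    # The window is inclusive: [start, end].
    if start is not None and local_now < start:
        return False
    if end is not None and local_now > end:
        return False
    # Exclusions compare against the local calendar date, not the UTC date.
    excluded = {date.fromisoformat(d) for d in exclude}
    return local_now.date() not in excluded
```

The local-date rule matters near midnight: 03:00 UTC on 2026-12-26 is still 2026-12-25 in a UTC-5 schedule, so an exclude entry for the 25th blocks it.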
daily
Required:
time (HH:MM, 24-hour)
Optional:
weekdays_only (true | false)
Example:
```yaml
schedule:
  frequency: daily
  time: "14:30"
  weekdays_only: true
```
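The daily rule above reduces to "next HH:MM after now, skipping weekends if asked". A minimal sketch with a hypothetical helper (next_daily_run); it assumes `after` is already expressed in the schedule timezone and does not handle DST gaps:

```python
from datetime import datetime, time, timedelta


def next_daily_run(after: datetime, at: str, weekdays_only: bool = False) -> datetime:
    """Next wall-clock occurrence of HH:MM strictly after `after` (hypothetical helper)."""
    hour, minute = (int(part) for part in at.split(":"))
    candidate = datetime.combine(after.date(), time(hour, minute), tzinfo=after.tzinfo)
    if candidate <= after:
        candidate += timedelta(days=1)
    # weekday(): Monday == 0 ... Sunday == 6, so 5 and 6 are the weekend.
    while weekdays_only and candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate
```

For example, after 15:00 on Friday 2026-01-02 with weekdays_only: true, the next "14:30" run is Monday 2026-01-05.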
weekly
Required:
day, time
day supports:
- single (monday)
- comma list (monday,wednesday,friday)
- named range (monday-friday)
- YAML list ([monday, wednesday, friday])
Example:
```yaml
schedule:
  frequency: weekly
  day: monday-friday
  time: "09:00"
```
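The four day forms listed above can all be expanded to the same weekday set. A sketch with a hypothetical parse_days helper; it rejects wrap-around ranges such as friday-monday, and whether Chief accepts those is not stated in this guide.

```python
from typing import List, Union

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def parse_days(spec: Union[str, List[str]]) -> List[int]:
    """Expand a `day` value into sorted weekday indexes, Monday == 0 (hypothetical helper)."""
    items = spec if isinstance(spec, list) else spec.split(",")
    result = set()
    for raw in items:
        item = raw.strip().lower()
        if "-" in item:
            # Named range, e.g. monday-friday (unknown names raise ValueError).
            first, last = (DAYS.index(part.strip()) for part in item.split("-", 1))
            if first > last:
                raise ValueError(f"range must run forward within a week: {raw!r}")
            result.update(range(first, last + 1))
        elif item in DAYS:
            result.add(DAYS.index(item))
        else:
            raise ValueError(f"unknown day: {raw!r}")
    return sorted(result)
```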
monthly
Choose one style.
Style A: day of month.
```yaml
schedule:
  frequency: monthly
  day_of_month: 15
  time: "08:00"
```
Style B: ordinal weekday.
```yaml
schedule:
  frequency: monthly
  ordinal: last
  day: friday
  time: "18:00"
```
Valid ordinals:
first, second, third, fourth, last
Rule:
day_of_month and ordinal/day cannot be mixed
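Style B needs a calendar lookup that a single cron line cannot express, which is consistent with section 7 classifying "monthly ordinal + day" as hybrid. A sketch of that lookup (ordinal_weekday is a hypothetical helper); every month has at least four of each weekday, so first through fourth always exist.

```python
import calendar
from datetime import date

ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3}
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def ordinal_weekday(year: int, month: int, ordinal: str, day: str) -> date:
    """Date of e.g. the 'last friday' of a month (hypothetical helper)."""
    weekday = DAYS.index(day.lower())
    days_in_month = calendar.monthrange(year, month)[1]
    # All dates in the month that fall on the requested weekday, in order.
    matches = [
        date(year, month, d)
        for d in range(1, days_in_month + 1)
        if date(year, month, d).weekday() == weekday
    ]
    if ordinal == "last":
        return matches[-1]
    return matches[ORDINALS[ordinal]]
```

For instance, the last Friday of January 2026 is 2026-01-30 and of February 2026 is 2026-02-27.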
yearly
Required:
month, day_of_month, time
Example:
```yaml
schedule:
  frequency: yearly
  month: january
  day_of_month: 1
  time: "00:00"
```
interval
Required:
every (in <number><unit> form)
Supported units:
- m (minutes)
- h (hours)
- d (days)
Examples:
```yaml
schedule:
  frequency: interval
  every: 5m
```
```yaml
schedule:
  frequency: interval
  every: 2h
```
Rules:
- time is forbidden in interval mode
- seconds intervals (s) are intentionally unsupported in v1
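The every format above can be checked with a single regular expression. A sketch (parse_every is a hypothetical helper); it also assumes the number must be a positive integer, which the guide implies but does not spell out.

```python
import re
from datetime import timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
_PATTERN = re.compile(r"^([1-9][0-9]*)([mhd])$")


def parse_every(value: str) -> timedelta:
    """Parse an `every` value such as '5m', '2h' or '1d' (hypothetical helper)."""
    match = _PATTERN.match(value.strip())
    if match is None:
        # Also rejects '30s': seconds are not a supported unit in v1.
        raise ValueError(f'"every" must be <number><unit> with unit m, h or d, got {value!r}')
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```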
custom
Labeled cron-like fields:
- minute
- hour
- day_of_month
- month
- day_of_week
At least one field is required.
Example:
```yaml
schedule:
  frequency: custom
  minute: 0
  hour: 9
  day_of_week: monday-friday
```
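The labeled fields map positionally onto the five cron fields, with omitted fields becoming "*". A sketch (custom_to_cron is a hypothetical helper, not the code behind preview's cron output); it translates day names using cron's 0 = Sunday numbering.

```python
from typing import Dict, Union

FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")
DOW = {"monday": "1", "tuesday": "2", "wednesday": "3", "thursday": "4",
       "friday": "5", "saturday": "6", "sunday": "0"}


def custom_to_cron(schedule: Dict[str, Union[int, str]]) -> str:
    """Build a 5-field cron expression from labeled fields (hypothetical helper)."""
    if not any(field in schedule for field in FIELDS):
        raise ValueError('"custom" requires at least one of: ' + ", ".join(FIELDS))
    parts = []
    for field in FIELDS:
        value = str(schedule.get(field, "*")).lower()  # omitted fields match anything
        if field == "day_of_week":
            # Translate day names (including ranges like monday-friday) to cron numbers.
            for name, number in DOW.items():
                value = value.replace(name, number)
        parts.append(value)
    return " ".join(parts)
```

Applied to the example above, this yields "0 9 * * 1-5".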
7. Compilation Modes and Runtime Semantics
Chief compiles schedules into one of three kinds:
- pure_cron: fully representable by cron
- hybrid: cron trigger + runtime guard
- runtime_only: runtime scheduler logic required
Examples:
- weekly friday 17:30 -> pure cron (30 17 * * 5)
- monthly ordinal + day -> hybrid
- interval every: 90m -> runtime only
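Why 90m is runtime-only can be seen from how cron steps work: */N repeats evenly only when N divides the field's cycle (60 minutes or 24 hours). The sketch below illustrates that constraint; interval_to_cron is hypothetical and is not Chief's actual classification rule, which may treat some intervals differently.

```python
from datetime import timedelta
from typing import Optional


def interval_to_cron(every: timedelta) -> Optional[str]:
    """Return one cron line firing exactly every `every`, or None if none exists.

    Illustrative only; Chief's own pure_cron/runtime_only decision may differ.
    """
    seconds = int(every.total_seconds())
    if seconds <= 0 or seconds % 60:
        return None
    minutes = seconds // 60
    if minutes < 60 and 60 % minutes == 0:
        return f"*/{minutes} * * * *"  # e.g. 5m: fires at :00, :05, ... every hour
    hours, rem = divmod(minutes, 60)
    if rem == 0 and hours < 24 and 24 % hours == 0:
        return f"0 */{hours} * * *"
    if minutes == 24 * 60:
        return "0 0 * * *"
    return None  # e.g. 90m: no single cron line repeats every 90 minutes
```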
Job Execution Semantics
- scripts run sequentially in each job
- args are passed as parsed from YAML
- each script can have its own timeout
- if a script fails and stop_on_failure: true, the remaining scripts are skipped
- if stop_on_failure: false, the job continues with the remaining scripts
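The execution semantics above can be sketched with subprocess.run, which kills the child and raises TimeoutExpired when a timeout elapses. This is a hypothetical runner (Script and run_job are illustrative names); it assumes workers are launched with the current interpreter, which the guide does not confirm.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Script:
    path: str
    args: List[str] = field(default_factory=list)
    timeout: Optional[float] = None  # seconds; None means no limit


def run_job(scripts: List[Script], cwd: str, stop_on_failure: bool = True) -> List[str]:
    """Run scripts in order and return one status per script (hypothetical sketch)."""
    statuses: List[str] = []
    failed = False
    for script in scripts:
        if failed and stop_on_failure:
            statuses.append("skipped")
            continue
        try:
            # Assumption: workers run under the same interpreter as the orchestrator.
            proc = subprocess.run([sys.executable, script.path, *script.args],
                                  cwd=cwd, timeout=script.timeout)
            ok = proc.returncode == 0
        except subprocess.TimeoutExpired:
            ok = False  # subprocess.run has already killed the child
        statuses.append("ok" if ok else "failed")
        failed = failed or not ok
    return statuses
```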
Daemon Semantics
- no startup catch-up (cron-like)
- deterministic ordering by YAML job order
- overlap behavior per job:
  - skip: drop the trigger while a run is in progress
  - queue: keep one pending trigger while a run is in progress
  - parallel: allow concurrent runs of the same job; global scheduling stays deterministic
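The three overlap policies can be modeled as a small per-job state machine. This is a hypothetical model (OverlapGate is not Chief's class); its "skipped" and "queued" outcomes correspond to the daemon.overlap_skipped and daemon.queued_pending telemetry events listed in section 8.

```python
class OverlapGate:
    """Per-job overlap decisions for skip/queue/parallel (hypothetical model)."""

    def __init__(self, policy: str) -> None:
        if policy not in ("skip", "queue", "parallel"):
            raise ValueError(f"unknown overlap policy: {policy!r}")
        self.policy = policy
        self.running = 0
        self.pending = False

    def on_trigger(self) -> str:
        """Decide what to do with a trigger that fires now."""
        if self.running == 0 or self.policy == "parallel":
            self.running += 1
            return "dispatch"
        if self.policy == "queue":
            # At most one pending trigger is kept; extra triggers collapse into it.
            self.pending = True
            return "queued"
        return "skipped"

    def on_finish(self) -> str:
        """Call when a run ends; starts the pending run if one was queued."""
        self.running -= 1
        if self.pending and self.running == 0:
            self.pending = False
            self.running += 1
            return "dispatch"
        return "idle"
```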
Timezone and DST Behavior
- schedule matching is timezone-aware
- wall-clock semantics are used
- spring-forward nonexistent times are skipped
- fall-back ambiguous times run once
8. Using Chief with Monitor
Chief Monitor is a companion service that receives telemetry events from Chief and workers, stores them, and exposes job status and alerts through its API and UI.
Chief emits lifecycle telemetry automatically when monitor is enabled. Worker scripts can also send custom messages via gulag_chief.monitor_client.
Telemetry delivery is best-effort and non-blocking, so job execution continues if monitor is down.
Start Monitor Service (example)
cd monitor
npm install
npm run db:migrate
npm run dev
Monitor Config in chief.yaml
```yaml
monitor:
  enabled: true
  endpoint: http://127.0.0.1:7410
  api_key: "" # optional; use env/.env in most setups
  timeout_ms: 400
  heartbeat_seconds: 15
  buffer:
    max_events: 5000
    flush_interval_ms: 1000
    spool_file: .chief/telemetry_spool.jsonl
```
Top-level monitor fields:
- enabled: global telemetry on/off
- endpoint: monitor base URL (http:// or https://)
- api_key: optional auth key, sent as x-api-key
- timeout_ms: HTTP send timeout
- heartbeat_seconds: Chief heartbeat interval (chief.heartbeat)
- buffer.max_events: in-memory queue cap
- buffer.flush_interval_ms: flush cadence
- buffer.spool_file: local JSONL fallback if the endpoint is unavailable
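The buffer settings above describe a best-effort pipeline: events queue in memory up to max_events, are flushed every flush_interval_ms, and fall back to the JSONL spool when sending fails. A minimal sketch under those assumptions; TelemetryBuffer is hypothetical, the send callable stands in for the real HTTP POST, and dropping the oldest events when full is an assumed policy.

```python
import json
from collections import deque
from pathlib import Path
from typing import Callable, Deque, Dict, List


class TelemetryBuffer:
    """Bounded, best-effort event buffer with a JSONL spool fallback (hypothetical sketch)."""

    def __init__(self, send: Callable[[List[Dict]], None], max_events: int, spool_file: Path) -> None:
        self.send = send  # stand-in for an HTTP POST bounded by timeout_ms; may raise
        self.queue: Deque[Dict] = deque(maxlen=max_events)  # assumption: oldest events drop when full
        self.spool_file = spool_file

    def emit(self, event: Dict) -> None:
        # Never blocks or raises, so job execution does not depend on the monitor.
        self.queue.append(event)

    def flush(self) -> int:
        """Meant to be called every flush_interval_ms; returns how many events were sent."""
        batch = list(self.queue)
        self.queue.clear()
        if not batch:
            return 0
        try:
            self.send(batch)
            return len(batch)
        except Exception:
            # Endpoint unavailable: append the batch to the local JSONL spool instead.
            self.spool_file.parent.mkdir(parents=True, exist_ok=True)
            with self.spool_file.open("a", encoding="utf-8") as fh:
                for event in batch:
                    fh.write(json.dumps(event) + "\n")
            return 0
```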
API Key Without YAML Secrets
Chief resolves the API key in this order:
1. monitor.api_key in YAML, if non-empty
2. CHIEF_MONITOR_API_KEY environment variable
3. MONITOR_API_KEY environment variable
4. .env in the same directory as chief.yaml
Examples:
export CHIEF_MONITOR_API_KEY=your-secret
# or
export MONITOR_API_KEY=your-secret
.env example:
CHIEF_MONITOR_API_KEY=your-secret
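The lookup order above can be sketched as a first-match chain. Both functions are hypothetical; the .env parser is deliberately minimal (KEY=VALUE lines, # comments, optional quotes), and reading MONITOR_API_KEY from .env as a second choice is an assumption, since the guide only shows CHIEF_MONITOR_API_KEY there.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Minimal KEY=VALUE parser for a .env file (no export/multiline support)."""
    values: Dict[str, str] = {}
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(yaml_key: Optional[str], config_path: Path,
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Follow the documented lookup order (hypothetical helper)."""
    if yaml_key:
        return yaml_key
    for name in ("CHIEF_MONITOR_API_KEY", "MONITOR_API_KEY"):
        if environ.get(name):
            return environ[name]
    # .env lives next to chief.yaml.
    dotenv = read_dotenv(config_path.parent / ".env")
    return dotenv.get("CHIEF_MONITOR_API_KEY") or dotenv.get("MONITOR_API_KEY")
```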
Per-Job Monitor Override
Job monitor settings inherit from top-level monitor, then can be overridden per job.
```yaml
monitor:
  enabled: true
  endpoint: http://127.0.0.1:7410
jobs:
  - name: critical-pipeline
    monitor:
      enabled: true
      check:
        enabled: true
        grace_seconds: 120
        alert_on_failure: true
        alert_on_miss: true
    schedule:
      frequency: interval
      every: 5m
    scripts:
      - path: workers/critical.py
  - name: noisy-ad-hoc-job
    monitor:
      enabled: false
    schedule:
      frequency: daily
      time: "06:00"
    scripts:
      - path: workers/noisy.py
```
Override behavior:
- jobs[].monitor.enabled defaults to the top-level monitor.enabled
- jobs[].monitor.check.* customizes alert/check behavior per job
- you can disable noisy jobs while keeping telemetry enabled globally
Per-job key reference:
- jobs[].monitor.enabled: controls whether Chief emits telemetry for this job and injects monitor env vars into this job's worker scripts. Default: inherits the top-level monitor.enabled.
- jobs[].monitor.check.enabled: controls whether the monitor evaluates this job's check state (UP/LATE/DOWN) against its expected next run time. Default: inherits jobs[].monitor.enabled.
- jobs[].monitor.check.grace_seconds: additional allowed delay after expected_next_at before the job is marked DOWN and considered missed. Default: 120 (minimum 0).
- jobs[].monitor.check.alert_on_failure: if true, opens FAILURE alerts on failed job runs and closes them on a subsequent successful run (with a RECOVERY alert). Default: true.
- jobs[].monitor.check.alert_on_miss: if true, opens MISSED alerts when the heartbeat is overdue beyond grace_seconds, and closes them with RECOVERY when the heartbeat resumes. Default: true.
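The defaults and check states in the reference above can be sketched as follows. Both functions are hypothetical; treating a missing top-level monitor.enabled as false is an assumption, and the UP/LATE/DOWN thresholds are one reading of "grace_seconds after expected_next_at", not the monitor's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Dict


@dataclass
class CheckSettings:
    enabled: bool
    check_enabled: bool
    grace_seconds: int = 120
    alert_on_failure: bool = True
    alert_on_miss: bool = True


def effective_settings(top_level: Dict[str, Any], job_monitor: Dict[str, Any]) -> CheckSettings:
    """Apply the documented per-job defaults (hypothetical helper)."""
    # Assumption: telemetry is off when the top-level block omits `enabled`.
    enabled = job_monitor.get("enabled", top_level.get("enabled", False))
    check = job_monitor.get("check", {})
    grace = int(check.get("grace_seconds", 120))
    if grace < 0:
        raise ValueError("grace_seconds must be >= 0")
    return CheckSettings(
        enabled=enabled,
        check_enabled=check.get("enabled", enabled),  # inherits jobs[].monitor.enabled
        grace_seconds=grace,
        alert_on_failure=check.get("alert_on_failure", True),
        alert_on_miss=check.get("alert_on_miss", True),
    )


def check_state(now: datetime, expected_next_at: datetime, grace_seconds: int) -> str:
    """UP until the expected time, LATE inside the grace window, DOWN after it."""
    if now <= expected_next_at:
        return "UP"
    if now <= expected_next_at + timedelta(seconds=grace_seconds):
        return "LATE"
    return "DOWN"
```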
Worker Monitor Helper (gulag_chief.monitor_client)
Use inside workers:
from gulag_chief.monitor_client import monitor
monitor.info("worker started", step="extract")
monitor.warn("slow upstream response", latency_ms=1450)
monitor.error("load failed", table="fact_orders")
Available methods:
- monitor.debug(message, **meta)
- monitor.info(message, **meta)
- monitor.warn(message, **meta)
- monitor.error(message, **meta)
- monitor.critical(message, **meta)
Chief injects context env vars into worker subprocesses:
- CHIEF_MONITOR_ENDPOINT
- CHIEF_MONITOR_API_KEY
- CHIEF_RUN_ID
- CHIEF_JOB_NAME
- CHIEF_SCRIPT_PATH
- CHIEF_SCHEDULED_FOR
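Workers can read the injected variables directly, for example to tag output files with the run ID. Because Chief injects monitor env vars only for jobs with monitoring enabled, and a worker may also be run by hand, this sketch treats every variable as optional; chief_context is a hypothetical helper, not part of gulag_chief.

```python
import os
from typing import Dict, Mapping, Optional

CONTEXT_VARS = (
    "CHIEF_MONITOR_ENDPOINT",
    "CHIEF_MONITOR_API_KEY",
    "CHIEF_RUN_ID",
    "CHIEF_JOB_NAME",
    "CHIEF_SCRIPT_PATH",
    "CHIEF_SCHEDULED_FOR",
)


def chief_context(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Collect Chief's injected variables; values are None when run outside Chief."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in CONTEXT_VARS}


if __name__ == "__main__":
    ctx = chief_context()
    if ctx["CHIEF_RUN_ID"] is None:
        print("running standalone (no Chief context)")
    else:
        print(f"job={ctx['CHIEF_JOB_NAME']} run={ctx['CHIEF_RUN_ID']}")
```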
Telemetry Event Types
Chief lifecycle events:
- job.started
- script.started
- script.completed
- job.completed
- job.failed
- job.next_scheduled
- chief.heartbeat
- daemon.dispatch
- daemon.overlap_skipped
- daemon.queued_pending
Worker custom event type:
worker.message
Levels:
DEBUG, INFO, WARN, ERROR, CRITICAL
9. export-cron Workflow
Generate cron lines:
gulag-chief export-cron --config chief.yaml
Output includes:
- CRON_TZ=<timezone> lines
- cron entries for pure/hybrid schedules
- comments for runtime-only schedules
Important:
- hybrid schedules still require runtime guard enforcement
- use gulag-chief run --respect-schedule in exported cron commands
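The shape of the exported lines can be sketched as below, using only the documented run flags. cron_entries is hypothetical and its exact output format is not Chief's. For a hybrid schedule such as "last Friday at 18:00", cron fires every Friday and --respect-schedule lets Chief skip the non-matching ones. Not every cron daemon honors CRON_TZ (cronie does), and cron usually starts jobs in the user's home directory, so an absolute --config path is safer in practice.

```python
import shlex
from typing import Dict, List, Optional


def cron_entries(timezone: str, jobs: List[Dict[str, Optional[str]]],
                 config: str = "chief.yaml", binary: str = "gulag-chief") -> List[str]:
    """Render crontab lines that re-check schedules at runtime (hypothetical helper).

    Each job dict needs 'name' and 'cron' (None for runtime-only schedules).
    """
    lines = [f"CRON_TZ={timezone}"]
    for job in jobs:
        if job["cron"] is None:
            # Runtime-only schedules cannot be expressed in cron; leave a comment.
            lines.append(f"# {job['name']}: runtime-only schedule; use gulag-chief daemon")
            continue
        cmd = [binary, "run", "--config", config, "--job", job["name"], "--respect-schedule"]
        lines.append(f"{job['cron']} {shlex.join(cmd)}")
    return lines
```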
10. Tutorial: Add a New Job
- Create your script under workers/.
- Add a job block in chief.yaml with a schedule and scripts.
- If needed, add helper telemetry in worker code.
- Validate and preview:
gulag-chief validate --config chief.yaml
gulag-chief preview --config chief.yaml --job your-job-name
- Execute once:
gulag-chief run --config chief.yaml --job your-job-name
- Move to daemon mode once stable:
gulag-chief daemon --config chief.yaml
11. Practical YAML Examples
Example 1: Daily ETL with args
```yaml
jobs:
  - name: ga-daily
    enabled: true
    schedule:
      frequency: daily
      time: "06:00"
      timezone: America/New_York
    scripts:
      - path: scripts/google-analytics/google_analytics_to_supabase.py
        args:
          - --function
          - sessions_by_channel
          - --start-date
          - 2026-01-01
          - --end-date
          - 2026-01-31
        timeout: 1800
```
Example 2: Monthly last Friday with exclusion
```yaml
jobs:
  - name: monthly-report
    enabled: true
    overlap: queue
    schedule:
      frequency: monthly
      ordinal: last
      day: friday
      time: "18:00"
      timezone: UTC
      exclude:
        - 2026-12-25
    scripts:
      - path: scripts/other/offer_report_to_supabase.py
        timeout: 1200
```
Example 3: Runtime-only interval
```yaml
jobs:
  - name: rolling-check
    enabled: true
    schedule:
      frequency: interval
      every: 90m
    scripts:
      - path: scripts/weather/weather_to_supabase.py
        timeout: 600
```
12. Validation Rules and Common Errors
Chief enforces strict validation with explicit errors.
Key rules:
- schedule frequency is required and must be valid
- required fields must exist for the selected frequency
- conflicting fields are rejected
- time must be a valid HH:MM
- timezone must be a valid IANA timezone
- interval mode cannot include time
- monthly must use either day_of_month or ordinal + day
- script files must exist
Representative error:
Error: "monthly" requires either "day_of_month" or "ordinal + day".
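A few of the rules above, sketched as a checker that returns messages in the same style as the representative error. validate_schedule is hypothetical and covers only a subset of Chief's validation. It also shows why the examples quote times: YAML 1.1 loaders such as PyYAML read an unquoted 14:30 as the base-60 integer 870, which then fails the HH:MM check.

```python
import re
from typing import Any, Dict, List

_TIME = re.compile(r"^([01][0-9]|2[0-3]):[0-5][0-9]$")
_FREQUENCIES = {"daily", "weekly", "monthly", "yearly", "interval", "custom"}


def validate_schedule(schedule: Dict[str, Any]) -> List[str]:
    """Check a few of the documented rules; returns error messages (hypothetical subset)."""
    freq = schedule.get("frequency")
    if freq not in _FREQUENCIES:
        return [f'"frequency" must be one of {", ".join(sorted(_FREQUENCIES))} (got {freq!r}).']
    errors: List[str] = []
    if "time" in schedule:
        if freq == "interval":
            errors.append('"interval" does not allow "time".')
        elif not _TIME.match(str(schedule["time"])):
            errors.append(f'"time" must be HH:MM (24-hour), got {schedule["time"]!r}.')
    if freq == "monthly":
        has_dom = "day_of_month" in schedule
        has_ord = "ordinal" in schedule or "day" in schedule
        if has_dom and has_ord:
            errors.append('"day_of_month" and "ordinal"/"day" cannot be mixed.')
        elif not has_dom and not ("ordinal" in schedule and "day" in schedule):
            errors.append('"monthly" requires either "day_of_month" or "ordinal + day".')
    return errors
```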
13. Troubleshooting
ModuleNotFoundError: No module named 'pytest'
Use the same interpreter for install and run:
python -m pip install pytest
python -m pytest -q tests/test_chief.py
Do not run test files directly with python tests/test_chief.py.
Missing PyYAML or croniter
python -m pip install gulag-chief
# or from source
python -m pip install .
Config validates but nothing runs with run --respect-schedule
This is expected if current time is not due. Inspect next runs:
gulag-chief preview --job <job-name>
Monitor warnings in chief.log
If telemetry send warnings appear, jobs still execute. The warnings indicate temporary monitor connectivity or auth issues.
Quick checks:
- Ensure the monitor is running.
- Verify monitor.endpoint and the API key source.
- Confirm /v1/health is reachable.
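The last check can be scripted with the standard library. A sketch assuming /v1/health answers a plain GET with a 2xx status; monitor_healthy is a hypothetical helper, and sending x-api-key is optional here because the guide does not say whether the health route requires it.

```python
import urllib.error
import urllib.request


def monitor_healthy(endpoint: str, api_key: str = "", timeout: float = 2.0) -> bool:
    """Return True if GET <endpoint>/v1/health answers with a 2xx status."""
    url = endpoint.rstrip("/") + "/v1/health"
    headers = {"x-api-key": api_key} if api_key else {}
    try:
        # Request() raises ValueError for malformed URLs, so build it inside the try.
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, non-2xx (HTTPError) or a bad URL.
        return False
```

Usage: monitor_healthy("http://127.0.0.1:7410") returns False rather than raising when the monitor is down, mirroring how Chief keeps running jobs in that situation.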
Download files
File details
Details for the file gulag_chief-0.1.1.tar.gz.
File metadata
- Download URL: gulag_chief-0.1.1.tar.gz
- Upload date:
- Size: 35.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5435964a2e2fc2265c9bccf485078fd3f554aed91e6c6cbb5188915b1fd20a3 |
| MD5 | 8ab20283b6d0b4314ea60b2413b5636c |
| BLAKE2b-256 | ffaa0daef65f0823ecba5d183b1ac11287be4ee3a920f490b8fafb925844ec08 |
File details
Details for the file gulag_chief-0.1.1-py3-none-any.whl.
File metadata
- Download URL: gulag_chief-0.1.1-py3-none-any.whl
- Upload date:
- Size: 27.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 50ebe7871124d49d55e600198e25fcd35bee15af26d2bef1476ca586cb11e961 |
| MD5 | 3062bfa4b72bb2a2905c739451af8734 |
| BLAKE2b-256 | 8fce66c1faffd75ae5158c1f28f46a3d5489ef481d33bf71e752a9d4b620f514 |