Rundial Python SDK (non-blocking ingest with bounded spool and ergonomic run API)
rundial
pip install rundial
Phase 4 introduces a non-blocking metrics transport with:
- bounded in-memory queue on the training thread
- background flush worker
- bounded disk spool (default enabled)
- gzip compression in worker transport (threshold-based)
- retry with exponential backoff + jitter
- diagnostics counters for dropped/accepted/retried points
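The two core ideas in this list, a bounded queue that never blocks the producer and retry delays that grow exponentially with jitter, can be sketched as follows. This is a minimal illustration, not Rundial's implementation: `BoundedPointQueue`, `backoff_delay`, and their parameters are hypothetical names, and the "full jitter" variant is an assumed choice.

```python
import queue
import random


class BoundedPointQueue:
    """Hypothetical bounded queue: put() never blocks; overflow is dropped and counted."""

    def __init__(self, maxsize: int) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=maxsize)
        self.accepted = 0
        self.dropped = 0

    def put(self, point: dict) -> bool:
        try:
            self._q.put_nowait(point)
        except queue.Full:
            self.dropped += 1  # drop and count instead of stalling the caller
            return False
        self.accepted += 1
        return True

    def drain(self, max_items: int) -> list:
        """Called by a background worker to pull up to max_items points."""
        batch = []
        while len(batch) < max_items:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

A worker thread would call `drain(...)` in a loop and sleep for `backoff_delay(attempt)` after each retryable failure, while the training thread only ever calls `put(...)`.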
CLI quickstart
Installing rundial also installs the rundial CLI:
rundial init --endpoint http://127.0.0.1:8787
rundial auth whoami
rundial target ls
rundial doctor
Operational commands:
rundial workspace ls
rundial project ls --workspace default-workspace
rundial run start --workspace default-workspace --project default-project --name baseline-001 --kind training
rundial run list --workspace default-workspace --project default-project --status running
rundial run status run_...
rundial run finish run_... --state completed
rundial metrics tail run_... --workspace default-workspace --project default-project --keys train/loss
rundial metrics export run_... \
--workspace default-workspace \
--project default-project \
--keys train/loss \
--format csv \
--out metrics.csv
rundial logs tail run_... --workspace default-workspace --project default-project --min-level info
rundial logs export run_... \
--workspace default-workspace \
--project default-project \
--format json \
--out logs.json
All commands accept the global --json flag for machine-readable output. Exit codes are stable:
0 for success, 1 for a command or transport error, and 2 for an authentication/authorization failure.
To run the CLI operational smoke test with the open-core stack running:
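Because the exit codes are stable, a script can branch on them. The sketch below is an illustration only: `describe_exit` and `run_rundial` are hypothetical helpers, and it assumes the global --json flag may be appended after the subcommand arguments and that stdout then contains a JSON document, neither of which is confirmed here.

```python
import json
import subprocess

# Meanings taken from the exit-code contract described above.
EXIT_MEANINGS = {
    0: "success",
    1: "command or transport error",
    2: "authentication/authorization failure",
}


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_rundial(args: list) -> tuple:
    """Run the CLI with --json (assumed flag position) and parse stdout if it is JSON."""
    proc = subprocess.run(["rundial", *args, "--json"], capture_output=True, text=True)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        payload = None  # output shape is not guaranteed by this sketch
    return proc.returncode, payload
```

For example, `code, payload = run_rundial(["workspace", "ls"])` followed by `describe_exit(code)` gives a readable outcome for logs or CI output.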
RUNDIAL_API_KEY=rdk_... python user_tests/cli_operational_parity_smoke.py \
--workspace default-workspace \
--project default-project
Start a run with workspace/project strings only:
rundial run start \
--endpoint http://127.0.0.1:8787 \
--workspace default-workspace \
--project default-project \
--name baseline-001 \
--kind training
Config precedence:
- CLI flags
- env vars (RUNDIAL_API_KEY, RUNDIAL_ENDPOINT, RUNDIAL_WORKSPACE, RUNDIAL_PROJECT)
- ~/.config/rundial/config.toml
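Assuming the list is ordered from highest to lowest priority, resolution of a single setting could look like the sketch below. `resolve_setting` is a hypothetical helper, and the keys used for the parsed config file (`api_key`, `endpoint`, and so on) are assumptions; the actual layout of config.toml is not documented here.

```python
# Environment variable names come from the list above; dict keys are illustrative.
ENV_NAMES = {
    "api_key": "RUNDIAL_API_KEY",
    "endpoint": "RUNDIAL_ENDPOINT",
    "workspace": "RUNDIAL_WORKSPACE",
    "project": "RUNDIAL_PROJECT",
}


def resolve_setting(name, cli_flags, env, file_config):
    """Return (value, source): CLI flag first, then env var, then config file."""
    if cli_flags.get(name) is not None:
        return cli_flags[name], "flag"
    env_value = env.get(ENV_NAMES[name])
    if env_value:
        return env_value, "env"
    if name in file_config:
        return file_config[name], "config"
    return None, "unset"
```

In practice `env` would be `os.environ` and `file_config` a dict parsed from the TOML file; passing them in explicitly keeps the function easy to test.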
Quick start (recommended)
import rundial as rd

with rd.init(
    workspace="team-alpha",
    project="mnist-demo",
    name="baseline-001",
    kind="training",
    endpoint="http://127.0.0.1:8787",
    api_key="rdk_...",
    mode="online",
) as run:
    run.log({"train/loss": 0.42, "train/acc": 0.91}, step=1)
    run.log_metric("eval/loss", value=0.31, time_ms=1_760_000_000_000)
    run.log_text("starting eval loop", level="info")
    run.checkpoint("checkpoints/model.pt", step=1)

# `run.finish()` / `run.close()` finalize the run as `completed`.
# Use `run.fail(...)` or `run.abort(...)` for explicit terminal outcomes.
This slug-first mode resolves workspace/project to the canonical internal run target before run start.
Use kind="agent" or kind="eval" for agent and evaluation runs; the default is
kind="training".
Logs and console capture
run.log_text(message, level="info") shares the same bounded, non-blocking queue as metric
logging. Messages are capped at 8 KiB, truncated lines are flagged, and queue drops are visible
through run.diagnostics().
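The 8 KiB cap with a truncation flag can be pictured with the sketch below. It assumes the cap is measured in UTF-8 bytes, which the text above does not state; `cap_message` and `MAX_LOG_BYTES` are hypothetical names rather than SDK API.

```python
MAX_LOG_BYTES = 8 * 1024  # 8 KiB, assumed to be counted in UTF-8 bytes


def cap_message(message: str, limit: int = MAX_LOG_BYTES) -> tuple:
    """Return (text, truncated). Text is cut so its UTF-8 encoding fits in limit bytes."""
    data = message.encode("utf-8")
    if len(data) <= limit:
        return message, False
    # errors="ignore" discards a multi-byte character that was cut in half at the boundary.
    return data[:limit].decode("utf-8", errors="ignore"), True
```

The returned flag corresponds to the idea of a "truncated line" marker: the caller can count flagged messages without inspecting their content.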
import rundial as rd

with rd.init(
    workspace="team-alpha",
    project="mnist-demo",
    name="logs-demo",
    endpoint="http://127.0.0.1:8787",
    api_key="rdk_...",
    capture_console=True,
) as run:
    print("stdout is mirrored into Rundial logs")
    run.log_text("manual warning", level="warn")
capture_console=True tees stdout as info and stderr as error. The caller still writes to
the original stream, and Rundial drops-and-counts when the bounded queue is full instead of
blocking the training process.
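The tee behavior described above can be sketched as a stream wrapper that always writes to the original stream first and only then tries to enqueue a copy. `TeeStream` is a hypothetical class for illustration; how Rundial actually installs its capture is not shown here.

```python
import io
import queue


class TeeStream(io.TextIOBase):
    """Hypothetical tee: write through to the original stream, then enqueue a copy."""

    def __init__(self, original, sink: queue.Queue, level: str) -> None:
        self._original = original
        self._sink = sink
        self._level = level
        self.dropped = 0

    def writable(self) -> bool:
        return True

    def write(self, text: str) -> int:
        written = self._original.write(text)  # the caller's output is never lost
        try:
            self._sink.put_nowait((self._level, text))
        except queue.Full:
            self.dropped += 1  # drop and count instead of blocking
        return written

    def flush(self) -> None:
        self._original.flush()
```

Such a wrapper could be installed with `contextlib.redirect_stdout(TeeStream(sys.stdout, sink, "info"))`, with a second instance at level "error" for stderr.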
Lightweight traces
Trace spans use the same non-blocking ingest worker and disk spool as metrics and logs. Attributes and events are normalized in the worker; large prompt, completion, or tool-output values above 16 KiB are uploaded through the artifact pipe and replaced on the span with a small evidence reference.
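As an illustration of the 16 KiB rule, the sketch below replaces oversized string attributes with a small reference record. The shape of that record (`evidence_ref`, `bytes`, `sha256`), the `upload` callback, and the function name `normalize_attrs` are all assumptions made for this example; the real evidence reference format is not documented here.

```python
import hashlib

LARGE_VALUE_BYTES = 16 * 1024  # values above this size are offloaded


def normalize_attrs(attrs: dict, upload) -> dict:
    """Replace string values larger than 16 KiB with a small reference dict.

    upload is a hypothetical callable that stores bytes and returns a reference id.
    """
    out = {}
    for key, value in attrs.items():
        if isinstance(value, str):
            data = value.encode("utf-8")
            if len(data) > LARGE_VALUE_BYTES:
                out[key] = {
                    "evidence_ref": upload(data),
                    "bytes": len(data),
                    "sha256": hashlib.sha256(data).hexdigest(),
                }
                continue
        out[key] = value
    return out
```

Keeping the span payload small this way means the metrics/log worker never has to ship large prompt or tool-output bodies itself.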
import rundial as rd

with rd.init(
    workspace="team-alpha",
    project="mnist-demo",
    name="agent-demo",
    kind="agent",
    endpoint="http://127.0.0.1:8787",
    api_key="rdk_...",
) as run:
    with run.trace("planner.step", attrs={"phase": "plan"}) as span:
        span.event("prompt.ready", {"tokens": 128})
        span.set_attrs({"model": "example-model"})
    run.tool_call("search", input={"q": "Ada Lovelace"}, output="large tool output...")
Artifacts and checkpoints
run.log_artifact(path_or_dir, name="checkpoint") enqueues artifact work and returns before
hashing or uploading files. A dedicated background uploader handles manifest hashing,
pre-signed upload URLs, multipart uploads for large files, and finalization without sharing
the metrics/log worker.
import rundial as rd

with rd.init(workspace="team-alpha", project="mnist-demo", api_key="rdk_...") as run:
    run.log_artifact("outputs/eval-report", name="eval-report")
    run.checkpoint("checkpoints/model.pt", step=100, keep_last=5)
    rd.checkpoint("checkpoints/model.pt", step=101)
Artifact upload jobs are journaled in the SDK spool directory and retried by the next client
process if an upload is interrupted. run.checkpoint(...) and the current-run convenience
rd.checkpoint(...) use artifact type checkpoint, alias latest, and a server-enforced
keep-last retention policy. The API default is to keep the latest 5 finalized checkpoints
per run/name when a client omits the hint; pass keep_last=K to tune it for a checkpoint
call.
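The keep-last policy is enforced by the server, but its effect can be pictured with a short sketch. It assumes "latest" means highest step, which is not stated above; `select_expired` and the `(step, artifact_id)` tuple layout are hypothetical.

```python
def select_expired(checkpoints: list, keep_last: int = 5) -> list:
    """Given (step, artifact_id) pairs, return the ids that fall outside the newest keep_last."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    newest_first = sorted(checkpoints, key=lambda item: item[0], reverse=True)
    return [artifact_id for _, artifact_id in newest_first[keep_last:]]
```

With seven finalized checkpoints and the default of 5, the two with the lowest steps would be selected for removal.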
To consume an artifact from another run, record lineage and download through a blocking handle:
import rundial as rd

with rd.init(workspace="team-alpha", project="mnist-demo", api_key="rdk_...") as run:
    artifact = run.use_artifact("checkpoint:latest")
    artifact.download("inputs/checkpoint")
Lineage UI is still in progress for the v1 artifact milestone.
Media
run.log(...) accepts image and table helper values for common visual inspection workflows.
Media bytes ride the artifact uploader, while Rundial stores only a bounded manifest row for
querying and display.
import rundial as rd

with rd.init(workspace="team-alpha", project="mnist-demo", api_key="rdk_...") as run:
    run.log({"samples": rd.Image("outputs/sample-grid.png", caption="validation samples")}, step=10)
    run.log(
        {
            "predictions": rd.Table(
                columns=["id", "label", "score"],
                rows=[["img-1", "cat", 0.91], ["img-2", "dog", 0.87]],
            )
        },
        step=10,
    )
rd.Image(...) accepts filesystem paths, PIL-like objects with save(...), and uint8
numpy-like arrays shaped (height, width), (height, width, 1), (height, width, 3), or
(height, width, 4). Array and PIL-like serialization happens in the artifact worker, not
inside run.log(...). File-backed media jobs are replayable through the artifact journal;
generated media is best-effort until the worker materializes the generated file.
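The accepted array layouts can be checked up front before logging. The helper below is a hypothetical pre-flight check, not part of the SDK; it takes a plain shape tuple and a dtype name so it runs without numpy installed.

```python
def is_supported_image_array(shape: tuple, dtype_name: str) -> bool:
    """True for uint8 arrays shaped (H, W), (H, W, 1), (H, W, 3), or (H, W, 4)."""
    if dtype_name != "uint8":
        return False
    if len(shape) == 2:
        return True
    return len(shape) == 3 and shape[2] in (1, 3, 4)
```

With a numpy array `arr`, the call would be `is_supported_image_array(arr.shape, arr.dtype.name)`.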
Framework Integrations
Install optional framework adapters only when you need them:
pip install "rundial[integrations]"
| Framework | Import | What it maps |
|---|---|---|
| PyTorch Lightning | from rundial.integrations import RundialLogger | hyperparams to run config, metrics to run.log(...), checkpoints to artifacts |
| Hugging Face Transformers | from rundial.integrations import RundialCallback | Trainer args/model config to run config, logs/eval metrics to run.log(...), saved checkpoints to artifacts |
| Keras | from rundial.integrations import RundialKerasCallback | fit/optimizer params to run config, epoch/batch metrics to run.log(...), checkpoint paths to artifacts |
The base rundial install has no hard framework dependencies. Adapter imports remain safe
without Lightning, Transformers, or Keras installed; installing the extra provides the native
callback base classes for framework type checks.
W&B Compatibility
For common W&B-style training scripts, swap only the import line:
import rundial.compat.wandb as wandb
The shim supports wandb.init, wandb.log, wandb.config, wandb.finish, run.summary,
wandb.Image, wandb.Table, wandb.watch, wandb.define_metric, and wandb.login.
Unsupported symbols raise NotImplementedError with a pointer to the compatibility table in
docs/wandb-compat.md.
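One common way to get "unsupported symbols raise NotImplementedError" behavior is a fallback attribute hook. The sketch below shows that general technique on a plain object; `CompatNamespace` is a hypothetical class and says nothing about how the actual shim is built.

```python
class CompatNamespace:
    """Hypothetical namespace: known names resolve, anything else raises NotImplementedError."""

    def __init__(self, supported: dict) -> None:
        self._supported = dict(supported)

    def __getattr__(self, name: str):
        # __getattr__ only runs when normal attribute lookup has already failed.
        if name.startswith("_"):
            raise AttributeError(name)  # keep private/dunder lookups conventional
        supported = self.__dict__.get("_supported", {})
        if name in supported:
            return supported[name]
        raise NotImplementedError(
            f"wandb.{name} is not supported by the shim; see docs/wandb-compat.md"
        )
```

Failing loudly at the first unsupported call, with a pointer to the compatibility table, is usually easier to debug than a silent no-op.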
Resume existing runs
Use run_id with an explicit resume mode when restarting a crashed or interrupted job:
import rundial as rd

with rd.init(
    workspace="team-alpha",
    project="mnist-demo",
    run_id="run_abc123",
    resume="allow",
    endpoint="http://127.0.0.1:8787",
    api_key="rdk_...",
) as run:
    run.log({"train/loss": 0.38}, step=50)
Resume modes:
- resume="never" (default): create run_id only if it does not already exist.
- resume="allow": attach to a running run or create it if missing; terminal runs are not reopened.
- resume="must": require an existing run; terminal runs are explicitly reopened as running.
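The three modes can be read as a small decision table. The sketch below encodes one interpretation: the action labels (`create`, `attach`, `reopen`, `reject`) are hypothetical, and treating every refused case as `reject` is an assumption, since the exact error behavior is not specified above.

```python
def resume_action(mode: str, existing_state) -> str:
    """existing_state is None when the run is missing, "running", or any terminal state."""
    if mode == "never":
        return "create" if existing_state is None else "reject"
    if mode == "allow":
        if existing_state is None:
            return "create"
        return "attach" if existing_state == "running" else "reject"
    if mode == "must":
        if existing_state is None:
            return "reject"
        return "attach" if existing_state == "running" else "reopen"
    raise ValueError(f"unknown resume mode: {mode!r}")
```

Read this way, "allow" is the safe default for restart scripts, while "must" is the only mode that changes the state of a finished run.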
Duplicate steps are resolved at query time. Rundial keeps raw metric rows append-only, but series
queries show the latest accepted value per (runId, metricKey, step) using ingest time, with a
stable row-id tie breaker. This keeps training-loop ingest fast while resumed curves remain
monotonic by step.
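Query-time resolution of duplicate steps can be sketched as "keep the row with the greatest (ingest time, row id) per key". The row field names below are hypothetical, and letting the higher row id win a tie is an assumption; the text above only says the tie breaker is stable.

```python
def latest_per_step(rows: list) -> list:
    """Pick one row per (run_id, key, step): latest ingest_ms, then highest row_id."""
    best = {}
    for row in rows:
        group = (row["run_id"], row["key"], row["step"])
        rank = (row["ingest_ms"], row["row_id"])
        current = best.get(group)
        if current is None or rank > (current["ingest_ms"], current["row_id"]):
            best[group] = row
    # Return the series ordered by step so resumed curves read left to right.
    return sorted(best.values(), key=lambda r: (r["run_id"], r["key"], r["step"]))
```

Because the raw rows are never rewritten, ingest stays append-only and the cost of deduplication is paid only when a series is read.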
Discovery helpers
import rundial as rd

client = rd.Client(
    endpoint="http://127.0.0.1:8787",
    api_key="rdk_...",
    spool_enabled=False,
    start_worker_on_init=False,
)
print(client.whoami())
print(client.list_workspaces())
print(client.list_projects("default-workspace"))
client.close(timeout_seconds=0.1, drain=False)
If the server does not expose /api/v1/runs/resolve-target, slug-first run start fails with an actionable stale-build error. Rebuild or restart the API server and retry.
Runtime notes
- run.log() / run.log_metric() are non-blocking and never perform network or disk I/O.
- run.log_text() and opt-in console capture use the same non-blocking queue and expose log_lines_truncated, dropped_log_lines_queue_full, and dropped_log_lines_invalid diagnostics.
- system metrics are sampled by a background thread by default and logged as ordinary system/* metrics; pass system_metrics=False to rd.init(...) to opt out, or system_metrics_interval_seconds=... to tune the cadence (minimum 2 seconds).
- run.finish() / run.close() flush and finalize the run; use client.close(...) when you only want to release the client transport.
- NaN and infinite metric values are dropped without raising, counted in run.diagnostics().non_finite_dropped, and warn once per metric key.
- disk spool is enabled by default at .rundial_spool and is bounded by size/age.
- if disk spool writes fail, fallback memory buffering stays bounded and drops oldest points.
- close() returns within the requested timeout plus a bounded transport wait; when it cannot send all pending points before the deadline, un-sent points are handed to the disk spool and re-sent by the next process.
- run.diagnostics().pending_spooled_batches reports durable batches waiting for delivery.
- worker transport can gzip large payloads (gzip_enabled, gzip_min_bytes).
- use run.diagnostics() to inspect queue pressure, retries, and drop counters.
- modes:
  - online (default): upload in background with retries/spool fallback
  - offline: buffer to spool only (no upload attempts)
  - disabled: safe no-op logging for tests and dry-runs
- distributed policy:
  - distributed="rank0" (default): only rank 0 emits logs
  - distributed="all": all ranks emit logs (use with caution for cardinality/volume)
- rank detection uses common env vars (RANK, LOCAL_RANK, SLURM_PROCID, etc.); override explicitly with distributed_rank=<int>.
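The distributed policy and rank detection notes above can be sketched as two small functions. The lookup order of the environment variables and the fallback to rank 0 are assumptions, and only the three variables named above are checked; `detect_rank` and `should_emit` are hypothetical names.

```python
# Only the variables named in the notes; the lookup order is an assumption.
RANK_ENV_VARS = ("RANK", "LOCAL_RANK", "SLURM_PROCID")


def detect_rank(env: dict, override=None) -> int:
    """Return the explicit override if given, else the first parseable rank variable."""
    if override is not None:
        return override
    for name in RANK_ENV_VARS:
        raw = env.get(name)
        if raw is None:
            continue
        try:
            return int(raw)
        except ValueError:
            continue  # ignore malformed values and try the next variable
    return 0  # assumed default: a single-process job behaves like rank 0


def should_emit(policy: str, rank: int) -> bool:
    """rank0: only rank 0 emits; all: every rank emits."""
    if policy == "all":
        return True
    if policy == "rank0":
        return rank == 0
    raise ValueError(f"unknown distributed policy: {policy!r}")
```

In a launcher that sets unusual variables, passing the rank explicitly (the `override` argument here, `distributed_rank=<int>` in the SDK) avoids relying on detection at all.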
Backward-compatible low-level API
from rundial_sdk import RundialClient
RundialClient remains supported for advanced/manual lifecycle control.
Benchmark guardrail
Run the Phase 4 benchmark/guardrail script:
bun run test:phase4:sdk:benchmark
The command validates hot-path latency and bounded spool behavior under sustained retryable failures.
File details
Details for the file rundial-1.0.0rc1.tar.gz.
File metadata
- Download URL: rundial-1.0.0rc1.tar.gz
- Upload date:
- Size: 88.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 875a95342f39b7543a3b8a95c706e0eb527152c651065f1e544a0d49cc1de571 |
| MD5 | 53973703b49ba86623e892f0760ff19e |
| BLAKE2b-256 | 4908c8e3775834417e04369e9b13be7edc176288d3d9f287a26fdbf985eda81b |
Provenance
The following attestation bundles were made for rundial-1.0.0rc1.tar.gz:
Publisher: release.yml on rundial-dev/rundial
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rundial-1.0.0rc1.tar.gz
- Subject digest: 875a95342f39b7543a3b8a95c706e0eb527152c651065f1e544a0d49cc1de571
- Sigstore transparency entry: 1853111744
- Sigstore integration time:
- Permalink: rundial-dev/rundial@607f84e55e3753aba58bd48e2674c2b0b29d62b2
- Branch / Tag: refs/tags/v1.0.0-rc.1
- Owner: https://github.com/rundial-dev
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: self-hosted
- Publication workflow: release.yml@607f84e55e3753aba58bd48e2674c2b0b29d62b2
- Trigger Event: push
File details
Details for the file rundial-1.0.0rc1-py3-none-any.whl.
File metadata
- Download URL: rundial-1.0.0rc1-py3-none-any.whl
- Upload date:
- Size: 94.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42dd3d2eb6ca9903d8bd0149de1c2ae8bd5048778995989193e9840854b45468 |
| MD5 | e74d282690261261f66bca6927feb0c1 |
| BLAKE2b-256 | 4e90260b06dfffc559adbce777ac202b211215104e6ffd52b5d2ab4f892dd581 |
Provenance
The following attestation bundles were made for rundial-1.0.0rc1-py3-none-any.whl:
Publisher: release.yml on rundial-dev/rundial
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rundial-1.0.0rc1-py3-none-any.whl
- Subject digest: 42dd3d2eb6ca9903d8bd0149de1c2ae8bd5048778995989193e9840854b45468
- Sigstore transparency entry: 1853111870
- Sigstore integration time:
- Permalink: rundial-dev/rundial@607f84e55e3753aba58bd48e2674c2b0b29d62b2
- Branch / Tag: refs/tags/v1.0.0-rc.1
- Owner: https://github.com/rundial-dev
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: self-hosted
- Publication workflow: release.yml@607f84e55e3753aba58bd48e2674c2b0b29d62b2
- Trigger Event: push