Policy-driven sync and export toolkit for W&B runs

dr-wandb

Policy-driven sync, export, and update tooling for Weights & Biases.

Installation

# CLI tool
uv tool install dr-wandb

# Or as a library
uv add dr-wandb

Authentication

wandb login
# or
export WANDB_API_KEY=your_api_key_here

CLI

Export canonical project data

wandb-export ENTITY PROJECT OUTPUT_DIR [OPTIONS]

Options:
  --output-format [parquet|jsonl]   Output format (default: parquet)
  --fetch-mode [incremental|full_reconcile]
                                    Run selection mode (default: incremental)
  --runs-per-page INTEGER           Runs fetched per API call (default: 500)
  --state-path TEXT                 Optional explicit sync state path
  --save-every INTEGER              Persist state every N runs (default: 25)
  --checkpoint-every-runs INTEGER   Write checkpoint chunk every N runs (default: 25)
  --no-incremental                  Disable checkpointed export (legacy single-shot output)
  --no-finalize-compact             Keep checkpoint chunks only (skip compact final tables)
  --inspection-sample-rows INTEGER  Sample size for per-checkpoint inspection stats (default: 5)
  --policy-module TEXT              Policy module (default: dr_wandb.sync_policy)
  --policy-class TEXT               Policy class (default: NoopPolicy)
  --output-json TEXT                Optional summary output path

wandb-export now uses incremental checkpointing by default. During export it writes:

  • OUTPUT_DIR/_checkpoints/runs/chunk-*.parquet
  • OUTPUT_DIR/_checkpoints/history/chunk-*.parquet
  • OUTPUT_DIR/_checkpoints/manifest.json
  • OUTPUT_DIR/_checkpoints/inspection.jsonl

If the job is interrupted, rerunning it with the same --state-path resumes from the last checkpoint. The final compact outputs are built by deduplicating the checkpoint chunks.
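The README says the final compact tables are deduplicated from the checkpoint chunks but does not say which key is used or which duplicate wins. The sketch below assumes last-write-wins in chunk order and works on in-memory rows rather than Parquet files; the `dedupe_chunks` helper and the `run_id` column name are hypothetical.

```python
from collections.abc import Callable, Hashable, Iterable


def dedupe_chunks(
    chunks: Iterable[list[dict]],
    key: Callable[[dict], Hashable],
) -> list[dict]:
    """Merge chunks in write order; a later row replaces an earlier row with the same key."""
    latest: dict[Hashable, dict] = {}
    for chunk in chunks:
        for row in chunk:
            # Reassigning an existing key keeps its original position in the dict.
            latest[key(row)] = row
    return list(latest.values())


# Hypothetical runs-table chunks: run "a" is written again after a resume.
run_chunks = [
    [{"run_id": "a", "state": "running"}, {"run_id": "b", "state": "finished"}],
    [{"run_id": "a", "state": "finished"}],
]
runs = dedupe_chunks(run_chunks, key=lambda row: row["run_id"])
# [{'run_id': 'a', 'state': 'finished'}, {'run_id': 'b', 'state': 'finished'}]
```

For the history table the key would plausibly be a (run, `_step`) pair, for example `key=lambda row: (row["run_id"], row["_step"])`.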

--fetch-mode incremental is now the default for wandb-export, wandb-sync, and wandb-plan-patches. In that mode dr_wandb:

  • fetches newly created runs with a createdAt >= last_seen_created_at filter;
  • revisits only runs that are still marked non-terminal in the saved state;
  • avoids history scans unless the active policy explicitly requests them.

Use --fetch-mode full_reconcile to force a full project rescan.
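The incremental selection rule can be sketched as a pure function over a local list of runs. This is an illustration, not dr_wandb's code: in the real tool the `createdAt` filter is applied in the W&B API query, and `plan_incremental_fetch` and its argument shapes are hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def plan_incremental_fetch(
    project_runs: list[tuple[str, datetime]],
    terminal_by_run: dict[str, bool],
    last_seen_created_at: datetime | None,
) -> tuple[list[str], datetime | None]:
    """Return (run ids to process, next cursor) for one incremental pass."""
    # 1. Newly created runs. ">=" re-includes runs created exactly at the cursor,
    #    so a run sharing that timestamp is never skipped; overlap is deduplicated later.
    new_runs = [
        (run_id, created)
        for run_id, created in project_runs
        if last_seen_created_at is None or created >= last_seen_created_at
    ]
    # 2. Runs the saved state still marks as non-terminal.
    revisit = {run_id for run_id, terminal in terminal_by_run.items() if not terminal}
    selected = {run_id for run_id, _ in new_runs} | revisit
    # Advance the cursor to the newest creation time seen in this pass.
    next_cursor = max((created for _, created in new_runs), default=last_seen_created_at)
    return sorted(selected), next_cursor


runs = [
    ("a", datetime(2024, 1, 1)),
    ("b", datetime(2024, 1, 2)),
    ("c", datetime(2024, 1, 3)),
]
state = {"a": True, "b": False}  # run id -> is terminal
ids, cursor = plan_incremental_fetch(runs, state, datetime(2024, 1, 2))
# "a" is skipped: it predates the cursor and is already terminal.
```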

To incrementally update the existing ml-moe/moe export on this machine, rerun the same command with the same --state-path:

uv run wandb-export \
  ml-moe moe \
  /Users/daniellerothermel/drotherm/repos/ml-moe/data/wandb_export \
  --state-path /Users/daniellerothermel/drotherm/repos/ml-moe/data/.sync/ml_moe_moe_state.json \
  --output-json /Users/daniellerothermel/drotherm/repos/ml-moe/data/.sync/last_export_summary.json

Sync + patch workflows

wandb-sync ENTITY PROJECT --policy-module my_pkg.my_policy --policy-class MyPolicy
wandb-bootstrap-export ENTITY PROJECT ./old_export ./new_export --policy-module my_pkg.my_policy --policy-class MyPolicy
wandb-inspect-state ENTITY PROJECT --state-path ./state.json
wandb-plan-patches ENTITY PROJECT ./patches.jsonl --policy-module my_pkg.my_policy --policy-class MyPolicy
wandb-apply-patches ./patches.jsonl            # dry-run
wandb-apply-patches ./patches.jsonl --apply    # writes updates

wandb-sync and wandb-plan-patches also default to --fetch-mode incremental. Pass --fetch-mode full_reconcile when you need a full project rescan.
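Planning and applying are separate steps, and wandb-apply-patches only writes when --apply is passed. A minimal sketch of that dry-run pattern follows; the patch record shape (`run_id`, `updates`) is an assumption, since the README does not document the `patches.jsonl` schema, and `update_fn` is a hypothetical stand-in for whatever call actually writes to W&B.

```python
from __future__ import annotations

import json
from collections.abc import Callable, Iterable


def apply_patches(
    lines: Iterable[str],
    apply: bool = False,
    update_fn: Callable[[str, dict], None] | None = None,
) -> list[tuple[str, dict]]:
    """Parse JSONL patches; only call update_fn when apply=True (dry-run otherwise)."""
    planned: list[tuple[str, dict]] = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        patch = json.loads(line)
        run_id, updates = patch["run_id"], patch["updates"]  # hypothetical schema
        planned.append((run_id, updates))
        if apply:
            if update_fn is None:
                raise ValueError("update_fn is required when apply=True")
            update_fn(run_id, updates)
        else:
            print(f"[dry-run] would update {run_id}: {updates}")
    return planned
```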

wandb-bootstrap-export reads an existing compact export (*_runs.*, *_history.*), rebuilds sync state locally, reapplies the active policy, and seeds a fresh output directory with a single merged checkpoint baseline. It now streams large history tables instead of materializing the whole history export in memory. Use --overwrite-output when you want to replace an existing bootstrap target directory or state file.
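Streaming a large history table means holding one bounded batch in memory instead of the whole table. Below is a sketch of the idea for a JSONL history export using only the standard library; `iter_history_batches` is a hypothetical helper, not the tool's implementation.

```python
import io
import json
from collections.abc import Iterator
from itertools import islice
from typing import TextIO


def iter_history_batches(fh: TextIO, batch_size: int = 10_000) -> Iterator[list[dict]]:
    """Yield parsed JSONL rows in batches of at most batch_size.

    The generator reads lazily, so only one batch is materialized at a time.
    """
    rows = (json.loads(line) for line in fh if line.strip())
    while batch := list(islice(rows, batch_size)):
        yield batch


sample = io.StringIO("\n".join(json.dumps({"_step": i}) for i in range(5)))
print([len(batch) for batch in iter_history_batches(sample, batch_size=2)])  # [2, 2, 1]
```

For Parquet exports, `pyarrow.parquet.ParquetFile(path).iter_batches(batch_size=...)` provides the same bounded-memory iteration.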

wandb-inspect-state reads the saved sync state and reports tracked run counts by status, including terminal, ignore, and non-terminal runs. Use --show-runs non_terminal|ignore|terminal --limit N when you want a small sample of the matching runs.
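A rough equivalent of the status report is sketched below, assuming a hypothetical state layout of `{"runs": {run_id: {"status": ...}}}`. The real state file schema is not documented here, so `summarize_state` is illustrative only; load the file with `json.load` before calling it.

```python
from __future__ import annotations

from collections import Counter


def summarize_state(
    state: dict, show: str | None = None, limit: int = 5
) -> tuple[Counter, list[str]]:
    """Count tracked runs by status and optionally sample run ids with one status."""
    runs = state.get("runs", {})  # hypothetical layout
    counts = Counter(info["status"] for info in runs.values())
    sample: list[str] = []
    if show is not None:
        sample = [run_id for run_id, info in runs.items() if info["status"] == show][:limit]
    return counts, sample


state = {
    "runs": {
        "a": {"status": "terminal"},
        "b": {"status": "non_terminal"},
        "c": {"status": "non_terminal"},
        "d": {"status": "ignore"},
    }
}
counts, sample = summarize_state(state, show="non_terminal", limit=1)
```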

Library usage

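from pathlib import Path

from dr_wandb.sync_engine import SyncEngine
from dr_wandb.sync_policy import NoopPolicy
from dr_wandb.sync_types import ExportConfig

# NoopPolicy is the same default policy the CLI uses (--policy-class NoopPolicy)
engine = SyncEngine(policy=NoopPolicy())

# Export my-team/my-project into ./data as Parquet tables
summary = engine.export_project(
    ExportConfig(
        entity="my-team",
        project="my-project",
        output_dir=Path("./data"),
        output_format="parquet",
    )
)

# Run and history row counts reported by the export summary
print(summary.run_count, summary.history_count)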

Core concepts

SyncPolicy

A policy controls which history data is retrieved for each run and how the run is classified, patched, and marked terminal. It implements these hooks:

  • select_history_keys(ctx)
  • select_history_window(ctx)
  • classify_run(ctx, history_tail)
  • infer_patch(ctx, history_tail)
  • should_update(ctx, patch)
  • is_terminal(ctx, decision)
  • on_error(ctx, exc)
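To make the hook list concrete, here is a hypothetical policy that tags a run once its latest `loss` drops below a threshold, plus one plausible order in which an engine might call the hooks. The `RunContext` type, the return types, the window semantics, and the call order are all assumptions: this is not the `dr_wandb` base class or engine, so check `dr_wandb.sync_policy` before writing a real policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RunContext:
    """Hypothetical stand-in for the ctx object passed to policy hooks."""

    run_id: str
    state: str  # e.g. "running", "finished"
    summary: dict[str, Any] = field(default_factory=dict)


class LossThresholdPolicy:
    """Hypothetical policy: tag a run "converged" once its latest loss is below a threshold."""

    TERMINAL_STATES = {"finished", "failed", "crashed"}

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def select_history_keys(self, ctx: RunContext) -> list[str]:
        return ["loss"]  # fetch only the metric this policy needs

    def select_history_window(self, ctx: RunContext) -> int:
        return 10  # assumed meaning: number of trailing history rows

    def classify_run(self, ctx: RunContext, history_tail: list[dict]) -> str:
        if history_tail and history_tail[-1]["loss"] < self.threshold:
            return "converged"
        return "pending"

    def infer_patch(self, ctx: RunContext, history_tail: list[dict]) -> dict:
        if self.classify_run(ctx, history_tail) == "converged":
            return {"tags": ["converged"]}
        return {}

    def should_update(self, ctx: RunContext, patch: dict) -> bool:
        return bool(patch)  # skip empty patches

    def is_terminal(self, ctx: RunContext, decision: str) -> bool:
        return ctx.state in self.TERMINAL_STATES

    def on_error(self, ctx: RunContext, exc: Exception) -> None:
        print(f"skipping {ctx.run_id}: {exc!r}")


def process_run(
    policy: LossThresholdPolicy,
    ctx: RunContext,
    fetch_history: Callable[[str, list[str], int], list[dict]],
) -> dict | None:
    """One plausible hook order for a single run (an assumption, not dr_wandb's engine)."""
    try:
        keys = policy.select_history_keys(ctx)
        window = policy.select_history_window(ctx)
        tail = fetch_history(ctx.run_id, keys, window)
        decision = policy.classify_run(ctx, tail)
        patch = policy.infer_patch(ctx, tail)
        return {
            "decision": decision,
            "patch": patch if policy.should_update(ctx, patch) else None,
            "terminal": policy.is_terminal(ctx, decision),
        }
    except Exception as exc:
        policy.on_error(ctx, exc)
        return None


# Fake history source so the sketch runs without W&B access.
histories = {"a": [{"loss": 0.9}, {"loss": 0.3}], "b": [{"loss": 1.2}]}


def fake_fetch(run_id: str, keys: list[str], window: int) -> list[dict]:
    return [{k: row[k] for k in keys} for row in histories[run_id][-window:]]
```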

Canonical export outputs

wandb-export writes:

  • runs table: one row per run with run payload + policy/cursor fields
  • history table: one row per history event with _step/_timestamp/_runtime/_wandb + metric payload
  • manifest JSON: schema/version, policy identity, counts, and file paths
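A common first query on the history table is the latest row per run. The sketch below assumes `--output-format jsonl` (one JSON object per line) and a run-identifier column; the name `run_id` is hypothetical, since the README lists only `_step`, `_timestamp`, `_runtime`, and `_wandb` as fixed history fields.

```python
import json
from collections.abc import Iterable


def latest_history_rows(lines: Iterable[str], run_key: str = "run_id") -> dict[str, dict]:
    """Return the history row with the highest _step for each run."""
    latest: dict[str, dict] = {}
    for line in lines:
        if not line.strip():
            continue
        row = json.loads(line)
        current = latest.get(row[run_key])
        # Rows may arrive out of step order, so compare rather than overwrite.
        if current is None or row["_step"] > current["_step"]:
            latest[row[run_key]] = row
    return latest


lines = [
    json.dumps({"run_id": "a", "_step": 1, "loss": 0.4}),
    json.dumps({"run_id": "a", "_step": 0, "loss": 0.9}),
    json.dumps({"run_id": "b", "_step": 0, "loss": 1.1}),
]
latest = latest_history_rows(lines)
```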

License

MIT

Download files

Source Distribution

dr_wandb-1.1.0.tar.gz (98.5 kB)

Uploaded Source

Built Distribution

dr_wandb-1.1.0-py3-none-any.whl (48.0 kB)

Uploaded Python 3

File details

Details for the file dr_wandb-1.1.0.tar.gz.

File metadata

  • Download URL: dr_wandb-1.1.0.tar.gz
  • Upload date:
  • Size: 98.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.0

File hashes

Hashes for dr_wandb-1.1.0.tar.gz
Algorithm Hash digest
SHA256 f86275e88b4b82838ffb74c6da0c43de71692b48b447168e9aae4e32cf2a9e26
MD5 1b03fc36da7c2c80a5cea0e37ec2733c
BLAKE2b-256 46c8cf783be73f73972b5da26665bde43a1aabfe00080bea78849ab94419975c

File details

Details for the file dr_wandb-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: dr_wandb-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 48.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.0

File hashes

Hashes for dr_wandb-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8f214d90376b6ab7cefc60842eeda3f5d3a7a0946a5d8eaccf5792d466e999a1
MD5 529a586b96e1208341907044729bcd4c
BLAKE2b-256 4b1e0d66a84519a7b3ea1d3f75da83af16df30b0835b06f4986a783f7e5d3afa
