Policy-driven sync and export toolkit for W&B runs

dr-wandb

Policy-driven sync, export, and update tooling for Weights & Biases.

Installation

# CLI tool
uv tool install dr-wandb

# Or as a library
uv add dr-wandb

Authentication

wandb login
# or
export WANDB_API_KEY=your_api_key_here

CLI

Export canonical project data

wandb-export ENTITY PROJECT OUTPUT_DIR [OPTIONS]

Options:
  --output-format [parquet|jsonl]            Output format (default: parquet)
  --fetch-mode [incremental|full_reconcile]  Run selection mode (default: incremental)
  --runs-per-page INTEGER                    Runs fetched per API call (default: 500)
  --state-path TEXT                          Optional explicit sync state path
  --save-every INTEGER                       Persist state every N runs (default: 25)
  --checkpoint-every-runs INTEGER            Write checkpoint chunk every N runs (default: 25)
  --no-incremental                           Disable checkpointed export (legacy single-shot output)
  --no-finalize-compact                      Keep checkpoint chunks only (skip compact final tables)
  --inspection-sample-rows INTEGER           Sample size for per-checkpoint inspection stats (default: 5)
  --policy-module TEXT                       Policy module (default: dr_wandb.sync_policy)
  --policy-class TEXT                        Policy class (default: NoopPolicy)
  --output-json TEXT                         Optional summary output path

wandb-export now uses incremental checkpointing by default. During export it writes:

  • OUTPUT_DIR/_checkpoints/runs/chunk-*.parquet
  • OUTPUT_DIR/_checkpoints/history/chunk-*.parquet
  • OUTPUT_DIR/_checkpoints/manifest.json
  • OUTPUT_DIR/_checkpoints/inspection.jsonl

An interrupted job can be resumed by rerunning it with the same --state-path. The final compact outputs are built from the checkpoint chunks with duplicates removed.
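To check how far an export got before an interruption, it can help to list the chunk files written so far. The helper below is a minimal sketch that relies only on the checkpoint layout listed above; list_checkpoint_chunks is a hypothetical name and is not part of dr_wandb.

```python
from pathlib import Path


def _chunks(directory: Path) -> list[Path]:
    # A missing directory simply means no chunks have been written yet.
    if not directory.is_dir():
        return []
    return sorted(directory.glob("chunk-*.parquet"))


def list_checkpoint_chunks(output_dir: Path) -> dict[str, list[Path]]:
    """Return run and history chunk files under OUTPUT_DIR/_checkpoints, sorted by name."""
    checkpoints = output_dir / "_checkpoints"
    return {
        "runs": _chunks(checkpoints / "runs"),
        "history": _chunks(checkpoints / "history"),
    }


chunks = list_checkpoint_chunks(Path("./data"))
print(len(chunks["runs"]), "run chunks,", len(chunks["history"]), "history chunks")
```

The sketch only counts files. It does not read manifest.json or inspection.jsonl, whose schemas are not described here.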

--fetch-mode incremental is now the default for wandb-export, wandb-sync, and wandb-plan-patches. In that mode dr_wandb:

  • fetches newly created runs with a createdAt >= last_seen_created_at filter;
  • revisits only runs that are still marked non-terminal in the saved state;
  • avoids history scans unless the active policy explicitly requests them.

Use --fetch-mode full_reconcile to force a full project rescan.
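The incremental selection rule can be sketched in a few lines. This is an illustration of the logic described above, not dr_wandb's implementation: SavedState, select_runs, and the run dictionaries are hypothetical stand-ins, and the real tool applies the createdAt filter on the API side rather than over a local list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SavedState:
    # Hypothetical stand-in for the saved sync state; not dr_wandb's real schema.
    last_seen_created_at: Optional[datetime] = None
    non_terminal_run_ids: set = field(default_factory=set)


def select_runs(runs: list, state: SavedState) -> list:
    """Pick run ids to fetch: newly created runs plus runs still marked non-terminal."""
    selected = []
    for run in runs:
        is_new = (
            state.last_seen_created_at is None
            or run["created_at"] >= state.last_seen_created_at
        )
        if is_new or run["id"] in state.non_terminal_run_ids:
            selected.append(run["id"])
    return selected


state = SavedState(
    last_seen_created_at=datetime(2024, 1, 10, tzinfo=timezone.utc),
    non_terminal_run_ids={"run-b"},
)
runs = [
    {"id": "run-a", "created_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},   # old, terminal
    {"id": "run-b", "created_at": datetime(2024, 1, 6, tzinfo=timezone.utc)},   # old, non-terminal
    {"id": "run-c", "created_at": datetime(2024, 1, 12, tzinfo=timezone.utc)},  # new
]
print(select_runs(runs, state))  # ['run-b', 'run-c']
```

With an empty state every run counts as new, which is roughly what a first export or a full_reconcile pass would visit.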

To update an existing export incrementally, rerun wandb-export with the same output directory and --state-path. For example, for the ml-moe/moe export:

uv run wandb-export \
  ml-moe moe \
  /Users/daniellerothermel/drotherm/repos/ml-moe/data/wandb_export \
  --state-path /Users/daniellerothermel/drotherm/repos/ml-moe/data/.sync/ml_moe_moe_state.json \
  --output-json /Users/daniellerothermel/drotherm/repos/ml-moe/data/.sync/last_export_summary.json

Sync + patch workflows

wandb-sync ENTITY PROJECT --policy-module my_pkg.my_policy --policy-class MyPolicy
wandb-bootstrap-export ENTITY PROJECT ./old_export ./new_export --policy-module my_pkg.my_policy --policy-class MyPolicy
wandb-inspect-state ENTITY PROJECT --state-path ./state.json
wandb-plan-patches ENTITY PROJECT ./patches.jsonl --policy-module my_pkg.my_policy --policy-class MyPolicy
wandb-apply-patches ./patches.jsonl            # dry-run
wandb-apply-patches ./patches.jsonl --apply    # writes updates

wandb-sync and wandb-plan-patches also default to --fetch-mode incremental. Pass --fetch-mode full_reconcile when you need a full project rescan.

wandb-bootstrap-export reads an existing compact export (*_runs.*, *_history.*), rebuilds sync state locally, reapplies the active policy, and seeds a fresh output directory with a single merged checkpoint baseline. It now streams large history tables instead of materializing the whole history export in memory. Use --overwrite-output when you want to replace an existing bootstrap target directory or state file.

wandb-inspect-state reads the saved sync state and reports tracked run counts by status, including terminal, ignore, and non-terminal runs. Use --show-runs non_terminal|ignore|terminal --limit N when you want a small sample of the matching runs.

Library usage

from pathlib import Path

from dr_wandb.sync_engine import SyncEngine
from dr_wandb.sync_policy import NoopPolicy
from dr_wandb.sync_types import ExportConfig

engine = SyncEngine(policy=NoopPolicy())
summary = engine.export_project(
    ExportConfig(
        entity="my-team",
        project="my-project",
        output_dir=Path("./data"),
        output_format="parquet",
    )
)

print(summary.run_count, summary.history_count)

Loading extracted run metadata

Use the loader functions in dr_wandb.run_metadata when reading runs_raw.jsonl exports back in. Both loaders return only the latest raw snapshot per run_id, ordered by createdAt descending.

from pathlib import Path

from dr_wandb.run_metadata import (
    load_canonical_run_metadata,
    load_raw_run_record_dicts,
)

data_root = Path("./data")

canonical_runs = load_canonical_run_metadata(
    export_name="moe_runs",
    data_root=data_root,
)

raw_run_dicts = load_raw_run_record_dicts(
    export_name="moe_runs",
    data_root=data_root,
)

Choose the loader based on the shape you want:

  • load_canonical_run_metadata(...) returns CanonicalRunMetadata models with promoted fields like name, config, summaryMetrics, and historyKeys.
  • load_raw_run_record_dicts(...) returns RawRunRecord-shaped dictionaries with the original outer record fields: run_id, entity, project, exported_at, raw_run_hash, and raw_run.

These are the supported read paths for run metadata exports. They intentionally deduplicate multiple raw rows for the same run and should be preferred over line-by-line access to runs_raw.jsonl.
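To make the deduplication concrete, the sketch below shows one way "latest raw snapshot per run_id, ordered by createdAt descending" could be computed from raw lines. It is not the loaders' actual code and is no substitute for them. It assumes that "latest" is decided by exported_at, that createdAt sits inside raw_run, and that both are ISO-8601 strings in one consistent format; latest_snapshots is a hypothetical name.

```python
import json


def latest_snapshots(jsonl_lines):
    """Keep the most recently exported record per run_id, newest run first."""
    latest = {}
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        current = latest.get(record["run_id"])
        # Assumption: exported_at strings share one ISO-8601 format,
        # so string comparison matches chronological order.
        if current is None or record["exported_at"] > current["exported_at"]:
            latest[record["run_id"]] = record
    # Assumption: createdAt is a field of the nested raw_run payload.
    return sorted(
        latest.values(),
        key=lambda r: r["raw_run"]["createdAt"],
        reverse=True,
    )
```

Reading runs_raw.jsonl line by line without a step like this would return every snapshot of a run, including stale ones.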

Core concepts

SyncPolicy

A policy controls data retrieval and decision logic:

  • select_history_keys(ctx)
  • select_history_window(ctx)
  • classify_run(ctx, history_tail)
  • infer_patch(ctx, history_tail)
  • should_update(ctx, patch)
  • is_terminal(ctx, decision)
  • on_error(ctx, exc)
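A custom policy might look roughly like the skeleton below. Only the seven method names come from the list above. The RunContext fields, every return type, and the meaning given to each return value are assumptions made for illustration, so check dr_wandb.sync_policy for the real contract before building on this. The class deliberately does not import or subclass anything from dr_wandb.

```python
from dataclasses import dataclass, field


@dataclass
class RunContext:
    # Hypothetical stand-in for the ctx argument; the real object will differ.
    run_id: str
    state: str
    tags: list = field(default_factory=list)


class FinishedRunsPolicy:
    """Illustrative policy: tag finished runs as done and never request history."""

    def select_history_keys(self, ctx):
        return []  # assumed to mean "no history keys needed"

    def select_history_window(self, ctx):
        return None  # assumed to mean "no history window"

    def classify_run(self, ctx, history_tail):
        return "done" if ctx.state == "finished" else "active"

    def infer_patch(self, ctx, history_tail):
        # Propose a tag update only for finished runs that lack the tag.
        if ctx.state == "finished" and "done" not in ctx.tags:
            return {"tags": [*ctx.tags, "done"]}
        return None

    def should_update(self, ctx, patch):
        return patch is not None

    def is_terminal(self, ctx, decision):
        return decision == "done"

    def on_error(self, ctx, exc):
        return None  # assumed to mean "record nothing and continue"


ctx = RunContext(run_id="abc123", state="finished")
policy = FinishedRunsPolicy()
decision = policy.classify_run(ctx, history_tail=[])
patch = policy.infer_patch(ctx, history_tail=[])
print(decision, patch, policy.is_terminal(ctx, decision))
```

A policy like this would be selected on the command line with --policy-module and --policy-class, as in the wandb-sync examples earlier.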

Canonical export outputs

wandb-export writes:

  • runs table: one row per run with run payload + policy/cursor fields
  • history table: one row per history event with _step/_timestamp/_runtime/_wandb + metric payload
  • manifest JSON: schema/version, policy identity, counts, and file paths

License

MIT
