Interact with wandb from Python

dr_wandb

A command-line utility for downloading and archiving Weights & Biases experiment data to local storage formats optimized for offline analysis.

Installation

CLI Tool Install: wandb-download

uv tool install dr_wandb

Or, to use the library functions:

uv add dr_wandb
# Optionally, with the postgres extra (quoted so the shell does not expand the brackets)
uv add "dr_wandb[postgres]"
uv sync

Authentication

Configure Weights & Biases authentication using one of these methods:

wandb login

Or set the API key as an environment variable:

export WANDB_API_KEY=your_api_key_here
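
Before starting a long download, it can help to confirm that credentials are discoverable. The following is a minimal sketch and not part of dr_wandb: the helper name `find_wandb_credentials` is hypothetical, and it assumes that `wandb login` stored the key in `~/.netrc` under the host `api.wandb.ai`.

```python
import netrc
import os
from pathlib import Path


def find_wandb_credentials(env=None, netrc_path=None):
    """Return "env", "netrc", or None depending on where an API key is found.

    Hypothetical helper: assumes `wandb login` wrote an entry for
    api.wandb.ai into the netrc file.
    """
    env = os.environ if env is None else env
    # The environment variable is checked first.
    if env.get("WANDB_API_KEY"):
        return "env"
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return None
    try:
        entry = netrc.netrc(str(path)).authenticators("api.wandb.ai")
    except (netrc.NetrcParseError, OSError):
        return None
    return "netrc" if entry else None


if __name__ == "__main__":
    print(find_wandb_credentials() or "no credentials found")
```

The function only reports where a key was found; it never prints the key itself.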

Quickstart

The default workflow does not require PostgreSQL. It fetches the runs, and optionally their histories, and writes them to local pickle (.pkl) files.

» wandb-download --help

 Usage: wandb-download [OPTIONS] ENTITY PROJECT OUTPUT_DIR

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    entity          TEXT  [required]                                                                                                                            │
│ *    project         TEXT  [required]                                                                                                                            │
│ *    output_dir      TEXT  [required]                                                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --runs-only             --no-runs-only             [default: no-runs-only]                                                                                       │
│ --runs-per-page                           INTEGER  [default: 500]                                                                                                │
│ --log-every                               INTEGER  [default: 20]                                                                                                 │
│ --install-completion                               Install completion for the current shell.                                                                     │
│ --show-completion                                  Show completion for the current shell, to copy it or customize the installation.                              │
│ --help                                             Show this message and exit.                                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

An example:

» wandb-download --runs-only "ml-moe" "ft-scaling" "./data"
2025-11-10 21:47:54 - INFO -
:: Beginning Dr. Wandb Project Downloading Tool ::

2025-11-10 21:47:54 - INFO - {
    "entity": "ml-moe",
    "project": "ft-scaling",
    "output_dir": "data",
    "runs_only": true,
    "runs_per_page": 500,
    "log_every": 20,
    "runs_output_filename": "ml-moe_ft-scaling_runs.pkl",
    "histories_output_filename": "ml-moe_ft-scaling_histories.pkl"
}
2025-11-10 21:47:54 - INFO -
2025-11-10 21:47:54 - INFO - >> Downloading runs, this will take a while (minutes)
wandb: Currently logged in as: danielle-rothermel (ml-moe) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
2025-11-10 21:48:00 - INFO -   - total runs found: 517
2025-11-10 21:48:00 - INFO - >> Serializing runs and maybe getting histories: False
2025-11-10 21:48:07 - INFO - >> 20/517: 2025_08_21-08_24_43_test_finetune_DD-dolma1_7-10M_main_1Mtx1_--learning_rate=5e-05
2025-11-10 21:48:12 - INFO - >> 40/517: 2025_08_21-08_24_43_test_finetune_DD-dolma1_7-150M_main_10Mtx1_--learning_rate=5e-06
...
2025-11-10 21:50:46 - INFO - >> Dumped runs data to: ./data/ml-moe_ft-scaling_runs.pkl
2025-11-10 21:50:46 - INFO - >> Runs only, not dumping histories to: ./data/ml-moe_ft-scaling_histories.pkl
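
Once the download finishes, the pickle files can be read back with the standard library. This is a minimal sketch: the helper names `output_paths` and `load_pickle` are hypothetical, the file naming is inferred from the log output above, and the structure of the unpickled object is not documented here, so the example only inspects its type and length.

```python
import pickle
from pathlib import Path


def output_paths(entity, project, output_dir):
    """Build the runs/histories paths using the naming seen in the log output."""
    base = Path(output_dir)
    return (
        base / f"{entity}_{project}_runs.pkl",
        base / f"{entity}_{project}_histories.pkl",
    )


def load_pickle(path):
    """Unpickle one file. Only load pickle files that you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    runs_path, _ = output_paths("ml-moe", "ft-scaling", "./data")
    if runs_path.exists():
        runs = load_pickle(runs_path)
        # The container type is an unknown here, so inspect before relying on it.
        print(type(runs).__name__, len(runs) if hasattr(runs, "__len__") else "")
```

Unpickling executes code embedded in the file, so this is only appropriate for files produced by your own download.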

Very Alpha: Postgres Version

It's very likely that this does not currently work. To download all runs from a Weights & Biases project:

uv run python src/dr_wandb/cli/postres_download.py --entity your_entity --project your_project

Options:
  --entity TEXT        WandB entity (username or team name)
  --project TEXT       WandB project name
  --runs-only          Download only run metadata, skip training history
  --force-refresh      Download all data, ignoring existing records
  --db-url TEXT        PostgreSQL connection string
  --output-dir TEXT    Directory for exported Parquet files
  --help               Show help message and exit

The tool creates a PostgreSQL database, downloads experiment data, and exports Parquet files to the configured output directory. By default, it tracks existing data and downloads only new or updated runs. A run is considered for update if:

  • It does not exist in the local database
  • Its state is "running" (indicating potential new data)

Use --force-refresh to download all runs regardless of existing data.
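
The update rule described above can be sketched as a small selection function. This is an illustration of the stated rule only, not the tool's actual implementation; the function name `runs_to_download` and the `(run_id, state)` input shape are assumptions.

```python
def runs_to_download(remote_runs, local_run_ids, force_refresh=False):
    """Select the remote runs that should be (re)downloaded.

    remote_runs: iterable of (run_id, state) pairs (assumed shape).
    local_run_ids: set of run IDs already stored in the local database.
    """
    selected = []
    for run_id, state in remote_runs:
        is_new = run_id not in local_run_ids
        may_have_new_data = state == "running"
        if force_refresh or is_new or may_have_new_data:
            selected.append(run_id)
    return selected


if __name__ == "__main__":
    remote = [("a", "finished"), ("b", "running"), ("c", "finished")]
    print(runs_to_download(remote, local_run_ids={"a", "b"}))
```

With `force_refresh=True` every remote run is selected, which mirrors the --force-refresh flag.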

Environment Variables

The tool reads configuration from environment variables with the DR_WANDB_ prefix and supports .env files:

Variable                 Description                      Default
DR_WANDB_ENTITY          Weights & Biases entity name     None
DR_WANDB_PROJECT         Weights & Biases project name    None
DR_WANDB_DATABASE_URL    PostgreSQL connection string     postgresql+psycopg2://localhost/wandb
DR_WANDB_OUTPUT_DIR      Directory for exported files     ./data
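
To make the precedence of these variables concrete, here is a minimal sketch of resolving them against their documented defaults. It is not how dr_wandb itself loads configuration: the `Settings` class and `load_settings` function are hypothetical, and .env file support is deliberately left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "DR_WANDB_"


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the table of environment variables.
    entity: Optional[str] = None
    project: Optional[str] = None
    database_url: str = "postgresql+psycopg2://localhost/wandb"
    output_dir: str = "./data"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Override each default with DR_WANDB_<FIELD> when that variable is set."""
    env = os.environ if env is None else env
    overrides = {}
    for field_name in Settings.__dataclass_fields__:
        value = env.get(PREFIX + field_name.upper())
        if value is not None:
            overrides[field_name] = value
    return Settings(**overrides)


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary as `env` makes it easy to check a configuration without touching the real environment.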

Database Configuration

The PostgreSQL connection string follows the standard format:

postgresql+psycopg2://username:password@host:port/database_name

If the specified database does not exist, the tool will attempt to create it automatically.
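
A connection string in this format can be checked before it is handed to the tool. The sketch below uses only the standard library; `describe_db_url` is a hypothetical helper, and it omits the password from its result so the output is safe to log.

```python
from urllib.parse import urlsplit


def describe_db_url(url):
    """Split a connection string into its parts, leaving out the password."""
    parts = urlsplit(url)
    # The scheme has the form "dialect+driver", e.g. "postgresql+psycopg2".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


if __name__ == "__main__":
    print(describe_db_url("postgresql+psycopg2://localhost/wandb"))
```

Missing components, such as the port in the default URL, come back as None rather than raising an error.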

Data Schema

The tool generates the following files in the output directory:

  • runs_metadata.parquet - Complete run metadata including configurations, summaries, and system information
  • runs_history.parquet - Training metrics and logged values over time
  • runs_metadata_{component}.parquet - Component-specific files for config, summary, wandb_metadata, system_metrics, system_attrs, and sweep_info

Run Records

  • run_id: Unique identifier for the experiment run
  • run_name: Human-readable name assigned to the run
  • state: Current state (finished, running, crashed, failed, killed)
  • project: Project name
  • entity: Entity name
  • created_at: Timestamp of run creation
  • config: Experiment configuration parameters (JSONB)
  • summary: Final metrics and outputs (JSONB)
  • wandb_metadata: Platform-specific metadata (JSONB)
  • system_metrics: Hardware and system information (JSONB)
  • system_attrs: Additional system attributes (JSONB)
  • sweep_info: Hyperparameter sweep information (JSONB)

Training History Records

  • run_id: Reference to the parent run
  • step: Training step number
  • timestamp: Time of metric logging
  • runtime: Elapsed time since run start
  • wandb_metadata: Platform logging metadata (JSONB)
  • metrics: All logged metrics and values (JSONB, flattened in Parquet export)
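
The note that metrics are flattened in the Parquet export can be read as promoting each logged metric to its own column. The exact rule the tool applies is not specified here, so the following is one plausible interpretation: `flatten_history_record` is a hypothetical function, and joining nested keys with "." is an assumption.

```python
def flatten_history_record(record, sep="."):
    """Promote the keys of record["metrics"] to top-level columns.

    Nested metric dictionaries are joined with `sep` (an assumed convention).
    A metric whose name matches an existing field overwrites that field.
    """
    flat = {k: v for k, v in record.items() if k != "metrics"}

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}{sep}{key}" if prefix else str(key), child)
        else:
            flat[prefix] = value

    walk("", record.get("metrics") or {})
    return flat


if __name__ == "__main__":
    row = {"run_id": "abc", "step": 10, "metrics": {"loss": 0.5, "eval": {"acc": 0.9}}}
    print(flatten_history_record(row))
```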
