Wandb Offline Sync

Continuously sync offline wandb runs.

Why?

If you work on compute nodes without internet access, you can use wandb in offline mode to log your runs. Normally you would sync an offline run to wandb at the end of the run, but for a long-running job you may want to sync the run while the job is still executing.

How it works

This project has two components:

  • Farm: an HTTPS server that listens for sync requests and syncs the runs to wandb.
  • Agent: a Python module that you use in your code to send sync requests to the farm.
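On the farm side, syncing an offline run comes down to what the wandb CLI command "wandb sync <run_dir>" does. The sketch below assumes the farm simply shells out to that command for each request; the helper names build_sync_command and sync_run are hypothetical and not part of this package's API.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# Hypothetical helpers: a minimal sketch of what a farm could do with each
# sync request, assuming it delegates to the real "wandb sync" CLI command.


def build_sync_command(run_dir: Path) -> List[str]:
    """Return the argv list that syncs one offline run directory."""
    return ["wandb", "sync", str(run_dir)]


def sync_run(
    run_dir: Path,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run "wandb sync" on run_dir and return True if it exited with code 0."""
    result = runner(build_sync_command(run_dir), capture_output=True, text=True)
    return result.returncode == 0
```

Passing the runner as a parameter keeps the sketch testable on a machine without wandb installed; in real use the default subprocess.run is called.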

Quickstart

  • Install this package: pip install wandb-offline-sync
  • Generate an SSL certificate for the sync farm. You can use the command: openssl req -newkey rsa:4096 -nodes -keyout key.pem -x509 -days 365 -out cert.pem. This command creates a self-signed certificate valid for 365 days and writes two files: cert.pem (the certificate) and key.pem (the unencrypted private key).
  • Run export WANDB_SYNC_FARM_USERNAME=<your_username>; export WANDB_SYNC_FARM_PASSWORD=<your_password> to set the username and password for the sync farm, replacing <your_username> and <your_password> with your credentials. You can also put these commands in your .bashrc file. If these variables are not set, the farm uses the default credentials ("user", "pass").
  • On a node with internet access, run the farm with the command wandb_sync_farm --cert=<path_to_cert.pem> --key=<path_to_key.pem>. The farm then listens for sync requests. Run wandb_sync_farm --help to see all available options.
  • Run export WANDB_SYNC_FARM_HOST=<sync_farm_ip_address_or_hostname>; export WANDB_SYNC_FARM_PORT=<sync_farm_port> to set the hostname and port of the sync farm. The agent uses these variables to reach the farm. You can also put these commands in your .bashrc file.
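The environment variables from the steps above could be collected on the client side roughly as follows. This is a minimal sketch, not the agent's actual code: FarmConfig and read_farm_config are hypothetical names, the credential defaults mirror the farm's documented defaults ("user", "pass"), and because host and port have no documented default they are treated as required.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical sketch: gather the sync farm settings from the environment
# variables named in the Quickstart. Not the package's real implementation.


@dataclass(frozen=True)
class FarmConfig:
    host: str
    port: int
    username: str
    password: str

    @property
    def base_url(self) -> str:
        # The farm is an HTTPS server, so requests go to an https:// URL.
        return f"https://{self.host}:{self.port}"


def read_farm_config(env: Optional[Mapping[str, str]] = None) -> FarmConfig:
    env = os.environ if env is None else env
    try:
        host = env["WANDB_SYNC_FARM_HOST"]
        port = int(env["WANDB_SYNC_FARM_PORT"])
    except KeyError as missing:
        # Host and port have no documented default, so fail loudly.
        raise RuntimeError(f"environment variable {missing} is not set") from None
    return FarmConfig(
        host=host,
        port=port,
        # Assumption: fall back to the farm's documented default credentials.
        username=env.get("WANDB_SYNC_FARM_USERNAME", "user"),
        password=env.get("WANDB_SYNC_FARM_PASSWORD", "pass"),
    )
```

Accepting an explicit mapping instead of always reading os.environ makes the lookup easy to check in isolation.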

In the code of your job:

  • Import the agent: from wandb_offline_sync import agent
  • After calling wandb.init(...), initialize the agent with agent.init(...). You can pass a frequency argument to set the minimum time between syncs, in seconds. For example, with frequency=60 the agent requests a sync at most once per minute. The default frequency is 5 minutes (300 seconds).

Each time wandb.log(...) is called, the agent checks whether the minimum interval set by frequency has elapsed since the last sync and, if so, sends a sync request to the farm.
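The throttling rule described above can be sketched as a small class that is consulted on every logging call. SyncThrottle is a hypothetical name, and starting the interval at initialization (rather than syncing on the very first log call) is an assumption; the package may behave differently.

```python
import time
from typing import Callable

# Hypothetical sketch of the "at most once per `frequency` seconds" rule.
# Not the package's actual implementation.


class SyncThrottle:
    def __init__(
        self,
        frequency: float = 300.0,  # documented default: 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.frequency = frequency  # minimum seconds between sync requests
        self._clock = clock
        # Assumption: the interval is measured from initialization, so the
        # first request happens no earlier than `frequency` seconds after init.
        self._last_request = clock()

    def should_sync(self) -> bool:
        """Return True (and reset the timer) if a sync request is due."""
        now = self._clock()
        if now - self._last_request >= self.frequency:
            self._last_request = now
            return True
        return False
```

An agent would call throttle.should_sync() alongside each wandb.log(...) and send a request to the farm only when it returns True. A monotonic clock is used so that wall-clock adjustments on the node cannot trigger or suppress syncs.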

When wandb.finish() is called, the agent waits a few seconds (30 by default) so that earlier syncs can complete, and then sends a final sync request to the farm.
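The shutdown sequence described above amounts to "sleep, then send one last request". In the sketch below, finish_sync is a hypothetical helper and send_request stands in for whatever call actually contacts the farm; how the package hooks this into wandb.finish() is not shown here.

```python
import time
from typing import Callable

# Hypothetical sketch of the final-sync step. Not the package's actual code.


def finish_sync(
    send_request: Callable[[], None],
    wait_seconds: float = 30.0,  # documented default grace period
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    sleep(wait_seconds)  # give in-flight syncs time to finish on the farm
    send_request()       # final sync so the last logged data reaches wandb
```

The fixed wait is a simple design choice: the agent does not need to track whether earlier requests have completed, at the cost of always delaying shutdown by the grace period.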

Notes

  • If you run the farm with --verbose, the output may show the error ".wandb file is empty" during the first minutes of a run, and the run is not yet synced to the wandb server. Don't worry: after a few minutes the run's data becomes available and is synced to the wandb server.
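The note above suggests that a sync attempt only succeeds once the run's .wandb file has content. A farm could check for this before calling wandb sync, as in the sketch below. run_has_data is a hypothetical helper, and the assumption that an offline run directory holds a run-<id>.wandb file reflects wandb's usual offline layout, not anything this package documents.

```python
from pathlib import Path

# Hypothetical sketch: report whether an offline run directory already holds
# a non-empty .wandb file. Early in a run that file may still be empty, which
# matches the ".wandb file is empty" message described in the note.


def run_has_data(run_dir: Path) -> bool:
    """Return True if any *.wandb file directly inside run_dir is non-empty."""
    return any(p.stat().st_size > 0 for p in run_dir.glob("*.wandb"))
```

A farm using such a check could skip an empty run quietly and rely on the agent's next periodic request to retry.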

Download files


Source Distribution

wandb_offline_sync-0.0.14.tar.gz (8.1 kB)

Uploaded Source

Built Distribution


wandb_offline_sync-0.0.14-py3-none-any.whl (8.1 kB)

Uploaded Python 3

File details

Details for the file wandb_offline_sync-0.0.14.tar.gz.

File metadata

  • Download URL: wandb_offline_sync-0.0.14.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for wandb_offline_sync-0.0.14.tar.gz
Algorithm Hash digest
SHA256 1ee46317d0f88b8f395ba419f45e348bc20080752d01ae1e3a72192a67bb1363
MD5 fa816c006d4edccfe6581ee6662d717f
BLAKE2b-256 dc8de666fed17a79b2f770aa14b295ddc5c985fb78bd1964b35682ae1995f7ea


File details

Details for the file wandb_offline_sync-0.0.14-py3-none-any.whl.

File metadata

File hashes

Hashes for wandb_offline_sync-0.0.14-py3-none-any.whl
Algorithm Hash digest
SHA256 b856d9960d175ca0bd825d99bbbf54efee02cb22e10a65e8a924dcf02ca6728c
MD5 1c5d0bc133f9152faa14b093ca1dcc20
BLAKE2b-256 77b6ab97189a611e35878c03dd2b6381b20480e18465c0398fc3bcdb90e47336

