Wandb Offline Sync
Continuously sync offline wandb runs.
Why?
If you work on compute nodes without internet access, you can use wandb in offline mode to log your runs. Normally you would sync an offline run to wandb once the run has finished, but for a long-running job you may want to sync the run while the job is still executing.
How it works
This project has two components:
- Farm: an HTTPS server that listens for sync requests and syncs the runs to wandb.
- Agent: a Python module that you use in your code to request syncs from the farm.
Quickstart
- Install this package: `pip install wandb-offline-sync`
- Generate an SSL certificate for the sync farm. You can use the command `openssl req -newkey rsa:4096 -nodes -keyout key.pem -x509 -days 365 -out cert.pem`, which creates two files: `cert.pem` and `key.pem`.
- Set the username and password for the sync farm by running `export WANDB_SYNC_FARM_USERNAME=<your_username>; export WANDB_SYNC_FARM_PASSWORD=<your_password>`, replacing `<your_username>` and `<your_password>` with your credentials. You can also put these commands in your `.bashrc` file. If these variables are not set, the farm uses the default credentials `("user", "pass")`.
- Run the farm on a node with an internet connection: `wandb_sync_farm --cert=<path_to_cert.pem> --key=<path_to_key.pem>`. The farm will listen for sync requests. Run `wandb_sync_farm --help` to see all the available options.
- Set the hostname and port of the sync farm by running `export WANDB_SYNC_FARM_HOST=<sync_farm_ip_address_or_hostname>; export WANDB_SYNC_FARM_PORT=<sync_farm_port>`. You can also put these commands in your `.bashrc` file. These variables are used by the agent.
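Before launching a long job, it can help to check that the variables above are actually visible to your process. The following is a minimal sketch of such a check, not code from this package: `FarmSettings` and `load_farm_settings` are hypothetical names, failing on a missing host or port is a choice made for this example, and it assumes the agent falls back to the same default credentials as the farm.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FarmSettings:
    """Hypothetical container for the sync farm connection settings."""
    host: str
    port: int
    username: str
    password: str


def load_farm_settings(env: Optional[Mapping[str, str]] = None) -> FarmSettings:
    """Collect the sync farm settings from environment variables."""
    env = os.environ if env is None else env

    required = ("WANDB_SYNC_FARM_HOST", "WANDB_SYNC_FARM_PORT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Choice made for this sketch: fail early instead of guessing a host or port.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return FarmSettings(
        host=env["WANDB_SYNC_FARM_HOST"],
        port=int(env["WANDB_SYNC_FARM_PORT"]),
        # Defaults documented for the farm: ("user", "pass").
        # Assumption: the agent side uses the same fallback.
        username=env.get("WANDB_SYNC_FARM_USERNAME", "user"),
        password=env.get("WANDB_SYNC_FARM_PASSWORD", "pass"),
    )
```

Calling `load_farm_settings()` with no argument reads from `os.environ`; passing a plain dictionary makes the check easy to try out without touching your shell.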
In the code of your job:
- Import the agent: `from wandb_offline_sync import agent`
- After calling `wandb.init(...)`, initialize the agent with `agent.init(...)`. You can pass a `frequency` argument to set the minimum time between syncs (in seconds). For example, if you set `frequency=60`, the agent will request a sync at most once per minute. The default value for the frequency is 5 minutes.
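Put together, a job script might look like the sketch below. The `train` function, the project name, and the logged metric are placeholders made up for this example, and it assumes that `agent.init` needs no arguments other than the optional `frequency`.

```python
def train(num_steps: int = 1000) -> None:
    """Hypothetical training loop that logs offline and syncs through the agent."""
    # Imported inside the function so the sketch can be loaded
    # on machines where the packages are not installed.
    import wandb
    from wandb_offline_sync import agent

    # "my-project" is a placeholder; mode="offline" keeps all logging local.
    wandb.init(project="my-project", mode="offline")
    # Request a sync at most once per minute.
    agent.init(frequency=60)

    for step in range(num_steps):
        loss = 1.0 / (step + 1)  # stand-in for a real metric
        wandb.log({"loss": loss})

    wandb.finish()
```

Run `train()` on the compute node, with the farm already running on a node that has internet access.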
Each time `wandb.log(...)` is called, the agent checks whether the minimum time interval between syncs (given by the frequency) has passed and, if so, requests a sync from the farm.
When `wandb.finish()` is called, the agent waits a few seconds (30 by default) so that previous syncs can complete, and then requests a final sync from the farm.
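The behaviour described above amounts to a time-based throttle plus a delayed final request. The sketch below illustrates that logic only; it is not the agent's actual implementation. `SyncScheduler`, `on_log`, and `on_finish` are hypothetical names, and requesting a sync on the very first log call is an assumption.

```python
import time
from typing import Callable, Optional


class SyncScheduler:
    """Illustrative throttle: at most one sync request per `frequency` seconds."""

    def __init__(
        self,
        request_sync: Callable[[], None],
        frequency: float = 300.0,   # documented default: 5 minutes
        final_wait: float = 30.0,   # documented default wait before the final sync
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._request_sync = request_sync
        self._frequency = frequency
        self._final_wait = final_wait
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def on_log(self) -> bool:
        """Call after every log step; returns True if a sync was requested."""
        now = self._clock()
        too_soon = (
            self._last_request is not None
            and now - self._last_request < self._frequency
        )
        if too_soon:
            return False
        # Assumption: the first log call requests a sync immediately.
        self._last_request = now
        self._request_sync()
        return True

    def on_finish(self) -> None:
        """Wait for earlier syncs to settle, then request one last sync."""
        self._sleep(self._final_wait)
        self._request_sync()
```

The clock and sleep functions are injectable so the throttle can be exercised without real waiting, for example by passing a fake clock that you advance by hand.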
Notes
- If you run the farm with `--verbose`, the output may show the error `.wandb file is empty` during the first minutes of a run, and the run is not synced to the wandb server. Don't worry: after a few minutes the run's data becomes available and is synced to the wandb server.