
Framework-agnostic monitoring toolkit for federated and distributed ML

hivewatch

hivewatch is a framework-agnostic monitoring toolkit for federated and distributed machine learning workloads. It provides a consistent interface for logging client updates, round summaries, and map-ready metadata across local experiments and larger deployments.

Installation

hivewatch requires Python 3.8 or later. The commands below install it in editable mode and must be run from the root of a source checkout:

pip install -e .                 # core package
pip install -e ".[wandb]"        # Weights & Biases integration
pip install -e ".[mlflow]"       # MLflow integration
pip install -e ".[wandb,mlflow]" # both integrations
pip install -e ".[all]"          # all optional dependencies

Quickstart

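import hivewatch
from hivewatch.emitters import WandbEmitter

# Placeholder inputs so the snippet runs on its own; replace them with
# values produced by your own federated training loop.
num_rounds = 3
client_results = {
    "client-0": {"local_accuracy": 0.81, "local_loss": 0.42, "num_samples": 500},
}
agg_accuracy, agg_loss = 0.80, 0.45

hivewatch.init(
    algorithm="FedAvg",
    emitters=[WandbEmitter(project="my-fl-project")],
)

for round_num in range(num_rounds):
    hivewatch.round_start(round_num)

    # Log one update per client that reported back this round.
    for client_id, metadata in client_results.items():
        hivewatch.log_client_update(
            client_id=client_id,
            round=round_num,
            **metadata,
        )

    # Log the aggregated (global) result for the round.
    hivewatch.log_round(
        round=round_num,
        global_accuracy=agg_accuracy,
        global_loss=agg_loss,
    )

hivewatch.finish()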

Emitters

hivewatch uses a pluggable emitter model. Create one or more emitters and pass them to hivewatch.init() to send the same run data to multiple destinations.

Local map and deferred map metadata

from hivewatch.emitters import SSEEmitter

hivewatch.init(
    algorithm="FedAvg",
    emitters=[SSEEmitter(port=7070, serve_map=False)],
)

SSEEmitter persists both of the following artifacts:

  • runs/<run_id>.jsonl for the complete event history
  • runs/<run_id>.map.json for map-ready metadata that can be loaded directly later

Serve the dashboard separately:

hivewatch map run --runs-dir runs --port 7070

Open one specific saved run in static mode:

hivewatch map run --runs-dir runs --run-id run-abc123

The bundled examples/hivewatch_map.html viewer loads map metadata first and falls back to the JSONL-derived event history for older runs. This keeps local development and later replay workflows compatible with the same viewer.
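The map-first, JSONL-fallback behavior described above can be sketched in a few lines. This is an illustration only, not the viewer's actual code: the `load_run` helper, the returned dictionary shape, and the event fields in the demo are all assumptions. Only the two file name patterns come from the text above.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_run(runs_dir: str, run_id: str) -> Dict[str, Any]:
    """Prefer <run_id>.map.json; fall back to the <run_id>.jsonl event log."""
    base = Path(runs_dir)
    map_path = base / f"{run_id}.map.json"
    if map_path.exists():
        with map_path.open(encoding="utf-8") as fh:
            return {"source": "map", "data": json.load(fh)}

    # Older runs: rebuild the history from one JSON object per line.
    events = []
    with (base / f"{run_id}.jsonl").open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                events.append(json.loads(line))
    return {"source": "jsonl", "data": events}


# Demo with made-up file contents (the real event schema may differ).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "old-run.jsonl").write_text(
        '{"type": "round", "round": 0}\n', encoding="utf-8"
    )
    Path(tmp, "new-run.jsonl").write_text(
        '{"type": "round", "round": 0}\n', encoding="utf-8"
    )
    Path(tmp, "new-run.map.json").write_text('{"rounds": []}', encoding="utf-8")

    old = load_run(tmp, "old-run")  # no map file, so the JSONL log is used
    new = load_run(tmp, "new-run")  # map file present, so it wins
```

Checking for the map file first keeps the common case to a single read, while older runs that only have an event log still load.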

Package layout

The source tree groups related functionality into focused areas:

  • src/hivewatch/map/
    • metadata.py for map metadata assembly and event-to-round transformations
    • server.py for the local map/dashboard HTTP server
  • src/hivewatch/geo/
    • utils.py for client-side location resolution helpers

This layout keeps map-related files together, so users do not need to maintain example-local geo helper files or server-side peer-inspection patches.

Weights & Biases

from hivewatch.emitters import WandbEmitter

hivewatch.init(
    algorithm="FedAvg",
    emitters=[WandbEmitter(project="my-fl-project")],
)

MLflow

from hivewatch.emitters import MLflowEmitter

# Local tracking directory (MLflow default)
hivewatch.init(emitters=[MLflowEmitter(experiment="my-fl-project")])

# Remote tracking server
hivewatch.init(emitters=[MLflowEmitter(
    tracking_uri="http://localhost:5000",
    experiment="my-fl-project",
)])

# MLflow system metrics
hivewatch.init(emitters=[MLflowEmitter(
    experiment="my-fl-project",
    mlflow_system_metrics=True,
    system_metrics_sampling_interval=5,
)])

Start an MLflow server:

mlflow server --host 0.0.0.0 --port 5000

To use a custom storage directory:

mlflow server \
  --host 0.0.0.0 \
  --port 5000 \
  --backend-store-uri ./my_custom_dir \
  --default-artifact-root ./my_custom_dir/artifacts

The MLflow UI is then available at http://localhost:5000.

Multiple emitters

from hivewatch.emitters import MLflowEmitter, WandbEmitter

hivewatch.init(
    algorithm="FedAvg",
    emitters=[
        WandbEmitter(project="my-fl-project"),
        MLflowEmitter(experiment="my-fl-project"),
    ],
)

Custom emitters

class MyEmitter:
    def on_init(self, run_id, algorithm, config): ...
    def on_round(self, summary, clients): ...
    def on_client_update(self, client): ...
    def finish(self): ...

hivewatch.init(emitters=[MyEmitter()])
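As a slightly fuller illustration of the four-method interface above, the sketch below tracks the best round seen so far. The method names and parameters follow the skeleton. Everything else is an assumption made for the example: the `BestRoundEmitter` class, the idea that `summary` is a dict with a `global_accuracy` key, and the hand-driven call sequence at the bottom, which stands in for what hivewatch would normally do.

```python
from typing import Any, Dict, Optional


class BestRoundEmitter:
    """Hypothetical emitter that remembers the round with the best accuracy."""

    def __init__(self) -> None:
        self.run_id: Optional[str] = None
        self.best: Optional[Dict[str, Any]] = None
        self.client_updates = 0
        self.finished = False

    def on_init(self, run_id, algorithm, config):
        self.run_id = run_id

    def on_client_update(self, client):
        self.client_updates += 1

    def on_round(self, summary, clients):
        # Assumption: summary is a dict that may contain "global_accuracy".
        acc = summary.get("global_accuracy")
        if acc is None:
            return
        if self.best is None or acc > self.best["global_accuracy"]:
            self.best = dict(summary)

    def finish(self):
        self.finished = True


# Drive the emitter by hand, mirroring the quickstart call order.
emitter = BestRoundEmitter()
emitter.on_init("run-demo", "FedAvg", {})
for rnd, acc in enumerate([0.61, 0.74, 0.70]):
    emitter.on_client_update({"client_id": "client-0", "round": rnd})
    emitter.on_round({"round": rnd, "global_accuracy": acc}, clients=[])
emitter.finish()
```

Driving the object directly like this is also a convenient way to unit test a custom emitter without starting a training run.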

Metadata Contract

hivewatch defines the keys it understands, but it preserves unknown keys so applications can attach additional metadata without losing information.

| Field | Type | Description |
| --- | --- | --- |
| client_id | str | Client identifier |
| round | int | Current global round |
| local_accuracy | float | Accuracy after local training |
| local_loss | float | Loss after local training |
| num_samples | int | Local dataset size |
| gradient_norm | float | L2 norm of local gradients |
| bytes_sent | int | Bytes uploaded to the server |
| train_time_sec | float | Local training wall-clock time |
| cpu_pct | float | CPU utilization percentage |
| ram_mb | float | Memory usage in MB |
| gpu_util_pct | float | GPU utilization percentage |
| lat / lng / country | float / str | Client location metadata for map visualization |
| base_round | int | For asynchronous FL, staleness is round - base_round |
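To make the contract concrete, here is a minimal sketch of separating recognized keys from application-specific ones and of the staleness formula from the table. The `split_metadata` and `staleness` helpers, and the `device_model` key, are hypothetical and are not part of hivewatch. Only the field names and the `round - base_round` rule come from the table.

```python
from typing import Any, Dict, Optional, Tuple

# Field names copied from the metadata contract table.
KNOWN_KEYS = {
    "client_id", "round", "local_accuracy", "local_loss", "num_samples",
    "gradient_norm", "bytes_sent", "train_time_sec", "cpu_pct", "ram_mb",
    "gpu_util_pct", "lat", "lng", "country", "base_round",
}


def split_metadata(
    metadata: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (known, extra); extra keys are kept rather than dropped."""
    known = {k: v for k, v in metadata.items() if k in KNOWN_KEYS}
    extra = {k: v for k, v in metadata.items() if k not in KNOWN_KEYS}
    return known, extra


def staleness(metadata: Dict[str, Any]) -> Optional[int]:
    """Rounds behind the current global model, or None for synchronous FL."""
    if "base_round" not in metadata:
        return None
    return metadata["round"] - metadata["base_round"]


update = {
    "client_id": "client-7",
    "round": 12,
    "base_round": 9,
    "num_samples": 512,
    "device_model": "jetson-nano",  # application-specific, not in the contract
}
known, extra = split_metadata(update)
lag = staleness(update)
```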

Logged Metrics

Weights & Biases

| Metric | Description |
| --- | --- |
| round/accuracy, round/loss | Global model performance per round |
| round/participation_rate | Completed clients divided by selected clients |
| round/num_stragglers | Number of stragglers |
| round/duration_sec | Wall-clock time per round |
| comm/total_bytes_mb | Total upload and download volume |
| comm/bytes_per_client_mb | Per-client communication cost |
| agg/gradient_divergence | Standard deviation of per-client gradient norms |
| agg/aggregation_time_sec | Server-side aggregation time |
| client/<id>/accuracy | Per-client accuracy |
| client/<id>/gradient_norm | Per-client gradient norm |
| client/<id>/staleness | Rounds behind the current global model in async FL |
| client/<id>/bytes_sent_mb | Per-client upload size |
| client/<id>/train_time_sec | Per-client training time |
| sys/<id>/cpu_pct | Per-client CPU utilization |
| sys/<id>/ram_mb | Per-client RAM usage |
| event/client_dropout | Dropout counter |
| event/comm_failure | Communication failure counter |

All metrics use round as the x-axis through wandb.define_metric().
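Two of the derived metrics listed above have definitions simple enough to sketch. The helper names are hypothetical, and the use of the population standard deviation is an assumption, since the table does not say whether hivewatch uses the population or sample formula.

```python
import statistics
from typing import List


def participation_rate(num_completed: int, num_selected: int) -> float:
    """Completed clients divided by selected clients (0.0 if none selected)."""
    if num_selected == 0:
        return 0.0
    return num_completed / num_selected


def gradient_divergence(gradient_norms: List[float]) -> float:
    """Standard deviation of per-client gradient norms.

    Assumption: population standard deviation; fewer than two clients
    gives 0.0 because there is no spread to measure.
    """
    if len(gradient_norms) < 2:
        return 0.0
    return statistics.pstdev(gradient_norms)


rate = participation_rate(num_completed=8, num_selected=10)
divergence = gradient_divergence([1.0, 2.0, 3.0])
```

A rising divergence across rounds can indicate that client updates are pulling in different directions, which is why it is worth plotting next to global accuracy.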

MLflow

MLflow records the same metrics. Per-client metrics use dot notation such as client.<id>.accuracy instead of slash notation because of MLflow metric naming conventions. Hyperparameters are logged once as MLflow parameters, and model checkpoints are stored as versioned MLflow artifacts.
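The naming difference can be pictured as a simple character substitution. The helper below is a hypothetical sketch of that mapping, not the emitter's actual code, and it assumes client identifiers do not themselves contain slashes.

```python
def to_mlflow_name(metric: str) -> str:
    """Convert a slash-separated metric name to dot notation."""
    return metric.replace("/", ".")


wandb_name = "client/client-7/accuracy"
mlflow_name = to_mlflow_name(wandb_name)
```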

Architecture

FL Clients
  └── return metadata dict
        │  (gRPC / HTTP / sockets / others; hivewatch does not depend on the transport layer)
        ▼
FL Server
  └── receives metadata and calls hivewatch:
        hivewatch.round_start(round)
        hivewatch.log_client_update(client_id, round, **metadata)
        hivewatch.log_round(round, global_accuracy, global_loss)
        │
        ▼
hivewatch
  ├── WandbEmitter  →  wandb.ai dashboard
  └── MLflowEmitter →  MLflow UI (localhost:5000)

hivewatch does not depend on a specific transport layer or FL framework. Applications bridge their training framework to hivewatch in the same way they would bridge it to another experiment tracking backend.

For map visualization, the storage contract includes a standalone metadata artifact in addition to the raw event log. This supports:

  • local CLI runs that immediately launch or serve a map
  • local or remote services that persist metadata for later display
  • future deployments that store metadata in object storage and load it in a separate web tier
