Framework-agnostic monitoring toolkit for federated and distributed ML

HiveWatch

hivewatch is a framework-agnostic monitoring toolkit for federated and distributed machine learning workloads. It provides a consistent interface for logging client updates, round summaries, and map-ready metadata across local experiments and larger deployments.

Installation

hivewatch requires Python 3.8 or later. The commands below install it in editable mode from a local checkout of the repository:

pip install -e .                 # core package
pip install -e ".[wandb]"        # Weights & Biases integration
pip install -e ".[mlflow]"       # MLflow integration
pip install -e ".[wandb,mlflow]" # both integrations
pip install -e ".[all]"          # all optional dependencies

Quickstart

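import hivewatch
from hivewatch.emitters import WandbEmitter

# Placeholder inputs for illustration only; in a real run these values
# come from your own FL training and aggregation code.
num_rounds = 3
client_results = {
    "client-0": {"local_accuracy": 0.81, "local_loss": 0.52, "num_samples": 1200},
}
agg_accuracy, agg_loss = 0.80, 0.55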

hivewatch.init(
    algorithm="FedAvg",
    emitters=[WandbEmitter(project="my-fl-project")],
)

for round_num in range(num_rounds):
    hivewatch.round_start(round_num)

    for client_id, metadata in client_results.items():
        hivewatch.log_client_update(
            client_id=client_id,
            round=round_num,
            **metadata,
        )

    hivewatch.log_round(
        round=round_num,
        global_accuracy=agg_accuracy,
        global_loss=agg_loss,
    )

hivewatch.finish()

Emitters

hivewatch uses a pluggable emitter model. Create one or more emitters and pass them to hivewatch.init() to send the same run data to multiple destinations.

Local map and deferred map metadata

import hivewatch
from hivewatch.emitters import SSEEmitter

hivewatch.init(
    algorithm="FedAvg",
    emitters=[SSEEmitter(port=7070, serve_map=False)],
)

SSEEmitter persists both of the following artifacts:

- runs/<run_id>.jsonl for the complete event history
- runs/<run_id>.map.json for map-ready metadata that can be loaded directly later

Serve the dashboard separately:

hivewatch map run --runs-dir runs --port 7070

Open one specific saved run in static mode:

hivewatch map run --runs-dir runs --run-id run-abc123

The bundled examples/hivewatch_map.html viewer loads map metadata first and falls back to the JSONL-derived event history for older runs. This keeps local development and later replay workflows compatible with the same viewer.
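
The map-first, JSONL-fallback loading order described above can be sketched in a few lines. This is a minimal illustration, not the viewer's actual code: the `load_run` helper is hypothetical, and it assumes that the `.map.json` file is a single JSON document and that each non-empty line of the `.jsonl` file is one JSON object. The contents of those objects are not specified here.

```python
import json
from pathlib import Path


def load_run(runs_dir, run_id):
    """Return ("map", metadata) if a map file exists, else ("events", [events])."""
    runs = Path(runs_dir)
    map_path = runs / f"{run_id}.map.json"
    if map_path.exists():
        # Preferred path: map-ready metadata written alongside the event log.
        return "map", json.loads(map_path.read_text(encoding="utf-8"))

    # Fallback for older runs: read the JSONL event history line by line.
    events = []
    with (runs / f"{run_id}.jsonl").open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                events.append(json.loads(line))
    return "events", events
```

A caller would branch on the first element of the returned tuple and, in the fallback case, derive whatever it needs from the raw events.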

Package layout

The source tree groups related functionality into focused areas:

src/hivewatch/map/
- metadata.py for map metadata assembly and event-to-round transformations
- server.py for the local map/dashboard HTTP server
src/hivewatch/geo/
- utils.py for client-side location resolution helpers

This keeps map-related code in one place, so users do not need to maintain geo helper files alongside their examples or apply server-side patches for peer inspection.

Weights & Biases

import hivewatch
from hivewatch.emitters import WandbEmitter

hivewatch.init(
    algorithm="FedAvg",
    emitters=[WandbEmitter(project="my-fl-project")],
)

MLflow

import hivewatch
from hivewatch.emitters import MLflowEmitter

# The three calls below are alternative configurations; use one per run.

# Local tracking directory (MLflow default)
hivewatch.init(emitters=[MLflowEmitter(experiment="my-fl-project")])

# Remote tracking server
hivewatch.init(emitters=[MLflowEmitter(
    tracking_uri="http://localhost:5000",
    experiment="my-fl-project",
)])

# MLflow system metrics
hivewatch.init(emitters=[MLflowEmitter(
    experiment="my-fl-project",
    mlflow_system_metrics=True,
    system_metrics_sampling_interval=5,
)])

Start an MLflow server:

mlflow server --host 0.0.0.0 --port 5000

To use a custom storage directory:

mlflow server \
  --host 0.0.0.0 \
  --port 5000 \
  --backend-store-uri ./my_custom_dir \
  --default-artifact-root ./my_custom_dir/artifacts

The MLflow UI is then available at http://localhost:5000.

Multiple emitters

import hivewatch
from hivewatch.emitters import MLflowEmitter, WandbEmitter

hivewatch.init(
    algorithm="FedAvg",
    emitters=[
        WandbEmitter(project="my-fl-project"),
        MLflowEmitter(experiment="my-fl-project"),
    ],
)

Custom emitters

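import hivewatch

class MyEmitter:
    # Implement the four hooks below; `...` marks where your own logic goes.
    def on_init(self, run_id, algorithm, config): ...  # run setup
    def on_round(self, summary, clients): ...          # round summary
    def on_client_update(self, client): ...            # single client update
    def finish(self): ...                              # cleanup at the end of the run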

hivewatch.init(emitters=[MyEmitter()])
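
As a slightly fuller illustration of the same hook interface, the sketch below appends one JSON line per callback to a file. The class name `JsonlFileEmitter` and the record layout are hypothetical. The shapes of `config`, `summary`, `clients`, and `client` are not specified above, so the sketch assumes they are dict-like or otherwise serializable and uses `default=str` as a safety net.

```python
import json


class JsonlFileEmitter:
    """Hypothetical emitter that appends one JSON line per callback to a file."""

    def __init__(self, path):
        self.path = path
        self._fh = None

    def _write(self, kind, payload):
        # default=str keeps the sketch from failing on non-JSON-native values.
        self._fh.write(json.dumps({"event": kind, **payload}, default=str) + "\n")

    def on_init(self, run_id, algorithm, config):
        self._fh = open(self.path, "a", encoding="utf-8")
        self._write("init", {"run_id": run_id, "algorithm": algorithm, "config": config})

    def on_round(self, summary, clients):
        self._write("round", {"summary": summary, "clients": clients})

    def on_client_update(self, client):
        self._write("client_update", {"client": client})

    def finish(self):
        # Close the file once; safe to call more than once.
        if self._fh is not None:
            self._fh.close()
            self._fh = None
```

It would be registered in the same way as `MyEmitter`, for example `hivewatch.init(emitters=[JsonlFileEmitter("events.jsonl")])`.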

Metadata Contract

hivewatch defines the keys it understands, but it preserves unknown keys so applications can attach additional metadata without losing information.

Field	Type	Description
`client_id`	str	Client identifier
`round`	int	Current global round
`local_accuracy`	float	Accuracy after local training
`local_loss`	float	Loss after local training
`num_samples`	int	Local dataset size
`gradient_norm`	float	L2 norm of local gradients
`bytes_sent`	int	Bytes uploaded to the server
`train_time_sec`	float	Local training wall-clock time
`cpu_pct`	float	CPU utilization percentage
`ram_mb`	float	Memory usage in MB
`gpu_util_pct`	float	GPU utilization percentage
`lat` / `lng` / `country`	float / float / str	Client location metadata for map visualization
`base_round`	int	Global round the client's update was based on; in asynchronous FL, staleness is `round - base_round`
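
To make the contract concrete, here is a small sketch of a client update that mixes fields from the table with an application-specific key, plus the staleness calculation for asynchronous FL. The helpers `split_metadata` and `staleness` are hypothetical and are not part of hivewatch; they only illustrate the idea that unknown keys travel alongside known ones. Returning `None` when `base_round` is absent is also an assumption.

```python
KNOWN_KEYS = {
    "client_id", "round", "local_accuracy", "local_loss", "num_samples",
    "gradient_norm", "bytes_sent", "train_time_sec", "cpu_pct", "ram_mb",
    "gpu_util_pct", "lat", "lng", "country", "base_round",
}


def split_metadata(metadata):
    """Separate keys listed in the table above from application-specific extras."""
    known = {k: v for k, v in metadata.items() if k in KNOWN_KEYS}
    extra = {k: v for k, v in metadata.items() if k not in KNOWN_KEYS}
    return known, extra


def staleness(metadata):
    """Return round - base_round, or None when base_round is not provided."""
    if "base_round" not in metadata:
        return None
    return metadata["round"] - metadata["base_round"]


update = {
    "client_id": "client-7",
    "round": 12,
    "base_round": 9,
    "local_accuracy": 0.84,
    "num_samples": 2048,
    "device_model": "jetson-nano",  # not in the contract; kept as an extra
}
known, extra = split_metadata(update)
print(staleness(update))  # 3
print(extra)              # {'device_model': 'jetson-nano'}
```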

Logged Metrics

Weights & Biases

Metric	Description
`round/accuracy`, `round/loss`	Global model performance per round
`round/participation_rate`	Completed clients divided by selected clients
`round/num_stragglers`	Number of stragglers
`round/duration_sec`	Wall-clock time per round
`comm/total_bytes_mb`	Total upload and download volume
`comm/bytes_per_client_mb`	Per-client communication cost
`agg/gradient_divergence`	Standard deviation of per-client gradient norms
`agg/aggregation_time_sec`	Server-side aggregation time
`client/<id>/accuracy`	Per-client accuracy
`client/<id>/gradient_norm`	Per-client gradient norm
`client/<id>/staleness`	Rounds behind the current global model in async FL
`client/<id>/bytes_sent_mb`	Per-client upload size
`client/<id>/train_time_sec`	Per-client training time
`sys/<id>/cpu_pct`	Per-client CPU utilization
`sys/<id>/ram_mb`	Per-client RAM usage
`event/client_dropout`	Dropout counter
`event/comm_failure`	Communication failure counter
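
Two of the round-level metrics in the table are defined precisely enough to compute by hand, which can help when sanity-checking a dashboard. The sketch below follows the descriptions in the table; the function name is hypothetical, and the choice of population standard deviation (rather than sample standard deviation) is an assumption, since the table does not say which one is used.

```python
from statistics import pstdev


def round_metrics(num_selected, num_completed, gradient_norms):
    """Compute participation rate and gradient divergence for one round."""
    return {
        # Completed clients divided by selected clients.
        "round/participation_rate": (
            num_completed / num_selected if num_selected else 0.0
        ),
        # Standard deviation of per-client gradient norms (population form).
        "agg/gradient_divergence": (
            pstdev(gradient_norms) if gradient_norms else 0.0
        ),
    }


metrics = round_metrics(num_selected=10, num_completed=8,
                        gradient_norms=[1.0, 2.0, 3.0])
print(metrics["round/participation_rate"])  # 0.8
```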

All metrics use round as the x-axis through wandb.define_metric().
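
For readers unfamiliar with custom x-axes in W&B, the sketch below shows one way such a setup can be expressed with `wandb.define_metric()`, which accepts a `step_metric` argument and glob patterns. How hivewatch registers its metrics internally is not shown in this README, so the helper name and the list of prefixes (taken from the table above) are assumptions. The `wb` parameter stands in for the `wandb` module.

```python
def use_round_as_x_axis(
    wb, prefixes=("round", "comm", "agg", "client", "sys", "event")
):
    """Register `round` as the step metric for each metric family."""
    wb.define_metric("round")  # the custom x-axis itself
    for prefix in prefixes:
        # Every metric under this prefix is plotted against `round`.
        wb.define_metric(f"{prefix}/*", step_metric="round")
```

In a standalone W&B script this would be called as `use_round_as_x_axis(wandb)` after `wandb.init(...)`, and each later `wandb.log()` call would include the `round` value alongside the metrics it applies to.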

MLflow

MLflow records the same metrics. Per-client metrics use dot notation such as client.<id>.accuracy instead of slash notation because of MLflow metric naming conventions. Hyperparameters are logged once as MLflow parameters, and model checkpoints are stored as versioned MLflow artifacts.
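
The naming difference amounts to a separator swap. The helper below is a hypothetical illustration of that mapping, not hivewatch's own code. The paragraph above only states the convention for per-client metrics, so whether round-level names are renamed in the same way is not assumed here.

```python
def to_mlflow_name(slash_name):
    """Convert a slash-separated metric name to dot notation."""
    return slash_name.replace("/", ".")


print(to_mlflow_name("client/7/accuracy"))  # client.7.accuracy
```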

Architecture

FL Clients
  └── return metadata dict
        │  (gRPC / HTTP / sockets / others; hivewatch does not depend on the transport layer)
        ▼
FL Server
  └── receives metadata and calls hivewatch:
        hivewatch.round_start(round)
        hivewatch.log_client_update(client_id, round, **metadata)
        hivewatch.log_round(round, global_accuracy, global_loss)
        │
        ▼
hivewatch
  ├── WandbEmitter  →  wandb.ai dashboard
  └── MLflowEmitter →  MLflow UI (localhost:5000)

hivewatch does not depend on a specific transport layer or FL framework. Applications bridge their training framework to hivewatch in the same way they would bridge it to another experiment tracking backend.

For map visualization, the storage contract includes a standalone metadata artifact in addition to the raw event log. This supports:

- local CLI runs that immediately launch or serve a map
- local or remote services that persist metadata for later display
- future deployments that store metadata in object storage and load it in a separate web tier

Project details

Release history

Version	Release date
0.2.1	May 10, 2026
0.2.0	Apr 28, 2026
0.2.0.dev1 (pre-release)	Apr 23, 2026
0.2.0.dev0 (pre-release, this version)	Apr 22, 2026
0.1.0	Apr 22, 2026

Download files

Source Distribution

hivewatch-0.2.0.dev0.tar.gz (326.9 kB)

Uploaded Apr 22, 2026 Source

Built Distribution

hivewatch-0.2.0.dev0-py3-none-any.whl (329.2 kB)

Uploaded Apr 22, 2026 Python 3

File details

Details for the file hivewatch-0.2.0.dev0.tar.gz.

File metadata

Download URL: hivewatch-0.2.0.dev0.tar.gz
Upload date: Apr 22, 2026
Size: 326.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hivewatch-0.2.0.dev0.tar.gz
Algorithm	Hash digest
SHA256	`17547ed0ac2a4cad21610bd9df7372f9dd58991d56502bb277f633df3dbb3a58`
MD5	`9eeefa50947b57e36eac7bb44794bab4`
BLAKE2b-256	`7394f1ae653dbf7a5b5b80614c77c6b1e6b7ea7ece968798a3ccd1d02b2371ee`

Provenance

The following attestation bundles were made for hivewatch-0.2.0.dev0.tar.gz:

Publisher: pre-release.yml on APPFL/hivewatch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hivewatch-0.2.0.dev0.tar.gz
- Subject digest: 17547ed0ac2a4cad21610bd9df7372f9dd58991d56502bb277f633df3dbb3a58
- Sigstore transparency entry: 1359543604
- Sigstore integration time: Apr 22, 2026
Source repository:
- Permalink: APPFL/hivewatch@ec1335961aca30d0163a064ab3f0901cf0cbd16e
- Branch / Tag: refs/heads/main
- Owner: https://github.com/APPFL
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pre-release.yml@ec1335961aca30d0163a064ab3f0901cf0cbd16e
- Trigger Event: workflow_dispatch

File details

Details for the file hivewatch-0.2.0.dev0-py3-none-any.whl.

File metadata

Download URL: hivewatch-0.2.0.dev0-py3-none-any.whl
Upload date: Apr 22, 2026
Size: 329.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hivewatch-0.2.0.dev0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a05842201d547284e3a6020721144932e15f79db34d5def650d1fe3c2b3e009`
MD5	`38470fd1a353b113bb431d2ed7893122`
BLAKE2b-256	`48c13db4db1367373fbaa810272530a79a65d80401eda9725a0365b4b06951a2`

Provenance

The following attestation bundles were made for hivewatch-0.2.0.dev0-py3-none-any.whl:

Publisher: pre-release.yml on APPFL/hivewatch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hivewatch-0.2.0.dev0-py3-none-any.whl
- Subject digest: 5a05842201d547284e3a6020721144932e15f79db34d5def650d1fe3c2b3e009
- Sigstore transparency entry: 1359543660
- Sigstore integration time: Apr 22, 2026
Source repository:
- Permalink: APPFL/hivewatch@ec1335961aca30d0163a064ab3f0901cf0cbd16e
- Branch / Tag: refs/heads/main
- Owner: https://github.com/APPFL
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pre-release.yml@ec1335961aca30d0163a064ab3f0901cf0cbd16e
- Trigger Event: workflow_dispatch
