Fleet health monitoring — track health across a fleet of agents
fleet-health-monitor — Fleet Health Daemon
Continuous health monitoring across the fleet: node health tracking, threshold alerting, watchdog timers, and live dashboards.
What This Gives You
- Node health tracking — per-agent health status (HEALTHY, DEGRADED, UNHEALTHY, OFFLINE)
- Threshold configuration — configurable alert thresholds for response time, error rate, uptime
- Watchdog timers — detect stuck or unresponsive agents
- Fleet aggregation — roll up individual node health into fleet-wide status
- Live dashboard — real-time fleet health visualization
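To make the fleet aggregation idea concrete, here is a minimal standalone sketch of one possible roll-up rule: the fleet takes the worst status of any node. It does not import fleet-health-monitor. The severity ordering, the `roll_up` helper, and the treatment of an empty fleet are assumptions made for illustration, and the package's actual rule may differ.

```python
from collections import Counter
from enum import IntEnum


class HealthStatus(IntEnum):
    # Severity ordering (higher is worse) is an assumption for this sketch.
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2
    OFFLINE = 3


def roll_up(node_statuses: dict[str, HealthStatus]) -> tuple[HealthStatus, Counter]:
    """Return (overall status, per-status counts) with a worst-status-wins rule."""
    counts = Counter(node_statuses.values())
    # An empty fleet is reported as HEALTHY here; a real daemon may choose otherwise.
    overall = max(node_statuses.values(), default=HealthStatus.HEALTHY)
    return overall, counts


overall, counts = roll_up({
    "agent-1": HealthStatus.HEALTHY,
    "agent-2": HealthStatus.DEGRADED,
})
print(overall.name, counts[HealthStatus.HEALTHY], counts[HealthStatus.DEGRADED])
```

Under this rule a single degraded node is enough to mark the whole fleet as DEGRADED, which keeps the overall status conservative.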
Quick Start
pip install fleet-health-monitor
from fleet_health_monitor import FleetHealth, NodeHealth, Watchdog, ThresholdConfig
# Configure thresholds
thresholds = ThresholdConfig(
    max_response_time_ms=5000,
    max_error_rate=0.1,
    min_uptime_pct=99.0,
)
# Track node health
fleet = FleetHealth()
fleet.register(NodeHealth(agent_id="agent-1", thresholds=thresholds))
fleet.register(NodeHealth(agent_id="agent-2", thresholds=thresholds))
# Record metrics
fleet.record("agent-1", response_time_ms=120, success=True)
fleet.record("agent-2", response_time_ms=8500, success=False)
# Check fleet status
status = fleet.status()
print(status.healthy) # 1
print(status.degraded) # 1
print(status.overall) # DEGRADED
# Start watchdog
watchdog = Watchdog(fleet=fleet, check_interval_seconds=30)
watchdog.start()
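How the watchdog decides that an agent is stuck is not described here. One common approach, sketched below as a standalone example, is to remember when each agent last reported and flag any agent that has been silent for longer than a timeout. `StalenessTracker`, `heartbeat`, `stale_agents`, and `timeout_seconds` are hypothetical names and are not part of the fleet-health-monitor API.

```python
import time


class StalenessTracker:
    """Tracks when each agent last reported and flags the silent ones."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        # The clock is injectable so the logic can be exercised without sleeping.
        self._clock = clock
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, agent_id: str) -> None:
        """Record that the agent reported in just now."""
        self._last_seen[agent_id] = self._clock()

    def stale_agents(self) -> list[str]:
        """Return agents whose last report is older than the timeout."""
        now = self._clock()
        return sorted(
            agent_id
            for agent_id, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


# Simulated clock: a one-element list used as a mutable "current time".
fake_now = [0.0]
tracker = StalenessTracker(timeout_seconds=60, clock=lambda: fake_now[0])
tracker.heartbeat("agent-1")
tracker.heartbeat("agent-2")
fake_now[0] = 45.0
tracker.heartbeat("agent-1")   # agent-1 reports again, agent-2 stays silent
fake_now[0] = 90.0
print(tracker.stale_agents())  # ['agent-2']
```

A periodic loop, such as one running every `check_interval_seconds`, could call a check like `stale_agents()` and mark the returned agents as OFFLINE.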
API Reference
NodeHealth(agent_id, thresholds) — record(response_time_ms, success), status
HealthStatus — HEALTHY, DEGRADED, UNHEALTHY, OFFLINE
ThresholdConfig — max_response_time_ms, max_error_rate, min_uptime_pct
FleetHealth — register(node), record(agent_id, ...), status() → FleetStatus
Watchdog(fleet, check_interval_seconds) — Continuous monitoring loop
FleetDashboard — Real-time visualization
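The reference above lists the three threshold fields but not how they are evaluated. The standalone sketch below shows one plausible reading: each field is a limit, and a node's aggregated metrics are compared against it. The `Thresholds` dataclass and the `breached` helper are hypothetical. Only the field names come from `ThresholdConfig`, and how breaches map to a `HealthStatus` is left open because the source does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Field names mirror the ThresholdConfig parameters; defaults copy the Quick Start values.
    max_response_time_ms: float = 5000
    max_error_rate: float = 0.1
    min_uptime_pct: float = 99.0


def breached(avg_response_time_ms: float, error_rate: float,
             uptime_pct: float, t: Thresholds) -> list[str]:
    """Return the names of the thresholds that the given metrics violate."""
    checks = {
        "max_response_time_ms": avg_response_time_ms > t.max_response_time_ms,
        "max_error_rate": error_rate > t.max_error_rate,
        "min_uptime_pct": uptime_pct < t.min_uptime_pct,
    }
    return [name for name, failed in checks.items() if failed]


limits = Thresholds()
print(breached(120, 0.0, 99.9, limits))   # []
print(breached(8500, 1.0, 99.9, limits))  # ['max_response_time_ms', 'max_error_rate']
```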
How It Fits
The system-level health daemon for the SuperInstance fleet. Complements agent-therapy (behavioral health) with infrastructure-level monitoring.
- OpenConstruct Documentation — ecosystem-wide docs and guides
- cocapn-health-rs — Rust health checker (TCP probing)
- agent-therapy — Behavioral health
- cicd-agent — Triggers health checks post-deploy
Testing
pytest tests/
Installation
pip install fleet-health-monitor
Python 3.10+. MIT license.
File details
Details for the file si_fleet_health_monitor-0.1.0.tar.gz.
File metadata
- Download URL: si_fleet_health_monitor-0.1.0.tar.gz
- Upload date:
- Size: 23.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ca1a9c08229a8d5f834d4643a4a5b85bfbf046461e0526f3d2644e3fd2537ca2 |
| MD5 | 99417e81d06f0c32c774cac6cf72a7d5 |
| BLAKE2b-256 | 992895a4e67c26ee9dd06016194e8220f1d86e72258489c97a2980ef0636a0fc |
File details
Details for the file si_fleet_health_monitor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: si_fleet_health_monitor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eeeb8f603589bd5b3c9a1212268ecb57e6ade1c1b3635f69e2641dab16642bac |
| MD5 | ff460eefca4928ff4bf491a0a84ae0dc |
| BLAKE2b-256 | c6ef748630b30bc9884575ef912c8ba26ab7d0ee8d48b2e64e121692eb72df4b |