Fleet health monitoring — track health across a fleet of agents

fleet-health-monitor — Fleet Health Daemon

Continuous health monitoring across an agent fleet: node health tracking, threshold alerting, watchdog timers, and live dashboards.

What This Gives You

  • Node health tracking — per-agent health status (HEALTHY, DEGRADED, UNHEALTHY, OFFLINE)
  • Threshold configuration — configurable alert thresholds for response time, error rate, uptime
  • Watchdog timers — detect stuck or unresponsive agents
  • Fleet aggregation — roll up individual node health into fleet-wide status
  • Live dashboard — real-time fleet health visualization
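The fleet aggregation idea in the list above can be illustrated with a small, self-contained sketch. It does not use the package itself: the `HealthStatus` severity ordering, the `roll_up` helper, and the "worst node wins" policy are assumptions made for illustration, and the real `FleetHealth.status()` may aggregate differently.

```python
from collections import Counter
from enum import IntEnum


class HealthStatus(IntEnum):
    """Status names from the feature list; the severity ordering is an assumption."""
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2
    OFFLINE = 3


def roll_up(node_statuses: dict[str, HealthStatus]) -> tuple[Counter, HealthStatus]:
    """Return per-status counts and a fleet-wide status (hypothetical worst-of policy)."""
    counts = Counter(node_statuses.values())
    # An empty fleet has nothing reporting, so this sketch treats it as OFFLINE.
    overall = max(node_statuses.values(), default=HealthStatus.OFFLINE)
    return counts, overall


counts, overall = roll_up({
    "agent-1": HealthStatus.HEALTHY,
    "agent-2": HealthStatus.DEGRADED,
})
print(counts[HealthStatus.HEALTHY], counts[HealthStatus.DEGRADED], overall.name)
```

A worst-of policy is the simplest choice. A real deployment might instead tolerate a small fraction of degraded nodes before downgrading the whole fleet.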

Quick Start

pip install si-fleet-health-monitor

from fleet_health_monitor import FleetHealth, NodeHealth, Watchdog, ThresholdConfig

# Configure thresholds
thresholds = ThresholdConfig(
    max_response_time_ms=5000,
    max_error_rate=0.1,
    min_uptime_pct=99.0,
)

# Track node health
fleet = FleetHealth()
fleet.register(NodeHealth(agent_id="agent-1", thresholds=thresholds))
fleet.register(NodeHealth(agent_id="agent-2", thresholds=thresholds))

# Record metrics
fleet.record("agent-1", response_time_ms=120, success=True)
fleet.record("agent-2", response_time_ms=8500, success=False)

# Check fleet status
status = fleet.status()
print(status.healthy)    # 1
print(status.degraded)   # 1
print(status.overall)    # DEGRADED

# Start watchdog
watchdog = Watchdog(fleet=fleet, check_interval_seconds=30)
watchdog.start()
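
To make the watchdog step more concrete, here is a minimal sketch of how a watchdog timer can detect a stuck or unresponsive agent by comparing each agent's last heartbeat against a timeout. `HeartbeatWatchdog`, `beat`, and `stale_agents` are hypothetical names for this example only. They are not the package's `Watchdog` API, whose internals are not documented here.

```python
import time


class HeartbeatWatchdog:
    """Hypothetical stand-in for a watchdog timer; not the package's Watchdog class."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the example can run without sleeping
        self._last_seen: dict[str, float] = {}

    def beat(self, agent_id: str) -> None:
        """Record that the agent reported in just now."""
        self._last_seen[agent_id] = self._clock()

    def stale_agents(self) -> list[str]:
        """Return agents whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            agent_id
            for agent_id, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


# A fake clock keeps the example deterministic.
fake_now = [0.0]
dog = HeartbeatWatchdog(timeout_seconds=30, clock=lambda: fake_now[0])
dog.beat("agent-1")
dog.beat("agent-2")
fake_now[0] = 20.0
dog.beat("agent-1")          # agent-1 reports again, agent-2 stays silent
fake_now[0] = 45.0
print(dog.stale_agents())    # ['agent-2']
```

In a long-running daemon the same check would typically run on a fixed interval, which is presumably the role of `check_interval_seconds` above.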

API Reference

NodeHealth(agent_id, thresholds): record(response_time_ms, success), status

HealthStatus: HEALTHY, DEGRADED, UNHEALTHY, OFFLINE

ThresholdConfig: max_response_time_ms, max_error_rate, min_uptime_pct

FleetHealth: register(node), record(agent_id, ...), status() → FleetStatus

Watchdog(fleet, check_interval_seconds): continuous monitoring loop

FleetDashboard: real-time visualization
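
The reference above does not say how threshold breaches map to a status, so the following is only a sketch under stated assumptions. The `Thresholds` dataclass mirrors the three `ThresholdConfig` fields, while `classify` and its rule (no breach is HEALTHY, every limit breached is UNHEALTHY, anything in between is DEGRADED) are hypothetical and may not match the package's behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Mirrors the three ThresholdConfig fields listed above."""
    max_response_time_ms: float
    max_error_rate: float
    min_uptime_pct: float


def classify(avg_response_time_ms: float, error_rate: float,
             uptime_pct: float, limits: Thresholds) -> str:
    """Hypothetical rule: count which limits are breached and map that to a status."""
    breaches = [
        avg_response_time_ms > limits.max_response_time_ms,
        error_rate > limits.max_error_rate,
        uptime_pct < limits.min_uptime_pct,
    ]
    if not any(breaches):
        return "HEALTHY"
    if all(breaches):
        return "UNHEALTHY"
    return "DEGRADED"


limits = Thresholds(max_response_time_ms=5000, max_error_rate=0.1, min_uptime_pct=99.0)
print(classify(120, 0.0, 99.9, limits))    # HEALTHY
print(classify(8500, 1.0, 99.9, limits))   # DEGRADED
print(classify(8500, 1.0, 90.0, limits))   # UNHEALTHY
```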

How It Fits

fleet-health-monitor is the system-level health daemon for the SuperInstance fleet. It complements agent-therapy (behavioral health) with infrastructure-level monitoring.

Testing

pytest tests/

Installation

pip install si-fleet-health-monitor

Requires Python 3.10+. Released under the MIT license.
