Skip to main content

Local-first homelab health monitoring with optional AI-agent integration and approval-gated repair

Project description

Homelab Guardian

Homelab Guardian is a local-first homelab operations assistant built around read-only infrastructure collectors and local reports.

It is not another dashboard. It generates plain-English health reports that explain:

  • what is broken
  • what changed
  • what matters
  • what the safest next step is

Guardian v0.2 is the Daily Homelab Doctor plus dashboard and alerts: a packaged CLI that collects optional read-only signals, stores local snapshots, writes Markdown reports, serves a read-only web view, and can send optional flap-damped notifications.

Core principles

  • Local-first
  • Read-only against homelab infrastructure by default
  • Any action is opt-in, whitelisted, reversible-minded, and human-approved — Guardian proposes; a person approves; Guardian verifies. (See Approval-gated repair.)
  • Never raw shell, never an AI-generated command — actions are named, parameterized argv only; the model is never the authority
  • No cloud dependency required
  • Useful without AI
  • Secrets stay local
  • Every integration is optional
  • Collectors degrade gracefully when unavailable or unconfigured

Guardian does write its own local runtime state: Markdown reports, SQLite scan snapshots, acknowledgments, alert state, and retention cleanup. Optional integrations may send outbound requests for Telegram notifications or AI briefings when explicitly enabled.

Quick start

git clone <repo-url>
cd homelab-guardian
python -m venv .venv
. .venv/bin/activate
pip install -e .

guardian init      # answer a few questions; optionally scans your LAN
guardian doctor    # preflight check
guardian           # first scan -> reports/latest.md

guardian init can probe your local network (read-only TCP connects, nothing is sent to any device) and recognizes common homelab services — Home Assistant, Proxmox, Pi-hole/AdGuard, Portainer, Plex, Jellyfin, Synology, QNAP, Uptime Kuma, and more — then writes a working config.yaml for you. Smart-speaker false positives (Google Cast devices) are fingerprinted and filtered out automatically.

Manual first-run steps

cp config.example.yaml config.yaml
mkdir -p data reports

Edit config.yaml locally. Do not commit it.

For Docker inventory, enable the Docker collector in config.yaml only on a Docker host or when using the socket proxy overlay:

collectors:
  docker:
    enabled: true
    socket_url: unix://var/run/docker.sock
    exclude_containers:
      - "homelab-guardian*"

Direct Python run

Use this mode for development or for hosts where Python already has access to the paths and services you want to inspect.

python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python -m homelab_guardian.main --config config.yaml

Safe example run without private services:

python -m homelab_guardian.main --config config.example.yaml

Preflight check:

python -m homelab_guardian.main doctor --config config.yaml

Equivalent form:

python -m homelab_guardian.main --config config.yaml --doctor

Prebuilt Docker image

Multi-arch images (amd64, arm64 — Raspberry Pi friendly) are published to GitHub Container Registry on every main-branch push:

docker pull ghcr.io/spezzuti/homelab-guardian:latest

Notes for containerized runs:

  • The systemd collector needs the host's systemd; run Guardian directly on the host if you want service monitoring.
  • The Bitwarden secrets provider needs the bws CLI, which is not in the image; in containers, inject secrets as environment variables instead (env_file or bws run -- docker compose ...).

Docker Compose run

Preferred install path on a Docker host:

cp config.example.yaml config.yaml
mkdir -p data reports
docker compose run --rm homelab-guardian

The default Compose file mounts:

  • ./config.yaml:/app/config.yaml:ro
  • ./data:/app/data
  • ./reports:/app/reports
  • /var/run/docker.sock:/var/run/docker.sock:ro

Guardian writes:

  • report: ./reports/latest.md
  • SQLite snapshots: ./data/guardian.sqlite

Inspect the latest report:

sed -n '1,220p' reports/latest.md

Or open reports/latest.md in your editor.

Docker socket warning

Mounting /var/run/docker.sock matters because the Docker collector must ask the Docker daemon for container metadata: status, health, restart count, ports, mounts, volumes, and Compose labels.

The Docker socket is powerful. Even when mounted :ro, the Docker API can expose sensitive host/container metadata, and socket access is often equivalent to broad control of Docker. Guardian only performs read-oriented SDK calls, but the socket itself should still be treated as privileged.

If /var/run/docker.sock is missing:

  • You are probably not on a Docker host, or
  • Guardian is running in a container without the socket mounted, or
  • Docker Desktop / rootless Docker uses a different socket path.

Safest next step:

  1. Run python -m homelab_guardian.main doctor --config config.yaml.
  2. Confirm the host actually runs Docker.
  3. If running in Docker Compose, confirm the socket mount exists.
  4. If you do not want Docker inventory on this machine, disable collectors.docker.enabled.

Safer socket proxy mode

A safer alternative to direct socket mounting is the optional socket proxy Compose file:

docker compose -f docker-compose.socket-proxy.yml run --rm homelab-guardian

This starts docker-socket-proxy and sets:

DOCKER_HOST=tcp://docker-socket-proxy:2375

The proxy exposes only selected read-oriented Docker API areas where possible and keeps write methods disabled. This reduces blast radius compared with mounting the raw socket directly into Guardian. It is still Docker daemon access, so use it intentionally.

Configuration

Start from config.example.yaml. Do not commit config.yaml, .env, API tokens, SSH keys, databases, generated reports, or machine-specific credentials.

Home Assistant access is read-only and uses an environment variable for the token. The example Compose files load local .env values into the container and pass HOMEASSISTANT_TOKEN through to Guardian.

cp .env.example .env
# Edit .env locally. Never commit it.

Deployment modes

Run directly on a Docker host

Install Python and run Guardian on the same host that runs Docker. Enable the Docker collector only if /var/run/docker.sock exists and the user running Guardian can read Docker metadata.

Run via Docker Compose with Docker socket mounted

Run Guardian as a one-shot container with local config.yaml, data, and reports bind mounts. This is the preferred MVP install path for Docker hosts.

Run via Docker Compose with socket proxy

Use docker-compose.socket-proxy.yml to route Docker SDK calls through docker-socket-proxy instead of giving Guardian the raw socket.

Run without Docker

Guardian is still useful without Docker. Leave the Docker collector disabled and use any combination of:

  • DNS checks
  • TCP checks
  • HTTP checks
  • local backup path checks
  • Home Assistant API checks

Future: remote collectors

Future versions may support remote collectors for Docker hosts, NAS systems, Home Assistant, and backup locations. The current MVP is local-only: paths and sockets are evaluated from the machine or container running Guardian.

Current collectors

Docker collector

Disabled by default because Docker socket access is sensitive. When enabled, it reads container metadata and reports:

  • container name
  • image
  • status
  • health status
  • restart count
  • exposed/published ports
  • mounts, bind paths, and named volumes
  • Docker Compose project/service labels

Exited, unhealthy, restarting, or dead containers are surfaced as warnings or critical checks. If Docker is enabled but unavailable, the report shows unknown with the likely cause and safest next step instead of crashing.

Guardian can exclude containers by name pattern. This is useful for ignoring Guardian's own one-shot runtime containers and its socket proxy:

collectors:
  docker:
    exclude_containers:
      - "homelab-guardian*"

Docker Compose container names in this setup normally use hyphens, not underscores, so prefer homelab-guardian* for Guardian runtime exclusions. Excluded containers are skipped from normal container health checks. The Docker inventory summary still reports how many were excluded and which patterns were used.

Home Assistant collector

Disabled by default. When configured with a URL and token environment variable, it performs a read-only GET /api/states request and reports unavailable or unknown entities. It does not call services and does not modify Home Assistant.

Safe setup for local dogfood:

  1. In Home Assistant, create a long-lived access token from your user profile.

  2. Copy .env.example to .env and put the token there:

    HOMEASSISTANT_TOKEN=your-token-here
    
  3. In the ignored local config.yaml, set the Home Assistant URL and enable the collector:

    collectors:
      homeassistant:
        enabled: true
        url: "http://homeassistant.local:8123"
        token_env: "HOMEASSISTANT_TOKEN"
    
  4. Run a report:

    python -m homelab_guardian.main --config config.yaml
    
  5. If running through Docker Compose, use the same ignored .env file and config.yaml:

    docker compose run --rm homelab-guardian
    

Never commit .env, config.yaml, reports, databases, tokens, or machine-specific credentials.

Network collector

Supports:

  • DNS resolution checks
  • TCP port checks
  • HTTP status checks
  • TLS certificate expiry checks (works for self-signed certificates too)

Failures include clear evidence such as hostname, port, expected status, actual status, timeout, and error text.

Backup freshness collector

Checks configured local paths without modifying them. It reports:

  • whether the path exists
  • latest modified file timestamp
  • backup age in hours and days
  • warning if the newest file is older than max_age_days
  • critical if a required path is missing
  • critical if a required directory exists but contains no files
  • unknown if an optional directory exists but contains no files
  • unknown if backup checks are enabled but no paths are configured yet

If backups.enabled is true and paths: [], Guardian reports unknown because the check is not ready to evaluate anything. That means configuration is incomplete, not that a backup failed. Add backup paths when ready, or set backups.enabled: false until backup monitoring is part of your rollout.

Backup paths are local to the machine or container running Guardian. If Guardian runs in Docker, mount backup locations read-only into the container first.

When the configured path is a file, Guardian uses that file's modified time. When the configured path is a directory, Guardian recursively scans files inside the directory and uses the newest file modified time. Directory modified times are ignored because they can change for reasons that do not prove a backup file is fresh.

Safe backup freshness dogfood

Use a dummy local folder before pointing Guardian at real backup destinations. Do not test against production backup paths until the dummy procedure behaves as expected.

mkdir -p /tmp/homelab-guardian-backup-dogfood
printf 'dummy backup marker\n' > /tmp/homelab-guardian-backup-dogfood/backup-marker.txt
cp config.example.yaml config.yaml

In the ignored local config.yaml, set only the dummy path:

collectors:
  backups:
    enabled: true
    paths:
      - id: dummy_backup_dogfood
        name: Dummy backup dogfood path
        path: /tmp/homelab-guardian-backup-dogfood
        max_age_days: 1
        required: true

Then run:

python -m homelab_guardian.main --config config.yaml

Expected result: the dummy backup check reports ok while the marker file is fresh. To test stale behavior safely, change max_age_days to 0 or adjust only files inside /tmp/homelab-guardian-backup-dogfood. Never commit config.yaml, generated reports, database files, or the dummy runtime folder.

Web view

guardian serve                          # http://localhost:8674
guardian serve --interval 900           # appliance mode: scan + serve in one process
guardian serve --host 0.0.0.0           # expose on your LAN (explicit choice)

A read-only page rendered from local scan history: overall status, the AI briefing when enabled, what changed, every check with its evidence, and recent scan history with per-scan drill-down. No web framework, no JavaScript, no write endpoints; binds to localhost unless you say otherwise. /healthz returns plain ok so other monitors can watch Guardian itself.

Recurring scans

Guardian runs once by default. For continuous monitoring, pass --interval:

python -m homelab_guardian.main --config config.yaml --interval 900

This repeats the scan every 900 seconds. A failed scan is logged and the loop continues. Each scan is compared against the previous snapshot, so the report and any notifications highlight what changed.

For a host install, deploy/homelab-guardian.service is a ready-to-edit systemd user service. For Docker Compose, run the service with --interval and restart: unless-stopped instead of one-shot docker compose run.

Secrets providers

Every credential Guardian uses (Home Assistant token, Telegram bot token, AI API key) is referenced by name in config.yaml and resolved through a secrets provider. Tokens never live in the config file.

  • provider: env (default) — names are environment variables, typically from a local .env file. No extra tooling required.
  • provider: bitwarden — names are secret keys in Bitwarden Secrets Manager, fetched through the bws CLI with a single machine-account access token (BWS_ACCESS_TOKEN in the environment). One token instead of a pile of .env entries, secrets stay centrally managed and rotatable, and environment variables still override when set. If the provider is unavailable, Guardian warns once and degrades to environment-only — a secrets backend outage never breaks a scan.
secrets:
  provider: bitwarden
  bitwarden:
    access_token_env: "BWS_ACCESS_TOKEN"
    project_id: "" # optional: restrict to one project

python -m homelab_guardian.main doctor verifies the provider end-to-end and reports how many secrets are readable. The provider interface is intentionally small, so additional backends (Vault, 1Password, Infisical) can be added without touching collectors.

Alternative: bws run -- python -m homelab_guardian.main --config config.yaml injects all secrets as process environment variables without any Guardian configuration.

AI briefing — bring your own model

Optional and disabled by default. When ai.enabled is true, Guardian sends only the structured check results and the what-changed diff to a single OpenAI-compatible chat completions endpoint and places the returned plain-English briefing at the top of the report.

  • Works with OpenRouter, a local Ollama/LM Studio endpoint, or any other OpenAI-compatible server — your model, your key, your choice.
  • The model receives structured JSON only. It has no shell, no tools, and no access to your systems, and the prompt forbids suggesting state-changing commands.
  • Guardian remains fully functional with this disabled; a failed model call never fails the scan.
ai:
  enabled: true
  base_url: "http://localhost:11434/v1" # local Ollama example
  model: "qwen3:14b"
  api_key_env: "GUARDIAN_AI_API_KEY" # leave the env var unset for keyless local endpoints

Note: each config file should point at its own database_path. Scan diffing compares against the previous snapshot in that database, so two configs sharing one database will see each other's checks as added/removed noise.

Attach an AI agent (MCP + agent-mode)

Guardian's collectors and the status/summary/evidence/recommended_action contract are the moat; you can hand that view to any model.

  • guardian mcp serves Guardian over the Model Context Protocol so an agent (Claude, a local agent, ...) reads your verified homelab state instead of re-deriving it. Read-only by default; optional gated write tools. See docs/mcp.md.
  • Agent-delivery mode (notifications.mode: agent) makes Guardian feed each confirmed change to the agent's webhook so the agent is the single voice, with a deterministic Telegram fallback for criticals if the agent is unreachable.

The division of labor: Guardian is the deterministic source of truth and the "reflex" actuator; the agent narrates, reasons, and handles the deep, judgment- heavy fixes Guardian deliberately won't.

Approval-gated repair

Optional, off by default (repair.enabled). Guardian can propose a fix for a detected problem and, after a human approves it, execute it and verify recovery — closing the detect → diagnose → propose → approve → repair → verify loop. The whole point is to do this safely:

  • Never raw shell. Only named, whitelisted, parameterized actions, built as argv lists. Targets come from validated check evidence or admin allowlists.
  • The agent is never the authority. It can propose and execute, but approval is human-only (CLI guardian repair approve, or the dashboard /repairs page). Destructive actions can never auto-approve.
  • Built-ins: restart a watched systemd unit or container; reclaim disk (docker_prune / journal_vacuum / apt_clean / prune_dir, with read-only previews and a backup interlock). Everything is audited and loop-guarded.

Design and threat model: docs/repair.md and docs/repair-reclaim.md.

Telegram notifications

Optional and disabled by default. Configure under notifications.telegram in config.yaml with a bot token and chat id provided through environment variables (see config.example.yaml). send_on: changes is the recommended mode: you only get a message when something actually changed since the last scan.

Disk space collector

Reads usage on configured mounts (or the drive Guardian runs on when no paths are set) with warning/critical percent thresholds. Disk-full is the most common silent homelab failure; this is the check that catches it early.

systemd collector

Sweeps the system (and optionally user) service manager for failed units and units stuck in a restart loop — the activating/auto-restart state that never reaches "failed" and hides exactly the breakage that matters. Specific units can be watched individually with state and restart-count evidence.

Flap damping

One wifi blip should not page you. With notifications.telegram.confirm_scans: 2, a status change must hold for two consecutive scans before Guardian announces it — in both directions, so recoveries are confirmed too. A check that flaps up and down never triggers a message. The report and web view always show the live state; damping only gates notifications.

Acknowledging known issues

Chronic problems train you to ignore alerts. Acknowledge a check to mute it without losing sight of it:

guardian ack ha_unavailable_entities --note "MQTT bridge down, part ordered" --days 14
guardian ack          # list current acknowledgments
guardian unack ha_unavailable_entities

An acknowledged check keeps its real status but is excluded from the overall status, change detection, and notifications. It appears in a collapsed "Acknowledged" section of the report and web view, with your note, so it stays visible without drowning the signal. Acknowledgments can expire automatically (--days, --until); check ids are shown in reports and in the web view's evidence blocks.

Report layout

The Markdown report includes:

  • overall status
  • summary counts
  • what changed since the previous scan (regressions, improvements, new and removed checks)
  • critical issues first
  • warnings second
  • unknowns third
  • OK checks last, collapsed to names when there are many
  • recommended actions and JSON evidence for each non-collapsed check

Safety notes

Homelab Guardian's safety boundary has four parts:

  • Collectors are read-only. Detection never modifies services, containers, DNS, Home Assistant entities, backup contents, systemd units, certificates, disks, or remote hosts.
  • Local runtime state is writable. Guardian writes reports, SQLite scan snapshots, acknowledgments, alert state, and optional retention pruning under the configured report/database paths.
  • Outbound integrations are opt-in. Telegram notifications and AI briefings send structured status data only when explicitly enabled.
  • Repair is opt-in and human-gated. With repair.enabled (off by default), Guardian may propose a whitelisted, parameterized fix and execute it only after a human approves the specific proposal — never raw shell, never an AI-generated command, always followed by a verify and an audit record. Destructive actions can never auto-approve. See Approval-gated repair and docs/repair.md.

Recommended actions in reports are diagnostic next steps for the operator; nothing executes automatically without the approval flow above.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

homelab_guardian-0.3.2.tar.gz (151.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

homelab_guardian-0.3.2-py3-none-any.whl (128.0 kB view details)

Uploaded Python 3

File details

Details for the file homelab_guardian-0.3.2.tar.gz.

File metadata

  • Download URL: homelab_guardian-0.3.2.tar.gz
  • Upload date:
  • Size: 151.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for homelab_guardian-0.3.2.tar.gz
Algorithm Hash digest
SHA256 d16e3dccd921959ed2817ed544178b146f653117165d433ffed02955b3535cdc
MD5 793e5d71eff5dc1451122b928578c6db
BLAKE2b-256 aff339362c0308d97a2665991029107738d273b2c78ae01c4cac8294b18db8dc

See more details on using hashes here.

File details

Details for the file homelab_guardian-0.3.2-py3-none-any.whl.

File metadata

File hashes

Hashes for homelab_guardian-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 aa73545c442878a41bc3a1ed561b1aadec8c5ee3c3598c95c942a219ae722adc
MD5 aa07369abc08d9d883787789018afe67
BLAKE2b-256 abddf4baa7c976ad4c1358a0404e5122759046fa703c3415d8ed129b5395c4aa

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page