Local-first homelab health monitoring with optional AI-agent integration and approval-gated repair
Project description
Homelab Guardian
Homelab Guardian is a local-first homelab operations assistant built around read-only infrastructure collectors and local reports.
It is not another dashboard. It generates plain-English health reports that explain:
- what is broken
- what changed
- what matters
- what the safest next step is
Guardian v0.2 is the Daily Homelab Doctor plus dashboard and alerts: a packaged CLI that collects optional read-only signals, stores local snapshots, writes Markdown reports, serves a read-only web view, and can send optional flap-damped notifications.
Core principles
- Local-first
- Read-only against homelab infrastructure by default
- Any action is opt-in, whitelisted, reversible-minded, and human-approved — Guardian proposes; a person approves; Guardian verifies. (See Approval-gated repair.)
- Never raw shell, never an AI-generated command — actions are named, parameterized argv only; the model is never the authority
- No cloud dependency required
- Useful without AI
- Secrets stay local
- Every integration is optional
- Collectors degrade gracefully when unavailable or unconfigured
Guardian does write its own local runtime state: Markdown reports, SQLite scan snapshots, acknowledgments, alert state, and retention cleanup. Optional integrations may send outbound requests for Telegram notifications or AI briefings when explicitly enabled.
Quick start
git clone <repo-url>
cd homelab-guardian
python -m venv .venv
. .venv/bin/activate
pip install -e .
guardian init # answer a few questions; optionally scans your LAN
guardian doctor # preflight check
guardian # first scan -> reports/latest.md
guardian init can probe your local network (read-only TCP connects, nothing
is sent to any device) and recognizes common homelab services — Home
Assistant, Proxmox, Pi-hole/AdGuard, Portainer, Plex, Jellyfin, Synology,
QNAP, Uptime Kuma, and more — then writes a working config.yaml for you.
Smart-speaker false positives (Google Cast devices) are fingerprinted and
filtered out automatically.
Manual first-run steps
cp config.example.yaml config.yaml
mkdir -p data reports
Edit config.yaml locally. Do not commit it.
For Docker inventory, enable the Docker collector in config.yaml only on a Docker host or when using the socket proxy overlay:
collectors:
docker:
enabled: true
socket_url: unix://var/run/docker.sock
exclude_containers:
- "homelab-guardian*"
Direct Python run
Use this mode for development or for hosts where Python already has access to the paths and services you want to inspect.
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python -m homelab_guardian.main --config config.yaml
Safe example run without private services:
python -m homelab_guardian.main --config config.example.yaml
Preflight check:
python -m homelab_guardian.main doctor --config config.yaml
Equivalent form:
python -m homelab_guardian.main --config config.yaml --doctor
Prebuilt Docker image
Multi-arch images (amd64, arm64 — Raspberry Pi friendly) are published to GitHub Container Registry on every main-branch push:
docker pull ghcr.io/spezzuti/homelab-guardian:latest
Notes for containerized runs:
- The systemd collector needs the host's systemd; run Guardian directly on the host if you want service monitoring.
- The Bitwarden secrets provider needs the
bwsCLI, which is not in the image; in containers, inject secrets as environment variables instead (env_file orbws run -- docker compose ...).
Docker Compose run
Preferred install path on a Docker host:
cp config.example.yaml config.yaml
mkdir -p data reports
docker compose run --rm homelab-guardian
The default Compose file mounts:
./config.yaml:/app/config.yaml:ro./data:/app/data./reports:/app/reports/var/run/docker.sock:/var/run/docker.sock:ro
Guardian writes:
- report:
./reports/latest.md - SQLite snapshots:
./data/guardian.sqlite
Inspect the latest report:
sed -n '1,220p' reports/latest.md
Or open reports/latest.md in your editor.
Docker socket warning
Mounting /var/run/docker.sock matters because the Docker collector must ask the Docker daemon for container metadata: status, health, restart count, ports, mounts, volumes, and Compose labels.
The Docker socket is powerful. Even when mounted :ro, the Docker API can expose sensitive host/container metadata, and socket access is often equivalent to broad control of Docker. Guardian only performs read-oriented SDK calls, but the socket itself should still be treated as privileged.
If /var/run/docker.sock is missing:
- You are probably not on a Docker host, or
- Guardian is running in a container without the socket mounted, or
- Docker Desktop / rootless Docker uses a different socket path.
Safest next step:
- Run
python -m homelab_guardian.main doctor --config config.yaml. - Confirm the host actually runs Docker.
- If running in Docker Compose, confirm the socket mount exists.
- If you do not want Docker inventory on this machine, disable
collectors.docker.enabled.
Safer socket proxy mode
A safer alternative to direct socket mounting is the optional socket proxy Compose file:
docker compose -f docker-compose.socket-proxy.yml run --rm homelab-guardian
This starts docker-socket-proxy and sets:
DOCKER_HOST=tcp://docker-socket-proxy:2375
The proxy exposes only selected read-oriented Docker API areas where possible and keeps write methods disabled. This reduces blast radius compared with mounting the raw socket directly into Guardian. It is still Docker daemon access, so use it intentionally.
Configuration
Start from config.example.yaml. Do not commit config.yaml, .env, API tokens, SSH keys, databases, generated reports, or machine-specific credentials.
Home Assistant access is read-only and uses an environment variable for the token. The example Compose files load local .env values into the container and pass HOMEASSISTANT_TOKEN through to Guardian.
cp .env.example .env
# Edit .env locally. Never commit it.
Deployment modes
Run directly on a Docker host
Install Python and run Guardian on the same host that runs Docker. Enable the Docker collector only if /var/run/docker.sock exists and the user running Guardian can read Docker metadata.
Run via Docker Compose with Docker socket mounted
Run Guardian as a one-shot container with local config.yaml, data, and reports bind mounts. This is the preferred MVP install path for Docker hosts.
Run via Docker Compose with socket proxy
Use docker-compose.socket-proxy.yml to route Docker SDK calls through docker-socket-proxy instead of giving Guardian the raw socket.
Run without Docker
Guardian is still useful without Docker. Leave the Docker collector disabled and use any combination of:
- DNS checks
- TCP checks
- HTTP checks
- local backup path checks
- Home Assistant API checks
Future: remote collectors
Future versions may support remote collectors for Docker hosts, NAS systems, Home Assistant, and backup locations. The current MVP is local-only: paths and sockets are evaluated from the machine or container running Guardian.
Current collectors
Docker collector
Disabled by default because Docker socket access is sensitive. When enabled, it reads container metadata and reports:
- container name
- image
- status
- health status
- restart count
- exposed/published ports
- mounts, bind paths, and named volumes
- Docker Compose project/service labels
Exited, unhealthy, restarting, or dead containers are surfaced as warnings or critical checks. If Docker is enabled but unavailable, the report shows unknown with the likely cause and safest next step instead of crashing.
Guardian can exclude containers by name pattern. This is useful for ignoring Guardian's own one-shot runtime containers and its socket proxy:
collectors:
docker:
exclude_containers:
- "homelab-guardian*"
Docker Compose container names in this setup normally use hyphens, not underscores, so prefer homelab-guardian* for Guardian runtime exclusions. Excluded containers are skipped from normal container health checks. The Docker inventory summary still reports how many were excluded and which patterns were used.
Home Assistant collector
Disabled by default. When configured with a URL and token environment variable, it performs a read-only GET /api/states request and reports unavailable or unknown entities. It does not call services and does not modify Home Assistant.
Safe setup for local dogfood:
-
In Home Assistant, create a long-lived access token from your user profile.
-
Copy
.env.exampleto.envand put the token there:HOMEASSISTANT_TOKEN=your-token-here
-
In the ignored local
config.yaml, set the Home Assistant URL and enable the collector:collectors: homeassistant: enabled: true url: "http://homeassistant.local:8123" token_env: "HOMEASSISTANT_TOKEN"
-
Run a report:
python -m homelab_guardian.main --config config.yaml
-
If running through Docker Compose, use the same ignored
.envfile andconfig.yaml:docker compose run --rm homelab-guardian
Never commit .env, config.yaml, reports, databases, tokens, or machine-specific credentials.
Network collector
Supports:
- DNS resolution checks
- TCP port checks
- HTTP status checks
- TLS certificate expiry checks (works for self-signed certificates too)
Failures include clear evidence such as hostname, port, expected status, actual status, timeout, and error text.
Backup freshness collector
Checks configured local paths without modifying them. It reports:
- whether the path exists
- latest modified file timestamp
- backup age in hours and days
- warning if the newest file is older than
max_age_days - critical if a required path is missing
- critical if a required directory exists but contains no files
- unknown if an optional directory exists but contains no files
- unknown if backup checks are enabled but no paths are configured yet
If backups.enabled is true and paths: [], Guardian reports unknown because the check is not ready to evaluate anything. That means configuration is incomplete, not that a backup failed. Add backup paths when ready, or set backups.enabled: false until backup monitoring is part of your rollout.
Backup paths are local to the machine or container running Guardian. If Guardian runs in Docker, mount backup locations read-only into the container first.
When the configured path is a file, Guardian uses that file's modified time. When the configured path is a directory, Guardian recursively scans files inside the directory and uses the newest file modified time. Directory modified times are ignored because they can change for reasons that do not prove a backup file is fresh.
Safe backup freshness dogfood
Use a dummy local folder before pointing Guardian at real backup destinations. Do not test against production backup paths until the dummy procedure behaves as expected.
mkdir -p /tmp/homelab-guardian-backup-dogfood
printf 'dummy backup marker\n' > /tmp/homelab-guardian-backup-dogfood/backup-marker.txt
cp config.example.yaml config.yaml
In the ignored local config.yaml, set only the dummy path:
collectors:
backups:
enabled: true
paths:
- id: dummy_backup_dogfood
name: Dummy backup dogfood path
path: /tmp/homelab-guardian-backup-dogfood
max_age_days: 1
required: true
Then run:
python -m homelab_guardian.main --config config.yaml
Expected result: the dummy backup check reports ok while the marker file is fresh. To test stale behavior safely, change max_age_days to 0 or adjust only files inside /tmp/homelab-guardian-backup-dogfood. Never commit config.yaml, generated reports, database files, or the dummy runtime folder.
Web view
guardian serve # http://localhost:8674
guardian serve --interval 900 # appliance mode: scan + serve in one process
guardian serve --host 0.0.0.0 # expose on your LAN (explicit choice)
A read-only page rendered from local scan history: overall status, the AI
briefing when enabled, what changed, every check with its evidence, and
recent scan history with per-scan drill-down. No web framework, no
JavaScript, no write endpoints; binds to localhost unless you say otherwise.
/healthz returns plain ok so other monitors can watch Guardian itself.
Recurring scans
Guardian runs once by default. For continuous monitoring, pass --interval:
python -m homelab_guardian.main --config config.yaml --interval 900
This repeats the scan every 900 seconds. A failed scan is logged and the loop continues. Each scan is compared against the previous snapshot, so the report and any notifications highlight what changed.
For a host install, deploy/homelab-guardian.service is a ready-to-edit
systemd user service. For Docker Compose, run the service with --interval
and restart: unless-stopped instead of one-shot docker compose run.
Secrets providers
Every credential Guardian uses (Home Assistant token, Telegram bot token, AI
API key) is referenced by name in config.yaml and resolved through a secrets
provider. Tokens never live in the config file.
provider: env(default) — names are environment variables, typically from a local.envfile. No extra tooling required.provider: bitwarden— names are secret keys in Bitwarden Secrets Manager, fetched through thebwsCLI with a single machine-account access token (BWS_ACCESS_TOKENin the environment). One token instead of a pile of.enventries, secrets stay centrally managed and rotatable, and environment variables still override when set. If the provider is unavailable, Guardian warns once and degrades to environment-only — a secrets backend outage never breaks a scan.
secrets:
provider: bitwarden
bitwarden:
access_token_env: "BWS_ACCESS_TOKEN"
project_id: "" # optional: restrict to one project
python -m homelab_guardian.main doctor verifies the provider end-to-end and reports how
many secrets are readable. The provider interface is intentionally small, so
additional backends (Vault, 1Password, Infisical) can be added without
touching collectors.
Alternative: bws run -- python -m homelab_guardian.main --config config.yaml injects all
secrets as process environment variables without any Guardian configuration.
AI briefing — bring your own model
Optional and disabled by default. When ai.enabled is true, Guardian sends
only the structured check results and the what-changed diff to a single
OpenAI-compatible chat completions endpoint and places the returned
plain-English briefing at the top of the report.
- Works with OpenRouter, a local Ollama/LM Studio endpoint, or any other OpenAI-compatible server — your model, your key, your choice.
- The model receives structured JSON only. It has no shell, no tools, and no access to your systems, and the prompt forbids suggesting state-changing commands.
- Guardian remains fully functional with this disabled; a failed model call never fails the scan.
ai:
enabled: true
base_url: "http://localhost:11434/v1" # local Ollama example
model: "qwen3:14b"
api_key_env: "GUARDIAN_AI_API_KEY" # leave the env var unset for keyless local endpoints
Note: each config file should point at its own database_path. Scan diffing
compares against the previous snapshot in that database, so two configs
sharing one database will see each other's checks as added/removed noise.
Attach an AI agent (MCP + agent-mode)
Guardian's collectors and the status/summary/evidence/recommended_action
contract are the moat; you can hand that view to any model.
guardian mcpserves Guardian over the Model Context Protocol so an agent (Claude, a local agent, ...) reads your verified homelab state instead of re-deriving it. Read-only by default; optional gated write tools. See docs/mcp.md.- Agent-delivery mode (
notifications.mode: agent) makes Guardian feed each confirmed change to the agent's webhook so the agent is the single voice, with a deterministic Telegram fallback for criticals if the agent is unreachable.
The division of labor: Guardian is the deterministic source of truth and the "reflex" actuator; the agent narrates, reasons, and handles the deep, judgment- heavy fixes Guardian deliberately won't.
Approval-gated repair
Optional, off by default (repair.enabled). Guardian can propose a fix for a
detected problem and, after a human approves it, execute it and verify
recovery — closing the detect → diagnose → propose → approve → repair → verify
loop. The whole point is to do this safely:
- Never raw shell. Only named, whitelisted, parameterized actions, built as argv lists. Targets come from validated check evidence or admin allowlists.
- The agent is never the authority. It can propose and execute, but approval
is human-only (CLI
guardian repair approve, or the dashboard/repairspage). Destructive actions can never auto-approve. - Built-ins: restart a watched systemd unit or container; reclaim disk
(
docker_prune/journal_vacuum/apt_clean/prune_dir, with read-only previews and a backup interlock). Everything is audited and loop-guarded.
Design and threat model: docs/repair.md and docs/repair-reclaim.md.
Telegram notifications
Optional and disabled by default. Configure under notifications.telegram in
config.yaml with a bot token and chat id provided through environment
variables (see config.example.yaml). send_on: changes is the recommended
mode: you only get a message when something actually changed since the last
scan.
Disk space collector
Reads usage on configured mounts (or the drive Guardian runs on when no paths are set) with warning/critical percent thresholds. Disk-full is the most common silent homelab failure; this is the check that catches it early.
systemd collector
Sweeps the system (and optionally user) service manager for failed units and
units stuck in a restart loop — the activating/auto-restart state that
never reaches "failed" and hides exactly the breakage that matters. Specific
units can be watched individually with state and restart-count evidence.
Flap damping
One wifi blip should not page you. With notifications.telegram.confirm_scans: 2,
a status change must hold for two consecutive scans before Guardian announces
it — in both directions, so recoveries are confirmed too. A check that flaps
up and down never triggers a message. The report and web view always show the
live state; damping only gates notifications.
Acknowledging known issues
Chronic problems train you to ignore alerts. Acknowledge a check to mute it without losing sight of it:
guardian ack ha_unavailable_entities --note "MQTT bridge down, part ordered" --days 14
guardian ack # list current acknowledgments
guardian unack ha_unavailable_entities
An acknowledged check keeps its real status but is excluded from the overall
status, change detection, and notifications. It appears in a collapsed
"Acknowledged" section of the report and web view, with your note, so it
stays visible without drowning the signal. Acknowledgments can expire
automatically (--days, --until); check ids are shown in reports and in
the web view's evidence blocks.
Report layout
The Markdown report includes:
- overall status
- summary counts
- what changed since the previous scan (regressions, improvements, new and removed checks)
- critical issues first
- warnings second
- unknowns third
- OK checks last, collapsed to names when there are many
- recommended actions and JSON evidence for each non-collapsed check
Safety notes
Homelab Guardian's safety boundary has four parts:
- Collectors are read-only. Detection never modifies services, containers, DNS, Home Assistant entities, backup contents, systemd units, certificates, disks, or remote hosts.
- Local runtime state is writable. Guardian writes reports, SQLite scan snapshots, acknowledgments, alert state, and optional retention pruning under the configured report/database paths.
- Outbound integrations are opt-in. Telegram notifications and AI briefings send structured status data only when explicitly enabled.
- Repair is opt-in and human-gated. With
repair.enabled(off by default), Guardian may propose a whitelisted, parameterized fix and execute it only after a human approves the specific proposal — never raw shell, never an AI-generated command, always followed by a verify and an audit record. Destructive actions can never auto-approve. See Approval-gated repair and docs/repair.md.
Recommended actions in reports are diagnostic next steps for the operator; nothing executes automatically without the approval flow above.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file homelab_guardian-0.3.2.tar.gz.
File metadata
- Download URL: homelab_guardian-0.3.2.tar.gz
- Upload date:
- Size: 151.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d16e3dccd921959ed2817ed544178b146f653117165d433ffed02955b3535cdc
|
|
| MD5 |
793e5d71eff5dc1451122b928578c6db
|
|
| BLAKE2b-256 |
aff339362c0308d97a2665991029107738d273b2c78ae01c4cac8294b18db8dc
|
File details
Details for the file homelab_guardian-0.3.2-py3-none-any.whl.
File metadata
- Download URL: homelab_guardian-0.3.2-py3-none-any.whl
- Upload date:
- Size: 128.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aa73545c442878a41bc3a1ed561b1aadec8c5ee3c3598c95c942a219ae722adc
|
|
| MD5 |
aa07369abc08d9d883787789018afe67
|
|
| BLAKE2b-256 |
abddf4baa7c976ad4c1358a0404e5122759046fa703c3415d8ed129b5395c4aa
|