Zpool Monitoring Daemon

check_zpools

check_zpools is a production-ready ZFS pool monitoring tool with intelligent alerting and daemon mode. It provides comprehensive health monitoring with configurable thresholds, email notifications, and alert deduplication.

Features

ZFS Pool Monitoring: Real-time health, capacity, error, and scrub status tracking
Intelligent Alerting: Email notifications with deduplication and configurable resend intervals
Daemon Mode: Continuous monitoring with graceful shutdown and error recovery
Rich CLI: Beautiful table output and JSON export via rich-click
Layered Configuration: Flexible config system (defaults → app → host → user → .env → env)
Structured Logging: Rich console output with journald, eventlog, and Graylog/GELF support

Platform Support

Current Status:

Linux/FreeBSD/macOS: Full support with local ZFS pools
Windows: Limited support - ZFS pools are not natively available on Windows
- CLI commands work but require ZFS to be present (e.g., via WSL)
- Future: Remote ZFS monitoring via SSH is planned, which will enable Windows users to monitor remote ZFS servers

Note: The tool is primarily designed for systems running ZFS. Windows support currently serves as groundwork for the planned remote monitoring capabilities.

Install - recommended via UV

UV - the ultrafast installer - written in Rust (10–20× faster than pip/poetry)

# Recommended: install uv first
pip install --upgrade uv
# Create and activate a virtual environment (optional but recommended)
uv venv
# macOS/Linux
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
# install via uv from PyPI
uv pip install check_zpools

For alternative install paths (pip, pipx, uv, uvx source builds, etc.), see INSTALL.md. All supported methods register the check_zpools command on your PATH.

Python 3.13+ Baseline

The project targets Python 3.13 and newer only.
Runtime dependencies stay on the current stable releases (rich-click>=1.9.3 and lib_cli_exit_tools>=2.0.0), and pytest, ruff, pyright, bandit, build, twine, codecov-cli, pip-audit, textual, and import-linter are pinned to their newest majors.
CI workflows exercise GitHub's rolling runner images (ubuntu-latest, macos-latest, windows-latest) and cover CPython 3.13 alongside the latest available 3.x release provided by Actions.

CLI Command Reference

Global Options

All commands support these global options:

Option	Description
`--version`	Show version and exit
`-h, --help`	Show help message and exit
`--traceback` / `--no-traceback`	Show full Python traceback on errors (default: disabled)

Example:

check_zpools --version
check_zpools --help
check_zpools check --traceback  # Show detailed errors

ZFS Monitoring Commands

`check` - One-Shot Pool Health Check

Performs a single check of all ZFS pools against configured thresholds and reports any issues found.

Usage:

check_zpools check [OPTIONS]

Options:

Option	Type	Default	Description
`--format`	`text` | `json`	`text`	Output format for results

Exit Codes:

0 - All pools healthy (OK)
1 - Warning-level issues detected
2 - Critical issues detected
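
When the check is driven from Python rather than a shell, the exit codes above can be mapped to labels. The following is a minimal sketch; `describe_exit_code` and `run_check` are hypothetical helper names and not part of check_zpools, and `run_check` assumes the `check_zpools` command is on the PATH.

```python
import subprocess

# Exit codes as documented above: 0 = OK, 1 = WARNING, 2 = CRITICAL.
EXIT_CODE_MEANING = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def describe_exit_code(code: int) -> str:
    """Translate a check_zpools exit code into a label (hypothetical helper)."""
    return EXIT_CODE_MEANING.get(code, f"UNKNOWN({code})")


def run_check() -> str:
    """Run `check_zpools check` and return the label for its exit code."""
    # check=False: a non-zero exit code is an expected result here, not an error.
    completed = subprocess.run(["check_zpools", "check"], check=False)
    return describe_exit_code(completed.returncode)
```

Any code outside the documented 0 to 2 range is reported as unknown rather than guessed at.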

Examples:

# Check all pools with text output (default)
check_zpools check

# Check all pools with JSON output for scripting
check_zpools check --format json

# Check in a script and handle exit codes
if check_zpools check --format json > /tmp/zfs_status.json; then
  echo "All pools healthy"
else
  echo "Issues detected - check /tmp/zfs_status.json"
fi

JSON Output Format:

{
  "timestamp": "2025-11-16T15:30:00.000000",
  "pools": [
    {
      "name": "rpool",
      "health": "ONLINE",
      "capacity_percent": 45.2
    }
  ],
  "issues": [
    {
      "pool_name": "tank",
      "severity": "WARNING",
      "category": "capacity",
      "message": "Pool capacity at 85%",
      "details": {
        "warning_threshold": 80,
        "current_percent": 85
      }
    }
  ],
  "overall_severity": "WARNING"
}
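
A script consuming this output might look like the sketch below. It relies only on the keys shown in the sample above (`overall_severity`, `issues`, `pool_name`, `severity`, `message`); the function name `summarize` and the inline sample are illustrative assumptions.

```python
import json

# Trimmed-down sample in the shape documented above.
SAMPLE_REPORT = """
{
  "pools": [{"name": "rpool", "health": "ONLINE", "capacity_percent": 45.2}],
  "issues": [
    {"pool_name": "tank", "severity": "WARNING", "category": "capacity",
     "message": "Pool capacity at 85%"}
  ],
  "overall_severity": "WARNING"
}
"""


def summarize(report_json: str) -> list[str]:
    """Return one summary line for the report plus one line per issue."""
    report = json.loads(report_json)
    lines = [f"overall: {report['overall_severity']}"]
    for issue in report.get("issues", []):
        lines.append(f"[{issue['severity']}] {issue['pool_name']}: {issue['message']}")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(SAMPLE_REPORT)))
```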

`daemon` - Continuous Monitoring

Starts the monitoring daemon which periodically checks pools and sends email alerts.

Usage:

check_zpools daemon [OPTIONS]

Options:

Option	Type	Default	Description
`--foreground`	FLAG	`False`	Run in foreground (don't daemonize)

Examples:

# Start daemon in foreground (for systemd or testing)
check_zpools daemon --foreground

# Start daemon in background (manual mode)
check_zpools daemon

# Run with custom check interval (via environment variable)
CHECK_ZPOOLS_DAEMON__CHECK_INTERVAL_SECONDS=600 check_zpools daemon --foreground

Behavior:

Monitors pools at configured intervals (default: 300 seconds / 5 minutes)
Sends email alerts when issues are detected
Suppresses duplicate alerts (default: 24 hour interval)
Sends recovery notifications when issues resolve
Handles SIGTERM/SIGINT for graceful shutdown
Logs to journald when run as systemd service
Comprehensive logging: Each check cycle logs:
- Check cycle number and daemon uptime (e.g., "Check #42, uptime: 2d 5h 30m")
- Overall statistics (pools checked, issues found, severity)
- Detailed metrics for each pool (health, capacity, size, errors, scrub status)

Systemd Usage:

# Install and start the daemon as a systemd service (see service-install below)
sudo check_zpools service-install
sudo systemctl start check_zpools

Systemd Service Management

Note: Systemd service installation is only available on Linux systems with systemd. Not supported on Windows or macOS.

`service-install` - Install Systemd Service

Installs check_zpools as a systemd service for automatic monitoring.

Usage:

sudo check_zpools service-install [OPTIONS]

Options:

Option	Type	Default	Description
`--no-enable`	FLAG	`False`	Don't enable service to start on boot
`--no-start`	FLAG	`False`	Don't start service immediately
`--uvx-version`	TEXT	`None`	Version specifier for uvx installations (e.g., `@latest`, `@1.0.0`)

Examples:

# Install, enable, and start service (recommended)
sudo check_zpools service-install

# Install but don't start immediately
sudo check_zpools service-install --no-start

# Install but don't enable for automatic boot
sudo check_zpools service-install --no-enable

# Install without starting or enabling
sudo check_zpools service-install --no-enable --no-start

# Install with uvx using @latest (auto-updates to latest version)
sudo uvx check_zpools@latest service-install --uvx-version @latest

# Install with uvx pinned to specific version
sudo uvx check_zpools@1.0.0 service-install --uvx-version @1.0.0

What it does:

Creates /etc/systemd/system/check_zpools.service
Detects installation method (pip, venv, uv, uvx) and configures ExecStart accordingly
Enables service to start on boot (unless --no-enable)
Starts service immediately (unless --no-start)
Configures automatic restart on failure
Sets up journald logging

Installation Method Detection: The service installer automatically detects how check_zpools was installed:

pip/pipx: Uses absolute path to executable
Virtual environment: Uses venv path with proper PATH configuration
uv project: Uses uv run check_zpools
uvx: Uses uvx check_zpools (works with temporary cache installations)

Service Configuration: The installed service runs as root with the following properties:

Type: Simple
Restart: On failure
RestartSec: 10 seconds
After: network.target, zfs-mount.service

`service-uninstall` - Remove Systemd Service

Removes the systemd service and optionally stops/disables it.

Usage:

sudo check_zpools service-uninstall [OPTIONS]

Options:

Option	Type	Default	Description
`--no-stop`	FLAG	`False`	Don't stop running service
`--no-disable`	FLAG	`False`	Don't disable service

Examples:

# Uninstall completely (stop, disable, remove)
sudo check_zpools service-uninstall

# Uninstall but leave service running
sudo check_zpools service-uninstall --no-stop

# Uninstall but keep enabled
sudo check_zpools service-uninstall --no-disable

Note: This does not remove cache and state directories:

# To remove state and cache manually:
sudo rm -rf /var/cache/check_zpools /var/lib/check_zpools

`service-status` - Check Service Status

Displays the current status of the check_zpools systemd service.

Usage:

check_zpools service-status

No options.

Example Output:

Service Status:
  Installed: Yes (/etc/systemd/system/check_zpools.service)
  Running:   Yes (active since 2025-11-16 10:30:00)
  Enabled:   Yes (starts on boot)

Systemctl Output:
● check_zpools.service - ZFS Pool Monitoring Daemon
     Loaded: loaded (/etc/systemd/system/check_zpools.service; enabled)
     Active: active (running) since Sun 2025-11-16 10:30:00 CET; 5h ago
   Main PID: 12345 (python3)
      Tasks: 1 (limit: 4915)
     Memory: 28.5M
        CPU: 1.234s
     CGroup: /system.slice/check_zpools.service
             └─12345 /usr/bin/python3 /usr/local/bin/check_zpools daemon --foreground

Configuration Management

`config` - Display Current Configuration

Shows the merged configuration from all sources (defaults, config files, environment variables).

Usage:

check_zpools config [OPTIONS]

Options:

Option	Type	Default	Description
`--format`	`human` | `json`	`human`	Output format
`--section`	TEXT	None	Show only specific section (e.g., `zfs`, `email`, `daemon`)

Examples:

# Show full configuration (human-readable)
check_zpools config

# Show configuration as JSON
check_zpools config --format json

# Show only ZFS section
check_zpools config --section zfs

# Show only email configuration
check_zpools config --section email

# Export configuration for backup
check_zpools config --format json > backup-config.json

Configuration Precedence:

defaults → app → host → user → .env → environment variables
(lowest)                                      (highest)

Configuration Sources:

Built-in defaults (embedded in package)
App config: /etc/xdg/check_zpools/config.toml (Linux)
Host config: /etc/check_zpools/hosts/$(hostname).toml (Linux)
User config: ~/.config/check_zpools/config.toml (Linux)
.env files (project directory or parents)
Environment variables (CHECK_ZPOOLS_*)
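
The precedence rule can be pictured as a deep merge in which each later layer overrides only the keys it sets. This is an illustrative sketch, not the project's actual loader; `deep_merge` and `merge_layers` are hypothetical names.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_layers(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge layers from lowest to highest precedence."""
    result: dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


defaults = {"zfs": {"capacity": {"warning_percent": 80, "critical_percent": 90}}}
user = {"zfs": {"capacity": {"warning_percent": 85}}}
env = {"daemon": {"check_interval_seconds": 600}}
merged = merge_layers(defaults, user, env)
```

Here the user layer changes only `warning_percent`, while `critical_percent` keeps its default value.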

`config-deploy` - Deploy Configuration Files

Creates configuration files in specified locations with default templates.

Usage:

check_zpools config-deploy --target TARGET [OPTIONS]

Options:

Option	Type	Default	Description
`--target`	`app` | `host` | `user`	Required	Target configuration layer (can specify multiple)
`--force`	FLAG	`False`	Overwrite existing configuration files

Examples:

# Deploy to user config directory (recommended for first setup)
check_zpools config-deploy --target user

# Deploy to system-wide app config (requires privileges)
sudo check_zpools config-deploy --target app

# Deploy to host-specific config (requires privileges)
sudo check_zpools config-deploy --target host

# Deploy to multiple locations at once
check_zpools config-deploy --target user --target host

# Overwrite existing configuration
check_zpools config-deploy --target user --force

# Deploy app and user configs (app needs sudo)
sudo check_zpools config-deploy --target app --target user

Deployment Paths:

Target	Linux Path	macOS Path	Windows Path
`app`	`/etc/xdg/check_zpools/config.toml`	`/Library/Application Support/check_zpools/config.toml`	`C:\ProgramData\check_zpools\config.toml`
`host`	`/etc/check_zpools/hosts/$(hostname).toml`	`/Library/Application Support/check_zpools/hosts/$(hostname).toml`	`C:\ProgramData\check_zpools\hosts\$(hostname).toml`
`user`	`~/.config/check_zpools/config.toml`	`~/Library/Application Support/check_zpools/config.toml`	`%APPDATA%\check_zpools\config.toml`

Testing & Utilities

`hello` - Verify Installation

Prints "Hello World" to verify the package is properly installed and executable.

Usage:

check_zpools hello

Output:

Hello World

Use Cases:

Verify package installation succeeded
Test CLI entry point is working
Quick smoke test after deployment
Validate PATH configuration for installed command

`fail` - Test Error Handling

Intentionally raises a RuntimeError to test error handling and logging.

Usage:

check_zpools fail

Behavior:

Logs intentional failure at WARNING level
Raises RuntimeError with "Intentional failure for testing"
Exits with non-zero status code
Demonstrates error logging and traceback handling

Use Cases:

Test error logging configuration
Verify exception handling is working
Test monitoring/alerting for failed commands
Validate log aggregation captures errors

Note: This is a development/testing command. Use --traceback flag to see full stack trace.

`send-email` - Advanced Email Testing

Sends a custom email using configured SMTP settings with full control over message content and attachments.

Usage:

check_zpools send-email --to EMAIL --subject SUBJECT [OPTIONS]

Options:

Option	Type	Required	Description
`--to`	TEXT	Yes	Recipient email address (can specify multiple times)
`--subject`	TEXT	Yes	Email subject line
`--body`	TEXT	No	Plain-text email body
`--body-html`	TEXT	No	HTML email body (sent as multipart with plain text)
`--from`	TEXT	No	Override sender address (uses config default if not specified)
`--attachment`	PATH	No	File to attach (can specify multiple times)

Examples:

# Send simple text email
check_zpools send-email \
  --to recipient@example.com \
  --subject "Test Email" \
  --body "Hello from check_zpools CLI"

# Send HTML email with plain text fallback
check_zpools send-email \
  --to admin@example.com \
  --subject "HTML Test" \
  --body "Plain text version" \
  --body-html "<h1>HTML Version</h1><p>Rich formatting</p>"

# Send email with attachments
check_zpools send-email \
  --to ops@example.com \
  --subject "System Report" \
  --body "Please review attached logs" \
  --attachment /var/log/zpool.log \
  --attachment /tmp/report.pdf

# Send to multiple recipients with custom sender
check_zpools send-email \
  --to user1@example.com \
  --to user2@example.com \
  --from "zfs-monitor@example.com" \
  --subject "Alert" \
  --body "Multi-recipient test"

Use Cases:

Test SMTP configuration with custom message content
Verify HTML email rendering in mail clients
Test attachment handling and size limits
Validate multi-recipient delivery
Test custom sender address override

Comparison with send-notification:

send-notification: Simplified interface, notification-style messages
send-email: Full-featured, supports HTML, attachments, custom sender

`send-notification` - Test Email Configuration

Sends a test notification email to verify SMTP settings are working correctly.

Usage:

check_zpools send-notification --to EMAIL --subject SUBJECT --message MESSAGE

Options:

Option	Type	Required	Description
`--to`	TEXT	Yes	Recipient email address (can specify multiple times)
`--subject`	TEXT	Yes	Notification subject line
`--message`	TEXT	Yes	Notification message (plain text)

Examples:

# Send simple test notification
check_zpools send-notification \
  --to admin@example.com \
  --subject "Test Alert" \
  --message "Testing check_zpools email configuration"

# Send to multiple recipients
check_zpools send-notification \
  --to ops@example.com \
  --to dev@example.com \
  --subject "Service Status" \
  --message "All services operational"

# Use environment variable for SMTP password
CHECK_ZPOOLS_EMAIL__SMTP_PASSWORD="app-password" \
check_zpools send-notification \
  --to test@example.com \
  --subject "Test" \
  --message "Testing SMTP authentication"

Use Cases:

Verify SMTP configuration before deploying daemon
Test email delivery to alert recipients
Troubleshoot email authentication issues
Confirm firewall allows SMTP connections

Package Information

`info` - Display Package Information

Shows package version, installation paths, and metadata.

Usage:

check_zpools info

No options.

Example Output:

check_zpools v0.1.0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Package Information:
  Name:        check_zpools
  Version:     0.1.0
  Command:     check_zpools
  Description: Zpool Monitoring Daemon

Paths:
  Package:     /usr/local/lib/python3.13/site-packages/check_zpools
  Config:      /etc/xdg/check_zpools/config.toml
  Cache:       ~/.cache/check_zpools

Project URLs:
  Homepage:    https://github.com/bitranox/check_zpools
  Repository:  https://github.com/bitranox/check_zpools.git
  Issues:      https://github.com/bitranox/check_zpools/issues

Authors:
  bitranox <bitranox@gmail.com>

Configuration

Quick Start Configuration

Create ~/.config/check_zpools/config.toml with the following content:

# ZFS Monitoring Thresholds
[zfs.capacity]
warning_percent = 80   # Alert when pool reaches 80% capacity
critical_percent = 90  # Critical alert at 90%

[zfs.errors]
read_errors_warning = 0      # Alert on any read errors
write_errors_warning = 0     # Alert on any write errors
checksum_errors_warning = 0  # Alert on any checksum errors

[zfs.scrub]
max_age_days = 30  # Warn if scrub not run in 30 days

# Daemon Settings
[daemon]
check_interval_seconds = 300  # Check every 5 minutes
alert_resend_hours = 24       # Resend alerts with unchanged severity after 24 hours
pools_to_monitor = []         # Empty = monitor all pools
send_ok_emails = false        # Don't send emails for OK status
send_recovery_emails = true   # Notify when issues resolve

# Email Alert Recipients
[alerts]
alert_recipients = ["admin@example.com", "ops@example.com"]

# Email SMTP Configuration
[email]
smtp_hosts = ["smtp.gmail.com:587"]
from_address = "zfs-monitor@example.com"
smtp_username = "alerts@example.com"
# IMPORTANT: Set password via environment variable (note: DOUBLE underscore):
# CHECK_ZPOOLS_EMAIL__SMTP_PASSWORD=your-app-password
use_starttls = true
timeout = 30.0

Configuration Sections

`[zfs.capacity]` - Capacity Monitoring

Setting	Type	Default	Description
`warning_percent`	int	80	Capacity percentage that triggers WARNING alert
`critical_percent`	int	90	Capacity percentage that triggers CRITICAL alert

Constraints:

0 < warning_percent < critical_percent <= 100
Defaults are appropriate for most systems
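
The constraint can be checked before a config file is deployed. A minimal sketch follows; `validate_capacity_thresholds` is a hypothetical helper, not a function exported by check_zpools.

```python
def validate_capacity_thresholds(warning_percent: int, critical_percent: int) -> None:
    """Raise ValueError unless 0 < warning_percent < critical_percent <= 100."""
    if not (0 < warning_percent < critical_percent <= 100):
        raise ValueError(
            "expected 0 < warning_percent < critical_percent <= 100, "
            f"got warning={warning_percent}, critical={critical_percent}"
        )


validate_capacity_thresholds(80, 90)  # the documented defaults pass
```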

`[zfs.errors]` - Error Monitoring

Setting	Type	Description
`read_errors_warning`	int	Threshold for read error alerts (0 = any error triggers alert)
`write_errors_warning`	int	Threshold for write error alerts
`checksum_errors_warning`	int	Threshold for checksum error alerts

Note: Default of 0 means ANY error triggers an alert. Set higher thresholds only if you understand the implications.

`[zfs.scrub]` - Scrub Monitoring

Setting	Type	Default	Description
`max_age_days`	int	30	Maximum days since last scrub before alerting (0 = disabled)

Recommendation: Monthly scrubs (30 days) are appropriate for most systems. High-value data may require weekly scrubs (7 days).

`[daemon]` - Daemon Behavior

Setting	Type	Default	Description
`check_interval_seconds`	int	300	Seconds between pool checks (300 = 5 minutes)
`alert_resend_hours`	int	24	Hours before resending alerts when severity unchanged
`pools_to_monitor`	list	`[]`	Specific pools to monitor (empty = all pools)
`send_ok_emails`	bool	`false`	Send email when pools are OK
`send_recovery_emails`	bool	`true`	Send email when issues resolve

Notes:

check_interval_seconds: Lower values increase system load
alert_resend_hours: Prevents alert fatigue from persistent unchanged issues. State changes (e.g., WARNING → CRITICAL) trigger immediate alerts regardless of this interval
pools_to_monitor: Example: ["rpool", "tank"]
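
The resend rule described for `alert_resend_hours` can be sketched as a small decision function. This is an assumption-laden illustration of the documented behavior, not the daemon's actual code; `should_send_alert` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_send_alert(
    previous_severity: Optional[str],
    last_sent: Optional[datetime],
    current_severity: str,
    now: datetime,
    resend_hours: int = 24,
) -> bool:
    """Decide whether an alert for a persisting issue should go out."""
    if previous_severity is None or last_sent is None:
        return True  # first time this issue is seen
    if current_severity != previous_severity:
        return True  # state change, e.g. WARNING -> CRITICAL: alert immediately
    # Unchanged severity: resend only once the interval has elapsed.
    return now - last_sent >= timedelta(hours=resend_hours)
```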

`[alerts]` - Alert Recipients

Setting	Type	Default	Description
`alert_recipients`	list	`[]`	Email addresses to receive alerts

Example:

[alerts]
alert_recipients = [
  "admin@example.com",
  "ops-team@example.com",
  "monitoring@pagerduty.example.com"
]

`[email]` - SMTP Configuration

Setting	Type	Default	Description
`smtp_hosts`	list	`[]`	SMTP servers in `host:port` format (tried in order)
`from_address`	str	`"noreply@localhost"`	Sender email address
`smtp_username`	str	`None`	SMTP authentication username
`smtp_password`	str	`None`	SMTP authentication password (use env var!)
`use_starttls`	bool	`true`	Enable STARTTLS encryption
`timeout`	float	`30.0`	SMTP connection timeout in seconds

Security Best Practices:

# NEVER put passwords in config files!
# Use environment variables instead (note: DOUBLE underscore between section and key):
export CHECK_ZPOOLS_EMAIL__SMTP_PASSWORD="your-app-password"

# Or use .env file:
echo "CHECK_ZPOOLS_EMAIL__SMTP_PASSWORD=your-app-password" > .env

Environment Variable Overrides

All configuration can be overridden via environment variables using the prefix CHECK_ZPOOLS_:

Format:

CHECK_ZPOOLS_<SECTION>__<SUBSECTION>__<KEY>=value

Note: Use DOUBLE underscore (__) to separate nested sections/keys.
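
To see why the double underscore matters, the sketch below splits a variable name into its nested configuration path. It is an illustration of the naming scheme only; `env_to_path` is a hypothetical helper and the real parsing is done by the project's configuration layer.

```python
from typing import Optional

PREFIX = "CHECK_ZPOOLS_"


def env_to_path(name: str) -> Optional[tuple[str, ...]]:
    """Map CHECK_ZPOOLS_A__B__C to ('a', 'b', 'c'); return None for other names."""
    if not name.startswith(PREFIX):
        return None
    return tuple(part.lower() for part in name[len(PREFIX):].split("__"))


# Double underscores yield the nested path zfs -> capacity -> warning_percent.
nested = env_to_path("CHECK_ZPOOLS_ZFS__CAPACITY__WARNING_PERCENT")

# With a single underscore the name stays one flat key, so under this scheme
# it would not address the smtp_password key of the [email] section.
flat = env_to_path("CHECK_ZPOOLS_EMAIL_SMTP_PASSWORD")
```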

Examples:

# Override ZFS capacity thresholds
export CHECK_ZPOOLS_ZFS__CAPACITY__WARNING_PERCENT=85
export CHECK_ZPOOLS_ZFS__CAPACITY__CRITICAL_PERCENT=95

# Override daemon check interval
export CHECK_ZPOOLS_DAEMON__CHECK_INTERVAL_SECONDS=600

# Override email SMTP settings
export CHECK_ZPOOLS_EMAIL__SMTP_HOSTS="smtp.gmail.com:587"
export CHECK_ZPOOLS_EMAIL__FROM_ADDRESS="alerts@example.com"
export CHECK_ZPOOLS_EMAIL__SMTP_PASSWORD="app-password"

# Override logging (lib_log_rich native variables - highest precedence)
export LOG_CONSOLE_LEVEL=DEBUG
export LOG_NO_COLOR=true

# Run with overrides
CHECK_ZPOOLS_ZFS__CAPACITY__WARNING_PERCENT=85 check_zpools check

Email Configuration Examples

Gmail with App Password

[email]
smtp_hosts = ["smtp.gmail.com:587"]
from_address = "your-email@gmail.com"
smtp_username = "your-email@gmail.com"
use_starttls = true

# Set the password via environment variable in your shell, not in the TOML file
# (note the DOUBLE underscore between section and key):
export CHECK_ZPOOLS_EMAIL__SMTP_PASSWORD="xxxx-xxxx-xxxx-xxxx"

Setup Gmail App Password:

Go to https://myaccount.google.com/security
Enable 2-Step Verification
Go to App Passwords: https://myaccount.google.com/apppasswords
Generate new app password
Use the 16-character password

Office 365 / Outlook

[email]
smtp_hosts = ["smtp.office365.com:587"]
from_address = "alerts@yourdomain.com"
smtp_username = "alerts@yourdomain.com"
use_starttls = true

Multiple SMTP Servers (Failover)

[email]
smtp_hosts = [
  "smtp.primary.com:587",
  "smtp.backup.com:587",
  "smtp.fallback.com:25"
]
from_address = "monitoring@example.com"
smtp_username = "monitoring@example.com"
use_starttls = true

The system will try each server in order until one succeeds.
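
The try-in-order behavior can be sketched as follows. The `send` callable stands in for whatever actually delivers the message, which keeps the sketch independent of a real SMTP server; `parse_host_port` and `send_with_failover` are hypothetical names, and the default port of 25 for entries without a port is an assumption.

```python
from typing import Callable, Iterable


def parse_host_port(entry: str, default_port: int = 25) -> tuple[str, int]:
    """Split 'host:port' into (host, port); use default_port when no port is given."""
    host, sep, port = entry.rpartition(":")
    if not sep:
        return entry, default_port
    return host, int(port)


def send_with_failover(smtp_hosts: Iterable[str], send: Callable[[str, int], None]) -> str:
    """Try each server in order and return the entry that succeeded."""
    errors = []
    for entry in smtp_hosts:
        host, port = parse_host_port(entry)
        try:
            send(host, port)
            return entry
        except OSError as exc:  # smtplib.SMTPException is a subclass of OSError
            errors.append(f"{entry}: {exc}")
    raise ConnectionError("all SMTP hosts failed: " + "; ".join(errors))
```

Collecting the per-host errors makes the final failure message useful when every server is unreachable.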

Library Usage

You can use check_zpools as a Python library:

from check_zpools.behaviors import check_pools_once
from check_zpools.config import get_config
from check_zpools.models import Severity

# Perform one-shot pool check
result = check_pools_once()

print(f"Overall severity: {result.overall_severity.value}")
print(f"Pools checked: {len(result.pools)}")
print(f"Issues found: {len(result.issues)}")

# Display issues
for issue in result.issues:
    print(f"  [{issue.severity.value}] {issue.pool_name}: {issue.message}")

# Access configuration
config = get_config()
capacity_config = config['zfs']['capacity']
print(f"Warning threshold: {capacity_config['warning_percent']}%")
print(f"Critical threshold: {capacity_config['critical_percent']}%")

# Check severity and exit accordingly (done last, because exiting ends the script)
if result.overall_severity == Severity.CRITICAL:
    print("CRITICAL issues detected!")
    raise SystemExit(2)
elif result.overall_severity == Severity.WARNING:
    print("WARNING issues detected")
    raise SystemExit(1)
else:
    print("All pools healthy")
    raise SystemExit(0)

Advanced Example - Custom Monitoring Script:

#!/usr/bin/env python3
"""Custom ZFS monitoring with Slack notifications."""

import requests
from check_zpools.behaviors import check_pools_once
from check_zpools.models import Severity

def send_slack_alert(message: str, severity: Severity):
    """Send alert to Slack webhook."""
    webhook_url = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

    color = {
        Severity.CRITICAL: "danger",
        Severity.WARNING: "warning",
        Severity.INFO: "good",
        Severity.OK: "good"
    }[severity]

    payload = {
        "attachments": [{
            "color": color,
            "text": message,
            "footer": "ZFS Pool Monitor"
        }]
    }

    # A timeout keeps the script from hanging if Slack is unreachable
    requests.post(webhook_url, json=payload, timeout=10)

# Check pools
result = check_pools_once()

# Send alerts if issues found
if result.issues:
    message = f"ZFS Issues Detected ({result.overall_severity.value}):\n"
    for issue in result.issues:
        message += f"• {issue.pool_name}: {issue.message}\n"

    send_slack_alert(message, result.overall_severity)

Troubleshooting

Common Issues

"ZFS command not available"

# Verify ZFS is installed
which zpool
zpool list

# If not installed (Ubuntu/Debian):
sudo apt install zfsutils-linux

# If installed but not in PATH:
export PATH="$PATH:/usr/sbin:/sbin"
check_zpools check

"Permission denied" errors

# ZFS commands may require root privileges, depending on the system
sudo check_zpools check

# Or run daemon as root
sudo check_zpools daemon --foreground

# For systemd service (recommended):
sudo check_zpools service-install

Email delivery failures

# Test SMTP connectivity
telnet smtp.gmail.com 587

# Verify configuration
check_zpools config --section email

# Check logs for detailed error
LOG_CONSOLE_LEVEL=DEBUG check_zpools daemon --foreground

# Test email configuration (see send-notification command above)
check_zpools send-notification \
  --to test@example.com \
  --subject "Test" \
  --message "Testing email configuration"

Systemd service not starting

# Check service status
check_zpools service-status

# View detailed logs
sudo journalctl -u check_zpools -f

# Check for configuration errors
check_zpools config

# Verify ZFS access as root
sudo zpool list

Daemon not sending alerts

# Check alert recipients are configured
check_zpools config --section alerts

# Check email configuration
check_zpools config --section email

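# Verify SMTP password is set (note the DOUBLE underscore; this avoids printing the secret)
[ -n "$CHECK_ZPOOLS_EMAIL__SMTP_PASSWORD" ] && echo "password is set" || echo "password is NOT set"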

# Check alert state (may be suppressed)
cat ~/.cache/check_zpools/alert_state.json

# Force new alerts by clearing state
rm ~/.cache/check_zpools/alert_state.json

Daemon Logging

The daemon mode provides comprehensive logging to help monitor system health and troubleshoot issues. All logs are structured with additional metadata for easy filtering and analysis.

Log Levels

Set the log level using the LOG_CONSOLE_LEVEL environment variable:

# Available levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
LOG_CONSOLE_LEVEL=INFO check_zpools daemon --foreground
LOG_CONSOLE_LEVEL=DEBUG check_zpools daemon --foreground  # Detailed debugging

Check Cycle Statistics

On each check cycle, the daemon logs overall statistics at INFO level:

INFO: Check cycle completed [check_number=42, uptime="2d 5h 30m", pools_checked=3, issues_found=0, severity="OK"]

Logged fields:

check_number - Sequential check counter since daemon start
uptime - Human-readable daemon uptime (days, hours, minutes)
pools_checked - Number of pools monitored this cycle
issues_found - Total issues detected
severity - Overall severity level (OK, INFO, WARNING, CRITICAL)
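
For ad-hoc analysis, the `key=value` fields in such a line can be pulled into a dictionary. The sketch below assumes the bracketed format shown in the example line above; `parse_fields` is a hypothetical helper, and all values are returned as strings.

```python
import re

# Matches key="quoted value" or key=bare_value (bare values end at a comma,
# closing bracket, or whitespace).
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|([^,\]\s]+))')


def parse_fields(line: str) -> dict[str, str]:
    """Extract key=value pairs from a log line into a dict of strings."""
    return {key: quoted or bare for key, quoted, bare in FIELD_RE.findall(line)}


line = (
    'INFO: Check cycle completed [check_number=42, uptime="2d 5h 30m", '
    'pools_checked=3, issues_found=0, severity="OK"]'
)
fields = parse_fields(line)
```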

Per-Pool Details

For each pool, the daemon logs detailed metrics at INFO level:

INFO: Pool: rpool [health="ONLINE", capacity_percent="45.2%", size="1.00 TB", allocated="452.00 GB", free="548.00 GB", read_errors=0, write_errors=0, checksum_errors=0, last_scrub="2025-11-18 14:30:00", scrub_errors=0, scrub_in_progress=False]

Logged fields per pool:

pool_name - Name of the pool
health - Health status (ONLINE, DEGRADED, FAULTED, etc.)
capacity_percent - Used capacity percentage
size - Total pool size (human-readable)
allocated - Allocated/used space (human-readable)
free - Free space available (human-readable)
read_errors - Read I/O error count
write_errors - Write I/O error count
checksum_errors - Checksum error count (data corruption)
last_scrub - Timestamp of last scrub or "Never"
scrub_errors - Errors found during last scrub
scrub_in_progress - Whether scrub is currently running

Viewing Logs

Systemd Service Logs

When running as a systemd service, logs are sent to journald via two mechanisms:

Console output capture: systemd captures stdout/stderr and forwards to journald
Direct journald logging: Structured fields are written directly to journald via the native API

This dual approach ensures logs are visible in journalctl while also providing rich structured metadata (pool names, capacity percentages, error counts, etc.) that can be queried programmatically.

# Follow logs in real-time
sudo journalctl -u check_zpools -f

# View last 50 entries
sudo journalctl -u check_zpools -n 50

# View last 100 entries
sudo journalctl -u check_zpools -n 100

# View logs since boot
sudo journalctl -u check_zpools -b

# View logs for specific time range
sudo journalctl -u check_zpools --since "2025-11-18 00:00:00" --until "2025-11-18 23:59:59"

# Filter by log level
sudo journalctl -u check_zpools -p info     # INFO and above
sudo journalctl -u check_zpools -p warning  # WARNING and above
sudo journalctl -u check_zpools -p err      # ERROR and above

# Search for specific pool
sudo journalctl -u check_zpools | grep "Pool: rpool"

# Export logs to file
sudo journalctl -u check_zpools > /tmp/check_zpools.log

# View with structured fields (verbose output)
sudo journalctl -u check_zpools -o verbose -n 10

Verbose output example (-o verbose):

The verbose format shows all structured fields attached to each log entry:

Tue 2025-11-25 12:00:05.123456 CET [s=abc123;i=1234;b=xyz789]
    _CAP_EFFECTIVE=1ffffffffff
    _SELINUX_CONTEXT=unconfined
    _SYSTEMD_SLICE=system.slice
    _BOOT_ID=986dd66e4e954b5597d26b935d2b628d
    _MACHINE_ID=373857a545ac4c4c85fa656760c38a36
    _HOSTNAME=proxmox-pbs
    _RUNTIME_SCOPE=system
    PRIORITY=6
    _GID=0
    _UID=0
    _COMM=python
    _EXE=/usr/bin/python3.13
    _CMDLINE=/root/.cache/uv/archive-v0/eHdzBoX7oENWEEFWZvspZ/bin/python /root/.cache/uv/archive-v0/eHdzBoX7oENWEEFWZvspZ/bin/check_zpools daemon --foreground
    _SYSTEMD_CGROUP=/system.slice/check_zpools.service
    _SYSTEMD_UNIT=check_zpools.service
    LOGGER_LEVEL=INFO
    SERVICE=check_zpools
    ENVIRONMENT=prod
    JOB_ID=cli-daemon
    USER_NAME=root
    HOSTNAME=proxmox-pbs
    COMMAND=daemon
    FOREGROUND=True
    _TRANSPORT=journal
    LOGGER_NAME=check_zpools.daemon
    PATHNAME=/root/.cache/uv/archive-v0/eHdzBoX7oENWEEFWZvspZ/lib/python3.13/site-packages/check_zpools/daemon.py
    FILENAME=daemon.py
    MODULE=daemon
    MESSAGE=Pool: rpool
    LINENO=557
    FUNCNAME=_log_pool_details
    POOL_NAME=rpool
    CAPACITY_PERCENT=18.0%
    SIZE=464.00 GB
    ALLOCATED=87.20 GB
    FREE=377.00 GB
    READ_ERRORS=0
    WRITE_ERRORS=0
    CHECKSUM_ERRORS=0
    SCRUB_ERRORS=0
    SCRUB_IN_PROGRESS=False
    HEALTH=ONLINE
    LAST_SCRUB=2025-11-25 10:04:00
    _PID=997164
    _SYSTEMD_INVOCATION_ID=2c67ad4fc02e477ca6eab402f29d198c
    PROCESS_ID=997164
    PROCESS_ID_CHAIN=997164
    EVENT_ID=b45d3d93670242e0a1dc2eeae30f5151
    TIMESTAMP=2025-11-25T11:50:48.386780+00:00
    _SOURCE_REALTIME_TIMESTAMP=1764071448390274

These structured fields can be matched directly. journalctl field matches are exact (FIELD=value); numeric ranges are not supported:

# Find entries for a specific pool
sudo journalctl -u check_zpools POOL_NAME=rpool

# Find entries for pools in a given health state
sudo journalctl -u check_zpools HEALTH=DEGRADED
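
For numeric thresholds, one option is to filter client-side over `journalctl -o json` output, which emits one JSON object per line. The sketch below assumes the `POOL_NAME` and `CAPACITY_PERCENT` fields shown in the verbose example, with the percentage stored as a string such as "18.0%"; `pools_over_capacity` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def pools_over_capacity(json_lines: Iterable[str], threshold: float) -> Iterator[tuple[str, float]]:
    """Yield (pool_name, percent) for entries at or above the threshold."""
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        raw = entry.get("CAPACITY_PERCENT")
        if not isinstance(raw, str):
            continue  # entry carries no capacity field
        try:
            percent = float(raw.rstrip("%"))
        except ValueError:
            continue  # value is not in the expected "NN.N%" form
        if percent >= threshold:
            yield entry.get("POOL_NAME", "?"), percent
```

The function can be fed from `sys.stdin` when piping in `sudo journalctl -u check_zpools -o json`.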

Foreground Mode Logs

When running in foreground, logs go to stdout:

# Run with default INFO level
check_zpools daemon --foreground

# Run with DEBUG level for troubleshooting
LOG_CONSOLE_LEVEL=DEBUG check_zpools daemon --foreground

# Redirect to file
check_zpools daemon --foreground > /var/log/check_zpools.log 2>&1

# Append to a log file while still printing to the console
check_zpools daemon --foreground 2>&1 | tee -a /var/log/check_zpools.log

Example Log Output

Here's what a typical check cycle looks like in the logs:

[2025-11-18 14:35:00] INFO: Starting ZFS pool monitoring daemon [version="2.1.1", interval_seconds=300, pools="all"]
[2025-11-18 14:35:00] INFO: PoolMonitor initialized [capacity_warning=80, capacity_critical=90, scrub_max_age_days=30]
[2025-11-18 14:35:05] INFO: Check cycle completed [check_number=1, uptime="0m", pools_checked=2, issues_found=0, severity="OK"]
[2025-11-18 14:35:05] INFO: Pool: rpool [health="ONLINE", capacity_percent="45.2%", size="1.00 TB", allocated="452.00 GB", free="548.00 GB", read_errors=0, write_errors=0, checksum_errors=0, last_scrub="2025-11-18 02:00:00", scrub_errors=0, scrub_in_progress=False]
[2025-11-18 14:35:05] INFO: Pool: backup [health="ONLINE", capacity_percent="62.5%", size="2.00 TB", allocated="1.25 TB", free="750.00 GB", read_errors=0, write_errors=0, checksum_errors=0, last_scrub="2025-11-17 02:00:00", scrub_errors=0, scrub_in_progress=False]
[2025-11-18 14:40:05] INFO: Check cycle completed [check_number=2, uptime="5m", pools_checked=2, issues_found=0, severity="OK"]
[2025-11-18 14:40:05] INFO: Pool: rpool [health="ONLINE", capacity_percent="45.2%", ...]
[2025-11-18 14:40:05] INFO: Pool: backup [health="ONLINE", capacity_percent="62.5%", ...]
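
Lines in this format can be split into a timestamp, a level, a message, and key/value fields. The following is a minimal sketch, assuming the layout is exactly as in the sample above (`[timestamp] LEVEL: message [key=value, ...]`). The name `parse_log_line` is hypothetical and not part of check_zpools, and all field values are returned as strings.

```python
import re
from typing import Dict, Optional

# [timestamp] LEVEL: message [key=value, key="quoted value", ...]
LINE_RE = re.compile(
    r'^\[(?P<timestamp>[^\]]+)\] (?P<level>\w+): (?P<message>.*?)(?: \[(?P<fields>.*)\])?$'
)
# A field value is either a double-quoted string or a bare token.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|([^,\s\]]+))')


def parse_log_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one log line; return None if the line does not match the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields: Dict[str, str] = {}
    for key, quoted, bare in FIELD_RE.findall(match["fields"] or ""):
        fields[key] = bare if bare else quoted
    return {
        "timestamp": match["timestamp"],
        "level": match["level"],
        "message": match["message"],
        "fields": fields,
    }
```

Truncated samples such as the `...` placeholder in the later lines are ignored, because they do not match the key=value pattern.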

Log Analysis Tips

Monitor Daemon Health

# Check daemon uptime
sudo journalctl -u check_zpools | grep "uptime=" | tail -1

# Count total checks performed
sudo journalctl -u check_zpools | grep -c "Check cycle completed"

# View last check statistics
sudo journalctl -u check_zpools | grep "Check cycle completed" | tail -1

Track Pool Capacity Over Time

# Extract capacity percentages for specific pool
sudo journalctl -u check_zpools | grep 'Pool: rpool' | grep -o 'capacity_percent="[^"]*"'

# Monitor capacity growth
sudo journalctl -u check_zpools --since "1 week ago" | grep 'Pool: rpool' | grep -o 'capacity_percent="[^"]*"'
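
The extracted percentages can be turned into a numeric series to see how much a pool has grown. The following is a minimal sketch, assuming each per-pool log line contains `Pool: <name>` and `capacity_percent="<number>%"` as in the example output. The names `capacity_series` and `capacity_change` are hypothetical and not part of check_zpools.

```python
import re
from typing import Iterable, List

CAPACITY_RE = re.compile(r'capacity_percent="([0-9.]+)%"')


def capacity_series(lines: Iterable[str], pool: str) -> List[float]:
    """Collect capacity percentages for one pool, in the order the lines appear."""
    # The lookahead keeps "rpool" from also matching "rpool2" or "rpool-old".
    pool_re = re.compile(rf'Pool: {re.escape(pool)}(?![\w.:-])')
    series: List[float] = []
    for line in lines:
        if not pool_re.search(line):
            continue
        match = CAPACITY_RE.search(line)
        if match:
            series.append(float(match.group(1)))
    return series


def capacity_change(series: List[float]) -> float:
    """Percentage-point change between the first and last sample."""
    if len(series) < 2:
        return 0.0
    return series[-1] - series[0]
```

Feeding it the output of the `--since "1 week ago"` command above gives the change in percentage points over that week.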

Find Issues

# Find all warnings
sudo journalctl -u check_zpools -p warning

# Find cycles with issues
sudo journalctl -u check_zpools | grep 'issues_found=[1-9]'

# Find error events
sudo journalctl -u check_zpools | grep -E '(read_errors=[1-9]|write_errors=[1-9]|checksum_errors=[1-9])'

Further Documentation

User Documentation

Install Guide - Detailed installation instructions and setup
Changelog - Version history and detailed release notes
Security Policy - Security reporting and vulnerability disclosure

Developer Documentation

Development Handbook - Development setup, testing, and workflow
Contributing Guide - How to contribute to the project
Code Architecture - Architectural design and patterns
Module Reference - Comprehensive module and API documentation
Test Refactoring Guide - Clean architecture test patterns and examples
Claude Code Guidelines - AI-assisted development guidelines (for contributors using Claude Code)

License

MIT License - Open source license terms

Future Enhancements

We're always looking to improve check_zpools! Here are planned features and enhancement requests for future versions. If you're interested in any of these features, please open an issue or discussion on GitHub to help prioritize development.

Monitoring Enhancements

Remote ZFS pools monitoring via SSH - Monitor ZFS pools on remote systems without local ZFS installation
Dataset-level monitoring - Track individual dataset health, quotas, and usage in addition to pools
Resilver/scrub progress tracking - Alert when resilver operations are stuck or taking too long
Device-level monitoring - Track individual disk health within pools (vdev status)
Fragmentation tracking - Alert on high fragmentation levels that may impact performance
SMART data integration - Correlate ZFS errors with disk SMART status for predictive failure detection
Pool I/O statistics - Track read/write performance trends and detect I/O bottlenecks
Snapshot monitoring - Track snapshot age, count, and space consumption

Alerting & Notification Enhancements

Multiple notification channels - Slack, Discord, Microsoft Teams, PagerDuty, webhook support
Alert grouping/batching - Batch multiple issues into single notification to reduce noise
Templated alert messages - User-customizable email and notification templates
Alert escalation - Auto-escalate alerts if issues persist beyond configured timeframes
Quiet hours - Suppress non-critical alerts during specified maintenance windows
Alert acknowledgment - Track who acknowledged which alerts for audit compliance

Reporting & Visualization

Interactive TUI dashboard - Real-time monitoring dashboard using Textual (library already available)
Historical trending - Store metrics over time for trend analysis and capacity planning
Weekly/monthly summaries - Scheduled health reports via email
Capacity prediction - Estimate when pools will reach thresholds based on usage trends
Metrics export - Prometheus exporter, InfluxDB, Graphite integration
Web dashboard - Web-based status visualization and configuration management
Grafana dashboards - Pre-built Grafana dashboard templates

Advanced Daemon Features

Adaptive check intervals - Automatically increase check frequency when issues detected
Self-healing actions - Auto-trigger scrub on checksum errors, automated pool recovery
Maintenance windows - Suppress alerts during scheduled maintenance periods
Pool-specific configurations - Different thresholds and check intervals per pool
Integration with monitoring systems - Native Nagios, Zabbix, Icinga plugins

CLI & Usability Enhancements

Historical query - Query and display past check results from state file
Pool comparison - Compare multiple pools side-by-side with diff visualization
Threshold testing - "What-if" simulation for testing threshold changes before deployment
Configuration validation - Validate configuration files before deployment
Dry-run mode - Test monitoring logic without sending alerts

Security & Compliance

Audit logging - Comprehensive audit trail of all check operations and alerts
Read-only mode - Monitor pools without requiring write permissions
Encrypted state files - Encrypt alert state at rest for sensitive environments
Role-based access - Multi-user support with different permission levels

Contributing

Interested in implementing any of these features? We welcome contributions! Please:

Open a discussion to coordinate implementation approach
Review the Contributing Guide for development workflow
Check existing issues to avoid duplicate work

Have an idea not listed here? Please open an issue or discussion on GitHub!

Support & Community

Getting Help

📚 Documentation: Browse comprehensive guides in the Further Documentation section above
💬 Discussions: Ask questions and share ideas on GitHub Discussions
🐛 Bug Reports: Report issues on GitHub Issues
🔒 Security: Report vulnerabilities privately via SECURITY.md

Quick Links

Repository: https://github.com/bitranox/check_zpools
PyPI Package: https://pypi.org/project/check-zpools/
Changelog: CHANGELOG.md - See what's new
Contributing: CONTRIBUTING.md - Join the project

Before Opening an Issue

Check existing issues to avoid duplicates
Review the documentation for your question
Search discussions for similar topics
For bugs, include:
- Your OS and Python version (python --version)
- ZFS version (zpool --version or zfs --version)
- Full error message and traceback
- Configuration file (sanitize sensitive data)
- Steps to reproduce the issue

Release history

3.7.3 - Feb 13, 2026
3.7.1 - Feb 1, 2026
3.7.0 - Jan 29, 2026
3.6.3 - Dec 15, 2025
3.6.2 - Dec 13, 2025
3.6.1 - Dec 8, 2025
3.6.0 - Dec 3, 2025
3.5.0 - Nov 27, 2025
3.3.0 - Nov 26, 2025
3.2.2 - Nov 26, 2025
3.2.0 - Nov 26, 2025
3.1.0 - Nov 26, 2025
3.0.1 - Nov 26, 2025
3.0.0 - Nov 25, 2025
2.5.0 - Nov 25, 2025 (this version)
2.4.0 - Nov 24, 2025
2.3.0 - Nov 23, 2025
2.1.6 - Nov 18, 2025
2.1.5 - Nov 18, 2025
2.1.4 - Nov 18, 2025
2.1.3 - Nov 18, 2025
2.1.2 - Nov 18, 2025
2.1.1 - Nov 18, 2025
2.1.0 - Nov 18, 2025
2.0.6 - Nov 18, 2025
2.0.5 - Nov 17, 2025
2.0.3 - Nov 17, 2025
2.0.2 - Nov 17, 2025
2.0.1 - Nov 17, 2025
2.0.0 - Nov 17, 2025
1.1.5 - Nov 17, 2025
1.1.4 - Nov 17, 2025
1.1.3 - Nov 17, 2025
1.1.2 - Nov 17, 2025
1.1.1 - Nov 17, 2025
1.1.0 - Nov 17, 2025
1.0.3 - Nov 17, 2025
1.0.2 - Nov 17, 2025
1.0.1 - Nov 17, 2025
1.0.0 - Nov 17, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

check_zpools-2.5.0.tar.gz (235.9 kB)

Uploaded Nov 25, 2025 Source

Built Distribution

check_zpools-2.5.0-py3-none-any.whl (101.7 kB)

Uploaded Nov 25, 2025 Python 3

File details

Details for the file check_zpools-2.5.0.tar.gz.

File metadata

Download URL: check_zpools-2.5.0.tar.gz
Upload date: Nov 25, 2025
Size: 235.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for check_zpools-2.5.0.tar.gz
Algorithm	Hash digest
SHA256	`2317cbb0b507959e0b8aca81a801f4382ccba2e8a7a953becdb350e268169399`
MD5	`bca15cffe1156aa6a4e2898440cca7e3`
BLAKE2b-256	`3174671c01f956e013086d2c796e6915ab31998754224f053fecd0a20a7e6b2c`

File details

Details for the file check_zpools-2.5.0-py3-none-any.whl.

File metadata

Download URL: check_zpools-2.5.0-py3-none-any.whl
Upload date: Nov 25, 2025
Size: 101.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for check_zpools-2.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`22a8e72b1dcdd8bc90dbd9cc135dbc12d64ecfb0318d142f7ead1b0a42e781f8`
MD5	`0c482fd3b793851f1cbcd8385a5575bf`
BLAKE2b-256	`0e5ce17cece1033f20fd51b2725b0ccb045e1230d3335595b1b02bf527d391f1`
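
A downloaded file can be checked against the SHA256 digests listed above before installing it. The following is a minimal sketch using Python's standard `hashlib` module. The names `sha256_of` and `verify_download` are hypothetical helpers, not part of check_zpools, and the file is assumed to be already downloaded to a local path.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with the published value (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the wheel, the expected value would be the SHA256 digest shown in the table for check_zpools-2.5.0-py3-none-any.whl. A result of False means the file should not be installed.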
