thruk-mcp

Model Context Protocol (MCP) server for Thruk — the unified web frontend for Naemon, Nagios, Icinga and Shinken.

Expose Thruk's REST API to MCP-compatible clients (Claude Desktop, Dust, LibreChat, OpenWebUI...) so that an LLM can query hosts/services, schedule downtimes, acknowledge problems, force rechecks and more in natural language.

Features

  • Read: hosts, services, hostgroups, servicegroups, downtimes, comments, sites, aggregated stats, current problems
  • Write: schedule/delete downtimes, acknowledge & remove acks, force rechecks
  • Escape hatch: thruk_query tool to call any Thruk REST endpoint
  • Multi-backend support (Thruk federated sites): pass backends="prod,dr" to any tool
  • Transports: stdio (default) or Streamable-HTTP (--listen <port>, endpoint /mcp)
  • Async httpx client with proper error handling and TLS verification
  • Tested with pytest + respx, linted with ruff, packaged with hatchling

Quick start

1. Configure

cp .env.example .env
$EDITOR .env   # set THRUK_BASE_URL and THRUK_API_KEY

An API key can be created from the Thruk user profile page (requires api_keys_enabled in thruk_local.conf) or via the REST API itself.

2a. Run with Docker

docker compose up -d
# MCP Streamable-HTTP endpoint: http://localhost:8001/mcp

2b. Run locally

pip install thruk-mcp        # or: pipx install thruk-mcp

# stdio mode (for Claude Desktop, LibreChat, etc.)
thruk-mcp

# Streamable-HTTP mode — endpoint http://localhost:8001/mcp
thruk-mcp --listen 8001
# equivalently: thruk-mcp --transport streamable-http --listen 8001

# Behind a load balancer / multiple replicas, drop per-session state
# (no sticky routing required):
thruk-mcp --listen 8001 --stateless --json-response

# Multi-tenant: each request brings its own Thruk credentials via headers
# (no fixed THRUK_API_KEY at boot). Requires --stateless; serve over TLS.
thruk-mcp --listen 8001 --stateless --header-auth

For local development of the project itself, see CONTRIBUTING.md.

3. Wire it to an MCP client

Claude Desktop (~/.config/Claude/claude_desktop_config.json or macOS equivalent):

{
  "mcpServers": {
    "thruk": {
      "command": "thruk-mcp",
      "env": {
        "THRUK_BASE_URL": "https://monitor.example.com/thruk",
        "THRUK_API_KEY": "xxxxxxxx"
      }
    }
  }
}

4. Use with the Docker MCP Gateway

The image at ghcr.io/k9fr4n/thruk-mcp:latest defaults to stdio transport, so it can be spawned natively by the gateway.

Option A — Private local catalog

# 1. Create your private catalog
docker mcp catalog create thruk-private

# 2. Register this server (catalog/server.yaml ships with the repo)
docker mcp catalog add thruk-private thruk-mcp ./catalog/server.yaml

# 3. Configure credentials & enable
docker mcp secret set thruk-mcp.api_key=YOUR_KEY
docker mcp config write thruk-mcp.base_url=https://monitor.example.com/thruk
docker mcp server enable thruk-mcp

# 4. Run the gateway with your catalog
docker mcp gateway run --catalog thruk-private

Then point any MCP client (Claude Desktop, VS Code, Cursor, ...) at the gateway, as described in the Docker MCP Gateway documentation.

Option B — Submit upstream

catalog/server.yaml, catalog/tools.json and catalog/readme.md follow the docker/mcp-registry schema and can be submitted to the official Docker MCP Catalog via PR.

What's exposed

65 MCP Tools

Read (state): thruk_list_hosts, thruk_get_host, thruk_list_services, thruk_get_service, thruk_list_hostgroups, thruk_list_servicegroups, thruk_list_contacts, thruk_get_contact, thruk_problems, thruk_stats, thruk_totals (compact 16-field host+service totals, faster than thruk_stats), thruk_sites.

Read (history & comments): thruk_list_logs, thruk_list_alerts, thruk_list_notifications, thruk_notification_summary (notifications grouped by contact/host/service/state/command), thruk_recent_events, thruk_list_comments, thruk_list_downtimes, thruk_get_downtime, thruk_state_at (reconstruct the state of the monitored estate at a past instant from /logs, as a post-mortem snapshot), thruk_state_diff (what changed between two past instants t1 and t2, replayed from /logs).

Read (noise & flap analysis): thruk_top_noisy_hosts (hosts ranked by alert count over a window), thruk_top_noisy_services (services ranked by alert count), thruk_flap_summary (hosts/services ranked by state transition count).

Read (problem intelligence): thruk_oldest_problems (unhandled problems sorted by age, oldest first), thruk_unacked_critical (CRITICAL/DOWN not acknowledged for > N minutes), thruk_stale_acks (acknowledgements older than N days, i.e. forgotten problems), thruk_problem_counts (flat aggregate of unhealthy-state counts, filterable by hostgroup, custom vars or any structured filter; replaces the former thruk_problems_by_hostgroup), thruk_stale_checks (surface checks that stopped running, the dangerous "false green"), thruk_backend_health (per-site monitoring-backend health: latency, replication lag, blind spots), thruk_worker_health (distinguish a real outage from a mod-gearman monitoring blind spot).

Read (analytics): thruk_alert_heatmap (alert counts bucketed by time, useful for spotting recurring patterns), thruk_notification_heatmap (notification counts bucketed by time, to spot mail/paging storms), thruk_concurrent_failures (windows where multiple hosts failed simultaneously), thruk_recurring_problems (hosts/services generating repeated alerts over a window), thruk_root_cause (collapse a DOWN/UNREACHABLE storm into its root cause(s) via parent topology), thruk_unreachable_vs_down (split a host outage window into DOWN cause vs UNREACHABLE consequence).

Read (availability / SLA): thruk_host_availability (uptime % for a single host: time_up_percent, time_down_percent, time_unreachable_percent and scheduled equivalents), thruk_service_availability (ok/warning/critical/unknown % for a single service), thruk_hostgroup_availability (availability for all hosts or services in a hostgroup, sorted worst-first; type = hosts | services | both), thruk_hostgroup_availability_summary (one aggregated rollup instead of one row per host: time-weighted availability_percent, worst/best, below_threshold count, state distribution; ideal for incident/SLA reports on large groups). All accept since/until (Thruk relative or ISO) or a timeperiod shortcut (lastmonth, thismonth, last24hours, lastweek, …). thruk_reliability_report (per host/service reliability metrics, i.e. MTTR / MTBF / incident counts, derived from the log over a window). thruk_incident_timeline (ordered event chronology, the post-mortem sequence of events, for a host, service or hostgroup: every state change, notification, downtime, flap and acknowledgement in time order, plus an incident/MTTR summary; a scoping filter is required).
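As a rough illustration of the reliability metrics named above, the sketch below derives MTTR, MTBF and availability from a list of outage intervals, using the common textbook definitions (MTTR = total downtime / incidents, MTBF = total uptime / incidents). The Outage type, the function name and these definitions are assumptions made for illustration; thruk_reliability_report may compute its figures differently, and the sketch assumes outages do not overlap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Outage:
    """Hypothetical record of one incident: when it started and when it ended."""
    start: datetime
    end: datetime


def reliability(outages: list, window_start: datetime, window_end: datetime) -> dict:
    """Compute incident count, MTTR, MTBF and availability over a window.

    Assumes the outages do not overlap each other.
    """
    window = (window_end - window_start).total_seconds()
    downtime = 0.0
    incidents = 0
    for outage in outages:
        # Clip each outage to the reporting window.
        start = max(outage.start, window_start)
        end = min(outage.end, window_end)
        if end > start:
            downtime += (end - start).total_seconds()
            incidents += 1
    uptime = window - downtime
    return {
        "incidents": incidents,
        "mttr_seconds": downtime / incidents if incidents else None,
        "mtbf_seconds": uptime / incidents if incidents else None,
        "availability_percent": 100.0 * uptime / window if window > 0 else None,
    }
```

With two outages of 30 and 90 minutes inside a 24-hour window, this yields an MTTR of one hour and an MTBF of eleven hours.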

Read (performance data): thruk_get_perfdata (fetch and parse performance data for a single host or service), thruk_perfdata_snapshot (parsed perfdata for every service matching a filter, in one call), thruk_perfdata_near_threshold (metrics within within_percent % of breaching their warn/crit range, an early-warning signal before an alert fires).
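To make the perfdata tools more concrete, here is a minimal sketch of parsing a Nagios-style performance data string (label=value[UOM];warn;crit;min;max) and flagging metrics that sit just below a threshold. It is a simplification and not the project's parser: the names Metric, parse_perfdata and near_threshold are hypothetical, range-style thresholds such as 10:20 or ~:500 are ignored, and the "within N % below the threshold" rule is an assumed reading of within_percent.

```python
import re
from dataclasses import dataclass
from typing import Optional

# label (optionally single-quoted) = value [unit] ; warn ; crit ; min ; max
_PERF_RE = re.compile(
    r"(?:'(?P<quoted>[^']+)'|(?P<bare>[^\s=']+))="
    r"(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[A-Za-z%]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
    r"(?:;(?P<min>[^;\s]*))?(?:;(?P<max>[^;\s]*))?"
)


@dataclass(frozen=True)
class Metric:
    label: str
    value: float
    uom: str
    warn: Optional[float]
    crit: Optional[float]


def _plain_number(text: Optional[str]) -> Optional[float]:
    """Return a float for plain numeric thresholds, None for empty or range syntax."""
    try:
        return float(text) if text else None
    except ValueError:
        return None


def parse_perfdata(perfdata: str) -> list:
    """Parse every metric found in a perfdata string."""
    return [
        Metric(
            label=m["quoted"] or m["bare"],
            value=float(m["value"]),
            uom=m["uom"],
            warn=_plain_number(m["warn"]),
            crit=_plain_number(m["crit"]),
        )
        for m in _PERF_RE.finditer(perfdata)
    ]


def near_threshold(metric: Metric, within_percent: float) -> bool:
    """True if the value is below a warn/crit threshold but within N % of it."""
    for threshold in (metric.warn, metric.crit):
        if threshold is not None and threshold > 0:
            lower = threshold * (1 - within_percent / 100)
            if lower <= metric.value < threshold:
                return True
    return False
```

A metric that has already crossed its warning threshold can still be reported as near its critical threshold, which is usually the more interesting signal.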

Write (downtime management): thruk_schedule_downtime (host/service), thruk_schedule_host_services_downtime (all services of a host), thruk_schedule_propagated_host_downtime (parent+children), thruk_schedule_hostgroup_downtime, thruk_schedule_servicegroup_downtime, thruk_delete_downtime, thruk_delete_active_downtimes, thruk_delete_downtimes_by_filter.

Write (problem handling): thruk_acknowledge, thruk_bulk_acknowledge (acknowledge multiple hosts/services in one call), thruk_remove_acknowledgement, thruk_recheck, thruk_add_comment, thruk_delete_comment, thruk_checks (enable/disable active checks for a host or service), thruk_notifications (enable/disable host or service notifications, with optional cascade to all services of a host).

Escape hatches: thruk_query (raw call to any REST endpoint), thruk_run_background_query (long-running endpoint via Thruk's ?background=1 mechanism with automatic job polling).

All list-style tools share a consistent limit / offset / sort / columns contract. By default they return a tight subset of columns (~10 fields per row) to keep LLM token consumption low. Pass columns="" to opt out and receive every column the Thruk row contains.
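The contract above can be pictured with a small in-memory sketch. It only illustrates the semantics described here (a compact default column set, columns="" to receive everything, limit/offset paging); the function name, the default column list, the default limit and the "-field" syntax for descending sort are assumptions, not the project's actual implementation.

```python
from typing import Optional

# Hypothetical compact default; the real per-tool subsets are not listed in this README.
DEFAULT_COLUMNS = ["name", "state", "plugin_output", "last_check"]


def apply_list_contract(
    rows: list,
    *,
    limit: int = 100,
    offset: int = 0,
    sort: Optional[str] = None,
    columns: Optional[str] = None,
) -> list:
    """Sort, page and project a list of row dicts."""
    if sort:
        # Assumed convention: a leading "-" means descending order.
        descending = sort.startswith("-")
        key = sort.lstrip("-")
        rows = sorted(rows, key=lambda row: row.get(key), reverse=descending)
    page = rows[offset : offset + limit]
    if columns == "":
        # Opt out of the compact default: return every column.
        return [dict(row) for row in page]
    if columns is None:
        wanted = DEFAULT_COLUMNS
    else:
        wanted = [c.strip() for c in columns.split(",") if c.strip()]
    return [{c: row[c] for c in wanted if c in row} for row in page]
```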

5 MCP Resources

URI templates that MCP clients with a resource browser (Claude Desktop, VS Code, ...) can "open" like files:

| URI | Content |
| --- | --- |
| thruk://hosts/{name} | Full host JSON |
| thruk://services/{host}/{service} | Full service JSON |
| thruk://hostgroups/{name} | Host group config + members |
| thruk://problems | Current unhandled problems (hosts + services) |
| thruk://stats | Aggregated host/service stats (cached) |
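Host and service names often contain spaces or slashes, so a client that builds these URIs by hand may want to percent-encode each template variable. The helper below is a hypothetical client-side sketch; whether the server expects percent-encoded values is an assumption that should be checked against the project before relying on it.

```python
from urllib.parse import quote

# Templates copied from the table above; the dictionary keys are illustrative.
RESOURCE_TEMPLATES = {
    "host": "thruk://hosts/{name}",
    "service": "thruk://services/{host}/{service}",
    "hostgroup": "thruk://hostgroups/{name}",
}


def resource_uri(kind: str, **params: str) -> str:
    """Fill a resource template, percent-encoding every variable."""
    encoded = {key: quote(value, safe="") for key, value in params.items()}
    return RESOURCE_TEMPLATES[kind].format(**encoded)
```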

8 MCP Prompts

Pre-canned workflows the user can invoke as a slash-command in the MCP client UI:

| Prompt | Arguments | Purpose |
| --- | --- | --- |
| investigate_alert | host, optional service | 7-step incident triage |
| schedule_maintenance | target, duration_minutes, kind | Safe downtime workflow with confirmation |
| diagnose_flapping | host, service | Root-cause a flapping service (uses thruk_flap_summary) |
| daily_health_report | optional hostgroup | Morning read-only health digest (totals, unacked, stale, oldest, noisiest) |
| incident_triage | optional hostgroup | Major-incident triage: blast radius, common cause, prioritised actions |
| capacity_review | optional hostgroup, within_percent | Saturation review of metrics nearing their warn/crit thresholds |
| sla_report | target, kind, timeperiod | Availability / SLA report with downtime breakdown and 99.9% verdict |
| noise_review | optional since | Alert-fatigue hygiene: noisiest, flapping, recurring, heatmap clustering |

Robustness

  • Connection retries: httpx.AsyncHTTPTransport(retries=3) retries failed connection attempts (DNS failures, connection refusals, TLS handshake errors).
  • HTTP retries with backoff — 5xx and 429 responses are retried up to 3 times with exponential backoff + jitter (cap 5 s).
  • Opt-in TTL cache — slow-moving endpoints (/sites, /processinfo, /hosts/stats, /services/stats, /contacts, /timeperiods, ...) are cached in-process for 15 s. Any tool can request caching via cache_ttl= on the underlying client. This absorbs the burst of identical calls an LLM agent typically issues across a multi-tool turn.
  • Pagination helper: ThrukClient.get_all() is an async generator that iterates pages of 500 rows up to a configurable hard limit (default 50 000), so internal callers can scan entire backends without manual offset math.
  • Long-running queries — the thruk_run_background_query tool wraps Thruk's ?background=1 flow and polls /thruk/jobs/<id>/output until the job completes (5 min default timeout).
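The HTTP retry policy described in the list above (retry 5xx and 429 up to 3 times, exponential backoff with jitter, capped at 5 s) can be sketched as follows. This is not the project's code: the 0.5 s base delay, the "full jitter" strategy and the `send` callable are assumptions chosen for illustration.

```python
import asyncio
import random


def is_retryable(status_code: int) -> bool:
    """5xx and 429 responses are worth retrying; other statuses are not."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delay(attempt: int, *, base: float = 0.5, cap: float = 5.0) -> float:
    """Delay before retry number `attempt` (1-based).

    The ceiling doubles with each attempt and is capped; the actual delay is
    drawn uniformly between 0 and that ceiling ("full jitter").
    """
    ceiling = min(cap, base * 2 ** (attempt - 1))
    return random.uniform(0, ceiling)


async def call_with_retries(send, *, max_retries: int = 3, sleep=asyncio.sleep):
    """Call `send` until it returns a non-retryable status or retries run out.

    `send` is a hypothetical zero-argument async callable returning an object
    with a `status_code` attribute (for example an httpx.Response).
    """
    attempt = 0
    while True:
        response = await send()
        if not is_retryable(response.status_code) or attempt >= max_retries:
            return response
        attempt += 1
        await sleep(backoff_delay(attempt))
```

Jitter matters most when several agents hit the same Thruk instance: without it, clients that failed together retry together.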

Environment variables

Connection

| Variable | Default | Description |
| --- | --- | --- |
| THRUK_BASE_URL | http://localhost/thruk | Thruk URL (no trailing slash) |
| THRUK_API_KEY | (required) | X-Thruk-Auth-Key header |
| THRUK_AUTH_USER | | Impersonation user (superuser key only) |
| THRUK_VERIFY_SSL | true | Set false for self-signed certs |
| THRUK_TIMEOUT | 30 | HTTP timeout in seconds |
| THRUK_DEFAULT_BACKENDS | | CSV of default backend names (federated Thruk) |
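For readers embedding these settings in their own tooling, the following sketch shows one way the connection variables could be read with the defaults from the table. The ThrukSettings class, the load_settings function and the set of strings accepted as "false" are assumptions for illustration, not the package's configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ThrukSettings:
    base_url: str
    api_key: str
    auth_user: Optional[str]
    verify_ssl: bool
    timeout: float
    default_backends: tuple


def load_settings(env: Mapping[str, str] = os.environ) -> ThrukSettings:
    """Read the THRUK_* connection variables, applying the documented defaults."""
    api_key = env.get("THRUK_API_KEY", "")
    if not api_key:
        raise ValueError("THRUK_API_KEY is required")
    backends = env.get("THRUK_DEFAULT_BACKENDS", "")
    verify_raw = env.get("THRUK_VERIFY_SSL", "true").strip().lower()
    return ThrukSettings(
        # The table asks for no trailing slash, so strip one if present.
        base_url=env.get("THRUK_BASE_URL", "http://localhost/thruk").rstrip("/"),
        api_key=api_key,
        auth_user=env.get("THRUK_AUTH_USER") or None,
        # Assumed spelling of falsy values; anything else keeps TLS verification on.
        verify_ssl=verify_raw not in {"false", "0", "no"},
        timeout=float(env.get("THRUK_TIMEOUT", "30")),
        default_backends=tuple(b.strip() for b in backends.split(",") if b.strip()),
    )
```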

Security / multi-tenant (v0.6)

| Variable | Default | Description |
| --- | --- | --- |
| THRUK_READ_ONLY | false | Strip every write tool (ack, downtime, recheck, ...) |
| THRUK_ENABLED_TOOLS | | Allowlist of tool names. CSV with fnmatch wildcards. Empty = all |
| THRUK_AUDIT_LOG | true | Emit one JSON audit line on stderr per write tool invocation |
| THRUK_MAX_CONCURRENT | 0 | Cap of concurrent in-flight HTTP requests. 0 = unlimited |
| THRUK_HTTP_HEADER_AUTH | false | Streamable-HTTP multi-tenant: take credentials from per-request headers (= --header-auth) |
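To show how THRUK_READ_ONLY and THRUK_ENABLED_TOOLS could combine, here is a minimal sketch of tool filtering with fnmatch wildcards. The select_tools function and the way write tools are identified are hypothetical; the server's real registration logic is not shown in this README.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def select_tools(
    all_tools: Iterable[str],
    *,
    enabled_csv: str = "",
    read_only: bool = False,
    write_tools: Iterable[str] = (),
) -> list:
    """Return the tool names that survive read-only mode and the allowlist."""
    patterns = [p.strip() for p in enabled_csv.split(",") if p.strip()]
    blocked = set(write_tools) if read_only else set()
    selected = []
    for name in all_tools:
        if name in blocked:
            continue  # read-only mode strips write tools first
        if patterns and not any(fnmatchcase(name, p) for p in patterns):
            continue  # an empty allowlist means "all tools"
        selected.append(name)
    return selected
```

Applying read-only mode before the allowlist means a broad pattern such as thruk_* cannot re-enable a write tool.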

Security

  • Read-only mode — set THRUK_READ_ONLY=true to remove every write tool (thruk_acknowledge, thruk_schedule_*_downtime, thruk_recheck, thruk_delete_*, thruk_run_background_query) from the MCP server. The LLM literally cannot mutate monitoring state. Use this for general-purpose agents that should only observe.

  • Tool allowlist: THRUK_ENABLED_TOOLS=thruk_list_*,thruk_problems,thruk_stats restricts the exposed surface to the listed tools (fnmatch wildcards supported). Useful when fronting multiple LLM clients with the same gateway but different scopes.

  • Audit log — every write tool invocation emits one JSON line on thruk_mcp.audit (stderr by default):

    {"ts":"2026-05-17T22:00:00+00:00","tool":"thruk_acknowledge","user":"alice",
     "args":{"host":"srv01","comment":"investigating"},"target":"srv01","status":"ok"}
    

    Disable with THRUK_AUDIT_LOG=false. Sensitive keys (api_key, password, token) are redacted as *** before logging.

  • Rate limit: THRUK_MAX_CONCURRENT=8 caps in-flight HTTP requests with an asyncio.Semaphore. Combined with the v0.3 TTL cache, this protects the Thruk core from an LLM that loops on tools or chains them aggressively.

  • Header-auth multi-tenant — run thruk-mcp --listen 8001 --stateless --header-auth (or THRUK_HTTP_HEADER_AUTH=1) to serve many users from one process, each with their own Thruk credentials supplied per request via headers:

    | Header | Maps to | Required |
    | --- | --- | --- |
    | X-Thruk-Auth-Key | api_key | yes (else 401) |
    | X-Thruk-Base-Url | base_url | no (falls back to THRUK_BASE_URL) |
    | X-Thruk-Auth-User | auth_user | no |
    | X-Thruk-Backends | default_backends (CSV) | no |

    The server boots without THRUK_API_KEY. Only credential/endpoint fields come from headers — THRUK_READ_ONLY, THRUK_ENABLED_TOOLS and THRUK_AUDIT_LOG remain server-owned, so a tenant cannot grant itself write access or silence the audit log (which still attributes each call to the tenant's auth_user). Per-tenant HTTP clients are pooled in a bounded LRU cache. The API key travels in a header, so serve only over TLS (terminate TLS in front, or behind a trusted reverse proxy). Requires --stateless.
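The header mapping for multi-tenant mode can be sketched as a small pure function. The class and function names are hypothetical, and the real server may validate or normalise headers differently; the sketch only mirrors the table above (a missing API key maps to a 401, and the base URL falls back to the server's THRUK_BASE_URL).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


class MissingApiKey(Exception):
    """Raised when X-Thruk-Auth-Key is absent; a server would answer 401."""

    status_code = 401


@dataclass(frozen=True)
class TenantCredentials:
    api_key: str
    base_url: str
    auth_user: Optional[str]
    default_backends: tuple


def credentials_from_headers(
    headers: Mapping[str, str], fallback_base_url: str
) -> TenantCredentials:
    """Build per-request credentials from the X-Thruk-* headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value.strip() for key, value in headers.items()}
    api_key = lowered.get("x-thruk-auth-key", "")
    if not api_key:
        raise MissingApiKey("X-Thruk-Auth-Key header is required")
    backends = lowered.get("x-thruk-backends", "")
    return TenantCredentials(
        api_key=api_key,
        base_url=(lowered.get("x-thruk-base-url") or fallback_base_url).rstrip("/"),
        auth_user=lowered.get("x-thruk-auth-user") or None,
        default_backends=tuple(b.strip() for b in backends.split(",") if b.strip()),
    )
```

Note that nothing in this mapping touches read-only mode, the tool allowlist or the audit log, which matches the statement that those stay server-owned.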

Development

pip install -e ".[dev]"
pre-commit install                              # one-time setup of git hooks

ruff check src tests && ruff format src tests   # lint + format
mypy src                                        # type-check
pytest -v --cov=thruk_mcp --cov-fail-under=80   # tests with coverage gate

Conventions:

  • Conventional Commits (feat:, fix:, chore:, docs:, refactor:, test:).
  • No direct push to main: branch → PR → squash merge.
  • Any new tool must come with a respx-mocked unit test in tests/test_tools.py; regenerate catalog/tools.json (Docker MCP Registry contract) with python scripts/gen_tools_json.py — it is generated from the live registry, not hand-edited, and CI enforces it via --check.
  • CI gate: ruff, ruff format --check, mypy, pytest with 80 % coverage minimum.

References

Project docs

  • CHANGELOG.md — what changed in each release.
  • UPGRADING.md — per-version migration notes.
  • SUPPORT.md — supported Python / Thruk / MCP-client versions, security policy, release cadence.
  • CONTRIBUTING.md — dev setup, PR conventions, tool / env-var contribution checklists.

License

MIT — see LICENSE.
