A Model Context Protocol (MCP) server for Prometheus Alertmanager alert management and incident response.

Alertmanager MCP Server

An MCP server that gives AI assistants the power to triage alerts, manage silences, inspect routing, and govern Alertmanager operations — from on-call summarization to notification pipeline testing.


Why Alertmanager MCP Server?

The problem: Alertmanager is the notification brain of the Prometheus ecosystem, but operating it effectively requires deep knowledge. Understanding the routing tree to know who gets paged, creating silences with the right matchers and durations, auditing why an alert didn't reach the right receiver, and managing maintenance windows — each of these requires familiarity with Alertmanager's configuration model. If you ask an AI assistant to help, it typically guesses at matcher syntax, creates overly broad silences, or can't explain the routing logic.

The solution: The Alertmanager MCP Server gives AI assistants (like Claude, Cline, or Cursor) structured, safe tools to operate Alertmanager natively. Instead of guessing at matchers or writing silence payloads from memory, your AI can now confidently manage the entire alert lifecycle:

  1. On-Call Triage: The AI summarizes active alerts grouped by severity and service, explains routing paths, and identifies alerts falling into the default route — all in one guided workflow.
  2. Safe Silence Management: Mandatory preview dry-runs before creating silences, duplicate detection, 24-hour duration caps, blast-radius warnings, and policy validation — preventing overly broad silences that could mask real incidents.
  3. Routing Introspection: Simulate routing for any label set (the equivalent of amtool config routes test), inspect the full routing tree, list receivers with integration types, and audit which alerts hit the default route.
  4. Governance & Compliance: Export effective configuration for Git storage, audit recent silence changes with author tracking, and validate proposed silences against organizational policy.
  5. Multi-Backend Support: Manage multiple Alertmanager backends with explicit backend_id on every call — no hidden defaults.

Key Features

Backend Discovery & Multi-Backend

  • Discover and inspect multiple Alertmanager backends
  • Health checks, version info, cluster peer status
  • Supports standalone and clustered Alertmanager deployments

Alert Triage & On-Call

  • List and filter alerts by label, severity, state, and receiver
  • Alert group inspection (Alertmanager's native grouping)
  • Human-readable on-call summaries with severity/service breakdowns
  • Push test alerts to verify notification integrations

Silence Lifecycle Management

  • Full CRUD: create, update (extend), expire silences
  • Mandatory preview dry-run before broad silences
  • Duplicate silence detection — blocks creating equivalent active silences
  • 24-hour duration cap (configurable)
  • LLM-friendly silence_alert helper with scope control (instance/service/env)
  • Policy validation for compliance checks

Routing & Notification Introspection

  • Full nested routing tree inspection
  • Receiver enumeration with integration type detection (Slack, PagerDuty, email, webhook)
  • Route simulation for any label set with human-readable explanations
  • Default route audit — identifies misconfigured alerts

Governance & Audit

  • Export effective configuration as YAML or JSON
  • Track recent silence lifecycle changes with author attribution
  • In-memory audit log for all MCP-initiated operations
  • Silence policy validation (duration caps, comment requirements, blast radius)

Production-Ready

  • Structured logging (JSON/text)
  • Environment-based configuration with multi-backend JSON support

Architecture

                    ┌─────────────────────────┐
                    │     MCP Client          │
                    │ (Claude, Cline, Cursor) │
                    └──────────┬──────────────┘
                               │
                    ┌──────────▼──────────────┐
                    │   FastMCP Server Core   │
                    │  (HTTP / SSE / stdio)   │
                    └──────────┬──────────────┘
                               │
      ┌────────────┬───────────┼───────────┬────────────┐
      │            │           │           │            │
 ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
 │  Tools  │ │Resources│ │ Prompts │ │  Utils  │ │ Models  │
 │ (6 grp) │ │ (11)    │ │ (3)     │ │         │ │         │
 └────┬────┘ └────┬────┘ └─────────┘ └─────────┘ └─────────┘
      │            │
      └──────┬─────┘
             │
  ┌──────────▼──────────┐
  │    Service Layer     │
  │                      │
  │ alertmanager_service │
  └──────────┬──────────┘
             │
  ┌──────────▼──────────┐
  │ Alertmanager HTTP API│
  │ (v2 API)             │
  └─────────────────────┘

How it works:

  1. An AI assistant connects via HTTP, SSE, or stdio.
  2. The AI loads am://system/backends resource to discover available backends.
  3. Every subsequent tool call requires an explicit backend_id — no hidden state.
  4. The service layer interacts with Alertmanager's v2 HTTP API.
  5. Safety guardrails enforce silence duration caps and blast-radius warnings.
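Steps 2 and 3 above can be sketched at the protocol level. MCP runs over JSON-RPC 2.0, where resources are read with `resources/read` and tools are invoked with `tools/call`. The helper names below are hypothetical, the initialization handshake is omitted, and `backend_id` is the only tool argument taken from this README.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def read_resource_request(uri: str) -> dict:
    """Build an MCP `resources/read` request for a resource URI."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "resources/read",
        "params": {"uri": uri},
    }


def call_tool_request(name: str, backend_id: str, **arguments) -> dict:
    """Build an MCP `tools/call` request that always carries backend_id."""
    if not backend_id:
        raise ValueError("backend_id is required on every tool call")
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {
            "name": name,
            "arguments": {"backend_id": backend_id, **arguments},
        },
    }


if __name__ == "__main__":
    # Step 2: discover backends. Step 3: call a tool with an explicit backend.
    print(json.dumps(read_resource_request("am://system/backends"), indent=2))
    print(json.dumps(call_tool_request("am_list_alerts", "default"), indent=2))
```

In practice an MCP client library builds these messages for you; the sketch only shows where the explicit backend_id travels.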

Tech Stack

| Category | Technologies |
|---|---|
| Language | Python 3.12+ |
| MCP Framework | FastMCP ≥2.13.3 |
| Protocol | Model Context Protocol (MCP) |
| Alertmanager | HTTP API v2 · Silence API · Route Simulation |
| Transport | HTTP · SSE · Streamable-HTTP · stdio |
| Infrastructure | Docker · uv |

Getting Started

Prerequisites

  • Docker (recommended) or Python 3.12+ (for local dev)
  • Access to an Alertmanager instance (standalone or clustered)

Quick Start with Docker (recommended)

docker run --rm -it \
  -p 8768:8768 \
  -e ALERTMANAGER_BASE_URL=http://host.docker.internal:9093 \
  -e MCP_TRANSPORT=http \
  talkopsai/alertmanager-mcp-server:latest

The server is now listening on http://localhost:8768/mcp.

Point your MCP client at it:

{
  "mcpServers": {
    "alertmanager": {
      "url": "http://localhost:8768/mcp",
      "description": "MCP Server for Alertmanager alert triage, silence management, and routing"
    }
  }
}

From Source (Python)

  1. Install uv for dependency management.

  2. Clone and set up:

git clone https://github.com/talkops-ai/talkops-mcp.git
cd talkops-mcp/src/alertmanager-mcp-server
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

  3. Configure your .env:

ALERTMANAGER_BASE_URL=http://localhost:9093
MCP_TRANSPORT=http
MCP_LOG_LEVEL=INFO

  4. Run the server:

uv run alertmanager-mcp-server

Or, with the venv activated: alertmanager-mcp-server.

  5. Run tests:

source .venv/bin/activate
pytest tests/

Configuration

All configuration is via environment variables (loaded from .env via python-dotenv).

Server Configuration

| Variable | Default | Description |
|---|---|---|
| MCP_SERVER_NAME | alertmanager-mcp-server | Server name identifier |
| MCP_SERVER_VERSION | 0.1.0 | Server version string |
| MCP_TRANSPORT | stdio | Transport mode: http, sse, streamable-http, or stdio |
| MCP_HOST | 0.0.0.0 | Host address for HTTP server |
| MCP_PORT | 8768 | Port for HTTP server |
| MCP_PATH | /mcp | MCP endpoint path |
| MCP_LOG_LEVEL | INFO | Log level: DEBUG, INFO, WARNING, ERROR |
| MCP_LOG_FORMAT | json | Log format: json or text |

Alertmanager Backend (Single)

| Variable | Default | Description |
|---|---|---|
| ALERTMANAGER_BASE_URL | http://localhost:9093 | Alertmanager HTTP API base URL |
| ALERTMANAGER_BACKEND_ID | default | Backend identifier used in all tool calls |
| ALERTMANAGER_DISPLAY_NAME | (empty) | Human-readable backend name |
| ALERTMANAGER_AUTH_HEADER | (empty) | Authorization header value (e.g. Bearer <token>) |
| ALERTMANAGER_VERIFY_SSL | true | Verify SSL certificates |
| ALERTMANAGER_TIMEOUT | 30 | HTTP timeout for Alertmanager API calls (seconds) |
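To illustrate how these variables could translate into an Alertmanager API call, here is a minimal standard-library sketch that builds a request for the v2 status endpoint (`/api/v2/status`). The function names and the way "true"/"false" is parsed are assumptions for illustration, not the server's actual code.

```python
import os
import ssl
import urllib.request


def build_status_request(env=os.environ):
    """Return (request, timeout, verify_ssl) derived from the settings above."""
    base_url = env.get("ALERTMANAGER_BASE_URL", "http://localhost:9093").rstrip("/")
    req = urllib.request.Request(f"{base_url}/api/v2/status", method="GET")
    auth = env.get("ALERTMANAGER_AUTH_HEADER", "")
    if auth:
        # The variable holds the full header value, e.g. "Bearer <token>".
        req.add_header("Authorization", auth)
    timeout = float(env.get("ALERTMANAGER_TIMEOUT", "30"))
    # Assumption: any value other than "false" keeps verification on.
    verify_ssl = env.get("ALERTMANAGER_VERIFY_SSL", "true").lower() != "false"
    return req, timeout, verify_ssl


def fetch_status(env=os.environ) -> bytes:
    """Send the request; disables certificate checks only when asked to."""
    req, timeout, verify_ssl = build_status_request(env)
    ctx = ssl.create_default_context()
    if not verify_ssl:
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
        return resp.read()
```

Calling `fetch_status()` against a reachable Alertmanager is a quick way to confirm the base URL and auth header before starting the MCP server.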

Alertmanager Backends (Multi)

For multiple backends, set ALERTMANAGER_BACKENDS as a JSON array:

ALERTMANAGER_BACKENDS='[
  {"id": "prod", "base_url": "https://alertmanager-prod.example.com", "labels": {"env": "prod"}},
  {"id": "staging", "base_url": "https://alertmanager-staging.example.com", "labels": {"env": "staging"}}
]'
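As a rough sketch of how such a value could be consumed, the snippet below parses ALERTMANAGER_BACKENDS and falls back to the single-backend variables when it is unset. The precedence rule, the duplicate-id check, and the function name are assumptions; the server's real parsing may differ.

```python
import json


def load_backends(env) -> dict:
    """Return backends keyed by id from an environment-like mapping."""
    raw = env.get("ALERTMANAGER_BACKENDS", "").strip()
    if raw:
        backends = json.loads(raw)
        if not isinstance(backends, list):
            raise ValueError("ALERTMANAGER_BACKENDS must be a JSON array")
    else:
        # Fall back to the single-backend variables and their documented defaults.
        backends = [{
            "id": env.get("ALERTMANAGER_BACKEND_ID", "default"),
            "base_url": env.get("ALERTMANAGER_BASE_URL", "http://localhost:9093"),
        }]
    by_id = {}
    for backend in backends:
        if backend["id"] in by_id:
            raise ValueError(f"duplicate backend id: {backend['id']}")
        by_id[backend["id"]] = backend
    return by_id


if __name__ == "__main__":
    import os
    for backend_id, backend in load_backends(os.environ).items():
        print(backend_id, backend["base_url"])
```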

Silence Safety

| Variable | Default | Description |
|---|---|---|
| AM_MAX_SILENCE_MINUTES | 1440 | Maximum silence duration in minutes (24h) |
| AM_SILENCE_WARNING_THRESHOLD | 50 | Warn if a silence would affect ≥ N alerts |
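The two settings can be read as one hard limit and one soft limit. The following sketch shows that distinction under the documented defaults; the `SilenceCheck` type and `check_silence` function are hypothetical names, not part of the package.

```python
from dataclasses import dataclass, field


@dataclass
class SilenceCheck:
    allowed: bool = True
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def check_silence(
    duration_minutes: int,
    affected_alerts: int,
    max_minutes: int = 1440,      # AM_MAX_SILENCE_MINUTES
    warning_threshold: int = 50,  # AM_SILENCE_WARNING_THRESHOLD
) -> SilenceCheck:
    """Reject silences over the cap; warn (but allow) on a large blast radius."""
    result = SilenceCheck()
    if duration_minutes <= 0:
        result.errors.append("duration must be positive")
    elif duration_minutes > max_minutes:
        result.errors.append(
            f"duration {duration_minutes}m exceeds cap of {max_minutes}m"
        )
    if affected_alerts >= warning_threshold:
        result.warnings.append(
            f"silence would affect {affected_alerts} alerts "
            f"(threshold {warning_threshold})"
        )
    result.allowed = not result.errors
    return result
```

A 2-hour silence touching 3 alerts passes cleanly, while a 2000-minute silence is rejected regardless of how few alerts it matches.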

Available Tools

Alert Triage

| Tool | Description |
|---|---|
| am_list_alerts | List alerts with label/state filters and pagination. |
| am_list_alert_groups | List alert groups as computed by Alertmanager for high-level triage. |
| am_push_test_alert | Fire a synthetic test alert to verify notification integrations. |

Silence Lifecycle

| Tool | Description |
|---|---|
| am_list_silences | List silences with optional state filter and pagination. |
| am_create_silence | Create a silence to suppress matching alerts (with duplicate detection). |
| am_update_silence | Update an existing silence (extend duration or modify end time). |
| am_expire_silence | Expire a silence to reactivate alert notifications. |

Silence Helpers

| Tool | Description |
|---|---|
| am_preview_silence | Preview the blast radius of a silence before creating it. |
| am_silence_alert | Create a narrowly-scoped silence for a specific alert (fingerprint or labels). |
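A blast-radius preview boils down to evaluating the proposed matchers against the currently active alerts. The sketch below does that for matchers in the v2 API shape (`name`, `value`, `isRegex`, `isEqual`); it is an approximation, since Alertmanager uses Go's RE2 regex syntax while this uses Python's `re`, and the function names are hypothetical.

```python
import re


def matcher_matches(matcher: dict, labels: dict) -> bool:
    """Evaluate one matcher; a missing label is treated as an empty string."""
    value = labels.get(matcher["name"], "")
    if matcher.get("isRegex", False):
        # Alertmanager regex matchers are anchored, hence fullmatch.
        hit = re.fullmatch(matcher["value"], value) is not None
    else:
        hit = value == matcher["value"]
    return hit if matcher.get("isEqual", True) else not hit


def preview_silence(matchers: list[dict], alerts: list[dict]) -> list[dict]:
    """Return the alerts that a silence with these matchers would suppress."""
    return [
        alert for alert in alerts
        if all(matcher_matches(m, alert["labels"]) for m in matchers)
    ]


if __name__ == "__main__":
    alerts = [
        {"labels": {"alertname": "HighLatency", "service": "checkout", "env": "prod"}},
        {"labels": {"alertname": "HighLatency", "service": "checkout", "env": "staging"}},
    ]
    narrow = [
        {"name": "service", "value": "checkout", "isRegex": False, "isEqual": True},
        {"name": "env", "value": "prod", "isRegex": False, "isEqual": True},
    ]
    print(len(preview_silence(narrow, alerts)), "alert(s) would be silenced")
```

Comparing the count against the warning threshold is what turns a preview into a blast-radius warning.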

Routing & Notifications

| Tool | Description |
|---|---|
| am_explain_routing | Simulate routing and inhibition for a given label set with explanation. |
| am_audit_default_route | Show alerts falling into the default route, highlighting misconfigurations. |

Governance & Audit

| Tool | Description |
|---|---|
| am_list_recent_changes | List recent silence changes (created, expired, updated) within a time window. |
| am_validate_silence_policy | Validate a proposed silence against organizational policy before creation. |
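One plausible way to derive recent changes is from the silence objects returned by `GET /api/v2/silences`, which carry `id`, `createdBy`, `updatedAt`, and `status.state`. The sketch below filters them by a time window; whether the server works this way is an assumption, and `recent_changes` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def recent_changes(silences: list[dict], now: datetime, window_hours: int = 24) -> list[dict]:
    """Return silences updated within the window, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    changes = []
    for silence in silences:
        # Normalize a trailing "Z" so fromisoformat accepts it on older Pythons.
        updated = datetime.fromisoformat(silence["updatedAt"].replace("Z", "+00:00"))
        if updated >= cutoff:
            changes.append({
                "id": silence["id"],
                "state": silence["status"]["state"],
                "author": silence["createdBy"],
                "updated_at": updated,
            })
    return sorted(changes, key=lambda c: c["updated_at"], reverse=True)


if __name__ == "__main__":
    now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"id": "a1", "createdBy": "alice", "updatedAt": "2025-01-02T09:00:00Z",
         "status": {"state": "active"}},
        {"id": "b2", "createdBy": "bob", "updatedAt": "2024-12-30T09:00:00Z",
         "status": {"state": "expired"}},
    ]
    for change in recent_changes(sample, now):
        print(change["id"], change["state"], change["author"])
```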

On-Call Triage

| Tool | Description |
|---|---|
| am_summarize_oncall | Generate a human-readable on-call summary of active alerts. |
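The severity/service breakdown behind such a summary can be approximated with a simple count over alert labels. The sketch below assumes alerts carry `severity` and `service` labels and uses a made-up output format; the real tool's wording and grouping may differ.

```python
from collections import Counter

# Assumed severity ranking; unknown severities sort last.
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


def summarize_alerts(alerts: list[dict]) -> list[str]:
    """Return one summary line per (severity, service), most severe first."""
    counts = Counter(
        (a["labels"].get("severity", "unknown"), a["labels"].get("service", "unknown"))
        for a in alerts
    )
    ordered = sorted(
        counts.items(),
        key=lambda item: (SEVERITY_ORDER.get(item[0][0], 99), item[0][1]),
    )
    return [f"[{sev}] {svc}: {n} firing" for (sev, svc), n in ordered]


if __name__ == "__main__":
    active = [
        {"labels": {"severity": "warning", "service": "payments"}},
        {"labels": {"severity": "critical", "service": "checkout"}},
        {"labels": {"severity": "critical", "service": "checkout"}},
    ]
    print("\n".join(summarize_alerts(active)))
```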

Available Resources

| Resource URI | Description |
|---|---|
| am://system/backends | All known backends with health status; use this as the first step in any workflow |
| am://system/backends/{backend_id} | Detailed status, version, cluster info, and health for a specific backend |
| am://system/status | Alertmanager version, uptime, cluster info, and config summary |
| am://system/receivers | Configured receivers (Slack, PagerDuty, email, webhook) with redacted config |
| am://system/config | Routing tree and inhibition rules (secrets redacted) |
| am://system/audit-log | Recent MCP-initiated operations (create/expire/extend silence, push test alert) |
| am://alerts/active | Bounded snapshot of active alerts for default backend |
| am://alerts/groups | Snapshot of alert groups as computed by Alertmanager |
| am://silences/active | Snapshot of active silences for default backend |
| am://best-practices | Alerting best practices |
| am://onboarding-guide | Alert onboarding guide |

Available Prompts

Guided workflow prompts that orchestrate multiple tools into step-by-step journeys:

| Prompt Name | Description | Parameters |
|---|---|---|
| am-alert-triage-guided | Guided workflow for triaging active alerts | backend_id, service, env |
| am-maintenance-silence-guided | Guided workflow for creating a maintenance silence | backend_id, service, env, duration |
| am-integration-test-guided | Guided workflow for testing notification integrations (Slack, PagerDuty) | backend_id, team, receiver |

Usage

Supported workflows with prompt examples and links to detailed guides:

| Workflow | Prompt Example | Documentation |
|---|---|---|
| On-Call Triage | "Summarize what's firing right now for the checkout service in prod." | AM_TRIAGE_TEST_GUIDE.md |
| Maintenance Silence | "Silence alerts for the payments service in prod for 2 hours during deployment." | AM_SILENCE_TEST_GUIDE.md |
| Routing Audit | "Who gets paged when a critical alert fires for the api-server?" | AM_GOVERNANCE_TEST_GUIDE.md |
| Integration Testing | "Push a test alert to verify that the slack-sre receiver is working." | AM_GOVERNANCE_TEST_GUIDE.md |
| Governance Review | "Show me all silence changes in the last 24 hours and who created them." | AM_GOVERNANCE_TEST_GUIDE.md |

See WORKFLOW_JOURNEYS.md for the full workflow reference and PROMPT_REFERENCE.md for natural-language prompts.


Project Structure

alertmanager-mcp-server/
├── alertmanager_mcp_server/        # Main package
│   ├── tools/                      # MCP Tools (6 active tool groups, 14 tools)
│   │   ├── alert_tools.py          # Alert listing, grouping, test alerts
│   │   ├── silence_tools.py        # Silence CRUD lifecycle
│   │   ├── helper_tools.py         # Preview & quick silence helpers
│   │   ├── routing_tools.py        # Routing simulation, default route audit
│   │   ├── governance_tools.py     # Audit, policy validation
│   │   └── triage_tools.py         # On-call summarization
│   ├── resources/                  # MCP Resources (11 URIs)
│   │   ├── backend_resources.py    # Backend health & capabilities
│   │   ├── alert_resources.py      # Active alerts & groups
│   │   ├── silence_resources.py    # Active silences
│   │   ├── config_resources.py     # Receivers & routing config
│   │   ├── status_resources.py     # Version, uptime, cluster info
│   │   ├── audit_resources.py      # MCP operation audit log
│   │   └── static_resources.py     # Best practices & onboarding guide
│   ├── prompts/                    # MCP Prompts (3 guided workflows)
│   │   ├── triage_prompts.py       # Alert triage workflow
│   │   ├── silence_prompts.py      # Maintenance silence workflow
│   │   └── onboarding_prompts.py   # Integration test workflow
│   ├── services/                   # Business logic
│   │   └── alertmanager_service.py # Alertmanager HTTP API wrapper
│   ├── server/                     # FastMCP server setup
│   │   ├── core.py                 # Server creation
│   │   └── bootstrap.py            # Component initialization
│   ├── models/                     # Pydantic data models
│   │   ├── alert.py                # Alert & AlertMatcher
│   │   ├── silence.py              # Silence & PostableSilence
│   │   ├── backend.py              # BackendDescriptor
│   │   ├── config.py               # ConfigSnapshot, RouteNode, Receiver
│   │   └── audit.py                # AuditEntry
│   ├── utils/                      # Helpers
│   │   ├── __init__.py             # Matcher logic, silence window calc
│   │   └── audit.py                # In-memory audit log
│   ├── static/                     # Static documentation
│   │   ├── ALERTMANAGER_BEST_PRACTICES.md
│   │   ├── ALERTMANAGER_ONBOARDING_GUIDE.md
│   │   └── ALERTMANAGER_MCP_INSTRUCTIONS.md
│   ├── exceptions/                 # Custom exception hierarchy
│   ├── config.py                   # Environment parsing
│   └── main.py                     # Entry point
├── tests/                          # Test suites
├── docs/                           # Documentation
├── pyproject.toml                  # Package definitions (Python 3.12)
└── README.md                       # This documentation

Roadmap

Shipped:

  • Multi-backend discovery with health checks
  • Alert listing with label/state filtering and pagination
  • Alert group inspection (Alertmanager native grouping)
  • Full silence lifecycle (create, update, expire) with safety guardrails
  • Silence preview dry-run with blast-radius analysis
  • Duplicate silence detection
  • LLM-friendly silence_alert helper with scope control
  • Full routing tree introspection
  • Route simulation with human-readable explanations
  • Receiver enumeration with integration type detection
  • Default route audit for misconfiguration detection
  • On-call alert summarization grouped by severity/service
  • Silence policy validation (duration caps, comment requirements)
  • Config export (YAML/JSON) for Git storage
  • Silence change audit with author tracking
  • Test alert injection for integration verification
  • In-memory audit log for all MCP operations
  • 3 guided workflow prompts (triage, silence, integration test)

Coming next:

  • Prometheus MCP cross-integration for metric-level diagnostics
  • AlertmanagerConfig CRD management for Prometheus Operator
  • Silence templates for recurring maintenance windows
  • Webhook receiver testing with response validation
  • Multi-tenant silence policies with team-scoped permissions

See open issues for the full list of proposed features.


Contributing

Contributions are welcome. The process is straightforward:

  1. Fork the repo
  2. Create a branch (git checkout -b feature/SilenceTemplates)
  3. Make your changes and commit
  4. Push and open a PR

If you're considering something bigger, open an issue first so we can align on the approach.


FAQ

Which MCP clients work with this?
Any MCP-compatible client, including Claude Desktop, Cline, Cursor, and custom clients. Connect via http://localhost:8768/mcp for HTTP transport, or configure stdio for direct process communication.

Does this modify my Alertmanager configuration?
No tool changes the Alertmanager configuration file, and most tools are read-only. The exceptions change runtime state: am_create_silence, am_update_silence, am_expire_silence, and am_silence_alert create, update, or expire silences, and am_push_test_alert fires a real alert into Alertmanager. Governance and routing tools are strictly read-only; they inspect but never modify anything.

Why does the server enforce silence duration caps?
A silence with no practical end date can keep hiding real incidents long after the maintenance that justified it. The default 24-hour cap keeps silences time-boxed. If a maintenance window needs more time, use am_update_silence to extend it incrementally, or raise the cap via AM_MAX_SILENCE_MINUTES.

Can I use this with a clustered Alertmanager?
Yes. Point ALERTMANAGER_BASE_URL at any cluster member or a load balancer. The server uses the standard Alertmanager v2 API, and Alertmanager replicates silences between cluster peers itself.

How does it relate to the Prometheus MCP Server?
They are complementary. The Prometheus MCP Server handles metric querying, exporter deployment, and TSDB management. The Alertmanager MCP Server handles alert triage, silences, routing, and notification management. Use both together for full observability coverage.

Troubleshooting

Backend Connection Issues

  1. Verify ALERTMANAGER_BASE_URL points to a reachable Alertmanager instance.
  2. Load the am://system/backends resource to check health status.
  3. If using auth, verify ALERTMANAGER_AUTH_HEADER is set correctly.
  4. For SSL issues, try ALERTMANAGER_VERIFY_SSL=false (development only).

Silence Creation Failures

  1. Duration cap exceeded: The default cap is 24 hours (1440 minutes). Increase AM_MAX_SILENCE_MINUTES or use shorter durations.
  2. Duplicate silence: An equivalent active silence already exists. Use am_list_silences to find it.
  3. Missing matchers: At least one matcher is required. Use am_preview_silence first to validate.

Routing Simulation Issues

  1. Empty routing tree: The Alertmanager instance may not have a configuration loaded. Check am://system/config.
  2. No receivers found: Verify Alertmanager has receivers configured in its alertmanager.yml.
  3. Unexpected routing: Use am_explain_routing with the specific alert labels to trace the routing path.

Security Considerations

  • Never expose the MCP server to the public internet without proper authentication.
  • Silences affect real alert notifications — always preview before creating silences in production.
  • Test alerts fire real notifications: am_push_test_alert will trigger downstream integrations (Slack, PagerDuty, email).
  • Configuration export may contain sensitive routing rules — treat exported configs as confidential.

License

Apache 2.0 — see LICENSE.


Contact

TalkOps AI: github.com/talkops-ai

Project: github.com/talkops-ai/talkops-mcp

Discord: Join the community

