
Incident Triage MCP


Incident Triage MCP is a Model Context Protocol (MCP) tool server for incident response.

It exposes structured, auditable triage tools (evidence collection, runbook search, safe actions, ticketing integrations, etc.) so AI agents (or LLM hosts) can diagnose and respond to outages with guardrails.


What this project is (and isn’t)

  • Is: an MCP server that provides incident-triage tools + a workflow-friendly “evidence bundle” artifact.
  • Is: designed to run locally (Claude Desktop stdio), via Docker (HTTP), and in Kubernetes.
  • Is not: an LLM agent by itself. Agents/hosts call these tools.

Features

  • True MCP transports: stdio and streamable-http
  • Tool discovery: tools are auto-discovered by MCP clients (e.g., tools/list)
  • Structured schemas: Pydantic models for tool inputs/outputs
  • Evidence Bundle artifact: a single JSON “source of truth” produced by workflows
  • Artifact store: filesystem (dev) or S3-compatible (MinIO/S3) for Docker/Kubernetes
  • Audit-first: JSONL audit events (stdout by default for k8s)
  • Guardrails: RBAC + safe-action allowlists (WIP / expanding)
  • Pluggable integrations: mock-first, real adapters added progressively (env-based provider selection)
  • Safe ticketing: draft Jira tickets + gated create (dry-run by default, RBAC + confirm token)
  • Real idempotency for creates: reusing idempotency_key returns the existing issue
  • Slack updates: post incident summary + ticket context (safe dry-run by default)
  • Jira discovery tools: list accessible projects and project-specific issue types (read-only)
  • Jira Cloud rich text: draft content renders as clean ADF (H2 section headings + bullet lists + inline bold/code)
  • Demo-friendly tools: evidence.wait_for_bundle and deterministic incident.triage_summary
  • Local LangGraph CLI agent: run end-to-end triage without Claude Desktop restarts
  • Automated tests: unit tests cover all MCP tools in server.py
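
The "audit-first" item above can be illustrated with a minimal JSONL event emitter. This is a sketch only; the field names (`event_id`, `ts`, `tool`, `params`, `outcome`) are illustrative assumptions, not the server's actual audit schema:

```python
import json
import sys
import time
import uuid


def emit_audit_event(tool: str, params: dict, outcome: str, stream=sys.stdout) -> dict:
    """Write one audit event as a single JSON line (JSONL)."""
    event = {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        "tool": tool,
        "params": params,
        "outcome": outcome,
    }
    # One event per line on stdout makes k8s log collectors trivial to wire up.
    stream.write(json.dumps(event) + "\n")
    return event
```

Writing to stdout by default (rather than a file) is what makes `AUDIT_MODE=stdout` the Kubernetes-friendly choice later in this README.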

Project layout

incident-triage-mcp/
  pyproject.toml
  README.md
  docker-compose.yml
  airflow/
    dags/
    artifacts/
  runbooks/
  src/
    incident_triage_mcp/
      __init__.py
      server.py
      audit.py
      domain_models.py
      tools/
      adapters/
      policy/
  k8s/
    deployment.yaml
    service.yaml
    airflow-creds.yaml

Quick start (local)

1) Install + run (stdio)

# from repo root
pip install -e .

# stdio transport (for Claude Desktop)
MCP_TRANSPORT=stdio incident-triage-mcp

# RBAC + safe actions
MCP_ROLE=viewer|triager|responder|admin
CONFIRM_TOKEN=CHANGE_ME_12345   # required for non-dry-run safe actions

# Jira provider selection
JIRA_PROVIDER=mock|cloud
JIRA_PROJECT_KEY=INC
JIRA_ISSUE_TYPE=Task

# Jira Cloud (required when JIRA_PROVIDER=cloud)
JIRA_BASE_URL=https://your-domain.atlassian.net
JIRA_EMAIL=you@example.com
JIRA_API_TOKEN=***

Packaging entrypoints (pip + docker)

Pip console scripts:

# MCP server
incident-triage-mcp

# Local LangGraph runner
incident-triage-agent --incident-id INC-123 --service payments-api --artifact-store fs --artifact-dir ./evidence

Docker image entrypoint:

# Default: starts MCP server (streamable-http on :3333)
docker run --rm -p 3333:3333 incident-triage-mcp:latest

# Override command: runs via uv in-project env
docker run --rm incident-triage-mcp:latest incident-triage-agent --incident-id INC-123 --service payments-api

2) Key environment variables

# MCP
MCP_TRANSPORT=stdio|streamable-http
MCP_HOST=0.0.0.0
MCP_PORT=3333

# Audit logging (k8s-friendly)
AUDIT_MODE=stdout|file         # default: stdout
AUDIT_PATH=/data/audit.jsonl   # only used when AUDIT_MODE=file

# Local runbooks (real data source, no creds)
RUNBOOKS_DIR=./runbooks

# Evidence backend (standalone-first)
#   fs      -> read/write local Evidence Bundle JSON files
#   s3      -> read/write via S3 API (MinIO/S3)
#   airflow -> expose airflow_* tools (requires Airflow env vars)
#   none    -> disable evidence reads entirely
EVIDENCE_BACKEND=fs|s3|airflow|none

# Local evidence directory for fs backend
EVIDENCE_DIR=./evidence

# Legacy alias still supported (maps to fs|s3 when EVIDENCE_BACKEND is unset)
ARTIFACT_STORE=fs|s3

# Airflow API (required only when EVIDENCE_BACKEND=airflow)
AIRFLOW_BASE_URL=http://localhost:8080
AIRFLOW_USERNAME=admin
AIRFLOW_PASSWORD=admin

# S3-compatible artifact store (required when EVIDENCE_BACKEND=s3)
S3_ENDPOINT_URL=http://localhost:9000
S3_BUCKET=triage-artifacts
S3_REGION=us-east-1
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin

# Jira ticket defaults
JIRA_PROJECT_KEY=INC
JIRA_ISSUE_TYPE=Task

# Slack notifications
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
SLACK_DEFAULT_CHANNEL=#incident-triage

# Idempotency storage for ticket create retries
IDEMPOTENCY_STORE_PATH=./data/jira_idempotency.json
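
The backend selection described above, including the legacy ARTIFACT_STORE alias, can be sketched as a small resolver. This is a hypothetical helper, not the server's actual code, and the `fs` fallback default is an assumption:

```python
import os
from typing import Optional

VALID_BACKENDS = {"fs", "s3", "airflow", "none"}


def resolve_evidence_backend(env: Optional[dict] = None) -> str:
    """Pick the evidence backend, honoring the legacy ARTIFACT_STORE alias."""
    env = os.environ if env is None else env
    backend = env.get("EVIDENCE_BACKEND")
    if backend:
        if backend not in VALID_BACKENDS:
            raise ValueError(f"unknown EVIDENCE_BACKEND: {backend!r}")
        return backend
    # Legacy alias applies only when EVIDENCE_BACKEND is unset.
    legacy = env.get("ARTIFACT_STORE")
    if legacy in {"fs", "s3"}:
        return legacy
    return "fs"  # assumed default for standalone-first local dev
```

Note that an explicit `EVIDENCE_BACKEND` always wins over `ARTIFACT_STORE`, which matches the "legacy alias still supported" comment above.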

Standalone Mode (No Airflow)

Boot MCP standalone with only stdio + local runbooks:

MCP_TRANSPORT=stdio \
RUNBOOKS_DIR=./runbooks \
EVIDENCE_BACKEND=fs \
EVIDENCE_DIR=./evidence \
incident-triage-mcp

Offline demo flow (no Airflow required):

  1. Seed deterministic evidence:
    • evidence_seed_sample(incident_id="INC-123", service="payments-api", window_minutes=30)
  2. Summarize incident:
    • incident_triage_summary(incident_id="INC-123")
  3. Draft Jira ticket from local evidence:
    • jira_draft_ticket(incident_id="INC-123")

Notes:

  • airflow_* tools are only registered when EVIDENCE_BACKEND=airflow.
  • If EVIDENCE_BACKEND=airflow but the Airflow env vars are missing, the server still starts, and Airflow tool calls return a clear airflow_disabled error.
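
To give a rough idea of what evidence_seed_sample might write for the offline demo, here is a sketch that produces a deterministic bundle on the fs backend. The bundle fields and signal names are illustrative assumptions, not the real Evidence Bundle schema:

```python
import json
from pathlib import Path


def seed_sample_evidence(incident_id: str, service: str, window_minutes: int = 30,
                         evidence_dir: str = "./evidence") -> Path:
    """Write a deterministic Evidence Bundle JSON for offline demos."""
    bundle = {
        "incident_id": incident_id,
        "service": service,
        "window_minutes": window_minutes,
        # Fixed sample signals keep demo output reproducible run-to-run.
        "signals": [
            {"name": "http_5xx_rate", "value": 0.12},
            {"name": "p99_latency_ms", "value": 2400},
        ],
    }
    path = Path(evidence_dir) / f"{incident_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(bundle, indent=2))
    return path
```

Once a file like `./evidence/INC-123.json` exists, the summary and draft-ticket tools in the demo flow have something real to read.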

Quick verification tests:

# standalone behavior (no airflow required)
UV_CACHE_DIR=.uv-cache uv run --project . \
  python -m unittest tests.test_standalone_mode -v

One-command standalone smoke check:

./scripts/smoke_standalone.sh INC-123 payments-api

Docker Compose (Airflow + Postgres + MCP)

This repo supports a local dev stack where:

  • Airflow runs evidence workflows
  • MinIO (S3-compatible) stores Evidence Bundles so the setup also works in Kubernetes
  • MCP server reads Evidence Bundles from MinIO/S3 (or filesystem in dev mode)

Start

mkdir -p airflow/dags airflow/artifacts airflow/logs airflow/plugins data runbooks

docker compose up --build

Airflow UI

  • URL: http://localhost:8080
  • Login: admin / admin

MCP (HTTP)

  • Default: http://localhost:3333 (streamable HTTP transport)

Tip: Claude Desktop usually spawns MCP servers via stdio. For Docker/HTTP, you typically use an MCP client that supports HTTP or add a small local stdio→HTTP bridge.

MinIO (artifact store)

  • S3 API: http://localhost:9000
  • Console UI: http://localhost:9001
  • Credentials (dev): minioadmin / minioadmin

Check artifacts:

docker run --rm --network incident-triage-mcp_default \
  -e MC_HOST_local=http://minioadmin:minioadmin@minio:9000 \
  minio/mc:latest ls local/triage-artifacts/evidence/v1/

Standalone Docker mode (no Airflow, no MinIO):

mkdir -p data evidence runbooks
docker compose --profile standalone up --build incident-triage-mcp-standalone

  • MCP endpoint: http://localhost:3334

Testing

Run all tests:

UV_CACHE_DIR=.uv-cache uv run --project . \
  python -m unittest discover -s tests -p 'test_*.py' -v

The suite currently covers all MCP tools defined in src/incident_triage_mcp/server.py.


Automated Releases

This repo supports automated tag-based release publishing for both PyPI and GHCR.

Release workflow:

  • Trigger: push a Git tag like v0.2.0
  • Publishes:
    • Python package to PyPI
    • Docker image to ghcr.io/<owner>/incident-triage-mcp
    • GitHub Release with generated notes

Required repository secret:

  • PYPI_API_TOKEN (PyPI API token with publish permission)

Release command:

# 1) bump version in pyproject.toml first, then:
git tag v0.2.0
git push origin v0.2.0

Notes:

  • The workflow validates that tag vX.Y.Z matches project.version in pyproject.toml.
  • GHCR publish uses the built-in GITHUB_TOKEN.
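
The tag-vs-version check the workflow performs can be sketched as follows. This is a simplified regex-based check for illustration, not the actual CI script:

```python
import re


def tag_matches_pyproject(tag: str, pyproject_text: str) -> bool:
    """Return True when a tag like v0.2.0 equals project.version in pyproject.toml."""
    m = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, flags=re.MULTILINE)
    if m is None:
        raise ValueError("no version field found in pyproject.toml")
    # Tag convention is the version with a leading "v".
    return tag == f"v{m.group(1)}"
```

Running a check like this before publishing prevents a mismatched tag from shipping a wheel whose version disagrees with the Git tag.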

Evidence Bundle workflow

Airflow produces a single artifact per incident:

  • fs: ./airflow/artifacts/<INCIDENT_ID>.json (dev)
  • s3: s3://triage-artifacts/evidence/v1/<INCIDENT_ID>.json (Docker/K8s)

The MCP server exposes tools to:

  • trigger evidence DAGs
  • fetch evidence bundles
  • search runbooks

This is the intended flow:

  1. Agent/host triggers evidence collection (Airflow DAG)
  2. Airflow writes the Evidence Bundle JSON artifact
  3. Agent/host optionally calls evidence.wait_for_bundle to poll until the artifact exists
  4. Agent/host reads the bundle via MCP tools
  5. (later) ticket creation + safe actions use the same bundle
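
The polling step on the fs backend can be sketched as a hypothetical helper mirroring evidence.wait_for_bundle (file layout per the fs convention above; the function itself is illustrative):

```python
import json
import time
from pathlib import Path


def wait_for_bundle(incident_id: str, evidence_dir: str = "./evidence",
                    timeout_seconds: float = 90, poll_seconds: float = 2) -> dict:
    """Poll until <evidence_dir>/<incident_id>.json exists, then load and return it."""
    path = Path(evidence_dir) / f"{incident_id}.json"
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if path.exists():
            return json.loads(path.read_text())
        time.sleep(poll_seconds)  # cheap existence check; the DAG writes the file
    raise TimeoutError(f"no Evidence Bundle for {incident_id} after {timeout_seconds}s")
```

Using a monotonic-clock deadline (rather than counting iterations) keeps the timeout honest even if individual polls are slow.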

Demo flow (agent/host)

Typical demo sequence:

  1. Trigger evidence collection:
    • airflow_trigger_incident_dag(incident_id="INC-123", service="payments-api")
  2. Wait for the Evidence Bundle:
    • evidence_wait_for_bundle(incident_id="INC-123", timeout_seconds=90, poll_seconds=2)
  3. Generate a deterministic triage summary (no LLM required):
    • incident_triage_summary(incident_id="INC-123")
  4. Optional one-call orchestration (safe ticket dry-run hook):
    • incident_triage_run(incident_id="INC-123", service="payments-api", include_ticket=true)
    • Override project key for the ticket hook: incident_triage_run(incident_id="INC-123", service="payments-api", include_ticket=true, project_key="PAY")
  5. Optional Slack notification hook (safe dry-run by default):
    • incident_triage_run(incident_id="INC-123", service="payments-api", notify_slack=true)
    • Set channel and send for real: incident_triage_run(incident_id="INC-123", service="payments-api", notify_slack=true, slack_channel="#incident-triage", slack_dry_run=false)
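
Step 3's summary is deterministic: the same bundle in always yields the same text out, so no LLM is needed. A toy version of that idea (the bundle shape here is assumed, not the real schema):

```python
def triage_summary(bundle: dict) -> str:
    """Render a deterministic one-line summary from an Evidence Bundle dict."""
    # Sorting by signal name guarantees stable output regardless of input order.
    signals = sorted(bundle.get("signals", []), key=lambda s: s["name"])
    parts = [f"{s['name']}={s['value']}" for s in signals]
    return f"Incident {bundle['incident_id']} on {bundle['service']}: " + ", ".join(parts)
```

Determinism matters for demos and tests: the summary can be asserted byte-for-byte, which is exactly what a deterministic tool enables.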

Jira ticketing demo

  1. Validate Jira Cloud credentials (cloud provider only):
    • jira_validate_credentials()
  2. Discover Jira metadata first (recommended):
    • jira_list_projects()
    • jira_list_issue_types()   # uses JIRA_PROJECT_KEY default
    • jira_list_issue_types(project_key="SCRUM")
  3. Draft a ticket (no credentials required; uses JIRA_PROJECT_KEY by default):
    • jira_draft_ticket(incident_id="INC-123")
    • Override project key per call: jira_draft_ticket(incident_id="INC-123", project_key="PAY")
  4. Safe create (mock provider by default):
    • Dry run (default):
      • jira_create_ticket(incident_id="INC-123")
      • Override project key per call: jira_create_ticket(incident_id="INC-123", project_key="PAY")
    • Create (requires explicit approval inputs):
      • jira_create_ticket(incident_id="INC-123", dry_run=false, reason="Track incident timeline and coordinate responders", confirm_token="CHANGE_ME_12345", idempotency_key="INC-123-PAY-1")

Notes:

  • Non-dry-run is blocked unless RBAC allows it (MCP_ROLE=responder|admin) and CONFIRM_TOKEN is provided.
  • Swap providers via env: JIRA_PROVIDER=mock (demo) or JIRA_PROVIDER=cloud (real Jira Cloud).
  • JIRA_ISSUE_TYPE defaults to Task (used for creates unless overridden in code).
  • Jira Cloud descriptions are sent as ADF and render section headers/bullets/inline formatting in the Jira UI.
  • Reusing the same idempotency_key on non-dry-run jira_create_ticket returns the existing issue instead of creating a duplicate.
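
The guardrails in the notes above (dry-run default, RBAC, confirm token, idempotent reuse) can be sketched together in a simplified gate. This is illustrative only; the real server uses Pydantic schemas and a file-backed idempotency store:

```python
from typing import Optional

ALLOWED_ROLES = {"responder", "admin"}  # per MCP_ROLE gating described above


class JiraGate:
    """Dry-run by default; real creates need role + confirm token; idempotent on key."""

    def __init__(self, role: str, confirm_token: str):
        self.role = role
        self.confirm_token = confirm_token
        self.idempotency = {}  # idempotency_key -> issue key
        self.created = 0

    def create_ticket(self, incident_id: str, dry_run: bool = True,
                      confirm_token: Optional[str] = None,
                      idempotency_key: Optional[str] = None) -> dict:
        if dry_run:
            return {"dry_run": True, "would_create_for": incident_id}
        if self.role not in ALLOWED_ROLES:
            return {"error": "rbac_denied", "role": self.role}
        if confirm_token != self.confirm_token:
            return {"error": "confirm_token_mismatch"}
        # Reusing a key returns the existing issue instead of creating a duplicate.
        if idempotency_key and idempotency_key in self.idempotency:
            return {"issue_key": self.idempotency[idempotency_key], "reused": True}
        self.created += 1
        issue_key = f"INC-{self.created}"
        if idempotency_key:
            self.idempotency[idempotency_key] = issue_key
        return {"issue_key": issue_key, "reused": False}
```

The key design point: every unsafe path requires an explicit opt-in (dry_run=false plus a matching token plus an allowed role), so the default call is always safe.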

Runbooks (local Markdown)

Put Markdown runbooks in:

  • ./runbooks/*.md

Then use the MCP tool (example):

  • runbooks_search(query="5xx latency timeout", limit=5)
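
A minimal sketch of how a runbook search like this could work: score each Markdown file by how many query terms it contains. The real tool's ranking may differ; this is just the shape of the idea:

```python
from pathlib import Path


def search_runbooks(query: str, runbooks_dir: str = "./runbooks", limit: int = 5) -> list:
    """Rank Markdown runbooks by how many distinct query terms they contain."""
    terms = {t.lower() for t in query.split()}
    results = []
    for path in Path(runbooks_dir).glob("*.md"):
        text = path.read_text().lower()
        score = sum(1 for t in terms if t in text)
        if score:
            results.append({"file": path.name, "score": score})
    # Highest score first; filename breaks ties deterministically.
    results.sort(key=lambda r: (-r["score"], r["file"]))
    return results[:limit]
```

Because runbooks are plain local Markdown, this is the one integration that needs no credentials at all, which is why it works in standalone mode.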

Kubernetes (local or remote)

You can deploy the MCP server into Kubernetes (local via kind/minikube or remote like EKS/GKE/AKS).

Local Kubernetes with kind (example)

brew install kind kubectl
kind create cluster --name triage

# build image
docker build -t incident-triage-mcp:0.1.0 .

# load into kind
kind load docker-image incident-triage-mcp:0.1.0 --name triage

# update k8s/deployment.yaml to use image: incident-triage-mcp:0.1.0
kubectl apply -f k8s/

kubectl port-forward svc/incident-triage-mcp 3333:80

Now the MCP service is reachable at http://localhost:3333.

Note: In Kubernetes, AUDIT_MODE=stdout is recommended so log collectors can capture audit events.

If MinIO is running in Docker on your Mac and MCP is running in kind, set S3_ENDPOINT_URL to http://host.docker.internal:9000 in the Kubernetes Deployment.


Roadmap (next)

  • ✅ Ticketing: Jira draft + gated create (mock provider); add Jira Cloud provider wiring + richer formatting
  • ✅ Artifact store for Docker/K8s via MinIO/S3 (filesystem remains for fast local dev)
  • Add a Helm chart + GitHub Actions to build/push multi-arch Docker images
  • Expand RBAC + safe actions with preconditions and approval tokens
  • Add richer observability (metrics + structured tracing)

Contributing

PRs welcome. If you add an integration, prefer this pattern:

  • define a provider contract (interface)
  • implement mock + real
  • select via env vars (no code changes for users)

License

MIT
