# Incident Triage MCP

MCP server for evidence-driven incident triage with safe actions, Jira, Slack, and optional Airflow integration.
Incident Triage MCP is a Model Context Protocol (MCP) tool server for incident response.
It exposes structured, auditable triage tools (evidence collection, runbook search, safe actions, ticketing integrations, etc.) so AI agents (or LLM hosts) can diagnose and respond to outages with guardrails.
## What this project is (and isn’t)
- ✅ Is: an MCP server that provides incident-triage tools + a workflow-friendly “evidence bundle” artifact.
- ✅ Is: designed to run locally (Claude Desktop stdio), via Docker (HTTP), and in Kubernetes.
- ❌ Is not: an LLM agent by itself. Agents/hosts call these tools.
## Features
- True MCP transports: `stdio` and `streamable-http`
- Tool discovery: tools are auto-discovered by MCP clients (e.g., `tools/list`)
- Structured schemas: Pydantic models for tool inputs/outputs
- Evidence Bundle artifact: a single JSON “source of truth” produced by workflows
- Artifact store: filesystem (dev) or S3-compatible (MinIO/S3) for Docker/Kubernetes
- Audit-first: JSONL audit events (stdout by default for k8s)
- Guardrails: RBAC + safe-action allowlists (WIP / expanding)
- Pluggable integrations: mock-first, real adapters added progressively (env-based provider selection)
- Safe ticketing: draft Jira tickets + gated create (dry-run by default, RBAC + confirm token)
- Real idempotency for creates: reusing an `idempotency_key` returns the existing issue
- Slack updates: post incident summary + ticket context (safe dry-run by default)
- Jira discovery tools: list accessible projects and project-specific issue types (read-only)
- Jira Cloud rich text: draft content renders as clean ADF (H2 section headings + bullet lists + inline bold/code)
- Demo-friendly tools: `evidence.wait_for_bundle` and deterministic `incident.triage_summary`
- Local LangGraph CLI agent: run end-to-end triage without Claude Desktop restarts
- Automated tests: unit tests cover all MCP tools in `server.py`
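Workflows produce a single Evidence Bundle JSON per incident, which downstream tools (summary, ticketing, Slack) all read from. The real schema is defined by the Pydantic models in `domain_models.py`; purely as an illustration (field names here are assumptions, not the actual schema), a minimal bundle might look like:

```python
import json

# Hypothetical minimal Evidence Bundle shape. The authoritative schema is
# the Pydantic models in src/incident_triage_mcp/domain_models.py; these
# field names are illustrative only.
bundle = {
    "incident_id": "INC-123",
    "service": "payments-api",
    "window_minutes": 30,
    "signals": [
        {"source": "metrics", "name": "http_5xx_rate", "value": 0.12},
        {"source": "logs", "name": "timeout_errors", "value": 47},
    ],
}

print(json.dumps(bundle, indent=2))
```

Because every downstream tool consumes this one artifact, the bundle acts as the workflow's single source of truth.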
## Project layout

```
incident-triage-mcp/
  pyproject.toml
  README.md
  docker-compose.yml
  airflow/
    dags/
    artifacts/
  runbooks/
  src/
    incident_triage_mcp/
      __init__.py
      server.py
      audit.py
      domain_models.py
      tools/
      adapters/
      policy/
  k8s/
    deployment.yaml
    service.yaml
    airflow-creds.yaml
```
## Quick start (local)

### 1) Install + run (stdio)

```bash
# from repo root
pip install -e .

# stdio transport (for Claude Desktop)
MCP_TRANSPORT=stdio incident-triage-mcp
```

RBAC, safe-action, and Jira settings for local runs:

```bash
# RBAC + safe actions
MCP_ROLE=viewer|triager|responder|admin
CONFIRM_TOKEN=CHANGE_ME_12345   # required for non-dry-run safe actions

# Jira provider selection
JIRA_PROVIDER=mock|cloud
JIRA_PROJECT_KEY=INC
JIRA_ISSUE_TYPE=Task

# Jira Cloud (required when JIRA_PROVIDER=cloud)
JIRA_BASE_URL=https://your-domain.atlassian.net
JIRA_EMAIL=you@example.com
JIRA_API_TOKEN=***
```
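The RBAC + confirm-token gate for non-dry-run safe actions can be sketched as follows. This is a minimal illustration of the documented policy (writes require `MCP_ROLE=responder|admin` plus a matching `CONFIRM_TOKEN`); the function name is hypothetical and the server's actual checks live under `policy/`:

```python
import os

# Roles allowed to perform non-dry-run (write) safe actions,
# per the documented RBAC model. Illustrative only.
WRITE_ROLES = {"responder", "admin"}

def authorize_safe_action(dry_run, confirm_token, env=None):
    """Hypothetical gate: allow dry runs; require role + token for writes."""
    env = os.environ if env is None else env
    if dry_run:
        return True  # dry runs are always safe to allow
    role = env.get("MCP_ROLE", "viewer")
    if role not in WRITE_ROLES:
        raise PermissionError(f"role {role!r} may not perform non-dry-run actions")
    if not confirm_token or confirm_token != env.get("CONFIRM_TOKEN"):
        raise PermissionError("missing or invalid confirm token")
    return True
```

Keeping dry-run the default means an agent must supply every approval input explicitly before anything irreversible happens.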
## Packaging entrypoints (pip + Docker)

Pip console scripts:

```bash
# MCP server
incident-triage-mcp

# Local LangGraph runner
incident-triage-agent --incident-id INC-123 --service payments-api --artifact-store fs --artifact-dir ./evidence
```

Docker image entrypoint:

```bash
# Default: starts the MCP server (streamable-http on :3333)
docker run --rm -p 3333:3333 incident-triage-mcp:latest

# Override the command: runs via the uv in-project env
docker run --rm incident-triage-mcp:latest incident-triage-agent --incident-id INC-123 --service payments-api
```
### 2) Key environment variables

```bash
# MCP
MCP_TRANSPORT=stdio|streamable-http
MCP_HOST=0.0.0.0
MCP_PORT=3333

# Audit logging (k8s-friendly)
AUDIT_MODE=stdout|file        # default: stdout
AUDIT_PATH=/data/audit.jsonl  # only used when AUDIT_MODE=file

# Local runbooks (real data source, no creds)
RUNBOOKS_DIR=./runbooks

# Evidence backend (standalone-first)
#   fs      -> read/write local Evidence Bundle JSON files
#   s3      -> read/write via the S3 API (MinIO/S3)
#   airflow -> expose airflow_* tools (requires Airflow env vars)
#   none    -> disable evidence reads entirely
EVIDENCE_BACKEND=fs|s3|airflow|none

# Local evidence directory for the fs backend
EVIDENCE_DIR=./evidence

# Legacy alias, still supported (maps to fs|s3 when EVIDENCE_BACKEND is unset)
ARTIFACT_STORE=fs|s3

# Airflow API (required only when EVIDENCE_BACKEND=airflow)
AIRFLOW_BASE_URL=http://localhost:8080
AIRFLOW_USERNAME=admin
AIRFLOW_PASSWORD=admin

# S3-compatible artifact store (required when EVIDENCE_BACKEND=s3)
S3_ENDPOINT_URL=http://localhost:9000
S3_BUCKET=triage-artifacts
S3_REGION=us-east-1
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin

# Jira ticket defaults
JIRA_PROJECT_KEY=INC
JIRA_ISSUE_TYPE=Task

# Slack notifications
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
SLACK_DEFAULT_CHANNEL=#incident-triage

# Idempotency storage for ticket create retries
IDEMPOTENCY_STORE_PATH=./data/jira_idempotency.json
```
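The documented precedence for the evidence backend (`EVIDENCE_BACKEND` wins; the legacy `ARTIFACT_STORE` alias maps to `fs|s3` only when `EVIDENCE_BACKEND` is unset) can be sketched like this. The function name and the `fs` fallback default are assumptions for illustration; the server's real resolution logic may differ:

```python
import os

def resolve_evidence_backend(env=None):
    """Sketch of env-based backend selection with the legacy alias.

    EVIDENCE_BACKEND takes precedence; otherwise ARTIFACT_STORE (fs|s3)
    is honored; 'fs' is assumed here as the local-dev default.
    """
    env = os.environ if env is None else env
    backend = env.get("EVIDENCE_BACKEND")
    if backend:
        return backend
    legacy = env.get("ARTIFACT_STORE")
    if legacy in ("fs", "s3"):
        return legacy
    return "fs"
```

Env-based selection like this is what lets users swap providers (mock vs. real, fs vs. s3) without code changes.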
## Standalone mode (no Airflow)

Boot the MCP server standalone with only stdio + local runbooks:

```bash
MCP_TRANSPORT=stdio \
RUNBOOKS_DIR=./runbooks \
EVIDENCE_BACKEND=fs \
EVIDENCE_DIR=./evidence \
incident-triage-mcp
```

Offline demo flow (no Airflow required):

- Seed deterministic evidence: `evidence_seed_sample(incident_id="INC-123", service="payments-api", window_minutes=30)`
- Summarize the incident: `incident_triage_summary(incident_id="INC-123")`
- Draft a Jira ticket from local evidence: `jira_draft_ticket(incident_id="INC-123")`

Notes:

- `airflow_*` tools are only registered when `EVIDENCE_BACKEND=airflow`.
- If `EVIDENCE_BACKEND=airflow` but the Airflow env vars are missing, the server still starts and Airflow tool calls return a clear `airflow_disabled` error.

Quick verification tests:

```bash
# standalone behavior (no Airflow required)
UV_CACHE_DIR=.uv-cache /opt/anaconda3/bin/uv run --project . \
  python -m unittest tests.test_standalone_mode -v
```

One-command standalone smoke check:

```bash
./scripts/smoke_standalone.sh INC-123 payments-api
```
## Docker Compose (Airflow + Postgres + MCP)

This repo supports a local dev stack where:

- Airflow runs evidence workflows
- MinIO (S3-compatible) stores Evidence Bundles, so the setup also works in Kubernetes
- the MCP server reads Evidence Bundles from MinIO/S3 (or the filesystem in dev mode)

Start:

```bash
mkdir -p airflow/dags airflow/artifacts airflow/logs airflow/plugins data runbooks
docker compose up --build
```

Airflow UI:

- URL: `http://localhost:8080`
- Login: `admin / admin`

MCP (HTTP):

- Default: `http://localhost:3333` (streamable HTTP transport)

Tip: Claude Desktop usually spawns MCP servers via stdio. For Docker/HTTP, you typically use an MCP client that supports HTTP or add a small local stdio→HTTP bridge.

MinIO (artifact store):

- S3 API: `http://localhost:9000`
- Console UI: `http://localhost:9001`
- Credentials (dev): `minioadmin / minioadmin`

Check artifacts:

```bash
docker run --rm --network incident-triage-mcp_default \
  -e MC_HOST_local=http://minioadmin:minioadmin@minio:9000 \
  minio/mc:latest ls local/triage-artifacts/evidence/v1/
```

Standalone Docker mode (no Airflow, no MinIO):

```bash
mkdir -p data evidence runbooks
docker compose --profile standalone up --build incident-triage-mcp-standalone
```

- MCP endpoint: `http://localhost:3334`
## Testing

Run all tests:

```bash
UV_CACHE_DIR=.uv-cache /opt/anaconda3/bin/uv run --project . \
  python -m unittest discover -s tests -p 'test_*.py' -v
```

The suite currently covers all MCP tools defined in `src/incident_triage_mcp/server.py`.
## Automated releases

This repo supports automated tag-based release publishing for both PyPI and GHCR.

Release workflow:

- Trigger: push a Git tag like `v0.2.0`
- Publishes:
  - Python package to PyPI
  - Docker image to `ghcr.io/<owner>/incident-triage-mcp`
  - GitHub Release with generated notes

Required repository secret:

- `PYPI_API_TOKEN` (PyPI API token with publish permission)

Release commands:

```bash
# 1) bump the version in pyproject.toml first, then:
git tag v0.2.0
git push origin v0.2.0
```

Notes:

- The workflow validates that tag `vX.Y.Z` matches `project.version` in `pyproject.toml`.
- GHCR publishing uses the built-in `GITHUB_TOKEN`.
## Evidence Bundle workflow

Airflow produces a single artifact per incident:

- fs: `./airflow/artifacts/<INCIDENT_ID>.json` (dev)
- s3: `s3://triage-artifacts/evidence/v1/<INCIDENT_ID>.json` (Docker/K8s)

The MCP server exposes tools to:

- trigger evidence DAGs
- fetch Evidence Bundles
- search runbooks

The intended flow:

1. Agent/host triggers evidence collection (Airflow DAG)
2. Airflow writes the Evidence Bundle JSON artifact
3. Agent/host optionally calls `evidence.wait_for_bundle` to poll until the artifact exists
4. Agent/host reads the bundle via MCP tools
5. (later) ticket creation + safe actions use the same bundle
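The `evidence.wait_for_bundle` polling step can be sketched for the fs backend as follows. This is an illustration of the documented behavior (poll until `<EVIDENCE_DIR>/<INCIDENT_ID>.json` exists, bounded by a timeout), not the server's actual implementation:

```python
import json
import time
from pathlib import Path

def wait_for_bundle(evidence_dir, incident_id, timeout_seconds=90, poll_seconds=2):
    """Sketch of wait_for_bundle for the fs backend: poll until the
    Evidence Bundle JSON appears, then parse and return it."""
    path = Path(evidence_dir) / f"{incident_id}.json"
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if path.exists():
            return json.loads(path.read_text())
        time.sleep(poll_seconds)
    raise TimeoutError(f"no Evidence Bundle for {incident_id} after {timeout_seconds}s")
```

Polling against a bounded deadline keeps the agent from blocking forever if a DAG run fails before writing the artifact.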
## Demo flow (agent/host)

Typical demo sequence:

1. Trigger evidence collection: `airflow_trigger_incident_dag(incident_id="INC-123", service="payments-api")`
2. Wait for the Evidence Bundle: `evidence_wait_for_bundle(incident_id="INC-123", timeout_seconds=90, poll_seconds=2)`
3. Generate a deterministic triage summary (no LLM required): `incident_triage_summary(incident_id="INC-123")`
4. Optional one-call orchestration (safe ticket dry-run hook): `incident_triage_run(incident_id="INC-123", service="payments-api", include_ticket=true)`
   - Override the project key for the ticket hook: `incident_triage_run(incident_id="INC-123", service="payments-api", include_ticket=true, project_key="PAY")`
5. Optional Slack notification hook (safe dry-run by default): `incident_triage_run(incident_id="INC-123", service="payments-api", notify_slack=true)`
   - Set the channel and send for real: `incident_triage_run(incident_id="INC-123", service="payments-api", notify_slack=true, slack_channel="#incident-triage", slack_dry_run=false)`
## Jira ticketing demo

1. Validate Jira Cloud credentials (cloud provider only): `jira_validate_credentials()`
2. Discover Jira metadata first (recommended):
   - `jira_list_projects()`
   - `jira_list_issue_types()` (uses the `JIRA_PROJECT_KEY` default)
   - `jira_list_issue_types(project_key="SCRUM")`
3. Draft a ticket (no credentials required, uses `JIRA_PROJECT_KEY` by default): `jira_draft_ticket(incident_id="INC-123")`
   - Override the project key per call: `jira_draft_ticket(incident_id="INC-123", project_key="PAY")`
4. Safe create (mock provider by default):
   - Dry run (default): `jira_create_ticket(incident_id="INC-123")`
   - Override the project key per call: `jira_create_ticket(incident_id="INC-123", project_key="PAY")`
   - Create (requires explicit approval inputs): `jira_create_ticket(incident_id="INC-123", dry_run=false, reason="Track incident timeline and coordinate responders", confirm_token="CHANGE_ME_12345", idempotency_key="INC-123-PAY-1")`

Notes:

- Non-dry-run is blocked unless RBAC allows it (`MCP_ROLE=responder|admin`) and `CONFIRM_TOKEN` is provided.
- Swap providers via env: `JIRA_PROVIDER=mock` (demo) or `JIRA_PROVIDER=cloud` (real Jira Cloud).
- `JIRA_ISSUE_TYPE` defaults to `Task` (used for creates unless overridden in code).
- Jira Cloud descriptions are sent as ADF and render section headers/bullets/inline formatting in the Jira UI.
- Reusing the same `idempotency_key` on a non-dry-run `jira_create_ticket` returns the existing issue instead of creating a duplicate.
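The documented idempotency behavior (a reused `idempotency_key` returns the existing issue instead of creating a duplicate) can be sketched with a file-backed key-to-issue map. The class and function names here are hypothetical; the real store is configured via `IDEMPOTENCY_STORE_PATH` and may work differently:

```python
import json
from pathlib import Path

class IdempotencyStore:
    """Sketch of a file-backed map: idempotency_key -> created issue key."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, key):
        return self._load().get(key)

    def put(self, key, issue_key):
        data = self._load()
        data[key] = issue_key
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(data))

def create_ticket_idempotent(store, key, create_fn):
    # Return the issue already created for this key, if any,
    # instead of creating a duplicate.
    existing = store.get(key)
    if existing:
        return {"issue_key": existing, "deduplicated": True}
    issue_key = create_fn()
    store.put(key, issue_key)
    return {"issue_key": issue_key, "deduplicated": False}
```

Persisting the map to disk is what makes create *retries* safe: a second call after a crash or timeout finds the first call's issue key.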
## Runbooks (local Markdown)

Put Markdown runbooks in `./runbooks/*.md`, then use the MCP tool, for example:

`runbooks_search(query="5xx latency timeout", limit=5)`
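A runbook search over local Markdown needs no credentials or index. As a minimal sketch of the kind of keyword-overlap scoring such a tool could use (the server's actual ranking may differ; this is not its implementation):

```python
from pathlib import Path

def runbooks_search(runbooks_dir, query, limit=5):
    """Sketch: rank *.md runbooks by total occurrences of query terms."""
    terms = set(query.lower().split())
    results = []
    for md in Path(runbooks_dir).glob("*.md"):
        text = md.read_text().lower()
        score = sum(text.count(t) for t in terms)
        if score:
            results.append({"file": md.name, "score": score})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:limit]
```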
## Kubernetes (local or remote)

You can deploy the MCP server into Kubernetes, either local (via kind/minikube) or remote (EKS/GKE/AKS).

### Local Kubernetes with kind (example)

```bash
brew install kind kubectl
kind create cluster --name triage

# build image
docker build -t incident-triage-mcp:0.1.0 .

# load into kind
kind load docker-image incident-triage-mcp:0.1.0 --name triage

# update k8s/deployment.yaml to use image: incident-triage-mcp:0.1.0
kubectl apply -f k8s/
kubectl port-forward svc/incident-triage-mcp 3333:80
```

Now the MCP service is reachable at `http://localhost:3333`.

Notes:

- In Kubernetes, `AUDIT_MODE=stdout` is recommended so log collectors can capture audit events.
- If MinIO is running in Docker on your Mac and MCP is running in kind, set `S3_ENDPOINT_URL` to `http://host.docker.internal:9000` in the Kubernetes Deployment.
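With `AUDIT_MODE=stdout`, audit events are one JSON object per line, which any log collector can ship. As an illustration only (the field names here are assumptions; the server's actual event schema is defined in `audit.py`):

```python
import json
import sys
import time

def emit_audit_event(tool, actor, outcome, stream=None):
    """Sketch of a JSONL audit emitter for AUDIT_MODE=stdout.

    Field names are illustrative; the real schema lives in audit.py.
    """
    stream = sys.stdout if stream is None else stream
    event = {
        "ts": time.time(),
        "tool": tool,
        "actor": actor,
        "outcome": outcome,
    }
    stream.write(json.dumps(event) + "\n")  # one event per line (JSONL)
    return event
```

Writing to stdout rather than a file means no volume mounts are needed in Kubernetes; the cluster's logging pipeline becomes the audit trail.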
## Roadmap (next)

- ✅ Ticketing: Jira draft + gated create (mock provider); add Jira Cloud provider wiring + richer formatting
- ✅ Artifact store for Docker/K8s via MinIO/S3 (filesystem remains for fast local dev)
- Add a Helm chart + GitHub Actions to build/push multi-arch Docker images
- Expand RBAC + safe actions with preconditions and approval tokens
- Add richer observability (metrics + structured tracing)

## Contributing

PRs welcome. If you add an integration, prefer this pattern:

- define a provider contract (interface)
- implement `mock` + `real`
- select via env vars (no code changes for users)

## License

MIT