MCP server to monitor and manage remote Linux servers via SSH. 63 tools: health checks, log search, APM, SLOs, anomaly detection, auto-remediation, live dashboard, CIS benchmarks, CVE scanning, database monitoring, compliance reports, team RBAC, PagerDuty/Telegram/OpsGenie.

Server Guardian MCP

A server management MCP with 63 tools, 8 connection types, and 16 modules: log search, access log APM, SLO tracking, anomaly detection, auto-remediation playbooks, CIS benchmarks, CVE scanning, database monitoring, network monitoring, file integrity, live web dashboard, compliance reports, public status pages, team RBAC, and PagerDuty/Telegram/OpsGenie, all through Claude. No agents. Just SSH.

"The AI SRE that lives in your terminal. SSH into any server, diagnose any problem, fix it automatically — all through a conversation with Claude. No agents. No SaaS bills. No PromQL."

Live Dashboard

python -m server_guardian_mcp dashboard           # start on port 8080
python -m server_guardian_mcp dashboard --port 9090

Real-time web UI with auto-refresh every 30 seconds. Dark theme, Chart.js charts for CPU/memory/disk trends, active alerts feed, incident timeline.

Why Server Guardian?

| What you say to Claude | What happens |
| --- | --- |
| "Is my server okay?" | SSH in, check CPU/RAM/disk/temp, detect anomalies vs baseline |
| "Why is production slow?" | Check processes, disk, logs, access log APM, identify the bottleneck |
| "Search logs for OOM errors" | Index logs in SQLite, search with pattern detection, show error rates |
| "Show me endpoint latency" | Parse nginx access logs: p50/p95/p99 latency, error rates, slowest endpoints |
| "Are we meeting our SLOs?" | Track uptime/latency/error targets, calculate error budget remaining |
| "What happened overnight?" | Generate incident narrative from alerts, service events, playbook runs |
| "Fix it automatically" | Run playbooks: clear disk, restart services, renew SSL certs |
| "Run a security audit" | 61 CIS benchmark checks + CVE scan + rootkit detection + FIM |
| "Generate a compliance report" | Branded HTML report with score (A-F) for SOC2/ISO prep |
| "How's the database?" | Slow query analysis, connection counts, replication lag, table sizes |
| "Am I overpaying?" | Rightsizing analysis: "CPU at 0.4%, memory at 7.7%; downsize to save 50%" |
| "What connects to what?" | Map service dependencies from active network connections |
| "Write the postmortem" | Auto-generate structured postmortem from incident timeline |
| "Create a status page" | Public-facing uptime page for customers (replaces $29/mo tools) |

Benchmarks vs Alternatives

| Feature | Server Guardian | ssh-mcp | mcp-ssh-manager | HomeButler |
| --- | --- | --- | --- | --- |
| Total tools | 63 | 2 | 37 | 20 |
| Connection types | 8 | 1 | 1 | 1 |
| Log search + pattern detection | Yes | - | - | - |
| Access log APM (p50/p95/p99) | Yes | - | - | - |
| SLO tracking + error budgets | Yes | - | - | - |
| Smart anomaly detection | Yes | - | - | - |
| Auto-remediation playbooks | Yes | - | - | - |
| CIS benchmark (61 checks) | Yes | - | - | - |
| CVE scanning + rootkit detection | Yes | - | - | - |
| File integrity monitoring | Yes | - | - | - |
| Database monitoring (MySQL/PG) | Yes | - | - | - |
| Network bandwidth monitoring | Yes | - | - | - |
| Service dependency mapping | Yes | - | - | - |
| Root cause correlation | Yes | - | - | - |
| Resource rightsizing | Yes | - | - | - |
| Multi-step API tests | Yes | - | - | - |
| Maintenance windows | Yes | - | - | - |
| Public status page | Yes | - | - | - |
| AI postmortem generation | Yes | - | - | - |
| Live web dashboard (Chart.js) | Yes | - | - | - |
| Compliance report (SOC2/ISO) | Yes | - | - | - |
| Team RBAC (admin/operator/viewer) | Yes | - | - | - |
| PagerDuty / Telegram / OpsGenie | Yes | - | - | - |
| Background watchdog daemon | Yes | - | - | Yes |
| Email / Slack / Discord alerts | Yes | - | - | Yes |
| Multi-cloud (AWS/GCP/Azure) | Yes | - | - | - |
| Docker container management | Yes | - | Yes | Yes |

Quick Install

Claude Code (recommended)

claude mcp add server-guardian -- uvx server-guardian-mcp

pip

pip install server-guardian-mcp
claude mcp add server-guardian -- python -m server_guardian_mcp

From source

pip install -e .
claude mcp add server-guardian -- python -m server_guardian_mcp

Setup (2 minutes)

1. Create your .env

cp .env.example .env

2. Add your servers

# SSH (most common)
SERVER_PROD=ssh,203.0.113.10,22,deploy,key,~/.ssh/prod_key,Production

# Local machine
SERVER_LOCAL=local,,,,,My Machine

# Docker / Kubernetes / AWS SSM / GCP / Azure / WinRM also supported
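
The entries above are comma-separated. As a rough illustration of how such a line could be split, here is a minimal sketch. The field order (type, host, port, user, auth method, key path, label) is an assumption inferred from the SSH example, and `ServerEntry` and `parse_server_entry` are hypothetical names, not the package's actual parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerEntry:
    name: str            # suffix after SERVER_, e.g. "PROD"
    conn_type: str       # "ssh", "local", ...
    host: str
    port: Optional[int]
    user: str
    auth: str            # e.g. "key" (meaning assumed from the example)
    key_path: str
    label: str


def parse_server_entry(line: str) -> ServerEntry:
    """Split one SERVER_<NAME>=... line into named fields (assumed order)."""
    key, sep, value = line.strip().partition("=")
    if not sep or not key.startswith("SERVER_"):
        raise ValueError(f"not a SERVER_ entry: {line!r}")
    fields = [f.strip() for f in value.split(",")]
    conn_type, label = fields[0], fields[-1]
    # Everything between the type and the label; pad so short entries still unpack.
    middle = fields[1:-1] + [""] * 5
    host, port, user, auth, key_path = middle[:5]
    return ServerEntry(
        name=key[len("SERVER_"):],
        conn_type=conn_type,
        host=host,
        port=int(port) if port else None,
        user=user,
        auth=auth,
        key_path=key_path,
        label=label,
    )
```

Taking the label from the last field lets the sketch accept both example lines, even though the local entry carries fewer fields than the SSH one.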

3. Auto-discover existing servers

"Discover my SSH servers" — reads ~/.ssh/config and shows ready-to-paste .env lines.

4. Add aliases (optional)

SERVER_ALIASES=prod:PROD,stg:STAGING,dev:DEV

All 63 Tools

Core Server Management (6)

| Tool | What it does |
| --- | --- |
| list_all_servers | Show all servers with online/offline status and latency |
| check_server_health | Full snapshot: CPU, RAM, disk, swap, temp, load, top processes, network |
| run_shell_commands | Run one or more shell commands on any server |
| run_shell_script | Run multi-line bash scripts with shared variables |
| fetch_system_logs | Fetch dmesg/syslog/journal/auth/nginx/custom logs with grep filter |
| list_running_processes | Processes sorted by CPU or memory, with name filter |

Service Management (5)

| Tool | What it does |
| --- | --- |
| manage_systemd_service | Start/stop/restart/enable/disable/status/logs for any systemd service |
| list_all_services | List ALL systemd services, filter by running/failed/inactive |
| find_failed_services | Find every crashed/failed service in one call |
| restart_failed_services | Bulk restart failed services; pass names or "ALL_FAILED" |
| watch_service_status | Quick is-active + is-enabled check for specific services |

Monitoring & Alerting (5)

| Tool | What it does |
| --- | --- |
| check_ssl_certificate | SSL cert expiry, chain, issuer for any domain (no SSH) |
| check_http_endpoint | HTTP status, response time, headers for any URL (no SSH) |
| monitor_server_health | Health check + store in SQLite + auto-alert on thresholds |
| monitor_endpoints | Check HTTP/SSL targets + store + alert on failures |
| get_active_alerts | Show unresolved alerts grouped by severity |

Log Search & APM (2)

| Tool | What it does |
| --- | --- |
| search_logs | Index logs in SQLite, search with pattern detection, extract error rates |
| analyze_access_logs | Nginx/Apache APM: per-endpoint p50/p95/p99 latency, error rates, throughput, top IPs |

SLO Tracking & Reporting (4)

| Tool | What it does |
| --- | --- |
| manage_slos | Define uptime/latency/error rate targets, track compliance, error budgets |
| generate_postmortem_tool | Structured incident postmortem from alerts, services, playbook data |
| generate_status_page_tool | Public-facing status page for customers (replaces Better Stack $29/mo) |
| get_weekly_report | Weekly health summary for email or team review |

Database Monitoring (2)

| Tool | What it does |
| --- | --- |
| query_database | Run SQL queries on MySQL, PostgreSQL, or SQLite on any server |
| monitor_database | Slow queries, connections, replication lag, table sizes (MySQL/PostgreSQL auto-detected) |

Network Monitoring (2)

| Tool | What it does |
| --- | --- |
| inspect_network | Listening ports, active connections, interfaces, DNS, routing |
| monitor_network | Bandwidth per interface, connection states, TCP retransmissions, throughput rates |

Security & Compliance (6)

| Tool | What it does |
| --- | --- |
| run_security_audit | 10-point security check (SSH, firewall, logins, updates, sudo) |
| run_cis_benchmark | 61 CIS Linux Benchmark checks across filesystem, network, SSH, PAM, logging |
| scan_vulnerabilities | CVE scanning (package versions), rootkit detection, crypto miner detection |
| check_file_integrity | FIM: hash critical files (/etc/passwd, sshd_config, etc.), detect unauthorized changes |
| manage_firewall | UFW/iptables: status, allow, deny, delete rules, enable/disable |
| generate_compliance_report_tool | Branded HTML report with score (A-F), suitable for SOC2/ISO |

Docker (2)

| Tool | What it does |
| --- | --- |
| list_docker_containers | Containers with CPU, memory, network, block I/O stats |
| fetch_docker_logs | Container logs with grep filter and time range |

Disk & Files (4)

| Tool | What it does |
| --- | --- |
| analyze_disk_usage | Find largest items, files >100MB, inode usage |
| read_remote_file | Read files on server (tail/head/all) with metadata |
| upload_file_to_server | SFTP upload with size verification |
| download_file_from_server | SFTP download |

Multi-Server (2)

| Tool | What it does |
| --- | --- |
| run_on_all_servers | Same commands on multiple servers; pass ["ALL"] for all |
| compare_across_servers | Spot config drift: same command, side-by-side results |

System Administration (4)

| Tool | What it does |
| --- | --- |
| manage_cron_jobs | List, add, remove cron jobs on any server |
| manage_users | List users, user info, add SSH keys, list keys, who is logged in |
| manage_packages | List/install/remove/upgrade packages (apt, yum, dnf, apk auto-detected) |
| manage_nginx | Status, list sites, show config, test, reload, restart, access/error logs |

Git Deploy (1)

| Tool | What it does |
| --- | --- |
| git_deploy | Status, pull, log, branch, switch, stash, diff on server git repos |

Discovery (1)

| Tool | What it does |
| --- | --- |
| discover_ssh_servers | Auto-discover servers from ~/.ssh/config with ready-to-paste .env lines |

Dashboard & Analytics (6)

| Tool | What it does |
| --- | --- |
| multi_server_dashboard | One-call summary of ALL servers: health, CPU, RAM, disk, failed services |
| get_monitoring_history | Query health trends, service events, endpoint checks from SQLite |
| get_incident_timeline | Chronological event log for a server |
| forecast_disk_usage | Predict when disk will be full based on growth rate |
| generate_html_dashboard | Self-contained HTML status page that opens in any browser |
| resolve_alert | Mark an alert as resolved |

Intelligence & Automation (3)

| Tool | What it does |
| --- | --- |
| detect_anomalies_tool | Statistical anomaly detection: flags metrics >2.5 sigma from baseline |
| replay_incident | Generate chronological narrative from alerts, service events, playbook runs |
| manage_playbooks | Auto-remediation: disk cleanup, service restart, SSL renewal, custom playbooks |

Team & Integrations (3)

| Tool | What it does |
| --- | --- |
| team_manage | RBAC user management: admin/operator/viewer roles with API keys |
| check_integrations | Status and test for PagerDuty, Telegram, OpsGenie |
| live_dashboard_info | How to start the live web dashboard and available API endpoints |

Advanced Operations (5)

| Tool | What it does |
| --- | --- |
| run_api_test_tool | Multi-step API tests with variable extraction and assertions |
| manage_maintenance_windows | Suppress alerts during planned work |
| get_rightsizing_recommendations | Identify over/under-provisioned resources to save costs |
| map_service_dependencies | Discover service topology from active network connections |
| analyze_root_cause | Correlate anomalies across metrics, services, alerts for root cause analysis |

Access Log APM

Much of the value of a full APM with zero agent install.

Tell Claude: "analyze access logs on PROD" to parse nginx/Apache access logs for:
  • Per-endpoint latency percentiles (p50, p95, p99)
  • Error rates (4xx, 5xx) per endpoint
  • Throughput (requests per endpoint)
  • Slowest endpoints ranked
  • Status code breakdown
  • Top IPs by request volume
  • URL normalization (replaces IDs/UUIDs with placeholders)
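
The list above can be illustrated with a small parser. This is a sketch under stated assumptions: the log lines are assumed to be in nginx's combined format with `$request_time` (in seconds) appended as the last field, percentiles use the nearest-rank method, and the names `normalize`, `percentile`, and `summarize` are hypothetical rather than the package's own.

```python
import math
import re
from collections import defaultdict

# Assumed layout: combined log format with $request_time as the final field.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .* (?P<rt>\d+\.\d+)$'
)
UUID_RE = re.compile(r"/[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}(?=/|$)")
NUM_RE = re.compile(r"/\d+(?=/|$)")


def normalize(path):
    """Drop the query string and replace UUID / numeric segments with placeholders."""
    path = path.split("?", 1)[0]
    path = UUID_RE.sub("/{uuid}", path)
    return NUM_RE.sub("/{id}", path)


def percentile(sorted_vals, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def summarize(lines):
    """Return per-endpoint count, p50/p95/p99 latency and 4xx/5xx error rate."""
    times = defaultdict(list)
    errors = defaultdict(int)
    for line in lines:
        m = LINE_RE.search(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed format
        key = f"{m['method']} {normalize(m['path'])}"
        times[key].append(float(m["rt"]))
        if m["status"].startswith(("4", "5")):
            errors[key] += 1
    report = {}
    for key, vals in times.items():
        vals.sort()
        report[key] = {
            "count": len(vals),
            "p50": percentile(vals, 50),
            "p95": percentile(vals, 95),
            "p99": percentile(vals, 99),
            "error_rate": errors[key] / len(vals),
        }
    return report
```

Normalizing before grouping is what keeps `/users/1` and `/users/2` in the same bucket, so percentiles describe an endpoint rather than a single URL.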

Log Search & Pattern Detection

Tell Claude: "search logs on PROD for OOM" or "show me log patterns"
  • Fetches logs via SSH, indexes in SQLite for future searching
  • Pattern detection — clusters similar log lines, shows frequency
  • Error rate extraction (log-to-metrics)
  • Supports journal, syslog, auth, nginx, or any custom log path
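
A minimal sketch of the indexing and clustering idea described above, assuming a very simple notion of "similar": lines are grouped after numbers and hex values are masked. The table layout and the function names (`to_template`, `index_logs`, `top_patterns`, `search`) are hypothetical and chosen for illustration.

```python
import re
import sqlite3


def to_template(line):
    """Mask variable parts so similar lines collapse into one pattern."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    return re.sub(r"\d+", "<N>", line)


def index_logs(conn, lines):
    """Store raw lines alongside their masked template."""
    conn.execute("CREATE TABLE IF NOT EXISTS logs (line TEXT, template TEXT)")
    conn.executemany(
        "INSERT INTO logs VALUES (?, ?)",
        [(line, to_template(line)) for line in lines],
    )
    conn.commit()


def top_patterns(conn, limit=5):
    """Most frequent templates with their counts."""
    return conn.execute(
        "SELECT template, COUNT(*) AS n FROM logs "
        "GROUP BY template ORDER BY n DESC, template LIMIT ?",
        (limit,),
    ).fetchall()


def search(conn, needle):
    """Substring search over indexed lines (SQLite LIKE, case-insensitive for ASCII)."""
    rows = conn.execute("SELECT line FROM logs WHERE line LIKE ?", (f"%{needle}%",))
    return [row[0] for row in rows]
```

Usage would be along the lines of `conn = sqlite3.connect(":memory:")`, then `index_logs(conn, lines)` followed by `top_patterns(conn)`.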

SLO Tracking & Error Budgets

Tell Claude: "create an SLO for 99.9% uptime on PROD"
Tell Claude: "show me SLO status"
  • Define uptime, latency, or error rate targets
  • Track compliance from stored health/endpoint data
  • Calculate error budget remaining and burn rate
  • Configurable measurement windows (7d, 30d, 90d)
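
The error budget arithmetic behind these bullets is short enough to sketch. The version below assumes an availability SLO measured over a count of health checks in the window; `error_budget` and its return keys are hypothetical names, and burn rate is taken as the observed failure rate divided by the allowed failure rate.

```python
def error_budget(target_pct, total_checks, failed_checks):
    """Summarize an availability SLO over one measurement window.

    target_pct:    SLO target, e.g. 99.9
    total_checks:  number of checks in the window
    failed_checks: number of failed checks in the window
    """
    allowed_rate = 1 - target_pct / 100
    observed_rate = failed_checks / total_checks if total_checks else 0.0
    allowed_failures = allowed_rate * total_checks
    if allowed_failures > 0:
        remaining = 1 - failed_checks / allowed_failures
        burn_rate = observed_rate / allowed_rate
    else:
        # A 100% target (or an empty window) leaves no budget to spend.
        remaining = 1.0 if failed_checks == 0 else 0.0
        burn_rate = 0.0 if failed_checks == 0 else float("inf")
    availability = 100 * (1 - observed_rate)
    return {
        "availability_pct": availability,
        "met": availability >= target_pct,
        "budget_remaining": remaining,   # fraction of the budget left; negative when overspent
        "burn_rate": burn_rate,          # 1.0 means the budget is spent exactly at window end
    }
```

For a 99.9% target over 10,000 checks, the budget is roughly 10 failed checks, so 5 failures leave about half of it.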

CIS Benchmark & Vulnerability Scanning

Tell Claude: "run CIS benchmark on PROD"
Tell Claude: "scan for vulnerabilities on PROD"
  • 61 CIS Linux Benchmark checks across: filesystem, software updates, boot security, process hardening, network config, SSH, PAM, user management, logging, cron
  • CVE scanning — lists installed packages, checks for security updates
  • Rootkit detection — hidden processes, suspicious kernel modules, SUID files, crypto miners, suspicious cron jobs
  • File integrity monitoring — hashes critical files, alerts on unauthorized changes
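
File integrity monitoring, the last item above, reduces to hashing files and comparing against a stored baseline. The sketch below runs against local paths for simplicity; the choice of SHA-256 and the names `hash_file`, `snapshot`, and `compare` are assumptions, and the real tool works over a remote connection.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=65536):
    """SHA-256 of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Map each existing file path to its current hash."""
    return {str(p): hash_file(p) for p in paths if Path(p).is_file()}


def compare(baseline, current):
    """Report files that were modified, went missing, or newly appeared."""
    changes = {}
    for path, old_hash in baseline.items():
        new_hash = current.get(path)
        if new_hash is None:
            changes[path] = "missing"
        elif new_hash != old_hash:
            changes[path] = "modified"
    for path in current.keys() - baseline.keys():
        changes[path] = "added"
    return changes
```

A baseline would be taken once with `snapshot([...])`, stored, and later compared against a fresh snapshot of the same paths.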

Database Monitoring

Tell Claude: "monitor database on PROD"
  • MySQL: slow query log, connection stats, replication lag, table sizes, processlist
  • PostgreSQL: pg_stat_statements, connections, replication, table sizes, lock analysis, cache hit ratio
  • Auto-detects which database is installed

Network Monitoring

Tell Claude: "monitor network on PROD"
  • Bandwidth per interface (bytes/sec, Mbps)
  • Connection state tracking (ESTABLISHED, TIME_WAIT, CLOSE_WAIT)
  • TCP retransmission rates
  • Historical trends stored in SQLite
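
Per-interface bandwidth is typically derived from two readings of the kernel's byte counters. The sketch below assumes the text comes from `/proc/net/dev` on Linux (receive bytes in the first column after the interface name, transmit bytes in the ninth); the function names are hypothetical, and counter wrap-around is ignored.

```python
def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(before, after, interval_s):
    """Bytes/sec and Mbps per interface between two samples taken interval_s apart."""
    out = {}
    for iface, (rx1, tx1) in after.items():
        if iface not in before:
            continue  # interface appeared between samples
        rx0, tx0 = before[iface]
        rx_bps = (rx1 - rx0) / interval_s
        tx_bps = (tx1 - tx0) / interval_s
        out[iface] = {
            "rx_bytes_per_s": rx_bps,
            "tx_bytes_per_s": tx_bps,
            "rx_mbps": rx_bps * 8 / 1_000_000,
            "tx_mbps": tx_bps * 8 / 1_000_000,
        }
    return out
```

Taking the difference between two samples, rather than reading the counters once, is what turns cumulative totals into a rate.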

Resource Rightsizing

Tell Claude: "rightsizing recommendations for PROD"
  • Analyzes CPU, memory, disk usage over time
  • Identifies over-provisioned resources ("CPU at 0.4% — downsize from 16 to 8 cores")
  • Identifies under-provisioned resources ("Memory at 92% — upgrade RAM")
  • Cost savings estimates

Service Dependency Mapping

Tell Claude: "map dependencies on PROD"
  • Parses active TCP connections to discover what processes talk to what
  • Groups by process (nginx -> database:5432, app -> redis:6379)
  • Stored in SQLite for historical tracking
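
One way to picture the connection parsing described above is the sketch below. It assumes input shaped like the output of `ss -tnp` (state, queues, local address, peer address, then a `users:(("name",pid=...,fd=...))` column) and simply groups peer addresses by process name. It does not distinguish inbound from outbound connections, and `map_dependencies` is a hypothetical name.

```python
import re
from collections import defaultdict

# Extracts the first process name from a users:(("name",pid=...,fd=...)) column.
PROC_RE = re.compile(r'users:\(\("([^"]+)"')


def map_dependencies(ss_output):
    """Group established TCP peers by owning process name."""
    deps = defaultdict(set)
    for line in ss_output.splitlines():
        cols = line.split()
        if len(cols) < 6 or cols[0] != "ESTAB":
            continue  # header, non-established sockets, or rows without process info
        match = PROC_RE.search(" ".join(cols[5:]))
        if match:
            deps[match.group(1)].add(cols[4])
    return {proc: sorted(peers) for proc, peers in deps.items()}
```

The result maps each process to the `host:port` peers it holds connections with, which matches the `app -> redis:6379` style of grouping mentioned above.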

Root Cause Analysis

Tell Claude: "analyze root cause on PROD"
  • Correlates metric spikes with service failures and alerts
  • Detects cascading failure patterns
  • Identifies resource exhaustion as cause of service crashes
  • Temporal correlation across all monitoring data

Smart Anomaly Detection

Tell Claude: "detect anomalies on PROD"
  • Builds baselines per metric grouped by hour and day of week
  • Flags values >2.5 standard deviations from the mean
  • No ML dependencies — pure statistics from SQLite data
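
The three bullets above translate fairly directly into code. The following is a sketch of the general technique, not the package's implementation: samples are bucketed by (day of week, hour), each bucket keeps a mean and standard deviation, and a new value is flagged when it lies more than 2.5 standard deviations from its bucket's mean. Function names and the in-memory sample format are assumptions.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def build_baselines(samples):
    """samples: iterable of (datetime, value). Returns {(weekday, hour): (mean, stdev)}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {
        key: (statistics.fmean(vals), statistics.pstdev(vals))
        for key, vals in buckets.items()
        if len(vals) >= 2  # a single sample gives no usable spread
    }


def is_anomaly(baselines, ts, value, threshold=2.5):
    """True when value deviates more than threshold sigmas from its time bucket."""
    key = (ts.weekday(), ts.hour)
    if key not in baselines:
        return False  # no baseline yet for this hour/day combination
    mean, stdev = baselines[key]
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold
```

Bucketing by hour and weekday means a Monday 09:00 spike is compared with previous Monday mornings, not with a quiet Sunday night.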

Auto-Remediation Playbooks

5 built-in playbooks:

| Playbook | Trigger | Action |
| --- | --- | --- |
| disk_cleanup | Disk > 90% | Clear journal, /tmp, old logs, package cache |
| restart_failed_services | Failed services detected | Restart each failed service |
| high_memory_cleanup | Memory > 95% | Drop filesystem caches |
| high_cpu_investigation | CPU load > 3x cores | Log top CPU consumers |
| ssl_renewal | SSL cert < 7 days | Run certbot renew, reload nginx |

Custom playbooks: drop JSON files in ~/.server-guardian-mcp/playbooks/

Public Status Page

Tell Claude: "generate a status page"
  • Self-hosted uptime page for customers
  • Shows server and endpoint health
  • Active incidents section
  • Auto-refreshes every 60 seconds
  • Replaces Better Stack ($29/mo) and Instatus ($20/mo) — free

Multi-Step API Tests

Tell Claude: "test my API"
  • Chain API calls: login -> extract token -> call API with token -> verify response
  • Variable extraction from JSON responses
  • Assertions: status code, body content, response time
  • Save and re-run named tests
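
The chaining idea above (log in, extract a token, reuse it, assert on the response) can be sketched with an injected transport so that no real HTTP call is needed. Everything here is illustrative: the step dictionary keys, the `{{name}}` placeholder syntax, and the `run_steps` and `send` names are assumptions, not the tool's actual test format.

```python
import re


def substitute(text, variables):
    """Replace {{name}} placeholders with previously extracted values."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(variables[m.group(1)]), text)


def run_steps(steps, send):
    """Run steps in order; send(method, url, headers) must return (status, json_body).

    Stops at the first step whose status differs from its expected status.
    """
    variables = {}
    results = []
    for step in steps:
        url = substitute(step["url"], variables)
        headers = {
            name: substitute(value, variables)
            for name, value in step.get("headers", {}).items()
        }
        status, body = send(step["method"], url, headers)
        passed = status == step.get("expect_status", 200)
        results.append((step["name"], passed))
        if not passed:
            break
        # Copy selected JSON fields into variables for later steps.
        for var_name, json_key in step.get("extract", {}).items():
            variables[var_name] = body[json_key]
    return results, variables
```

Passing `send` in as a parameter keeps the chaining logic testable with a fake transport and leaves the choice of HTTP client open.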

Maintenance Windows

Tell Claude: "create maintenance window for PROD for 2 hours"
  • Suppress alerts during planned work
  • Configurable duration
  • List and delete windows

Compliance Reports

Tell Claude: "generate a compliance report for PROD"
  • Security score (0-100) with letter grade (A-F)
  • Detailed check results with pass/fail/warning badges
  • Active alerts section
  • Print-friendly, works in any browser
  • Suitable for SOC2/ISO prep and client deliverables

Team Mode (RBAC)

GUARDIAN_TEAM_MODE=true
GUARDIAN_API_KEY=sg_your_api_key_here

| Role | Permissions |
| --- | --- |
| admin | Full access: all tools, user management |
| operator | Run commands, restart services, deploy; no user management |
| viewer | Read-only: view health, logs, alerts, dashboards |
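
The three roles can be modeled as nested permission sets. The sketch below is only an illustration of that idea; the action categories (`read`, `operate`, `manage_users`) and the `is_allowed` helper are hypothetical and do not reflect how the package maps individual tools to roles.

```python
# Hypothetical action categories, derived from the role descriptions above.
ROLE_PERMISSIONS = {
    "admin": {"read", "operate", "manage_users"},
    "operator": {"read", "operate"},
    "viewer": {"read"},
}


def is_allowed(role, action):
    """Return True when the role may perform the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

Denying by default for unknown roles keeps a typo in a role name from granting access.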

External Integrations

PAGERDUTY_ROUTING_KEY=your-routing-key
TELEGRAM_BOT_TOKEN=your-bot-token
TELEGRAM_CHAT_ID=your-chat-id
OPSGENIE_API_KEY=your-api-key

Background Watchdog

Runs independently of Claude — no AI, no API cost. Monitors 24/7 and sends alerts via email, Slack, Discord.

python -m server_guardian_mcp watchdog           # run forever
python -m server_guardian_mcp watchdog --once    # run one cycle

Alert thresholds

| Condition | Severity |
| --- | --- |
| Disk > 90% | Critical |
| Disk > 80% | Warning |
| CPU load > 2x cores | Warning |
| Temperature > 85C | Warning |
| Server unreachable | Critical |
| Failed services | Warning |
| HTTP endpoint down | Critical |
| SSL cert < 7 days | Critical |
| SSL cert < 30 days | Warning |
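
The threshold table can be read as a small rule set. The sketch below encodes those rules for a single metrics snapshot; the dictionary keys (`disk_pct`, `load_1m`, `cores`, and so on) and the `evaluate` function are hypothetical names chosen for illustration, while the thresholds themselves come from the table.

```python
def evaluate(metrics):
    """Return a list of (condition, severity) pairs for one metrics snapshot."""
    if not metrics.get("reachable", True):
        # Nothing else can be measured on an unreachable server.
        return [("Server unreachable", "critical")]

    alerts = []
    disk = metrics.get("disk_pct", 0)
    if disk > 90:
        alerts.append(("Disk > 90%", "critical"))
    elif disk > 80:
        alerts.append(("Disk > 80%", "warning"))

    if metrics.get("load_1m", 0) > 2 * metrics.get("cores", 1):
        alerts.append(("CPU load > 2x cores", "warning"))
    if metrics.get("temp_c", 0) > 85:
        alerts.append(("Temperature > 85C", "warning"))
    if metrics.get("failed_services"):
        alerts.append(("Failed services", "warning"))
    if metrics.get("endpoint_up") is False:
        alerts.append(("HTTP endpoint down", "critical"))

    days_left = metrics.get("ssl_days_left")
    if days_left is not None:
        if days_left < 7:
            alerts.append(("SSL cert < 7 days", "critical"))
        elif days_left < 30:
            alerts.append(("SSL cert < 30 days", "warning"))
    return alerts
```

Checking the stricter threshold first (90% before 80%, 7 days before 30) ensures each metric raises only its most severe alert.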

Connection Types

| Type | Connects to | Requires |
| --- | --- | --- |
| ssh | Linux/Mac servers | paramiko (included) |
| local | Your own machine | nothing |
| docker | Docker containers | docker CLI |
| winrm | Windows servers | pip install pywinrm |
| k8s | Kubernetes pods | kubectl CLI |
| aws-ssm | AWS EC2 instances | aws CLI |
| gcloud | GCP Compute Engine | gcloud CLI |
| azure | Azure VMs | az CLI |

Security

  • Command blocklist — blocks rm -rf, fork bombs, reverse shells
  • Sensitive file protection — blocks .pem, .key, .env, /etc/shadow
  • SQL safety — read-only by default
  • Read-only mode: set GUARDIAN_MODE=readonly
  • Rate limiting — 30 calls/min per tool
  • Audit logging — all invocations logged with sensitive param redaction
  • Shell injection prevention — shlex.quote on all inputs
  • Output capped at 512KB per command
  • File integrity monitoring — detect unauthorized file changes
  • CIS benchmark compliance — 61 security checks
  • CVE + rootkit scanning — detect known vulnerabilities and malware
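
Two of the protections listed above, the command blocklist and `shlex.quote` on inputs, can be sketched as follows. The patterns are a small illustrative subset and the helper names `is_blocked` and `build_grep_command` are hypothetical; a real blocklist would need to be far more thorough.

```python
import re
import shlex

# Illustrative patterns only; not the package's actual blocklist.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r"),              # rm -rf / rm -fr
    re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:"),       # classic fork bomb
    re.compile(r"/dev/tcp/"),                                      # bash reverse shell idiom
]


def is_blocked(command):
    """True when the command matches any blocklisted pattern."""
    return any(pattern.search(command) for pattern in BLOCKED_PATTERNS)


def build_grep_command(pattern, path):
    """Build a grep command with user input safely quoted for the shell."""
    return f"grep -- {shlex.quote(pattern)} {shlex.quote(path)}"
```

Quoting turns an input such as `foo; rm -rf /` into a single literal argument, so the shell never sees the `;` as a command separator.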

Architecture

  • 63 MCP tools across 16 modules
  • 8 connection adapters (SSH, Local, Docker, WinRM, K8s, AWS SSM, GCloud, Azure)
  • 15 SQLite tables (health, services, endpoints, alerts, audit, baselines, playbooks, users, logs, SLOs, file hashes, network, maintenance, API tests, dependencies)
  • Background watchdog with email/Slack/Discord/PagerDuty/Telegram/OpsGenie alerts
  • Live web dashboard (Starlette + Chart.js)
  • Statistical anomaly detection engine
  • Auto-remediation playbook engine
  • Access log APM parser
  • CIS benchmark + CVE scanner
  • Database monitoring (MySQL + PostgreSQL)
  • Network monitoring with bandwidth tracking
  • SLO tracking with error budgets
  • Team RBAC (admin/operator/viewer)
  • Compliance report generator
  • Public status page generator

Requirements

  • Python 3.10+
  • mcp>=1.0.0
  • paramiko>=3.0.0
  • uvicorn>=0.27.0
  • starlette>=0.36.0

License

Proprietary — Copyright (c) 2026 Md Nazish Arman. All rights reserved.

Free for personal, non-commercial evaluation only. Commercial use, business use, or any revenue-generating use requires a paid license. See LICENSE for full terms.

Author

Md Nazish Arman
