MCP server to monitor and manage remote Linux servers via SSH. 63 tools: health checks, log search, APM, SLOs, anomaly detection, auto-remediation, live dashboard, CIS benchmarks, CVE scanning, database monitoring, compliance reports, team RBAC, PagerDuty/Telegram/OpsGenie.

These details have not been verified by PyPI

Project description

Server Guardian MCP

A server management MCP server with 63 tools, 8 connection types, and 16 modules: log search, access log APM, SLO tracking, anomaly detection, auto-remediation playbooks, CIS benchmarks, CVE scanning, database monitoring, network monitoring, file integrity, live web dashboard, compliance reports, public status pages, team RBAC, and PagerDuty/Telegram/OpsGenie integrations, all through Claude. No agents. Just SSH.

"The AI SRE that lives in your terminal. SSH into any server, diagnose any problem, fix it automatically — all through a conversation with Claude. No agents. No SaaS bills. No PromQL."

Live Dashboard

python -m server_guardian_mcp dashboard           # start on port 8080
python -m server_guardian_mcp dashboard --port 9090

Real-time web UI with auto-refresh every 30 seconds. Dark theme, Chart.js charts for CPU/memory/disk trends, active alerts feed, incident timeline.

Why Server Guardian?

What you say to Claude	What happens
"Is my server okay?"	SSH in, check CPU/RAM/disk/temp, detect anomalies vs baseline
"Why is production slow?"	Check processes, disk, logs, access log APM, identify the bottleneck
"Search logs for OOM errors"	Index logs in SQLite, search with pattern detection, show error rates
"Show me endpoint latency"	Parse nginx access logs — p50/p95/p99 latency, error rates, slowest endpoints
"Are we meeting our SLOs?"	Track uptime/latency/error targets, calculate error budget remaining
"What happened overnight?"	Generate incident narrative from alerts, service events, playbook runs
"Fix it automatically"	Run playbooks: clear disk, restart services, renew SSL certs
"Run a security audit"	61 CIS benchmark checks + CVE scan + rootkit detection + FIM
"Generate a compliance report"	Branded HTML report with score (A-F) for SOC2/ISO prep
"How's the database?"	Slow query analysis, connection counts, replication lag, table sizes
"Am I overpaying?"	Rightsizing analysis: "CPU at 0.4%, memory at 7.7% — downsize to save 50%"
"What connects to what?"	Map service dependencies from active network connections
"Write the postmortem"	Auto-generate structured postmortem from incident timeline
"Create a status page"	Public-facing uptime page for customers (replaces $29/mo tools)

Benchmarks vs Alternatives

Feature	Server Guardian	ssh-mcp	mcp-ssh-manager	HomeButler
Total tools	63	2	37	20
Connection types	8	1	1	1
Log search + pattern detection	Yes	-	-	-
Access log APM (p50/p95/p99)	Yes	-	-	-
SLO tracking + error budgets	Yes	-	-	-
Smart anomaly detection	Yes	-	-	-
Auto-remediation playbooks	Yes	-	-	-
CIS benchmark (61 checks)	Yes	-	-	-
CVE scanning + rootkit detection	Yes	-	-	-
File integrity monitoring	Yes	-	-	-
Database monitoring (MySQL/PG)	Yes	-	-	-
Network bandwidth monitoring	Yes	-	-	-
Service dependency mapping	Yes	-	-	-
Root cause correlation	Yes	-	-	-
Resource rightsizing	Yes	-	-	-
Multi-step API tests	Yes	-	-	-
Maintenance windows	Yes	-	-	-
Public status page	Yes	-	-	-
AI postmortem generation	Yes	-	-	-
Live web dashboard (Chart.js)	Yes	-	-	-
Compliance report (SOC2/ISO)	Yes	-	-	-
Team RBAC (admin/operator/viewer)	Yes	-	-	-
PagerDuty / Telegram / OpsGenie	Yes	-	-	-
Background watchdog daemon	Yes	-	-	Yes
Email / Slack / Discord alerts	Yes	-	-	Yes
Multi-cloud (AWS/GCP/Azure)	Yes	-	-	-
Docker container management	Yes	-	Yes	Yes

Quick Install

Claude Code (recommended)

claude mcp add server-guardian -- uvx server-guardian-mcp

pip

pip install server-guardian-mcp
claude mcp add server-guardian -- python -m server_guardian_mcp

From source

pip install -e .
claude mcp add server-guardian -- python -m server_guardian_mcp

Setup (2 minutes)

1. Create your .env

cp .env.example .env

2. Add your servers

# SSH (most common)
SERVER_PROD=ssh,203.0.113.10,22,deploy,key,~/.ssh/prod_key,Production

# Local machine
SERVER_LOCAL=local,,,,,My Machine

# Docker / Kubernetes / AWS SSM / GCP / Azure / WinRM also supported

3. Auto-discover existing servers

"Discover my SSH servers" — reads ~/.ssh/config and shows ready-to-paste .env lines.

4. Add aliases (optional)

SERVER_ALIASES=prod:PROD,stg:STAGING,dev:DEV
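
To illustrate the alias format, here is a minimal sketch of how such a string could be parsed into a lookup table. It assumes aliases are comma-separated `alias:SERVER` pairs, as the example suggests; `parse_aliases` is a hypothetical helper, not a documented function of this package.

```python
def parse_aliases(raw: str) -> dict[str, str]:
    """Parse 'prod:PROD,stg:STAGING' into {'prod': 'PROD', 'stg': 'STAGING'}.

    Assumption: the format is comma-separated alias:SERVER pairs.
    """
    aliases: dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas and empty entries
        alias, sep, target = pair.partition(":")
        if not sep or not alias.strip() or not target.strip():
            raise ValueError(f"malformed alias entry: {pair!r}")
        # Lower-case the alias so lookups are case-insensitive.
        aliases[alias.strip().lower()] = target.strip()
    return aliases
```

With a table like this, a request for "prod" would resolve to the server defined as `SERVER_PROD`.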

All 63 Tools

Core Server Management (6)

Tool	What it does
`list_all_servers`	Show all servers with online/offline status and latency
`check_server_health`	Full snapshot: CPU, RAM, disk, swap, temp, load, top processes, network
`run_shell_commands`	Run one or more shell commands on any server
`run_shell_script`	Run multi-line bash scripts with shared variables
`fetch_system_logs`	Fetch dmesg/syslog/journal/auth/nginx/custom logs with grep filter
`list_running_processes`	Processes sorted by CPU or memory, with name filter

Service Management (5)

Tool	What it does
`manage_systemd_service`	Start/stop/restart/enable/disable/status/logs for any systemd service
`list_all_services`	List ALL systemd services, filter by running/failed/inactive
`find_failed_services`	Find every crashed/failed service in one call
`restart_failed_services`	Bulk restart failed services — pass names or "ALL_FAILED"
`watch_service_status`	Quick is-active + is-enabled check for specific services

Monitoring & Alerting (5)

Tool	What it does
`check_ssl_certificate`	SSL cert expiry, chain, issuer for any domain (no SSH)
`check_http_endpoint`	HTTP status, response time, headers for any URL (no SSH)
`monitor_server_health`	Health check + store in SQLite + auto-alert on thresholds
`monitor_endpoints`	Check HTTP/SSL targets + store + alert on failures
`get_active_alerts`	Show unresolved alerts grouped by severity

Log Search & APM (2)

Tool	What it does
`search_logs`	Index logs in SQLite, search with pattern detection, extract error rates
`analyze_access_logs`	Nginx/Apache APM — per-endpoint p50/p95/p99 latency, error rates, throughput, top IPs

SLO Tracking & Reporting (4)

Tool	What it does
`manage_slos`	Define uptime/latency/error rate targets, track compliance, error budgets
`generate_postmortem_tool`	Structured incident postmortem from alerts, services, playbook data
`generate_status_page_tool`	Public-facing status page for customers (replaces Better Stack $29/mo)
`get_weekly_report`	Weekly health summary for email or team review

Database Monitoring (2)

Tool	What it does
`query_database`	Run SQL queries on MySQL, PostgreSQL, or SQLite on any server
`monitor_database`	Slow queries, connections, replication lag, table sizes (MySQL/PostgreSQL auto-detected)

Network Monitoring (2)

Tool	What it does
`inspect_network`	Listening ports, active connections, interfaces, DNS, routing
`monitor_network`	Bandwidth per interface, connection states, TCP retransmissions, throughput rates

Security & Compliance (6)

Tool	What it does
`run_security_audit`	10-point security check (SSH, firewall, logins, updates, sudo)
`run_cis_benchmark`	61 CIS Linux Benchmark checks across filesystem, network, SSH, PAM, logging
`scan_vulnerabilities`	CVE scanning (package versions), rootkit detection, crypto miner detection
`check_file_integrity`	FIM — hash critical files (/etc/passwd, sshd_config, etc.), detect unauthorized changes
`manage_firewall`	UFW/iptables: status, allow, deny, delete rules, enable/disable
`generate_compliance_report_tool`	Branded HTML report with score (A-F), suitable for SOC2/ISO

Docker (2)

Tool	What it does
`list_docker_containers`	Containers with CPU, memory, network, block I/O stats
`fetch_docker_logs`	Container logs with grep filter and time range

Disk & Files (4)

Tool	What it does
`analyze_disk_usage`	Find largest items, files >100MB, inode usage
`read_remote_file`	Read files on server (tail/head/all) with metadata
`upload_file_to_server`	SFTP upload with size verification
`download_file_from_server`	SFTP download

Multi-Server (2)

Tool	What it does
`run_on_all_servers`	Same commands on multiple servers — pass ["ALL"] for all
`compare_across_servers`	Spot config drift: same command, side-by-side results

System Administration (4)

Tool	What it does
`manage_cron_jobs`	List, add, remove cron jobs on any server
`manage_users`	List users, user info, add SSH keys, list keys, who is logged in
`manage_packages`	List/install/remove/upgrade packages (apt, yum, dnf, apk auto-detected)
`manage_nginx`	Status, list sites, show config, test, reload, restart, access/error logs

Git Deploy (1)

Tool	What it does
`git_deploy`	Status, pull, log, branch, switch, stash, diff on server git repos

Discovery (1)

Tool	What it does
`discover_ssh_servers`	Auto-discover servers from ~/.ssh/config with ready-to-paste .env lines

Dashboard & Analytics (6)

Tool	What it does
`multi_server_dashboard`	One-call summary of ALL servers: health, CPU, RAM, disk, failed services
`get_monitoring_history`	Query health trends, service events, endpoint checks from SQLite
`get_incident_timeline`	Chronological event log for a server
`forecast_disk_usage`	Predict when disk will be full based on growth rate
`generate_html_dashboard`	Self-contained HTML status page — open in any browser
`resolve_alert`	Mark an alert as resolved

Intelligence & Automation (3)

Tool	What it does
`detect_anomalies_tool`	Statistical anomaly detection — flags metrics >2.5 sigma from baseline
`replay_incident`	Generate chronological narrative from alerts, service events, playbook runs
`manage_playbooks`	Auto-remediation: disk cleanup, service restart, SSL renewal, custom playbooks

Team & Integrations (3)

Tool	What it does
`team_manage`	RBAC user management: admin/operator/viewer roles with API keys
`check_integrations`	Status and test for PagerDuty, Telegram, OpsGenie
`live_dashboard_info`	How to start the live web dashboard and available API endpoints

Advanced Operations (5)

Tool	What it does
`run_api_test_tool`	Multi-step API tests with variable extraction and assertions
`manage_maintenance_windows`	Suppress alerts during planned work
`get_rightsizing_recommendations`	Identify over/under-provisioned resources to save costs
`map_service_dependencies`	Discover service topology from active network connections
`analyze_root_cause`	Correlate anomalies across metrics, services, alerts for root cause analysis

Access Log APM

Tell Claude: "analyze access logs on PROD"

80% of APM value with zero agent install. Parses nginx/Apache access logs for:

Per-endpoint latency percentiles (p50, p95, p99)
Error rates (4xx, 5xx) per endpoint
Throughput (requests per endpoint)
Slowest endpoints ranked
Status code breakdown
Top IPs by request volume
URL normalization (replaces IDs/UUIDs with placeholders)
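
The two core steps, URL normalization and latency percentiles, can be sketched as follows. This is a simplified illustration, not the package's parser: it assumes the access log records a per-request duration (for nginx, a log format that includes `$request_time`), uses the nearest-rank percentile method, and the function names and `{id}`/`{uuid}` placeholders are hypothetical.

```python
import math
import re

_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_NUMERIC_SEGMENT = re.compile(r"(?<=/)\d+(?=/|$)")

def normalize_path(path: str) -> str:
    """Collapse /users/123 and /users/456 into one endpoint, /users/{id}."""
    path = path.split("?", 1)[0]          # drop the query string
    path = _UUID.sub("{uuid}", path)      # UUIDs first, they may start with digits
    return _NUMERIC_SEGMENT.sub("{id}", path)

def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def latency_summary(requests: list[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """requests: (path, duration_seconds) pairs parsed from the access log."""
    by_endpoint: dict[str, list[float]] = {}
    for path, seconds in requests:
        by_endpoint.setdefault(normalize_path(path), []).append(seconds)
    return {
        endpoint: {f"p{p}": percentile(durations, p) for p in (50, 95, 99)}
        for endpoint, durations in by_endpoint.items()
    }
```

Normalizing before grouping matters: without it, every distinct ID would count as its own endpoint and the percentiles would be computed over one or two requests each.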

Log Search & Pattern Detection

Tell Claude: "search logs on PROD for OOM" or "show me log patterns"

Fetches logs via SSH, indexes in SQLite for future searching
Pattern detection — clusters similar log lines, shows frequency
Error rate extraction (log-to-metrics)
Supports journal, syslog, auth, nginx, or any custom log path
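
One common way to cluster similar log lines is to mask the variable parts (numbers, hex addresses) and count the resulting templates. The sketch below shows that idea; it is an assumption about the general technique, not a description of how `search_logs` is implemented, and `log_template` and `top_patterns` are hypothetical names.

```python
import re
from collections import Counter

# Order matters: mask hex literals before plain numbers.
_MASKS = [
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b\d+(?:\.\d+)*\b"), "<num>"),
]

def log_template(line: str) -> str:
    """Replace variable tokens so similar lines share one template."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line

def top_patterns(lines: list[str], limit: int = 10) -> list[tuple[str, int]]:
    """Return the most frequent templates with their counts."""
    counts = Counter(log_template(l.strip()) for l in lines if l.strip())
    return counts.most_common(limit)
```

Two OOM-killer lines that differ only in the process ID would collapse into one pattern with a frequency of 2.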

SLO Tracking & Error Budgets

Tell Claude: "create an SLO for 99.9% uptime on PROD"
Tell Claude: "show me SLO status"

Define uptime, latency, or error rate targets
Track compliance from stored health/endpoint data
Calculate error budget remaining and burn rate
Configurable measurement windows (7d, 30d, 90d)
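
The error budget arithmetic behind these bullets can be sketched as below. It assumes compliance is measured as the fraction of successful checks within the window; the function name and return keys are hypothetical, and the package's own calculation may differ.

```python
def error_budget(slo_target_pct: float, total_checks: int,
                 failed_checks: int) -> dict[str, float]:
    """Compute availability, budget use, and burn rate for one window.

    slo_target_pct: e.g. 99.9 for a 99.9% uptime SLO.
    """
    if not 0 < slo_target_pct < 100:
        raise ValueError("target must be between 0 and 100, exclusive")
    if total_checks <= 0:
        raise ValueError("need at least one check")
    allowed_error_ratio = 1 - slo_target_pct / 100   # e.g. 0.001 for 99.9%
    observed_error_ratio = failed_checks / total_checks
    # Burn rate 1.0 means the budget is being used exactly as fast as allowed.
    burn_rate = observed_error_ratio / allowed_error_ratio
    return {
        "availability_pct": 100 * (1 - observed_error_ratio),
        "budget_consumed_pct": 100 * burn_rate,
        "budget_remaining_pct": 100 * (1 - burn_rate),
        "burn_rate": burn_rate,
    }
```

For a 99.9% target over 10,000 checks, 10 failures are allowed; 5 observed failures consume about half the budget. A negative remaining value means the SLO has been breached for the window.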

CIS Benchmark & Vulnerability Scanning

Tell Claude: "run CIS benchmark on PROD"
Tell Claude: "scan for vulnerabilities on PROD"

61 CIS Linux Benchmark checks across: filesystem, software updates, boot security, process hardening, network config, SSH, PAM, user management, logging, cron
CVE scanning — lists installed packages, checks for security updates
Rootkit detection — hidden processes, suspicious kernel modules, SUID files, crypto miners, suspicious cron jobs
File integrity monitoring — hashes critical files, alerts on unauthorized changes
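
File integrity monitoring boils down to hashing a set of files, storing the result as a baseline, and comparing later snapshots against it. A minimal local sketch follows; SHA-256 and the function names are assumptions for illustration, since the README does not name the hash algorithm or storage format.

```python
import hashlib
from pathlib import Path

def hash_file(path: Path) -> str:
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def snapshot(paths: list[str]) -> dict[str, str]:
    """Map each existing file path to its current hash."""
    return {p: hash_file(Path(p)) for p in paths if Path(p).is_file()}

def diff_snapshots(baseline: dict[str, str],
                   current: dict[str, str]) -> dict[str, list[str]]:
    """Report files whose hash changed, disappeared, or newly appeared."""
    shared = baseline.keys() & current.keys()
    return {
        "modified": sorted(p for p in shared if baseline[p] != current[p]),
        "removed": sorted(baseline.keys() - current.keys()),
        "added": sorted(current.keys() - baseline.keys()),
    }
```

Any entry under "modified" for a file such as /etc/passwd would be the trigger for an alert.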

Database Monitoring

Tell Claude: "monitor database on PROD"

MySQL: slow query log, connection stats, replication lag, table sizes, processlist
PostgreSQL: pg_stat_statements, connections, replication, table sizes, lock analysis, cache hit ratio
Auto-detects which database is installed

Network Monitoring

Tell Claude: "monitor network on PROD"

Bandwidth per interface (bytes/sec, Mbps)
Connection state tracking (ESTABLISHED, TIME_WAIT, CLOSE_WAIT)
TCP retransmission rates
Historical trends stored in SQLite
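
Bandwidth figures like these are typically derived from two readings of an interface's cumulative byte counter. The sketch below shows the arithmetic only; it assumes counters sampled some seconds apart (on Linux they are exposed in /proc/net/dev), and `throughput` is a hypothetical helper.

```python
def throughput(prev_bytes: int, curr_bytes: int,
               interval_s: float) -> dict[str, float]:
    """Convert two cumulative byte counters into a transfer rate."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # Counter went backwards: treat as a reset (reboot, interface restart).
        delta = curr_bytes
    bytes_per_sec = delta / interval_s
    # Mbps here means 10^6 bits per second.
    return {"bytes_per_sec": bytes_per_sec, "mbps": bytes_per_sec * 8 / 1_000_000}
```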

Resource Rightsizing

Tell Claude: "rightsizing recommendations for PROD"

Analyzes CPU, memory, disk usage over time
Identifies over-provisioned resources ("CPU at 0.4% — downsize from 16 to 8 cores")
Identifies under-provisioned resources ("Memory at 92% — upgrade RAM")
Cost savings estimates

Service Dependency Mapping

Tell Claude: "map dependencies on PROD"

Parses active TCP connections to discover which processes talk to which services
Groups by process (nginx -> database:5432, app -> redis:6379)
Stored in SQLite for historical tracking

Root Cause Analysis

Tell Claude: "analyze root cause on PROD"

Correlates metric spikes with service failures and alerts
Detects cascading failure patterns
Identifies resource exhaustion as cause of service crashes
Temporal correlation across all monitoring data

Smart Anomaly Detection

Tell Claude: "detect anomalies on PROD"

Builds baselines per metric grouped by hour and day of week
Flags values >2.5 standard deviations from the mean
No ML dependencies — pure statistics from SQLite data
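
The approach described above can be sketched with the standard library alone. This is an illustrative version under stated assumptions: baselines are keyed by (weekday, hour), the population standard deviation is used, and the names `build_baselines` and `is_anomalous` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

Baselines = dict[tuple[int, int], tuple[float, float]]

def build_baselines(history: list[tuple[datetime, float]]) -> Baselines:
    """Group samples by (weekday, hour) and store (mean, stdev) per group."""
    buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
    for ts, value in history:
        buckets[(ts.weekday(), ts.hour)].append(value)
    # Require at least two samples so the deviation is meaningful.
    return {k: (mean(v), pstdev(v)) for k, v in buckets.items() if len(v) >= 2}

def is_anomalous(baselines: Baselines, ts: datetime, value: float,
                 threshold: float = 2.5) -> bool:
    """Flag a value more than `threshold` standard deviations from the mean."""
    stats = baselines.get((ts.weekday(), ts.hour))
    if stats is None:
        return False  # no baseline yet for this time slot
    mu, sigma = stats
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold
```

Grouping by hour and weekday means a nightly backup that pushes CPU to 80% every Sunday at 02:00 is compared against other Sunday 02:00 samples, not against a quiet weekday afternoon.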

Auto-Remediation Playbooks

5 built-in playbooks:

Playbook	Trigger	Action
`disk_cleanup`	Disk > 90%	Clear journal, /tmp, old logs, package cache
`restart_failed_services`	Failed services detected	Restart each failed service
`high_memory_cleanup`	Memory > 95%	Drop filesystem caches
`high_cpu_investigation`	CPU load > 3x cores	Log top CPU consumers
`ssl_renewal`	SSL cert < 7 days	Run certbot renew, reload nginx

Custom playbooks: drop JSON files in ~/.server-guardian-mcp/playbooks/

Public Status Page

Tell Claude: "generate a status page"

Self-hosted uptime page for customers
Shows server and endpoint health
Active incidents section
Auto-refreshes every 60 seconds
Replaces Better Stack ($29/mo) and Instatus ($20/mo) — free

Multi-Step API Tests

Tell Claude: "test my API"

Chain API calls: login -> extract token -> call API with token -> verify response
Variable extraction from JSON responses
Assertions: status code, body content, response time
Save and re-run named tests

Maintenance Windows

Tell Claude: "create maintenance window for PROD for 2 hours"

Suppress alerts during planned work
Configurable duration
List and delete windows

Compliance Reports

Tell Claude: "generate a compliance report for PROD"

Security score (0-100) with letter grade (A-F)
Detailed check results with pass/fail/warning badges
Active alerts section
Print-friendly, works in any browser
Suitable for SOC2/ISO prep and client deliverables

Team Mode (RBAC)

GUARDIAN_TEAM_MODE=true
GUARDIAN_API_KEY=sg_your_api_key_here

Role	Permissions
admin	Full access — all tools, user management
operator	Run commands, restart services, deploy — no user management
viewer	Read-only — view health, logs, alerts, dashboards
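
A role check along these lines could look like the sketch below. The tool-to-role mapping is invented for illustration (the tool names come from this README, the required roles are assumptions based on the table above), and `is_allowed` is a hypothetical function.

```python
from enum import IntEnum

class Role(IntEnum):
    VIEWER = 1
    OPERATOR = 2
    ADMIN = 3

# Assumed minimum role per tool; the real mapping is not documented here.
REQUIRED_ROLE: dict[str, Role] = {
    "check_server_health": Role.VIEWER,
    "get_active_alerts": Role.VIEWER,
    "run_shell_commands": Role.OPERATOR,
    "manage_systemd_service": Role.OPERATOR,
    "git_deploy": Role.OPERATOR,
    "team_manage": Role.ADMIN,
}

def is_allowed(role: Role, tool: str) -> bool:
    """Allow the call when the caller's role meets the tool's minimum.

    Tools missing from the mapping default to admin-only (fail closed).
    """
    return role >= REQUIRED_ROLE.get(tool, Role.ADMIN)
```

Ordering the roles numerically keeps the check to a single comparison, and failing closed means a newly added tool is not exposed to viewers by accident.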

External Integrations

PAGERDUTY_ROUTING_KEY=your-routing-key
TELEGRAM_BOT_TOKEN=your-bot-token
TELEGRAM_CHAT_ID=your-chat-id
OPSGENIE_API_KEY=your-api-key

Background Watchdog

Runs independently of Claude — no AI, no API cost. Monitors 24/7 and sends alerts via email, Slack, Discord.

python -m server_guardian_mcp watchdog           # run forever
python -m server_guardian_mcp watchdog --once    # run one cycle

Alert thresholds

Condition	Severity
Disk > 90%	Critical
Disk > 80%	Warning
CPU load > 2x cores	Warning
Temperature > 85°C	Warning
Server unreachable	Critical
Failed services	Warning
HTTP endpoint down	Critical
SSL cert < 7 days	Critical
SSL cert < 30 days	Warning

Connection Types

Type	Connects to	Requires
`ssh`	Linux/Mac servers	paramiko (included)
`local`	Your own machine	nothing
`docker`	Docker containers	docker CLI
`winrm`	Windows servers	`pip install pywinrm`
`k8s`	Kubernetes pods	kubectl CLI
`aws-ssm`	AWS EC2 instances	aws CLI
`gcloud`	GCP Compute Engine	gcloud CLI
`azure`	Azure VMs	az CLI

Security

Command blocklist — blocks rm -rf, fork bombs, reverse shells
Sensitive file protection — blocks .pem, .key, .env, /etc/shadow
SQL safety — read-only by default
Read-only mode — GUARDIAN_MODE=readonly
Rate limiting — 30 calls/min per tool
Audit logging — all invocations logged with sensitive param redaction
Shell injection prevention — shlex.quote on all inputs
Output capped at 512KB per command
File integrity monitoring — detect unauthorized file changes
CIS benchmark compliance — 61 security checks
CVE + rootkit scanning — detect known vulnerabilities and malware
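
To show what the shell injection bullet means in practice, here is a minimal sketch of building a remote command with `shlex.quote`. The helper `build_grep_command` is hypothetical; only `shlex.quote` itself is a real standard-library function.

```python
import shlex

def build_grep_command(pattern: str, log_path: str, max_lines: int = 200) -> str:
    """Build a grep command in which user input cannot break out of its argument.

    shlex.quote wraps each value so the shell treats it as a single literal
    word, even if it contains spaces, semicolons, or quotes.
    """
    return (
        f"grep -F -- {shlex.quote(pattern)} {shlex.quote(log_path)}"
        f" | tail -n {int(max_lines)}"
    )
```

A pattern such as `oom; rm -rf /` ends up inside single quotes and is searched for as plain text instead of being executed. The `--` also stops grep from reading a pattern that starts with a dash as an option.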

Architecture

63 MCP tools across 16 modules
8 connection adapters (SSH, Local, Docker, WinRM, K8s, AWS SSM, GCloud, Azure)
15 SQLite tables (health, services, endpoints, alerts, audit, baselines, playbooks, users, logs, SLOs, file hashes, network, maintenance, API tests, dependencies)
Background watchdog with email/Slack/Discord/PagerDuty/Telegram/OpsGenie alerts
Live web dashboard (Starlette + Chart.js)
Statistical anomaly detection engine
Auto-remediation playbook engine
Access log APM parser
CIS benchmark + CVE scanner
Database monitoring (MySQL + PostgreSQL)
Network monitoring with bandwidth tracking
SLO tracking with error budgets
Team RBAC (admin/operator/viewer)
Compliance report generator
Public status page generator

Requirements

Python 3.10+
mcp>=1.0.0
paramiko>=3.0.0
uvicorn>=0.27.0
starlette>=0.36.0

License

Free for personal, non-commercial evaluation only. Commercial use, business use, or any revenue-generating use requires a paid license. See LICENSE for full terms.

Author

Md Nazish Arman

Project details

Release history

1.0.5 (Apr 15, 2026)
1.0.4 (Apr 14, 2026)
1.0.3 (Apr 14, 2026), this version
1.0.2 (Apr 14, 2026)
0.2.0 (Mar 30, 2026)

Download files

Source Distribution

server_guardian_mcp-1.0.3.tar.gz (202.6 kB)

Uploaded Apr 14, 2026 Source

Built Distribution

server_guardian_mcp-1.0.3-py3-none-any.whl (119.0 kB)

Uploaded Apr 14, 2026 Python 3

File details

Details for the file server_guardian_mcp-1.0.3.tar.gz.

File metadata

Download URL: server_guardian_mcp-1.0.3.tar.gz
Upload date: Apr 14, 2026
Size: 202.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for server_guardian_mcp-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`6c94cd206f42da84ba5174dc4a062f3e4c2c96835d6d59baf6ccad5dc4564083`
MD5	`4f493a6aa9515a8cf46b467211a3fd6d`
BLAKE2b-256	`95c9fc09d967146ac170006963b50c409847619eca857c620bbb96d6593c94ec`

File details

Details for the file server_guardian_mcp-1.0.3-py3-none-any.whl.

File metadata

Download URL: server_guardian_mcp-1.0.3-py3-none-any.whl
Upload date: Apr 14, 2026
Size: 119.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for server_guardian_mcp-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`253990a77c4ce6b6cf216061f8f855b632c565509dfd4b9c3307f2bd03758397`
MD5	`140afdfa4dd2706d26c0e9bc260a4902`
BLAKE2b-256	`71970e05cbd24209c4de1327cf38e67b2e7f4e8d2bcb2cfbe140df169c193b49`
