IPMI/BMC Server Monitoring with AI-Powered Insights
IPMI Monitor
Free, self-hosted IPMI/BMC monitoring for your server fleet. Collect System Event Logs (SEL), monitor sensors, track ECC errors, and get alerts - all from a beautiful web dashboard.
Screenshots
- Event Log - Track SEL events
- Live Sensors - Temperature, fans, voltage
- Hardware Inventory - CPU, Memory, Storage
- System Logs - SSH-based dmesg, syslog, journalctl
Features
- Event Collection - Automatically collect IPMI SEL logs (parallel, 32 workers)
- Real-time Dashboard - Auto-refreshing every second with server status cards
- Sensor Monitoring - Temperature, fan, voltage, power readings
- ECC Memory Tracking - Identify which DIMM has errors
- GPU Health Monitoring - Detect NVIDIA GPU errors via SSH (Xid errors)
- SSH System Logs - Collect dmesg, journalctl, syslog, mcelog, Docker daemon logs via SSH
- Docker Log Collection - Monitor Docker daemon errors (storage-opt, overlay, pquota issues)
- Hardware Error Detection - AER, PCIe, ECC errors parsed automatically
- Uptime & Reboot Detection - Track unexpected server reboots
- Alert Rules - Configurable alerts with email, Telegram, webhooks
- Alert Resolution - Notifications when issues are resolved
- Alert Confirmation - Threshold checks to avoid false positives
- Prometheus Metrics - Native /metrics endpoint for Grafana
- User Management - Admin and read-only access levels
- Full Backup/Restore - Export everything: servers, credentials, SSH keys, alerts
- Docker Ready - Multi-arch images (amd64/arm64)
- Version Display - Shows version, git commit, and build time in header
- Update Notifications - Checks GitHub for newer releases
- Bulk Credentials - Apply SSH/IPMI credentials to multiple servers at once
- BMC Reset - Cold/warm reset BMC without affecting host OS
- Optional AI Features - Enable AI-powered insights via Settings → AI Features
Quick Start
One Command Setup
Ubuntu 24.04+ / Python 3.12+ (uses pipx):
sudo apt install pipx -y
pipx install ipmi-monitor
pipx ensurepath && source ~/.bashrc
sudo ipmi-monitor quickstart
Ubuntu 22.04 / Python 3.10 (direct pip):
pip install ipmi-monitor
sudo ipmi-monitor quickstart
Alternative (if you get "externally-managed-environment" error):
pip install ipmi-monitor --break-system-packages
sudo ipmi-monitor quickstart
That's it! Answer a few questions:
╭───────────────────────────────────────────────────╮
│            IPMI Monitor - Quick Setup             │
╰───────────────────────────────────────────────────╯
Detected: my-server (192.168.1.100)
Step 1: Add Server to Monitor
Server name: gpu-server-01
BMC IP address: 192.168.1.80
BMC username: ADMIN
BMC password: ******
✓ IPMI connection successful
Add SSH access for detailed monitoring? [Y/n]: y
Server IP (for SSH): 192.168.1.81
SSH username: root
SSH password: ******
Step 2: Web Interface Settings
Web interface port: [5000]
Step 3: AI Features (Optional)
Enable AI Insights? [y/N]: n
Step 4: Starting IPMI Monitor
✓ Configuration saved
✓ Service installed and started
╭───────────────────────────────────────────────────╮
│                 ✓ Setup Complete!                 │
╰───────────────────────────────────────────────────╯
Web Interface: http://192.168.1.100:5000
After Setup
# Add more servers
ipmi-monitor add-server --bmc-ip 192.168.1.82 --username admin
# Check status
ipmi-monitor status
# View logs
ipmi-monitor logs
Bulk Import (Many Servers)
Create a simple text file and paste when prompted:
Option 1: SSH only (no IPMI)
global:root,sshpassword
192.168.1.101
192.168.1.102
192.168.1.103
Option 2: SSH + IPMI (full monitoring)
globalSSH:root,sshpassword
globalIPMI:ADMIN,ipmipassword
192.168.1.101,192.168.1.80
192.168.1.102,192.168.1.82
192.168.1.103,192.168.1.84
Option 3: Per-server credentials
# serverIP,sshUser,sshPass,ipmiUser,ipmiPass,bmcIP
192.168.1.101,root,pass1,ADMIN,ipmi1,192.168.1.80
192.168.1.102,root,pass2,ADMIN,ipmi2,192.168.1.82
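The three layouts above are regular enough to parse mechanically. The following is a minimal sketch of such a parser, not the tool's own import code: `ServerEntry` and `parse_bulk_import` are hypothetical names, and it assumes credentials never contain commas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerEntry:
    # Field order mirrors Option 3: serverIP,sshUser,sshPass,ipmiUser,ipmiPass,bmcIP
    server_ip: str
    ssh_user: Optional[str] = None
    ssh_pass: Optional[str] = None
    ipmi_user: Optional[str] = None
    ipmi_pass: Optional[str] = None
    bmc_ip: Optional[str] = None


def parse_bulk_import(text: str) -> list[ServerEntry]:
    """Parse the three bulk-import layouts (hypothetical helper)."""
    ssh = (None, None)
    ipmi = (None, None)
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        if line.startswith("global"):
            # "global:" and "globalSSH:" set SSH defaults, "globalIPMI:" sets BMC defaults
            key, _, value = line.partition(":")
            user, _, password = value.partition(",")
            if key == "globalIPMI":
                ipmi = (user, password)
            else:
                ssh = (user, password)
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) == 1:      # Option 1: server IP only
            entries.append(ServerEntry(fields[0], *ssh))
        elif len(fields) == 2:    # Option 2: server IP, BMC IP
            entries.append(ServerEntry(fields[0], *ssh, *ipmi, bmc_ip=fields[1]))
        elif len(fields) == 6:    # Option 3: per-server credentials
            entries.append(ServerEntry(*fields))
        else:
            raise ValueError(f"Unrecognized line: {raw!r}")
    return entries
```

Running a file through a check like this before pasting it can catch lines with the wrong number of fields.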
Full Datacenter Suite
For complete GPU datacenter monitoring, combine with DC Overview:
# On master server - install both tools
pip install dc-overview ipmi-monitor
# dc-overview: Grafana + Prometheus + GPU metrics
sudo dc-overview quickstart
# ipmi-monitor: BMC/IPMI health + SEL logs + AI insights
sudo ipmi-monitor quickstart
| Tool | What it monitors |
|---|---|
| dc-overview | GPU utilization, temperature, power, CPU, RAM, disk |
| ipmi-monitor | BMC health, SEL events, ECC errors, sensors, system logs |
CLI Commands
ipmi-monitor setup # Interactive setup wizard
ipmi-monitor run # Start web interface
ipmi-monitor run --port 8080 # Custom port
ipmi-monitor daemon # Run as daemon (for systemd)
ipmi-monitor status # Show status and config
ipmi-monitor add-server # Add a server interactively
ipmi-monitor list-servers # List configured servers
Option 2: Docker Compose
For containerized deployments or if you prefer Docker:
Step 1: Create project directory
mkdir ipmi-monitor && cd ipmi-monitor
Step 2: Create docker-compose.yml:
version: '3.8'
services:
  ipmi-monitor:
    image: ghcr.io/cryptolabsza/ipmi-monitor:latest
    container_name: ipmi-monitor
    restart: unless-stopped
    ports:
      - "5000:5000"
    environment:
      - APP_NAME=My Server Fleet # Customize this
      - IPMI_USER=admin
      - IPMI_PASS=YourIPMIPassword # Your BMC password
      - ADMIN_PASS=changeme # CHANGE THIS!
      - SECRET_KEY=change-this-to-random-string
    volumes:
      - ipmi_data:/app/data # IMPORTANT: Persists your data!
    labels:
      - "com.centurylinklabs.watchtower.enable=true" # Enable auto-updates
volumes:
  ipmi_data:
Step 3: Start the service
docker-compose up -d
Step 4: Open http://localhost:5000 and add your servers!
Option 3: Docker Run
# Create a named volume for data persistence
docker volume create ipmi_data
# Run the container
docker run -d \
--name ipmi-monitor \
--label com.centurylinklabs.watchtower.enable=true \
-p 5000:5000 \
-e IPMI_USER=admin \
-e IPMI_PASS=YourIPMIPassword \
-e ADMIN_PASS=YourAdminPassword \
-e SECRET_KEY=your-random-secret-key \
-v ipmi_data:/app/data \
--restart unless-stopped \
ghcr.io/cryptolabsza/ipmi-monitor:latest
Important: Data Persistence
Always use a named volume to preserve your data across container updates:
# ✅ CORRECT - Named volume (survives updates)
volumes:
  - ipmi_data:/app/data
# ❌ WRONG - No volume (data lost on rebuild)
# (no volume specified)
Configuration File Reference
servers.yaml
servers:
  - name: GPU-Server-01 # Display name
    bmc_ip: 192.168.1.80 # BMC/IPMI IP (required)
    username: admin # BMC username
    password: ipmi-password # BMC password
    protocol: auto # auto, ipmi, or redfish
    # Optional: SSH for system logs
    server_ip: 192.168.1.81 # Server OS IP
    ssh_user: root
    ssh_port: 22
    ssh_password: ssh-password # Or use ssh_key
    ssh_key: ~/.ssh/id_rsa # Path to SSH private key
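A quick sanity check on each entry can catch the most common mistakes before the file is loaded. The sketch below works on an already-parsed entry (a plain dict); `validate_server` is a hypothetical helper, and it assumes addresses are written as literal IPs, as in the example above.

```python
import ipaddress

VALID_PROTOCOLS = {"auto", "ipmi", "redfish"}


def validate_server(entry: dict) -> list[str]:
    """Return a list of problems found in one servers.yaml entry (hypothetical check)."""
    problems = []
    if not entry.get("bmc_ip"):
        problems.append("bmc_ip is required")
    for key in ("bmc_ip", "server_ip"):
        value = entry.get(key)
        if value:
            try:
                ipaddress.ip_address(value)
            except ValueError:
                problems.append(f"{key} is not a valid IP address: {value}")
    if entry.get("protocol", "auto") not in VALID_PROTOCOLS:
        problems.append(f"protocol must be one of {sorted(VALID_PROTOCOLS)}")
    # SSH-based features need the OS address, not the BMC address
    has_ssh_creds = entry.get("ssh_password") or entry.get("ssh_key")
    if has_ssh_creds and not entry.get("server_ip"):
        problems.append("SSH credentials set but server_ip is missing")
    return problems
```

An empty list means the entry passed these particular checks; it does not guarantee the BMC is reachable.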
config.yaml
settings:
  web_port: 5000
  refresh_interval: 60 # Seconds between collections
  enable_prometheus: true # /metrics endpoint
ai:
  enabled: false # Enable AI features
  license_key: sk-ipmi-xxxx # CryptoLabs license key
Keeping Up to Date
pip install
pip install --upgrade ipmi-monitor
sudo systemctl restart ipmi-monitor
Docker Manual Update
# Pull the latest image
docker pull ghcr.io/cryptolabsza/ipmi-monitor:latest
# Recreate the container (preserves data volume)
docker-compose up -d
Automatic Updates with Watchtower (Docker)
Add Watchtower to your docker-compose.yml:
services:
  ipmi-monitor:
    # ... your existing config ...
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_POLL_INTERVAL=300 # Check every 5 minutes
    command: --label-enable # Only update labeled containers
| Tag | Description |
|---|---|
| :latest | Latest stable release (recommended) |
| :develop | Development builds (testing new features) |
| :v1.0.3 | Specific version (pin for stability) |
Troubleshooting
Container won't start
# Check logs
docker logs ipmi-monitor
# Common issues:
# - Port 5000 already in use: Change port mapping to "5001:5000"
# - Permission denied: Ensure docker socket access
Can't connect to BMC
# Test from the container
docker exec ipmi-monitor ipmitool -I lanplus -H 192.168.1.80 -U admin -P password power status
# Common issues:
# - Wrong IP address (use BMC IP, not server OS IP)
# - Firewall blocking port 623 (IPMI)
# - Wrong credentials
SSH inventory collection fails
# Test SSH from container
docker exec ipmi-monitor ssh -o StrictHostKeyChecking=no root@192.168.1.81 hostname
# Common issues:
# - SSH key not added to container (add via Settings → SSH Keys)
# - Server IP not set (only BMC IP configured)
# - Firewall blocking port 22
Data disappeared after update
Your volume name must match! Check with:
docker volume ls | grep ipmi
If you see multiple volumes (e.g., ipmi_data and ipmi-monitor_ipmi_data), you may have used different names. Restore by:
docker stop ipmi-monitor
docker run --rm -v OLD_VOLUME:/from -v NEW_VOLUME:/to alpine cp -av /from/. /to/
Environment Variables (Docker)
| Variable | Default | Description |
|---|---|---|
| APP_NAME | IPMI Monitor | Displayed in header |
| IPMI_USER | admin | Default BMC username |
| IPMI_PASS | (required) | Default BMC password |
| IPMI_PASS_NVIDIA | - | Separate password for NVIDIA DGX BMCs (16-char requirement) |
| ADMIN_USER | admin | Dashboard admin username |
| ADMIN_PASS | changeme | Dashboard admin password (change this!) |
| SECRET_KEY | (auto) | Flask session secret (set this for persistent sessions!) |
| POLL_INTERVAL | 300 | Seconds between collections |
| DATA_RETENTION_DAYS | 30 | How long to keep events |
| SSH_USER | root | Default SSH username for system log collection |
| SSH_PASS | - | Default SSH password (or use SSH keys) |
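To illustrate how these variables and defaults fit together, here is a minimal sketch of a loader that applies the defaults from the table. It is not IPMI Monitor's own startup code; `Settings` and `load_settings` are hypothetical names, and only a subset of the variables is covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Settings:
    app_name: str
    ipmi_user: str
    ipmi_pass: str
    admin_user: str
    admin_pass: str
    poll_interval: int
    data_retention_days: int
    ssh_user: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the defaults from the table."""
    if not env.get("IPMI_PASS"):
        raise ValueError("IPMI_PASS is required")
    return Settings(
        app_name=env.get("APP_NAME", "IPMI Monitor"),
        ipmi_user=env.get("IPMI_USER", "admin"),
        ipmi_pass=env["IPMI_PASS"],
        admin_user=env.get("ADMIN_USER", "admin"),
        admin_pass=env.get("ADMIN_PASS", "changeme"),
        poll_interval=int(env.get("POLL_INTERVAL", "300")),
        data_retention_days=int(env.get("DATA_RETENTION_DAYS", "30")),
        ssh_user=env.get("SSH_USER", "root"),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to preview the effective configuration for a given compose file.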
Setting Up SSH for Enhanced Monitoring
SSH access enables powerful features:
- System Logs - dmesg, journalctl, syslog, Docker daemon logs
- Hardware Inventory - Detailed CPU, DIMM, GPU, NIC, storage info
- GPU Monitoring - NVIDIA Xid errors, driver version, CUDA version
- Uptime Tracking - Detect unexpected reboots
Option 1: SSH Keys (Recommended)
- Go to Settings → SSH Keys
- Click Add SSH Key
- Paste your private key content (from ~/.ssh/id_rsa or similar)
- Give it a name (e.g., "datacenter-key")
- In Settings → Servers, assign the key to each server
Option 2: SSH Password
- Go to Settings → Defaults
- Enter your SSH username and password
- Click Apply to All Servers
Important: Server IP vs BMC IP
- BMC IP (e.g., 192.168.1.80) - IPMI/Redfish management interface
- Server IP (e.g., 192.168.1.81) - The actual OS/SSH interface
When adding a server, set both IPs:
- BMC IP: For IPMI/Redfish event collection
- Server IP: For SSH-based inventory and logs
GPU Monitoring (NVIDIA)
IPMI Monitor can detect and monitor NVIDIA GPUs via SSH:
- GPU Count & Models - Detected via nvidia-smi
- Driver & CUDA Version - For compatibility tracking
- Xid Errors - Parsed from dmesg/syslog (GPU failures, ECC errors)
- PCIe Health - AER/correctable/uncorrectable errors
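As an illustration of the Xid parsing step, the sketch below extracts the PCI address and Xid code from a kernel log line. It assumes the usual NVIDIA driver message shape (`NVRM: Xid (PCI:...): <code>, ...`); `parse_xid` is a hypothetical helper, not the parser IPMI Monitor ships.

```python
import re
from typing import Optional

# Matches e.g. "NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, GPU has fallen off the bus."
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9A-Fa-f:.]+)\): (?P<code>\d+)(?:, (?P<detail>.*))?"
)


def parse_xid(line: str) -> Optional[dict]:
    """Return the PCI address, Xid code and detail text, or None if the line is not an Xid."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return {
        "pci": match.group("pci"),
        "xid": int(match.group("code")),
        "detail": match.group("detail") or "",
    }
```

Using `search` rather than `match` lets the same pattern work on dmesg, syslog, and journalctl lines, which carry different prefixes.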
Collecting GPU Inventory
- Ensure SSH is configured for the server
- Go to server detail page
- Click Collect Inventory
- GPU info appears under the GPU section
Detailed DIMM Inventory
For servers with Redfish or SSH access, IPMI Monitor collects per-DIMM details:
- Slot/Locator (e.g., A1, B2)
- Manufacturer (Samsung, SK Hynix, Micron, etc.)
- Part Number
- Size (32 GB, 64 GB)
- Speed (Configured vs Rated - highlights if running slower)
This helps identify:
- Mixed memory configurations
- Under-clocked DIMMs
- Which slot has ECC errors
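The checks listed above amount to simple comparisons over the per-DIMM records. A minimal sketch, assuming each DIMM is represented by a hypothetical `Dimm` record with speeds in MT/s:

```python
from dataclasses import dataclass


@dataclass
class Dimm:
    slot: str            # e.g. "A1"
    manufacturer: str
    size_gb: int
    rated_mts: int       # speed the module is rated for
    configured_mts: int  # speed it is actually running at


def underclocked(dimms: list[Dimm]) -> list[Dimm]:
    """DIMMs running below their rated speed."""
    return [d for d in dimms if d.configured_mts < d.rated_mts]


def is_mixed(dimms: list[Dimm]) -> bool:
    """True when the server mixes manufacturers or module sizes."""
    return len({(d.manufacturer, d.size_gb) for d in dimms}) > 1
```

For example, a 3200 MT/s module configured at 2933 MT/s would be returned by `underclocked`.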
AI Features (Optional)
IPMI Monitor can integrate with the CryptoLabs AI service for:
- Fleet Summary - AI-generated daily analysis
- Predictive Maintenance - Identify failing components
- Root Cause Analysis - Correlate events across servers
- Task Generation - Prioritized maintenance tasks
Enabling AI Features
- Go to Settings → AI Features
- Get an API key from cryptolabs.co.za/my-account
- Enter the key and click Enable
AI features are optional - IPMI Monitor works fully offline without them.
IPMI vs Redfish
IPMI Monitor supports both protocols and auto-detects which to use:
| Feature | IPMI/ipmitool | Redfish |
|---|---|---|
| Event Collection | ✓ SEL logs | ✓ Log Service |
| Sensor Readings | ✓ SDR | ✓ Chassis/Thermal |
| Power Control | ✓ | ✓ |
| Inventory | Basic FRU | ✓ Rich metadata |
| Memory Details | - | ✓ Per-DIMM info |
| Supported BMCs | All | Dell iDRAC, HPE iLO, Supermicro, Lenovo |
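On the IPMI side, SEL records are typically read as pipe-separated lines such as those printed by `ipmitool sel elist`. The following sketch parses one such line; it assumes that column layout (hex record ID, date, time, sensor, event, direction), and `parse_sel_line` is a hypothetical helper rather than IPMI Monitor's collector.

```python
from datetime import datetime
from typing import Optional


def parse_sel_line(line: str) -> Optional[dict]:
    """Parse one pipe-separated SEL line; return None if it does not look like a record."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) < 5:
        return None
    record_id, date, time_, sensor, event = parts[:5]
    try:
        rid = int(record_id, 16)  # record IDs are printed in hexadecimal
    except ValueError:
        return None
    try:
        timestamp = datetime.strptime(f"{date} {time_}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        timestamp = None  # some records carry no usable timestamp
    return {
        "id": rid,
        "timestamp": timestamp,
        "sensor": sensor,
        "event": event,
        "direction": parts[5] if len(parts) > 5 else "",
    }
```

Redfish log entries arrive as JSON instead, so they would need a separate mapping onto the same fields.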
Forcing a Protocol
By default, IPMI Monitor auto-detects. To force a specific protocol:
- Go to Settings → Servers
- Click Edit on a server
- Set Protocol to ipmi or redfish
Alert Configuration
IPMI Monitor can send alerts via multiple channels:
Notification Methods
| Method | Setup |
|---|---|
| Email | Settings → Alerts → SMTP configuration |
| Telegram | Settings → Alerts → Bot token + Chat ID |
| Webhook | Settings → Alerts → Custom URL for Slack, Discord, etc. |
Alert Rules
Create rules to trigger on specific conditions:
- Event Type - SEL event categories (Temperature, Memory, Fan, etc.)
- Severity - Critical, Warning, or both
- Server Filter - All servers or specific ones
- Keyword Match - Filter by event description
Alert Features
- Confirmation Period - Wait N minutes before alerting (avoid false positives)
- Resolution Alerts - Get notified when issues are resolved
- Rate Limiting - Prevent alert floods
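The confirmation period and resolution alerts can be pictured as a small state machine per condition. This is a sketch of the idea only, under the assumption that each condition is re-evaluated on every collection cycle; `ConfirmationTracker` is a hypothetical class, not the project's alert engine.

```python
from datetime import datetime, timedelta
from typing import Optional


class ConfirmationTracker:
    """Fire an alert only after a condition has stayed active for `confirm_after`."""

    def __init__(self, confirm_after: timedelta):
        self.confirm_after = confirm_after
        self._first_seen: dict[str, datetime] = {}
        self._alerted: set[str] = set()

    def observe(self, key: str, active: bool, now: datetime) -> Optional[str]:
        """Return "alert", "resolved", or None for this observation."""
        if active:
            first = self._first_seen.setdefault(key, now)
            if key not in self._alerted and now - first >= self.confirm_after:
                self._alerted.add(key)
                return "alert"
            return None
        # Condition cleared: forget it, and notify only if an alert was sent
        self._first_seen.pop(key, None)
        if key in self._alerted:
            self._alerted.discard(key)
            return "resolved"
        return None
```

A condition that clears before the confirmation period elapses never produces an alert or a resolution message, which is what suppresses short-lived false positives.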
API Reference
Public Endpoints
| Endpoint | Description |
|---|---|
| GET / | Dashboard |
| GET /api/servers | List servers |
| GET /api/events | Get events (filterable) |
| GET /api/stats | Dashboard stats |
| GET /api/sensors/{bmc_ip} | Sensor readings |
| GET /metrics | Prometheus metrics |
| GET /health | Health check |
| GET /api/version | Current version info |
| GET /api/version/check | Check for updates |
| POST /api/server/{bmc_ip}/bmc/{action} | BMC reset (cold/warm/info) |
| GET /api/server/{bmc_ip}/ssh-logs | Get SSH system logs |
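A small client sketch for the read-only endpoints follows. It assumes the `/api/...` endpoints return JSON, which the table does not state; `endpoint`, `sensors_url`, and `get_json` are hypothetical helper names, and the base URL is whatever address your instance listens on.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def endpoint(base: str, path: str) -> str:
    """Join the instance base URL and an endpoint path."""
    return base.rstrip("/") + path


def sensors_url(base: str, bmc_ip: str) -> str:
    """URL for GET /api/sensors/{bmc_ip}."""
    return endpoint(base, f"/api/sensors/{quote(bmc_ip, safe='')}")


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json(endpoint("http://192.168.1.100:5000", "/api/servers"))` would fetch the server list from the instance set up in Quick Start.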
Admin Endpoints (login required)
| Endpoint | Description |
|---|---|
| POST /api/collect | Trigger collection |
| POST /api/servers/add | Add server |
| DELETE /api/servers/{bmc_ip} | Delete server |
| GET /api/backup | Full configuration backup |
| POST /api/restore | Restore from backup |
Security
IPMI Monitor is designed with security in mind for production datacenter environments:
Credential Protection
- No Command-Line Exposure - IPMI passwords use environment variables (IPMI_PASSWORD), not -P flags
- SSH Key Isolation - SSH private keys stored in temporary files with 0600 permissions
- Password Masking - Passwords passed via SSHPASS environment variable, not command line
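To show what passing the IPMI password through the environment looks like in practice, here is a sketch using ipmitool's `-E` option, which reads the password from `IPMI_PASSWORD`. The helper names are hypothetical and this is not necessarily how IPMI Monitor invokes ipmitool internally.

```python
import os
import subprocess


def build_ipmitool_call(host: str, user: str, password: str, *args: str):
    """Build argv and environment so the password never appears in the process list."""
    argv = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]
    env = dict(os.environ, IPMI_PASSWORD=password)  # -E reads the password from here
    return argv, env


def run_ipmitool(host: str, user: str, password: str, *args: str):
    """Run ipmitool with the password supplied via the environment."""
    argv, env = build_ipmitool_call(host, user, password, *args)
    return subprocess.run(argv, env=env, capture_output=True, text=True, timeout=30)
```

Compared with `-P password`, nothing sensitive shows up in `ps` output or shell history.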
Data Handling
- Local-First - All data stored locally in SQLite
- No Credential Sync - Credentials are never sent externally
Access Control
- Role-Based Access - Admin vs read-only user levels
- Session Management - Secure Flask sessions with configurable secret key
- API Authentication - Protected endpoints require authentication
Best Practices
environment:
  - SECRET_KEY=your-random-32-char-key # Always set this!
  - ADMIN_PASS=strong-unique-password # Change from default
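One way to produce a random 32-character value for SECRET_KEY is Python's standard `secrets` module. `make_secret_key` is just an illustrative wrapper name:

```python
import secrets


def make_secret_key(num_bytes: int = 16) -> str:
    """Return a random hex string; 16 bytes gives 32 hex characters."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(make_secret_key())
```

Generate the value once and keep it in your compose file or environment, since changing it invalidates existing sessions.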
Password Recovery
IPMI Monitor is self-hosted - there's no central server to reset your password. Since you have root access, you can reset it directly:
# Quick password reset (run on your server)
docker exec -i ipmi-monitor python3 << 'EOF'
from werkzeug.security import generate_password_hash
import sqlite3
new_password = "your_new_password" # CHANGE THIS
conn = sqlite3.connect('/app/data/ipmi_monitor.db')
conn.execute("UPDATE user SET password_hash = ? WHERE username = 'admin'",
             (generate_password_hash(new_password),))
conn.commit()
print("✅ Admin password updated!")
EOF
See User Guide - Password Recovery for detailed instructions and a reusable script.
Developer Guide
See DEVELOPER_GUIDE.md for:
- Git workflow (develop/main branches)
- Release process
- Docker tag conventions
- CI/CD pipeline details
Contributing
- Fork the repository
- Create feature branch (git checkout -b feature/amazing)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing)
- Open a Pull Request
License
MIT License - see LICENSE for details.
Links
- PyPI: pypi.org/project/ipmi-monitor
- GitHub: github.com/cryptolabsza/ipmi-monitor
- Docker Image: ghcr.io/cryptolabsza/ipmi-monitor
- Documentation: github.com/cryptolabsza/ipmi-monitor/docs
- Discord Community: Join our Discord - Get help, report issues, request features
Made with ❤️ by CryptoLabs