A Model Context Protocol (MCP) server for querying logs from multiple observability platforms (New Relic, Azure)
MCP Observability Server
A Model Context Protocol (MCP) server that enables Claude to query logs from multiple observability platforms simultaneously. Perfect for SRE workflows, incident investigation, and distributed tracing.
Supported Platforms
- New Relic - Query logs using NRQL
- Azure Application Insights - Query logs using Kusto Query Language (KQL)
Features
- Unified Search - Search across all platforms with a single query
- Severity Filtering - Filter by log levels (debug, info, warning, error, critical)
- Distributed Tracing - Find all logs related to a trace ID across platforms
- Concurrent Queries - Queries all providers in parallel for fast results
- Recent Errors - Quick access to recent error logs across all systems
- Health Checks - Verify connectivity to all configured providers
- Guided Workflows - Pre-built prompts for incident investigation, deployment validation, and root cause analysis
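The concurrent-query feature can be pictured as a fan-out over the configured providers. The sketch below is an illustration only, not the server's actual code: `fake_provider` and `query_all` are hypothetical names, and the provider calls are simulated with `asyncio.sleep` instead of real API requests.

```python
import asyncio


async def fake_provider(name: str, delay: float) -> list[dict]:
    """Hypothetical stand-in for a provider query; sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return [{"provider": name, "message": f"log from {name}"}]


async def query_all(providers: dict[str, float]) -> dict[str, object]:
    """Run every provider query in parallel and keep per-provider results or errors."""
    names = list(providers)
    results = await asyncio.gather(
        *(fake_provider(name, providers[name]) for name in names),
        return_exceptions=True,  # one failing provider should not sink the others
    )
    return dict(zip(names, results))


results = asyncio.run(query_all({"newrelic": 0.05, "azure": 0.05}))
print(results)
```

With `return_exceptions=True`, a failure in one provider shows up as an exception object in that provider's slot, so the other providers' logs can still be returned.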
Installation
From PyPI
pip install mcp-observability-server
From Source
git clone https://github.com/yourusername/mcp-observability-server.git
cd mcp-observability-server
pip install -e .
Configuration
1. Create Configuration File
Copy the example configuration:
cp config.yaml.example config.yaml
Edit config.yaml with your credentials:
providers:
  newrelic:
    enabled: true
    api_key: ${NEW_RELIC_API_KEY}
    account_id: "1234567"
    region: "US"
  azure:
    enabled: true
    workspace_id: ${AZURE_WORKSPACE_ID}
    client_id: ${AZURE_CLIENT_ID}
    client_secret: ${AZURE_CLIENT_SECRET}
    tenant_id: ${AZURE_TENANT_ID}
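The `${NAME}` placeholders in the configuration are filled from environment variables. How the server performs this substitution is not documented here, so the following is a minimal sketch of one way it could work, using only the standard library. `expand_env` is a hypothetical helper name, and the key value is made up.

```python
import os
from string import Template


def expand_env(text: str, env=None) -> str:
    """Replace ${NAME} placeholders with values from env; raise KeyError if one is missing."""
    env = os.environ if env is None else env
    return Template(text).substitute(env)


raw = 'api_key: ${NEW_RELIC_API_KEY}\naccount_id: "1234567"'
expanded = expand_env(raw, {"NEW_RELIC_API_KEY": "NRAK-example"})
print(expanded)
```

Failing loudly on a missing variable is a deliberate choice in this sketch: an empty API key would otherwise surface later as an opaque authentication error.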
2. Set Environment Variables
Copy and configure environment variables:
cp .env.example .env
Edit .env with your actual credentials.
3. Configure Claude Desktop
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"observability": {
"command": "python",
"args": ["-m", "mcp_observability.server", "/path/to/config.yaml"]
}
}
}
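If the Claude Desktop config already lists other MCP servers, the new entry has to be merged in rather than pasted over the file. A small sketch of that merge follows; `add_mcp_server` is a hypothetical helper, and the `other` server entry is invented for the example.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of a Claude Desktop config with one MCP server entry added or replaced."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command, "args": args}
    updated["mcpServers"] = servers
    return updated


# Invented existing config containing an unrelated server
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
merged = add_mcp_server(
    existing,
    "observability",
    "python",
    ["-m", "mcp_observability.server", "/path/to/config.yaml"],
)
print(json.dumps(merged, indent=2))
```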
Usage
Once configured, you can ask Claude to query your logs:
Example Queries
Search for errors in the last hour:
Show me all errors from the last hour across all platforms
Search specific text:
Find logs containing "timeout" from the last 30 minutes
Filter by service:
Show me warning and error logs from the api-gateway service in the last 2 hours
Distributed tracing:
Find all logs related to trace ID abc123-def456
Recent errors:
What errors have occurred in the last 15 minutes?
Available Tools
The server exposes these tools to Claude:
query_logs
Search logs across all platforms with flexible filtering.
Parameters:
- start_time (required) - ISO format or relative (e.g., "1h", "30m", "2d")
- end_time (optional) - Defaults to now
- query (optional) - Text to search for
- severity (optional) - Array of severity levels
- service_name (optional) - Filter by service
- limit (optional) - Max results (default: 100)
- providers (optional) - Specific providers to query
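To illustrate the two accepted forms of start_time, here is a sketch of a parser that handles relative values such as "1h", "30m", and "2d" and falls back to ISO format. This is an assumption about the behaviour, not the server's parser; `parse_start_time` is a hypothetical name, and only the three units shown in the examples are supported.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_start_time(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn '1h', '30m', '2d' into now minus that offset; otherwise parse as ISO format."""
    now = now or datetime.now(timezone.utc)
    match = re.fullmatch(r"(\d+)([mhd])", value.strip())
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(**{_UNITS[unit]: amount})
    # Note: before Python 3.11, fromisoformat rejects a trailing "Z"
    return datetime.fromisoformat(value)


fixed_now = datetime(2026, 2, 13, 10, 30, tzinfo=timezone.utc)
print(parse_start_time("1h", fixed_now))
print(parse_start_time("2026-02-13T10:00:00+00:00"))
```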
get_recent_errors
Quick access to recent error and critical logs.
Parameters:
- minutes (optional) - Look back period (default: 60)
- limit (optional) - Max results per provider (default: 100)
- service_name (optional) - Filter by service
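Selecting "error and critical" logs amounts to a threshold over the ordered severity levels listed under Features. The sketch below assumes each log is a dict with a `severity` key, which is an illustrative shape rather than the server's data model; `at_least` is a hypothetical helper.

```python
# Severity levels in ascending order, as listed under Features
SEVERITY_ORDER = ["debug", "info", "warning", "error", "critical"]


def at_least(logs: list[dict], threshold: str) -> list[dict]:
    """Keep logs whose severity is at or above the threshold."""
    floor = SEVERITY_ORDER.index(threshold.lower())
    return [
        log for log in logs
        if SEVERITY_ORDER.index(log["severity"].lower()) >= floor
    ]


sample = [
    {"severity": "info", "message": "request served"},
    {"severity": "ERROR", "message": "upstream timeout"},
    {"severity": "critical", "message": "out of memory"},
]
recent_errors = at_least(sample, "error")
print([log["message"] for log in recent_errors])
```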
search_by_trace_id
Find all logs associated with a distributed trace.
Parameters:
- trace_id (required) - The trace ID to search for
- start_time (optional) - Defaults to 24 hours ago
- end_time (optional) - Defaults to now
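Logs for one trace arrive from several providers, so they need to be merged into a single timeline before they are useful. A minimal sketch, assuming each log carries an ISO-format `timestamp` string with a UTC offset (an assumed shape, not the server's model); `merge_by_time` is a hypothetical name.

```python
from datetime import datetime


def merge_by_time(per_provider: dict[str, list[dict]]) -> list[dict]:
    """Flatten per-provider log lists into one list sorted by timestamp."""
    merged = [
        {**log, "provider": provider}
        for provider, logs in per_provider.items()
        for log in logs
    ]
    return sorted(merged, key=lambda log: datetime.fromisoformat(log["timestamp"]))


per_provider = {
    "newrelic": [{"timestamp": "2026-02-13T10:30:02+00:00", "message": "db timeout"}],
    "azure": [{"timestamp": "2026-02-13T10:30:01+00:00", "message": "request received"}],
}
timeline = merge_by_time(per_provider)
print([(log["provider"], log["message"]) for log in timeline])
```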
health_check
Verify connectivity to all configured providers.
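A health check is only useful if an unreachable provider cannot hang the whole call. The sketch below shows one way to bound each check with a timeout. It is an assumption about the approach, not the server's implementation: `check_with_timeout` and the three probe coroutines are hypothetical, and `other` is an invented provider name.

```python
import asyncio


async def check_with_timeout(name, check, timeout_seconds: float) -> tuple[str, bool]:
    """Return (name, healthy); a timeout or any exception counts as unhealthy."""
    try:
        await asyncio.wait_for(check(), timeout=timeout_seconds)
        return name, True
    except Exception:
        return name, False


async def ok():
    await asyncio.sleep(0)  # simulated fast, successful probe


async def slow():
    await asyncio.sleep(1)  # simulated probe that exceeds the timeout


async def broken():
    raise ConnectionError("unreachable")  # simulated network failure


async def main() -> dict[str, bool]:
    pairs = await asyncio.gather(
        check_with_timeout("newrelic", ok, 0.1),
        check_with_timeout("azure", slow, 0.1),
        check_with_timeout("other", broken, 0.1),
    )
    return dict(pairs)


status = asyncio.run(main())
print(status)
```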
Guided Workflows (Prompts)
The server provides guided prompts for common SRE workflows. Prompts chain multiple tools together and provide structured analysis frameworks.
investigate-incident
Systematic incident investigation workflow.
Use for: Active production incidents requiring thorough investigation
Parameters:
- service_name (optional) - Service to investigate
- time_period (default: "1h") - Investigation time window
- severity_threshold (default: "error") - Minimum severity
Example:
Use the investigate-incident prompt for api-gateway service
Workflow: Recent errors → Pattern analysis → Trace investigation → Health checks → Summary with recommendations
health-check-report
Generate comprehensive health status report.
Use for: Daily health checks, system status overviews
Parameters:
- time_period (default: "24h") - Error statistics period
- include_metrics (default: true) - Include detailed metrics
Example:
Generate a health check report
Workflow: Provider health → Error analysis → Service catalog → Active traces → Recommendations
post-deployment-check
Validate deployment health by comparing before/after metrics.
Use for: Post-deployment validation, CI/CD pipelines
Parameters:
- service_name (required) - Deployed service name
- deployment_time (optional) - When deployment occurred
- lookback_minutes (default: 30) - Baseline comparison period
Example:
Run a post-deployment check for user-service
Workflow: Current errors → Baseline comparison → New error detection → Trace analysis → Health recommendation (PROCEED/MONITOR/ROLLBACK)
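The final step of this workflow reduces the baseline comparison to one of three verdicts. The prompt leaves that judgment to Claude, so the rule below is purely an illustration of the idea: `deployment_verdict` is a hypothetical function, and its thresholds are invented rather than taken from the project.

```python
def deployment_verdict(baseline_errors: int, current_errors: int, new_error_types: int) -> str:
    """Hypothetical rule of thumb mapping error counts to PROCEED / MONITOR / ROLLBACK."""
    # No new kinds of error and no increase over the baseline: safe to continue
    if new_error_types == 0 and current_errors <= baseline_errors:
        return "PROCEED"
    # Assumed threshold: more than double the baseline, with slack for tiny baselines
    if current_errors > 2 * baseline_errors + 5:
        return "ROLLBACK"
    return "MONITOR"


print(deployment_verdict(baseline_errors=10, current_errors=8, new_error_types=0))
print(deployment_verdict(baseline_errors=10, current_errors=12, new_error_types=1))
print(deployment_verdict(baseline_errors=10, current_errors=40, new_error_types=2))
```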
trace-flow-analysis
Analyze distributed trace execution flow and timing.
Use for: Debugging distributed systems, understanding request flow
Parameters:
- trace_id (required) - Trace ID to analyze
- include_timing (default: true) - Include timing breakdown
Example:
Analyze trace flow for abc123-def456
Workflow: Timeline construction → Service chain mapping → Timing analysis → Error detection → Bottleneck identification → Root cause
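Bottleneck identification can be approximated by finding the longest pause between consecutive log entries of a trace. The sketch below assumes each event has `service` and ISO-format `timestamp` keys, an illustrative shape rather than the server's model. `largest_gap` is a hypothetical helper and needs at least two events.

```python
from datetime import datetime


def largest_gap(events: list[dict]) -> tuple[str, str, float]:
    """Return (from_service, to_service, seconds) for the longest pause between consecutive logs."""
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["timestamp"]))
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        delta = datetime.fromisoformat(later["timestamp"]) - datetime.fromisoformat(earlier["timestamp"])
        gaps.append((earlier["service"], later["service"], delta.total_seconds()))
    return max(gaps, key=lambda gap: gap[2])


events = [
    {"service": "api-gateway", "timestamp": "2026-02-13T10:30:00.000+00:00"},
    {"service": "database", "timestamp": "2026-02-13T10:30:02.120+00:00"},
    {"service": "user-service", "timestamp": "2026-02-13T10:30:00.120+00:00"},
]
print(largest_gap(events))
```

Log timestamps mark when a line was written, not when work started, so a gap found this way points at where to look rather than proving where the time went.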
root-cause-analysis
Deep root cause investigation for complex failures.
Use for: Finding originating causes, cascading failure analysis
Parameters:
- trace_id (optional) - Specific trace to investigate
- error_pattern (optional) - Known error pattern
- time_window (default: "1h") - Investigation window
Example:
Perform root cause analysis for error "database connection timeout"
Workflow: Evidence gathering → Timeline building → Trace flow → Pattern recognition → Root cause formulation → Prevention recommendations
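The pattern recognition step groups error messages that differ only in variable parts such as durations or IDs. A rough sketch of that grouping follows; `normalize` and `top_patterns` are hypothetical helpers, and the normalization rules are a simplification chosen for the example.

```python
import re
from collections import Counter


def normalize(message: str) -> str:
    """Collapse hex-like IDs and digits so similar messages share one pattern."""
    message = re.sub(r"\b[0-9a-f]{8,}\b", "<id>", message.lower())
    return re.sub(r"\d+", "<n>", message)


def top_patterns(messages: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most common normalized patterns with their counts."""
    return Counter(normalize(m) for m in messages).most_common(n)


messages = [
    "Database connection timeout after 30s",
    "Database connection timeout after 45s",
    "User 1234 not found",
]
patterns = top_patterns(messages)
print(patterns)
```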
See the Prompts README (src/mcp_observability/prompts/README.md) for detailed documentation.
Platform-Specific Configuration
New Relic
- Create an API key in New Relic (User > API Keys)
- Find your account ID in the URL or account dropdown
- Choose region: "US" or "EU"
Azure Application Insights
- Create a service principal in Azure AD
- Grant "Log Analytics Reader" role to the service principal
- Note the workspace ID, client ID, client secret, and tenant ID
Development
Setup Development Environment
# Clone repository
git clone https://github.com/yourusername/mcp-observability-server.git
cd mcp-observability-server
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black src/
ruff check src/
Testing with MCP Inspector
Test the server interactively using the MCP Inspector:
npx @modelcontextprotocol/inspector \
uv \
--directory /path/to/mcp-observability-server \
run \
mcp-observability \
/path/to/mcp-observability-server/config.yaml
Or using the Python module directly:
npx @modelcontextprotocol/inspector \
uv \
--directory /path/to/mcp-observability-server \
run \
python \
-m \
mcp_observability.server \
/path/to/mcp-observability-server/config.yaml
Project Structure
mcp-observability-server/
├── src/
│   └── mcp_observability/
│       ├── __init__.py
│       ├── server.py              # Main MCP server
│       ├── models.py              # Data models
│       ├── utils.py               # Utilities
│       ├── prompts/               # Guided workflow prompts
│       │   ├── __init__.py
│       │   ├── incident.py        # Incident investigation prompts
│       │   ├── health.py          # Health monitoring prompts
│       │   ├── deployment.py      # Deployment validation prompts
│       │   ├── trace_analysis.py  # Trace flow analysis prompts
│       │   └── README.md          # Prompts documentation
│       └── providers/
│           ├── base.py            # Abstract base
│           ├── newrelic.py
│           └── azure.py
├── tests/
├── config.yaml.example
├── .env.example
└── pyproject.toml
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=mcp_observability
# Run specific test file
pytest tests/test_providers.py
Logging
The server includes comprehensive logging to help with debugging and monitoring:
Configure Log Level:
Set the MCP_LOG_LEVEL environment variable:
# In your .env file or environment
export MCP_LOG_LEVEL=DEBUG # Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
Log Levels:
- DEBUG - Detailed diagnostic information (queries, parameters, API calls)
- INFO - General informational messages (default)
- WARNING - Warning messages for potential issues
- ERROR - Error messages for failures
- CRITICAL - Critical issues that prevent operation
What Gets Logged:
- Server initialization and configuration loading
- Provider initialization and health checks
- Tool invocations with parameters
- Query execution and results
- API calls to observability platforms
- Errors and exceptions with stack traces
Example Log Output:
2026-02-13 10:30:15 - mcp_observability.server - INFO - Starting MCP Observability Server
2026-02-13 10:30:15 - mcp_observability.utils - INFO - Loading config from: config.yaml
2026-02-13 10:30:15 - mcp_observability.providers.newrelic - INFO - New Relic provider initialized for region: US
2026-02-13 10:30:20 - mcp_observability.server - INFO - Tool called: query_logs
2026-02-13 10:30:21 - mcp_observability.providers.newrelic - INFO - New Relic query returned 42 log(s)
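A logging setup that reads MCP_LOG_LEVEL and produces lines in the format shown above could look like the sketch below. The format string is inferred from the example output, and `configure_logging` is a hypothetical name, so treat this as an approximation of the behaviour rather than the server's code.

```python
import logging
import os

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(env=None) -> int:
    """Set the root log level from MCP_LOG_LEVEL, falling back to INFO."""
    env = os.environ if env is None else env
    name = env.get("MCP_LOG_LEVEL", "INFO").upper()
    level = _LEVELS.get(name, logging.INFO)  # unknown values fall back to INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
        force=True,  # replace any handlers installed earlier
    )
    return level


configure_logging({"MCP_LOG_LEVEL": "debug"})
logging.getLogger("mcp_observability.server").info("Starting MCP Observability Server")
```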
Troubleshooting
Common Issues
"Provider unhealthy" in health check
- Verify credentials are correct in config.yaml
- Check environment variables are set
- Ensure network connectivity to provider API
"No logs found"
- Verify time range includes the period you're interested in
- Check that log groups/workspaces are configured correctly
- Ensure services are actually logging during the time period
Azure authentication error
- Ensure the service principal has the "Log Analytics Reader" role
- Verify the client ID, client secret, and tenant ID in .env
- Check that the workspace ID matches the workspace that holds your logs
Timeout errors
- Increase timeout_seconds in provider config
- Reduce limit to fetch fewer results
- Check network connectivity
Enable debug logging
- Set MCP_LOG_LEVEL=DEBUG in your environment or .env file
- Check logs for detailed query information and API responses
- Review stack traces for error details
Performance Tips
- Specify providers - Pass the providers parameter to query only the platforms you need instead of all of them
- Use time ranges wisely - Shorter time ranges return faster
- Limit results - Start with smaller limits and increase if needed
- Filter by service - Reduces data scanned across all platforms
- Use get_recent_errors - Optimized query for error investigation
Security Best Practices
- Never commit credentials - Use environment variables or secrets manager
- Rotate keys regularly - Set up key rotation for all platforms
- Principle of least privilege - Grant only read permissions needed
- Audit access - Monitor who's using the MCP server
- Secure config files - Restrict file permissions on config.yaml
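Restricting permissions on config.yaml can be done from the shell or from Python. A minimal sketch for POSIX systems follows; `restrict_to_owner` is a hypothetical helper, and on Windows `os.chmod` only toggles the read-only flag, so it does not provide the same protection there.

```python
import os
import stat


def restrict_to_owner(path: str) -> None:
    """Make a file readable and writable by its owner only (mode 0600 on POSIX)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if os.path.exists("config.yaml"):
    restrict_to_owner("config.yaml")
    print(oct(stat.S_IMODE(os.stat("config.yaml").st_mode)))
```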
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
MIT License - see LICENSE file for details
Support
- Issues: https://github.com/yourusername/mcp-observability-server/issues
- Discussions: https://github.com/yourusername/mcp-observability-server/discussions
- Documentation: https://docs.example.com/mcp-observability
Roadmap
- Support for more providers (Splunk, Elastic, Grafana Loki)
- Advanced query builders
- Log analytics and pattern detection
- Alerting integration
- Performance metrics collection
- Custom query templates
- Multi-account support per provider
Acknowledgments
Built with the Model Context Protocol by Anthropic.