
MCP Observability Server

A Model Context Protocol (MCP) server that enables Claude to query logs from multiple observability platforms simultaneously. Perfect for SRE workflows, incident investigation, and distributed tracing.

Supported Platforms

  • New Relic - Query logs using NRQL
  • Azure Application Insights - Query logs using Kusto Query Language (KQL)

Features

  • 🔍 Unified Search - Search across all platforms with a single query
  • 🎯 Severity Filtering - Filter by log levels (debug, info, warning, error, critical)
  • 🔗 Distributed Tracing - Find all logs related to a trace ID across platforms
  • ⚡ Concurrent Queries - Queries all providers in parallel for fast results
  • 📊 Recent Errors - Quick access to recent error logs across all systems
  • 🏥 Health Checks - Verify connectivity to all configured providers
  • 🤖 Guided Workflows - Pre-built prompts for incident investigation, deployment validation, and root cause analysis
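
The concurrent-query behavior can be pictured with a short asyncio sketch. The provider coroutines below are illustrative stubs, not the server's real code; the point is the `asyncio.gather` fan-out with `return_exceptions=True`, which lets one unhealthy provider fail without cancelling the others:

```python
import asyncio

# Hypothetical provider query functions -- the real providers live in
# src/mcp_observability/providers/; these stubs just illustrate the fan-out.
async def query_newrelic(query: str) -> list[dict]:
    await asyncio.sleep(0.01)  # stands in for an NRQL API call
    return [{"provider": "newrelic", "message": "timeout in checkout"}]

async def query_azure(query: str) -> list[dict]:
    await asyncio.sleep(0.01)  # stands in for a KQL API call
    return [{"provider": "azure", "message": "timeout in checkout"}]

async def query_all(query: str) -> list[dict]:
    # Fan out to every enabled provider concurrently; a failing provider
    # yields an exception object instead of cancelling the other tasks.
    results = await asyncio.gather(
        query_newrelic(query), query_azure(query), return_exceptions=True
    )
    merged: list[dict] = []
    for result in results:
        if isinstance(result, Exception):
            continue  # skip unhealthy providers, keep partial results
        merged.extend(result)
    return merged

logs = asyncio.run(query_all("timeout"))
```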

Installation

From PyPI

pip install mcp-observability-server

From Source

git clone https://github.com/yourusername/mcp-observability-server.git
cd mcp-observability-server
pip install -e .

Configuration

1. Create Configuration File

Copy the example configuration:

cp config.yaml.example config.yaml

Edit config.yaml with your credentials:

providers:
  newrelic:
    enabled: true
    api_key: ${NEW_RELIC_API_KEY}
    account_id: "1234567"
    region: "US"
  
  azure:
    enabled: true
    workspace_id: ${AZURE_WORKSPACE_ID}
    client_id: ${AZURE_CLIENT_ID}
    client_secret: ${AZURE_CLIENT_SECRET}
    tenant_id: ${AZURE_TENANT_ID}
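
The `${VAR}` placeholders suggest the config loader substitutes environment variables at load time. A minimal sketch of that expansion, assuming unset variables are left as-is (the real loader in `utils.py` may behave differently):

```python
import os
import re

def expand_env(value: str) -> str:
    # Replace ${VAR} placeholders with environment variable values;
    # leave the placeholder untouched when the variable is not set.
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),
        value,
    )

os.environ["NEW_RELIC_API_KEY"] = "NRAK-example"
print(expand_env("${NEW_RELIC_API_KEY}"))  # -> NRAK-example
```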

2. Set Environment Variables

Copy and configure environment variables:

cp .env.example .env

Edit .env with your actual credentials.

3. Configure Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "observability": {
      "command": "python",
      "args": ["-m", "mcp_observability.server", "/path/to/config.yaml"]
    }
  }
}

Usage

Once configured, you can ask Claude to query your logs:

Example Queries

Search for errors in the last hour:

Show me all errors from the last hour across all platforms

Search specific text:

Find logs containing "timeout" from the last 30 minutes

Filter by service:

Show me warning and error logs from the api-gateway service in the last 2 hours

Distributed tracing:

Find all logs related to trace ID abc123-def456

Recent errors:

What errors have occurred in the last 15 minutes?

Available Tools

The server exposes these tools to Claude:

query_logs

Search logs across all platforms with flexible filtering.

Parameters:

  • start_time (required) - ISO format or relative (e.g., "1h", "30m", "2d")
  • end_time (optional) - Defaults to now
  • query (optional) - Text to search for
  • severity (optional) - Array of severity levels
  • service_name (optional) - Filter by service
  • limit (optional) - Max results (default: 100)
  • providers (optional) - Specific providers to query

get_recent_errors

Quick access to recent error and critical logs.

Parameters:

  • minutes (optional) - Look back period (default: 60)
  • limit (optional) - Max results per provider (default: 100)
  • service_name (optional) - Filter by service

search_by_trace_id

Find all logs associated with a distributed trace.

Parameters:

  • trace_id (required) - The trace ID to search for
  • start_time (optional) - Defaults to 24 hours ago
  • end_time (optional) - Defaults to now

health_check

Verify connectivity to all configured providers.

Guided Workflows (Prompts)

The server provides guided prompts for common SRE workflows. Prompts chain multiple tools together and provide structured analysis frameworks.

investigate-incident

Systematic incident investigation workflow.

Use for: Active production incidents requiring thorough investigation
Parameters:

  • service_name (optional) - Service to investigate
  • time_period (default: "1h") - Investigation time window
  • severity_threshold (default: "error") - Minimum severity

Example:

Use the investigate-incident prompt for api-gateway service

Workflow: Recent errors → Pattern analysis → Trace investigation → Health checks → Summary with recommendations

health-check-report

Generate comprehensive health status report.

Use for: Daily health checks, system status overviews
Parameters:

  • time_period (default: "24h") - Error statistics period
  • include_metrics (default: true) - Include detailed metrics

Example:

Generate a health check report

Workflow: Provider health → Error analysis → Service catalog → Active traces → Recommendations

post-deployment-check

Validate deployment health by comparing before/after metrics.

Use for: Post-deployment validation, CI/CD pipelines
Parameters:

  • service_name (required) - Deployed service name
  • deployment_time (optional) - When deployment occurred
  • lookback_minutes (default: 30) - Baseline comparison period

Example:

Run a post-deployment check for user-service

Workflow: Current errors → Baseline comparison → New error detection → Trace analysis → Health recommendation (PROCEED/MONITOR/ROLLBACK)
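
The final PROCEED/MONITOR/ROLLBACK verdict could be approximated with a toy decision rule like the one below. The thresholds are assumptions for illustration; the prompt's real analysis is performed by the model, not by fixed cutoffs:

```python
def deployment_recommendation(baseline_errors: int, current_errors: int,
                              new_error_types: int) -> str:
    # Toy thresholds (assumptions): any brand-new error type, or a 2x jump
    # in error volume, argues for a rollback; any increase warrants monitoring.
    if new_error_types > 0 or current_errors > 2 * max(baseline_errors, 1):
        return "ROLLBACK"
    if current_errors > baseline_errors:
        return "MONITOR"
    return "PROCEED"

print(deployment_recommendation(baseline_errors=5, current_errors=4,
                                new_error_types=0))  # PROCEED
```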

trace-flow-analysis

Analyze distributed trace execution flow and timing.

Use for: Debugging distributed systems, understanding request flow
Parameters:

  • trace_id (required) - Trace ID to analyze
  • include_timing (default: true) - Include timing breakdown

Example:

Analyze trace flow for abc123-def456

Workflow: Timeline construction → Service chain mapping → Timing analysis → Error detection → Bottleneck identification → Root cause
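
Timeline construction and timing analysis come down to sorting a trace's log entries by timestamp and measuring the gaps between services. A sketch with made-up entries (the field names are assumptions):

```python
from datetime import datetime

# Hypothetical log entries for one trace, as the query tools might return them.
logs = [
    {"service": "billing", "timestamp": "2026-02-13T10:00:00.450"},
    {"service": "api-gateway", "timestamp": "2026-02-13T10:00:00.000"},
    {"service": "user-service", "timestamp": "2026-02-13T10:00:00.120"},
]

# Order by timestamp, then report the gap between consecutive services
# to surface where time is being spent.
timeline = sorted(logs, key=lambda entry: entry["timestamp"])
for prev, cur in zip(timeline, timeline[1:]):
    gap_ms = (datetime.fromisoformat(cur["timestamp"])
              - datetime.fromisoformat(prev["timestamp"])).total_seconds() * 1000
    print(f"{prev['service']} -> {cur['service']}: {gap_ms:.0f} ms")
```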

root-cause-analysis

Deep root cause investigation for complex failures.

Use for: Finding originating causes, cascading failure analysis
Parameters:

  • trace_id (optional) - Specific trace to investigate
  • error_pattern (optional) - Known error pattern
  • time_window (default: "1h") - Investigation window

Example:

Perform root cause analysis for error "database connection timeout"

Workflow: Evidence gathering → Timeline building → Trace flow → Pattern recognition → Root cause formulation → Prevention recommendations

See Prompts README for detailed documentation.

Platform-Specific Configuration

New Relic

  1. Create an API key in New Relic (User > API Keys)
  2. Find your account ID in the URL or account dropdown
  3. Choose region: "US" or "EU"

Azure Application Insights

  1. Create a service principal in Azure AD
  2. Grant "Log Analytics Reader" role to the service principal
  3. Note the workspace ID, client ID, client secret, and tenant ID

Development

Setup Development Environment

# Clone repository
git clone https://github.com/yourusername/mcp-observability-server.git
cd mcp-observability-server

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black src/
ruff check src/

Testing with MCP Inspector

Test the server interactively using the MCP Inspector:

npx @modelcontextprotocol/inspector \
  uv \
  --directory /home/gagan/mcp-observability-server \
  run \
  mcp-observability \
  /home/gagan/mcp-observability-server/config.yaml

Or using the Python module directly:

npx @modelcontextprotocol/inspector \
  uv \
  --directory /home/gagan/mcp-observability-server \
  run \
  python \
  -m \
  mcp_observability.server \
  /home/gagan/mcp-observability-server/config.yaml

Project Structure

mcp-observability-server/
├── src/
│   └── mcp_observability/
│       ├── __init__.py
│       ├── server.py            # Main MCP server
│       ├── models.py            # Data models
│       ├── utils.py             # Utilities
│       ├── prompts/             # Guided workflow prompts
│       │   ├── __init__.py
│       │   ├── incident.py      # Incident investigation prompts
│       │   ├── health.py        # Health monitoring prompts
│       │   ├── deployment.py    # Deployment validation prompts
│       │   ├── trace_analysis.py # Trace flow analysis prompts
│       │   └── README.md        # Prompts documentation
│       └── providers/
│           ├── base.py          # Abstract base
│           ├── newrelic.py
│           └── azure.py
├── tests/
├── config.yaml.example
├── .env.example
└── pyproject.toml

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=mcp_observability

# Run specific test file
pytest tests/test_providers.py

Logging

The server includes comprehensive logging to help with debugging and monitoring:

Configure Log Level:

Set the MCP_LOG_LEVEL environment variable:

# In your .env file or environment
export MCP_LOG_LEVEL=DEBUG  # Options: DEBUG, INFO, WARNING, ERROR, CRITICAL

Log Levels:

  • DEBUG - Detailed diagnostic information (queries, parameters, API calls)
  • INFO - General informational messages (default)
  • WARNING - Warning messages for potential issues
  • ERROR - Error messages for failures
  • CRITICAL - Critical issues that prevent operation
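
A sketch of how `MCP_LOG_LEVEL` could map onto the standard `logging` setup; the server's own bootstrap may configure handlers differently, but the format below matches the example output further down:

```python
import logging
import os

# Resolve MCP_LOG_LEVEL to a logging constant, defaulting to INFO
# for missing or unrecognized values.
level_name = os.environ.get("MCP_LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)

logging.basicConfig(
    level=level,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("mcp_observability.server")
logger.info("Starting MCP Observability Server")
```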

What Gets Logged:

  • Server initialization and configuration loading
  • Provider initialization and health checks
  • Tool invocations with parameters
  • Query execution and results
  • API calls to observability platforms
  • Errors and exceptions with stack traces

Example Log Output:

2026-02-13 10:30:15 - mcp_observability.server - INFO - Starting MCP Observability Server
2026-02-13 10:30:15 - mcp_observability.utils - INFO - Loading config from: config.yaml
2026-02-13 10:30:15 - mcp_observability.providers.newrelic - INFO - New Relic provider initialized for region: US
2026-02-13 10:30:20 - mcp_observability.server - INFO - Tool called: query_logs
2026-02-13 10:30:21 - mcp_observability.providers.newrelic - INFO - New Relic query returned 42 log(s)

Troubleshooting

Common Issues

"Provider unhealthy" in health check

  • Verify credentials are correct in config.yaml
  • Check environment variables are set
  • Ensure network connectivity to provider API

"No logs found"

  • Verify time range includes the period you're interested in
  • Check that log groups/workspaces are configured correctly
  • Ensure services are actually logging during the time period

Azure credentials error

  • Verify the service principal has the "Log Analytics Reader" role on the workspace
  • Check that the client ID, client secret, and tenant ID in .env are correct
  • Confirm the workspace ID matches the Log Analytics workspace you expect

Timeout errors

  • Increase timeout_seconds in provider config
  • Reduce limit to fetch fewer results
  • Check network connectivity

Enable debug logging

  • Set MCP_LOG_LEVEL=DEBUG in your environment or .env file
  • Check logs for detailed query information and API responses
  • Review stack traces for error details

Performance Tips

  1. Target specific providers - Use the providers parameter instead of querying every platform
  2. Use time ranges wisely - Shorter time ranges return faster
  3. Limit results - Start with smaller limits and increase if needed
  4. Filter by service - Reduces the data scanned across all platforms
  5. Use get_recent_errors - An optimized query for error investigation

Security Best Practices

  1. Never commit credentials - Use environment variables or secrets manager
  2. Rotate keys regularly - Set up key rotation for all platforms
  3. Principle of least privilege - Grant only read permissions needed
  4. Audit access - Monitor who's using the MCP server
  5. Secure config files - Restrict file permissions on config.yaml

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Roadmap

  • Support for more providers (Splunk, Elastic, Grafana Loki)
  • Advanced query builders
  • Log analytics and pattern detection
  • Alerting integration
  • Performance metrics collection
  • Custom query templates
  • Multi-account support per provider

Acknowledgments

Built with the Model Context Protocol by Anthropic.
