Observability and APM toolkit for MCP servers

MCP Observatory

MCP Observatory now includes a two-phase execution pattern for high-risk MCP tool calls, with a generic proposer/verifier wrapper that can be reused by any tool:

  1. PROPOSE: plan/simulate, evaluate uncertainty/integrity, no side effects.
  2. COMMIT: execute side effects only when a signed commit token is valid.

Two-Phase Sequence (Text Diagram)

Client
  -> transfer_funds_propose(amount,to)
      -> scoring(output_instability, numeric_variance, prompt_drift)
      -> decision:
          - blocked: deterministic fallback (create_draft), no side effects
          - allowed: issue signed commit_token bound to tool args hash
  <- {proposal_id, commit_token?}

Client
  -> transfer_funds_commit(proposal_id, commit_token, amount, to)
      -> verify signature + expiry + proposal existence + args_hash binding + nonce replay
      -> if valid: perform side effect (funds transfer)
      -> else: block with explicit reason
  <- commit outcome
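
The "allowed" branch above binds the commit token to a hash of the tool arguments. A minimal sketch of that binding follows, using only the standard library. The wire format (`<base64url payload>.<hex signature>`), the `SECRET_KEY` constant, and the function names are assumptions made for illustration; they are not taken from the package's `hashing.py` or `token.py`.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

# Assumption: a demo key; a real deployment would load this from a secret store.
SECRET_KEY = b"demo-secret-change-me"


def canonical_args_hash(tool_args: dict) -> str:
    """Hash tool arguments via canonical JSON (sorted keys, no extra whitespace)."""
    canonical = json.dumps(tool_args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def issue_commit_token(
    proposal_id: str,
    tool_name: str,
    tool_args: dict,
    composite_score: float = 0.0,
    ttl_seconds: int = 60,
) -> str:
    """Return a signed token whose payload carries the args hash (hypothetical format)."""
    now = int(time.time())
    payload = {
        "token_id": secrets.token_hex(8),
        "proposal_id": proposal_id,
        "tool_name": tool_name,
        "tool_args_hash": canonical_args_hash(tool_args),
        "issued_at": now,
        "expires_at": now + ttl_seconds,
        "nonce": secrets.token_hex(16),
        "composite_score": composite_score,
    }
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    body = base64.urlsafe_b64encode(raw).decode("ascii")
    # HMAC-SHA256 over the encoded payload; any change to the payload invalidates it.
    signature = hmac.new(SECRET_KEY, body.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"
```

Because the hash is computed over canonical JSON, the same arguments in a different key order produce the same `tool_args_hash`, so a commit call only passes the binding check when it repeats the proposed arguments exactly.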

New Modules

  • mcp_observatory/proposal_commit/hashing.py
    • canonical JSON hashing for stable tool_args_hash
    • normalized prompt_hash
  • mcp_observatory/proposal_commit/scoring.py
    • output_instability = 1 - jaccard_similarity
    • numeric_variance from extracted numbers
    • prompt_drift from prompt hash vs baseline
    • weighted renormalized composite_score
    • demo model_generate(prompt, temperature) stub
  • mcp_observatory/proposal_commit/token.py
    • HMAC-SHA256 token issue/verify
    • payload fields: token_id, proposal_id, tool_name, tool_args_hash, issued_at, expires_at, nonce, composite_score
  • mcp_observatory/proposal_commit/proposer.py
    • generic ToolProposer.propose(...) for any tool name/args
    • deterministic blocked fallback
  • mcp_observatory/proposal_commit/verifier.py
    • commit verification and nonce replay protection
  • mcp_observatory/proposal_commit/storage.py
    • in-memory storage fallback
    • optional Postgres storage via asyncpg
  • mcp_observatory/core/wrapper_api.py
    • generic InvocationWrapperAPI for wrapping either agent or model invocations
    • captures input/output hashes, token/cost metrics, and emits allow/review/block decisions
  • mcp_observatory/instrument.py
    • adds instrument_wrapper_api(...) helper for fast wrapper setup
  • mcp_observatory/demo/server.py
    • MCP-like tools:
      • transfer_funds_propose
      • transfer_funds_commit
  • mcp_observatory/demo/run_demo.py
    • propose -> commit -> replay-attempt demo
  • sql/schema.sql
    • Postgres tables: proposals, commits, nonces, tool_prompt_baselines
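
The signals listed for `scoring.py` above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: whitespace tokenization for the Jaccard similarity, a simple regex for number extraction, population variance, and "renormalized" read as dividing by the sum of the weights of the signals that are actually present.

```python
import re
import statistics
from typing import Dict, List, Optional


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (assumed tokenization)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty outputs are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def output_instability(a: str, b: str) -> float:
    """output_instability = 1 - jaccard_similarity, as listed above."""
    return 1.0 - jaccard_similarity(a, b)


def numeric_variance(outputs: List[str]) -> float:
    """Population variance of all numbers found in the outputs (assumed definition)."""
    numbers = [
        float(match)
        for text in outputs
        for match in re.findall(r"-?\d+(?:\.\d+)?", text)
    ]
    return statistics.pvariance(numbers) if len(numbers) > 1 else 0.0


def composite_score(
    signals: Dict[str, Optional[float]], weights: Dict[str, float]
) -> float:
    """Weighted mean over available signals, renormalizing weights when some are missing."""
    present = {k: v for k, v in signals.items() if v is not None and k in weights}
    total_weight = sum(weights[k] for k in present)
    if total_weight == 0:
        return 0.0
    return sum(weights[k] * v for k, v in present.items()) / total_weight
```

With this reading of renormalization, a missing signal (for example, no prompt baseline yet) does not drag the composite toward zero; the remaining weights are simply rescaled to sum to one.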

Security / Verification Rules

Commit verifies all of the following:

  • token signature is valid (bad_signature on failure)
  • token not expired (expired)
  • proposal exists and was allowed (unknown_proposal)
  • commit args hash equals token payload args hash (args_hash_mismatch)
  • nonce has not already been used (nonce_replay)
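
The five checks above can be sketched as a single function that returns the first failing reason code. The token format (`<base64url payload>.<hex signature>`), the helper names, the check order, and the in-memory sets standing in for storage are assumptions for illustration only; the package's `verifier.py` may differ.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Set

SECRET_KEY = b"demo-secret-change-me"  # assumption: demo key only


def args_hash(tool_args: dict) -> str:
    """SHA-256 over canonical JSON of the tool arguments."""
    canonical = json.dumps(tool_args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _signature(body: str) -> str:
    return hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_token(payload: dict) -> str:
    """Hypothetical issuer used here only to produce tokens for the verifier."""
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    body = base64.urlsafe_b64encode(raw).decode("ascii")
    return f"{body}.{_signature(body)}"


def verify_commit(
    token: str,
    tool_args: dict,
    allowed_proposals: Set[str],
    used_nonces: Set[str],
    now: Optional[float] = None,
) -> str:
    """Return "ok" or the first failing reason code from the list above."""
    body, _, signature = token.partition(".")
    # Constant-time comparison; the payload is only decoded after it is authenticated.
    if not hmac.compare_digest(_signature(body).encode(), signature.encode()):
        return "bad_signature"
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if current >= payload["expires_at"]:
        return "expired"
    if payload["proposal_id"] not in allowed_proposals:
        return "unknown_proposal"
    if args_hash(tool_args) != payload["tool_args_hash"]:
        return "args_hash_mismatch"
    if payload["nonce"] in used_nonces:
        return "nonce_replay"
    used_nonces.add(payload["nonce"])  # consume the nonce only on success
    return "ok"
```

In this sketch the nonce is recorded only after every other check passes, so a rejected commit does not burn a token that could still be used with the correct arguments.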

Deterministic Fallback on Proposal Block

Blocked proposal response is deterministic and side-effect free:

{
  "status": "blocked",
  "action": "create_draft",
  "reason": "low_integrity",
  "draft": {"tool": "transfer_funds", "amount": 100, "to": "acct_123"}
}
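
One way to produce a response of this shape is a pure function of the tool name, arguments, and reason, with no randomness, timestamps, or I/O. The function name `blocked_fallback` is hypothetical and not taken from the package.

```python
def blocked_fallback(tool_name: str, tool_args: dict, reason: str = "low_integrity") -> dict:
    """Build the blocked response; the same inputs always yield the same output."""
    return {
        "status": "blocked",
        "action": "create_draft",
        "reason": reason,
        # Note: an argument literally named "tool" would overwrite the tool name here.
        "draft": {"tool": tool_name, **tool_args},
    }


response = blocked_fallback("transfer_funds", {"amount": 100, "to": "acct_123"})
```

Since nothing is executed and nothing is stored, a client can retry a blocked proposal safely and will see an identical draft each time.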

Running the Demo

Without Postgres (default)

No environment variables are needed; the in-memory store is used.

python -m mcp_observatory.demo.run_demo

With Postgres

  1. Set DSN:
export MCP_OBSERVATORY_PG_DSN='postgresql://user:pass@localhost:5432/postgres'
  2. Apply schema:
psql "$MCP_OBSERVATORY_PG_DSN" -f sql/schema.sql
  3. Run demo:
python -m mcp_observatory.demo.run_demo
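
The two modes above suggest that the storage backend is chosen from the environment. The sketch below shows one plausible selection rule; the function name and the rule itself (any non-empty `MCP_OBSERVATORY_PG_DSN` selects Postgres) are assumptions, not the package's actual logic.

```python
import os
from typing import Mapping, Optional


def select_storage_backend(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return "postgres" when a DSN is configured, otherwise "memory"."""
    env = os.environ if environ is None else environ
    dsn = env.get("MCP_OBSERVATORY_PG_DSN", "").strip()
    return "postgres" if dsn else "memory"
```

Passing the environment as a parameter keeps the choice easy to test without mutating `os.environ`.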

Testing

PYTHONPATH=. pytest -q

The suite includes tests for token verification, hash stability, replay protection, and expired-token rejection.

Real-World MCP Scenario Demo (10 End-to-End Flows)

A realistic MCP server example is available at:

  • mcp_observatory/demo/real_world_server.py
  • executable shim: examples/real_world_mcp_server.py
  • executable client: examples/real_world_mcp_client.py
  • prompt-to-invocation MVP: examples/prompt_to_mcp_invocation_mvp.py
  • OpenAI GPT utility: examples/openai_prompt_to_mcp_invocation.py

It includes:

  • 10 distinct prompts mapped to 10 different MCP tool handlers
  • per-invocation annotations (e.g. destructiveHint, idempotentHint, openWorldHint)
  • proposal/commit execution for HIGH-risk tools (no secondary-response gating)
  • irreversible actions are never gated on a secondary LLM response
  • simulated LLM responses and grounding summaries for standard-risk tools
  • deterministic fallback routing for blocked/review-required scenarios
  • prompt-to-invocation MVP now extracts required tool parameters from user prompts before server invocation
  • OpenAI GPT utility for service selection, parameter extraction, and MCP invocation
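
The annotation hints in the list above can drive the choice between the proposal/commit path and the standard path. The policy below is a hypothetical sketch, not the demo server's actual rule: a tool is treated as high-risk unless it is explicitly read-only or explicitly non-destructive, and the fail-closed default for a missing `destructiveHint` is an assumption of this example.

```python
def requires_proposal_commit(annotations: dict) -> bool:
    """Decide whether a tool call should go through propose/commit (assumed policy)."""
    if annotations.get("readOnlyHint", False):
        return False  # read-only tools have no side effects to guard
    # Fail closed: a tool that does not declare destructiveHint is treated as destructive.
    return bool(annotations.get("destructiveHint", True))


def execution_path(annotations: dict) -> str:
    """Map a tool's annotations to one of two hypothetical route names."""
    return "proposal_commit" if requires_proposal_commit(annotations) else "standard"
```

For example, `execution_path({"destructiveHint": True, "idempotentHint": False})` selects the proposal/commit route, while a tool annotated with `readOnlyHint` takes the standard route.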

Run server demo:

python examples/real_world_mcp_server.py

Run client demo (client interacting with server):

python examples/real_world_mcp_client.py

Run prompt -> LLM planner -> server invocation MVP:

python examples/prompt_to_mcp_invocation_mvp.py

Run OpenAI GPT utility (manual, requires OPENAI_API_KEY):

python examples/openai_prompt_to_mcp_invocation.py

Optional (recommended): install the OpenAI SDK for first-class client support. The client auto-detects the SDK and uses it when available; otherwise it falls back to plain HTTPS requests:

pip install -e .[openai]

Wrapper API (Agent or Model Invocation)

Use the wrapper to route either agent-side orchestration calls or direct model calls through a single observability envelope.

import asyncio

from mcp_observatory.instrument import instrument_wrapper_api

wrapper = instrument_wrapper_api("my-service")


async def main() -> None:
    # invoke() is awaited, so it has to run inside a coroutine.
    result = await wrapper.invoke(
        source="agent",
        model="gpt-4o-mini",
        prompt="Generate deployment plan",
        input_payload={"request_id": "abc123", "task": "deployment_plan"},
        call=lambda: {"plan": "blue-green rollout"},
    )

    print(result.decision.action)  # allow/review/block
    print(result.span.cost_usd)


asyncio.run(main())

Dual-run measurement is supported with dual_invoke=True and shadow parameters (shadow_source, shadow_model, shadow_agent_params, shadow_model_params, and shadow_call) to compare alternate execution paths and capture disagreement metrics.

The wrapper output (WrapperResult) includes:

  • output: raw callable output
  • span: captured telemetry metrics (tokens, cost, hashes, timing)
  • decision: policy decision suitable for downstream execution routing
  • shadow_output and shadow_span (when dual_invoke=True) with comparison metrics on primary span (shadow_disagreement_score, shadow_numeric_variance)
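
Since the decision is meant for downstream routing, a caller might branch on `decision.action` as sketched below. The `Decision` and `Result` dataclasses are simplified stand-ins defined only for this example (they are not the package's `WrapperResult`), and the route names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Decision:
    action: str  # "allow", "review", or "block"


@dataclass
class Result:
    output: Any
    decision: Decision


def route(result: Result) -> str:
    """Pick a downstream route from the policy decision, failing closed."""
    action = result.decision.action
    if action == "allow":
        return "execute"
    if action == "review":
        return "queue_for_review"
    # "block" and any unrecognized action are both rejected.
    return "reject"
```

Treating unknown actions like `block` keeps the caller safe if new decision values are introduced later.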
