Observability and APM toolkit for MCP servers
MCP Observatory
MCP Observatory now includes a two-phase execution pattern for high-risk MCP tool calls, with a generic proposer/verifier wrapper that can be reused by any tool:
- PROPOSE: plan/simulate, evaluate uncertainty/integrity, no side effects.
- COMMIT: execute side effects only when a signed commit token is valid.
Two-Phase Sequence (Text Diagram)
Client
-> transfer_funds_propose(amount,to)
-> scoring(output_instability, numeric_variance, prompt_drift)
-> decision:
- blocked: deterministic fallback (create_draft), no side effects
- allowed: issue signed commit_token bound to tool args hash
<- {proposal_id, commit_token?}
Client
-> transfer_funds_commit(proposal_id, commit_token, amount, to)
-> verify signature + expiry + proposal existence + args_hash binding + nonce replay
-> if valid: perform side effect (funds transfer)
-> else: block with explicit reason
<- commit outcome
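The scoring step in the sequence above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the word-level tokenization, the clamping of the variance to 1.0, the binary drift signal, and the handling of missing signals are all assumptions. Only the signal names and the definition output_instability = 1 - jaccard_similarity come from this README.

```python
import re
from statistics import pvariance
from typing import Dict, List, Optional


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (assumed tokenization)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty outputs are treated as identical
    return len(sa & sb) / len(sa | sb)


def output_instability(a: str, b: str) -> float:
    """Instability between two sampled outputs: 1 - Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


def numeric_variance(samples: List[str]) -> float:
    """Population variance of all numbers extracted from the samples.

    Clamped to 1.0 so it can be mixed with the other signals (an assumption).
    """
    numbers = [float(n) for s in samples for n in re.findall(r"-?\d+(?:\.\d+)?", s)]
    if len(numbers) < 2:
        return 0.0
    return min(1.0, pvariance(numbers))


def prompt_drift(prompt_hash: str, baseline_hash: Optional[str]) -> Optional[float]:
    """0.0 if the hash matches the baseline, 1.0 if not, None without a baseline."""
    if baseline_hash is None:
        return None
    return 0.0 if prompt_hash == baseline_hash else 1.0


def composite_score(signals: Dict[str, Optional[float]], weights: Dict[str, float]) -> float:
    """Weighted mean over the available signals.

    Weights of missing (None) signals are dropped and the rest renormalized.
    """
    available = {k: v for k, v in signals.items() if v is not None}
    total = sum(weights[k] for k in available)
    if total == 0:
        return 0.0
    return sum(weights[k] * v for k, v in available.items()) / total
```

Renormalizing over the available signals keeps the score on the same scale when, for example, no prompt baseline exists yet. The threshold that turns the score into a blocked or allowed decision is not shown here.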
New Modules
- mcp_observatory/proposal_commit/hashing.py
  - canonical JSON hashing for stable tool_args_hash
  - normalized prompt_hash
- mcp_observatory/proposal_commit/scoring.py
  - output_instability = 1 - jaccard_similarity
  - numeric_variance from extracted numbers
  - prompt_drift from prompt hash vs baseline
  - weighted renormalized composite_score
  - demo model_generate(prompt, temperature) stub
- mcp_observatory/proposal_commit/token.py
  - HMAC-SHA256 token issue/verify
  - payload fields: token_id, proposal_id, tool_name, tool_args_hash, issued_at, expires_at, nonce, composite_score
- mcp_observatory/proposal_commit/proposer.py
  - generic ToolProposer.propose(...) for any tool name/args
  - deterministic blocked fallback
- mcp_observatory/proposal_commit/verifier.py
  - commit verification and nonce replay protection
- mcp_observatory/proposal_commit/storage.py
  - in-memory storage fallback
  - optional Postgres storage via asyncpg
- mcp_observatory/core/wrapper_api.py
  - generic InvocationWrapperAPI for wrapping either agent or model invocations
  - captures input/output hashes, token/cost metrics, and emits allow/review/block decisions
- mcp_observatory/instrument.py
  - adds instrument_wrapper_api(...) helper for fast wrapper setup
- mcp_observatory/demo/server.py
  - MCP-like tools: transfer_funds_propose, transfer_funds_commit
- mcp_observatory/demo/run_demo.py
  - propose -> commit -> replay-attempt demo
- sql/schema.sql
  - Postgres tables: proposals, commits, nonces, tool_prompt_baselines
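As an illustration of what canonical JSON hashing for a stable tool_args_hash and a normalized prompt_hash might look like, here is a small sketch. The function names mirror the terms used in the module list, but the bodies are assumptions: SHA-256 as the digest, sorted keys with compact separators as the canonical form, and whitespace collapsing plus lowercasing as the prompt normalization are illustrative choices, not a description of hashing.py.

```python
import hashlib
import json


def canonical_json(value) -> str:
    """Serialize with sorted keys and fixed separators so equal data gives equal text."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def tool_args_hash(tool_name: str, args: dict) -> str:
    """Hash the tool name together with its arguments in canonical form."""
    payload = canonical_json({"tool": tool_name, "args": args})
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def prompt_hash(prompt: str) -> str:
    """Hash a prompt after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = " ".join(prompt.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Key order no longer affects the hash, so a client that rebuilds the arguments at commit time gets the same digest. Value types still matter: 100 and 100.0 serialize differently and would produce different hashes under this scheme.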
Security / Verification Rules
Commit verifies all of the following:
- token signature is valid (bad_signature on failure)
- token not expired (expired)
- proposal exists and was allowed (unknown_proposal)
- commit args hash equals token payload args hash (args_hash_mismatch)
- nonce has not already been used (nonce_replay)
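The five checks above could be implemented along these lines. This is a hedged sketch rather than the package's verifier: issue_token, verify_commit, the SECRET constant, the token wire format (base64 payload, a dot, then a hex HMAC), and the in-memory sets are all hypothetical, and the payload carries only a subset of the documented fields. The reason strings are the ones listed above.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional, Set, Tuple

SECRET = b"demo-secret"  # hypothetical key; a real deployment would load it from a secret store


def _sign(body: bytes) -> str:
    """Hex HMAC-SHA256 of the serialized payload."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def issue_token(proposal_id: str, args_hash: str, ttl_s: int = 60,
                now: Optional[float] = None) -> str:
    """Issue a signed token bound to one proposal and one args hash."""
    issued = time.time() if now is None else now
    payload = {
        "proposal_id": proposal_id,
        "tool_args_hash": args_hash,
        "issued_at": issued,
        "expires_at": issued + ttl_s,
        "nonce": secrets.token_hex(16),
    }
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(body).decode("ascii") + "." + _sign(body)


def verify_commit(token: str, proposal_id: str, args_hash: str,
                  allowed_proposals: Set[str], used_nonces: Set[str],
                  now: Optional[float] = None) -> Tuple[bool, str]:
    """Run the checks in order and return (ok, reason)."""
    now = time.time() if now is None else now
    try:
        encoded, signature = token.rsplit(".", 1)
        body = base64.urlsafe_b64decode(encoded.encode("utf-8"))
    except ValueError:  # malformed token or invalid base64
        return False, "bad_signature"
    # Constant-time comparison; the payload is parsed only after it verifies.
    if not hmac.compare_digest(_sign(body).encode("utf-8"), signature.encode("utf-8")):
        return False, "bad_signature"
    payload = json.loads(body)
    if now >= payload["expires_at"]:
        return False, "expired"
    if payload["proposal_id"] != proposal_id or proposal_id not in allowed_proposals:
        return False, "unknown_proposal"
    if payload["tool_args_hash"] != args_hash:
        return False, "args_hash_mismatch"
    if payload["nonce"] in used_nonces:
        return False, "nonce_replay"
    used_nonces.add(payload["nonce"])  # consume the nonce only on success
    return True, "ok"
```

Checking the signature first means no untrusted field is read before the token is authenticated. With a shared store such as Postgres, the nonce check and insert would need to be a single atomic operation to stay replay-safe under concurrency.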
Deterministic Fallback on Proposal Block
Blocked proposal response is deterministic and side-effect free:
{
"status": "blocked",
"action": "create_draft",
"reason": "low_integrity",
"draft": {"tool": "transfer_funds", "amount": 100, "to": "acct_123"}
}
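One way to keep such a response deterministic is to build it purely from the call's inputs, with no clock, randomness, or I/O involved. The helper below is a hypothetical sketch (blocked_fallback is not a documented function of the package) that reproduces the shape shown above.

```python
def blocked_fallback(tool: str, args: dict, reason: str = "low_integrity") -> dict:
    """Build the blocked response from inputs only, so equal inputs give equal output."""
    return {
        "status": "blocked",
        "action": "create_draft",
        "reason": reason,
        "draft": {"tool": tool, **args},
    }
```

Calling blocked_fallback("transfer_funds", {"amount": 100, "to": "acct_123"}) returns a dictionary equal to the JSON example above.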
Running the Demo
Without Postgres (default)
No env vars needed; in-memory store is used.
python -m mcp_observatory.demo.run_demo
With Postgres
- Set DSN:
export MCP_OBSERVATORY_PG_DSN='postgresql://user:pass@localhost:5432/postgres'
- Apply schema:
psql "$MCP_OBSERVATORY_PG_DSN" -f sql/schema.sql
- Run demo:
python -m mcp_observatory.demo.run_demo
Testing
PYTHONPATH=. pytest -q
The suite includes tests for token verification, hash stability, replay protection, and expired-token rejection.
Real-World MCP Scenario Demo (10 End-to-End Flows)
A realistic MCP server example is available at:
- mcp_observatory/demo/real_world_server.py
- executable shim: examples/real_world_mcp_server.py
- executable client: examples/real_world_mcp_client.py
- prompt-to-invocation MVP: examples/prompt_to_mcp_invocation_mvp.py
- OpenAI GPT utility: examples/openai_prompt_to_mcp_invocation.py
It includes:
- 10 distinct prompts mapped to 10 different MCP tool handlers
- per-invocation annotations (e.g. destructiveHint, idempotentHint, openWorldHint)
- proposal/commit execution for HIGH-risk tools (no secondary-response gating)
- irreversible actions never pass a secondary LLM response
- simulated LLM responses and grounding summaries for standard-risk tools
- deterministic fallback routing for blocked/review-required scenarios
- prompt-to-invocation MVP now extracts required tool parameters from user prompts before server invocation
- openai-gpt utility for service selection + parameter extraction + MCP invocation
Run server demo:
python examples/real_world_mcp_server.py
Run client demo (client interacting with server):
python examples/real_world_mcp_client.py
Run prompt -> LLM planner -> server invocation MVP:
python examples/prompt_to_mcp_invocation_mvp.py
Run OpenAI GPT utility (manual, requires OPENAI_API_KEY):
python examples/openai_prompt_to_mcp_invocation.py
Optional (recommended): install OpenAI SDK for first-class client support (client auto-detects and uses SDK when available; otherwise it uses HTTPS fallback):
pip install -e ".[openai]"
Wrapper API (Agent or Model Invocation)
Use the wrapper to route either agent-side orchestration calls or direct model calls through a single observability envelope.
import asyncio

from mcp_observatory.instrument import instrument_wrapper_api


async def main():
    wrapper = instrument_wrapper_api("my-service")
    # invoke() is awaited, so it has to run inside a coroutine
    result = await wrapper.invoke(
        source="agent",
        model="gpt-4o-mini",
        prompt="Generate deployment plan",
        input_payload={"request_id": "abc123", "task": "deployment_plan"},
        call=lambda: {"plan": "blue-green rollout"},
    )
    print(result.decision.action)  # allow/review/block
    print(result.span.cost_usd)


asyncio.run(main())
Dual-run measurement is supported with dual_invoke=True and shadow parameters (shadow_source, shadow_model, shadow_agent_params, shadow_model_params, and shadow_call) to compare alternate execution paths and capture disagreement metrics.
The wrapper output (WrapperResult) includes:
- output: raw callable output
- span: captured telemetry metrics (tokens, cost, hashes, timing)
- decision: policy decision suitable for downstream execution routing
- shadow_output and shadow_span (when dual_invoke=True) with comparison metrics on primary span (shadow_disagreement_score, shadow_numeric_variance)