Model Collector Agent SDK for OpenTelemetry monitoring

MCA SDK - Model Collector Agent

Production-ready OpenTelemetry SDK for healthcare ML model monitoring. Provides comprehensive instrumentation for Predictive ML, Generative AI, and Agentic AI models with HIPAA-compliant telemetry collection, centralized configuration management, and enterprise security features.

Architecture

┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Internal     │  │  Vendor API  │  │  GenAI       │  │  E2E Tests   │
│ Model        │  │  (FastAPI)   │  │  Assistant   │  │  (pytest)    │
│ (Py + SDK)   │  │              │  │  (LiteLLM)   │  │              │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                  │                  │                  │
       │ OTLP/HTTP        │ Custom JSON      │ OTLP/HTTP        │ OTLP/HTTP
       │ :4318            │                  │ :4318            │ :4318
       │                  ▼                  │                  │
       │         ┌─────────────────┐         │                  │
       │         │ Vendor Bridge   │         │                  │
       │         │ (Polling 30s)   │         │                  │
       │         │ JSON→OTLP       │         │                  │
       │         └────────┬────────┘         │                  │
       │                  │ OTLP/HTTP        │                  │
       │                  │ :4318            │                  │
       ▼                  ▼                  ▼                  ▼
    ┌──────────────────────────────────────────────────────────────┐
    │           OpenTelemetry Collector (Port 4318)                │
    │                                                               │
    │  ┌──────────┐    ┌────────────────┐    ┌──────────────┐     │
    │  │  Batch   │ →  │  Attributes    │ →  │    Debug     │     │
    │  │Processor │    │  Processor     │    │  Exporter    │     │
    │  │(10s/100) │    │(region, env)   │    │  (stdout)    │     │
    │  └──────────┘    └────────────────┘    └──────────────┘     │
    └──────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
                            Docker Logs / Console
                         (Simulates GCP Backend)

Using GCP Development Infrastructure

The MCA SDK supports testing against real GCP services (Cloud Logging, Cloud Trace, Prometheus) before deploying to production. The GCP dev environment provides a shared OpenTelemetry Collector at 10.164.76.8 (internal VPC) that exports to GCP project bhsf-mca-dev.

Quick Start

Docker Compose:

cd mca-prototype
docker-compose -f docker-compose.yml -f docker-compose.gcp-dev.yml up internal-model

Local Python:

export MCA_COLLECTOR_ENDPOINT=http://10.164.76.8:4318
export MCA_COLLECTOR_PROTOCOL=http
export MCA_ALLOW_INSECURE_COLLECTOR=true
export MCA_SERVICE_NAME=your-model
export MCA_MODEL_ID=your-model-id
export MCA_TEAM_NAME=your-team
python your_model.py

Environment Variables

Variable                     | Value for GCP Dev       | Description
MCA_COLLECTOR_ENDPOINT       | http://10.164.76.8:4318 | GCP dev collector HTTP endpoint
MCA_COLLECTOR_PROTOCOL       | http                    | Protocol (http or grpc)
MCA_ALLOW_INSECURE_COLLECTOR | true                    | Allow HTTP for internal endpoint

Verification

Check your telemetry in the GCP Console (Cloud Logging and Cloud Trace, project bhsf-mca-dev), or use the gcloud CLI:

gcloud logging read "logName=projects/bhsf-mca-dev/logs/emms-model-telemetry" --limit=10
gcloud trace list --project=bhsf-mca-dev --limit=10

Prerequisites

  • VPC Access: The collector endpoint is internal (10.164.76.8). Use VPN if testing from outside the VPC.
  • Connectivity Test: curl http://10.164.76.8:4318/ should succeed
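
The connectivity test can also be scripted. The following is a minimal sketch using only the Python standard library. `can_reach` is a hypothetical helper name, not part of the SDK, and it only confirms that a TCP connection to the port opens, not that the collector accepts OTLP traffic.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

From a machine with VPC access, `can_reach("10.164.76.8", 4318)` should return True. A False result usually points at VPN or firewall configuration rather than at the SDK.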

For complete setup instructions, troubleshooting, and Kubernetes deployment, see the GCP Dev Environment Guide.

Installation

From Source (Development)

If you are developing or testing the SDK locally:

# Clone the repository
git clone <repository-url>
cd sdk/mca-prototype

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package in editable mode with development dependencies
pip install -e ".[dev]"

From PyPI (Recommended for usage)

Install the MCA SDK from PyPI:

pip install mca-sdk

Zero-Code Auto-Instrumentation (Easiest Integration)

If you want to instrument your models and application immediately without altering your source code, you can use our zero-code CLI wrapper (mca-instrument).

1. Install with instrumentation dependencies:

pip install "mca-sdk[instrument]"
opentelemetry-bootstrap -a install  # Installs framework-specific instrumentors (FastAPI, requests, etc.)

2. Run your application using the wrapper:

export MCA_MODEL_ID=your-model-id
export MCA_TEAM_NAME=clinical-ai
mca-instrument -- python your_model_script.py

This automatically captures HTTP requests, database calls, and standard ML framework predictions (when combined with autolog()).

With Optional Dependencies

Install with specific optional dependency groups:

# For GenAI/LLM monitoring (includes LiteLLM)
pip install mca-sdk[genai]

# For zero-code instrumentation (includes OpenTelemetry auto-instrumentation)
pip install mca-sdk[instrument]

# For vendor integration (includes requests for Model Registry)
pip install mca-sdk[vendor]

# For development (includes pytest, black, mypy, etc.)
pip install mca-sdk[dev]

# All optional dependencies
pip install mca-sdk[all]

Development Setup (Automated)

For developers contributing to the MCA SDK, we provide a complete automated setup:

# Run the automated setup script and install all dependencies
make dev-setup

This initializes the virtual environment, installs dependencies, sets up pre-commit hooks, and builds required Docker containers.

Version Pinning

Pin to a specific version for production deployments:

# Install exact version
pip install mca-sdk==0.6.7

# Install with version constraints
pip install "mca-sdk>=0.6.7,<1.0.0"

Verify Installation

# Check installed version
pip show mca-sdk

# Test import
python -c "from mca_sdk import MCAClient; print('MCA SDK installed successfully')"

Getting Help from AI Coding Assistants

If you're using an AI coding assistant (Claude, ChatGPT, etc.) to help integrate the SDK:

# AI assistants can read this guide included in the package
import mca_sdk
import os
guide = os.path.join(os.path.dirname(mca_sdk.__file__), 'FOR_AI_ASSISTANTS.md')
print(open(guide).read())

The guide provides simple 3-step instructions for AI assistants to help you:

  1. Analyze your code to find prediction functions
  2. Apply the appropriate pattern (decorator for ML models, callback for LLMs)
  3. Help set up environment variables

Troubleshooting

Import Error after installation:

  • Verify installation: pip show mca-sdk
  • Check Python version: python --version (requires Python 3.10+)
  • Test import: python -c "from mca_sdk import MCAClient"

Dependency conflicts:

  • Use a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install mca-sdk
    
  • Clear pip cache: pip cache purge

OpenTelemetry version conflicts:

  • The SDK requires opentelemetry-sdk>=1.20.0
  • Check installed versions: pip list | grep opentelemetry
  • Upgrade if needed: pip install --upgrade mca-sdk
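
Where `grep` is not available (for example on Windows), the installed OpenTelemetry versions can be listed from Python instead. This is a sketch built on the standard library's `importlib.metadata`. The function name `installed_opentelemetry_packages` is made up for this example.

```python
from importlib import metadata


def installed_opentelemetry_packages() -> dict[str, str]:
    """Map each installed distribution whose name starts with 'opentelemetry' to its version."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith("opentelemetry"):
            found[name] = dist.version
    return found


# Print one "name==version" line per package, similar to pip's output.
for name, version in sorted(installed_opentelemetry_packages().items()):
    print(f"{name}=={version}")
```

Compare the reported opentelemetry-sdk version against the 1.20.0 minimum stated above.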

Quick Start

Prerequisites

  • Docker and Docker Compose installed
  • Python 3.10+ (for running tests and standalone examples)
  • No GCP account needed (uses debug exporter)

Step 1: Start the Stack

cd mca-prototype
docker-compose up

Expected output indicators:

  • mca-otel-collector container starts and shows collector startup
  • mca-vendor-api shows FastAPI startup on port 8080
  • mca-vendor-bridge begins polling and exporting metrics every 30s

Step 2: Run the Demo Model

Option A: Using PyPI Package (Recommended)

In another terminal, install mca-sdk from PyPI and run the standalone demo:

# Install package
pip install mca-sdk

# Run standalone demo (can be executed from any directory)
python mca-prototype/sdk-examples/internal-model/instrumented_model.py

Option B: Development Mode

Install dependencies and run example from repository:

cd mca-prototype
pip install -r sdk-examples/internal-model/requirements.txt
python sdk-examples/internal-model/instrumented_model.py

Expected behavior (both options):

  • Runs predictions with 1-second intervals
  • Prints prediction latency for each iteration
  • Sends metrics, logs, and traces to collector
  • Flushes all telemetry at completion

Step 3: Observe the Collector Logs

Look for output in the collector terminal showing received telemetry:

Metrics from Internal Model:

ResourceMetrics #0
Resource attributes:
     -> service.name: Str(demo-readmission-model)
     -> model.id: Str(mdl-001)
     -> gcp.region: Str(us-central1)        ← Added by collector
     -> environment: Str(prototype)         ← Added by collector
Metric #0
     -> Name: model_predictions_total
     -> Value: 10

Metrics from Vendor API (appears every 30 seconds):

Resource attributes:
     -> service.name: Str(vendor-sepsis-v2)
     -> model.type: Str(vendor)
     -> gcp.region: Str(us-central1)        ← Added by collector
     -> environment: Str(prototype)         ← Added by collector
Metric #0
     -> Name: model.accuracy
     -> Value: 0.89
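
To make the JSON→OTLP step concrete, here is a rough sketch of the kind of mapping the Vendor Bridge performs. The vendor's JSON format is not documented here, so the payload fields `model_name` and `metrics` are hypothetical, as is the function name. The output is a plain dictionary rather than real OTLP objects.

```python
from typing import Any, Dict, List


def vendor_json_to_metrics(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Split a (hypothetical) vendor payload into resource attributes and metric points."""
    resource = {
        "service.name": payload["model_name"],  # hypothetical field name
        "model.type": "vendor",
    }
    points: List[Dict[str, Any]] = [
        {"name": f"model.{key}", "value": float(value)}
        for key, value in payload.get("metrics", {}).items()  # hypothetical field name
    ]
    return {"resource": resource, "metrics": points}
```

Under these assumptions, a payload such as `{"model_name": "vendor-sepsis-v2", "metrics": {"accuracy": 0.89}}` maps to the service name and `model.accuracy` value shown in the collector output above.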

Traces from Internal Model:

Span #0
     -> Name: model.predict
     -> Attributes:
          -> model.id: Str(mdl-001)
          -> prediction_id: Str(pred-1234)

Step 4: Run E2E Tests

# Collector must be running from Step 1
cd mca-prototype
pip install -r requirements.txt
pytest tests/integration/test_e2e_flow.py -v -s

Expected output:

  • Health check passes
  • Counter metric test sends value 42, verifies in logs (waits 12s for batch timeout)
  • Histogram test sends 5 values, verifies in logs
  • Attribute enrichment test confirms gcp.region and environment added
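
The 12-second wait follows from the collector's batch settings (10s/100): a single data point stays queued until either 100 items accumulate or the 10-second timeout expires. The toy class below sketches that size-or-timeout rule. `Batcher` is a hypothetical illustration with an injectable clock, not the collector's implementation.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Toy size-or-timeout batcher: flush at `max_size` items or after `timeout_s` seconds."""

    def __init__(self, max_size: int = 100, timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.timeout_s = timeout_s
        self.clock = clock
        self.items: List[object] = []
        self.first_item_at: Optional[float] = None

    def add(self, item: object) -> Optional[List[object]]:
        """Queue an item; return the flushed batch if the size limit was reached."""
        if not self.items:
            self.first_item_at = self.clock()
        self.items.append(item)
        if len(self.items) >= self.max_size:
            return self._flush()
        return None

    def tick(self) -> Optional[List[object]]:
        """Call periodically; return the batch once the oldest item has waited `timeout_s`."""
        if self.items and self.clock() - self.first_item_at >= self.timeout_s:
            return self._flush()
        return None

    def _flush(self) -> List[object]:
        batch, self.items, self.first_item_at = self.items, [], None
        return batch
```

With one queued value (such as the counter's 42), `tick()` returns nothing until 10 seconds have passed, which is why the test allows slightly longer than the batch timeout before checking the logs.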

Step 5: Run Unit Tests

pytest tests/integration/test_sdk_integration.py -v

Expected: ~20 tests pass covering provider initialization, metric operations, graceful failure handling, and resource attribute propagation

SDK Features

The MCA SDK provides comprehensive instrumentation capabilities:

  • CLI Wrappers - Zero-code instrumentation via mca-instrument and mca-run (NEW!)
  • autolog() - Zero-code instrumentation for ML frameworks
  • @predict() Decorator - Automatic prediction function instrumentation
  • LiteLLM Integration - Native callback for GenAI/LLM monitoring
  • Agentic AI Support - Goal tracking and tool execution monitoring
  • Registry Integration - Centralized configuration management
  • Security - Queue encryption, certificate management, GCP authentication
  • Resilience - Circuit breakers, retry logic, graceful degradation
  • Buffering - Dead Letter Queue (DLQ) for failed telemetry

CLI Wrappers - Zero-Code Instrumentation

The MCA SDK provides two CLI commands that enable instrumentation without modifying your model code. Simply change how you run your script.

mca-instrument - Full OpenTelemetry Auto-Instrumentation

Wraps your script with OpenTelemetry auto-instrumentation for comprehensive telemetry including HTTP requests, database calls, and framework-level operations.

Installation:

pip install "mca-sdk[instrument]"
opentelemetry-bootstrap -a install  # Install framework instrumentors

Usage:

# Basic usage - use -- to separate wrapper args from script args
mca-instrument --model-id mdl-001 --team clinical-ai -- python my_model.py

# With all options
mca-instrument \
  --model-id mdl-001 \
  --team clinical-ai \
  --service-name my-service \
  --collector-endpoint http://localhost:4318 \
  --protocol http/protobuf \
  -- python my_model.py --debug

# Using environment variables (higher priority than CLI args)
export MCA_MODEL_ID=mdl-001
export MCA_TEAM_NAME=clinical-ai
mca-instrument -- python my_model.py

Features:

  • Automatic instrumentation of HTTP libraries (requests, urllib3, httpx)
  • Database query tracking (psycopg2, pymongo, redis)
  • Framework instrumentation (Flask, FastAPI, Django)
  • No code changes required in your model script
  • POSIX-compliant signal handling and exit codes
  • Timeout protection prevents hanging on collector failures

Configuration Options:

  • --model-id - Model identifier (required, or set MCA_MODEL_ID)
  • --team - Team name (required, or set MCA_TEAM_NAME)
  • --service-name - Service name (defaults to model-id)
  • --collector-endpoint - OTel Collector URL (default: http://localhost:4318)
  • --protocol - OTLP protocol: http/protobuf or grpc (default: http/protobuf)
  • --registry-url - Model Registry API URL (optional)
  • --debug - Enable debug logging

Important Notes:

  • Requires opentelemetry-instrument on PATH (installed via [instrument] extra)
  • Use -- separator to avoid argument conflicts with your script
  • Works with Python scripts, bash scripts, uvicorn, gunicorn, etc.
  • Respects virtual environments (no sys.executable assumptions)

mca-run - MCA Client Auto-Initialization

Simpler variant that initializes MCAClient without OpenTelemetry auto-instrumentation. Useful when you only need MCA SDK telemetry without framework-level instrumentation.

Installation:

pip install mca-sdk  # No extra dependencies needed

Usage:

# Same arguments as mca-instrument
mca-run --model-id mdl-001 --team clinical-ai -- python my_model.py

# Works with any executable
mca-run --model-id mdl-001 --team test -- bash train.sh

Features:

  • Injects MCA_* environment variables
  • Auto-initializes MCAClient via PYTHONPATH injection
  • 3-second shutdown timeout prevents CI pipeline hangs
  • Works with any executable (not just Python)
  • No OpenTelemetry dependencies required

How It Works:

  1. Creates temporary sitecustomize.py that imports mca_sdk.cli._bootstrap
  2. Injects directory into PYTHONPATH
  3. Runs your command exactly as provided
  4. MCAClient auto-initializes from environment variables
  5. Registers atexit handler with timeout for clean shutdown
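
The steps above rely on a standard Python startup behavior: the interpreter imports a module named sitecustomize if one is found on the import path. The sketch below reproduces the mechanism with the standard library only. It is not the SDK's code: the generated file sets a marker variable (`DEMO_BOOTSTRAPPED`, a made-up name) instead of importing mca_sdk.cli._bootstrap, and `run_with_bootstrap` is a hypothetical helper.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the bootstrap import. It only records that it ran and which
# model ID it saw in the environment.
SITECUSTOMIZE = (
    "import os\n"
    "os.environ['DEMO_BOOTSTRAPPED'] = os.environ.get('MCA_MODEL_ID', '')\n"
)


def run_with_bootstrap(command: list[str], model_id: str) -> subprocess.CompletedProcess:
    """Run `command` with a temporary sitecustomize.py placed first on PYTHONPATH."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sitecustomize.py"), "w", encoding="utf-8") as fh:
            fh.write(SITECUSTOMIZE)
        env = dict(os.environ, MCA_MODEL_ID=model_id)
        existing = env.get("PYTHONPATH")
        env["PYTHONPATH"] = tmp if not existing else tmp + os.pathsep + existing
        # The command itself is passed through unchanged.
        return subprocess.run(command, env=env, capture_output=True, text=True)


child = [sys.executable, "-c", "import os; print(os.environ['DEMO_BOOTSTRAPPED'])"]
result = run_with_bootstrap(child, "mdl-001")
print(result.stdout.strip())
```

The child script contains no bootstrap code of its own, yet it sees the marker, because the interpreter ran sitecustomize.py before the script started. The same inheritance is why a non-Python executable that later launches Python is also covered.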

Use Cases:

  • Legacy scripts you can't modify
  • Quick prototyping without code changes
  • Scripts that already use MCA SDK but need environment setup
  • Non-Python executables that call Python scripts

Working with Multi-File Projects and Directories

The CLI wrapper operates at the process level, not the file/directory level. It does NOT automatically scan directories or identify Python files to instrument.

What Gets Instrumented:

  • The Python process you start
  • All modules imported by that process
  • HTTP libraries (requests, urllib3, httpx)
  • Database drivers (psycopg2, pymongo, redis)
  • Web frameworks (Flask, FastAPI, Django)

What Does NOT Get Instrumented:

  • Files not imported by your entry point
  • Standalone scripts not executed by your command
  • Files in the directory that aren't run

Scenario 1: Project with Imports (WORKS AUTOMATICALLY)

Project structure:

myproject/
├── main.py         # Entry point
├── models.py       # Imported by main.py
├── utils.py        # Imported by main.py
└── config.yaml     # Loaded by main.py

Command:

mca-instrument -- python main.py

Result: main.py, models.py, utils.py all instrumented via imports. YAML file reading is instrumented if using instrumented libraries.

Scenario 2: Multiple Standalone Scripts (DOESN'T WORK AUTOMATICALLY)

Project structure:

scripts/
├── train.py        # Standalone script
├── evaluate.py     # Standalone script
└── deploy.py       # Standalone script

Problem:

mca-instrument -- python train.py
# Only train.py is instrumented

Solutions:

# Option 1: Run each separately
mca-instrument -- python train.py
mca-instrument -- python evaluate.py
mca-instrument -- python deploy.py

# Option 2: Create wrapper script (RECOMMENDED)
# run_pipeline.sh:
#   python train.py
#   python evaluate.py
#   python deploy.py

mca-instrument -- bash run_pipeline.sh
# All three scripts instrumented (inherit environment)

Scenario 3: Mixed File Types (NON-PYTHON FILES IGNORED)

Project structure:

project/
├── main.py         # Entry point
├── helper.py       # Imported by main.py
├── config.yaml     # Loaded at runtime
├── data.csv        # Read by pandas
└── README.md       # Documentation

Command:

mca-instrument -- python main.py

Result:

  • main.py and helper.py instrumented
  • config.yaml, data.csv reading instrumented if using instrumented libraries
  • README.md ignored (not code)

Scenario 4: Web Server (ENTIRE APPLICATION INSTRUMENTED)

Project structure:

api/
├── app.py          # FastAPI application
├── routes/
│   ├── users.py    # Imported by app.py
│   └── models.py   # Imported by app.py
└── database.py     # Imported by routes

Command:

mca-instrument -- uvicorn api.app:app

Result: Entire application instrumented, including all HTTP requests, database queries, and route handlers.

Scenario 5: Python Module Execution

Project structure:

mypackage/
├── __main__.py     # Entry point for -m
├── core.py         # Imported by __main__
└── utils.py        # Imported by core

Command:

mca-instrument -- python -m mypackage

Result: All modules in the package instrumented via import chain.

Key Takeaway: The CLI wrapper instruments whatever command you provide. For multi-file projects, ensure all files are either:

  1. Imported by your entry point
  2. Executed sequentially in a wrapper script
  3. Run as separate CLI wrapper invocations

CLI Best Practices

Argument Separation: Always use -- to separate wrapper arguments from script arguments:

# Good - explicit separation
mca-instrument --model-id mdl-001 --team test -- python script.py --debug

# Bad - may cause conflicts
mca-instrument --model-id mdl-001 --team test python script.py --debug

Environment Variable Precedence: Environment variables take precedence over CLI arguments:

export MCA_MODEL_ID=from-env
mca-instrument --model-id from-cli --team test -- python script.py
# Uses: from-env (environment variable wins)

Service Name Defaulting: If not specified, service name defaults to model ID:

mca-instrument --model-id mdl-001 --team test -- python script.py
# Results in: MCA_SERVICE_NAME=mdl-001
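
The two rules above (environment first, then CLI argument, and service name falling back to the model ID) can be summarized in a few lines. This is a sketch of the documented behavior, not the SDK's code. The function names are hypothetical, and it assumes that an empty environment variable counts as unset.

```python
from typing import Mapping, Optional


def resolve(env_name: str, cli_value: Optional[str], environ: Mapping[str, str],
            default: Optional[str] = None) -> Optional[str]:
    """Pick a setting: environment variable first, then CLI argument, then default."""
    return environ.get(env_name) or cli_value or default


def resolve_config(cli_model_id: Optional[str], cli_service_name: Optional[str],
                   environ: Mapping[str, str]) -> dict:
    """Resolve model ID and service name using the documented precedence."""
    model_id = resolve("MCA_MODEL_ID", cli_model_id, environ)
    # Service name falls back to the model ID when neither source sets it.
    service_name = resolve("MCA_SERVICE_NAME", cli_service_name, environ, default=model_id)
    return {"MCA_MODEL_ID": model_id, "MCA_SERVICE_NAME": service_name}
```

For example, `resolve_config("from-cli", None, {"MCA_MODEL_ID": "from-env"})` yields `from-env` for both values.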

Error Handling: Both commands validate required arguments and exit with code 2 if validation fails:

mca-instrument --team test -- python script.py
# Error: Missing required configuration: model ID
# Exit code: 2

Signal Handling: Proper POSIX signal handling with timeout escalation:

  • SIGTERM/SIGINT forwarded to child process
  • 5-second graceful shutdown timeout
  • Escalates to SIGKILL if child doesn't exit
  • Returns exit code 128 + signal number for signal termination
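
The escalation sequence can be sketched with the standard library as follows. This is an illustration of the pattern on POSIX systems, not the wrapper's source. `stop_child` is a hypothetical name, and the 5-second default mirrors the timeout listed above.

```python
import signal
import subprocess
import sys


def stop_child(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Send SIGTERM, wait up to `grace_s` seconds, then SIGKILL; return a shell-style exit code."""
    proc.send_signal(signal.SIGTERM)
    try:
        returncode = proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        returncode = proc.wait()
    # On POSIX, Popen reports death by signal N as returncode -N.
    if returncode < 0:
        return 128 + (-returncode)
    return returncode


child = subprocess.Popen(
    [sys.executable, "-c", "import time; print('ready', flush=True); time.sleep(60)"],
    stdout=subprocess.PIPE,
    text=True,
)
child.stdout.readline()  # block until the child is running
exit_code = stop_child(child)
child.stdout.close()
print(exit_code)
```

A child that exits on SIGTERM produces 128 + 15 = 143. One that ignores SIGTERM is killed after the grace period and produces 128 + 9 = 137.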

Troubleshooting CLI Commands

opentelemetry-instrument not found:

pip install "mca-sdk[instrument]"
opentelemetry-bootstrap -a install
which opentelemetry-instrument  # Verify on PATH

Command not executing:

  • Use -- separator
  • Check that command is executable
  • Verify command is on PATH or use full path

Collector unreachable:

  • Verify collector endpoint: curl http://localhost:4318/
  • Check network connectivity
  • Review collector logs for errors
  • Commands have 3-5 second timeout protection

Environment variable conflicts:

  • List current environment: env | grep MCA_
  • Remember: environment > CLI arguments
  • Unset conflicting vars: unset MCA_MODEL_ID

autolog() - Zero-Code Instrumentation

The autolog() function provides automatic instrumentation for popular ML frameworks without requiring code changes. Simply call autolog() after initializing MCAClient, and all predictions from supported frameworks will automatically emit OpenTelemetry metrics and traces.

Supported Frameworks:

  • scikit-learn: predict(), predict_proba(), fit()
  • XGBoost: Booster.predict(), XGBClassifier.predict(), XGBRegressor.predict()
  • LightGBM: Booster.predict(), LGBMClassifier.predict(), LGBMRegressor.predict()

Basic Usage

from mca_sdk import MCAClient, autolog
from sklearn.ensemble import RandomForestClassifier

# Initialize client first
client = MCAClient(
    service_name="ml-service",
    model_id="model-v1",
    team_name="ml-team"
)

# Enable autolog (one line!)
autolog()

# Now sklearn predictions are automatically instrumented
# (X_train, y_train, and X_test stand for your own training and test arrays)
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)  # Automatically tracked!

client.shutdown()

Configuration Options

# Enable for specific frameworks only
autolog(frameworks=["sklearn", "xgboost"])

# Exclude specific frameworks
autolog(exclude=["lightgbm"])

# Capture input/output data with size limits (use with caution - PHI risk)
# Payloads larger than max_payload_size will be truncated to prevent memory bloat
autolog(capture_input=True, capture_output=True, max_payload_size=5000)

# Note: Invalid framework names will raise ValueError
# Supported: sklearn, xgboost, lightgbm
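
To show what the size limit means in practice, here is a small sketch of payload truncation. It is not the SDK's implementation. `truncate_payload` is a hypothetical helper that measures size in characters of the value's `repr`, and the SDK may serialize and measure differently.

```python
def truncate_payload(value: object, max_payload_size: int = 10_000) -> str:
    """Render `value` as text and cap it at `max_payload_size` characters."""
    text = repr(value)
    if len(text) <= max_payload_size:
        return text
    marker = "...[truncated]"
    if max_payload_size <= len(marker):
        # Not enough room for the marker, so cut hard.
        return text[:max_payload_size]
    return text[: max_payload_size - len(marker)] + marker
```

Truncation bounds the size of captured data but does not remove PHI from the part that remains.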

Telemetry Generated

Metrics:

  • model.prediction.count: Counter for prediction calls
  • model.prediction.latency: Histogram of prediction latencies
  • Attributes: framework, model_class, method

Traces:

  • Span name: {framework}.{model_class}.{method}
  • Example: sklearn.RandomForestClassifier.predict

Important Notes

  1. MCAClient must be initialized before autolog(): The autolog() function requires an active MCAClient instance. If predictions are made without an initialized client, a warning is logged once (to avoid spam) and predictions execute without telemetry.
  2. Framework detection: Only patches frameworks that are imported in sys.modules
  3. PHI considerations: Be cautious with capture_input=True - input data may contain PHI. Use max_payload_size to limit captured data size.
  4. Payload size limits: Large predictions (common in batch ML) are automatically truncated to max_payload_size (default: 10KB) to prevent memory bloat and oversized network payloads
  5. Thread-safe: Autolog is thread-safe and prevents double-patching
  6. Error handling: Exceptions during prediction are properly recorded in spans with ERROR status and re-raised to preserve application behavior
  7. Validation: Invalid framework names in frameworks or exclude parameters raise ValueError immediately
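
Notes 5 and 6 describe a common monkey-patching pattern: wrap a method once, guard against wrapping it again, and re-raise any exception after recording it. The sketch below illustrates that pattern on a toy class. It is not how autolog() is implemented, and `patch_method`, `ToyModel`, and the `_demo_patched` marker are invented for this example.

```python
import functools
import threading

_patch_lock = threading.Lock()


def patch_method(cls, method_name, on_call):
    """Wrap cls.method_name once; return False if it was already wrapped."""
    with _patch_lock:
        original = getattr(cls, method_name)
        if getattr(original, "_demo_patched", False):
            return False  # already wrapped, avoid double-counting

        @functools.wraps(original)
        def wrapper(self, *args, **kwargs):
            try:
                result = original(self, *args, **kwargs)
            except Exception as exc:
                on_call(cls.__name__, method_name, exc)
                raise  # preserve application behavior
            on_call(cls.__name__, method_name, None)
            return result

        wrapper._demo_patched = True
        setattr(cls, method_name, wrapper)
        return True


class ToyModel:
    def predict(self, rows):
        return [0 for _ in rows]


calls = []


def record(class_name, method, error):
    calls.append((class_name, method, error))


first = patch_method(ToyModel, "predict", record)
second = patch_method(ToyModel, "predict", record)  # no-op: already patched
predictions = ToyModel().predict([1, 2])
```

Because the second `patch_method` call is a no-op, one prediction produces exactly one recorded call.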

See autolog demo for complete examples.

@predict() Decorator

The MCA SDK provides a @predict() decorator for automatic instrumentation of prediction functions. The decorator captures inputs, outputs, latency, and errors without manual metric recording.

Basic Usage

from mca_sdk import MCAClient

client = MCAClient(
    service_name="my-ml-service",
    model_id="model-v1",
    team_name="ml-team"
)

@client.predict()
def make_prediction(features: dict) -> dict:
    # Your prediction logic
    score = sum(features.values()) * 0.1
    return {"prediction": "positive" if score > 0.5 else "negative"}

# Decorator automatically tracks:
# - Prediction count (counter)
# - Latency (histogram)
# - Prediction ID (for actuals join)
# - Errors and exceptions
result = make_prediction({"feature1": 2.5, "feature2": 3.8})

Advanced Configuration

@client.predict(
    span_name="custom_prediction_name",  # Custom trace span name
    capture_input=True,                   # Capture function inputs
    capture_output=True,                  # Capture function outputs
    model_version="2.0",                  # Additional metric attributes
    threshold=0.7
)
def advanced_prediction(data: dict) -> dict:
    return {"result": "ok"}

Async Function Support

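import asyncio

@client.predict()
async def async_prediction(features: dict) -> dict:
    await asyncio.sleep(0.01)  # Async operations
    return {"prediction": "result"}

# "await" is only valid inside a coroutine; from synchronous code, use asyncio.run()
result = asyncio.run(async_prediction({"feature1": 1.0}))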

Multi-threaded Usage

The decorator is thread-safe and generates unique prediction IDs for concurrent calls:

from concurrent.futures import ThreadPoolExecutor

@client.predict()
def threaded_prediction(value: int) -> dict:
    return {"output": value * 2}

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(threaded_prediction, i) for i in range(100)]
    results = [f.result() for f in futures]

HIPAA Compliance Warning

CRITICAL: The decorator captures inputs/outputs for telemetry WITHOUT automatic PHI masking. You MUST sanitize sensitive data (PHI) before passing to decorated functions. Failure to sanitize PHI may result in HIPAA violations and regulatory penalties.

# BAD - May log PHI
@client.predict(capture_input=True)
def unsafe_prediction(patient_data: dict):  # Contains PHI!
    return model.predict(patient_data)

# GOOD - PHI sanitized before capture
@client.predict(capture_input=True)
def safe_prediction(sanitized_features: dict):  # No PHI
    return model.predict(sanitized_features)
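
One way to produce the sanitized input is to drop known identifier fields before the decorated function is called. The sketch below assumes a simple deny-list. The key names in `PHI_KEYS` and the helper `sanitize_features` are hypothetical, and a real deployment needs a reviewed PHI policy (an allow-list of approved features is generally the safer choice).

```python
# Hypothetical deny-list for illustration; not a complete list of PHI fields.
PHI_KEYS = {"name", "mrn", "ssn", "date_of_birth", "address", "phone"}


def sanitize_features(record: dict) -> dict:
    """Return a copy of `record` without keys on the PHI deny-list."""
    return {key: value for key, value in record.items() if key.lower() not in PHI_KEYS}
```

The decorated function then receives only the output of `sanitize_features(...)`, so captured inputs never include the removed fields.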

Performance

The decorator adds minimal overhead (<5ms per prediction) for span creation and metric recording.

PyPI Package Verification

If you've installed mca-sdk from PyPI, you can verify it works correctly without any local repository dependencies.

Standalone Demo

The standalone demo demonstrates using the PyPI-installed package:

# Install from PyPI
pip install mca-sdk

# Start collector
cd mca-prototype && docker-compose up otel-collector

# Run demo (works from any directory - no sys.path manipulation)
python mca-prototype/sdk-examples/internal-model/instrumented_model.py

Key Points:

  • No sys.path.insert() needed
  • Clean imports: from mca_sdk import MCAClient
  • Works from any directory (not just inside cloned repo)
  • All dependencies auto-installed

PyPI Package Tests

Run verification tests to ensure the package is properly installed:

# Install package first
pip install mca-sdk

# Run PyPI verification tests
pytest tests/integration/test_pypi_package.py -v

These tests verify:

  • Package is installed from PyPI (not local)
  • Imports work without path manipulation
  • Client can be instantiated
  • Metrics can be created and recorded
  • Dependencies are properly installed
  • Optional dependencies (genai, vendor) work if installed

Example Requirements Files

All examples have been updated to use the PyPI package:

internal-model/requirements.txt:

mca-sdk>=0.6.7
opentelemetry-semantic-conventions==0.48b0

internal-genai/requirements.txt:

mca-sdk[genai]>=0.6.7
tiktoken>=0.5.2

internal-agentic/requirements.txt:

mca-sdk>=0.6.7
langchain>=0.1.0

vendor-bridge/requirements.txt:

mca-sdk[vendor]>=0.6.7
fastapi>=0.115.6
uvicorn>=0.34.0

Development Setup

For developers contributing to the MCA Prototype, we provide a reproducible development environment with Docker Compose, hot-reload, pre-commit hooks, and IDE configurations.

Prerequisites

  • Docker and Docker Compose
  • Python 3.11+
  • Git
  • Make (optional, but recommended)

One-Time Setup

# Clone the repository (if not already done)
git clone <repository-url>
cd sdk  # or the directory you cloned into

# Run the automated setup script
make dev-setup
# OR manually:
./scripts/setup-dev.sh

This script will:

  1. Create a Python virtual environment
  2. Install all Python dependencies (SDK, dev tools, pre-commit)
  3. Install pre-commit hooks for automatic code quality checks
  4. Build Docker images for all services
  5. Create a .env file from .env.example

Setup time: <10 minutes (depending on internet speed)

Starting Development Services

# Start all services (collector, examples, vendor API)
make dev-start

# View logs from all services
make logs

# Stop all services
make dev-stop

Available Services:

  • OTel Collector: http://localhost:4318 (OTLP), http://localhost:13133 (health)
  • Vendor API: http://localhost:8080
  • Internal Model: Container mca-internal-model
  • Internal Agentic: Container mca-internal-agentic
  • GenAI Assistant: Container mca-genai-assistant
  • Vendor Bridge: Container mca-vendor-bridge

Hot-Reload for Fast Iteration

All services have volume mounts configured for live code reload:

volumes:
  - ./mca_sdk:/app/mca_sdk  # Changes to SDK reflected immediately

To test hot-reload:

  1. Start services: make dev-start
  2. Modify code in mca_sdk/
  3. Check container logs: docker logs mca-internal-model -f
  4. Changes are reflected without restarting containers

Running Tests

# Run all tests with coverage (requires 85% coverage)
make test

# Run tests manually
pytest tests/ -v --cov=mca-prototype/mca_sdk --cov-fail-under=85

Code Quality & Linting

# Run all linting checks (Black, isort, pylint, mypy, bandit)
make lint

# Auto-format code (Black + isort)
make format

# Run pre-commit hooks manually
make pre-commit

Pre-commit hooks run automatically on every git commit and check:

  • Python formatting (Black)
  • Import sorting (isort)
  • Python linting (pylint)
  • Type checking (mypy)
  • Security scanning (Bandit)
  • YAML/JSON formatting (Prettier)
  • Dockerfile linting (hadolint)

IDE Setup (VS Code)

VS Code configurations are included in .vscode/:

  • settings.json: Python linting, formatting, testing
  • launch.json: Debug configurations for tests and examples

Recommended Extensions:

  • Python (ms-python.python)
  • Black Formatter (ms-python.black-formatter)
  • Pylance (ms-python.vscode-pylance)
  • Docker (ms-azuretools.vscode-docker)
  • Prettier (esbenp.prettier-vscode)

Debug Examples:

  1. Open VS Code
  2. Go to Run & Debug (Ctrl+Shift+D)
  3. Select debug configuration (e.g., "Python: Debug SDK Internal Model Example")
  4. Press F5 to start debugging

Common Development Tasks

# Run a specific example locally
source venv/bin/activate
export PYTHONPATH=$(pwd)/mca-prototype
python mca-prototype/sdk-examples/internal-model/instrumented_model.py

# View logs from a specific service
docker logs mca-internal-model -f

# Rebuild a specific service after Dockerfile changes
cd mca-prototype && docker-compose build internal-model

# Clean up build artifacts
make clean

# Full clean including virtual environment
make clean-all

Development Workflow

  1. Create a feature branch: git checkout -b feature/your-feature
  2. Make code changes in mca_sdk/ or examples
  3. Run tests locally: make test
  4. Run linting: make lint (or let pre-commit handle it)
  5. Commit changes: git commit -m "description" (pre-commit hooks run automatically)
  6. Push and create PR: git push origin feature/your-feature

Troubleshooting Development Setup

Issue: Pre-commit hooks failing

# Run hooks manually to see errors
pre-commit run --all-files

# Auto-fix formatting issues
make format

# Update pre-commit hooks
pre-commit autoupdate

Issue: Docker containers not starting

# Check Docker daemon is running
docker ps

# Rebuild containers
cd mca-prototype && docker-compose build

# Check logs
docker-compose logs

Issue: Tests failing with import errors

# Ensure virtual environment is activated
source venv/bin/activate

# Reinstall dependencies
pip install -r mca-prototype/mca_sdk/requirements.txt

Issue: Port conflicts

# Check what's using ports 4318 or 8080
lsof -i :4318
lsof -i :8080

# Stop conflicting services or change ports in docker-compose.yml

Environment Variables

Development environment variables are configured in .env (created from .env.example):

# Key development settings
DEBUG_MODE=true
LOG_LEVEL=DEBUG
COLLECTOR_ENDPOINT=http://localhost:4318
REGISTRY_URL=http://localhost:8000  # Mock registry for local dev

See .env.example for all available configuration options.

Project Structure

.
├── mca-prototype/
│   ├── docker-compose.yml              # Orchestrates local testing services
│   ├── config/
│   │   └── otel-collector-config.yaml  # Collector pipelines: OTLP → Batch → Attributes → Debug
│   ├── mca_sdk/                        # Python SDK source code
│   ├── mca-sdk-nodejs/                 # Node.js SDK source code (WIP)
│   ├── k8s/ & helm/                    # Kubernetes deployment configurations
│   ├── sdk-examples/
│   │   ├── internal-model/             # Demo: Metrics, Logs, Traces instrumentation
│   │   ├── internal-genai/             # GenAI assistant with LiteLLM + MCA SDK
│   │   ├── internal-agentic/           # Medical research agent with multi-step reasoning
│   │   └── vendor-bridge/              # Converts vendor JSON to OTLP metrics
│   └── tests/                          # Comprehensive Pytest suites
│
├── terraform/                          # Terraform modules for GCP infrastructure
│   ├── modules/                        # cloud-logging, cloud-trace, cloud-monitoring, iam, etc.
│   └── main.tf                         # Main infrastructure orchestration
│
├── docs/                               # Additional documentation & integration guides
├── scripts/                            # Utility scripts for development & deployment
└── Makefile                            # Development tasks (lint, test, build, terraform)

Key Components

Component | Purpose | Port
OpenTelemetry Collector | Receives OTLP data, enriches with metadata, outputs to debug exporter | 4318, 13133
Internal Model | Demonstrates full SDK instrumentation (metrics/logs/traces) for predictive ML | -
Internal GenAI | Demonstrates LLM monitoring with LiteLLM + MCA SDK integration | -
Internal Agentic | Demonstrates agentic AI with goal tracking, tool execution, and multi-step reasoning | -
Vendor API | Simulates third-party model API with proprietary JSON format | 8080
Vendor Bridge | Converts vendor JSON to OTLP metrics every 30 seconds | -
E2E Tests | Validates collector receives and processes data | -
Unit Tests | Tests SDK integration patterns without network | -

Data Pipelines

  1. Metrics Pipeline: OTLP Receiver → Attributes Processor (adds region/env) → Batch Processor (10s/100 metrics) → Debug Exporter (stdout)
  2. Logs Pipeline: Same processors, OTLP logs input
  3. Traces Pipeline: Same processors, OTLP traces input
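
To make the "10s/100 metrics" batching rule concrete, here is a rough model of a buffer that flushes on size or age, whichever comes first. It is a sketch for intuition only, not the collector's actual batch processor; the class name MetricBatcher and the tick method are hypothetical.

```python
import time
from typing import Callable, List


class MetricBatcher:
    """Buffers items and flushes on size or age, whichever comes first."""

    def __init__(self, flush: Callable[[List[dict]], None],
                 max_size: int = 100, timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_size = max_size
        self._timeout_s = timeout_s
        self._clock = clock
        self._items: List[dict] = []
        self._first_at = 0.0

    def add(self, item: dict) -> None:
        if not self._items:
            self._first_at = self._clock()  # start the timeout window
        self._items.append(item)
        if len(self._items) >= self._max_size:
            self.flush_now()

    def tick(self) -> None:
        """Call periodically; flushes once the oldest item has waited timeout_s."""
        if self._items and self._clock() - self._first_at >= self._timeout_s:
            self.flush_now()

    def flush_now(self) -> None:
        if self._items:
            batch, self._items = self._items, []
            self._flush(batch)
```

The injectable clock makes the timeout behavior testable without sleeping.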

Enrichment Strategy

  • All telemetry signals enriched with gcp.region: us-central1 and environment: prototype
  • Demonstrates how to add organizational metadata at collector level
  • Resource attributes from application (service name, model ID) preserved
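
The enrichment described above can be pictured as a dictionary merge. The sketch below assumes that collector-level values win if a key already exists; the source does not state which action the attributes processor is configured with, so treat that as an assumption. The function name enrich is illustrative.

```python
from typing import Dict

# Values taken from the enrichment strategy described above
COLLECTOR_ATTRIBUTES = {"gcp.region": "us-central1", "environment": "prototype"}


def enrich(resource_attrs: Dict[str, str]) -> Dict[str, str]:
    """Add collector-level metadata while keeping application attributes."""
    enriched = dict(resource_attrs)        # copy so the input is not mutated
    enriched.update(COLLECTOR_ATTRIBUTES)  # assumption: collector values win on conflict
    return enriched


if __name__ == "__main__":
    print(enrich({"service.name": "readmission-model", "model.id": "mdl-001"}))
```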

Demo Scenarios

Scenario 1: Internal Model Monitoring

Use Case: Hospital's readmission prediction model with full instrumentation

Steps:

  1. Start collector: docker-compose up
  2. Run model: python sdk-examples/internal-model/instrumented_model.py
  3. Show collector logs with metrics, logs, and traces
  4. Point out enriched attributes (gcp.region, environment)

Key Points:

  • Full observability: metrics (counter/histogram), logs (structured), traces (nested spans)
  • Resource attributes identify model, version, team
  • Collector adds deployment context automatically

Scenario 2: Vendor API Integration

Use Case: Third-party sepsis model doesn't support OTLP natively

Steps:

  1. Collector already running from Scenario 1
  2. Show vendor API JSON: curl http://localhost:8080/metrics
  3. Observe bridge logs converting and exporting
  4. Show collector receiving vendor metrics with model.type: vendor attribute

Key Points:

  • Bridge pattern for non-OTLP APIs
  • Delta calculation for counters (converts 24h rolling count to cumulative)
  • Dynamic resource attributes from API response
  • Polling every 30 seconds
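
One way to turn a rolling count into a cumulative counter is to add only the positive difference between consecutive polls. The sketch below illustrates that idea; it is not the bridge's actual code, the class name RollingToCumulative is hypothetical, and counting the full first reading is an assumption.

```python
from typing import Optional


class RollingToCumulative:
    """Converts successive rolling-window counts into a monotonic total."""

    def __init__(self) -> None:
        self._previous: Optional[int] = None
        self.total = 0

    def update(self, rolling_count: int) -> int:
        if self._previous is None:
            delta = rolling_count  # assumption: first poll counts in full
        else:
            # A drop means old events aged out of the window; never go negative
            delta = max(rolling_count - self._previous, 0)
        self._previous = rolling_count
        self.total += delta
        return self.total


if __name__ == "__main__":
    counter = RollingToCumulative()
    for reading in (100, 130, 120, 125):
        print(reading, "->", counter.update(reading))
```

This approach undercounts when new events arrive in the same interval in which older events leave the window, since only the net change is visible to the poller.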

Scenario 3: E2E Validation

Use Case: Verify collector pipeline works correctly

Steps:

  1. Run E2E tests: pytest tests/integration/test_e2e_flow.py -v -s
  2. Show test sending metrics with known values (42)
  3. Show test parsing Docker logs to verify receipt
  4. Demonstrate attribute enrichment validation

Key Points:

  • Tests send real OTLP data to running collector
  • Verifies batch processing (12s wait for 10s timeout)
  • Log-based verification for manual inspection
  • Validates enrichment pipeline
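
Log-based verification can be sketched as a search over the collector's output for a metric name followed by the expected value. The SAMPLE text below is a simplified imitation of debug exporter output, not a captured log, and the helper name metric_value_in_logs is hypothetical; the real test may parse the logs differently.

```python
import re

# Simplified, hand-written imitation of debug exporter output (assumption)
SAMPLE = """\
Metric #0
Descriptor:
     -> Name: e2e.test.counter
     -> Unit: 1
NumberDataPoints #0
Value: 42
Metric #1
Descriptor:
     -> Name: other.metric
NumberDataPoints #0
Value: 7
"""


def metric_value_in_logs(logs: str, metric_name: str, expected: int) -> bool:
    """Return True if the named metric's section contains 'Value: <expected>'."""
    # Split so that each chunk starts at a "-> Name:" line
    for chunk in re.split(r"(?=-> Name: )", logs):
        lines = chunk.splitlines()
        if lines and lines[0].strip() == f"-> Name: {metric_name}":
            if re.search(rf"^Value: {expected}$", chunk, re.MULTILINE):
                return True
    return False


if __name__ == "__main__":
    print(metric_value_in_logs(SAMPLE, "e2e.test.counter", 42))
```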

Scenario 4: GenAI/LLM Monitoring

Use Case: Clinical documentation assistant with LLM observability

Steps:

  1. Services already running from docker-compose up
  2. Check GenAI logs: docker logs mca-genai-assistant -f
  3. Observe collector receiving LLM traces with token counts
  4. Show custom metrics in collector logs: docker logs mca-otel-collector | grep genai

Key Points:

  • LiteLLM's automatic trace instrumentation for LLM calls
  • Token usage tracking (prompt and completion tokens)
  • Cost estimation based on token counts
  • Latency monitoring for LLM requests
  • Mock mode for demo purposes (no API calls)
  • Continuous 30-second loop demonstrates ongoing LLM usage patterns

Expected Telemetry:

  • Metrics: genai.tokens.prompt, genai.tokens.completion, genai.request.cost_usd, genai.request.latency_seconds
  • Traces: Automatic spans from LiteLLM with model, token counts, and latency
  • Resource attributes: service.name=genai-clinical-assistant, model.type=generative, llm.provider=openai-mock
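
Cost estimation from token counts usually reduces to a per-token price multiplied by usage. The sketch below shows the arithmetic; the TokenPrice type, the function name, and the prices are placeholders chosen for illustration, not values used by the SDK or any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price table, in USD per 1,000 tokens."""
    prompt_per_1k_usd: float
    completion_per_1k_usd: float


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      price: TokenPrice) -> float:
    """Estimate request cost from prompt and completion token counts."""
    return (prompt_tokens / 1000 * price.prompt_per_1k_usd
            + completion_tokens / 1000 * price.completion_per_1k_usd)


if __name__ == "__main__":
    placeholder = TokenPrice(prompt_per_1k_usd=0.5, completion_per_1k_usd=1.5)
    print(estimate_cost_usd(1000, 2000, placeholder))
```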

Scenario 5: Agentic AI with Multi-Step Reasoning

Use Case: Medical research assistant agent that uses multiple tools to answer clinical questions

Steps:

  1. Collector already running from previous scenarios
  2. Run agent: python sdk-examples/internal-agentic/agent_instrumented.py
  3. Watch agent execute multi-step workflow (planning → research → analysis → synthesis)
  4. Show agent metrics: docker logs mca-otel-collector | grep agent

Key Points:

  • Goal Tracking: Monitors when goals start/complete with success/failure status
  • Tool Execution: Tracks PubMed searches, drug database queries with latency metrics
  • Multi-Step Reasoning: Nested spans show planning, research, analysis, synthesis steps
  • Human Intervention: Tracks when human review is requested
  • Mock Mode: All tools use predefined responses (no external APIs)

Expected Telemetry:

  • Metrics:
    • agent.goals_started_total, agent.goals_completed_total (counters)
    • agent.tool_calls_total (counter with tool_name label)
    • agent.tool_latency_seconds (histogram per tool)
    • agent.human_interventions_total (counter)
    • agent.reasoning_steps_total (counter)
  • Traces:
    • agent.goal (parent span for entire goal)
      • agent.planning (search strategy)
      • agent.tool_execution (PubMed, drug database)
      • agent.reasoning (analysis)
      • agent.synthesis (answer creation)
      • agent.human_intervention (review request)
  • Resource attributes: service.name=medical-research-agent, model.type=agentic, team.name=ai-research-team
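
The tool-call counter and latency histogram can be pictured with a small context manager. This sketch uses plain standard-library containers as stand-ins for the real instruments; track_tool and both module-level names are hypothetical and are not the SDK's API.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager

tool_calls_total = Counter()              # stand-in for agent.tool_calls_total
tool_latency_seconds = defaultdict(list)  # stand-in for agent.tool_latency_seconds


@contextmanager
def track_tool(tool_name: str):
    """Count one tool call and record how long the wrapped block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the tool raises, so failed calls are still counted
        tool_calls_total[tool_name] += 1
        tool_latency_seconds[tool_name].append(time.perf_counter() - start)


if __name__ == "__main__":
    with track_tool("pubmed_search"):
        time.sleep(0.01)  # placeholder for a mock tool call
    print(tool_calls_total, dict(tool_latency_seconds))
```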

Model Registry Integration

The MCA SDK now supports centralized configuration management through a Model Registry API. This enables:

  • Dynamic model metadata and thresholds
  • Automatic periodic refresh (default 10 minutes)
  • Graceful fallback when registry is unavailable
  • Security: HTTPS required for non-localhost, bearer token authentication

Usage

With Environment Variables:

export MCA_REGISTRY_URL="https://registry.example.com"
export MCA_REGISTRY_TOKEN="your-secret-token"
export MCA_MODEL_ID="mdl-001"
export MCA_MODEL_VERSION="2.0.0"

python your_model.py

With Code:

from mca_sdk import MCAClient, MCAConfig

config = MCAConfig(
    service_name="readmission-model",
    model_id="mdl-001",
    model_version="2.0.0",
    registry_url="https://registry.example.com",
    registry_token="your-secret-token",
    refresh_interval_secs=600,  # 10 minutes
)

client = MCAClient(config=config)

# Access registry-provided thresholds
latency_ms = 620.0  # example value; replace with the latency you measured
if client.thresholds.get("latency_warn_ms", 0) < latency_ms:
    client.logger.warning("Latency threshold exceeded")

client.shutdown()

Registry API Contract

Model Config Endpoint:

GET /models/{model_id}?version=2.0.0
Authorization: Bearer <token>

Response:
{
  "service_name": "readmission-model",
  "model_id": "mdl-001",
  "model_version": "2.0.0",
  "team_name": "clinical-ai",
  "model_type": "internal",
  "thresholds": {
    "latency_warn_ms": 500,
    "error_rate_warn": 0.05
  },
  "extra_resource": {
    "deployment.env": "production"
  }
}
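
For anyone implementing or testing against this contract, the request can be assembled with the standard library alone. The sketch below only builds the request object and does not send it; the helper name build_model_config_request is hypothetical, and the SDK may use a different HTTP client internally.

```python
import urllib.parse
import urllib.request


def build_model_config_request(base_url: str, model_id: str,
                               version: str, token: str) -> urllib.request.Request:
    """Build GET /models/{model_id}?version=... with a bearer token."""
    query = urllib.parse.urlencode({"version": version})
    path_id = urllib.parse.quote(model_id, safe="")  # escape unsafe characters
    url = f"{base_url.rstrip('/')}/models/{path_id}?{query}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


if __name__ == "__main__":
    req = build_model_config_request(
        "https://registry.example.com", "mdl-001", "2.0.0", "your-secret-token"
    )
    print(req.get_method(), req.full_url)
```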

Deployment Config Endpoint (optional):

GET /deployments/{deployment_id}
Authorization: Bearer <token>

Response:
{
  "deployment_id": "dep-001",
  "environment": "production",
  "region": "us-east-1",
  "resource_overrides": {
    "deployment.zone": "az-1"
  }
}

Features

  • Config Precedence: kwargs > registry > env > YAML > defaults
  • Background Refresh: Updates thresholds every 10 minutes (configurable)
  • Identity Immutability: service_name, model_id changes require restart
  • Resilience: Telemetry continues if registry is down (uses last-known config)
  • Security: HTTPS required for non-localhost endpoints; token never logged
  • GCP Authentication: Automatic ID token authentication for Cloud Run APIs (see GCP Auth Guide)
  • Telemetry: Self-monitoring metrics for registry operations
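
The precedence order (kwargs > registry > env > YAML > defaults) can be modeled as a layered lookup. This is a sketch of the idea using collections.ChainMap, not the SDK's own resolver; the function name resolve_config and the choice to skip None values are assumptions.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping


def resolve_config(kwargs: Mapping[str, Any], registry: Mapping[str, Any],
                   env: Mapping[str, Any], yaml_cfg: Mapping[str, Any],
                   defaults: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge config layers; earlier layers take precedence over later ones."""
    layers = [
        # Assumption: a None value means "not set" and should not mask lower layers
        {k: v for k, v in layer.items() if v is not None}
        for layer in (kwargs, registry, env, yaml_cfg, defaults)
    ]
    # ChainMap looks keys up left to right, so the first layer wins
    return dict(ChainMap(*layers))


if __name__ == "__main__":
    print(resolve_config(
        {"model_version": "2.0.0"},
        {"model_version": "1.9.0", "team_name": "clinical-ai"},
        {"refresh_interval_secs": 300},
        {},
        {"refresh_interval_secs": 600, "team_name": "unknown"},
    ))
```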

Configuration Options

Environment Variable | Description | Default
MCA_REGISTRY_URL | Registry service URL (HTTPS required) | None
MCA_REGISTRY_TOKEN | Bearer token for authentication | None
MCA_REFRESH_SECS | Refresh interval in seconds | 600
MCA_PREFER_REGISTRY | Registry overrides local config | True
MCA_DEPLOYMENT_ID | Optional deployment identifier | None
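
As an illustration of how these variables and defaults fit together, the sketch below reads them into a small settings object. It is not the SDK's loader: the RegistrySettings type, the function name, and the accepted spellings for the boolean are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RegistrySettings:
    url: Optional[str]
    token: Optional[str]
    refresh_secs: int
    prefer_registry: bool
    deployment_id: Optional[str]


def load_registry_settings(env: Mapping[str, str] = os.environ) -> RegistrySettings:
    """Read registry settings, falling back to the documented defaults."""
    prefer = env.get("MCA_PREFER_REGISTRY", "true").strip().lower()
    return RegistrySettings(
        url=env.get("MCA_REGISTRY_URL"),
        token=env.get("MCA_REGISTRY_TOKEN"),
        refresh_secs=int(env.get("MCA_REFRESH_SECS", "600")),
        prefer_registry=prefer in {"1", "true", "yes"},  # assumed spellings
        deployment_id=env.get("MCA_DEPLOYMENT_ID"),
    )


if __name__ == "__main__":
    settings = load_registry_settings()
    # Avoid printing the token itself
    print(settings.url, settings.refresh_secs, settings.prefer_registry)
```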

Next Steps / Known Limitations

Implemented (Phase 1)

  • ✅ OTLP HTTP receiver for metrics, logs, traces
  • ✅ Batch processing (10s timeout or 100 metrics)
  • ✅ Attribute enrichment (region, environment)
  • ✅ GCP exporter for prototype validation
  • ✅ Vendor API bridge pattern
  • ✅ Full SDK instrumentation example
  • ✅ GenAI/LLM monitoring with LiteLLM integration
  • ✅ Agentic AI instrumentation with goal tracking and tool execution monitoring
  • ✅ Model Registry Integration: Centralized config management with automatic refresh
  • ✅ GCP Integration: Cloud Trace and Cloud Logging exporters
  • ✅ GCP Authentication: Service account and ID token authentication
  • ✅ Security: Queue encryption, certificate management, HTTPS enforcement
  • ✅ Resilience: Circuit breakers, retry logic with exponential backoff
  • ✅ Persistent Storage: Dead Letter Queue (DLQ) for failed telemetry
  • ✅ Kubernetes Deployment: Helm charts and Kustomize configurations
  • ✅ Comprehensive testing (unit + e2e)
  • ✅ Docker Compose orchestration
  • ✅ Health check endpoint
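
Retry with exponential backoff, listed under Resilience above, follows a simple pattern: wait a little longer after each failure, up to a cap. The sketch below illustrates the pattern only; the function names and the base, factor, and cap values are placeholders, not the SDK's settings.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base_s: float = 0.5,
                   factor: float = 2.0, max_s: float = 30.0) -> List[float]:
    """Delays that grow geometrically and are capped at max_s."""
    return [min(base_s * factor ** attempt, max_s) for attempt in range(retries)]


def call_with_retry(func: Callable[[], T], delays: Iterable[float],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, sleeping for the next delay after each failure."""
    for delay in delays:
        try:
            return func()
        except Exception:
            sleep(delay)
    return func()  # final attempt; any exception propagates to the caller


if __name__ == "__main__":
    print(backoff_delays(5))
```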

Phase 2: Production Readiness (In progress)

  • GCP Cloud Monitoring
  • Security Hardening:
    • mTLS for collector OTLP receiver
    • API key authentication for vendor bridge
  • High Availability: Multi-instance collector with load balancing
  • Alerting: Configure processor for alert generation on metric thresholds (via Terraform)
  • Schema Validation: Enforce metric naming conventions at collector level
  • Node.js SDK: Initial implementation of the MCA SDK for Node.js (in mca-sdk-nodejs/)
  • Cost Optimization: Sampling strategies for high-volume traces

Phase 3: Scale & Features

  • Additional Vendors: More bridge implementations
  • Real Models: Production model integrations
  • Dashboards: GCP console visualizations (via Terraform)
  • SLO Monitoring: Track model performance SLIs
  • Anomaly Detection: Statistical outlier identification
  • Data Retention: Policies for metric aggregation/archival

Known Limitations

  • Collector Authentication: OTLP receiver does not require authentication (use network policies in production)
  • Batch Timeout: Up to 10s delay in data visibility (configurable)
  • Single Instance Collector: No built-in redundancy or failover (use Kubernetes replication)
  • Metric Descriptor Management: GCP Cloud Monitoring descriptors created automatically but not pre-configured
  • Manual E2E Verification: Tests rely on Docker log parsing (consider using OTLP test receiver)

Security Considerations (For Production)

  • HTTPS Enforcement: Both registry_url and collector_endpoint require HTTPS for non-localhost endpoints (enforced since v0.4.1)
    • ✅ Allowed: https://registry.example.com, http://localhost:5000, http://127.0.0.1:4318
    • ❌ Blocked: http://registry.example.com (raises ConfigurationError)
    • Localhost Exception: HTTP is allowed for localhost, 127.0.0.0/8 range, and ::1 for development convenience
    • Security Note: Prevents credential and telemetry exposure over unencrypted connections
  • Audit Logs: Implement comprehensive access logging for collector
  • Encryption: Require TLS for all OTLP communication
  • Access Control: Implement RBAC for collector configuration
  • Data Residency: Ensure GCP region meets compliance requirements
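
The HTTPS rule with its localhost exception can be expressed in a few lines. This is a sketch of the rule as described above, not the SDK's validator; ConfigurationError is redefined locally as a stand-in, and validate_endpoint is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlsplit


class ConfigurationError(ValueError):
    """Local stand-in for the SDK's ConfigurationError."""


def _is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        # Covers the 127.0.0.0/8 range and ::1
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal, so not a loopback address


def validate_endpoint(url: str) -> str:
    """Return url if it is HTTPS, or HTTP to a loopback host; otherwise raise."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme == "https":
        return url
    if parts.scheme == "http" and _is_loopback(host):
        return url
    raise ConfigurationError(f"HTTPS is required for non-localhost endpoint: {url}")


if __name__ == "__main__":
    for candidate in ("https://registry.example.com", "http://localhost:5000"):
        print(validate_endpoint(candidate))
```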

Troubleshooting

Collector not receiving metrics

Symptom: No output in collector logs after running model

Solutions:

  • Check collector is healthy: curl http://localhost:13133/
  • Verify port 4318 is accessible: docker ps
  • Check model completed and flushed: Look for "Flushing metrics" in model output
  • Allow for the batch timeout: Metrics may be held for up to 10s before the collector exports them

Vendor bridge failing to start

Symptom: mca-vendor-bridge container exits with error

Solutions:

  • Check vendor-api is healthy: docker ps (should show healthy status)
  • Verify API is accessible: curl http://localhost:8080/health
  • Check environment variables in docker-compose.yml
  • Review bridge logs: docker logs mca-vendor-bridge

E2E tests skipped

Symptom: Tests show "SKIPPED - Collector is not running"

Solutions:

  • Start collector first: docker-compose up
  • Wait for health endpoint: May take 10-15 seconds on first start
  • Check health manually: curl http://localhost:13133/
  • Rebuild if config changed: docker-compose up --build
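
Waiting for the health endpoint can be automated with a short polling loop. The sketch below polls the health URL used above with the standard library; the function names, the 15 attempts, and the 1-second interval are illustrative choices rather than values taken from the test suite.

```python
import http.client
import time
import urllib.request
from typing import Callable


def collector_is_healthy(url: str = "http://localhost:13133/",
                         timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so they land here too
        return False


def wait_until_healthy(probe: Callable[[], bool] = collector_is_healthy,
                       attempts: int = 15, interval_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll probe until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(interval_s)
    return False


if __name__ == "__main__":
    print("healthy" if wait_until_healthy() else "collector did not become healthy")
```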

Import errors in tests

Symptom: ImportError: cannot import name 'InMemorySpanExporter'

Solutions:

  • Install dependencies: pip install -r requirements.txt
  • Check Python version: Requires 3.10+
  • Virtual environment recommended: python -m venv venv && source venv/bin/activate

Contributing

This is a prototype project for demonstration purposes. For production deployment:

  1. Review security considerations
  2. Implement authentication
  3. Configure real GCP backend exporters
  4. Set up monitoring for the collector itself
  5. Establish metric retention policies
