
Local HIPAA DLP layer for LLM workflows — MCP server for Claude Code, Claude Desktop, Cursor, Windsurf


MediGuard AI

HIPAA-compliant AI middleware and primary care onboarding agent. Drop it in front of any LLM and patient conversations are automatically scanned, redacted, and logged. The voice agent layer onboards patients and routes them directly to the right specialist — no forms, no waiting room, no GP appointment just to get a referral.


The Problem

Healthtech companies want to use AI, but a single message that sends patient data to an outside model provider can be a HIPAA violation, with fines that can reach seven figures. On top of that, the traditional path to a specialist is broken: patients book a GP appointment, wait weeks, get a referral slip, then wait again. Many of those GP visits exist only to route the patient somewhere else.

MediGuard AI fixes both problems.


What It Does

A patient calls in. The agent:

  1. Identifies them — looks up their name against the patient database
  2. Returning patient — skips re-collection, welcomes them back, asks what brings them in
  3. New patient — collects name, DOB, and callback number conversationally (no forms)
  4. Triages — uses Claude to semantically match their concern to the right specialist from a doctor database
  5. Recommends — tells them exactly who to see and why, in plain language
  6. Saves — new patients are written to the database so they never repeat themselves on the next call
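Steps 1 to 3 and step 6 amount to a lookup-or-create loop. Below is a minimal sketch, assuming an in-memory dict in place of voicerun/data/patients.json and a lookup keyed on a normalized name, as step 1 describes. The Patient class and the helper functions are hypothetical, and triage (steps 4 and 5) is left out. Keying on name alone means two patients who share a name collide, so a real lookup would likely also confirm the DOB.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    name: str
    dob: Optional[str] = None
    phone: Optional[str] = None


# Fields the new-patient flow collects after the name (step 3), in order.
NEW_PATIENT_FIELDS = ("dob", "phone")


def name_key(name: str) -> str:
    """Normalize whitespace and case so 'James  Park' and 'james park' match."""
    return " ".join(name.split()).casefold()


def identify(name: str, db: dict[str, Patient]) -> tuple[Patient, bool]:
    """Steps 1-3: return (patient, is_returning); unknown names start a blank record."""
    existing = db.get(name_key(name))
    if existing is not None:
        return existing, True
    return Patient(name=" ".join(name.split())), False


def missing_fields(patient: Patient) -> list[str]:
    """What the agent still has to ask for, in conversational order."""
    return [f for f in NEW_PATIENT_FIELDS if getattr(patient, f) is None]


def save_new_patient(patient: Patient, db: dict[str, Patient]) -> None:
    """Step 6: persist a completed record so the next call takes the returning path."""
    missing = missing_fields(patient)
    if missing:
        raise ValueError(f"cannot save incomplete record, missing {missing}")
    db.setdefault(name_key(patient.name), patient)
```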

Every message is intercepted by the DLP pipeline before it reaches the conversational LLM: the agent gets clean, redacted input, and your compliance trail is built automatically.


Architecture

Patient speaks (Voicerun STT)
        |
        v
  ┌─────────────────────────────┐
  │       DLP Pipeline          │
  │                             │
  │  1. Regex scan              │  ← SSN, MRN, DOB, phone, insurance IDs
  │  2. Baseten triage          │  ← fast binary: sensitive or not?
  │  3. Claude semantic scan    │  ← contextual PHI (diagnoses, meds, etc.)
  │  4. OpenAI second opinion   │  ← cross-validates high-severity findings
  │  5. Redact + Audit log      │  ← HIPAA trail, safe message out
  └─────────────────────────────┘
        |
        | (redacted message)
        v
  ┌─────────────────────────────┐
  │    Onboarding & Triage      │
  │                             │
  │  Extract patient fields     │  ← name, DOB, phone, reason (raw_hint bypass)
  │  Lookup patient DB          │  ← returning vs. new patient
  │  Triage to specialist       │  ← Claude semantic match → doctors DB
  │  Save new patient record    │  ← written back to patients.json
  └─────────────────────────────┘
        |
        v
  LLM responds with full context
  (system prompt adapts to: new/returning, triage done/pending)
        |
        v
  Voicerun TTS → patient hears response
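The DLP box above is a chain of layers with a cheap gate in front of the expensive scan. Below is a minimal sketch of that control flow, assuming each model call is injected as a plain Python callable so it runs offline. The Triage, SemanticScan, and SecondOpinion hooks, run_pipeline, and the two sample patterns are hypothetical, not the project's API. The README doesn't say what happens when the OpenAI second opinion disagrees; this sketch fails closed and still redacts.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    kind: str      # e.g. "SSN", "MRN", "MEDICATION"
    start: int     # character offsets into the original message
    end: int
    severity: str  # "low" or "high"
    layer: str     # which layer reported it


# Hypothetical stand-ins for the Baseten, Claude, and OpenAI calls.
Triage = Callable[[str], bool]                  # True if the text may contain PHI
SemanticScan = Callable[[str], list[Finding]]   # contextual PHI spans
SecondOpinion = Callable[[str, Finding], bool]  # True if a finding is confirmed

# Layer 1: two illustrative patterns only.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


def regex_layer(text: str) -> list[Finding]:
    return [
        Finding(kind, m.start(), m.end(), "high", "regex")
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def redact(text: str, findings: list[Finding]) -> str:
    """Merge overlapping spans, then replace right to left so offsets stay valid."""
    merged: list[list] = []
    for f in sorted(findings, key=lambda f: (f.start, f.end)):
        if merged and f.start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], f.end)  # extend, keep first label
        else:
            merged.append([f.start, f.end, f.kind])
    for start, end, kind in reversed(merged):
        text = text[:start] + f"[{kind}]" + text[end:]
    return text


def run_pipeline(
    text: str,
    triage: Optional[Triage] = None,
    semantic: Optional[SemanticScan] = None,
    second_opinion: Optional[SecondOpinion] = None,
    audit_log: Optional[list] = None,
) -> tuple[str, list[Finding]]:
    findings = regex_layer(text)  # 1. local, always runs
    # 2-3. Only pay for the semantic scan when the cheap triage model flags the text.
    if triage is not None and semantic is not None and triage(text):
        findings += semantic(text)
    # 4. Cross-validate high-severity model findings. Disputed findings are
    #    still redacted (fail closed); they are only counted for review.
    disputed = [
        f for f in findings
        if f.layer != "regex" and f.severity == "high"
        and second_opinion is not None and not second_opinion(text, f)
    ]
    safe = redact(text, findings)  # 5. redact, then log counts, never raw values
    if audit_log is not None:
        audit_log.append({
            "findings": len(findings),
            "kinds": sorted({f.kind for f in findings}),
            "disputed": len(disputed),
        })
    return safe, findings
```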

Two Conversation Flows

Returning Patient

Agent:   "Can I start by getting your name?"
Patient: "Alice Johnson"
Agent:   "Welcome back, Alice! I can see you were last in on October 15th.
          What brings you in today?"
Patient: "I've been having chest tightness lately."
Agent:   "Based on that, I'd recommend Dr. Sarah Chen, our Cardiologist —
          she's available Monday, Wednesday, and Friday. Our team will reach
          out to schedule the referral directly, no extra GP visit needed."

New Patient

Agent:   "Can I start by getting your name?"
Patient: "James Park"
         [not found → new patient flow]
Agent:   "Nice to meet you, James! I don't have you in our system yet —
          let me get a few quick details. What's your date of birth?"
Patient: "March 4th, 1988"
Agent:   "Got it. And a callback number?"
Patient: "646-555-0192"
Agent:   "What's the main reason you're calling in today?"
Patient: "My stomach has been hurting a lot, especially after eating."
         [triage fires → Dr. Marcus Rivera, Gastroenterologist]
Agent:   "That sounds like a great fit for Dr. Marcus Rivera, our Gastroenterologist —
          he specializes in exactly that and is available Tuesday and Thursday.
          We'll follow up to get you scheduled."
         [James Park saved to patient DB]
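Per the sponsor table, Claude does the patient info extraction. Before step 6 writes a record, a deterministic check can still catch transcription slips. Below is a minimal sketch that normalizes the two values from this transcript; normalize_dob and normalize_phone are hypothetical helpers, and they only handle English month names and 10-digit US numbers.

```python
from __future__ import annotations

import re
from datetime import date, datetime
from typing import Optional

_ORDINAL = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)\b", re.IGNORECASE)
_DOB_FORMATS = ("%B %d, %Y", "%B %d %Y", "%m/%d/%Y", "%Y-%m-%d")


def normalize_dob(spoken: str) -> Optional[str]:
    """'March 4th, 1988' or '04/12/1985' -> ISO 'YYYY-MM-DD'; None if unusable."""
    cleaned = _ORDINAL.sub(r"\1", " ".join(spoken.split()))
    for fmt in _DOB_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
        # A birth date in the future is a transcription error, not a patient.
        return parsed.isoformat() if parsed <= date.today() else None
    return None


def normalize_phone(spoken: str) -> Optional[str]:
    """Keep digits only; accept 10-digit US numbers, with or without a leading 1."""
    digits = re.sub(r"\D", "", spoken)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```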

Quickstart

# Install dependencies
pip install -r requirements.txt

# Copy and fill in your API keys
cp .env.example .env

# Run via CLI
python main.py "Hi I'm John Smith, MRN 123456, DOB 04/12/1985. I'm on 50mg sertraline for F32.1. My insurance ID is BCX884521."

# Run the dashboard
streamlit run ui/app.py

# Run the API server
uvicorn api.server:app --reload --port 8008

# Run Voicerun agent
cd voicerun/dlp-health-agent
vr push && vr open

Use as an MCP Server (Claude Code / Desktop / Cursor / Windsurf)

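MediGuard ships as an installable MCP server. Add a few lines to your client's config and every chat gets a local PHI firewall, redactor, and session replay debugger. Regex redaction runs entirely on your machine; the full dlp_scan pipeline also calls the Baseten and Anthropic APIs when their keys are set.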

Install

pipx install mediguard-dlp
# or, from this repo:
pip install -e .

This puts a mediguard-dlp command on your PATH. To verify, run: which mediguard-dlp

Configure your MCP client

Claude Code: add to ~/.claude/settings.json (or to a project-level .mcp.json, which takes the same mcpServers block):

{
  "mcpServers": {
    "mediguard-dlp": {
      "command": "mediguard-dlp"
    }
  }
}

Claude Desktop — edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%/Claude/claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "mediguard-dlp": {
      "command": "mediguard-dlp",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "BASETEN_API_KEY": "...",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Cursor / Windsurf — same config block under their MCP settings.
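Editing these JSON files by hand makes it easy to clobber other servers already configured. Below is a minimal sketch of a merge that adds or replaces only the mediguard-dlp entry; add_mediguard_server is a hypothetical helper, not part of the package. It stores the absolute path from shutil.which because desktop apps may not inherit your shell's PATH. Restart the client after changing its config.

```python
from __future__ import annotations

import json
import shutil
from pathlib import Path
from typing import Optional


def add_mediguard_server(config_path: Path, env: Optional[dict[str, str]] = None) -> dict:
    """Add or replace the mediguard-dlp entry, leaving other servers and keys alone."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # Prefer an absolute path; fall back to the bare command name if not found.
    entry: dict = {"command": shutil.which("mediguard-dlp") or "mediguard-dlp"}
    if env:
        entry["env"] = dict(env)
    config.setdefault("mcpServers", {})["mediguard-dlp"] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example (Claude Desktop on macOS):
# add_mediguard_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")
```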

If ANTHROPIC_API_KEY and BASETEN_API_KEY aren't set, the server runs in regex-only mode — still catches structured identifiers (SSN, MRN, DOB, phone, insurance IDs, ZIP) with zero network calls.
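The README doesn't say what happens when only one of the two keys is set. Below is a minimal sketch of the fallback under the assumption that both are required for the model layers and that the OpenAI second opinion is an optional extra on top; active_layers and the layer names are hypothetical.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def active_layers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Which DLP layers can run for the keys present; blank values count as unset."""
    env = os.environ if env is None else env

    def has(key: str) -> bool:
        return bool(env.get(key, "").strip())

    layers = ["regex"]  # local, no network calls
    if has("ANTHROPIC_API_KEY") and has("BASETEN_API_KEY"):
        layers += ["baseten_triage", "claude_semantic"]
        if has("OPENAI_API_KEY"):
            layers.append("openai_second_opinion")  # assumed optional extra
    return layers
```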

Tools exposed

Tool            Purpose
dlp_scan        Full pipeline scan: regex + Baseten triage + Claude semantic
quick_redact    Regex-only redaction, sub-millisecond, no API calls
ingest_payload  Load a production log, redact PHI locally, save the clean version
replay_session  Step a saved session through the agent pipeline
list_sessions   List saved debug sessions
check_secrets   Show which secret keys are loaded (values never returned)
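The guarantee in check_secrets is easiest to keep when the function never holds a value in its result at all. Below is a minimal sketch that reports presence as booleans only, assuming the keys listed under Environment Variables; this is an illustration, not the tool's actual code.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

SECRET_KEYS = (
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "BASETEN_API_KEY",
    "YOUCOM_API_KEY",
    "VOICERUN_API_KEY",
)


def check_secrets(env: Optional[Mapping[str, str]] = None) -> dict[str, bool]:
    """Report presence only: the result holds booleans, so no value can leak."""
    env = os.environ if env is None else env
    return {key: bool(env.get(key, "").strip()) for key in SECRET_KEYS}
```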

Claude Code plugin (alternative install)

This repo is also a Claude Code plugin. From Claude Code:

/plugin marketplace add AndreChuabio/dlp-agent
/plugin install mediguard-dlp

Environment Variables

ANTHROPIC_API_KEY=
OPENAI_API_KEY=
BASETEN_API_KEY=
BASETEN_MODEL=deepseek-ai/DeepSeek-V3.1   # swap to test different triage models
YOUCOM_API_KEY=
VOICERUN_API_KEY=

DLP Detection Layers

Layer                      What it catches
Regex                      SSN, credit card, email, phone, MRN, NPI, ICD codes, DOB, insurance IDs, medication dosages
Baseten (DeepSeek V3.1)    Binary triage; skips Claude if the message is clean
Claude (claude-opus-4-6)   Contextual PHI: diagnoses, medications, mental health treatment, insurance context
OpenAI (gpt-4o)            Cross-validates high-severity Claude findings
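The Regex row can be sketched as a table of compiled patterns plus a span-merging redactor. The patterns, labels, and function names below are illustrative assumptions, not the project's rules: MRN and NPI require a label so bare digit runs are not flagged, any MM/DD/YYYY date is treated as a possible DOB, card-like digit runs must pass a Luhn check, and the insurance ID shape (three letters plus digits) is guessed from the Quickstart example.

```python
from __future__ import annotations

import re

# Illustrative patterns only; the project's actual regexes are not published here.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}(?!\d)"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "NPI": re.compile(r"\bNPI[:#]?\s*\d{10}\b", re.IGNORECASE),  # label required
    "ICD10": re.compile(r"\b[A-TV-Z]\d{2}(?:\.\d{1,4})?\b"),
    "DOB": re.compile(r"\b(?:0?[1-9]|1[0-2])/(?:0?[1-9]|[12]\d|3[01])/(?:19|20)\d{2}\b"),
    "INSURANCE_ID": re.compile(r"\b[A-Z]{3}\d{6,9}\b"),  # shape is a guess
    "DOSAGE": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|ml|units?)\b", re.IGNORECASE),
}


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum, used to drop digit runs that merely look like card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def scan(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, kind) spans sorted by position."""
    spans = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if kind == "CREDIT_CARD" and not luhn_valid(m.group()):
                continue
            spans.append((m.start(), m.end(), kind))
    return sorted(spans)


def regex_redact(text: str) -> str:
    """Merge overlapping spans, then replace right to left so offsets stay valid."""
    merged: list[list] = []
    for start, end, kind in scan(text):
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end, kind])
    for start, end, kind in reversed(merged):
        text = text[:start] + f"[{kind}]" + text[end:]
    return text
```

Run on the Quickstart message, this leaves the patient's name and the drug name in place; per the table above, those fall to the Claude layer.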

Specialist Database

8 mocked specialists in voicerun/data/doctors.json, covering the most common referral paths:

Specialist                                      Handles
Dr. Sarah Chen (Cardiologist)                   Chest pain, high cholesterol, hypertension, palpitations
Dr. Marcus Rivera (Gastroenterologist)          Stomach pain, acid reflux, IBS, bloating
Dr. Priya Patel (Endocrinologist & Dietitian)   Diabetes, thyroid, weight management, blood sugar
Dr. James Kim (Orthopedic Specialist)           Back pain, joint pain, sports injuries, arthritis
Dr. Leila Hassan (Dermatologist)                Rashes, acne, eczema, skin infections
Dr. Michael Torres (Psychiatrist)               Anxiety, depression, insomnia, PTSD, burnout
Dr. Aisha Okonkwo (Pulmonologist)               Cough, asthma, breathing difficulty, sleep apnea
Dr. Robert Walsh (General Practitioner)         Annual physical, flu, fever, anything unclear

Triage is powered by Claude semantic matching, so it treats "my stomach has been off" the same as "abdominal discomfort after meals."
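For contrast, and as a possible offline fallback, here is a minimal sketch of keyword-overlap triage over the table above. This is a swapped-in technique, not the project's Claude-based matcher, and the names are hypothetical. It routes the chest and stomach examples correctly but sends "abdominal discomfort after meals" to the GP, because it can only match shared words; that gap is what semantic matching closes.

```python
from __future__ import annotations

import re

# Hypothetical in-code copy of the specialist table; this sketch assumes nothing
# about the schema of voicerun/data/doctors.json.
SPECIALISTS = {
    "Dr. Sarah Chen (Cardiologist)": "chest pain, high cholesterol, hypertension, palpitations",
    "Dr. Marcus Rivera (Gastroenterologist)": "stomach pain, acid reflux, IBS, bloating",
    "Dr. Priya Patel (Endocrinologist & Dietitian)": "diabetes, thyroid, weight management, blood sugar",
    "Dr. James Kim (Orthopedic Specialist)": "back pain, joint pain, sports injuries, arthritis",
    "Dr. Leila Hassan (Dermatologist)": "rashes, acne, eczema, skin infections",
    "Dr. Michael Torres (Psychiatrist)": "anxiety, depression, insomnia, PTSD, burnout",
    "Dr. Aisha Okonkwo (Pulmonologist)": "cough, asthma, breathing difficulty, sleep apnea",
}
FALLBACK = "Dr. Robert Walsh (General Practitioner)"  # "anything unclear"
# Generic words that should not decide a route on their own.
STOPWORDS = {"pain", "high", "difficulty"}


def tokens(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2 and w not in STOPWORDS}


def keyword_triage(concern: str) -> str:
    """Pick the specialist whose keywords overlap the concern most; else the GP."""
    words = tokens(concern)
    scores = {doc: len(words & tokens(kw)) for doc, kw in SPECIALISTS.items()}
    best = max(scores, key=scores.__getitem__)
    return best if scores[best] > 0 else FALLBACK
```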


Dashboard

Three tabs:

  • Agent Chat — conversational onboarding. Patient types naturally, DLP fires on every message, agent collects info and triages in real time
  • DLP Scanner — paste any text, see findings broken down by layer with original vs redacted side by side
  • Voice Input — record audio, transcribed via Whisper, then scanned

Sidebar shows collected patient info as the conversation builds. Live HIPAA audit log at the bottom updates after every scan.
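The README doesn't specify the audit log's format. One way to keep a HIPAA trail without turning the log itself into a PHI store is to record counts and layers rather than matched values. Below is a minimal sketch, assuming a local JSON Lines file; append_audit and verify_chain are hypothetical, and the SHA-256 hash chain is an added design choice that makes edits to earlier entries detectable (it cannot detect the newest entries being truncated).

```python
from __future__ import annotations

import hashlib
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

GENESIS = "0" * 64  # prev_hash of the first entry


def _digest(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def append_audit(path: Path, session_id: str, findings: list[tuple[str, str]]) -> dict:
    """Append one entry. findings are (kind, layer) pairs; matched text is never taken."""
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "counts": dict(Counter(kind for kind, _ in findings)),
        "layers": sorted({layer for _, layer in findings}),
        "prev_hash": json.loads(lines[-1])["entry_hash"] if lines else GENESIS,
    }
    entry["entry_hash"] = _digest(entry)  # covers every field above
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def verify_chain(path: Path) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev = GENESIS
    for line in path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        claimed = entry.pop("entry_hash")
        if entry["prev_hash"] != prev or _digest(entry) != claimed:
            return False
        prev = claimed
    return True
```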


Repo Structure

dlp-agent/
├── main.py                          — CLI entry point
├── agent/
│   ├── tools.py                     — DLP pipeline, patient DB, triage logic
│   ├── orchestrator.py              — agent loop
│   └── prompts.py                   — system prompts
├── api/
│   └── server.py                    — FastAPI /chat + /scan endpoints
├── ui/
│   └── app.py                       — Streamlit dashboard
├── voicerun/
│   ├── data/
│   │   ├── doctors.json             — specialist database (8 doctors)
│   │   └── patients.json            — patient records (persistent across calls)
│   └── dlp-health-agent/
│       ├── handler.py               — onboarding + triage voice agent
│       └── README.md                — Voicerun-specific docs
├── .veris/
│   ├── Dockerfile.sandbox           — Veris simulation container
│   └── veris.yaml                   — Veris environment config
└── tests/
    └── test_agent.py                — smoke tests

API

POST /chat
{
  "message": "patient message here",
  "session_id": "optional-session-id"
}

→ {
  "response": "agent reply",
  "session_id": "...",
  "dlp": {
    "safe_to_send": false,
    "findings_count": 12
  }
}

POST /scan
{
  "text": "raw text to scan",
  "user_id": "optional"
}

GET /health
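For scripting against these endpoints, here is a minimal client using only the standard library. build_post, chat, and scan are hypothetical helpers, and the base URL assumes the Quickstart's uvicorn command on port 8008.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Optional

# Port taken from the Quickstart's uvicorn command; adjust for your deployment.
BASE_URL = "http://localhost:8008"


def build_post(path: str, payload: dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(message: str, session_id: Optional[str] = None, timeout: float = 30.0) -> dict:
    """POST /chat; returns the documented {response, session_id, dlp} object."""
    payload: dict[str, Any] = {"message": message}
    if session_id is not None:
        payload["session_id"] = session_id  # optional per the request schema
    with urllib.request.urlopen(build_post("/chat", payload), timeout=timeout) as resp:
        return json.load(resp)


def scan(text: str, user_id: Optional[str] = None, timeout: float = 30.0) -> dict:
    """POST /scan; the response shape is not documented, so it is returned as-is."""
    payload: dict[str, Any] = {"text": text}
    if user_id is not None:
        payload["user_id"] = user_id
    with urllib.request.urlopen(build_post("/scan", payload), timeout=timeout) as resp:
        return json.load(resp)
```

With the API server running, reply = chat("I've been having chest tightness lately.") should return the shape shown above; passing reply["session_id"] into the next chat() call should continue the same conversation.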

Sponsor Integrations

Sponsor                     Role
Anthropic Claude            Semantic PHI detection, agent responses, patient info extraction, triage matching
Baseten (DeepSeek V3.1)     Fast triage gate; skips Claude when the message is clean
OpenAI (Whisper + GPT-4o)   Voice transcription, high-severity validation, agent LLM
You.com                     Insurance coverage search using ID only
Voicerun                    Voice agent layer; patients call in, no typing required
Veris AI                    Adversarial simulation sandbox; validates detection rate

Built at Enterprise Agent Jam NYC.

Download files


Source Distribution

mediguard_dlp-0.1.1.tar.gz (27.1 kB)

Uploaded Source

Built Distribution


mediguard_dlp-0.1.1-py3-none-any.whl (24.6 kB)

Uploaded Python 3

File details

Details for the file mediguard_dlp-0.1.1.tar.gz.

File metadata

  • Download URL: mediguard_dlp-0.1.1.tar.gz
  • Upload date:
  • Size: 27.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for mediguard_dlp-0.1.1.tar.gz
Algorithm Hash digest
SHA256 f5e299ce750399d42e1a6f42aa17def8543bfd1d26d785edcee534f7d4e37961
MD5 60b86df0d6c571b65ecd21393c3bad9e
BLAKE2b-256 575a45bd59d78c7552591ca77525e8c1ec62497bb718576fa04bcfc883e47627


File details

Details for the file mediguard_dlp-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: mediguard_dlp-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 24.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for mediguard_dlp-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5af6df75b815384b47571e93575474ec980e8c6fc547f30da78a458b74405635
MD5 4971ae768e877705459d5a80fd6ec87d
BLAKE2b-256 cf34b0c18ff7071e2c0aced61d3073597aec6942beec97c45dcb1aa0c4527f1a

