AI-powered HIPAA compliance scanner for healthcare software teams
Project description
HIPAA-Guard
A multi-agent AI system that scans healthcare software codebases for HIPAA violations at commit time. Catches PHI exposure, hardcoded FHIR credentials, overly broad OAuth scopes, missing audit controls, and more — before code ships.
How It Works
HIPAA-Guard runs a 7-agent pipeline triggered by git pre-commit hooks, CI/CD events, or manual CLI commands:
[Trigger] → Scanner → Analyst → Fix Agent ──────────────────→ Verifier
│ │
Reviewer Agent ← REJECT/MODIFY │
(adversarial) │
│ │
APPROVE ────────────────────→ [Clean? Loop max 3x]
Educator Agent (parallel — regulatory citations)
Trend Agent (weekly posture reports + drift detection)
| Agent | Role | LLM? |
|---|---|---|
| Scanner | Regex + AST detection across 15 HIPAA rules | No — pure Python, <10s |
| Analyst | Disambiguates ambiguous findings (test fixture vs real PHI?) | Yes |
| Fix Agent | Proposes minimal diffs for confirmed violations | Yes |
| Reviewer | Adversarially critiques proposed fixes — finds problems, not validates | Yes |
| Verifier | Re-scans fixed file in memory to confirm resolution, no new issues | No |
| Educator | Plain-English regulatory explanation with CFR citation per finding | Yes |
| Trend | 30-day posture tracking, drift detection, weekly reports | Yes |
What It Detects
| Rule | Description | Severity |
|---|---|---|
| CRED-001 | Hardcoded FHIR client_secret |
CRITICAL |
| CRED-002 | Hardcoded API key or bearer token | CRITICAL |
| CRED-003 | FHIR URL with embedded credentials | CRITICAL |
| FHIR-003 | FHIR endpoint over HTTP (not HTTPS) | CRITICAL |
| PHI-001 | SSN pattern in source code | HIGH |
| PHI-003 | MRN in string literal | HIGH |
| PHI-004 | Patient name passed directly to logger | HIGH |
| FHIR-001 | SMART on FHIR patient/* wildcard scope |
HIGH |
| FHIR-002 | SMART on FHIR user/* wildcard scope |
HIGH |
| AUDIT-001 | PHI accessed in function without audit log call | HIGH |
| PHI-002 | Date of birth in string literal | MEDIUM |
| PHI-006 | PHI interpolated in f-string | MEDIUM |
| AUDIT-002 | PHI returned in HTTP response without encryption check | MEDIUM |
| ENC-001 | PHI written to unencrypted file | HIGH |
| PHI-005 | NPI number in source code | MEDIUM |
Quick Start
Requirements: Python 3.9+
# Clone and scan a directory
git clone https://github.com/slindelow/hipaa-guard
cd hipaa-guard
python3 cli.py scan /path/to/your/healthcare/app
# Install pre-commit hook (blocks commits with CRITICAL/HIGH violations)
python3 cli.py install-hook
# Full repo audit + compliance document
python3 cli.py audit
# Check dependency BAA status
python3 cli.py baa
Enable AI pipeline (Analyst, Fix Agent, Educator, Reviewer):
export ANTHROPIC_API_KEY=your_key_here
python3 cli.py scan /path/to/your/app
Without an API key, the scanner runs standalone — fast, no API calls, catches all high-confidence violations.
PHI Safety Architecture
The system reads source code that may contain PHI. Three layers prevent leakage:
phi_safe_handler.pywraps all file I/O — PHI values are replaced with typed placeholders ([REDACTED_SSN],[REDACTED_MRN]) before reaching any log, state file, or LLM prompt. Code structure is preserved; values are not.- Ephemeral session state — scan state lives in a UUID-scoped temp directory, deleted on completion. Never committed, never uploaded.
- LLM prompts use minimum context — agents receive ±50 lines around a finding with PHI redacted. The system prompt for all LLM agents explicitly prohibits PHI reconstruction.
HIPAA BAA note: If your Anthropic account does not have a signed HIPAA BAA, set
"has_hipaa_baa": falseinhipaa-guard.config.json. The system will route all LLM analysis through a local Ollama endpoint instead, keeping all data on-premise.
Differentiating Features
Dependency BAA Checker
Parses your requirements.txt / package.json / pyproject.toml, cross-references a maintained BAA registry, and identifies which packages actually touch PHI-handling code. No existing tool does this automatically.
python3 cli.py baa
FHIR Vendor Profile Validation
Validates FHIR resource structures against Epic, Cerner, and base R4 profiles. Catches vendor-specific extension gaps at commit time instead of during EHR sandbox testing (where they cost days to debug).
Auto-Generated Compliance Documentation
On demand or on release tag, generates a structured HIPAA Technical Safeguards Summary — PHI access points, encryption usage, audit log coverage, FHIR endpoints — mapped to CFR sections. A compliance officer-ready draft in seconds.
False Positive Learning Loop
Mark a finding as a false positive:
python3 cli.py fp <finding_id> --reason "test fixture"
After 30 days, rules with >30% FP rate have their confidence thresholds automatically adjusted. The scanner self-calibrates without any model fine-tuning.
Configuration
hipaa-guard.config.json — project-level settings (commit this):
{
"fhir_targets": ["epic", "cerner"],
"llm": {
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"has_hipaa_baa": false
},
"scan": {
"onboarding_mode": true
}
}
Set onboarding_mode: true on first run — findings are reported but commits are not blocked while you triage false positives.
config/rule_config.json — per-rule severity, risk tier, confidence thresholds. Updated automatically by the FP feedback loop.
CLI Reference
python3 cli.py scan <path> [path2 ...] Scan files or directories
python3 cli.py audit Full repo audit + compliance doc
python3 cli.py fp <finding_id> Mark finding as false positive
python3 cli.py fn <file> <line> Report a missed finding
python3 cli.py install-hook Install pre-commit git hook
python3 cli.py baa Check dependency BAA status
Project Structure
cli.py Entry point
hipaa-guard.config.json Project config (commit this)
config/
rule_config.json Rule thresholds and risk tiers
baa_registry.json Package → BAA status registry
tools/
scanner.py Core detection engine (no LLM)
phi_pattern_library.py 15 HIPAA detection rules
phi_safe_handler.py PHI redaction for all file I/O
state_manager.py Typed shared state between agents
baa_checker.py Dependency BAA analysis
compliance_doc_generator.py HIPAA safeguard summary generator
git_hook_installer.py Pre-commit hook setup
rule_config_updater.py FP feedback → confidence adjustment
agent/
orchestrator.py Deterministic pipeline dispatcher
analyst_agent.py LLM: disambiguates ambiguous findings
fix_agent.py LLM: proposes minimal diffs
reviewer_agent.py LLM: adversarial fix critique
educator_agent.py LLM: CFR citations + plain-English explanations
verifier.py Re-scans fixed files to confirm resolution
trend_agent.py Posture tracking + weekly reports
workflows/
pre_commit_scan.md SOP: pre-commit trigger
fix_loop.md SOP: Fix → Reviewer → Verifier loop
manual_audit.md SOP: full repo audit
false_positive_intake.md SOP: FP recording and feedback
weekly_trend_report.md SOP: weekly posture report
Architecture: WAT Framework
Built on the WAT (Workflows, Agents, Tools) pattern:
- Workflows (
workflows/) — Markdown SOPs defining what to do and how - Agents (
agent/) — AI orchestration and reasoning - Tools (
tools/) — Deterministic Python scripts for execution
Probabilistic AI handles reasoning; deterministic code handles execution. Agent failures fall conservatively (e.g., Analyst failure → treat all findings as confirmed violations).
Disclaimer
HIPAA-Guard identifies code-level compliance gaps. It is not a certified HIPAA compliance audit and does not constitute legal advice. Consult a qualified HIPAA compliance officer before certifying compliance. Scan results should be retained per 45 CFR §164.316(b)(2) requirements.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file hipaa_guard-0.1.0-py3-none-any.whl.
File metadata
- Download URL: hipaa_guard-0.1.0-py3-none-any.whl
- Upload date:
- Size: 49.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
202bd60e1537b9af348b6f88ebfd553de05c7bf9a324fef4423fa5f5c30f4b9f
|
|
| MD5 |
104d9cba7360ed8f778bf06c5679f15b
|
|
| BLAKE2b-256 |
76dacc2fce6520759077b3a7a2e5b0b560f680bd23c4f25ff97bc15d111f27df
|