AI Visibility Anonymizer - Privacy-preserving middleware for LLMs
AVA Protocol
AI Visibility Anonymizer — Privacy-preserving middleware for LLM interactions with reversible tokenization.
Author: Gerald Enrique Nelson Mc Kenzie
DOI: 10.5281/zenodo.19111004
Version: 0.1.0 | March 2026
What is AVA?
AVA Protocol sanitizes sensitive data (PII/PHI) before it reaches AI systems, maintains cryptographically signed audit trails, and restores the original values in AI outputs.
Key Innovation: Reversible tokenization preserves both privacy AND data utility — the AI works with opaque tokens, and real values are restored only in the final output.
import ava
import openai

client = ava.Client(engine="presidio", policy="healthcare_strict")
text = "Patient John Smith, SSN 123-45-6789"

with client.session(reversibility=True) as session:
    # Original: "Patient John Smith, SSN 123-45-6789"
    safe = session.sanitize(text)
    # Sanitized: "Patient AVA_PERS_xK9mP2nQ, SSN AVA_SSN_fG5hI6jK"
    response = openai.ChatCompletion.create(  # legacy openai<1.0 interface
        model="gpt-4",
        messages=[{"role": "user", "content": safe}]
    )
    # Restore original values in the reply text
    final = session.restore(response['choices'][0]['message']['content'])
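To make the round trip concrete, here is a minimal sketch of reversible tokenization. It does not use the `ava` package: `TinyVault`, its methods, and the single SSN regex are hypothetical stand-ins chosen for illustration, and the real library may work differently.

```python
import re
import secrets

# Only one detector, to keep the sketch short (assumption, not AVA's detector set).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class TinyVault:
    """Hypothetical in-memory token vault; not AVA's actual implementation."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value, entity_type):
        # Reuse the token for a value already seen, so repeated mentions
        # stay consistent within one session.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = f"AVA_{entity_type}_{secrets.token_hex(4)}"
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def sanitize(self, text):
        return SSN_PATTERN.sub(lambda m: self.tokenize(m.group(0), "SSN"), text)

    def restore(self, text):
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text


vault = TinyVault()
original = "Patient SSN 123-45-6789 (confirm 123-45-6789)"
safe = vault.sanitize(original)
model_reply = f"Summary: {safe}"  # stand-in for an LLM response
restored = vault.restore(model_reply)
```

The model only ever sees `safe`; the mapping needed to undo the substitution stays in the vault on the caller's side.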
Installation
Choose your installation based on the modes you need:
# Gateway Client Only (Lightweight, ~50KB)
pip install ava-protocol
# Embedded with Local Presidio (~500MB, includes ML models)
pip install ava-protocol[local]
# AWS Macie integration
pip install ava-protocol[aws]
# Azure PII integration
pip install ava-protocol[azure]
# Google Cloud DLP integration
pip install ava-protocol[gcp]
# Everything (local + aws + azure + gcp + redis)
pip install ava-protocol[all]
Note: Gateway mode requires no extras. Embedded mode requires [local] for Presidio ML models.
Architecture
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Your App   │────▶│ AVA Client  │────▶│   Engine    │
│             │     │ (Embedded   │     │ (Presidio,  │
│             │◀────│  or         │◀────│  AWS, etc)  │
│             │     │  Gateway)   │     │             │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Token Vault │
                    │ (Memory /   │
                    │  SQLite /   │
                    │  Redis)     │
                    └─────────────┘
Operating Modes
Mode 1: Embedded (Local Presidio)
Self-contained deployment. All PII detection happens locally with no external calls. Best for air-gapped or high-security environments.
Install:
pip install ava-protocol[local]
Basic example:
import ava
import openai

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="memory"
)

with client.session(reversibility=True, ttl=3600) as session:
    medical_record = """
    Patient: Maria Gonzalez
    DOB: 1985-03-15
    SSN: 123-45-6789
    Email: maria.g@healthmail.com
    Diagnosis: Hypertension
    """
    # Sanitize before AI processing; the AI never sees real data
    sanitized = session.sanitize(medical_record)
    # Patient: AVA_PERS_xK9mP2nQ
    # DOB: AVA_DATE_aB3cD4eF
    # SSN: AVA_SSN_fG5hI6jK
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )
    # Restore original values in the final output
    final = session.restore(response['choices'][0]['message']['content'])
With SQLite vault (persistent storage):
import os

client = ava.Client(
    engine="presidio",
    policy="financial_paranoid",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"]  # AES-256
    }
)
Mode 2: Gateway (Remote Client)
Thin client that connects to a remote AVA Gateway server. No local ML dependencies — all detection is handled server-side.
Install:
pip install ava-protocol # No extras needed
Basic example:
import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="ava_sk_live_abc123xyz789",
    policy="general_moderate"
)

# Identical API to embedded mode
with client.session(reversibility=True) as session:
    customer_email = """
    Hi, this is Robert Chen from Acme Corp.
    My credit card ending in 4532 was charged twice.
    Please refund to robert.chen@acme.com.
    """
    safe_text = session.sanitize(customer_email)
    response = support_ai.process(safe_text)
    readable = session.restore(response)
Environment-based config:
# .env
AVA_GATEWAY_URL=https://ava.internal.company.com
AVA_API_KEY=ava_sk_live_xxx
AVA_POLICY=healthcare_strict
AVA_DEFAULT_TTL=1800
# Loads automatically from environment
client = ava.Client.from_env()
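As a rough illustration of what environment-based loading involves, the sketch below reads the four variables listed above into a typed settings object. `GatewaySettings`, `settings_from_env`, and the default values are hypothetical; they are not taken from the library, and `ava.Client.from_env()` may behave differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewaySettings:
    """Hypothetical container for the AVA_* variables shown above."""
    gateway_url: str
    api_key: str
    policy: str = "general_moderate"  # assumed default
    default_ttl: int = 3600           # assumed default, in seconds


def settings_from_env(env=None):
    # Accept an explicit mapping so the function is easy to test.
    env = os.environ if env is None else env
    missing = [k for k in ("AVA_GATEWAY_URL", "AVA_API_KEY") if not env.get(k)]
    if missing:
        raise KeyError(f"missing required settings: {', '.join(missing)}")
    return GatewaySettings(
        gateway_url=env["AVA_GATEWAY_URL"],
        api_key=env["AVA_API_KEY"],
        policy=env.get("AVA_POLICY", "general_moderate"),
        default_ttl=int(env.get("AVA_DEFAULT_TTL", "3600")),
    )


settings = settings_from_env({
    "AVA_GATEWAY_URL": "https://ava.internal.company.com",
    "AVA_API_KEY": "ava_sk_live_xxx",
    "AVA_POLICY": "healthcare_strict",
    "AVA_DEFAULT_TTL": "1800",
})
```

Failing fast on a missing URL or key keeps a misconfigured service from silently falling back to an unintended mode.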
Running a Gateway server:
# gateway_server.py: deploy centrally for your organization
from ava.gateway import GatewayServer

server = GatewayServer(
    detection_engine="presidio",
    vault_type="redis",
    vault_config={"host": "redis.company.com", "port": 6379},
    policies_path="/etc/ava/policies/"
)

server.run(
    host="0.0.0.0",
    port=8443,
    tls_cert="/etc/ava/server.crt"
)
Mode 3: Mock Engine (Testing)
Regex-based detection with zero dependencies. Designed for unit tests and CI/CD pipelines where you don't want to install heavyweight ML models.
Detects via regex only: Emails, phone numbers, SSNs, credit card numbers. No NLP.
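For a sense of what regex-only detection looks like, here is a small self-contained sketch. The patterns and the `detect` helper are illustrative assumptions, not the mock engine's actual rules; the entity labels `EMAI`, `SSN`, `CRED`, and `PHON` follow the token prefixes used elsewhere in this README.

```python
import re

# Deliberately simple patterns (assumptions); real detectors need more care.
PATTERNS = {
    "EMAI": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CRED": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "PHON": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def detect(text):
    """Return (start, end, entity_type, matched_text) tuples sorted by position."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), entity_type, m.group(0)))
    return sorted(found)


sample = "Call 555-123-4567 or mail support@example.com about SSN 123-45-6789."
hits = detect(sample)
labels = [entity for _, _, entity, _ in hits]

# No NLP: a bare personal name produces no detections.
name_hits = detect("Patient: John Doe")
```

Because there is no named-entity model, text such as "Patient: John Doe" passes through unchanged, which matters when writing tests against this mode.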
Unit test example:
import ava
import pytest

@pytest.fixture
def mock_client():
    return ava.Client(
        engine="mock",
        policy="general_moderate",
        vault_type="memory"
    )

def test_email_detection(mock_client):
    with mock_client.session() as session:
        text = "Contact us at support@example.com"
        result = session.sanitize(text)
        assert "AVA_EMAI_" in result
        assert "support@example.com" not in result

def test_reversibility(mock_client):
    with mock_client.session(reversibility=True) as session:
        # Use an email: the mock engine has no NLP, so names are not detected
        original = "Contact: john.doe@example.com"
        sanitized = session.sanitize(original)
        assert sanitized != original
        restored = session.restore(sanitized)
        assert restored == original
CI/CD pipeline (GitHub Actions):
# .github/workflows/test.yml
name: AVA Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install AVA (lightweight)
        run: pip install ava-protocol pytest  # No [local] needed
      - name: Run tests with MockEngine
        run: pytest tests/ -v
        env:
          AVA_TEST_ENGINE: mock
Mode 4: AWS Macie Adapter
Enterprise-grade PII detection using AWS Macie. Supports custom data identifiers for organization-specific patterns.
Install:
pip install ava-protocol[aws]
aws configure
Example:
import ava

client = ava.Client(
    engine="aws_macie",
    policy="financial_paranoid",
    vault_type="memory",
    engine_config={
        "region": "us-east-1",
        "custom_data_identifiers": [
            "employee-id-pattern",
            "customer-account-pattern"
        ]
    }
)

with client.session(reversibility=True) as session:
    with open("customer_data.csv", "r") as f:
        content = f.read()
    sanitized = session.sanitize(content)
    insights = sagemaker_model.analyze(sanitized)
    report = session.restore(insights)
Mode 5: Azure PII Adapter
Microsoft Azure AI Language PII detection. Supports domain filtering (e.g., healthcare PHI only).
Install:
pip install ava-protocol[azure]
export AZURE_LANGUAGE_ENDPOINT=https://your-resource.cognitiveservices.azure.com
export AZURE_LANGUAGE_KEY=your_api_key_here
Example:
import ava

client = ava.Client(
    engine="azure_pii",
    policy="healthcare_strict",
    vault_type="redis",
    vault_config={"host": "redis.company.com"},
    engine_config={
        "endpoint": "https://ava-pii.cognitiveservices.azure.com",
        "domain_filter": "phi"  # Health data only
    }
)

with client.session(reversibility=True) as session:
    clinical_notes = """
    Dr. Sarah Johnson examined patient Michael Brown.
    Patient reports chest pain. Contact: 555-123-4567
    """
    sanitized = session.sanitize(clinical_notes)
    response = azure_openai.ChatCompletion.create(
        deployment_id="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )
    final = session.restore(response['choices'][0]['message']['content'])
Mode 6: Google DLP Adapter
Google Cloud Data Loss Prevention API with 150+ built-in detectors. Supports custom inspect templates for fine-grained control.
Install:
pip install ava-protocol[gcp]
gcloud auth application-default login
Example:
import ava

client = ava.Client(
    engine="google_dlp",
    policy="legal_confidential",
    vault_type="memory",
    engine_config={
        "project_id": "my-gcp-project",
        "inspect_template": "projects/my-gcp-project/inspectTemplates/legal-template",
        "min_likelihood": "LIKELY"
    }
)

with client.session(reversibility=True) as session:
    legal_document = """
    ATTORNEY-CLIENT PRIVILEGED
    From: attorney@lawfirm.com
    Re: Merger Discussion
    """
    sanitized = session.sanitize(legal_document)
    summary = legal_ai.summarize(sanitized)
    privileged_summary = session.restore(summary)
Vault Types
Vaults store the token-to-value mappings that make restoration possible. Choose based on your persistence and scale requirements.
Memory Vault (Default)
client = ava.Client(engine="presidio", vault_type="memory")
In-process dictionary storage. Data never touches disk and is auto-purged on session exit.
Best for: Single-session flows, air-gapped environments, maximum security.
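The behaviour described here (dictionary storage, purge on session exit) and the `ttl` argument seen in earlier examples can be sketched as follows. `MemoryVault` and `vault_session` are hypothetical names; this is one plausible design under those assumptions, not the library's code.

```python
import time
from contextlib import contextmanager


class MemoryVault:
    """Hypothetical dict-backed vault with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # token -> (value, expires_at)

    def put(self, token, value):
        self._entries[token] = (value, self._clock() + self._ttl)

    def get(self, token):
        entry = self._entries.get(token)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[token]  # expired: drop the mapping
            return None
        return value

    def purge(self):
        self._entries.clear()

    def __len__(self):
        return len(self._entries)


@contextmanager
def vault_session(ttl_seconds=3600, clock=time.monotonic):
    vault = MemoryVault(ttl_seconds, clock)
    try:
        yield vault
    finally:
        vault.purge()  # runs even if the body raises


with vault_session(ttl_seconds=1800) as vault:
    vault.put("AVA_SSN_fG5hI6jK", "123-45-6789")
    inside = vault.get("AVA_SSN_fG5hI6jK")
after_exit = len(vault)
```

Putting the purge in a `finally` clause means the mappings are cleared even when the code inside the session fails partway through.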
SQLite Vault (Persistent)
import os

client = ava.Client(
    engine="presidio",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"],  # AES-256
        "journal_mode": "WAL"
    }
)
Survives process restarts. Sessions can be resumed by ID.
Best for: Audit trails, long-running workflows, crash recovery.
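A minimal sketch of how a persistent vault can survive a restart and resume a session by ID, using only the standard-library `sqlite3` module. The table layout and the helper names (`open_vault`, `store`, `lookup`) are assumptions for illustration. Encryption is deliberately omitted here, so values are stored in plaintext; a real vault would encrypt them, as the `encryption_key` option above suggests.

```python
import os
import sqlite3
import tempfile


def open_vault(db_path):
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # mirrors the journal_mode option
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "session_id TEXT NOT NULL, token TEXT NOT NULL, value TEXT NOT NULL, "
        "PRIMARY KEY (session_id, token))"
    )
    return conn


def store(conn, session_id, token, value):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO tokens VALUES (?, ?, ?)",
            (session_id, token, value),
        )


def lookup(conn, session_id, token):
    row = conn.execute(
        "SELECT value FROM tokens WHERE session_id = ? AND token = ?",
        (session_id, token),
    ).fetchone()
    return row[0] if row else None


db_path = os.path.join(tempfile.mkdtemp(), "ava_vault_demo.db")
conn = open_vault(db_path)
store(conn, "sess-1", "AVA_SSN_fG5hI6jK", "123-45-6789")
conn.close()  # simulate a process restart

conn = open_vault(db_path)
resumed = lookup(conn, "sess-1", "AVA_SSN_fG5hI6jK")
conn.close()
```

Keying rows by session ID is what lets a later process look up only the mappings that belong to the session it is resuming.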
Redis Vault (Distributed)
import os

client = ava.Client(
    engine="presidio",
    vault_type="redis",
    vault_config={
        "host": "redis.company.com",
        "port": 6379,
        "password": os.environ["REDIS_PASSWORD"],
        "ssl": True
    }
)
Multiple services share tokens. Enables cross-machine session sharing.
Best for: Microservices, load-balanced deployments, multi-stage pipelines.
Policies
Policies control which entity types are detected, at what sensitivity, and how tokens are retained.
Built-in Policies
# HIPAA-compliant: all 18 PHI identifiers at sensitivity 5
client = ava.Client(policy="healthcare_strict")
# PCI-DSS level 1: one-time-use tokens for credit card numbers
client = ava.Client(policy="financial_paranoid")
# Attorney-client privilege: extended retention for matter files
client = ava.Client(policy="legal_confidential")
# Balanced business use: names/emails protected, dates preserved
client = ava.Client(policy="general_moderate")
# Scientific data sharing: irreversible hashing (true anonymization)
client = ava.Client(policy="research_anonymized")
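The difference between reversible tokens and the irreversible hashing mentioned for `research_anonymized` can be sketched with a keyed hash. The `anonymize` function, the salt handling, and the 8-character truncation are assumptions for illustration; the source does not say which hash construction the policy uses.

```python
import hashlib
import hmac


def anonymize(value, entity_type, secret_salt):
    # Keyed hash: the same value always maps to the same token, but no
    # token-to-value mapping is stored, so there is nothing to restore from.
    digest = hmac.new(secret_salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"AVA_{entity_type}_{digest[:8]}"


salt = b"per-study-secret"  # hypothetical; keep secret or discard after use
a = anonymize("maria.g@healthmail.com", "EMAI", salt)
b = anonymize("maria.g@healthmail.com", "EMAI", salt)
c = anonymize("robert.chen@acme.com", "EMAI", salt)
```

Here `a` and `b` are identical, so records about the same person can still be linked, while `c` differs. How strong the anonymity is depends on how the salt is managed: anyone holding it can confirm a guessed value by re-hashing.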
Custom YAML Policy
# policies/enterprise_gdpr.yaml
name: enterprise_gdpr
entity_sensitivity:
  PERS: 5  # Always protected
  EMAI: 5
  PHON: 4
  DATE: 2
thresholds:
  min_confidence: 0.85
retention:
  session_ttl: 3600
  audit_retention: 90d
client = ava.Client(policy="/path/to/policies/enterprise_gdpr.yaml")
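One way such a policy could be applied to detector output is sketched below. The source does not define how sensitivity levels are interpreted, so the cutoff (`min_sensitivity=3`), the `should_tokenize` helper, and the shape of a detection record are all assumptions. The policy is written as a Python dict to keep the example dependency-free.

```python
# The YAML policy above, expressed as the dict a YAML loader would produce.
policy = {
    "name": "enterprise_gdpr",
    "entity_sensitivity": {"PERS": 5, "EMAI": 5, "PHON": 4, "DATE": 2},
    "thresholds": {"min_confidence": 0.85},
}


def should_tokenize(detection, policy, min_sensitivity=3):
    """Decide whether one detection is replaced by a token.

    `detection` is a dict with 'entity' and 'confidence' keys (hypothetical shape).
    """
    sensitivity = policy["entity_sensitivity"].get(detection["entity"], 0)
    confident = detection["confidence"] >= policy["thresholds"]["min_confidence"]
    return confident and sensitivity >= min_sensitivity


detections = [
    {"entity": "PERS", "confidence": 0.97},  # sensitive and confident
    {"entity": "DATE", "confidence": 0.99},  # confident, but sensitivity 2
    {"entity": "PHON", "confidence": 0.60},  # sensitive, but below 0.85
]
kept = [d["entity"] for d in detections if should_tokenize(d, policy)]
```

Under these assumptions only the `PERS` detection is tokenized: the date falls below the sensitivity cutoff and the phone number falls below `min_confidence`.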
Async API
ava.AsyncClient supports concurrent sanitization, AI calls, and restoration using asyncio.gather.
import asyncio
import ava

async def process_documents():
    client = ava.AsyncClient(
        engine="presidio",
        policy="general_moderate"
    )
    documents = ["Doc 1...", "Doc 2...", "Doc 3..."]

    async with client.session() as session:
        # Sanitize all concurrently
        sanitized = await asyncio.gather(*[
            session.sanitize(doc) for doc in documents
        ])
        # Send to AI concurrently
        responses = await asyncio.gather(*[
            call_llm(doc) for doc in sanitized
        ])
        # Restore all concurrently
        final = await asyncio.gather(*[
            session.restore(r) for r in responses
        ])
    return final

asyncio.run(process_documents())
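The pipeline above relies on `asyncio.gather` returning results in the order of its inputs, not in completion order, so each restored output lines up with its source document. The sketch below shows this with a stub; `fake_sanitize` and the delays are hypothetical and stand in for real async calls.

```python
import asyncio


async def fake_sanitize(doc, delay):
    # Stand-in for an async sanitize call; the delay simulates uneven latency.
    await asyncio.sleep(delay)
    return doc.replace("Alice", "AVA_PERS_demo0001")


async def pipeline(docs):
    # Later documents finish first, yet results keep the input order.
    delays = [0.03, 0.02, 0.01]
    return await asyncio.gather(
        *(fake_sanitize(doc, delay) for doc, delay in zip(docs, delays))
    )


docs = ["Alice: doc 1", "Alice: doc 2", "Alice: doc 3"]
results = asyncio.run(pipeline(docs))
```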
Production Workflows
Healthcare AI Assistant (FastAPI)
import ava
from fastapi import FastAPI

app = FastAPI()
client = ava.Client(engine="presidio", policy="healthcare_strict")

@app.post("/summarize-record")
async def summarize(record_id: str):
    record = ehr_system.get_record(record_id)
    with client.session(reversibility=True, ttl=1800) as session:
        # 1. Sanitize before sending to AI
        safe = session.sanitize(record)
        # 2. Send to OpenAI; PHI never leaves your environment
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": safe}]
        )
        # 3. Restore PHI in the summary
        summary = session.restore(
            response['choices'][0]['message']['content']
        )
        # 4. Store manifest for audit trail
        audit_log.store(session.manifest)
        return {"summary": summary, "manifest_id": session.manifest.id}
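The manifest stored in step 4 is the kind of record that the cryptographically signed audit trail would cover. As a sketch of one way to make such a record tamper-evident, the example below attaches an HMAC-SHA256 tag to a canonical JSON encoding. The manifest fields, the key, and both helper functions are hypothetical; the source does not specify the signing scheme, and an HMAC is a shared-key integrity check, not a public-key signature.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest, key):
    # Canonical encoding: sorted keys and fixed separators, so the same
    # manifest always serializes to the same bytes.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest, signature, key):
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


key = b"demo-signing-key"  # in practice, load from a secret store
manifest = {
    "id": "manifest-001",
    "policy": "healthcare_strict",
    "entities": {"PERS": 1, "SSN": 1},  # counts only, no raw values
}
signature = sign_manifest(manifest, key)
tampered = dict(manifest, policy="general_moderate")
```

Verifying `manifest` against `signature` succeeds, while `tampered` fails the check because its canonical bytes differ.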
Financial Customer Service Bot
import os
import ava

class CustomerServiceBot:
    def __init__(self):
        self.client = ava.Client(
            gateway_url="https://ava.bank.internal",
            api_key=os.environ["AVA_API_KEY"],
            policy="financial_paranoid"
        )

    async def handle(self, message: str):
        with self.client.session(reversibility=True) as session:
            # Customer input is sanitized before reaching AI
            # "My card 4532-1234-5678-9012 is wrong"
            # → "My card AVA_CRED_aB3cD4eF is wrong"
            safe = session.sanitize(message)
            ai_response = await claude.complete(f"Customer: {safe}")
            # "I'll check account AVA_CRED_aB3cD4eF"
            # Restore real values for the human agent (not the customer)
            agent_response = session.restore(ai_response)
            return {"to_agent": agent_response}
License
MIT License — see LICENSE