AI Visibility Anonymizer - Privacy-preserving middleware for LLMs
AVA Protocol
AI Visibility Anonymizer - Privacy-preserving middleware for LLM interactions with reversible tokenization.
What is AVA?
AVA Protocol sanitizes sensitive data (PII/PHI) before it reaches AI systems, maintains cryptographically-signed audit trails, and enables faithful restoration of original values in AI outputs.
Key Innovation: Reversible tokenization preserves both privacy AND data utility.
import ava
import openai  # legacy (pre-1.0) openai interface

client = ava.Client(engine="presidio", policy="healthcare_strict")
text = "Patient John Smith, SSN 123-45-6789"

with client.session(reversibility=True) as session:
    # Original: "Patient John Smith, SSN 123-45-6789"
    safe = session.sanitize(text)
    # Sanitized: "Patient AVA_PERS_xK9mP2nQ, SSN AVA_SSN_fG5hI6jK"
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": safe}]
    )
    # Restore the original values in the model's reply text
    final = session.restore(response['choices'][0]['message']['content'])
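To make the idea of reversible tokenization concrete, here is a minimal sketch of a token vault, not AVA's actual implementation. The names `ToyVault` and `sanitize` are hypothetical, the detections are supplied by hand rather than by an engine, and the token shape only imitates the examples above.

```python
import re
import secrets

# Assumed token shape, modelled on the README examples (AVA_PERS_..., AVA_SSN_...).
TOKEN_RE = re.compile(r"AVA_[A-Z]{3,4}_[A-Za-z0-9]{8}")


class ToyVault:
    """Hypothetical in-memory vault mapping tokens to original values."""

    def __init__(self):
        self._by_token = {}  # token -> original value
        self._by_value = {}  # (entity type, value) -> token

    def tokenize(self, entity_type, value):
        # Reuse the same token for a repeated value so references stay consistent.
        key = (entity_type, value)
        if key not in self._by_value:
            token = f"AVA_{entity_type}_{secrets.token_hex(4)}"
            self._by_value[key] = token
            self._by_token[token] = value
        return self._by_value[key]

    def restore(self, text):
        # Unknown tokens are left untouched rather than raising.
        return TOKEN_RE.sub(
            lambda m: self._by_token.get(m.group(0), m.group(0)), text
        )


def sanitize(text, detections, vault):
    """Replace each detected (entity_type, value) pair with its token."""
    for entity_type, value in detections:
        text = text.replace(value, vault.tokenize(entity_type, value))
    return text


vault = ToyVault()
original = "Patient John Smith, SSN 123-45-6789"
safe = sanitize(original, [("PERS", "John Smith"), ("SSN", "123-45-6789")], vault)
reply = f"Summary: {safe}"  # stand-in for a model reply that echoes the tokens
print(vault.restore(reply))
```

Because the model only ever sees opaque tokens, restoration works as long as the reply repeats those tokens verbatim.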
Installation
# Gateway mode (lightweight, ~50KB)
pip install ava-protocol
# Embedded mode (local ML, ~500MB)
# Extras are quoted so shells such as zsh do not expand the brackets
pip install "ava-protocol[local]"
# Cloud integrations
pip install "ava-protocol[aws]"    # AWS Macie
pip install "ava-protocol[azure]"  # Azure PII
pip install "ava-protocol[gcp]"    # Google DLP
pip install "ava-protocol[all]"    # Everything
Quick Start
1. Gateway Mode (Recommended for Most Users)
Connect to a remote AVA Gateway server:
import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="your-api-key",
    policy="general_moderate"
)

with client.session(reversibility=True) as session:
    clean = session.sanitize("Contact john@example.com")
    print(clean)  # Contact AVA_EMAI_UfhwZS_2
2. Embedded Mode (Self-Contained)
Run everything locally with Presidio:
import ava

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="memory"
)

with client.session(reversibility=True) as session:
    medical_text = "Patient: Sarah Johnson, DOB: 1985-03-15"
    safe = session.sanitize(medical_text)
    print(safe)  # Patient: AVA_PERS_xK9mP2nQ, DOB: AVA_DATE_aB3cD4eF
3. Mock Engine (Testing/CI)
No ML dependencies - perfect for unit tests:
import ava

client = ava.Client(engine="mock", policy="general_moderate")

with client.session() as session:
    result = session.sanitize("Email: test@example.com")
    assert "AVA_EMAI_" in result
Operating Modes
Mode 1: Embedded (Local Presidio)
Self-contained deployment for air-gapped environments.
import os

import ava
import openai  # legacy (pre-1.0) openai interface

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"]
    }
)

with client.session(reversibility=True, ttl=3600) as session:
    medical_record = """
    Patient: Maria Gonzalez
    DOB: 1985-03-15
    SSN: 123-45-6789
    Email: maria.g@healthmail.com
    """
    sanitized = session.sanitize(medical_record)

    # Send to OpenAI
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )

    # Restore original values
    final = session.restore(response['choices'][0]['message']['content'])
Mode 2: Gateway (Remote Client)
Thin client connecting to remote AVA Gateway.
import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="ava_sk_live_abc123xyz789",
    policy="general_moderate"
)

with client.session(reversibility=True) as session:
    customer_email = """
    Hi, this is Robert Chen from Acme Corp.
    My credit card ending in 4532 was charged twice.
    """
    safe_text = session.sanitize(customer_email)
    # support_ai is a placeholder for your own model client
    response = support_ai.process(safe_text)
    readable = session.restore(response)
Environment-based config:
# .env
AVA_GATEWAY_URL=https://ava.internal.company.com
AVA_API_KEY=ava_sk_live_xxx
AVA_POLICY=healthcare_strict

# Python
client = ava.Client.from_env()  # Auto-loads from environment
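How `Client.from_env()` resolves its settings is not documented here, so the following is only a sketch of what environment-based loading could look like. The variable names come from the `.env` example above; `GatewaySettings`, `settings_from_env`, the fallback policy, and the error on missing values are all assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewaySettings:
    """Hypothetical container for gateway connection settings."""
    gateway_url: str
    api_key: str
    policy: str = "general_moderate"  # assumed default, not confirmed by AVA


def settings_from_env(environ=None):
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if environ is None else environ
    required = ("AVA_GATEWAY_URL", "AVA_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early instead of sending requests with an empty URL or key.
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return GatewaySettings(
        gateway_url=env["AVA_GATEWAY_URL"],
        api_key=env["AVA_API_KEY"],
        policy=env.get("AVA_POLICY", "general_moderate"),
    )
```

Accepting an explicit mapping keeps the loader easy to test without mutating the real process environment.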
Mode 3: Mock Engine (Testing)
Regex-based detection for CI/CD.
import ava
import pytest


@pytest.fixture
def mock_client():
    return ava.Client(engine="mock", policy="general_moderate")


def test_email_detection(mock_client):
    with mock_client.session() as session:
        text = "Contact us at support@example.com"
        result = session.sanitize(text)
        assert "AVA_EMAI_" in result


def test_reversibility(mock_client):
    with mock_client.session(reversibility=True) as session:
        original = "Patient: John Doe"
        sanitized = session.sanitize(original)
        restored = session.restore(sanitized)
        assert restored == original
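For readers curious what regex-based detection amounts to, the sketch below shows the general technique with two illustrative patterns. It is not the mock engine's real pattern set: `PATTERNS`, `detect`, and `mask` are hypothetical names, and the counter-based placeholders stand in for whatever token scheme AVA uses.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PATTERNS = {
    "EMAI": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text):
    """Return (start, end, entity_type) spans sorted by position."""
    spans = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), entity_type))
    return sorted(spans)


def mask(text):
    """Replace detected spans with numbered placeholders.

    Spans are rewritten right to left so that earlier offsets stay valid.
    Overlapping spans are not handled in this sketch.
    """
    indexed = list(enumerate(detect(text)))
    for n, (start, end, entity_type) in reversed(indexed):
        text = text[:start] + f"AVA_{entity_type}_{n:08d}" + text[end:]
    return text


print(mask("Contact us at support@example.com"))
```

Pattern matching like this is deterministic and dependency-free, which is what makes it convenient for CI, at the cost of missing anything the patterns do not anticipate (names, for instance).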
Mode 4: AWS Macie Adapter
import ava

client = ava.Client(
    engine="aws_macie",
    policy="financial_paranoid",
    engine_config={
        "region": "us-east-1",
        "custom_data_identifiers": ["employee-id-pattern"]
    }
)

with client.session(reversibility=True) as session:
    with open("customer_data.csv", "r") as f:
        content = f.read()
    sanitized = session.sanitize(content)
    insights = sagemaker_model.analyze(sanitized)
    report = session.restore(insights)
Mode 5: Azure PII Adapter
import ava

client = ava.Client(
    engine="azure_pii",
    policy="healthcare_strict",
    engine_config={
        "endpoint": "https://ava-pii.cognitiveservices.azure.com",
        "domain_filter": "phi"
    }
)

with client.session(reversibility=True) as session:
    clinical_notes = "Dr. Sarah Johnson examined patient Michael Brown."
    sanitized = session.sanitize(clinical_notes)
    response = azure_openai.ChatCompletion.create(
        deployment_id="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )
    final = session.restore(response['choices'][0]['message']['content'])
Mode 6: Google DLP Adapter
import ava

client = ava.Client(
    engine="google_dlp",
    policy="legal_confidential",
    engine_config={
        "project_id": "my-gcp-project",
        "min_likelihood": "LIKELY"
    }
)

with client.session(reversibility=True) as session:
    legal_doc = "ATTORNEY-CLIENT PRIVILEGED From: attorney@lawfirm.com"
    sanitized = session.sanitize(legal_doc)
    summary = legal_ai.summarize(sanitized)
    privileged = session.restore(summary)
Vault Types
Memory Vault (Default)
client = ava.Client(engine="presidio", vault_type="memory")
# In-process storage, never touches disk
# Auto-purged on session exit
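The two properties listed for the memory vault (in-process storage, purge on session exit) can be pictured with a small context manager. This is a sketch under assumptions, not AVA's vault: `MemoryVault` and `vault_session` are hypothetical names, and the TTL handling mirrors the `ttl` argument seen in the session examples without claiming to match its exact semantics.

```python
import time
from contextlib import contextmanager


class MemoryVault:
    """Hypothetical in-process token store with optional per-entry TTL."""

    def __init__(self, ttl=None, clock=time.monotonic):
        self._ttl = ttl        # seconds, or None for no expiry
        self._clock = clock    # injectable so tests can control time
        self._entries = {}     # token -> (value, stored_at)

    def put(self, token, value):
        self._entries[token] = (value, self._clock())

    def get(self, token):
        entry = self._entries.get(token)
        if entry is None:
            return None
        value, stored_at = entry
        if self._ttl is not None and self._clock() - stored_at > self._ttl:
            del self._entries[token]  # expired: drop it eagerly
            return None
        return value

    def purge(self):
        self._entries.clear()

    def __len__(self):
        return len(self._entries)


@contextmanager
def vault_session(ttl=None, clock=time.monotonic):
    """Yield a vault and wipe it on exit, even if the body raises."""
    vault = MemoryVault(ttl=ttl, clock=clock)
    try:
        yield vault
    finally:
        vault.purge()
```

Purging in a `finally` block is what guarantees the mappings disappear when the `with` block ends, including on exceptions.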
SQLite Vault (Persistent)
client = ava.Client(
    engine="presidio",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"],
        "journal_mode": "WAL"
    }
)
# AES-256 encryption
# Survives process restart
Redis Vault (Distributed)
client = ava.Client(
    engine="presidio",
    vault_type="redis",
    vault_config={
        "host": "redis.company.com",
        "port": 6379,
        "password": os.environ["REDIS_PASSWORD"],
        "ssl": True
    }
)
# Cross-machine session sharing
# Microservices support
Policies
Built-in Policies
# HIPAA-compliant healthcare
client = ava.Client(policy="healthcare_strict")
# PCI-DSS level 1 financial
client = ava.Client(policy="financial_paranoid")
# Attorney-client privilege
client = ava.Client(policy="legal_confidential")
# Balanced business use
client = ava.Client(policy="general_moderate")
# Scientific data (irreversible)
client = ava.Client(policy="research_anonymized")
Custom Policy (YAML)
# policies/enterprise.yaml
name: enterprise_gdpr
entity_sensitivity:
  PERS: 5  # Always protected
  EMAI: 5
  PHON: 4
  DATE: 2
thresholds:
  min_confidence: 0.85
retention:
  session_ttl: 3600
  audit_retention: 90d

# Python: load the policy by path
client = ava.Client(policy="/path/to/policies/enterprise.yaml")
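The README does not spell out how `entity_sensitivity` and `min_confidence` interact, so the following is one plausible reading rather than AVA's documented behaviour. It assumes that level 5 means "mask regardless of detector confidence" (following the YAML comment), that lower levels must also clear `min_confidence`, and that a hypothetical cutoff `min_sensitivity` separates protected from unprotected types.

```python
# The policy above, written as a plain dict to keep the sketch stdlib-only.
POLICY = {
    "name": "enterprise_gdpr",
    "entity_sensitivity": {"PERS": 5, "EMAI": 5, "PHON": 4, "DATE": 2},
    "thresholds": {"min_confidence": 0.85},
}


def should_mask(entity_type, confidence, policy, min_sensitivity=3):
    """Decide whether a detection should be tokenized (assumed semantics)."""
    sensitivity = policy["entity_sensitivity"].get(entity_type, 0)
    if sensitivity >= 5:
        # "Always protected": mask even low-confidence detections.
        return True
    return (
        sensitivity >= min_sensitivity
        and confidence >= policy["thresholds"]["min_confidence"]
    )


for entity_type, confidence in [("PERS", 0.40), ("PHON", 0.90), ("DATE", 0.99)]:
    print(entity_type, confidence, should_mask(entity_type, confidence, POLICY))
```

Under these assumptions a low-confidence name is still masked, a confident phone number is masked, and a date is passed through because its sensitivity falls below the cutoff.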
Async API
import asyncio
import ava


async def process_documents():
    client = ava.AsyncClient(engine="presidio", policy="general_moderate")
    documents = ["Doc 1...", "Doc 2...", "Doc 3..."]

    async with client.session() as session:
        # Process all concurrently
        sanitized = await asyncio.gather(*[
            session.sanitize(doc) for doc in documents
        ])

        # Send to AI concurrently
        responses = await asyncio.gather(*[
            call_llm(doc) for doc in sanitized
        ])

        # Restore all concurrently
        final = await asyncio.gather(*[
            session.restore(r) for r in responses
        ])

    return final


asyncio.run(process_documents())
Production Workflow: Healthcare API
import ava
import openai  # legacy (pre-1.0) openai interface
from fastapi import FastAPI

app = FastAPI()
client = ava.Client(engine="presidio", policy="healthcare_strict")

# ehr_system and audit_log are placeholders for your own services.


@app.post("/summarize-record")
def summarize(record_id: str):
    # Plain def: FastAPI runs it in a threadpool, so the blocking
    # calls below do not stall the event loop.
    record = ehr_system.get_record(record_id)

    with client.session(reversibility=True, ttl=1800) as session:
        # 1. Sanitize before AI
        safe = session.sanitize(record)

        # 2. Send to OpenAI
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": safe}]
        )

        # 3. Restore PHI
        summary = session.restore(
            response['choices'][0]['message']['content']
        )

        # 4. Audit
        audit_log.store(session.manifest)

        return {"summary": summary, "manifest_id": session.manifest.id}
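The manifest stored in step 4 is described earlier as part of a cryptographically signed audit trail. AVA's actual signing scheme is not documented here; as an illustration of tamper-evident records, the sketch below uses an HMAC-SHA256 tag over a canonical JSON encoding. Note that HMAC is a symmetric message authentication code, not a public-key signature, and that `sign_manifest`, `verify_manifest`, and the manifest fields are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(manifest):
    # Sorted keys and fixed separators give a stable byte encoding.
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest, key):
    """Attach an HMAC-SHA256 tag to a JSON-serializable manifest."""
    tag = hmac.new(key, _canonical(manifest), hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": tag}


def verify_manifest(signed, key):
    """Return True only if the manifest still matches its tag."""
    expected = hmac.new(key, _canonical(signed["manifest"]), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signed["signature"])


key = b"demo-key-do-not-use-in-production"
signed = sign_manifest({"id": "m-1", "entities": {"PERS": 2, "SSN": 1}}, key)
print(verify_manifest(signed, key))
```

Any later change to the stored manifest, or verification with a different key, makes `verify_manifest` return `False`.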
Architecture
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Your App   │────▶│ AVA Client  │────▶│   Engine    │
│             │     │ (Embedded   │     │ (Presidio,  │
│             │◀────│  or         │◀────│  AWS, etc)  │
│             │     │  Gateway)   │     │             │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Token Vault │
                    │ (Memory/    │
                    │  SQLite/    │
                    │  Redis)     │
                    └─────────────┘
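The diagram implies that engines and vaults are interchangeable behind the client. One way to express that in Python is with structural interfaces, sketched below. None of these classes exist in AVA: `Engine`, `Vault`, `KeywordEngine`, `DictVault`, and `ToyClient` are hypothetical names chosen to show the wiring only.

```python
from typing import Dict, List, Protocol, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type)


class Engine(Protocol):
    def detect(self, text: str) -> List[Span]: ...


class Vault(Protocol):
    def put(self, token: str, value: str) -> None: ...
    def get(self, token: str) -> str: ...


class KeywordEngine:
    """Stand-in engine that flags known literal strings."""

    def __init__(self, known: Dict[str, str]):
        self._known = known  # literal -> entity type

    def detect(self, text: str) -> List[Span]:
        spans = []
        for literal, entity_type in self._known.items():
            if not literal:
                continue  # an empty literal would match everywhere
            start = text.find(literal)
            while start != -1:
                spans.append((start, start + len(literal), entity_type))
                start = text.find(literal, start + len(literal))
        return sorted(spans)


class DictVault:
    def __init__(self):
        self._data: Dict[str, str] = {}

    def put(self, token: str, value: str) -> None:
        self._data[token] = value

    def get(self, token: str) -> str:
        return self._data[token]


class ToyClient:
    """Wires an engine and a vault together, as in the diagram."""

    def __init__(self, engine: Engine, vault: Vault):
        self._engine, self._vault = engine, vault
        self._tokens: List[str] = []

    def sanitize(self, text: str) -> str:
        # Right-to-left replacement keeps earlier offsets valid.
        for start, end, entity_type in reversed(self._engine.detect(text)):
            token = f"AVA_{entity_type}_{len(self._tokens):08d}"
            self._vault.put(token, text[start:end])
            self._tokens.append(token)
            text = text[:start] + token + text[end:]
        return text

    def restore(self, text: str) -> str:
        for token in self._tokens:
            text = text.replace(token, self._vault.get(token))
        return text
```

Because `ToyClient` depends only on the two protocols, swapping a local engine for a cloud adapter, or a memory vault for a persistent one, would not change the calling code.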
License
MIT License - see LICENSE
Author: Gerald Enrique Nelson Mc Kenzie
DOI: 10.5281/zenodo.19111004
Version: 0.1.0 | March 2026