
AI Visibility Anonymizer - Privacy-preserving middleware for LLMs


AVA Protocol

AI Visibility Anonymizer — Privacy-preserving middleware for LLM interactions with reversible tokenization.


Author: Gerald Enrique Nelson Mc Kenzie
DOI: 10.5281/zenodo.19111004
Version: 0.1.0 | March 2026


What is AVA?

AVA Protocol sanitizes sensitive data (PII/PHI) before it reaches AI systems, maintains cryptographically signed audit trails, and restores the original values in AI outputs.

Key Innovation: Reversible tokenization preserves both privacy AND data utility — the AI works with opaque tokens, and real values are restored only in the final output.

import ava
import openai  # examples use the legacy (<1.0) openai SDK interface

client = ava.Client(engine="presidio", policy="healthcare_strict")

text = "Patient John Smith, SSN 123-45-6789"

with client.session(reversibility=True) as session:
    safe = session.sanitize(text)
    # Sanitized: "Patient AVA_PERS_xK9mP2nQ, SSN AVA_SSN_fG5hI6jK"

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": safe}]
    )

    # Restore original values in the model's reply text
    final = session.restore(response['choices'][0]['message']['content'])
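
To make the idea of reversible tokenization concrete, here is a minimal sketch of the token-to-value mapping behind it. This is not AVA's implementation: `TokenVault`, `sanitize`, and the pre-computed `findings` list are hypothetical names used for illustration, and the token format simply mimics the examples above.

```python
import secrets
import string

_ALPHABET = string.ascii_letters + string.digits


class TokenVault:
    """Hypothetical in-memory vault mapping opaque tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, entity_type, value):
        # Reuse the token for a value already seen, so repeated mentions
        # of the same entity stay consistent for the model.
        if value in self._value_to_token:
            return self._value_to_token[value]
        suffix = "".join(secrets.choice(_ALPHABET) for _ in range(8))
        token = f"AVA_{entity_type}_{suffix}"
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def restore(self, text):
        # Swap every known token back to its original value.
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text


def sanitize(text, findings, vault):
    """Replace each (entity_type, value) finding with an opaque token."""
    for entity_type, value in findings:
        text = text.replace(value, vault.tokenize(entity_type, value))
    return text


vault = TokenVault()
original = "Patient John Smith, SSN 123-45-6789"
# In a real deployment the detection engine would produce these findings.
findings = [("PERS", "John Smith"), ("SSN", "123-45-6789")]

safe = sanitize(original, findings, vault)
model_output = f"Summary for {safe.split(',')[0]}"  # stand-in for an LLM reply
restored = vault.restore(model_output)
```

The model only ever sees `safe`; the mapping needed to undo the substitution stays in the vault on your side.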

Table of Contents

  1. Installation
  2. Architecture
  3. Operating Modes
  4. Vault Types
  5. Policies
  6. Async API
  7. Production Workflows

Installation

Choose your installation based on the modes you need:

# Gateway Client Only (Lightweight, ~50KB)
pip install ava-protocol

# Embedded with Local Presidio (~500MB, includes ML models)
pip install ava-protocol[local]

# AWS Macie integration
pip install ava-protocol[aws]

# Azure PII integration
pip install ava-protocol[azure]

# Google Cloud DLP integration
pip install ava-protocol[gcp]

# Everything (local + aws + azure + gcp + redis)
pip install ava-protocol[all]

Note: Gateway mode requires no extras. Embedded mode requires [local] for Presidio ML models.
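
If your code has to run both with and without the extras, one option is to probe for the optional modules at startup and fall back to the mock engine. The sketch below uses only the standard library. The mapping from extras to module names is an assumption (Presidio's analyzer installs as `presidio_analyzer`; `boto3` and `redis` are guesses for the other extras), so check the package metadata before relying on it.

```python
from importlib.util import find_spec


def has_module(name):
    """Return True if a top-level module is importable in this environment."""
    return find_spec(name) is not None


# Assumed mapping from extras to top-level module names (not confirmed
# by the AVA documentation).
EXTRA_MODULES = {
    "local": "presidio_analyzer",
    "aws": "boto3",
    "redis": "redis",
}

available = {extra: has_module(mod) for extra, mod in EXTRA_MODULES.items()}

# Fall back to the regex-only mock engine when Presidio is not installed.
engine = "presidio" if available["local"] else "mock"
```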


Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Your App   │────▶│ AVA Client  │────▶│   Engine    │
│             │     │  (Embedded  │     │  (Presidio, │
│             │◀────│   or        │◀────│   AWS, etc) │
│             │     │   Gateway)  │     │             │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Token Vault │
                    │ (Memory /   │
                    │  SQLite /   │
                    │  Redis)     │
                    └─────────────┘

Operating Modes

Mode 1: Embedded (Local Presidio)

Self-contained deployment. All PII detection happens locally with no external calls. Best for air-gapped or high-security environments.

Install:

pip install ava-protocol[local]

Basic example:

import ava
import openai  # legacy (<1.0) openai SDK interface

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="memory"
)

with client.session(reversibility=True, ttl=3600) as session:

    medical_record = """
    Patient: Maria Gonzalez
    DOB: 1985-03-15
    SSN: 123-45-6789
    Email: maria.g@healthmail.com
    Diagnosis: Hypertension
    """

    # Sanitize before AI processing — AI never sees real data
    sanitized = session.sanitize(medical_record)
    # Patient: AVA_PERS_xK9mP2nQ
    # DOB: AVA_DATE_aB3cD4eF
    # SSN: AVA_SSN_fG5hI6jK

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )

    # Restore original values in the final output
    final = session.restore(response['choices'][0]['message']['content'])

With SQLite vault (persistent storage):

import os

import ava

client = ava.Client(
    engine="presidio",
    policy="financial_paranoid",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"]  # AES-256
    }
)

Mode 2: Gateway (Remote Client)

Thin client that connects to a remote AVA Gateway server. No local ML dependencies — all detection is handled server-side.

Install:

pip install ava-protocol  # No extras needed

Basic example:

import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="ava_sk_live_abc123xyz789",
    policy="general_moderate"
)

# Identical API to embedded mode
with client.session(reversibility=True) as session:

    customer_email = """
    Hi, this is Robert Chen from Acme Corp.
    My credit card ending in 4532 was charged twice.
    Please refund to robert.chen@acme.com.
    """

    safe_text = session.sanitize(customer_email)
    response = support_ai.process(safe_text)
    readable = session.restore(response)

Environment-based config:

# .env
AVA_GATEWAY_URL=https://ava.internal.company.com
AVA_API_KEY=ava_sk_live_xxx
AVA_POLICY=healthcare_strict
AVA_DEFAULT_TTL=1800

# Python: settings load automatically from the environment
import ava

client = ava.Client.from_env()
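
How `from_env()` validates its inputs is not documented here. If you want to fail fast on a misconfigured deployment before constructing the client, a small pre-check like the following may help. `GatewaySettings`, `load_settings`, and the default values are hypothetical; only the four variable names come from the `.env` example above.

```python
import os
from dataclasses import dataclass


@dataclass
class GatewaySettings:
    gateway_url: str
    api_key: str
    policy: str = "general_moderate"  # assumed default
    default_ttl: int = 3600           # assumed default, in seconds


def load_settings(env=None):
    """Read AVA_* variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [k for k in ("AVA_GATEWAY_URL", "AVA_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return GatewaySettings(
        gateway_url=env["AVA_GATEWAY_URL"],
        api_key=env["AVA_API_KEY"],
        policy=env.get("AVA_POLICY", "general_moderate"),
        default_ttl=int(env.get("AVA_DEFAULT_TTL", "3600")),
    )


# Passing a dict keeps the example independent of the real environment.
settings = load_settings({
    "AVA_GATEWAY_URL": "https://ava.internal.company.com",
    "AVA_API_KEY": "ava_sk_live_xxx",
    "AVA_DEFAULT_TTL": "1800",
})
```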

Running a Gateway server:

# gateway_server.py — deploy centrally for your organization
from ava.gateway import GatewayServer

server = GatewayServer(
    detection_engine="presidio",
    vault_type="redis",
    vault_config={"host": "redis.company.com", "port": 6379},
    policies_path="/etc/ava/policies/"
)

server.run(
    host="0.0.0.0",
    port=8443,
    tls_cert="/etc/ava/server.crt"
)

Mode 3: Mock Engine (Testing)

Regex-based detection with zero dependencies. Designed for unit tests and CI/CD pipelines where you don't want to install heavyweight ML models.

Detects via regex only: Emails, phone numbers, SSNs, credit card numbers. No NLP.
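
For a sense of what regex-only detection can and cannot catch, here is an illustrative detector for the four entity types listed above. The patterns are simplified assumptions (US-style formats only), not the ones the mock engine actually ships with.

```python
import re

# Simplified, US-centric patterns for illustration only.
PATTERNS = {
    "EMAI": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHON": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "CRED": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}


def detect(text):
    """Return (entity_type, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()
    return [(label, value) for _, label, value in found]


sample = (
    "Email support@example.com, call 555-123-4567, "
    "SSN 123-45-6789, card 4532-1234-5678-9012."
)
hits = detect(sample)

# Names have no fixed shape, so a regex-only engine finds nothing here.
name_hits = detect("Patient: John Doe")
```

The second call is the important limitation: without NLP, free-form entities such as person names pass through untouched, so tests written against the mock engine should use pattern-shaped data.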

Unit test example:

import ava
import pytest

@pytest.fixture
def mock_client():
    return ava.Client(
        engine="mock",
        policy="general_moderate",
        vault_type="memory"
    )

def test_email_detection(mock_client):
    with mock_client.session() as session:
        text = "Contact us at support@example.com"
        result = session.sanitize(text)
        assert "AVA_EMAI_" in result
        assert "support@example.com" not in result

def test_reversibility(mock_client):
    with mock_client.session(reversibility=True) as session:
        # The mock engine has no NLP, so use a regex-detectable entity
        original = "Contact: john.doe@example.com"
        sanitized = session.sanitize(original)
        assert sanitized != original
        restored = session.restore(sanitized)
        assert restored == original

CI/CD pipeline (GitHub Actions):

# .github/workflows/test.yml
name: AVA Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install AVA (lightweight)
        run: pip install ava-protocol pytest  # No [local] needed

      - name: Run tests with MockEngine
        run: pytest tests/ -v
        env:
          AVA_TEST_ENGINE: mock

Mode 4: AWS Macie Adapter

Enterprise-grade PII detection using AWS Macie. Supports custom data identifiers for organization-specific patterns.

Install:

pip install ava-protocol[aws]
aws configure

Example:

import ava

client = ava.Client(
    engine="aws_macie",
    policy="financial_paranoid",
    vault_type="memory",
    engine_config={
        "region": "us-east-1",
        "custom_data_identifiers": [
            "employee-id-pattern",
            "customer-account-pattern"
        ]
    }
)

with client.session(reversibility=True) as session:
    with open("customer_data.csv", "r") as f:
        content = f.read()

    sanitized = session.sanitize(content)
    insights = sagemaker_model.analyze(sanitized)
    report = session.restore(insights)

Mode 5: Azure PII Adapter

Microsoft Azure AI Language PII detection. Supports domain filtering (e.g., healthcare PHI only).

Install:

pip install ava-protocol[azure]
export AZURE_LANGUAGE_ENDPOINT=https://your-resource.cognitiveservices.azure.com
export AZURE_LANGUAGE_KEY=your_api_key_here

Example:

import ava

client = ava.Client(
    engine="azure_pii",
    policy="healthcare_strict",
    vault_type="redis",
    vault_config={"host": "redis.company.com"},
    engine_config={
        "endpoint": "https://ava-pii.cognitiveservices.azure.com",
        "domain_filter": "phi"  # Health data only
    }
)

with client.session(reversibility=True) as session:
    clinical_notes = """
    Dr. Sarah Johnson examined patient Michael Brown.
    Patient reports chest pain. Contact: 555-123-4567
    """

    sanitized = session.sanitize(clinical_notes)

    response = azure_openai.ChatCompletion.create(
        deployment_id="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )

    final = session.restore(response['choices'][0]['message']['content'])

Mode 6: Google DLP Adapter

Google Cloud Data Loss Prevention API with 150+ built-in detectors. Supports custom inspect templates for fine-grained control.

Install:

pip install ava-protocol[gcp]
gcloud auth application-default login

Example:

import ava

client = ava.Client(
    engine="google_dlp",
    policy="legal_confidential",
    vault_type="memory",
    engine_config={
        "project_id": "my-gcp-project",
        "inspect_template": "projects/my-gcp-project/inspectTemplates/legal-template",
        "min_likelihood": "LIKELY"
    }
)

with client.session(reversibility=True) as session:
    legal_document = """
    ATTORNEY-CLIENT PRIVILEGED
    From: attorney@lawfirm.com
    Re: Merger Discussion
    """

    sanitized = session.sanitize(legal_document)
    summary = legal_ai.summarize(sanitized)
    privileged_summary = session.restore(summary)

Vault Types

Vaults store the token-to-value mappings that make restoration possible. Choose based on your persistence and scale requirements.

Memory Vault (Default)

client = ava.Client(engine="presidio", vault_type="memory")

In-process dictionary storage. Data never touches disk and is auto-purged on session exit.

Best for: Single-session flows, air-gapped environments, maximum security.
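
The purge-on-exit behavior can be pictured as a context manager that owns a dictionary and clears it when the block ends. This is a conceptual sketch with a hypothetical `memory_session` helper, not the vault's actual code.

```python
from contextlib import contextmanager


@contextmanager
def memory_session():
    """Yield a token mapping that is emptied when the block exits."""
    mapping = {}
    try:
        yield mapping
    finally:
        # Runs on normal exit and on exceptions alike.
        # Note: clear() drops the references; it does not scrub memory.
        mapping.clear()


with memory_session() as tokens:
    tokens["AVA_EMAI_aB3cD4eF"] = "maria.g@healthmail.com"
    count_inside = len(tokens)

count_after = len(tokens)
```

Once the block exits, nothing can be restored, which is why restoration has to happen inside the session.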

SQLite Vault (Persistent)

import os

client = ava.Client(
    engine="presidio",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"],  # AES-256
        "journal_mode": "WAL"
    }
)

Survives process restarts. Sessions can be resumed by ID.

Best for: Audit trails, long-running workflows, crash recovery.
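
The following sketch shows how a SQLite-backed mapping can survive a restart and be resumed by session ID, using only the standard library. The table layout and the helper names (`open_vault`, `put`, `load_session`) are assumptions, and encryption is left out because the stdlib `sqlite3` module does not provide it; values here are stored in plaintext.

```python
import os
import sqlite3
import tempfile


def open_vault(db_path):
    """Open (or create) the vault file with write-ahead logging enabled."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "session_id TEXT NOT NULL, token TEXT NOT NULL, value TEXT NOT NULL, "
        "PRIMARY KEY (session_id, token))"
    )
    return conn


def put(conn, session_id, token, value):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO tokens VALUES (?, ?, ?)",
            (session_id, token, value),
        )


def load_session(conn, session_id):
    """Return the token-to-value mapping stored for one session."""
    rows = conn.execute(
        "SELECT token, value FROM tokens WHERE session_id = ?", (session_id,)
    )
    return dict(rows.fetchall())


db_path = os.path.join(tempfile.mkdtemp(), "vault.db")
conn = open_vault(db_path)
put(conn, "sess-1", "AVA_SSN_fG5hI6jK", "123-45-6789")
conn.close()

# Simulate a process restart: reopen the file and resume by session ID.
conn = open_vault(db_path)
resumed = load_session(conn, "sess-1")
other = load_session(conn, "sess-2")
conn.close()
```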

Redis Vault (Distributed)

import os

client = ava.Client(
    engine="presidio",
    vault_type="redis",
    vault_config={
        "host": "redis.company.com",
        "port": 6379,
        "password": os.environ["REDIS_PASSWORD"],
        "ssl": True
    }
)

Multiple services share tokens. Enables cross-machine session sharing.

Best for: Microservices, load-balanced deployments, multi-stage pipelines.


Policies

Policies control which entity types are detected, at what sensitivity, and how tokens are retained.

Built-in Policies

# HIPAA-compliant: all 18 PHI identifiers at sensitivity 5
client = ava.Client(policy="healthcare_strict")

# PCI-DSS level 1: one-time-use tokens for credit card numbers
client = ava.Client(policy="financial_paranoid")

# Attorney-client privilege: extended retention for matter files
client = ava.Client(policy="legal_confidential")

# Balanced business use: names/emails protected, dates preserved
client = ava.Client(policy="general_moderate")

# Scientific data sharing: irreversible hashing (true anonymization)
client = ava.Client(policy="research_anonymized")

Custom YAML Policy

# policies/enterprise_gdpr.yaml
name: enterprise_gdpr
entity_sensitivity:
  PERS: 5  # Always protected
  EMAI: 5
  PHON: 4
  DATE: 2
thresholds:
  min_confidence: 0.85
retention:
  session_ttl: 3600
  audit_retention: 90d

# Python: load the custom policy by file path
client = ava.Client(policy="/path/to/policies/enterprise_gdpr.yaml")
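
One plausible reading of these fields is that a detection is tokenized only when the engine's confidence meets `min_confidence` and the entity type's sensitivity is high enough. The sketch below illustrates that reading. The `protect_at` cutoff, the `Finding` type, and the decision rule itself are assumptions, since the policy semantics are not spelled out here.

```python
from dataclasses import dataclass

# The YAML above, expressed as a plain dict to avoid a YAML dependency.
POLICY = {
    "name": "enterprise_gdpr",
    "entity_sensitivity": {"PERS": 5, "EMAI": 5, "PHON": 4, "DATE": 2},
    "thresholds": {"min_confidence": 0.85},
}


@dataclass(frozen=True)
class Finding:
    entity_type: str
    value: str
    confidence: float


def should_tokenize(finding, policy, protect_at=3):
    """Assumed rule: confident enough AND sensitive enough."""
    sensitivity = policy["entity_sensitivity"].get(finding.entity_type, 0)
    confident = finding.confidence >= policy["thresholds"]["min_confidence"]
    return confident and sensitivity >= protect_at


findings = [
    Finding("PERS", "Maria Gonzalez", 0.97),  # sensitive and confident
    Finding("DATE", "1985-03-15", 0.99),      # confident, low sensitivity
    Finding("PHON", "555-123-4567", 0.60),    # sensitive, low confidence
]
protected = [f.value for f in findings if should_tokenize(f, POLICY)]
```

Under this rule only the name is tokenized: the date falls below the sensitivity cutoff and the phone number falls below the confidence threshold.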

Async API

ava.AsyncClient supports concurrent sanitization, AI calls, and restoration using asyncio.gather.

import asyncio
import ava

async def process_documents():
    client = ava.AsyncClient(
        engine="presidio",
        policy="general_moderate"
    )

    documents = ["Doc 1...", "Doc 2...", "Doc 3..."]

    async with client.session() as session:
        # Sanitize all concurrently
        sanitized = await asyncio.gather(*[
            session.sanitize(doc) for doc in documents
        ])

        # Send to AI concurrently (call_llm is your own async LLM wrapper)
        responses = await asyncio.gather(*[
            call_llm(doc) for doc in sanitized
        ])

        # Restore all concurrently
        final = await asyncio.gather(*[
            session.restore(r) for r in responses
        ])

    return final

asyncio.run(process_documents())
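
The pattern above relies on `asyncio.gather` returning results in the order the awaitables were passed, regardless of which finishes first. That keeps each response lined up with its document. The stand-alone sketch below demonstrates this with stub coroutines; `fake_sanitize` and `fake_llm` are placeholders, not part of AVA.

```python
import asyncio


async def fake_sanitize(doc):
    await asyncio.sleep(0.01)
    return doc.replace("Alice", "AVA_PERS_00000001")


async def fake_llm(prompt):
    # The first document deliberately takes longest, to show that
    # gather still returns results in input order.
    await asyncio.sleep(0.03 if prompt.startswith("Doc 1") else 0.01)
    return f"summary of [{prompt}]"


async def pipeline(documents):
    sanitized = await asyncio.gather(*(fake_sanitize(d) for d in documents))
    return await asyncio.gather(*(fake_llm(s) for s in sanitized))


results = asyncio.run(pipeline(["Doc 1: Alice", "Doc 2: Bob"]))
```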

Production Workflows

Healthcare AI Assistant (FastAPI)

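import ava
import openai  # legacy (<1.0) openai SDK interface
from fastapi import FastAPI

app = FastAPI()
client = ava.Client(engine="presidio", policy="healthcare_strict")

# ehr_system and audit_log stand in for your own integrations.
# A plain `def` lets FastAPI run the blocking calls below in its threadpool.
@app.post("/summarize-record")
def summarize(record_id: str):
    record = ehr_system.get_record(record_id)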

    with client.session(reversibility=True, ttl=1800) as session:
        # 1. Sanitize before sending to AI
        safe = session.sanitize(record)

        # 2. Send to OpenAI — PHI never leaves your environment
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": safe}]
        )

        # 3. Restore PHI in the summary
        summary = session.restore(
            response['choices'][0]['message']['content']
        )

        # 4. Store manifest for audit trail
        audit_log.store(session.manifest)

    return {"summary": summary, "manifest_id": session.manifest.id}
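
Step 4 stores a session manifest for auditing. The manifest's real format and signing scheme are not described here, so the following is only a sketch of how a tamper-evident record can be produced with an HMAC from the standard library. The field names, the key handling, and the `sign_manifest` and `verify_manifest` helpers are all hypothetical.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest, key):
    """Return a hex HMAC-SHA256 over a canonical JSON encoding."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest, signature, key):
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


key = b"demo-key-do-not-use-in-production"

# The manifest records what was tokenized, never the values themselves.
manifest = {
    "session_id": "sess-1",
    "policy": "healthcare_strict",
    "entities": {"PERS": 1, "SSN": 1},
}
signature = sign_manifest(manifest, key)

tampered = dict(manifest, policy="general_moderate")
ok = verify_manifest(manifest, signature, key)
ok_tampered = verify_manifest(tampered, signature, key)
```

Serializing with `sort_keys=True` matters: without a canonical encoding, two equal manifests could produce different signatures.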

Financial Customer Service Bot

import os

import ava


class CustomerServiceBot:
    def __init__(self):
        self.client = ava.Client(
            gateway_url="https://ava.bank.internal",
            api_key=os.environ["AVA_API_KEY"],
            policy="financial_paranoid"
        )

    async def handle(self, message: str):
        with self.client.session(reversibility=True) as session:
            # Customer input is sanitized before reaching AI
            # "My card 4532-1234-5678-9012 is wrong"
            # → "My card AVA_CRED_aB3cD4eF is wrong"
            safe = session.sanitize(message)

            ai_response = await claude.complete(f"Customer: {safe}")
            # "I'll check account AVA_CRED_aB3cD4eF"

            # Restore real values for the human agent (not the customer)
            agent_response = session.restore(ai_response)

            return {"to_agent": agent_response}

License

MIT License — see LICENSE
