AI Visibility Anonymizer - Privacy-preserving middleware for LLMs

AVA Protocol

AI Visibility Anonymizer - Privacy-preserving middleware for LLM interactions with reversible tokenization.


What is AVA?

AVA Protocol sanitizes sensitive data (PII/PHI) before it reaches AI systems, maintains cryptographically signed audit trails, and restores the original values in AI outputs.

Key Innovation: Reversible tokenization preserves both privacy AND data utility.

import ava
import openai  # legacy openai<1.0 interface (openai.ChatCompletion)

client = ava.Client(engine="presidio", policy="healthcare_strict")
text = "Patient John Smith, SSN 123-45-6789"

with client.session(reversibility=True) as session:
    safe = session.sanitize(text)
    # Sanitized: "Patient AVA_PERS_xK9mP2nQ, SSN AVA_SSN_fG5hI6jK"

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": safe}]
    )

    # Restore original values in the reply text
    final = session.restore(response['choices'][0]['message']['content'])
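
To make the idea of reversible tokenization concrete, here is a minimal, self-contained sketch. It is not AVA's implementation: `ToySession`, the email-only regex, and the token format are assumptions chosen for illustration. It only shows the round trip of replacing a value with a token, keeping the mapping in a vault, and substituting the value back into later text.

```python
import re
import secrets

# Deliberately simple email pattern; a real engine detects many entity types.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ToySession:
    """Illustrative only: maps each detected value to a random token and back."""

    def __init__(self):
        self._token_to_value = {}  # the "vault"
        self._value_to_token = {}  # lets a repeated value reuse its token

    def sanitize(self, text: str) -> str:
        def replace(match):
            value = match.group(0)
            token = self._value_to_token.get(value)
            if token is None:
                # Hypothetical token format modeled on the README's examples
                token = "AVA_EMAI_" + secrets.token_hex(4)
                self._value_to_token[value] = token
                self._token_to_value[token] = value
            return token

        return EMAIL_RE.sub(replace, text)

    def restore(self, text: str) -> str:
        # Substitute every known token with its original value
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text


session = ToySession()
original = "Contact john@example.com or jane@example.org; john@example.com is preferred."
safe = session.sanitize(original)
restored = session.restore(f"Summary: {safe}")
```

Because the same value always maps to the same token within a session, a model can still tell that two mentions refer to the same person or address, which is the "data utility" half of the claim above.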

Installation

# Gateway mode (lightweight, ~50KB)
pip install ava-protocol

# Embedded mode (local ML, ~500MB)
# Extras are quoted so shells such as zsh do not treat the brackets as a glob
pip install "ava-protocol[local]"

# Cloud integrations
pip install "ava-protocol[aws]"      # AWS Macie
pip install "ava-protocol[azure]"    # Azure PII
pip install "ava-protocol[gcp]"      # Google DLP
pip install "ava-protocol[all]"      # Everything

Quick Start

1. Gateway Mode (Recommended for Most Users)

Connect to a remote AVA Gateway server:

import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="your-api-key",
    policy="general_moderate"
)

with client.session(reversibility=True) as session:
    clean = session.sanitize("Contact john@example.com")
    print(clean)  # Contact AVA_EMAI_UfhwZS_2

2. Embedded Mode (Self-Contained)

Run everything locally with Presidio:

import ava

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="memory"
)

with client.session(reversibility=True) as session:
    medical_text = "Patient: Sarah Johnson, DOB: 1985-03-15"
    safe = session.sanitize(medical_text)
    print(safe)  # Patient: AVA_PERS_xK9mP2nQ, DOB: AVA_DATE_aB3cD4eF

3. Mock Engine (Testing/CI)

No ML dependencies - perfect for unit tests:

import ava

client = ava.Client(engine="mock", policy="general_moderate")

with client.session() as session:
    result = session.sanitize("Email: test@example.com")
    assert "AVA_EMAI_" in result

Operating Modes

Mode 1: Embedded (Local Presidio)

Self-contained deployment for air-gapped environments.

import os

import ava
import openai  # legacy openai<1.0 interface

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"]
    }
)

with client.session(reversibility=True, ttl=3600) as session:
    medical_record = """
    Patient: Maria Gonzalez
    DOB: 1985-03-15
    SSN: 123-45-6789
    Email: maria.g@healthmail.com
    """

    sanitized = session.sanitize(medical_record)

    # Send to OpenAI
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )

    # Restore original values
    final = session.restore(response['choices'][0]['message']['content'])

Mode 2: Gateway (Remote Client)

Thin client connecting to remote AVA Gateway.

import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="ava_sk_live_abc123xyz789",
    policy="general_moderate"
)

with client.session(reversibility=True) as session:
    customer_email = """
    Hi, this is Robert Chen from Acme Corp.
    My credit card ending in 4532 was charged twice.
    """

    safe_text = session.sanitize(customer_email)
    # support_ai is a placeholder for your own AI service client
    response = support_ai.process(safe_text)
    readable = session.restore(response)

Environment-based config:

# .env
AVA_GATEWAY_URL=https://ava.internal.company.com
AVA_API_KEY=ava_sk_live_xxx
AVA_POLICY=healthcare_strict

client = ava.Client.from_env()  # Auto-loads from environment
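
As a rough illustration of environment-based configuration, the sketch below reads the three variables shown above into a settings object. `GatewaySettings` and `load_settings` are hypothetical names, and the default policy and the error on missing values are assumptions; the README does not document how `from_env()` behaves internally or whether it reads a `.env` file by itself.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewaySettings:
    gateway_url: str
    api_key: str
    policy: str = "general_moderate"  # assumed default, not documented


def load_settings(env=os.environ) -> GatewaySettings:
    """Build settings from AVA_* variables, failing early if required ones are absent."""
    required = ("AVA_GATEWAY_URL", "AVA_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return GatewaySettings(
        gateway_url=env["AVA_GATEWAY_URL"],
        api_key=env["AVA_API_KEY"],
        policy=env.get("AVA_POLICY", "general_moderate"),
    )


# A plain dict stands in for os.environ so the example is reproducible
settings = load_settings({
    "AVA_GATEWAY_URL": "https://ava.internal.company.com",
    "AVA_API_KEY": "ava_sk_live_xxx",
    "AVA_POLICY": "healthcare_strict",
})
```

Keeping the API key in the environment rather than in source code also avoids committing credentials alongside the client setup.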

Mode 3: Mock Engine (Testing)

Regex-based detection for CI/CD.

import ava
import pytest

@pytest.fixture
def mock_client():
    return ava.Client(engine="mock", policy="general_moderate")

def test_email_detection(mock_client):
    with mock_client.session() as session:
        text = "Contact us at support@example.com"
        result = session.sanitize(text)
        assert "AVA_EMAI_" in result

def test_reversibility(mock_client):
    with mock_client.session(reversibility=True) as session:
        original = "Patient: John Doe"
        sanitized = session.sanitize(original)
        restored = session.restore(sanitized)
        assert restored == original

Mode 4: AWS Macie Adapter

import ava

client = ava.Client(
    engine="aws_macie",
    policy="financial_paranoid",
    engine_config={
        "region": "us-east-1",
        "custom_data_identifiers": ["employee-id-pattern"]
    }
)

with client.session(reversibility=True) as session:
    with open("customer_data.csv", "r") as f:
        content = f.read()

    sanitized = session.sanitize(content)
    # sagemaker_model is a placeholder for your own model client
    insights = sagemaker_model.analyze(sanitized)
    report = session.restore(insights)

Mode 5: Azure PII Adapter

import ava

client = ava.Client(
    engine="azure_pii",
    policy="healthcare_strict",
    engine_config={
        "endpoint": "https://ava-pii.cognitiveservices.azure.com",
        "domain_filter": "phi"
    }
)

with client.session(reversibility=True) as session:
    clinical_notes = "Dr. Sarah Johnson examined patient Michael Brown."
    sanitized = session.sanitize(clinical_notes)
    # azure_openai is a placeholder for an OpenAI client configured for Azure
    response = azure_openai.ChatCompletion.create(
        deployment_id="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )
    final = session.restore(response['choices'][0]['message']['content'])

Mode 6: Google DLP Adapter

import ava

client = ava.Client(
    engine="google_dlp",
    policy="legal_confidential",
    engine_config={
        "project_id": "my-gcp-project",
        "min_likelihood": "LIKELY"
    }
)

with client.session(reversibility=True) as session:
    legal_doc = "ATTORNEY-CLIENT PRIVILEGED From: attorney@lawfirm.com"
    sanitized = session.sanitize(legal_doc)
    # legal_ai is a placeholder for your own summarization service
    summary = legal_ai.summarize(sanitized)
    privileged = session.restore(summary)

Vault Types

Memory Vault (Default)

client = ava.Client(engine="presidio", vault_type="memory")
# In-process storage, never touches disk
# Auto-purged on session exit
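
The "auto-purged on session exit" behavior can be pictured with a context manager that clears an in-process mapping when the block ends. This is a sketch under stated assumptions: `MemoryVault` and `vault_session` are hypothetical names, not part of the AVA API.

```python
from contextlib import contextmanager


class MemoryVault:
    """Hypothetical in-process token store; nothing is written to disk."""

    def __init__(self):
        self._entries = {}

    def put(self, token: str, value: str) -> None:
        self._entries[token] = value

    def get(self, token: str) -> str:
        return self._entries[token]

    def purge(self) -> None:
        self._entries.clear()

    def __len__(self) -> int:
        return len(self._entries)


@contextmanager
def vault_session(vault: MemoryVault):
    try:
        yield vault
    finally:
        # Runs on normal exit and on exceptions, so mappings never outlive the session
        vault.purge()


vault = MemoryVault()
with vault_session(vault) as v:
    v.put("AVA_PERS_xK9mP2nQ", "John Smith")
    inside = len(v)
after = len(vault)
```

The trade-off is that tokens cannot be restored once the session has closed, which is why the persistent vault types exist.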

SQLite Vault (Persistent)

import os
import ava

client = ava.Client(
    engine="presidio",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"],
        "journal_mode": "WAL"
    }
)
# AES-256 encryption
# Survives process restart
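
For intuition about persistence, the following sketch stores a token mapping in SQLite with WAL journaling, closes the connection, and reads the mapping back through a new one. The table layout and the `open_vault` helper are assumptions for illustration. Encryption is intentionally omitted here because the standard library has no AES implementation, so this is not a model of AVA's encrypted vault.

```python
import os
import sqlite3
import tempfile


def open_vault(db_path: str) -> sqlite3.Connection:
    """Open (or create) a toy token table; values are stored UNENCRYPTED in this sketch."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens (token TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "vault.db")

    # First "process": write a mapping and close the connection
    conn = open_vault(db_path)
    conn.execute("INSERT INTO tokens VALUES (?, ?)", ("AVA_SSN_fG5hI6jK", "123-45-6789"))
    conn.commit()
    conn.close()

    # Second "process": reopen the same file and look the token up
    conn = open_vault(db_path)
    row = conn.execute(
        "SELECT value FROM tokens WHERE token = ?", ("AVA_SSN_fG5hI6jK",)
    ).fetchone()
    conn.close()
```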

Redis Vault (Distributed)

import os
import ava

client = ava.Client(
    engine="presidio",
    vault_type="redis",
    vault_config={
        "host": "redis.company.com",
        "port": 6379,
        "password": os.environ["REDIS_PASSWORD"],
        "ssl": True
    }
)
# Cross-machine session sharing
# Microservices support

Policies

Built-in Policies

# HIPAA-compliant healthcare
client = ava.Client(policy="healthcare_strict")

# PCI-DSS level 1 financial
client = ava.Client(policy="financial_paranoid")

# Attorney-client privilege
client = ava.Client(policy="legal_confidential")

# Balanced business use
client = ava.Client(policy="general_moderate")

# Scientific data (irreversible)
client = ava.Client(policy="research_anonymized")

Custom Policy (YAML)

# policies/enterprise.yaml
name: enterprise_gdpr
entity_sensitivity:
  PERS: 5  # Always protected
  EMAI: 5
  PHON: 4
  DATE: 2
thresholds:
  min_confidence: 0.85
retention:
  session_ttl: 3600
  audit_retention: 90d

client = ava.Client(policy="/path/to/policies/enterprise.yaml")
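
One plausible reading of how such a policy drives decisions is sketched below. The README does not specify the rules, so this is an assumption: entity types at sensitivity 5 are always protected (following the "Always protected" comment), other listed types are protected only when the detector's confidence meets `min_confidence`, and unlisted types are left alone. `should_protect` is a hypothetical helper, and the policy is written as a dict mirroring the YAML to avoid a YAML dependency.

```python
# Mirrors the relevant parts of policies/enterprise.yaml
POLICY = {
    "entity_sensitivity": {"PERS": 5, "EMAI": 5, "PHON": 4, "DATE": 2},
    "thresholds": {"min_confidence": 0.85},
}


def should_protect(entity_type: str, confidence: float, policy: dict) -> bool:
    """Assumed decision rule; AVA's actual policy semantics may differ."""
    level = policy["entity_sensitivity"].get(entity_type, 0)
    if level == 0:
        return False  # entity type not covered by this policy
    if level >= 5:
        return True  # "always protected", regardless of confidence
    return confidence >= policy["thresholds"]["min_confidence"]


# (entity type, detector confidence) pairs, made up for the example
detections = [("PERS", 0.60), ("PHON", 0.90), ("DATE", 0.70), ("URL", 0.99)]
protected = [etype for etype, conf in detections if should_protect(etype, conf, POLICY)]
```

Under these assumptions the low-confidence name is still masked, the phone number passes the threshold, and the date and URL are left as they are.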

Async API

import asyncio
import ava

async def process_documents():
    client = ava.AsyncClient(engine="presidio", policy="general_moderate")
    documents = ["Doc 1...", "Doc 2...", "Doc 3..."]

    async with client.session() as session:
        # Process all concurrently
        sanitized = await asyncio.gather(*[
            session.sanitize(doc) for doc in documents
        ])

        # Send to AI concurrently (call_llm is your own async LLM wrapper)
        responses = await asyncio.gather(*[
            call_llm(doc) for doc in sanitized
        ])

        # Restore all concurrently
        final = await asyncio.gather(*[
            session.restore(r) for r in responses
        ])

    return final

asyncio.run(process_documents())

Production Workflow: Healthcare API

import ava
import openai  # legacy openai<1.0 interface
from fastapi import FastAPI

app = FastAPI()
client = ava.Client(engine="presidio", policy="healthcare_strict")

# ehr_system and audit_log are placeholders for your own EHR and audit-log integrations.
# A plain `def` lets FastAPI run these blocking calls in its threadpool.
@app.post("/summarize-record")
def summarize(record_id: str):
    record = ehr_system.get_record(record_id)

    with client.session(reversibility=True, ttl=1800) as session:
        # 1. Sanitize before AI
        safe = session.sanitize(record)

        # 2. Send to OpenAI
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": safe}]
        )

        # 3. Restore PHI
        summary = session.restore(
            response['choices'][0]['message']['content']
        )

        # 4. Audit (capture the manifest ID before the session closes)
        audit_log.store(session.manifest)
        manifest_id = session.manifest.id

    return {"summary": summary, "manifest_id": manifest_id}
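
The audit step stores a session manifest, and the project describes its audit trails as cryptographically signed without naming a scheme. As a hedged illustration only, the sketch below uses an HMAC-SHA256 tag over a canonical JSON encoding so that later tampering can be detected. The manifest fields, `sign_manifest`, and `verify_manifest` are assumptions, and HMAC is a keyed MAC rather than a public-key signature, so AVA may well use something different.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


key = b"demo-key-not-for-production"
# Hypothetical manifest shape: counts only, never the original values
manifest = {
    "session_id": "demo",
    "policy": "healthcare_strict",
    "entities": {"PERS": 1, "SSN": 1},
}
signature = sign_manifest(manifest, key)

# Any change to the recorded manifest invalidates the tag
tampered = dict(manifest, policy="general_moderate")
```

Sorting keys before serializing matters: without a canonical encoding, two equal manifests could produce different tags.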

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Your App   │────▶│ AVA Client  │────▶│   Engine    │
│             │     │  (Embedded  │     │  (Presidio, │
│             │◀────│   or        │◀────│   AWS, etc) │
│             │     │   Gateway)  │     │             │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Token Vault │
                    │ (Memory/    │
                    │  SQLite/    │
                    │  Redis)     │
                    └─────────────┘

License

MIT License - see LICENSE


Author: Gerald Enrique Nelson Mc Kenzie
DOI: 10.5281/zenodo.19111004
Version: 0.1.0 | March 2026
