AI Visibility Anonymizer - Privacy-preserving middleware for LLMs

Reason this release was yanked:

cleanup

Project description

AVA Protocol

AI Visibility Anonymizer - Privacy-preserving middleware for LLM interactions with reversible tokenization.


What is AVA?

AVA Protocol sanitizes sensitive data (PII/PHI) before it reaches AI systems, maintains cryptographically signed audit trails, and restores the original values in AI outputs.

Key Innovation: Reversible tokenization preserves both privacy AND data utility.

import ava
import openai  # legacy (pre-1.0) SDK interface

client = ava.Client(engine="presidio", policy="healthcare_strict")

text = "Patient John Smith, SSN 123-45-6789"

with client.session(reversibility=True) as session:
    safe = session.sanitize(text)
    # Sanitized: "Patient AVA_PERS_xK9mP2nQ, SSN AVA_SSN_fG5hI6jK"

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": safe}]
    )

    # Restore original values in the model's reply text
    final = session.restore(response['choices'][0]['message']['content'])
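
To make the sanitize/restore round trip concrete, here is a minimal standalone sketch of reversible tokenization. It does not use the `ava` package: the `PATTERNS` table, the `sanitize` and `restore` helpers, and the plain-dict vault are all assumptions made for illustration, and the token suffix format is not necessarily the one AVA generates.

```python
import re
import secrets

# Hypothetical patterns for illustration; real engines detect far more entity types.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAI": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def sanitize(text: str, vault: dict[str, str]) -> str:
    """Replace each detected value with a random token and record the mapping."""
    for entity_type, pattern in PATTERNS.items():
        def _swap(match: re.Match, entity_type: str = entity_type) -> str:
            token = f"AVA_{entity_type}_{secrets.token_hex(4)}"
            vault[token] = match.group(0)
            return token
        text = pattern.sub(_swap, text)
    return text


def restore(text: str, vault: dict[str, str]) -> str:
    """Put the original values back wherever their tokens appear."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


vault: dict[str, str] = {}
original = "Patient SSN 123-45-6789, contact jane@example.com"
safe = sanitize(original, vault)
model_output = f"Summary: {safe}"  # stand-in for an LLM reply that echoes tokens
restored = restore(model_output, vault)
```

Because the tokens are random and the mapping lives only in the vault, the model sees placeholders while the caller can still recover the original text.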

Installation

# Gateway mode (lightweight, ~50KB)
pip install ava-protocol

# Embedded mode (local ML, ~500MB)
pip install "ava-protocol[local]"

# Cloud integrations
pip install "ava-protocol[aws]"      # AWS Macie
pip install "ava-protocol[azure]"    # Azure PII
pip install "ava-protocol[gcp]"      # Google DLP
pip install "ava-protocol[all]"      # Everything

Quick Start

1. Gateway Mode (Recommended for Most Users)

Connect to a remote AVA Gateway server:

import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="your-api-key",
    policy="general_moderate"
)

with client.session(reversibility=True) as session:
    clean = session.sanitize("Contact john@example.com")
    print(clean)  # Contact AVA_EMAI_UfhwZS_2

2. Embedded Mode (Self-Contained)

Run everything locally with Presidio:

import ava

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="memory"
)

with client.session(reversibility=True) as session:
    medical_text = "Patient: Sarah Johnson, DOB: 1985-03-15"
    safe = session.sanitize(medical_text)
    print(safe)  # Patient: AVA_PERS_xK9mP2nQ, DOB: AVA_DATE_aB3cD4eF

3. Mock Engine (Testing/CI)

No ML dependencies - perfect for unit tests:

import ava

client = ava.Client(engine="mock", policy="general_moderate")

with client.session() as session:
    result = session.sanitize("Email: test@example.com")
    assert "AVA_EMAI_" in result

Operating Modes

Mode 1: Embedded (Local Presidio)

Self-contained deployment for air-gapped environments.

import os

import ava
import openai  # legacy (pre-1.0) SDK interface

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"]
    }
)

with client.session(reversibility=True, ttl=3600) as session:
    medical_record = """
    Patient: Maria Gonzalez
    DOB: 1985-03-15
    SSN: 123-45-6789
    Email: maria.g@healthmail.com
    """

    sanitized = session.sanitize(medical_record)

    # Send to OpenAI
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )

    # Restore original values
    final = session.restore(response['choices'][0]['message']['content'])
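
The `ttl=3600` argument above suggests that token mappings expire after an hour. The sketch below shows one way such expiry could work. `ExpiringVault` and its injectable `clock` are hypothetical names, not part of the `ava` API, and the source does not describe how AVA enforces the TTL internally.

```python
import time
from typing import Callable, Optional


class ExpiringVault:
    """Token store whose entries stop resolving after `ttl` seconds (hypothetical)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}

    def put(self, token: str, value: str) -> None:
        # Store the value together with its expiry deadline.
        self._entries[token] = (value, self._clock() + self._ttl)

    def get(self, token: str) -> Optional[str]:
        entry = self._entries.get(token)
        if entry is None:
            return None
        value, deadline = entry
        if self._clock() >= deadline:
            # Expired: drop the mapping so the original value is unrecoverable.
            del self._entries[token]
            return None
        return value


# A fake clock makes the expiry observable without sleeping.
now = [0.0]
vault = ExpiringVault(ttl=3600, clock=lambda: now[0])
vault.put("AVA_PERS_example", "Maria Gonzalez")
before_expiry = vault.get("AVA_PERS_example")
now[0] = 3600.0
after_expiry = vault.get("AVA_PERS_example")
```

Injecting the clock keeps the expiry logic testable; in real use the default `time.monotonic` is unaffected by wall-clock adjustments.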

Mode 2: Gateway (Remote Client)

Thin client connecting to remote AVA Gateway.

import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="ava_sk_live_abc123xyz789",
    policy="general_moderate"
)

with client.session(reversibility=True) as session:
    customer_email = """
    Hi, this is Robert Chen from Acme Corp.
    My credit card ending in 4532 was charged twice.
    """

    safe_text = session.sanitize(customer_email)
    response = support_ai.process(safe_text)
    readable = session.restore(response)

Environment-based config:

# .env
AVA_GATEWAY_URL=https://ava.internal.company.com
AVA_API_KEY=ava_sk_live_xxx
AVA_POLICY=healthcare_strict

import ava

client = ava.Client.from_env()  # Auto-loads from environment
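
For readers wondering what environment-based loading might involve, the following sketch reads the three variables from the `.env` example into a small settings object. `GatewaySettings` and `settings_from_env` are hypothetical names, and the fallback policy is an assumption; the source does not say how `Client.from_env()` handles missing values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GatewaySettings:
    gateway_url: str
    api_key: str
    policy: str


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> GatewaySettings:
    """Read the three AVA_* variables; fail loudly if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ("AVA_GATEWAY_URL", "AVA_API_KEY") if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return GatewaySettings(
        gateway_url=env["AVA_GATEWAY_URL"],
        api_key=env["AVA_API_KEY"],
        # Assumed default; the source does not say what happens when AVA_POLICY is unset.
        policy=env.get("AVA_POLICY", "general_moderate"),
    )


settings = settings_from_env({
    "AVA_GATEWAY_URL": "https://ava.internal.company.com",
    "AVA_API_KEY": "ava_sk_live_xxx",
    "AVA_POLICY": "healthcare_strict",
})
```

Passing a mapping explicitly, as in the last statement, keeps the loader easy to test without touching the real process environment.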

Mode 3: Mock Engine (Testing)

Regex-based detection for CI/CD.

import ava
import pytest

@pytest.fixture
def mock_client():
    return ava.Client(engine="mock", policy="general_moderate")

def test_email_detection(mock_client):
    with mock_client.session() as session:
        text = "Contact us at support@example.com"
        result = session.sanitize(text)
        assert "AVA_EMAI_" in result

def test_reversibility(mock_client):
    with mock_client.session(reversibility=True) as session:
        original = "Patient: John Doe"
        sanitized = session.sanitize(original)
        restored = session.restore(sanitized)
        assert restored == original
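
As a rough picture of what regex-based detection can look like, the sketch below finds entity spans and masks them with deterministic tokens, which keeps test output stable. The `RULES` table, `Span`, `detect`, and `mask` are illustrative assumptions and say nothing about how AVA's mock engine is actually implemented.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    entity_type: str


# Hypothetical rules; the type codes mirror the AVA_<TYPE>_ prefixes shown above.
RULES = [
    ("EMAI", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def detect(text: str) -> list[Span]:
    """Return non-overlapping spans sorted by position."""
    spans = [
        Span(m.start(), m.end(), entity_type)
        for entity_type, pattern in RULES
        for m in pattern.finditer(text)
    ]
    spans.sort()
    result: list[Span] = []
    for span in spans:
        # Keep a span only if it starts after the previous kept span ends.
        if not result or span.start >= result[-1].end:
            result.append(span)
    return result


def mask(text: str, spans: list[Span]) -> str:
    """Replace spans right to left so earlier offsets stay valid."""
    for i, span in reversed(list(enumerate(spans))):
        text = text[:span.start] + f"AVA_{span.entity_type}_{i:04d}" + text[span.end:]
    return text


sample = "Contact us at support@example.com, SSN 123-45-6789"
found = detect(sample)
masked = mask(sample, found)
```

Counter-based suffixes are used here only because they are predictable in assertions; a production tokenizer would want unguessable tokens.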

Mode 4: AWS Macie Adapter

import ava

client = ava.Client(
    engine="aws_macie",
    policy="financial_paranoid",
    engine_config={
        "region": "us-east-1",
        "custom_data_identifiers": ["employee-id-pattern"]
    }
)

with client.session(reversibility=True) as session:
    with open("customer_data.csv", "r") as f:
        content = f.read()

    sanitized = session.sanitize(content)
    insights = sagemaker_model.analyze(sanitized)
    report = session.restore(insights)

Mode 5: Azure PII Adapter

import ava

client = ava.Client(
    engine="azure_pii",
    policy="healthcare_strict",
    engine_config={
        "endpoint": "https://ava-pii.cognitiveservices.azure.com",
        "domain_filter": "phi"
    }
)

with client.session(reversibility=True) as session:
    clinical_notes = "Dr. Sarah Johnson examined patient Michael Brown."
    sanitized = session.sanitize(clinical_notes)
    response = azure_openai.ChatCompletion.create(
        deployment_id="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )
    final = session.restore(response['choices'][0]['message']['content'])

Mode 6: Google DLP Adapter

import ava

client = ava.Client(
    engine="google_dlp",
    policy="legal_confidential",
    engine_config={
        "project_id": "my-gcp-project",
        "min_likelihood": "LIKELY"
    }
)

with client.session(reversibility=True) as session:
    legal_doc = "ATTORNEY-CLIENT PRIVILEGED From: attorney@lawfirm.com"
    sanitized = session.sanitize(legal_doc)
    summary = legal_ai.summarize(sanitized)
    privileged = session.restore(summary)

Vault Types

Memory Vault (Default)

client = ava.Client(engine="presidio", vault_type="memory")
# In-process storage, never touches disk
# Auto-purged on session exit
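
The "auto-purged on session exit" behaviour maps naturally onto a context manager. This is a minimal sketch of that idea with a hypothetical `memory_session` helper; it is not the `ava` implementation.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def memory_session() -> Iterator[dict[str, str]]:
    """Yield an in-process token map and wipe it when the block exits."""
    mapping: dict[str, str] = {}
    try:
        yield mapping
    finally:
        # Runs on normal exit and on exceptions alike.
        mapping.clear()


with memory_session() as tokens:
    tokens["AVA_EMAI_example"] = "john@example.com"
    size_inside = len(tokens)

size_after = len(tokens)
```

Clearing in a `finally` block means the mapping is wiped even when the body raises, so no original values outlive the session.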

SQLite Vault (Persistent)

import os

client = ava.Client(
    engine="presidio",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"],
        "journal_mode": "WAL"
    }
)
# AES-256 encryption
# Survives process restart
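
To illustrate the persistence and `journal_mode` settings, the sketch below stores a token mapping in SQLite with write-ahead logging and reads it back through a fresh connection. The table layout and the `open_vault` helper are assumptions. Encryption is deliberately left out (values are stored in plaintext here), since AES-256 would need a third-party cryptography library.

```python
import os
import sqlite3
import tempfile


def open_vault(db_path: str) -> sqlite3.Connection:
    """Open (or create) a token table with write-ahead logging enabled."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens (token TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "ava_vault.db")

# First "process": write a mapping and close the connection.
conn = open_vault(db_path)
conn.execute("INSERT INTO tokens VALUES (?, ?)", ("AVA_PERS_example", "Maria Gonzalez"))
conn.commit()
conn.close()

# Second "process": reopen the same file and read the mapping back.
conn = open_vault(db_path)
row = conn.execute(
    "SELECT value FROM tokens WHERE token = ?", ("AVA_PERS_example",)
).fetchone()
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
conn.close()
```

WAL mode lets readers proceed while a writer is active, which is a common reason to choose it for a vault shared by several workers on one machine.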

Redis Vault (Distributed)

import os

client = ava.Client(
    engine="presidio",
    vault_type="redis",
    vault_config={
        "host": "redis.company.com",
        "port": 6379,
        "password": os.environ["REDIS_PASSWORD"],
        "ssl": True
    }
)
# Cross-machine session sharing
# Microservices support

Policies

Built-in Policies

# HIPAA-compliant healthcare
client = ava.Client(policy="healthcare_strict")

# PCI-DSS level 1 financial
client = ava.Client(policy="financial_paranoid")

# Attorney-client privilege
client = ava.Client(policy="legal_confidential")

# Balanced business use
client = ava.Client(policy="general_moderate")

# Scientific data (irreversible)
client = ava.Client(policy="research_anonymized")

Custom Policy (YAML)

# policies/enterprise.yaml
name: enterprise_gdpr
entity_sensitivity:
  PERS: 5  # Always protected
  EMAI: 5
  PHON: 4
  DATE: 2
thresholds:
  min_confidence: 0.85
retention:
  session_ttl: 3600
  audit_retention: 90d

import ava

client = ava.Client(policy="/path/to/policies/enterprise.yaml")
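
The source does not spell out how `entity_sensitivity` and `min_confidence` interact. The sketch below shows one plausible reading, and only that: level 5 is always masked (following the "Always protected" comment), unlisted entity types are left alone, and everything in between is masked when the detector's confidence meets the threshold. `should_mask` and `ALWAYS_PROTECT_LEVEL` are hypothetical names.

```python
# The YAML policy above, written as a dict to avoid a YAML dependency.
POLICY = {
    "entity_sensitivity": {"PERS": 5, "EMAI": 5, "PHON": 4, "DATE": 2},
    "thresholds": {"min_confidence": 0.85},
}

ALWAYS_PROTECT_LEVEL = 5  # assumption drawn from the "Always protected" comment


def should_mask(entity_type: str, confidence: float, policy: dict) -> bool:
    """Decide whether a detection is masked under the assumed semantics."""
    level = policy["entity_sensitivity"].get(entity_type, 0)
    if level >= ALWAYS_PROTECT_LEVEL:
        return True  # masked regardless of detector confidence
    if level == 0:
        return False  # entity type not covered by this policy
    return confidence >= policy["thresholds"]["min_confidence"]


decisions = {
    "PERS@0.10": should_mask("PERS", 0.10, POLICY),
    "PHON@0.90": should_mask("PHON", 0.90, POLICY),
    "PHON@0.50": should_mask("PHON", 0.50, POLICY),
    "IBAN@0.99": should_mask("IBAN", 0.99, POLICY),
}
```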

Async API

import asyncio
import ava

async def process_documents():
    client = ava.AsyncClient(engine="presidio", policy="general_moderate")
    documents = ["Doc 1...", "Doc 2...", "Doc 3..."]

    async with client.session() as session:
        # Process all concurrently
        sanitized = await asyncio.gather(*[
            session.sanitize(doc) for doc in documents
        ])

        # Send to AI concurrently (call_llm is your own async LLM wrapper)
        responses = await asyncio.gather(*[
            call_llm(doc) for doc in sanitized
        ])

        # Restore all concurrently
        final = await asyncio.gather(*[
            session.restore(r) for r in responses
        ])

    return final

asyncio.run(process_documents())
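
The pipeline above relies on `asyncio.gather` returning results in the order the awaitables were passed, not the order in which they finish, so each response lines up with its source document. The standalone sketch below demonstrates that property with a stand-in coroutine; `fake_sanitize` is hypothetical and unrelated to the `ava` API.

```python
import asyncio


async def fake_sanitize(doc: str, delay: float) -> str:
    """Stand-in for an async sanitize call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return doc.upper()


async def main() -> list[str]:
    docs = ["doc a", "doc b", "doc c"]
    delays = [0.03, 0.01, 0.02]  # the first document finishes last
    return await asyncio.gather(
        *(fake_sanitize(d, t) for d, t in zip(docs, delays))
    )


results = asyncio.run(main())
```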

Production Workflow: Healthcare API

import ava
import openai  # legacy (pre-1.0) SDK interface
from fastapi import FastAPI

app = FastAPI()
client = ava.Client(engine="presidio", policy="healthcare_strict")

# ehr_system and audit_log are placeholders for your own services.

# Plain def: FastAPI runs it in a threadpool, so the blocking calls
# below do not stall the event loop.
@app.post("/summarize-record")
def summarize(record_id: str):
    record = ehr_system.get_record(record_id)

    with client.session(reversibility=True, ttl=1800) as session:
        # 1. Sanitize before AI
        safe = session.sanitize(record)

        # 2. Send to OpenAI
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": safe}]
        )

        # 3. Restore PHI
        summary = session.restore(
            response['choices'][0]['message']['content']
        )

        # 4. Audit
        audit_log.store(session.manifest)

    return {"summary": summary, "manifest_id": session.manifest.id}
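
The workflow stores `session.manifest` as its audit record, and the introduction mentions cryptographically signed audit trails. The source does not say which signature scheme AVA uses, so the sketch below substitutes a plain HMAC-SHA256 over canonical JSON to show how tampering with a stored manifest could be detected. The manifest fields, the key, and both helper functions are assumptions.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


key = b"demo-key-not-for-production"
manifest = {"id": "m-001", "policy": "healthcare_strict", "entities": {"PERS": 1, "SSN": 1}}
signature = sign_manifest(manifest, key)

# Any change to the stored manifest invalidates the signature.
tampered = dict(manifest, policy="general_moderate")
```

Sorting keys before serializing matters: without a canonical encoding, two equal manifests could produce different signatures.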

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Your App   │────▶│ AVA Client  │────▶│   Engine    │
│             │     │  (Embedded  │     │  (Presidio, │
│             │◀────│   or        │◀────│   AWS, etc) │
│             │     │   Gateway)  │     │             │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Token Vault │
                    │ (Memory/    │
                    │  SQLite/    │
                    │  Redis)     │
                    └─────────────┘

Contributing

git clone https://github.com/yourusername/ava-protocol.git
cd ava-protocol
pip install -e ".[local,dev]"
pytest tests/

License

MIT License - see LICENSE


Author: Gerald Enrique Nelson Mc Kenzie
DOI: 10.5281/zenodo.19111004
Version: 0.1.0 | March 2026

Download files

Source Distribution

ava_protocol-0.1.2.tar.gz (54.4 kB)

Uploaded Source

Built Distribution

ava_protocol-0.1.2-py3-none-any.whl (17.3 kB)

Uploaded Python 3

File details

Details for the file ava_protocol-0.1.2.tar.gz.

File metadata

  • Download URL: ava_protocol-0.1.2.tar.gz
  • Upload date:
  • Size: 54.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for ava_protocol-0.1.2.tar.gz
Algorithm Hash digest
SHA256 60ebdf862141508d6cef51ca343891038b0a0dbe828c2b4209be30e12fb0d348
MD5 92841f1e687368616473778e6b70fe4d
BLAKE2b-256 c6b453420f31dff9ccda8dabb93d464efe0387ea4007c3d47787b0b8f97c2740

File details

Details for the file ava_protocol-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: ava_protocol-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 17.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for ava_protocol-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 9783c48c098ec9cd16ef365bc027b34aae239264e4fbcb89fe1590e9fde9966b
MD5 a58f6381e9786322eaed2a891ce33e15
BLAKE2b-256 ac5fb0e7f274aec3b59d768bad39bdc3e6858f5934b26414299780144fe4b385
