Privacy scrubbing for GUI automation data - PII/PHI detection and redaction

openadapt-privacy

License: MIT. Python 3.10+.

Installation

pip install openadapt-privacy

For Presidio-based scrubbing (recommended):

pip install openadapt-privacy[presidio]
python -m spacy download en_core_web_trf

Quick Start

Text Scrubbing

from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

scrubber = PresidioScrubbingProvider()

text = "Contact John Smith at john.smith@example.com or 555-123-4567"
scrubbed = scrubber.scrub_text(text)

Input:

Contact John Smith at john.smith@example.com or 555-123-4567

Output:

Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER>
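To make the placeholder convention concrete, here is a minimal, dependency-free sketch of replacing detected spans with `<ENTITY_TYPE>` tags. It is not how `PresidioScrubbingProvider` works internally: the `PATTERNS` table and `toy_scrub_text` are hypothetical names, and plain regexes cannot find names such as `John Smith`, which is why NER-based detection is recommended.

```python
import re

# Illustrative patterns only (an assumption for this sketch). Real detection
# combines NER models, validation, and context, and also finds PERSON entities.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def toy_scrub_text(text: str) -> str:
    """Replace each pattern match with an <ENTITY_TYPE> placeholder."""
    for entity_type, pattern in PATTERNS.items():
        text = pattern.sub(f"<{entity_type}>", text)
    return text


print(toy_scrub_text("Contact John Smith at john.smith@example.com or 555-123-4567"))
# prints: Contact John Smith at <EMAIL_ADDRESS> or <PHONE_NUMBER>
```

Note that the name survives in this toy version, which is the gap that the NER model fills.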

Example Inputs & Outputs

| Input | Output |
|---|---|
| My email is john@example.com | My email is <EMAIL_ADDRESS> |
| SSN: 923-45-6789 | SSN: <US_SSN> |
| Card: 4532-1234-5678-9012 | Card: <CREDIT_CARD> |
| Call me at 555-123-4567 | Call me at <PHONE_NUMBER> |
| DOB: 01/15/1985 | DOB: <DATE_TIME> |
| Contact John Smith | Contact <PERSON> |

Dict Scrubbing

Scrub PII from nested dictionaries (e.g., GUI element trees):

from openadapt_privacy import scrub_dict
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

scrubber = PresidioScrubbingProvider()
action = {
    "text": "Email: john@example.com",
    "metadata": {
        "title": "User Profile - John Smith",
        "tooltip": "Click to contact john@example.com",
    },
    "coordinates": {"x": 100, "y": 200},
}
scrubbed = scrub_dict(action, scrubber)

Input:

{
    "text": "Email: john@example.com",
    "metadata": {
        "title": "User Profile - John Smith",
        "tooltip": "Click to contact john@example.com"
    },
    "coordinates": {"x": 100, "y": 200}
}

Output:

{
    "text": "Email: <EMAIL_ADDRESS>",
    "metadata": {
        "title": "User Profile - <PERSON>",
        "tooltip": "Click to contact <EMAIL_ADDRESS>"
    },
    "coordinates": {"x": 100, "y": 200}
}
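The traversal behind this kind of output can be sketched as a recursive walk that scrubs string values only under selected keys and leaves everything else (such as coordinates) untouched. This is an illustration under assumptions, not the library's `scrub_dict`: `toy_scrub_dict`, `SCRUB_KEYS`, and `fake_scrub_text` are hypothetical, and the stand-in scrubber uses fixed string replacement instead of real detection.

```python
from typing import Any, Callable, Optional

# Hypothetical allowlist of keys whose string values get scrubbed.
SCRUB_KEYS = {"text", "value", "title", "tooltip"}


def toy_scrub_dict(data: dict, scrub_text: Callable[[str], str]) -> dict:
    """Return a scrubbed copy of `data`; the input dict is not modified."""

    def walk(key: Optional[str], value: Any) -> Any:
        if isinstance(value, dict):
            return {k: walk(k, v) for k, v in value.items()}
        if isinstance(value, list):
            # List items inherit the key of the list that contains them.
            return [walk(key, item) for item in value]
        if isinstance(value, str) and key in SCRUB_KEYS:
            return scrub_text(value)
        return value

    return walk(None, data)


def fake_scrub_text(text: str) -> str:
    """Stand-in scrubber for the demo: fixed replacements, no detection."""
    return text.replace("john@example.com", "<EMAIL_ADDRESS>").replace(
        "John Smith", "<PERSON>"
    )


action = {
    "text": "Email: john@example.com",
    "metadata": {"title": "User Profile - John Smith"},
    "coordinates": {"x": 100, "y": 200},
}
scrubbed = toy_scrub_dict(action, fake_scrub_text)
```

Returning a copy rather than mutating in place keeps the original recording available for comparison or auditing.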

Recording Pipeline

Process complete GUI automation recordings:

from openadapt_privacy import DictRecordingLoader
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

scrubber = PresidioScrubbingProvider()
loader = DictRecordingLoader()

recording = loader.load_from_dict({
    "task_description": "Send email to John Smith at john@example.com",
    "actions": [
        {"id": 1, "action_type": "click", "text": "Compose", "timestamp": 1000},
        {"id": 2, "action_type": "type", "text": "john@example.com", "timestamp": 2000},
        {"id": 3, "action_type": "click", "text": "Send", "window_title": "Email to john@example.com", "timestamp": 3000},
    ],
})

scrubbed = recording.scrub(scrubber)

Input Recording:

task_description: "Send email to John Smith at john@example.com"

actions:
  [1] click: "Compose"
  [2] type:  "john@example.com"
  [3] click: "Send" (window: "Email to john@example.com")

Output Recording:

task_description: "Send email to <PERSON> at <EMAIL_ADDRESS>"

actions:
  [1] click: "Compose"
  [2] type:  "<EMAIL_ADDRESS>"
  [3] click: "Send" (window: "Email to <EMAIL_ADDRESS>")
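As a rough mental model of what scrubbing a recording involves, the sketch below applies a text scrubber to the task description and to each action's text fields, and returns a new object. `ToyAction`, `ToyRecording`, and `fake_scrub_text` are hypothetical stand-ins; the library's own `Recording` and `Action` classes may be structured differently.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ToyAction:
    id: int
    action_type: str
    text: Optional[str] = None
    window_title: Optional[str] = None
    timestamp: int = 0


@dataclass(frozen=True)
class ToyRecording:
    task_description: str
    actions: Tuple[ToyAction, ...] = ()

    def scrub(self, scrub_text: Callable[[str], str]) -> "ToyRecording":
        """Return a new recording with every text field scrubbed."""

        def clean(value: Optional[str]) -> Optional[str]:
            return scrub_text(value) if value is not None else None

        return ToyRecording(
            task_description=scrub_text(self.task_description),
            actions=tuple(
                replace(a, text=clean(a.text), window_title=clean(a.window_title))
                for a in self.actions
            ),
        )


def fake_scrub_text(text: str) -> str:
    """Stand-in scrubber for the demo: fixed replacements, no detection."""
    return text.replace("john@example.com", "<EMAIL_ADDRESS>").replace(
        "John Smith", "<PERSON>"
    )


recording = ToyRecording(
    task_description="Send email to John Smith at john@example.com",
    actions=(
        ToyAction(1, "click", text="Compose", timestamp=1000),
        ToyAction(2, "type", text="john@example.com", timestamp=2000),
        ToyAction(3, "click", text="Send",
                  window_title="Email to john@example.com", timestamp=3000),
    ),
)
scrubbed_recording = recording.scrub(fake_scrub_text)
```

Non-text fields such as `id`, `action_type`, and `timestamp` pass through unchanged, matching the input and output shown above.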

Image Scrubbing

Redact PII from screenshots using OCR + NER:

from PIL import Image
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

scrubber = PresidioScrubbingProvider()

image = Image.open("screenshot.png")
scrubbed_image = scrubber.scrub_image(image)
scrubbed_image.save("screenshot_scrubbed.png")

Input Screenshot:

[Image: original screenshot with PII]

Output Screenshot:

[Image: scrubbed screenshot with PII redacted]

The image redactor:

  1. Runs OCR to detect text regions
  2. Analyzes text for PII entities (email, phone, SSN, etc.)
  3. Fills detected PII regions with solid color (configurable, default: red)
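Step 3 above can be illustrated without any imaging library by treating an image as a grid of RGB pixels and painting over each bounding box. This is a sketch under assumptions: `fill_regions` and the `(left, top, width, height)` box format are hypothetical, and the OCR and PII analysis that would produce the boxes (steps 1 and 2) are not shown.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B)
Box = Tuple[int, int, int, int]   # (left, top, width, height), assumed format


def fill_regions(
    pixels: List[List[Pixel]],
    boxes: List[Box],
    color: Pixel = (255, 0, 0),
) -> List[List[Pixel]]:
    """Return a copy of `pixels` with every box filled with a solid color."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    out = [row[:] for row in pixels]
    for left, top, box_w, box_h in boxes:
        # Clamp each box to the image bounds before filling.
        for y in range(max(top, 0), min(top + box_h, height)):
            for x in range(max(left, 0), min(left + box_w, width)):
                out[y][x] = color
    return out


white = (255, 255, 255)
image = [[white] * 4 for _ in range(3)]           # 4 wide, 3 tall
redacted = fill_regions(image, [(1, 1, 2, 1)])    # one box from "OCR"
```

Filling with an opaque color, rather than blurring, ensures the underlying text cannot be recovered from the redacted pixels.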

Custom Data Loader

Implement your own loader for custom storage formats:

from openadapt_privacy import RecordingLoader, Recording
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

class SQLiteRecordingLoader(RecordingLoader):
    def __init__(self, db_path: str):
        self.db_path = db_path

    def load(self, recording_id: str) -> Recording:
        # Load from SQLite database
        ...

    def save(self, recording: Recording, recording_id: str) -> None:
        # Save to SQLite database
        ...

# Usage
loader = SQLiteRecordingLoader("recordings.db")
scrubber = PresidioScrubbingProvider()

# Load, scrub, and save
scrubbed = loader.load_and_scrub("recording_001", scrubber)
loader.save(scrubbed, "recording_001_scrubbed")
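One way the `load` and `save` bodies might be filled in is to store each recording as a JSON blob keyed by its ID. The sketch below uses only the standard library and a plain `dict` in place of `Recording`, because how a `Recording` is serialized is not specified here; `ToySQLiteStore` and its table layout are hypothetical.

```python
import json
import sqlite3


class ToySQLiteStore:
    """Stand-in for a RecordingLoader subclass; stores recordings as JSON."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS recordings ("
            "id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def load(self, recording_id: str) -> dict:
        row = self.conn.execute(
            "SELECT payload FROM recordings WHERE id = ?", (recording_id,)
        ).fetchone()
        if row is None:
            raise KeyError(recording_id)
        return json.loads(row[0])

    def save(self, recording: dict, recording_id: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO recordings (id, payload) VALUES (?, ?)",
                (recording_id, json.dumps(recording)),
            )


store = ToySQLiteStore(":memory:")
store.save({"task_description": "Send email to <PERSON>"}, "recording_001_scrubbed")
loaded = store.load("recording_001_scrubbed")
```

Saving the scrubbed copy under a separate ID, as in the usage above, leaves the original row untouched so the two can be compared.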

Configuration

from openadapt_privacy.config import PrivacyConfig

custom_config = PrivacyConfig(
    SCRUB_CHAR="X",                    # Character for scrub_text_all
    SCRUB_FILL_COLOR=0xFF0000,         # Red for image redaction (BGR)
    SCRUB_KEYS_HTML=[                  # Keys to scrub in dicts
        "text", "value", "title", "tooltip", "custom_field"
    ],
    SCRUB_PRESIDIO_IGNORE_ENTITIES=[   # Entity types to skip
        "DATE_TIME",
    ],
)
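To show what an ignore list such as `SCRUB_PRESIDIO_IGNORE_ENTITIES` plausibly does, the sketch below drops findings of ignored entity types before replacing the rest with placeholders. It is an assumption-laden illustration, not the library's code: `ToyConfig`, `apply_findings`, and the `(start, end, entity_type)` finding format are hypothetical, and the findings are computed by hand instead of by an analyzer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Finding = Tuple[int, int, str]  # (start, end, entity_type), assumed format


@dataclass
class ToyConfig:
    SCRUB_PRESIDIO_IGNORE_ENTITIES: List[str] = field(default_factory=list)


def apply_findings(text: str, findings: List[Finding], config: ToyConfig) -> str:
    """Replace non-ignored findings with <ENTITY_TYPE> placeholders."""
    kept = [f for f in findings if f[2] not in config.SCRUB_PRESIDIO_IGNORE_ENTITIES]
    # Replace from right to left so earlier offsets stay valid.
    for start, end, entity_type in sorted(kept, reverse=True):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


def span(text: str, needle: str, entity_type: str) -> Finding:
    """Build a finding for the first occurrence of `needle` (demo helper)."""
    start = text.index(needle)
    return (start, start + len(needle), entity_type)


sample = "DOB: 01/15/1985, email john@example.com"
findings = [
    span(sample, "01/15/1985", "DATE_TIME"),
    span(sample, "john@example.com", "EMAIL_ADDRESS"),
]

keep_dates = ToyConfig(SCRUB_PRESIDIO_IGNORE_ENTITIES=["DATE_TIME"])
print(apply_findings(sample, findings, keep_dates))
# prints: DOB: 01/15/1985, email <EMAIL_ADDRESS>
```

Ignoring `DATE_TIME` is a common choice for GUI recordings, where timestamps and dates in window titles are often needed for replay and are rarely sensitive on their own.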

Supported Entity Types

| Entity | Example Input | Example Output |
|---|---|---|
| PERSON | John Smith | <PERSON> |
| EMAIL_ADDRESS | john@example.com | <EMAIL_ADDRESS> |
| PHONE_NUMBER | 555-123-4567 | <PHONE_NUMBER> |
| US_SSN | 923-45-6789 | <US_SSN> |
| CREDIT_CARD | 4532-1234-5678-9012 | <CREDIT_CARD> |
| US_BANK_NUMBER | 635526789012 | <US_BANK_NUMBER> |
| US_DRIVER_LICENSE | A123-456-789-012 | <US_DRIVER_LICENSE> |
| DATE_TIME | 01/15/1985 | <DATE_TIME> |
| LOCATION | Toronto, ON | <LOCATION> |

Architecture

openadapt_privacy/
├── base.py           # ScrubbingProvider, TextScrubbingMixin
├── config.py         # PrivacyConfig dataclass
├── loaders.py        # Recording, Action, Screenshot, RecordingLoader
├── providers/
│   ├── __init__.py   # ScrubProvider registry
│   └── presidio.py   # PresidioScrubbingProvider
└── pipelines/
    └── dicts.py      # scrub_dict, scrub_list_dicts

License

MIT
