Privacy scrubbing for GUI automation data - PII/PHI detection and redaction

openadapt-privacy

Installation

pip install openadapt-privacy

For Presidio-based scrubbing (recommended):

pip install "openadapt-privacy[presidio]"
python -m spacy download en_core_web_trf

Quick Start

Text Scrubbing

from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

scrubber = PresidioScrubbingProvider()

text = "Contact John Smith at john.smith@example.com or 555-123-4567"
scrubbed = scrubber.scrub_text(text)

Input:

Contact John Smith at john.smith@example.com or 555-123-4567

Output:

Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER>
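Conceptually, a scrubber first finds entity spans (character offsets plus an entity type) and then replaces each span with its type in angle brackets. The sketch below shows only that replacement step, using hand-written spans for the Quick Start string. The `Span` type and `replace_spans` helper are hypothetical illustrations, not part of `openadapt_privacy`.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """A detected entity: [start, end) character offsets and its type."""
    start: int
    end: int
    entity_type: str


def replace_spans(text: str, spans: List[Span]) -> str:
    """Replace each span with "<ENTITY_TYPE>".

    Spans are applied right to left, so replacing one span never shifts
    the offsets of the spans that come before it.
    """
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + f"<{span.entity_type}>" + text[span.end :]
    return text


text = "Contact John Smith at john.smith@example.com or 555-123-4567"
spans = [
    Span(8, 18, "PERSON"),          # "John Smith"
    Span(22, 44, "EMAIL_ADDRESS"),  # "john.smith@example.com"
    Span(48, 60, "PHONE_NUMBER"),   # "555-123-4567"
]
print(replace_spans(text, spans))
```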

Example Inputs & Outputs

Input                           Output
My email is john@example.com    My email is <EMAIL_ADDRESS>
SSN: 923-45-6789                SSN: <US_SSN>
Card: 4532-1234-5678-9012       Card: <CREDIT_CARD>
Call me at 555-123-4567         Call me at <PHONE_NUMBER>
DOB: 01/15/1985                 DOB: <DATE_TIME>
Contact John Smith              Contact <PERSON>
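Several rows above are structured identifiers that pattern matching alone can catch, while names such as "John Smith" need the spaCy NER model installed earlier. The following is a deliberately simplified, regex-only illustration of the pattern-based half for three entity types from the table. The patterns and the `scrub_with_patterns` helper are assumptions for illustration; they are much cruder than Presidio's built-in recognizers.

```python
import re

# Hypothetical, deliberately simple patterns. Real recognizers add
# context words, checksums, and confidence scores.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def scrub_with_patterns(text: str) -> str:
    """Replace each regex match with <ENTITY_TYPE>."""
    matches = [
        (m.start(), m.end(), entity_type)
        for entity_type, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    # Drop matches that overlap an earlier one, keeping whichever starts first.
    kept = []
    for start, end, entity_type in sorted(matches):
        if kept and start < kept[-1][1]:
            continue
        kept.append((start, end, entity_type))
    # Replace right to left so earlier offsets stay valid.
    for start, end, entity_type in reversed(kept):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


for sample in [
    "My email is john@example.com",
    "SSN: 923-45-6789",
    "Call me at 555-123-4567",
]:
    print(scrub_with_patterns(sample))
```

A regex pass like this leaves "Contact John Smith" untouched, which is why the recommended setup pairs pattern recognizers with an NER model.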

Dict Scrubbing

Scrub PII from nested dictionaries (e.g., GUI element trees):

from openadapt_privacy import scrub_dict
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

scrubber = PresidioScrubbingProvider()
action = {
    "text": "Email: john@example.com",
    "metadata": {
        "title": "User Profile - John Smith",
        "tooltip": "Click to contact john@example.com",
    },
    "coordinates": {"x": 100, "y": 200},
}
scrubbed = scrub_dict(action, scrubber)

Input:

{
    "text": "Email: john@example.com",
    "metadata": {
        "title": "User Profile - John Smith",
        "tooltip": "Click to contact john@example.com"
    },
    "coordinates": {"x": 100, "y": 200}
}

Output:

{
    "text": "Email: <EMAIL_ADDRESS>",
    "metadata": {
        "title": "User Profile - <PERSON>",
        "tooltip": "Click to contact <EMAIL_ADDRESS>"
    },
    "coordinates": {"x": 100, "y": 200}
}
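One way to get this behaviour is a recursive walk that scrubs string values under an allow-list of keys and passes everything else (such as numeric coordinates) through unchanged. The sketch below assumes that design, loosely mirroring the `SCRUB_KEYS_HTML` setting shown under Configuration. `scrub_nested`, `SCRUB_KEYS`, and the stand-in `fake_scrub` are hypothetical; the real `scrub_dict` may work differently.

```python
# Assumed allow-list of keys whose string values get scrubbed.
SCRUB_KEYS = frozenset({"text", "value", "title", "tooltip"})


def scrub_nested(obj, scrub_text, keys=SCRUB_KEYS):
    """Return a scrubbed copy of obj; the input is left unmodified."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            if key in keys and isinstance(value, str):
                out[key] = scrub_text(value)
            else:
                out[key] = scrub_nested(value, scrub_text, keys)
        return out
    if isinstance(obj, list):
        return [scrub_nested(item, scrub_text, keys) for item in obj]
    # Numbers, None, and strings under other keys pass through unchanged.
    return obj


def fake_scrub(text: str) -> str:
    """Stand-in for a real provider's scrub_text (demo only)."""
    return (text.replace("john@example.com", "<EMAIL_ADDRESS>")
                .replace("John Smith", "<PERSON>"))


action = {
    "text": "Email: john@example.com",
    "metadata": {
        "title": "User Profile - John Smith",
        "tooltip": "Click to contact john@example.com",
    },
    "coordinates": {"x": 100, "y": 200},
}
print(scrub_nested(action, fake_scrub))
```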

Recording Pipeline

Process complete GUI automation recordings:

from openadapt_privacy import DictRecordingLoader
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

scrubber = PresidioScrubbingProvider()
loader = DictRecordingLoader()

recording = loader.load_from_dict({
    "task_description": "Send email to John Smith at john@example.com",
    "actions": [
        {"id": 1, "action_type": "click", "text": "Compose", "timestamp": 1000},
        {"id": 2, "action_type": "type", "text": "john@example.com", "timestamp": 2000},
        {"id": 3, "action_type": "click", "text": "Send", "window_title": "Email to john@example.com", "timestamp": 3000},
    ],
})

scrubbed = recording.scrub(scrubber)

Input Recording:

task_description: "Send email to John Smith at john@example.com"

actions:
  [1] click: "Compose"
  [2] type:  "john@example.com"
  [3] click: "Send" (window: "Email to john@example.com")

Output Recording:

task_description: "Send email to <PERSON> at <EMAIL_ADDRESS>"

actions:
  [1] click: "Compose"
  [2] type:  "<EMAIL_ADDRESS>"
  [3] click: "Send" (window: "Email to <EMAIL_ADDRESS>")
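At the recording level, the same text scrubbing is applied to every free-text field (the task description plus each action's `text` and `window_title`), while ids, action types, and timestamps pass through. Below is a minimal sketch of that idea using hypothetical frozen `Action`/`Recording` dataclasses, not the library's own classes; `scrub` returns a new recording rather than mutating the original.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    id: int
    action_type: str
    timestamp: int
    text: Optional[str] = None
    window_title: Optional[str] = None


@dataclass(frozen=True)
class Recording:
    task_description: str
    actions: List[Action] = field(default_factory=list)

    def scrub(self, scrub_text: Callable[[str], str]) -> "Recording":
        """Return a new Recording with every free-text field scrubbed."""

        def scrub_optional(value: Optional[str]) -> Optional[str]:
            return None if value is None else scrub_text(value)

        return replace(
            self,
            task_description=scrub_text(self.task_description),
            actions=[
                replace(a, text=scrub_optional(a.text),
                        window_title=scrub_optional(a.window_title))
                for a in self.actions
            ],
        )


def fake_scrub(text: str) -> str:
    """Stand-in for a real provider's scrub_text (demo only)."""
    return (text.replace("john@example.com", "<EMAIL_ADDRESS>")
                .replace("John Smith", "<PERSON>"))


recording = Recording(
    task_description="Send email to John Smith at john@example.com",
    actions=[
        Action(1, "click", 1000, text="Compose"),
        Action(2, "type", 2000, text="john@example.com"),
        Action(3, "click", 3000, text="Send",
               window_title="Email to john@example.com"),
    ],
)
scrubbed = recording.scrub(fake_scrub)
print(scrubbed.task_description)
```

Returning a new object keeps the raw recording available for comparison and avoids accidentally persisting a half-scrubbed state.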

Image Scrubbing

Redact PII from screenshots using OCR + NER:

from PIL import Image
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

scrubber = PresidioScrubbingProvider()

image = Image.open("screenshot.png")
scrubbed_image = scrubber.scrub_image(image)
scrubbed_image.save("screenshot_scrubbed.png")

Input Screenshot:

[Image: original screenshot with PII]

Output Screenshot:

[Image: scrubbed screenshot with PII redacted]

The image redactor:

  1. Runs OCR to detect text regions
  2. Analyzes text for PII entities (email, phone, SSN, etc.)
  3. Fills detected PII regions with a solid color (configurable; default: red)
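Step 3 depends on mapping the character-level entity spans found in step 2 back to the pixel boxes returned by OCR in step 1. The sketch below shows one simple way to do that: join the OCR words with single spaces while tracking each word's character offsets, then redact every box whose word overlaps a detected span. The `Word` structure, the hard-coded OCR output, and the list-of-lists "image" are hypothetical stand-ins for a real OCR engine and a PIL image.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


class Word(NamedTuple):
    text: str
    box: Box


def boxes_to_redact(words: List[Word], spans: List[Tuple[int, int]]) -> List[Box]:
    """Return the boxes of OCR words that overlap any [start, end) span.

    Offsets refer to the words joined with single spaces, i.e. the text
    that would be passed to the entity analyzer.
    """
    boxes = []
    offset = 0
    for word in words:
        start, end = offset, offset + len(word.text)
        if any(start < s_end and s_start < end for s_start, s_end in spans):
            boxes.append(word.box)
        offset = end + 1  # skip the joining space
    return boxes


def fill_boxes(pixels, boxes: List[Box], color) -> None:
    """Paint each box with a solid color, in place, on a row-major grid."""
    for left, top, right, bottom in boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                pixels[y][x] = color


# Hypothetical OCR output for a 10x2-pixel "screenshot".
words = [Word("Email:", (0, 0, 3, 1)), Word("john@example.com", (4, 0, 10, 1))]
text = " ".join(w.text for w in words)  # "Email: john@example.com"
spans = [(7, 23)]  # an EMAIL_ADDRESS span as an analyzer might report it
boxes = boxes_to_redact(words, spans)

WHITE, RED = (255, 255, 255), (255, 0, 0)
pixels = [[WHITE] * 10 for _ in range(2)]
fill_boxes(pixels, boxes, RED)
print(boxes)
```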

Custom Data Loader

Implement your own loader for custom storage formats:

from openadapt_privacy import RecordingLoader, Recording
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

class SQLiteRecordingLoader(RecordingLoader):
    def __init__(self, db_path: str):
        self.db_path = db_path

    def load(self, recording_id: str) -> Recording:
        # Load from SQLite database
        ...

    def save(self, recording: Recording, recording_id: str) -> None:
        # Save to SQLite database
        ...

# Usage
loader = SQLiteRecordingLoader("recordings.db")
scrubber = PresidioScrubbingProvider()

# Load, scrub, and save
scrubbed = loader.load_and_scrub("recording_001", scrubber)
loader.save(scrubbed, "recording_001_scrubbed")
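The `load` and `save` stubs above are left to the reader. As one possible way to fill them in, the sketch below stores each recording as a JSON document in a single SQLite table using Python's standard `sqlite3` module. It is a hypothetical standalone class (`SQLiteRecordingStore`) that works on plain dicts rather than subclassing the library's `RecordingLoader`; converting between those dicts and `Recording` objects is not shown, and the table name and schema are assumptions.

```python
import json
import sqlite3


class SQLiteRecordingStore:
    """Hypothetical JSON-in-SQLite storage for recording dicts."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS recordings ("
            " recording_id TEXT PRIMARY KEY,"
            " data TEXT NOT NULL)"
        )

    def load(self, recording_id: str) -> dict:
        row = self.conn.execute(
            "SELECT data FROM recordings WHERE recording_id = ?",
            (recording_id,),
        ).fetchone()
        if row is None:
            raise KeyError(recording_id)
        return json.loads(row[0])

    def save(self, recording: dict, recording_id: str) -> None:
        # "INSERT OR REPLACE" overwrites any existing row with the same id.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO recordings (recording_id, data) VALUES (?, ?)",
                (recording_id, json.dumps(recording)),
            )


store = SQLiteRecordingStore(":memory:")
store.save({"task_description": "Send email", "actions": []}, "recording_001")
print(store.load("recording_001"))
```

Because rows are keyed by `recording_id`, saving the scrubbed copy under a new id (as with `recording_001_scrubbed` in the usage example) leaves the original row intact.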

Configuration

from openadapt_privacy.config import PrivacyConfig

custom_config = PrivacyConfig(
    SCRUB_CHAR="X",                    # Replacement character for scrub_text_all
    SCRUB_FILL_COLOR=0x0000FF,         # Red for image redaction (BGR, i.e. 0xBBGGRR)
    SCRUB_KEYS_HTML=[                  # Keys to scrub in dicts
        "text", "value", "title", "tooltip", "custom_field"
    ],
    SCRUB_PRESIDIO_IGNORE_ENTITIES=[   # Entity types to skip
        "DATE_TIME",
    ],
)
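`SCRUB_PRESIDIO_IGNORE_ENTITIES` suppresses whole entity types, for example when dates in a recording are not considered sensitive. Its effect can be pictured as filtering detections by type before any replacement happens. The sketch below assumes that behaviour; `redact_except` and the `detections` tuples are hypothetical.

```python
from typing import Iterable, List, Tuple

Detection = Tuple[int, int, str]  # (start, end, entity_type), end exclusive


def redact_except(text: str, detections: List[Detection],
                  ignore: Iterable[str]) -> str:
    """Replace detections with <ENTITY_TYPE>, skipping ignored types."""
    ignored = set(ignore)
    kept = [d for d in detections if d[2] not in ignored]
    # Right to left, so earlier offsets remain valid.
    for start, end, entity_type in sorted(kept, reverse=True):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


text = "DOB: 01/15/1985, SSN: 923-45-6789"
detections = [(5, 15, "DATE_TIME"), (22, 33, "US_SSN")]  # hypothetical
print(redact_except(text, detections, ignore=["DATE_TIME"]))
```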

Supported Entity Types

Entity              Example Input          Example Output
PERSON              John Smith             <PERSON>
EMAIL_ADDRESS       john@example.com       <EMAIL_ADDRESS>
PHONE_NUMBER        555-123-4567           <PHONE_NUMBER>
US_SSN              923-45-6789            <US_SSN>
CREDIT_CARD         4532-1234-5678-9012    <CREDIT_CARD>
US_BANK_NUMBER      635526789012           <US_BANK_NUMBER>
US_DRIVER_LICENSE   A123-456-789-012       <US_DRIVER_LICENSE>
DATE_TIME           01/15/1985             <DATE_TIME>
LOCATION            Toronto, ON            <LOCATION>
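Recognizers for structured identifiers usually pair a regex with a checksum to cut false positives; Presidio's credit-card recognizer, for example, validates candidate numbers with the Luhn algorithm. A minimal standalone version of that check follows. It is an illustration of the algorithm, not code from this package, and `luhn_valid` is a hypothetical name.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```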

Architecture

openadapt_privacy/
├── base.py           # ScrubbingProvider, TextScrubbingMixin
├── config.py         # PrivacyConfig dataclass
├── loaders.py        # Recording, Action, Screenshot, RecordingLoader
├── providers/
│   ├── __init__.py   # ScrubProvider registry
│   └── presidio.py   # PresidioScrubbingProvider
└── pipelines/
    └── dicts.py      # scrub_dict, scrub_list_dicts

License

MIT

Download files

Source Distribution

openadapt_privacy-0.1.0.tar.gz (45.9 kB)

Uploaded Source

Built Distribution

openadapt_privacy-0.1.0-py3-none-any.whl (15.0 kB)

Uploaded Python 3

File details

Details for the file openadapt_privacy-0.1.0.tar.gz.

File metadata

  • Download URL: openadapt_privacy-0.1.0.tar.gz
  • Size: 45.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for openadapt_privacy-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        4c214b6ae7085482287627c8e2a212896ac8a1901b456697d90b199ba11e7712
MD5           e77923412313eed494d1331a379183d8
BLAKE2b-256   58ea453a66d514d47ab66c178c201ae29f8f169d86509033f1c88e2341b0c78b
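To check a downloaded file against the SHA256 digest listed for it, hash the file in chunks and compare the hex digests. A minimal sketch with Python's standard `hashlib` follows; the helper names `sha256_of` and `matches_expected` are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest published for the 0.1.0 source distribution.
SDIST_SHA256 = "4c214b6ae7085482287627c8e2a212896ac8a1901b456697d90b199ba11e7712"
# matches_expected("openadapt_privacy-0.1.0.tar.gz", SDIST_SHA256)
```

pip's hash-checking mode performs the same check at install time: run `pip install --require-hashes -r requirements.txt` with a requirement line such as `openadapt-privacy==0.1.0 --hash=sha256:<digest>`.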

Provenance

The following attestation bundles were made for openadapt_privacy-0.1.0.tar.gz:

Publisher: publish.yml on OpenAdaptAI/openadapt-privacy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file openadapt_privacy-0.1.0-py3-none-any.whl.

File hashes

Hashes for openadapt_privacy-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        a899bedf77e0e80f43e9581e5df843214b9cd43ed768cf1c9442b882a3c07338
MD5           ba07f3ad34d8626b9cb7f0b56caa1840
BLAKE2b-256   6c6df765928a9646e0bbe3e9b3cc2e09018f8f0a4d7c734ed05c7dfc72fc25aa

Provenance

The following attestation bundles were made for openadapt_privacy-0.1.0-py3-none-any.whl:

Publisher: publish.yml on OpenAdaptAI/openadapt-privacy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
