openadapt-privacy
Privacy scrubbing for GUI automation data - PII/PHI detection and redaction.
Installation
pip install openadapt-privacy
For Presidio-based scrubbing (recommended):
pip install "openadapt-privacy[presidio]"
python -m spacy download en_core_web_trf
Quick Start
Text Scrubbing
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider
scrubber = PresidioScrubbingProvider()
text = "Contact John Smith at john.smith@example.com or 555-123-4567"
scrubbed = scrubber.scrub_text(text)
Input:
Contact John Smith at john.smith@example.com or 555-123-4567
Output:
Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER>
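To make the placeholder format concrete, here is a minimal stand-in that mimics the `<ENTITY_TYPE>` substitution with two regular expressions. It is only a sketch: Presidio combines NER models with pattern recognizers, and this is not how `PresidioScrubbingProvider` works internally. `TOY_PATTERNS` and `toy_scrub_text` are hypothetical names used for illustration.

```python
import re

# Hypothetical patterns for illustration only; real detection is done by Presidio.
TOY_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def toy_scrub_text(text: str) -> str:
    """Replace every pattern match with an <ENTITY_TYPE> placeholder."""
    for entity, pattern in TOY_PATTERNS.items():
        text = pattern.sub(f"<{entity}>", text)
    return text


print(toy_scrub_text("Contact me at john.smith@example.com or 555-123-4567"))
```

Names such as "John Smith" cannot be caught by patterns like these, which is why an NER-backed provider is recommended.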
Example Inputs & Outputs
| Input | Output |
|---|---|
| `My email is john@example.com` | `My email is <EMAIL_ADDRESS>` |
| `SSN: 923-45-6789` | `SSN: <US_SSN>` |
| `Card: 4532-1234-5678-9012` | `Card: <CREDIT_CARD>` |
| `Call me at 555-123-4567` | `Call me at <PHONE_NUMBER>` |
| `DOB: 01/15/1985` | `DOB: <DATE_TIME>` |
| `Contact John Smith` | `Contact <PERSON>` |
Dict Scrubbing
Scrub PII from nested dictionaries (e.g., GUI element trees):
from openadapt_privacy import scrub_dict
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider
scrubber = PresidioScrubbingProvider()
action = {
    "text": "Email: john@example.com",
    "metadata": {
        "title": "User Profile - John Smith",
        "tooltip": "Click to contact john@example.com",
    },
    "coordinates": {"x": 100, "y": 200},
}
scrubbed = scrub_dict(action, scrubber)
Input:
{
    "text": "Email: john@example.com",
    "metadata": {
        "title": "User Profile - John Smith",
        "tooltip": "Click to contact john@example.com"
    },
    "coordinates": {"x": 100, "y": 200}
}
Output:
{
    "text": "Email: <EMAIL_ADDRESS>",
    "metadata": {
        "title": "User Profile - <PERSON>",
        "tooltip": "Click to contact <EMAIL_ADDRESS>"
    },
    "coordinates": {"x": 100, "y": 200}
}
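In the output above, string fields are scrubbed while `coordinates` is left untouched. One plausible way to get that behavior is a recursive walk that scrubs only string values stored under an allow-list of keys (compare `SCRUB_KEYS_HTML` in the Configuration section). The sketch below is an assumption about the approach, not the source of `scrub_dict`; `toy_scrub_dict` and `SCRUB_KEYS` are hypothetical names.

```python
from typing import Callable

# Hypothetical allow-list; the package's real default lives in PrivacyConfig.
SCRUB_KEYS = {"text", "value", "title", "tooltip"}


def toy_scrub_dict(data: dict, scrub: Callable[[str], str], keys=SCRUB_KEYS) -> dict:
    """Return a copy of data with string values under allow-listed keys scrubbed."""
    result = {}
    for key, value in data.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries such as "metadata".
            result[key] = toy_scrub_dict(value, scrub, keys)
        elif isinstance(value, list):
            # Recurse into dictionaries inside lists; copy other items as-is.
            result[key] = [
                toy_scrub_dict(item, scrub, keys) if isinstance(item, dict) else item
                for item in value
            ]
        elif isinstance(value, str) and key in keys:
            result[key] = scrub(value)
        else:
            result[key] = value
    return result


action = {"text": "Email: john@example.com", "coordinates": {"x": 100, "y": 200}}
scrubbed = toy_scrub_dict(
    action, lambda s: s.replace("john@example.com", "<EMAIL_ADDRESS>")
)
print(scrubbed)
```

Building a new dictionary instead of mutating the input keeps the original recording available for comparison or auditing.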
Recording Pipeline
Process complete GUI automation recordings:
from openadapt_privacy import DictRecordingLoader
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider
scrubber = PresidioScrubbingProvider()
loader = DictRecordingLoader()
recording = loader.load_from_dict({
    "task_description": "Send email to John Smith at john@example.com",
    "actions": [
        {"id": 1, "action_type": "click", "text": "Compose", "timestamp": 1000},
        {"id": 2, "action_type": "type", "text": "john@example.com", "timestamp": 2000},
        {"id": 3, "action_type": "click", "text": "Send", "window_title": "Email to john@example.com", "timestamp": 3000},
    ],
})
scrubbed = recording.scrub(scrubber)
Input Recording:
task_description: "Send email to John Smith at john@example.com"
actions:
    [1] click: "Compose"
    [2] type: "john@example.com"
    [3] click: "Send" (window: "Email to john@example.com")
Output Recording:
task_description: "Send email to <PERSON> at <EMAIL_ADDRESS>"
actions:
    [1] click: "Compose"
    [2] type: "<EMAIL_ADDRESS>"
    [3] click: "Send" (window: "Email to <EMAIL_ADDRESS>")
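The recording example scrubs the task description and the free-text fields of each action while leaving IDs, action types, and timestamps alone. A minimal sketch of that idea follows, using stand-in dataclasses; `ToyAction`, `ToyRecording`, and `mask_email` are hypothetical and do not reflect the actual `Recording` or `Action` definitions in `loaders.py`.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToyAction:
    id: int
    action_type: str
    text: str = ""
    window_title: Optional[str] = None
    timestamp: int = 0


@dataclass(frozen=True)
class ToyRecording:
    task_description: str
    actions: List[ToyAction] = field(default_factory=list)

    def scrub(self, scrub_text: Callable[[str], str]) -> "ToyRecording":
        """Return a new recording with every free-text field scrubbed."""
        scrubbed_actions = [
            replace(
                action,
                text=scrub_text(action.text),
                window_title=(
                    scrub_text(action.window_title)
                    if action.window_title is not None
                    else None
                ),
            )
            for action in self.actions
        ]
        return ToyRecording(scrub_text(self.task_description), scrubbed_actions)


def mask_email(s: str) -> str:
    # Stand-in scrubber for the demo; a real provider detects entities itself.
    return s.replace("john@example.com", "<EMAIL_ADDRESS>")


recording = ToyRecording(
    "Send email to john@example.com",
    [
        ToyAction(1, "click", "Compose", timestamp=1000),
        ToyAction(2, "type", "john@example.com", timestamp=2000),
    ],
)
clean = recording.scrub(mask_email)
print(clean.task_description)
print([a.text for a in clean.actions])
```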
Image Scrubbing
Redact PII from screenshots using OCR + NER:
from PIL import Image
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider
scrubber = PresidioScrubbingProvider()
image = Image.open("screenshot.png")
scrubbed_image = scrubber.scrub_image(image)
scrubbed_image.save("screenshot_scrubbed.png")
Input Screenshot:
Output Screenshot:
The image redactor:
- Runs OCR to detect text regions
- Analyzes text for PII entities (email, phone, SSN, etc.)
- Fills detected PII regions with solid color (configurable, default: red)
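The three steps above can be sketched on a plain pixel grid without any imaging library. This is an illustration under stated assumptions: the OCR result is passed in as a list of `(word, box)` pairs and the PII check as a callback, whereas the real provider relies on an OCR engine and Presidio. `redact_regions`, `Box`, and the box layout are hypothetical.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]
Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def redact_regions(
    pixels: List[List[Pixel]],
    ocr_words: List[Tuple[str, Box]],
    is_pii: Callable[[str], bool],
    fill: Pixel = (255, 0, 0),
) -> List[List[Pixel]]:
    """Return a copy of pixels with the box of every PII word filled with fill."""
    out = [row[:] for row in pixels]
    for word, (left, top, right, bottom) in ocr_words:
        if not is_pii(word):
            continue
        # Clamp the box to the image bounds before filling it.
        for y in range(max(top, 0), min(bottom, len(out))):
            for x in range(max(left, 0), min(right, len(out[y]))):
                out[y][x] = fill
    return out


# A 4x8 white "screenshot" with two words reported by a pretend OCR step.
white = (255, 255, 255)
image = [[white] * 8 for _ in range(4)]
words = [("Email:", (0, 0, 3, 2)), ("john@example.com", (4, 0, 8, 2))]
redacted = redact_regions(image, words, is_pii=lambda w: "@" in w)
```

Filling the whole bounding box, rather than blurring it, ensures the original characters cannot be recovered from the redacted image.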
Custom Data Loader
Implement your own loader for custom storage formats:
from openadapt_privacy import RecordingLoader, Recording
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider
class SQLiteRecordingLoader(RecordingLoader):
    def __init__(self, db_path: str):
        self.db_path = db_path
    def load(self, recording_id: str) -> Recording:
        # Load from SQLite database
        ...
    def save(self, recording: Recording, recording_id: str) -> None:
        # Save to SQLite database
        ...
# Usage
loader = SQLiteRecordingLoader("recordings.db")
scrubber = PresidioScrubbingProvider()
# Load, scrub, and save
scrubbed = loader.load_and_scrub("recording_001", scrubber)
loader.save(scrubbed, "recording_001_scrubbed")
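The `load` and `save` bodies depend on your schema. As one possible shape, the sketch below stores each recording as a single JSON blob keyed by its ID, using only the standard library. It uses stand-in classes so that it runs on its own: `ToyRecording` and `ToySQLiteLoader` are hypothetical, and the table layout is an assumption rather than anything the package prescribes.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyRecording:
    # Stand-in for the package's Recording, whose constructor is not shown here.
    task_description: str
    actions: List[Dict[str, Any]] = field(default_factory=list)


class ToySQLiteLoader:
    """Stores each recording as one JSON blob keyed by recording_id."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS recordings "
            "(id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def load(self, recording_id: str) -> ToyRecording:
        row = self.conn.execute(
            "SELECT body FROM recordings WHERE id = ?", (recording_id,)
        ).fetchone()
        if row is None:
            raise KeyError(recording_id)
        return ToyRecording(**json.loads(row[0]))

    def save(self, recording: ToyRecording, recording_id: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO recordings (id, body) VALUES (?, ?)",
                (recording_id, json.dumps(asdict(recording))),
            )


loader = ToySQLiteLoader(":memory:")
loader.save(ToyRecording("Open settings", [{"id": 1, "text": "Settings"}]), "rec_001")
restored = loader.load("rec_001")
```

Saving the scrubbed copy under a separate ID, as in the usage example above, keeps the unscrubbed original from being overwritten by accident.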
Configuration
from openadapt_privacy.config import PrivacyConfig
custom_config = PrivacyConfig(
    SCRUB_CHAR="X",  # Character for scrub_text_all
    SCRUB_FILL_COLOR=0xFF0000,  # Red for image redaction (BGR)
    SCRUB_KEYS_HTML=[  # Keys to scrub in dicts
        "text", "value", "title", "tooltip", "custom_field"
    ],
    SCRUB_PRESIDIO_IGNORE_ENTITIES=[  # Entity types to skip
        "DATE_TIME",
    ],
)
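`SCRUB_PRESIDIO_IGNORE_ENTITIES` lists entity types that should be left in place. A minimal sketch of that filtering step is shown below, assuming the analyzer yields `(start, end, entity_type)` spans. `apply_detections` and the tuple layout are hypothetical and are neither the package's nor Presidio's API.

```python
from typing import Iterable, Tuple

Detection = Tuple[int, int, str]  # start, end (exclusive), entity type


def apply_detections(
    text: str, detections: Iterable[Detection], ignore: Iterable[str] = ()
) -> str:
    """Replace each detected span with <ENTITY>, skipping ignored entity types."""
    ignored = set(ignore)
    kept = [d for d in detections if d[2] not in ignored]
    # Replace from the end of the string so earlier offsets stay valid.
    for start, end, entity in sorted(kept, reverse=True):
        text = text[:start] + f"<{entity}>" + text[end:]
    return text


text = "DOB: 01/15/1985, email john@example.com"
detections = [(5, 15, "DATE_TIME"), (23, 39, "EMAIL_ADDRESS")]

print(apply_detections(text, detections))
print(apply_detections(text, detections, ignore=["DATE_TIME"]))
```

Ignoring `DATE_TIME` can be useful when timestamps in a recording are needed for replay and are not considered sensitive.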
Supported Entity Types
| Entity | Example Input | Example Output |
|---|---|---|
| `PERSON` | `John Smith` | `<PERSON>` |
| `EMAIL_ADDRESS` | `john@example.com` | `<EMAIL_ADDRESS>` |
| `PHONE_NUMBER` | `555-123-4567` | `<PHONE_NUMBER>` |
| `US_SSN` | `923-45-6789` | `<US_SSN>` |
| `CREDIT_CARD` | `4532-1234-5678-9012` | `<CREDIT_CARD>` |
| `US_BANK_NUMBER` | `635526789012` | `<US_BANK_NUMBER>` |
| `US_DRIVER_LICENSE` | `A123-456-789-012` | `<US_DRIVER_LICENSE>` |
| `DATE_TIME` | `01/15/1985` | `<DATE_TIME>` |
| `LOCATION` | `Toronto, ON` | `<LOCATION>` |
Architecture
openadapt_privacy/
├── base.py          # ScrubbingProvider, TextScrubbingMixin
├── config.py        # PrivacyConfig dataclass
├── loaders.py       # Recording, Action, Screenshot, RecordingLoader
├── providers/
│   ├── __init__.py  # ScrubProvider registry
│   └── presidio.py  # PresidioScrubbingProvider
└── pipelines/
    └── dicts.py     # scrub_dict, scrub_list_dicts
License
MIT