Lightning-fast PII detection and anonymization library with 190x performance advantage
DataFog Python
DataFog is a Python library for detecting and redacting personally identifiable information (PII).
It provides:
- Fast structured PII detection via regex
- Optional NER support via spaCy and GLiNER
- A simple agent-oriented API for LLM applications
- Backward-compatible DataFog and TextService classes
Installation
# Core install (regex engine)
pip install datafog
# Add spaCy support
pip install "datafog[nlp]"
# Add GLiNER + spaCy support
pip install "datafog[nlp-advanced]"
# Everything
pip install "datafog[all]"
Quick Start
import datafog
text = "Contact john@example.com or call (555) 123-4567"
clean = datafog.sanitize(text, engine="regex")
print(clean)
# Contact [EMAIL_1] or call [PHONE_1]
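To make the placeholder format concrete, here is a minimal, standard-library-only sketch of regex redaction that numbers matches per entity type, producing output in the same [EMAIL_1] / [PHONE_1] style. The patterns and the name sketch_sanitize are hypothetical and much simpler than a production detector; this is not DataFog's implementation.

```python
import itertools
import re

# Hypothetical, deliberately simple patterns (not DataFog's real ones).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}


def _numbered(label: str):
    """Build a re.sub replacer that emits [LABEL_1], [LABEL_2], ... in order."""
    counter = itertools.count(1)
    return lambda match: f"[{label}_{next(counter)}]"


def sketch_sanitize(text: str) -> str:
    """Replace every match of each pattern with a numbered placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(_numbered(label), text)
    return text


print(sketch_sanitize("Contact john@example.com or call (555) 123-4567"))
# Contact [EMAIL_1] or call [PHONE_1]
```

Numbering restarts for each entity type, so two emails become [EMAIL_1] and [EMAIL_2] while the first phone number is still [PHONE_1].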
For LLM Applications
import datafog
# 1) Scan prompt text before sending to an LLM
prompt = "My SSN is 123-45-6789"
scan_result = datafog.scan_prompt(prompt, engine="regex")
if scan_result.entities:
    print(f"Detected {len(scan_result.entities)} PII entities")
# 2) Redact model output before returning it
output = "Email me at jane.doe@example.com"
safe_result = datafog.filter_output(output, engine="regex")
print(safe_result.redacted_text)
# Email me at [EMAIL_1]
# 3) One-liner redaction
print(datafog.sanitize("Card: 4111-1111-1111-1111", engine="regex"))
# Card: [CREDIT_CARD_1]
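Step 1 leaves the decision to the caller once scan_prompt reports entities. One possible policy is to refuse to send any prompt in which PII was found. A minimal sketch of that gate, assuming a hypothetical ScanSketch result that only models an entities list (standing in for DataFog's result object) and a caller-supplied send function standing in for a real LLM client:

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical SSN pattern (ddd-dd-dddd); a production detector would likely be stricter.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class ScanSketch:
    """Stand-in for a scan result; only the `entities` field is modeled."""
    text: str
    entities: List[str] = field(default_factory=list)


def scan_sketch(prompt: str) -> ScanSketch:
    """Collect every SSN-shaped substring in the prompt."""
    return ScanSketch(prompt, [m.group() for m in SSN.finditer(prompt)])


def send_if_clean(prompt: str, send: Callable[[str], str]) -> str:
    """Forward the prompt to `send` only when no PII was detected."""
    result = scan_sketch(prompt)
    if result.entities:
        raise ValueError(f"Blocked: {len(result.entities)} PII entities detected")
    return send(prompt)
```

Blocking is stricter than redacting: nothing reaches the model, at the cost of rejecting prompts that a redacted version could have answered.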
Guardrails
import datafog
# Reusable guardrail object
guard = datafog.create_guardrail(engine="regex", on_detect="redact")
@guard
def call_llm() -> str:
    return "Send to admin@example.com"
print(call_llm())
# Send to [EMAIL_1]
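The guardrail object acts as a decorator that inspects a function's return value. The general pattern can be sketched with functools.wraps; make_guard and its "raise" fallback are hypothetical and only handle email addresses, so this illustrates the decorator shape rather than how DataFog's create_guardrail works internally.

```python
import functools
import itertools
import re
from typing import Callable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def make_guard(on_detect: str = "redact") -> Callable:
    """Hypothetical decorator factory: "redact" masks emails, any other value raises."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs) -> str:
            output = func(*args, **kwargs)
            if not EMAIL.search(output):
                return output
            if on_detect == "redact":
                counter = itertools.count(1)
                return EMAIL.sub(lambda m: f"[EMAIL_{next(counter)}]", output)
            raise ValueError("PII detected in function output")
        return wrapper
    return decorator


guard = make_guard(on_detect="redact")


@guard
def call_llm() -> str:
    return "Send to admin@example.com"


print(call_llm())  # Send to [EMAIL_1]
```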
Engines
Use the engine that matches your accuracy and dependency constraints:
regex:
- Fastest and always available.
- Best for structured entities: EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, DATE, ZIP_CODE.
spacy:
- Requires pip install datafog[nlp].
- Useful for unstructured entities like person and organization names.
gliner:
- Requires pip install datafog[nlp-advanced].
- Stronger NER coverage than regex for unstructured text.
smart:
- Cascades regex with optional NER engines.
- If optional dependencies are missing, it degrades gracefully and emits a warning.
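The smart engine's graceful degradation can be pictured as a cascade that always runs regex and adds an NER stage only when its package is importable. A minimal sketch of that availability check; the module names spacy and gliner (assumed to come from the [nlp] and [nlp-advanced] extras), the function name plan_cascade, and the warning text are assumptions, not DataFog's code.

```python
import importlib.util
import warnings
from typing import List

# Module names assumed to be provided by the [nlp] and [nlp-advanced] extras.
OPTIONAL_STAGES = ("spacy", "gliner")


def plan_cascade() -> List[str]:
    """Return the stages a regex-first cascade could run in this environment."""
    stages = ["regex"]  # needs no optional dependency, so it always runs
    for module in OPTIONAL_STAGES:
        # find_spec returns None for a missing top-level module without importing it
        if importlib.util.find_spec(module) is not None:
            stages.append(module)
        else:
            warnings.warn(f"{module} is not installed; skipping that stage", RuntimeWarning)
    return stages


print(plan_cascade())
```

Checking with find_spec instead of importing keeps the probe cheap, since loading a spaCy or GLiNER model is only worth doing once the stage is actually used.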
Backward-Compatible APIs
The existing public API remains available.
DataFog class
from datafog import DataFog
result = DataFog().scan_text("Email john@example.com")
print(result["EMAIL"])
TextService class
from datafog.services import TextService
service = TextService(engine="regex")
result = service.annotate_text_sync("Call (555) 123-4567")
print(result["PHONE"])
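Both legacy calls index their result by entity label (result["EMAIL"], result["PHONE"]). A minimal sketch of reading a result of that shape defensively, assuming it is a dict mapping labels to lists of matched strings; that shape, whether labels with no matches appear as keys, and the helper first_match are assumptions rather than documented behavior.

```python
from typing import Dict, List, Optional


def first_match(result: Dict[str, List[str]], label: str) -> Optional[str]:
    """Return the first value recorded for `label`, or None if there is none.

    .get() avoids a KeyError in case labels without matches are left out
    of the result instead of being mapped to an empty list.
    """
    values = result.get(label) or []
    return values[0] if values else None


# Hand-written stand-in shaped like the results above; not produced by DataFog.
sample = {"EMAIL": ["john@example.com"], "PHONE": []}
print(first_match(sample, "EMAIL"))  # john@example.com
print(first_match(sample, "SSN"))    # None
```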
CLI
# Scan text
datafog scan-text "john@example.com"
# Redact text
datafog redact-text "john@example.com"
# Replace text with pseudonyms
datafog replace-text "john@example.com"
# Hash detected entities
datafog hash-text "john@example.com"
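The redact and hash commands differ in what replaces a detected value: redaction substitutes a placeholder, while hashing substitutes a digest that is identical every time the same value appears, so records can still be joined on it. A minimal sketch of the hashing idea for email addresses only; SHA-256, the 16-character truncation, and the name hash_emails are assumptions, not a description of what datafog hash-text does.

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def hash_emails(text: str, length: int = 16) -> str:
    """Replace each email with a truncated SHA-256 hex digest of the address."""
    return EMAIL.sub(
        lambda m: hashlib.sha256(m.group().encode("utf-8")).hexdigest()[:length],
        text,
    )


print(hash_emails("john@example.com"))
```

An unsalted digest of a low-entropy value such as a phone number or SSN can be reversed by brute force, so a keyed hash (for example hmac.new with a secret key) is the safer choice when digests leave your system.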
Telemetry
DataFog collects anonymous telemetry by default.
To opt out, set either of these environment variables:
export DATAFOG_NO_TELEMETRY=1
# or
export DO_NOT_TRACK=1
Telemetry does not include input text or detected PII values.
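When the shell is not under your control (a notebook or test harness, say), the same variable can be set from Python before the library is imported. A minimal sketch; telemetry_opted_out is a hypothetical helper that only checks for the value "1" shown above, and whether DataFog accepts other truthy values, or when exactly it reads the variable, is not documented here.

```python
import os
from typing import Mapping, Optional

OPT_OUT_VARS = ("DATAFOG_NO_TELEMETRY", "DO_NOT_TRACK")


def telemetry_opted_out(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if either documented opt-out variable is set to "1"."""
    env = os.environ if env is None else env
    return any(env.get(name) == "1" for name in OPT_OUT_VARS)


# Set the variable before datafog is imported, since the point at which
# the library reads it is an assumption here.
os.environ["DATAFOG_NO_TELEMETRY"] = "1"
# import datafog
```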
Development
git clone https://github.com/datafog/datafog-python
cd datafog-python
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[all,dev]"
pytest tests/
Download files
File details
Details for the file datafog-4.3.0.tar.gz.
File metadata
- Download URL: datafog-4.3.0.tar.gz
- Upload date:
- Size: 72.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06331ea195fe2761a0c051f6f0052baf2aa54578a22d93acb57f86d6f818e9c4 |
| MD5 | 91773ffd4d951ac05768105ed3e9f2d4 |
| BLAKE2b-256 | 5f17c34455bd50a7178bd53e7c27e16afe946e614a395313fb181d69ff7e07e9 |
File details
Details for the file datafog-4.3.0-py3-none-any.whl.
File metadata
- Download URL: datafog-4.3.0-py3-none-any.whl
- Upload date:
- Size: 60.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f27dadca6a6c05d436be1708958562d6028a3ba5b970de7e10105b65a50029a |
| MD5 | a1a96a93a57848ba4c02db94cc459898 |
| BLAKE2b-256 | bdfad62cd25470e4705db00dec00907a92dbe0ff2b9c249583dc22d9f9e1358c |
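The published digests let you confirm that a downloaded file is byte-for-byte what was uploaded. A minimal standard-library sketch that checks the SHA256 values listed for the two 4.3.0 files; the helper names sha256_of and verify are illustrative. For the BLAKE2b-256 values, hashlib.blake2b(digest_size=32) produces the matching 256-bit digest.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file-hash tables on this page.
EXPECTED_SHA256 = {
    "datafog-4.3.0.tar.gz": "06331ea195fe2761a0c051f6f0052baf2aa54578a22d93acb57f86d6f818e9c4",
    "datafog-4.3.0-py3-none-any.whl": "0f27dadca6a6c05d436be1708958562d6028a3ba5b970de7e10105b65a50029a",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its SHA256 matches the published value."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, verify(Path("datafog-4.3.0-py3-none-any.whl")) after a manual download. pip can enforce the same check at install time through its hash-checking mode (--require-hashes with a --hash=sha256:... entry per requirement).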