Lightning-fast PII detection and anonymization library with 190x performance advantage

DataFog Python

DataFog is a Python library for detecting and redacting personally identifiable information (PII).

It provides:

  • Fast structured PII detection via regex
  • Optional NER support via spaCy and GLiNER
  • A simple agent-oriented API for LLM applications
  • Backward-compatible DataFog and TextService classes

Installation

# Core install (regex engine)
pip install datafog

# Add spaCy support (the quotes keep shells such as zsh from expanding the brackets)
pip install "datafog[nlp]"

# Add GLiNER + spaCy support
pip install "datafog[nlp-advanced]"

# Everything
pip install "datafog[all]"

Python 3.13 support is certified for the core SDK and CLI. Optional extras such as nlp, nlp-advanced, ocr, distributed, and all are available but not yet certified on Python 3.13.

Quick Start

import datafog

text = "Contact john@example.com or call (555) 123-4567"
clean = datafog.sanitize(text, engine="regex")
print(clean)
# Contact [EMAIL_1] or call [PHONE_1]
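
One way to picture the numbered placeholders in the output above is the standard-library sketch below. It is not DataFog's implementation: the two patterns, the PATTERNS table, and the redact helper are assumptions made for illustration, and a real detector is likely stricter about what counts as an email address or phone number.

```python
import re
from collections import defaultdict

# Illustrative patterns only (assumption); DataFog's own patterns are not shown in this README.
PATTERNS = {
    "EMAIL": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "PHONE": r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}",
}

# One alternation with a named group per entity type, so a single pass finds every match.
COMBINED = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS.items()))


def redact(text: str) -> str:
    """Replace each match with [TYPE_n], counting matches per entity type."""
    counters = defaultdict(int)

    def substitute(match):
        label = match.lastgroup  # name of the group that matched, e.g. "EMAIL"
        counters[label] += 1
        return f"[{label}_{counters[label]}]"

    return COMBINED.sub(substitute, text)


print(redact("Contact john@example.com or call (555) 123-4567"))
# Contact [EMAIL_1] or call [PHONE_1]
```

In this sketch every match gets a fresh number, even when the same value appears twice. Whether DataFog reuses a placeholder for repeated values is not stated here.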

For LLM Applications

import datafog

# 1) Scan prompt text before sending to an LLM
prompt = "My SSN is 123-45-6789"
scan_result = datafog.scan_prompt(prompt, engine="regex")
if scan_result.entities:
    print(f"Detected {len(scan_result.entities)} PII entities")

# 2) Redact model output before returning it
output = "Email me at jane.doe@example.com"
safe_result = datafog.filter_output(output, engine="regex")
print(safe_result.redacted_text)
# Email me at [EMAIL_1]

# 3) One-liner redaction
print(datafog.sanitize("Card: 4111-1111-1111-1111", engine="regex"))
# Card: [CREDIT_CARD_1]
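
The scan step above is often used as a gate: the prompt is forwarded only when no entities are found. The sketch below shows one possible shape for that gate. The send_if_clean helper and the fake_scan and fake_llm stand-ins are hypothetical names introduced here so that the example runs without DataFog or a model client installed. The only thing it assumes about the scan result is the entities attribute shown above.

```python
from types import SimpleNamespace
from typing import Any, Callable


def send_if_clean(prompt: str, scan: Callable[[str], Any], send: Callable[[str], str]) -> str:
    """Forward the prompt to send() only when scan() reports no entities."""
    result = scan(prompt)
    if result.entities:
        raise ValueError(f"prompt blocked: {len(result.entities)} PII entities detected")
    return send(prompt)


# Stand-ins (hypothetical) so the sketch is self-contained.
def fake_scan(prompt: str) -> SimpleNamespace:
    return SimpleNamespace(entities=["SSN"] if "123-45-6789" in prompt else [])


def fake_llm(prompt: str) -> str:
    return f"echo: {prompt}"


print(send_if_clean("What is the capital of France?", fake_scan, fake_llm))
# echo: What is the capital of France?
```

With DataFog installed, the scan argument could presumably be lambda p: datafog.scan_prompt(p, engine="regex"), and send would be whatever function calls the model.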

Guardrails

import datafog

# Reusable guardrail object
guard = datafog.create_guardrail(engine="regex", on_detect="redact")

@guard
def call_llm() -> str:
    return "Send to admin@example.com"

print(call_llm())
# Send to [EMAIL_1]
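
For readers curious how a decorator like this can work, here is a minimal sketch of the general pattern: wrap the function and pass its return value through a redaction step. The make_guardrail helper and the single email pattern are assumptions for illustration and say nothing about how create_guardrail is built internally.

```python
import functools
import re
from typing import Callable


def make_guardrail(redact: Callable[[str], str]):
    """Return a decorator that redacts the wrapped function's string result."""

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            return redact(func(*args, **kwargs))

        return wrapper

    return decorator


# Stand-in redactor (assumption): masks email addresses with one fixed placeholder.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
guard = make_guardrail(lambda text: EMAIL.sub("[EMAIL_1]", text))


@guard
def call_llm() -> str:
    return "Send to admin@example.com"


print(call_llm())
# Send to [EMAIL_1]
```

Since sanitize returns a string in the Quick Start, something like lambda text: datafog.sanitize(text, engine="regex") could stand in for the redactor here.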

Engines

Use the engine that matches your accuracy and dependency constraints:

  • regex:
    • Fastest and always available.
    • Best for structured entities: EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, DATE, ZIP_CODE.
  • spacy:
    • Requires pip install datafog[nlp].
    • Useful for unstructured entities like person and organization names.
  • gliner:
    • Requires pip install datafog[nlp-advanced].
    • Stronger NER coverage than regex for unstructured text.
  • smart:
    • Cascades regex with optional NER engines.
    • If optional deps are missing, it degrades gracefully and warns.
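
The graceful degradation described for the smart engine can be pictured with the sketch below: check whether each optional module is importable, keep the engines that are usable, and warn about the rest. The ENGINE_MODULES mapping and the available_engines helper are hypothetical, and the module names are guesses based on the extras listed above. DataFog's actual cascade logic may differ.

```python
import importlib.util
import warnings

# Assumed mapping from engine name to the top-level module it needs (None = no extra dependency).
ENGINE_MODULES = {"regex": None, "spacy": "spacy", "gliner": "gliner"}


def available_engines(requested, modules=ENGINE_MODULES):
    """Return the requested engines whose dependency can be imported, warning about the others."""
    usable = []
    for name in requested:
        module = modules[name]
        # find_spec returns None when a top-level module is not installed.
        if module is None or importlib.util.find_spec(module) is not None:
            usable.append(name)
        else:
            warnings.warn(f"engine {name!r} skipped: module {module!r} is not installed")
    return usable


# The result depends on which optional packages are installed locally.
print(available_engines(["regex", "spacy", "gliner"]))
```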

Backward-Compatible APIs

The existing public API remains available.

DataFog class

from datafog import DataFog

result = DataFog().scan_text("Email john@example.com")
print(result["EMAIL"])

TextService class

from datafog.services import TextService

service = TextService(engine="regex")
result = service.annotate_text_sync("Call (555) 123-4567")
print(result["PHONE"])

CLI

# Scan text
datafog scan-text "john@example.com"

# Redact text
datafog redact-text "john@example.com"

# Replace text with pseudonyms
datafog replace-text "john@example.com"

# Hash detected entities
datafog hash-text "john@example.com"
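
The redact, replace, and hash commands correspond to three common anonymization strategies. The sketch below contrasts them on an email address using only the standard library. Everything in it is an assumption for illustration: the anonymize helper, the [REDACTED] marker, the pseudonym format, and the choice of SHA-256 are not taken from the CLI, whose actual output format is not shown in this README.

```python
import hashlib
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MODES = ("redact", "replace", "hash")


def anonymize(text: str, mode: str) -> str:
    """Apply one of three illustrative strategies to every email address in text."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")

    def transform(match):
        digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()
        if mode == "redact":
            return "[REDACTED]"  # drop the value entirely
        if mode == "replace":
            # Stable pseudonym: the same input value always maps to the same alias.
            return f"user-{digest[:8]}@example.invalid"
        return digest  # "hash": keep a one-way fingerprint of the value

    return EMAIL.sub(transform, text)


for mode in MODES:
    print(mode, "->", anonymize("Mail john@example.com", mode))
```

Redaction discards the value. Pseudonyms and hashes keep records linkable, because equal inputs produce equal outputs.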

Telemetry

DataFog telemetry is disabled by default.

To opt in:

export DATAFOG_TELEMETRY=1

To force telemetry off:

export DATAFOG_NO_TELEMETRY=1
# or
export DO_NOT_TRACK=1

Telemetry does not include input text or detected PII values.
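
The interaction of these variables can be summarized in a small decision function. This is a sketch of one plausible reading, not DataFog's code: the telemetry_enabled helper is hypothetical, and it assumes that either opt-out variable overrides the opt-in variable and that "1" is the only value that counts as set.

```python
import os
from typing import Mapping


def telemetry_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when telemetry is opted in and not forced off."""
    # Assumed precedence: either opt-out variable wins over the opt-in variable.
    if env.get("DATAFOG_NO_TELEMETRY") == "1" or env.get("DO_NOT_TRACK") == "1":
        return False
    # Disabled by default: only an explicit opt-in turns telemetry on.
    return env.get("DATAFOG_TELEMETRY") == "1"


print(telemetry_enabled({}))                                               # False
print(telemetry_enabled({"DATAFOG_TELEMETRY": "1"}))                       # True
print(telemetry_enabled({"DATAFOG_TELEMETRY": "1", "DO_NOT_TRACK": "1"}))  # False
```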

Development

git clone https://github.com/datafog/datafog-python
cd datafog-python
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e ".[all,dev]"
pip install -r requirements-dev.txt
pytest tests/

Download files

Source Distribution

datafog-4.4.0b2.tar.gz (77.9 kB)

Uploaded Source

Built Distribution

datafog-4.4.0b2-py3-none-any.whl (62.1 kB)

Uploaded Python 3

File details

Details for the file datafog-4.4.0b2.tar.gz.

File metadata

  • Download URL: datafog-4.4.0b2.tar.gz
  • Size: 77.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for datafog-4.4.0b2.tar.gz

  • SHA256: ed038fd21f4def1ef4cd83308f62fc8224a02f304bfe4b3a4708c70e6c8bcf6c
  • MD5: 4be0a9311b2ca8d48d97bce3a0a54ca4
  • BLAKE2b-256: 345c221ddfa62a82dd3a300022f1032fdc7fb3c00004b3fb297b5b4b185360c7

File details

Details for the file datafog-4.4.0b2-py3-none-any.whl.

File metadata

  • Download URL: datafog-4.4.0b2-py3-none-any.whl
  • Size: 62.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for datafog-4.4.0b2-py3-none-any.whl

  • SHA256: b9023b1f0cfe885c26e2ca2a3d0ed61da4dbac48e73ad93df5af5e9c848718b9
  • MD5: 3782e33319b1f07714042ca565540468
  • BLAKE2b-256: 08ef8c2252cb8cb318befde24cfe2de8d13f6390abe456be4efa9981f63789a6
