Lightweight, tiered, bidirectional PII sanitizer for LLM pipelines

prompt-sanitizer

PII and secret sanitization for Python LLM pipelines.

prompt-sanitizer provides a typed API for detecting, redacting, anonymizing, and restoring sensitive values before they reach a model, tool, middleware layer, log sink, or SDK wrapper. FAST mode has zero required dependencies. SMART and FULL add optional NLP, synthetic replacement, and audit logging.

Install

Requires Python 3.10 or later.

pip install prompt-sanitizer
pip install "prompt-sanitizer[nlp]"
pip install "prompt-sanitizer[synthetic]"
pip install "prompt-sanitizer[integrations]"
pip install "prompt-sanitizer[all]"

Optional extras

| Extra | Adds | Typical use |
| --- | --- | --- |
| nlp | transformers + torch | NER in SMART/FULL mode |
| synthetic | faker | realistic fake replacements |
| integrations | framework / SDK adapters | LangChain, LlamaIndex, OpenAI, FastAPI, Django |
| all | all extras | full feature set |

Quick start

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.FAST)
result = s.sanitize("Contact Jane Doe at jane@example.com or 415-555-0112.")

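print(result.text)        # sanitized text
print(result.has_pii)     # True if sensitive data was found
print(result.risk_score)  # composite score from 0.0 to 1.0
print(result.tokens)      # {original_value: replacement} map
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.replacement)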

Modes

| Mode | Pipeline | Dependencies | Notes |
| --- | --- | --- | --- |
| Mode.FAST | regex + secret detectors | none | sub-ms, stdlib only |
| Mode.SMART | FAST + Piiranha NER | prompt-sanitizer[nlp] | lazy-loads on first call |
| Mode.FULL | SMART + synthetic replacement + audit log | usually nlp + synthetic | best for compliance-oriented flows |

FAST mode

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.FAST)
text = "SSN 078-05-1120, card 4111 1111 1111 1111, token sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
result = s.sanitize(text)

print(result.text)
print(result.entities)
print(result.tokens)

Use FAST for prompt pre-processing, log scrubbing, middleware guards, CI checks, and zero-dependency CLI tooling.
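
For the log-scrubbing use case, one possible wiring is a standard `logging.Filter` that rewrites each record before a handler emits it. The sketch below is self-contained and does not call prompt-sanitizer: the `scrub` function, the `ScrubFilter` class, and the email pattern are stand-ins invented for this example. In a real setup the scrubber could presumably be something like `lambda t: s.sanitize(t).text`.

```python
import io
import logging
import re
from typing import Callable

# Hypothetical, deliberately simple pattern used only for this sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def scrub(text: str) -> str:
    """Stand-in scrubber that masks email addresses."""
    return EMAIL.sub("[EMAIL]", text)

class ScrubFilter(logging.Filter):
    """Rewrite each record's message before the handler formats it."""

    def __init__(self, scrubber: Callable[[str], str]) -> None:
        super().__init__()
        self._scrubber = scrubber

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so values passed as args are scrubbed too.
        record.msg = self._scrubber(record.getMessage())
        record.args = ()
        return True  # keep the record, now scrubbed

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(ScrubFilter(scrub))

logger = logging.getLogger("scrub-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("password reset requested by %s", "jane@example.com")
print(stream.getvalue().strip())  # password reset requested by [EMAIL]
```

Attaching the filter to the handler rather than the logger means every record that reaches that sink is scrubbed, regardless of which logger produced it.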

SMART mode

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.SMART)
result = s.sanitize(
    "Alice from Acme Corp met us in Berlin on 2025-02-14. Email alice@acme.example."
)

print(result.text)
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.confidence)

Use SMART when prompts contain free-form prose with names, organizations, dates, or locations that regexes alone may miss.

FULL mode

from prompt_sanitizer import Sanitizer, Mode, SQLiteAuditLog

audit = SQLiteAuditLog("prompt_sanitizer_audit.db")
s = Sanitizer(mode=Mode.FULL, locale="en_US", on_detect="redact", audit_log=audit)

result = s.sanitize("Customer Jane Doe uses jane@example.com and 415-555-0112.")
print(result.text)
print(result.tokens)
print(s.audit.export(format="json"))

Use FULL when you want synthetic replacement plus an audit trail.

Public API

Sanitizer

Sanitizer(
    mode: Mode = Mode.FAST,
    locale: str = "en_US",
    entities: list[EntityType] | None = None,
    on_detect: str = "redact",
    audit_log: BaseAuditLog | None = None,
)

| Parameter | Type | Description |
| --- | --- | --- |
| mode | Mode | detection pipeline |
| locale | str | locale for synthetic replacement generation |
| entities | list[EntityType] \| None | optional allowlist of entity types |
| on_detect | str | "redact", "warn", or "block" |
| audit_log | BaseAuditLog \| None | optional audit backend |

| Method | Signature | Description |
| --- | --- | --- |
| sanitize | `sanitize(text: str, session_id: str \| None = None) -> SanitizeResult` | sanitize a single input |
| sanitize_batch | `sanitize_batch(texts: list[str]) -> list[SanitizeResult]` | sanitize multiple inputs |
| session | `session(session_id: str \| None = None) -> Session` | open a session for reversible, multi-turn workflows |
| add_entity | `add_entity(name: str, pattern: str, confidence: float = 0.85) -> None` | register a custom entity |
| stream | `stream(source: AsyncIterable, session: Session \| None) -> AsyncGenerator[str, None]` | process text chunks from an async source, optionally tied to a session |
| guard | `guard(on_detect: str) -> decorator` | decorate a function with sanitization logic |
| audit | `.audit -> BaseAuditLog \| None` | the configured audit backend, if any |

Detection policy

| on_detect value | Behavior |
| --- | --- |
| "redact" | rewrite the returned text |
| "warn" | return original text, but populate entities and scores |
| "block" | raise instead of returning sanitized text |

results = s.sanitize_batch(["Email a@example.com", "No sensitive data here"])

@s.guard(on_detect="redact")
def call_model(prompt: str) -> str:
    return prompt
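
To make the three policies concrete, here is a minimal stand-alone sketch of how they differ. It is not the library's implementation: `apply_policy`, `SensitiveDataError`, and the single email pattern are hypothetical names invented for the example, and the exception prompt-sanitizer actually raises in "block" mode is not specified in this document.

```python
import re

# Hypothetical pattern; only emails are detected in this sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

class SensitiveDataError(ValueError):
    """Hypothetical exception type used only in this sketch."""

def apply_policy(text: str, on_detect: str = "redact") -> tuple[str, list[str]]:
    """Return (text, detected values) according to the chosen policy."""
    found = EMAIL.findall(text)
    if on_detect == "redact":
        return EMAIL.sub("[EMAIL]", text), found
    if on_detect == "warn":
        return text, found  # text is untouched, detections are still reported
    if on_detect == "block":
        if found:
            raise SensitiveDataError(f"{len(found)} sensitive value(s) detected")
        return text, found
    raise ValueError(f"unknown policy: {on_detect!r}")

prompt = "Email a@example.com"
print(apply_policy(prompt, "redact"))  # ('Email [EMAIL]', ['a@example.com'])
print(apply_policy(prompt, "warn"))    # ('Email a@example.com', ['a@example.com'])
try:
    apply_policy(prompt, "block")
except SensitiveDataError as exc:
    print("blocked:", exc)
```

A caller that uses "block" needs an explicit error path, whereas "warn" leaves it to the caller to inspect the detections and decide what to do.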

Mode, SanitizeResult, and DetectedEntity

| Mode value | Meaning |
| --- | --- |
| Mode.FAST | regex + secrets, zero deps, sub-ms |
| Mode.SMART | FAST + Piiranha NER, lazy loads on first call |
| Mode.FULL | SMART + synthetic replacement + audit log |

| SanitizeResult attribute | Type | Description |
| --- | --- | --- |
| text | str | sanitized text |
| entities | list[DetectedEntity] | detected spans |
| tokens | dict[str, str] | {original_value: replacement} map |
| risk_score | float | composite score from 0.0 to 1.0 |
| has_pii | bool | whether sensitive data was found |

| DetectedEntity attribute | Type | Description |
| --- | --- | --- |
| entity_type | EntityType | entity classification |
| value | str | original matched value |
| start | int | inclusive start offset |
| end | int | exclusive end offset |
| confidence | float | detection confidence |
| replacement | str \| None | replacement value, if generated |

result = s.sanitize("Contact me at sam@example.com")
assert result.has_pii is True
assert 0.0 <= result.risk_score <= 1.0
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.replacement)
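
The inclusive-start, exclusive-end convention matches Python slicing, so `text[start:end]` should recover the matched value. The sketch below illustrates that relationship with a stdlib regex match. `Span` is a hypothetical stand-in for the fields listed above, not the library's `DetectedEntity` class.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in with the value/start/end fields described above."""
    value: str
    start: int  # inclusive
    end: int    # exclusive

text = "Contact me at sam@example.com"
match = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
assert match is not None
span = Span(value=match.group(0), start=match.start(), end=match.end())

# Slicing with the offsets recovers exactly the matched value.
assert text[span.start:span.end] == span.value

# The same offsets can be used to splice in a replacement.
redacted = text[:span.start] + "[EMAIL]" + text[span.end:]
print(span.start, span.end)  # 14 29
print(redacted)              # Contact me at [EMAIL]
```

When splicing several spans into one string, applying them from the last offset to the first keeps the earlier offsets valid.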

Sessions and vaults

Use sessions when the model should never see raw values, but the final response should restore them.

from prompt_sanitizer import Sanitizer

s = Sanitizer()
session = s.session(session_id="support-chat-001")
clean_prompt = session.anonymize("My name is Elena Ruiz and my email is elena@company.com")
llm_reply = "Confirmed. I will email [EMAIL_1] shortly."
final_reply = session.deanonymize(llm_reply)

print(clean_prompt)
print(final_reply)

| Session API | Description |
| --- | --- |
| session.anonymize(text: str) -> str | replace PII with vault tokens |
| session.deanonymize(text: str) -> str | restore originals from the vault |
| session.vault: Vault | access the underlying vault |

| Vault API | Description |
| --- | --- |
| vault.store(value: str, replacement: str) -> None | store a mapping |
| vault.lookup(replacement: str) -> str \| None | resolve token to original |
| vault.reverse(value: str) -> str \| None | resolve original to replacement |
| vault.clear() -> None | clear all mappings |

vault = session.vault
vault.store("alice@example.com", "[EMAIL_1]")
print(vault.lookup("[EMAIL_1]"))
print(vault.reverse("alice@example.com"))
vault.clear()
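
The anonymize/deanonymize round trip rests on a two-way mapping between original values and tokens. The following stand-alone sketch shows the idea with a toy vault. `ToyVault`, `anonymize`, and the `[EMAIL_n]` numbering scheme are assumptions made for illustration and may differ from how prompt-sanitizer's `Vault` behaves.

```python
import re

# Hypothetical pattern; only emails are tokenized in this sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

class ToyVault:
    """Hypothetical two-way mapping between original values and tokens."""

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}
        self._by_value: dict[str, str] = {}

    def token_for(self, value: str, label: str) -> str:
        """Return the existing token for a value, or mint the next numbered one."""
        if value not in self._by_value:
            # Single-label numbering is enough for this sketch.
            token = f"[{label}_{len(self._by_value) + 1}]"
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def restore(self, text: str) -> str:
        """Swap every known token in the text back to its original value."""
        for token, value in self._by_token.items():
            text = text.replace(token, value)
        return text

vault = ToyVault()

def anonymize(text: str) -> str:
    return EMAIL.sub(lambda m: vault.token_for(m.group(0), "EMAIL"), text)

clean = anonymize("Write to elena@company.com, cc elena@company.com and ops@company.com")
restored = vault.restore("Done. I emailed [EMAIL_1] and [EMAIL_2].")
print(clean)     # Write to [EMAIL_1], cc [EMAIL_1] and [EMAIL_2]
print(restored)  # Done. I emailed elena@company.com and ops@company.com.
```

Reusing the same token for a repeated value lets the model treat both mentions as the same entity without ever seeing the raw address.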

Custom entities

Use add_entity() for internal identifiers, tenant-specific secrets, or domain-specific formats.

from prompt_sanitizer import Sanitizer

s = Sanitizer()
s.add_entity(name="customer_id", pattern=r"\bCUS-\d{8}\b", confidence=0.90)
s.add_entity(name="invoice_no", pattern=r"\bINV-\d{6}-[A-Z]{2}\b", confidence=0.88)

result = s.sanitize("Customer CUS-12345678 opened invoice INV-882211-US")
print(result.text)
print(result.entities)
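
Before registering a pattern, it can help to check it against sample text with the standard `re` module. The sketch below assumes `add_entity()` accepts Python `re` syntax, which the raw-string patterns above suggest but this document does not state outright.

```python
import re

# Patterns copied from the add_entity() calls above.
CUSTOMER_ID = re.compile(r"\bCUS-\d{8}\b")
INVOICE_NO = re.compile(r"\bINV-\d{6}-[A-Z]{2}\b")

sample = "Customer CUS-12345678 opened invoice INV-882211-US"
assert CUSTOMER_ID.findall(sample) == ["CUS-12345678"]
assert INVOICE_NO.findall(sample) == ["INV-882211-US"]

# The \b anchors stop a longer identifier from matching partially.
assert CUSTOMER_ID.search("CUS-123456789") is None
assert INVOICE_NO.search("INV-882211-USA") is None
print("patterns behave as expected")
```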

Filtering by entity type

from prompt_sanitizer import Sanitizer, EntityType

# Only EMAIL and API_KEY are on the allowlist, so the SSN should pass through untouched.
s = Sanitizer(entities=[EntityType.EMAIL, EntityType.API_KEY])
result = s.sanitize("Email a@b.com and SSN 123-45-6789")
print(result.text)

Audit logging

Audit backends are optional. Use them when you want structured records of detections.

MemoryAuditLog

from prompt_sanitizer import MemoryAuditLog, Mode, Sanitizer

audit = MemoryAuditLog()
s = Sanitizer(mode=Mode.FULL, audit_log=audit)
s.sanitize("Email finance@example.com")

print(audit.events())
print(audit.export(format="json"))

SQLiteAuditLog

from prompt_sanitizer import SQLiteAuditLog, Mode, Sanitizer

audit = SQLiteAuditLog("audit.db")
s = Sanitizer(mode=Mode.FULL, audit_log=audit)
s.sanitize("Call +1 415 555 0112", session_id="request-17")

print(audit.events())
print(audit.export(format="csv"))

Audit API

| API | Description |
| --- | --- |
| MemoryAuditLog() | in-memory list of AuditEvent |
| SQLiteAuditLog(path: str) | SQLite-backed persisted log |
| .events() -> list[AuditEvent] | return recorded events |
| .export(format: "json" \| "csv") -> str | export audit records |
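
As a rough picture of what a JSON or CSV export involves, the sketch below serializes a list of event records with the standard `json` and `csv` modules. The `ToyEvent` fields and the `export` function are hypothetical. The real `AuditEvent` shape is not documented here and may differ.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass

@dataclass
class ToyEvent:
    """Hypothetical record shape; the real AuditEvent fields may differ."""
    session_id: str
    entity_type: str
    replacement: str

FIELDS = ["session_id", "entity_type", "replacement"]

def export(events: list[ToyEvent], format: str = "json") -> str:
    """Serialize events as a JSON array or as CSV with a header row."""
    rows = [asdict(event) for event in events]
    if format == "json":
        return json.dumps(rows, indent=2)
    if format == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {format!r}")

events = [ToyEvent("request-17", "PHONE", "[PHONE_1]")]
print(export(events, "csv"))
print(export(events, "json"))
```

This toy record stores only the replacement token, not the original value, so the audit trail does not itself become a copy of the sensitive data. Whether prompt-sanitizer makes the same choice is not stated in this document.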

Integrations

Install integration dependencies first:

pip install "prompt-sanitizer[integrations]"

LangChain

from langchain_core.output_parsers import StrOutputParser
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.langchain import PromptSanitizerRunnable, SanitizedLLM

s = Sanitizer()
# `llm` is assumed to be a LangChain model you have already constructed.

# As a runnable step in a chain
chain = PromptSanitizerRunnable(sanitizer=s) | llm | StrOutputParser()
result = chain.invoke("My email is dev@example.com")

# Or wrap the LLM directly
safe_llm = SanitizedLLM(llm, s)
reply = safe_llm.invoke("Contact alice@example.com with the summary.")

LlamaIndex

from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.llamaindex import PromptSanitizerPostprocessor

s = Sanitizer()
postprocessor = PromptSanitizerPostprocessor(sanitizer=s)
# `index` is assumed to be a LlamaIndex index you have already built.
query_engine = index.as_query_engine(node_postprocessors=[postprocessor])
response = query_engine.query("Summarize the contract for jane@example.com")

OpenAI SDK wrapper

import openai
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.openai import wrap

s = Sanitizer()
client = wrap(openai.OpenAI(), sanitizer=s)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "My card is 4111 1111 1111 1111"}],
)

FastAPI middleware

from fastapi import FastAPI
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.fastapi import SanitizerMiddleware

s = Sanitizer()
app = FastAPI()
app.add_middleware(SanitizerMiddleware, sanitizer=s, fields=["prompt", "message"])

Django middleware

# settings.py
from prompt_sanitizer import Sanitizer

MIDDLEWARE = [
    # ... your existing middleware entries ...
    "prompt_sanitizer.integrations.django.SanitizerMiddleware",
]

PROMPT_SANITIZER = {
    "sanitizer": Sanitizer(),
    "fields": ["prompt", "message"],
}

Entity types

| Group | Values |
| --- | --- |
| core PII | EMAIL, PHONE, SSN, CREDIT_CARD, IBAN, IP_ADDRESS, URL, DATE |
| identity / org | PERSON_NAME, ORGANIZATION, LOCATION |
| secrets | API_KEY, JWT_TOKEN, SECRET_KEY, AWS_KEY, GITHUB_TOKEN, OPENAI_KEY, ANTHROPIC_KEY |
| extension | CUSTOM |

Operational notes

  • FAST mode is stdlib-only.
  • SMART lazy-loads NER on first use.
  • FULL is the best fit for synthetic replacement plus audit.
  • sanitize() is for one-shot calls.
  • session() is for reversible multi-turn workflows.
  • sanitize_batch() treats each input independently.
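
The lazy-loading note can be illustrated with `functools.cached_property`: construction stays cheap, and the expensive load runs once on first use. This is a generic sketch of the pattern, not the library's loader. `LazyDetector` and its title-case "model" are placeholders invented for the example.

```python
from functools import cached_property

class LazyDetector:
    """Toy detector that defers an expensive load until first use."""

    load_count = 0  # class-level counter, present only to make the behaviour observable

    @cached_property
    def model(self):
        # Stand-in for loading an NER model; a real loader would import heavy libraries here.
        LazyDetector.load_count += 1
        return lambda text: [word for word in text.split() if word.istitle()]

    def detect(self, text: str) -> list[str]:
        return self.model(text)

d = LazyDetector()
print(LazyDetector.load_count)              # 0: constructing the object loads nothing
print(d.detect("Alice met Bob in berlin"))  # ['Alice', 'Bob']
d.detect("Carol")
print(LazyDetector.load_count)              # 1: loaded once, reused afterwards
```

The trade-off is that the first call pays the full load cost, so latency-sensitive services may prefer to trigger one warm-up call at startup.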

License

MIT.

