
Lightweight, tiered, bidirectional PII sanitizer for LLM pipelines


prompt-sanitizer

PII and secret sanitization for Python LLM pipelines.

prompt-sanitizer provides a typed API for detecting, redacting, anonymizing, and restoring sensitive values before they reach a model, tool, middleware layer, log sink, or SDK wrapper. FAST mode has zero required dependencies. SMART adds optional NER-based detection, and FULL adds synthetic replacement and audit logging on top of that.

Install

Requires Python 3.10+.

pip install ai-prompt-sanitizer
pip install "ai-prompt-sanitizer[nlp]"
pip install "ai-prompt-sanitizer[synthetic]"
pip install "ai-prompt-sanitizer[integrations]"
pip install "ai-prompt-sanitizer[all]"

Optional extras

| Extra | Adds | Typical use |
|---|---|---|
| nlp | transformers + torch | NER in SMART/FULL mode |
| synthetic | faker | realistic fake replacements |
| integrations | framework / SDK adapters | LangChain, LlamaIndex, OpenAI, FastAPI, Django |
| all | all extras | full feature set |

Quick start

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.FAST)
result = s.sanitize("Contact Jane Doe at jane@example.com or 415-555-0112.")

print(result.text)
print(result.has_pii)
print(result.risk_score)
print(result.tokens)
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.replacement)

Modes

| Mode | Pipeline | Dependencies | Notes |
|---|---|---|---|
| Mode.FAST | regex + secret detectors | none | sub-millisecond, stdlib only |
| Mode.SMART | FAST + Piiranha NER | ai-prompt-sanitizer[nlp] | lazy-loads the model on first call |
| Mode.FULL | SMART + synthetic replacement + audit log | usually the nlp and synthetic extras | best fit for compliance-oriented flows |

FAST mode

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.FAST)
text = "SSN 078-05-1120, card 4111 1111 1111 1111, token sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
result = s.sanitize(text)

print(result.text)
print(result.entities)
print(result.tokens)

Use FAST for prompt pre-processing, log scrubbing, middleware guards, CI checks, and zero-dependency CLI tooling.
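
To make "regex + secret detectors" concrete, here is a minimal, stdlib-only sketch of that style of detection. It is not the library's rule set: the patterns, the Span class, and the detect() helper are hypothetical. It shows one common refinement, a Luhn checksum that discards card-shaped digit runs that cannot be real card numbers.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration; not prompt-sanitizer's actual rules.
PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
    "OPENAI_KEY": re.compile(r"\bsk-(?:proj-)?[A-Za-z0-9_-]{20,}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 when the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class Span:
    entity_type: str
    value: str
    start: int  # inclusive
    end: int  # exclusive


def detect(text: str) -> list[Span]:
    spans = []
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Card-shaped digit runs that fail Luhn are treated as false positives.
            if entity_type == "CREDIT_CARD" and not luhn_valid(m.group()):
                continue
            spans.append(Span(entity_type, m.group(), m.start(), m.end()))
    return sorted(spans, key=lambda s: s.start)
```

Regexes like these are cheap to run, which is why this kind of pipeline can stay in the sub-millisecond range; the trade-off is that they only catch values with a recognizable shape.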

SMART mode

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.SMART)
result = s.sanitize(
    "Alice from Acme Corp met us in Berlin on 2025-02-14. Email alice@acme.example."
)

print(result.text)
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.confidence)

Use SMART when prompts contain free-form prose with names, organizations, dates, or locations that regexes alone may miss.
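
SMART is described as lazy-loading its NER model on first call. The general pattern looks roughly like this sketch; LazyModel is a hypothetical name, and in practice the loader would construct the NER pipeline, while here any zero-argument callable stands in for it.

```python
import threading
from collections.abc import Callable
from typing import Generic, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Defer an expensive load until the first get(), and run the loader at most once."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: T | None = None
        self._loaded = False
        self._lock = threading.Lock()
        self.loads = 0  # counts loader calls, for illustration

    def get(self) -> T:
        if not self._loaded:  # fast path after the first load
            with self._lock:
                if not self._loaded:  # re-check: another thread may have loaded it
                    self._value = self._loader()
                    self._loaded = True
                    self.loads += 1
        return self._value  # type: ignore[return-value]
```

The double-checked lock keeps later calls lock-free while ensuring concurrent first calls trigger only one load, so the cost of importing and initializing a large model is paid once, on the first request that needs it.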

FULL mode

from prompt_sanitizer import Sanitizer, Mode, SQLiteAuditLog

audit = SQLiteAuditLog("prompt_sanitizer_audit.db")
s = Sanitizer(mode=Mode.FULL, locale="en_US", on_detect="redact", audit_log=audit)

result = s.sanitize("Customer Jane Doe uses jane@example.com and 415-555-0112.")
print(result.text)
print(result.tokens)
print(s.audit.export(format="json"))

Use FULL when you want synthetic replacement plus an audit trail.
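
Synthetic replacement is most useful when it is consistent: the same original value should map to the same fake value, so the prompt stays coherent. The sketch below shows one way to get that with the standard library by seeding a private RNG from a keyed hash. It is an assumption for illustration, not how the library (which lists faker as the synthetic extra) generates values; the word lists and function names are hypothetical.

```python
import hashlib
import hmac
import random

# Hypothetical word lists; a real implementation might use Faker instead.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Taylor"]
LAST_NAMES = ["Smith", "Garcia", "Chen", "Novak", "Okafor"]


def _rng_for(value: str, key: bytes) -> random.Random:
    # Keyed hash -> seed: the same (key, value) pair always yields the same fake,
    # but the mapping cannot be recomputed without the key.
    digest = hmac.new(key, value.encode(), hashlib.sha256).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def fake_name(value: str, key: bytes) -> str:
    rng = _rng_for(value, key)
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def fake_email(value: str, key: bytes) -> str:
    rng = _rng_for(value, key)
    return f"user{rng.randrange(10_000):04d}@example.com"


def fake_phone(value: str, key: bytes) -> str:
    rng = _rng_for(value, key)
    # 555-0100 through 555-0199 are reserved for fictional use in North America.
    return f"415-555-01{rng.randrange(100):02d}"
```

With lists this small, two different originals can map to the same fake value; a production version would need a collision check, or a record of the mapping like the {original_value: replacement} map in result.tokens.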

Public API

Sanitizer

Sanitizer(
    mode: Mode = Mode.FAST,
    locale: str = "en_US",
    entities: list[EntityType] | None = None,
    on_detect: str = "redact",
    audit_log: BaseAuditLog | None = None,
)
| Parameter | Type | Description |
|---|---|---|
| mode | Mode | detection pipeline |
| locale | str | locale for synthetic replacement generation |
| entities | list[EntityType] \| None | optional allowlist of entity types |
| on_detect | str | detection policy: "redact", "warn", or "block" |
| audit_log | BaseAuditLog \| None | optional audit backend |

| Method | Signature | Description |
|---|---|---|
| sanitize | sanitize(text: str, session_id: str \| None = None) -> SanitizeResult | sanitize one string |
| sanitize_batch | sanitize_batch(texts: list[str]) -> list[SanitizeResult] | sanitize multiple inputs independently |
| session | session(session_id: str \| None = None) -> Session | create a reusable anonymization session |
| add_entity | add_entity(name: str, pattern: str, confidence: float = 0.85) -> None | register a custom regex entity |
| stream | stream(source: AsyncIterable, session: Session \| None) -> AsyncGenerator[str, None] | restore original values in streamed chunks |
| guard | guard(on_detect: str) -> decorator | decorate a function with sanitization logic |
| audit | .audit -> BaseAuditLog \| None (property) | access the configured audit log |

Detection policy

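| on_detect value | Behavior |
|---|---|
| "redact" | rewrite the returned text |
| "warn" | return the original text, but populate entities and scores |
| "block" | raise an exception instead of returning sanitized text |

Batch calls and the guard decorator:

results = s.sanitize_batch(["Email a@example.com", "No sensitive data here"])  # one SanitizeResult per input

@s.guard(on_detect="redact")
def call_model(prompt: str) -> str:
    # Placeholder: forward the prompt to your model client here.
    return prompt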
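
The three policies and a guard-style decorator can be sketched with the standard library as follows. Everything here is hypothetical: a single email regex stands in for the real detectors, SensitiveDataError is an invented exception name (the library's actual exception for "block" is not documented above), and "warn" is modelled with the warnings module.

```python
import functools
import re
import warnings
from collections.abc import Callable

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")  # stand-in detector


class SensitiveDataError(ValueError):
    """Hypothetical exception raised under the "block" policy."""


def apply_policy(text: str, on_detect: str) -> str:
    matches = list(EMAIL.finditer(text))
    if not matches:
        return text
    if on_detect == "redact":
        return EMAIL.sub("[EMAIL]", text)
    if on_detect == "warn":
        # Leave the text unchanged but surface the detection.
        warnings.warn(f"{len(matches)} sensitive value(s) detected", stacklevel=2)
        return text
    if on_detect == "block":
        raise SensitiveDataError(f"{len(matches)} sensitive value(s) detected")
    raise ValueError(f"unknown on_detect policy: {on_detect!r}")


def guard(on_detect: str = "redact") -> Callable:
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(prompt: str, *args, **kwargs):
            # Apply the policy to the first argument before the wrapped call runs.
            return fn(apply_policy(prompt, on_detect), *args, **kwargs)

        return wrapper

    return decorator
```

Raising under "block" means the wrapped function never runs, so a model call placed inside it cannot receive the raw value.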

Mode, SanitizeResult, and DetectedEntity

| Mode value | Meaning |
|---|---|
| Mode.FAST | regex + secrets, zero dependencies, sub-millisecond |
| Mode.SMART | FAST + Piiranha NER, lazy-loads on first call |
| Mode.FULL | SMART + synthetic replacement + audit log |

| SanitizeResult attribute | Type | Description |
|---|---|---|
| text | str | sanitized text |
| entities | list[DetectedEntity] | detected spans |
| tokens | dict[str, str] | {original_value: replacement} map |
| risk_score | float | composite score from 0.0 to 1.0 |
| has_pii | bool | whether sensitive data was found |

| DetectedEntity attribute | Type | Description |
|---|---|---|
| entity_type | EntityType | entity classification |
| value | str | original matched value |
| start | int | inclusive start offset |
| end | int | exclusive end offset |
| confidence | float | detection confidence |
| replacement | str \| None | replacement value, if generated |

result = s.sanitize("Contact me at sam@example.com")
assert result.has_pii is True
assert 0.0 <= result.risk_score <= 1.0
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.replacement)

Sessions and vaults

Use sessions when the model should never see raw values, but the final response should restore them.

from prompt_sanitizer import Sanitizer

s = Sanitizer()
session = s.session(session_id="support-chat-001")
clean_prompt = session.anonymize("My name is Elena Ruiz and my email is elena@company.com")
llm_reply = "Confirmed. I will email [EMAIL_1] shortly."  # stand-in for the model's reply
final_reply = session.deanonymize(llm_reply)

print(clean_prompt)
print(final_reply)

| Session API | Description |
|---|---|
| session.anonymize(text: str) -> str | replace PII with vault tokens |
| session.deanonymize(text: str) -> str | restore originals from the vault |
| session.vault: Vault | access the underlying vault |

| Vault API | Description |
|---|---|
| vault.store(value: str, replacement: str) -> None | store a mapping |
| vault.lookup(replacement: str) -> str \| None | resolve a token to its original value |
| vault.reverse(value: str) -> str \| None | resolve an original value to its replacement |
| vault.clear() -> None | clear all mappings |

vault = session.vault
vault.store("alice@example.com", "[EMAIL_1]")
print(vault.lookup("[EMAIL_1]"))
print(vault.reverse("alice@example.com"))
vault.clear()

Custom entities

Use add_entity() for internal identifiers, tenant-specific secrets, or domain-specific formats.

from prompt_sanitizer import Sanitizer

s = Sanitizer()
s.add_entity(name="customer_id", pattern=r"\bCUS-\d{8}\b", confidence=0.90)
s.add_entity(name="invoice_no", pattern=r"\bINV-\d{6}-[A-Z]{2}\b", confidence=0.88)

result = s.sanitize("Customer CUS-12345678 opened invoice INV-882211-US")
print(result.text)
print(result.entities)
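
A custom pattern is only as good as its boundaries, so it can help to compile and spot-check it with Python's re module before passing it to add_entity(). The check_pattern() helper below is hypothetical; the two patterns are the ones from the example above.

```python
import re

# Patterns from the example above; compile and spot-check them before registering.
CUSTOM_PATTERNS = {
    "customer_id": r"\bCUS-\d{8}\b",
    "invoice_no": r"\bINV-\d{6}-[A-Z]{2}\b",
}


def check_pattern(pattern: str, should_match: list[str], should_not_match: list[str]) -> None:
    compiled = re.compile(pattern)  # raises re.error on invalid syntax
    for sample in should_match:
        if not compiled.search(sample):
            raise AssertionError(f"{pattern!r} missed {sample!r}")
    for sample in should_not_match:
        if compiled.search(sample):
            raise AssertionError(f"{pattern!r} unexpectedly matched {sample!r}")


# The \b anchors reject IDs with too many digits or a glued-on prefix.
check_pattern(CUSTOM_PATTERNS["customer_id"], ["CUS-12345678"], ["CUS-1234567", "CUS-123456789", "XCUS-12345678"])
check_pattern(CUSTOM_PATTERNS["invoice_no"], ["INV-882211-US"], ["INV-882211-us", "INV-88221-US", "INV-882211-USA"])
```

Once a pattern passes, it can be registered unchanged with s.add_entity(name=..., pattern=..., confidence=...).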

Filtering by entity type

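The entities parameter is an allowlist: only the listed types are detected, so the SSN in this example is left in place.

from prompt_sanitizer import Sanitizer, EntityType

s = Sanitizer(entities=[EntityType.EMAIL, EntityType.API_KEY])
result = s.sanitize("Email a@b.com and SSN 123-45-6789")
print(result.text)  # the email is replaced; the SSN is outside the allowlist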

Audit logging

Audit backends are optional. Use them when you want structured records of detections.

MemoryAuditLog

from prompt_sanitizer import MemoryAuditLog, Mode, Sanitizer

audit = MemoryAuditLog()
s = Sanitizer(mode=Mode.FULL, audit_log=audit)
s.sanitize("Email finance@example.com")

print(audit.events())
print(audit.export(format="json"))

SQLiteAuditLog

from prompt_sanitizer import SQLiteAuditLog, Mode, Sanitizer

audit = SQLiteAuditLog("audit.db")
s = Sanitizer(mode=Mode.FULL, audit_log=audit)
s.sanitize("Call +1 415 555 0112", session_id="request-17")

print(audit.events())
print(audit.export(format="csv"))

Audit API

| API | Description |
|---|---|
| MemoryAuditLog() | in-memory list of AuditEvent records |
| SQLiteAuditLog(path: str) | SQLite-backed persistent log |
| .events() -> list[AuditEvent] | return recorded events |
| .export(format: "json" \| "csv") -> str | export audit records |

Integrations

Install integration dependencies first:

pip install "ai-prompt-sanitizer[integrations]"

LangChain

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.langchain import PromptSanitizerRunnable, SanitizedLLM

s = Sanitizer()
llm = ChatOpenAI(model="gpt-4o-mini")  # example model; substitute your own

# As a runnable step in a chain
chain = PromptSanitizerRunnable(sanitizer=s) | llm | StrOutputParser()
result = chain.invoke("My email is dev@example.com")

# Or wrap the LLM directly
safe_llm = SanitizedLLM(llm, s)
reply = safe_llm.invoke("Contact alice@example.com with the summary.")

LlamaIndex

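from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.llamaindex import PromptSanitizerPostprocessor

# Build an index over local documents ("./docs" is an example path)
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

s = Sanitizer()
postprocessor = PromptSanitizerPostprocessor(sanitizer=s)
# Retrieved nodes pass through the postprocessor before response synthesis
query_engine = index.as_query_engine(node_postprocessors=[postprocessor])
response = query_engine.query("Summarize the contract for jane@example.com")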

OpenAI SDK wrapper

import openai
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.openai import wrap

s = Sanitizer()
client = wrap(openai.OpenAI(), sanitizer=s)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "My card is 4111 1111 1111 1111"}],
)

FastAPI middleware

from fastapi import FastAPI
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.fastapi import SanitizerMiddleware

s = Sanitizer()
app = FastAPI()
app.add_middleware(SanitizerMiddleware, sanitizer=s, fields=["prompt", "message"])
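
The fields argument suggests that only named JSON fields are rewritten. The core of such a middleware might look like this framework-free sketch; sanitize_fields() and sanitize_body() are hypothetical helpers, and the clean callable stands in for a call like s.sanitize(value).text.

```python
import json
import re
from collections.abc import Callable
from typing import Any

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")  # stand-in detector


def sanitize_fields(payload: Any, fields: set[str], clean: Callable[[str], str]) -> Any:
    """Recursively rewrite string values whose key is in `fields`; leave everything else alone."""
    if isinstance(payload, dict):
        return {
            k: clean(v) if k in fields and isinstance(v, str) else sanitize_fields(v, fields, clean)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [sanitize_fields(item, fields, clean) for item in payload]
    return payload


def sanitize_body(raw: bytes, fields: set[str], clean: Callable[[str], str]) -> bytes:
    try:
        payload = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return raw  # not JSON: pass through untouched
    return json.dumps(sanitize_fields(payload, fields, clean)).encode()
```

Recursing into nested lists and objects means a "message" field inside a list of chat messages is still caught, while unrelated fields such as user IDs pass through unchanged.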

Django middleware

# settings.py
from prompt_sanitizer import Sanitizer

MIDDLEWARE = [
    # ... Django's default middleware ...
    "prompt_sanitizer.integrations.django.SanitizerMiddleware",
]

PROMPT_SANITIZER = {
    "sanitizer": Sanitizer(),
    "fields": ["prompt", "message"],
}

Entity types

| Group | Values |
|---|---|
| core PII | EMAIL, PHONE, SSN, CREDIT_CARD, IBAN, IP_ADDRESS, URL, DATE |
| identity / org | PERSON_NAME, ORGANIZATION, LOCATION |
| secrets | API_KEY, JWT_TOKEN, SECRET_KEY, AWS_KEY, GITHUB_TOKEN, OPENAI_KEY, ANTHROPIC_KEY |
| extension | CUSTOM |
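
Several of the secret types have publicly documented shapes that a regex can recognize. The sketch below covers three of them; the patterns are a minimal illustration, not the library's detectors. AWS access key IDs start with AKIA (long-term) or ASIA (temporary) followed by 16 uppercase alphanumerics, GitHub tokens carry a gh*_ prefix followed by 36 alphanumerics, and JWTs are three base64url segments whose first segment usually begins with "eyJ".

```python
import re

# Widely documented token shapes; real detectors usually add entropy or context checks.
SECRET_PATTERNS = {
    "AWS_KEY": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    "JWT_TOKEN": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def find_secrets(text: str) -> list[tuple[str, str]]:
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        hits.extend((name, m.group()) for m in pattern.finditer(text))
    return hits
```

Generic types such as API_KEY and SECRET_KEY lack a fixed prefix, which is where entropy or keyword-context heuristics typically come in.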

Operational notes

  • FAST mode is stdlib-only.
  • SMART lazy-loads NER on first use.
  • FULL is the best fit for synthetic replacement plus audit.
  • sanitize() is for one-shot calls.
  • session() is for reversible multi-turn workflows.
  • sanitize_batch() treats each input independently.

License

MIT.

Download files

Download the file for your platform.

Source Distribution

ai_prompt_sanitizer-1.0.1.tar.gz (38.2 kB)

Uploaded Source

Built Distribution


ai_prompt_sanitizer-1.0.1-py3-none-any.whl (39.0 kB)

Uploaded Python 3

File details

Details for the file ai_prompt_sanitizer-1.0.1.tar.gz.

File metadata

  • Download URL: ai_prompt_sanitizer-1.0.1.tar.gz
  • Size: 38.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_prompt_sanitizer-1.0.1.tar.gz
Algorithm Hash digest
SHA256 17659811c24b7c4ab59129f3689971e5f84f86a88ae639109f3545cdb93230d0
MD5 a9964f2aee4dbdb6dfe63624554725f2
BLAKE2b-256 5d04cd197b352de8ce83b9dd335ee0ba58f8dc24cdde6f8aad0854a139ea93f2


Provenance

The following attestation bundles were made for ai_prompt_sanitizer-1.0.1.tar.gz:

Publisher: python-publish.yml on jeslor/prompt-sanitizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ai_prompt_sanitizer-1.0.1-py3-none-any.whl.

File hashes

Hashes for ai_prompt_sanitizer-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fc8eaa975aaca747ade46b4088f37f788e676a1c941b92d6d85da559380ac836
MD5 aca7fd83cdd94de0199c9f2001bf1c66
BLAKE2b-256 235c764d11b7e6df1fa9413e4a8a21333795b9a5b11e891478b735299a0f86aa


Provenance

The following attestation bundles were made for ai_prompt_sanitizer-1.0.1-py3-none-any.whl:

Publisher: python-publish.yml on jeslor/prompt-sanitizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
