Lightweight, tiered, bidirectional PII sanitizer for LLM pipelines

prompt-sanitizer

PII and secret sanitization for Python LLM pipelines.

prompt-sanitizer provides a typed API for detecting, redacting, anonymizing, and restoring sensitive values before they reach a model, tool, middleware layer, log sink, or SDK wrapper. FAST mode has zero required dependencies. SMART and FULL add optional NLP, synthetic replacement, and audit logging.

Install

Requires Python 3.10 or later.

pip install ai-prompt-sanitizer
pip install "ai-prompt-sanitizer[nlp]"
pip install "ai-prompt-sanitizer[synthetic]"
pip install "ai-prompt-sanitizer[integrations]"
pip install "ai-prompt-sanitizer[all]"

Optional extras

| Extra | Adds | Typical use |
| --- | --- | --- |
| nlp | transformers + torch | NER in SMART/FULL mode |
| synthetic | faker | realistic fake replacements |
| integrations | framework / SDK adapters | LangChain, LlamaIndex, OpenAI, FastAPI, Django |
| all | all extras | full feature set |

Quick start

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.FAST)
result = s.sanitize("Contact Jane Doe at jane@example.com or 415-555-0112.")

print(result.text)
print(result.has_pii)
print(result.risk_score)
print(result.tokens)
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.replacement)

Modes

| Mode | Pipeline | Dependencies | Notes |
| --- | --- | --- | --- |
| Mode.FAST | regex + secret detectors | none | sub-millisecond, stdlib only |
| Mode.SMART | FAST + Piiranha NER | ai-prompt-sanitizer[nlp] | lazy-loads on first call |
| Mode.FULL | SMART + synthetic replacement + audit log | usually nlp + synthetic | best for compliance-oriented flows |

FAST mode

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.FAST)
text = "SSN 078-05-1120, card 4111 1111 1111 1111, token sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
result = s.sanitize(text)

print(result.text)
print(result.entities)
print(result.tokens)

Use FAST for prompt pre-processing, log scrubbing, middleware guards, CI checks, and zero-dependency CLI tooling.
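
To make a regex-only pass concrete, here is a minimal stdlib sketch of pattern-based redaction. It is not prompt-sanitizer's implementation: the `PATTERNS` table, the `redact` helper, and the placeholder format are illustrative assumptions, and real detectors need stricter validation (for example a Luhn check for card numbers).

```python
import re

# Hypothetical patterns for illustration only; not the library's detectors.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with a numbered placeholder and return the mapping."""
    tokens: dict[str, str] = {}    # {original_value: replacement}
    counters: dict[str, int] = {}  # per-label running index

    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            if value not in tokens:
                counters[label] = counters.get(label, 0) + 1
                tokens[value] = f"[{label}_{counters[label]}]"
            return tokens[value]

        text = pattern.sub(substitute, text)
    return text, tokens


clean, mapping = redact(
    "SSN 078-05-1120, card 4111 1111 1111 1111, mail jane@example.com"
)
print(clean)
print(mapping)
```

The mapping has the same `{original_value: replacement}` shape that `result.tokens` is documented to have, which is what makes a later restore step possible.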

SMART mode

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.SMART)
result = s.sanitize(
    "Alice from Acme Corp met us in Berlin on 2025-02-14. Email alice@acme.example."
)

print(result.text)
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.confidence)

Use SMART when prompts contain free-form prose with names, organizations, dates, or locations that regexes alone may miss.

FULL mode

from prompt_sanitizer import Sanitizer, Mode, SQLiteAuditLog

audit = SQLiteAuditLog("prompt_sanitizer_audit.db")
s = Sanitizer(mode=Mode.FULL, locale="en_US", on_detect="redact", audit_log=audit)

result = s.sanitize("Customer Jane Doe uses jane@example.com and 415-555-0112.")
print(result.text)
print(result.tokens)
print(s.audit.export(format="json"))

Use FULL when you want synthetic replacement plus an audit trail.

Public API

Sanitizer

Sanitizer(
    mode: Mode = Mode.FAST,
    locale: str = "en_US",
    entities: list[EntityType] | None = None,
    on_detect: str = "redact",
    audit_log: BaseAuditLog | None = None,
)

| Parameter | Type | Description |
| --- | --- | --- |
| mode | `Mode` | detection pipeline |
| locale | `str` | locale for synthetic replacement generation |
| entities | `list[EntityType] \| None` | optional allowlist of entity types |
| on_detect | `str` | "redact", "warn", or "block" |
| audit_log | `BaseAuditLog \| None` | optional audit backend |

| Method | Signature | Description |
| --- | --- | --- |
| sanitize | `sanitize(text: str, session_id: str \| None = None) -> SanitizeResult` | one-shot sanitization of a single input |
| sanitize_batch | `sanitize_batch(texts: list[str]) -> list[SanitizeResult]` | sanitize multiple inputs |
| session | `session(session_id: str \| None = None) -> Session` | open a session for reversible multi-turn workflows |
| add_entity | `add_entity(name: str, pattern: str, confidence: float = 0.85) -> None` | register a custom entity |
| stream | `stream(source: AsyncIterable, session: Session \| None) -> AsyncGenerator[str, None]` | async streaming variant; yields text chunks |
| guard | `guard(on_detect: str) -> decorator` | decorate a function with sanitization logic |
| audit | `.audit -> BaseAuditLog \| None` | the configured audit backend, if any |

Detection policy

| on_detect value | Behavior |
| --- | --- |
| "redact" | rewrite the returned text |
| "warn" | return original text, but populate entities and scores |
| "block" | raise instead of returning sanitized text |

results = s.sanitize_batch(["Email a@example.com", "No sensitive data here"])

@s.guard(on_detect="redact")
def call_model(prompt: str) -> str:
    return prompt
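
The three policies can be pictured with a small stand-in. This is a sketch, not the library's code: `apply_policy`, `PolicyResult`, the email regex, and `PIIBlockedError` are hypothetical names, and it assumes "block" raises only when something is detected (the exception type the library raises is not stated here).

```python
import re
from dataclasses import dataclass, field

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # stand-in detector


class PIIBlockedError(Exception):
    """Hypothetical exception used only in this sketch."""


@dataclass
class PolicyResult:
    text: str
    found: list[str] = field(default_factory=list)


def apply_policy(text: str, on_detect: str = "redact") -> PolicyResult:
    found = EMAIL.findall(text)
    if on_detect == "redact":
        # Rewrite the returned text.
        return PolicyResult(EMAIL.sub("[EMAIL]", text), found)
    if on_detect == "warn":
        # Return the original text, but still report detections.
        return PolicyResult(text, found)
    if on_detect == "block":
        # Refuse to return anything when sensitive data is present.
        if found:
            raise PIIBlockedError(f"{len(found)} sensitive value(s) detected")
        return PolicyResult(text, found)
    raise ValueError(f"unknown on_detect value: {on_detect!r}")


redacted = apply_policy("Email a@example.com", "redact")
warned = apply_policy("Email a@example.com", "warn")
print(redacted.text)
print(warned.text, warned.found)
```

With "block", callers would wrap the call in `try`/`except` and decide whether to reject the request or fall back to a redacted prompt.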

Mode, SanitizeResult, and DetectedEntity

| Mode value | Meaning |
| --- | --- |
| Mode.FAST | regex + secrets, zero deps, sub-ms |
| Mode.SMART | FAST + Piiranha NER, lazy loads on first call |
| Mode.FULL | SMART + synthetic replacement + audit log |

| SanitizeResult attribute | Type | Description |
| --- | --- | --- |
| text | `str` | sanitized text |
| entities | `list[DetectedEntity]` | detected spans |
| tokens | `dict[str, str]` | {original_value: replacement} map |
| risk_score | `float` | composite score from 0.0 to 1.0 |
| has_pii | `bool` | whether sensitive data was found |

| DetectedEntity attribute | Type | Description |
| --- | --- | --- |
| entity_type | `EntityType` | entity classification |
| value | `str` | original matched value |
| start | `int` | inclusive start offset |
| end | `int` | exclusive end offset |
| confidence | `float` | detection confidence |
| replacement | `str \| None` | replacement value, if generated |

result = s.sanitize("Contact me at sam@example.com")
assert result.has_pii is True
assert 0.0 <= result.risk_score <= 1.0
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.replacement)
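
Because `start` is inclusive and `end` is exclusive, the offsets line up with Python slicing: `text[start:end]` should equal `value`. The sketch below uses a stand-in `Span` dataclass (hypothetical, mirroring the documented offset fields) to show why replacements are easiest to apply from right to left when you rebuild text yourself.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Stand-in with the same offset fields as DetectedEntity (assumption)."""
    value: str
    start: int  # inclusive
    end: int    # exclusive
    replacement: str


def apply_spans(text: str, spans: list[Span]) -> str:
    # Work from the last span to the first so earlier offsets are not shifted.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if text[span.start:span.end] != span.value:
            raise ValueError(f"offsets do not match value {span.value!r}")
        text = text[:span.start] + span.replacement + text[span.end:]
    return text


source = "Contact me at sam@example.com or 415-555-0112"
spans = [
    Span("sam@example.com", 14, 29, "[EMAIL_1]"),
    Span("415-555-0112", 33, 45, "[PHONE_1]"),
]
rewritten = apply_spans(source, spans)
print(rewritten)
```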

Sessions and vaults

Use sessions when the model should never see raw values, but the final response should restore them.

from prompt_sanitizer import Sanitizer

s = Sanitizer()
session = s.session(session_id="support-chat-001")
clean_prompt = session.anonymize("My name is Elena Ruiz and my email is elena@company.com")
llm_reply = "Confirmed. I will email [EMAIL_1] shortly."
final_reply = session.deanonymize(llm_reply)

print(clean_prompt)
print(final_reply)

| Session API | Description |
| --- | --- |
| `session.anonymize(text: str) -> str` | replace PII with vault tokens |
| `session.deanonymize(text: str) -> str` | restore originals from the vault |
| `session.vault: Vault` | access the underlying vault |

| Vault API | Description |
| --- | --- |
| `vault.store(value: str, replacement: str) -> None` | store a mapping |
| `vault.lookup(replacement: str) -> str \| None` | resolve token to original |
| `vault.reverse(value: str) -> str \| None` | resolve original to replacement |
| `vault.clear() -> None` | clear all mappings |

vault = session.vault
vault.store("alice@example.com", "[EMAIL_1]")
print(vault.lookup("[EMAIL_1]"))
print(vault.reverse("alice@example.com"))
vault.clear()

Custom entities

Use add_entity() for internal identifiers, tenant-specific secrets, or domain-specific formats.

from prompt_sanitizer import Sanitizer

s = Sanitizer()
s.add_entity(name="customer_id", pattern=r"\bCUS-\d{8}\b", confidence=0.90)
s.add_entity(name="invoice_no", pattern=r"\bINV-\d{6}-[A-Z]{2}\b", confidence=0.88)

result = s.sanitize("Customer CUS-12345678 opened invoice INV-882211-US")
print(result.text)
print(result.entities)
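
Before registering a pattern, it can help to check it against known-good and near-miss samples. This sketch uses only the standard `re` module and assumes `add_entity()` accepts Python regular-expression syntax, which the raw-string patterns above suggest but the text does not state.

```python
import re

# The two patterns from the example above, checked with the standard library.
CUSTOMER_ID = re.compile(r"\bCUS-\d{8}\b")
INVOICE_NO = re.compile(r"\bINV-\d{6}-[A-Z]{2}\b")

sample = "Customer CUS-12345678 opened invoice INV-882211-US"

customer_hits = CUSTOMER_ID.findall(sample)
invoice_hits = INVOICE_NO.findall(sample)

# Near misses that should not match: too few digits, lowercase suffix.
near_misses = [
    CUSTOMER_ID.search("CUS-1234567"),
    INVOICE_NO.search("INV-882211-us"),
]

print(customer_hits, invoice_hits, near_misses)
```

The `\b` anchors matter here: without them, a longer identifier such as `CUS-123456789` would yield a partial match on its first eight digits.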

Filtering by entity type

from prompt_sanitizer import Sanitizer, EntityType

s = Sanitizer(entities=[EntityType.EMAIL, EntityType.API_KEY])
result = s.sanitize("Email a@b.com and SSN 123-45-6789")
print(result.text)

Audit logging

Audit backends are optional. Use them when you want structured records of detections.

MemoryAuditLog

from prompt_sanitizer import MemoryAuditLog, Mode, Sanitizer

audit = MemoryAuditLog()
s = Sanitizer(mode=Mode.FULL, audit_log=audit)
s.sanitize("Email finance@example.com")

print(audit.events())
print(audit.export(format="json"))

SQLiteAuditLog

from prompt_sanitizer import SQLiteAuditLog, Mode, Sanitizer

audit = SQLiteAuditLog("audit.db")
s = Sanitizer(mode=Mode.FULL, audit_log=audit)
s.sanitize("Call +1 415 555 0112", session_id="request-17")

print(audit.events())
print(audit.export(format="csv"))

Audit API

| API | Description |
| --- | --- |
| `MemoryAuditLog()` | in-memory list of AuditEvent |
| `SQLiteAuditLog(path: str)` | SQLite-backed persisted log |
| `.events() -> list[AuditEvent]` | return recorded events |
| `.export(format: "json" \| "csv") -> str` | export audit records |
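
To show what a JSON or CSV export of in-memory events might involve, here is a small stdlib sketch. The record shape is an assumption: `ToyAuditEvent` and its three fields are hypothetical, since the fields of the real `AuditEvent` are not listed here, and `export` is a free function standing in for the `.export()` method.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class ToyAuditEvent:
    """Hypothetical record; the real AuditEvent fields are not listed here."""
    session_id: str
    entity_type: str
    action: str


FIELDS = ["session_id", "entity_type", "action"]


def export(events: list[ToyAuditEvent], format: str = "json") -> str:
    rows = [asdict(event) for event in events]
    if format == "json":
        return json.dumps(rows)
    if format == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {format!r}")


events = [ToyAuditEvent("request-17", "PHONE", "redact")]
as_json = export(events, "json")
as_csv = export(events, "csv")
print(as_json)
print(as_csv)
```

Note that this toy record stores the entity type and action but not the raw value; whether the library's audit records include original values is not specified in this description.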

Integrations

Install integration dependencies first:

pip install "ai-prompt-sanitizer[integrations]"

LangChain

from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.langchain import PromptSanitizerRunnable, SanitizedLLM

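s = Sanitizer()
# `llm` and `OutputParser` are placeholders for a LangChain model and
# output parser that you have already constructed or imported.
# As a runnable step in a chain
chain = PromptSanitizerRunnable(sanitizer=s) | llm | OutputParser()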
result = chain.invoke("My email is dev@example.com")

# Or wrap the LLM directly
safe_llm = SanitizedLLM(llm, s)
reply = safe_llm.invoke("Contact alice@example.com with the summary.")

LlamaIndex

from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.llamaindex import PromptSanitizerPostprocessor

s = Sanitizer()
postprocessor = PromptSanitizerPostprocessor(sanitizer=s)
# `index` is an existing LlamaIndex index built elsewhere.
query_engine = index.as_query_engine(node_postprocessors=[postprocessor])
response = query_engine.query("Summarize the contract for jane@example.com")

OpenAI SDK wrapper

import openai
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.openai import wrap

s = Sanitizer()
client = wrap(openai.OpenAI(), sanitizer=s)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "My card is 4111 1111 1111 1111"}],
)

FastAPI middleware

from fastapi import FastAPI
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.fastapi import SanitizerMiddleware

s = Sanitizer()
app = FastAPI()
app.add_middleware(SanitizerMiddleware, sanitizer=s, fields=["prompt", "message"])

Django middleware

# settings.py
from prompt_sanitizer import Sanitizer

# Add this entry to your project's existing MIDDLEWARE list.
MIDDLEWARE = ["prompt_sanitizer.integrations.django.SanitizerMiddleware"]

PROMPT_SANITIZER = {
    "sanitizer": Sanitizer(),
    "fields": ["prompt", "message"],
}

Entity types

| Group | Values |
| --- | --- |
| core PII | EMAIL, PHONE, SSN, CREDIT_CARD, IBAN, IP_ADDRESS, URL, DATE |
| identity / org | PERSON_NAME, ORGANIZATION, LOCATION |
| secrets | API_KEY, JWT_TOKEN, SECRET_KEY, AWS_KEY, GITHUB_TOKEN, OPENAI_KEY, ANTHROPIC_KEY |
| extension | CUSTOM |

Operational notes

  • FAST mode is stdlib-only.
  • SMART lazy-loads NER on first use.
  • FULL is the best fit for synthetic replacement plus audit.
  • sanitize() is for one-shot calls.
  • session() is for reversible multi-turn workflows.
  • sanitize_batch() treats each input independently.

License

MIT.
