Lightweight, tiered, bidirectional PII sanitizer for LLM pipelines

Maintainers

jeslor-04_1

prompt-sanitizer

PII and secret sanitization for Python LLM pipelines.

prompt-sanitizer provides a typed API for detecting, redacting, anonymizing, and restoring sensitive values before they reach a model, tool, middleware layer, log sink, or SDK wrapper. FAST mode has zero required dependencies. SMART and FULL add optional NLP, synthetic replacement, and audit logging.

Install

Requires Python 3.10 or later.

pip install prompt-sanitizer
pip install "prompt-sanitizer[nlp]"
pip install "prompt-sanitizer[synthetic]"
pip install "prompt-sanitizer[integrations]"
pip install "prompt-sanitizer[all]"

Optional extras

Extra	Adds	Typical use
`nlp`	`transformers` + `torch`	NER in SMART/FULL mode
`synthetic`	`faker`	realistic fake replacements
`integrations`	framework / SDK adapters	LangChain, LlamaIndex, OpenAI, FastAPI, Django
`all`	all extras	full feature set

Quick start

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.FAST)
result = s.sanitize("Contact Jane Doe at jane@example.com or 415-555-0112.")

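print(result.text)        # sanitized text
print(result.has_pii)     # True if sensitive data was found
print(result.risk_score)  # composite score from 0.0 to 1.0
print(result.tokens)      # {original_value: replacement} map
for entity in result.entities:
    # one line per detected span: type, original value, replacement
    print(entity.entity_type, entity.value, entity.replacement)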

Modes

Mode	Pipeline	Dependencies	Notes
`Mode.FAST`	regex + secret detectors	none	sub-ms, stdlib only
`Mode.SMART`	FAST + Piiranha NER	`prompt-sanitizer[nlp]`	lazy-loads on first call
`Mode.FULL`	SMART + synthetic replacement + audit log	usually `nlp` + `synthetic`	best for compliance-oriented flows

FAST mode

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.FAST)
text = "SSN 078-05-1120, card 4111 1111 1111 1111, token sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
result = s.sanitize(text)

print(result.text)
print(result.entities)
print(result.tokens)

Use FAST for prompt pre-processing, log scrubbing, middleware guards, CI checks, and zero-dependency CLI tooling.
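
To make the "regex + secret detectors" idea concrete, here is a minimal stdlib-only sketch of regex-based redaction with a replacement map. It is not the library's implementation: the `PATTERNS` table, the `redact` function, and the placeholder format are assumptions chosen for illustration.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration only; the library's real detectors
# are not shown in this README.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with a numbered placeholder and return the mapping."""
    tokens: dict[str, str] = {}    # original value -> placeholder
    counters: dict[str, int] = {}  # per-label running counter

    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match[str], label: str = label) -> str:
            value = match.group(0)
            if value not in tokens:
                counters[label] = counters.get(label, 0) + 1
                tokens[value] = f"[{label}_{counters[label]}]"
            return tokens[value]

        text = pattern.sub(substitute, text)
    return text, tokens


clean, mapping = redact(
    "SSN 078-05-1120, card 4111 1111 1111 1111, mail jane@example.com"
)
print(clean)
print(mapping)
```

Reusing the same placeholder for a repeated value keeps the mapping reversible, which is the property the `tokens` attribute appears to rely on.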

SMART mode

from prompt_sanitizer import Sanitizer, Mode

s = Sanitizer(mode=Mode.SMART)
result = s.sanitize(
    "Alice from Acme Corp met us in Berlin on 2025-02-14. Email alice@acme.example."
)

print(result.text)
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.confidence)

Use SMART when prompts contain free-form prose with names, organizations, dates, or locations that regexes alone may miss.

FULL mode

from prompt_sanitizer import Sanitizer, Mode, SQLiteAuditLog

audit = SQLiteAuditLog("prompt_sanitizer_audit.db")
s = Sanitizer(mode=Mode.FULL, locale="en_US", on_detect="redact", audit_log=audit)

result = s.sanitize("Customer Jane Doe uses jane@example.com and 415-555-0112.")
print(result.text)
print(result.tokens)
print(s.audit.export(format="json"))

Use FULL when you want synthetic replacement plus an audit trail.

Public API

`Sanitizer`

Sanitizer(
    mode: Mode = Mode.FAST,
    locale: str = "en_US",
    entities: list[EntityType] | None = None,
    on_detect: str = "redact",
    audit_log: BaseAuditLog | None = None,
)

Parameter	Type	Description
`mode`	`Mode`	detection pipeline
`locale`	`str`	locale for synthetic replacement generation
`entities`	`list[EntityType] \| None`	optional allowlist of entity types
`on_detect`	`str`	`"redact"`, `"warn"`, or `"block"`
`audit_log`	`BaseAuditLog \| None`	optional audit backend

Method	Signature	Description
`sanitize`	`sanitize(text: str, session_id: str \| None = None) -> SanitizeResult`	sanitize a single input (one-shot)
`sanitize_batch`	`sanitize_batch(texts: list[str]) -> list[SanitizeResult]`	sanitize multiple inputs
`session`	`session(session_id: str \| None = None) -> Session`	open a session for reversible multi-turn workflows
`add_entity`	`add_entity(name: str, pattern: str, confidence: float = 0.85) -> None`	register a custom entity
`stream`	`stream(source: AsyncIterable, session: Session \| None) -> AsyncGenerator[str, None]`	sanitize an async stream of text chunks
`guard`	`guard(on_detect: str) -> decorator`	decorate a function with sanitization logic
`audit`	`.audit -> BaseAuditLog \| None`	the configured audit backend, if any

Detection policy

`on_detect` value	Behavior
`"redact"`	rewrite the returned text
`"warn"`	return original text, but populate entities and scores
`"block"`	raise instead of returning sanitized text

results = s.sanitize_batch(["Email a@example.com", "No sensitive data here"])

@s.guard(on_detect="redact")
def call_model(prompt: str) -> str:
    return prompt
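
The three `on_detect` behaviors in the table above can be sketched as a small dispatch function. This is a stand-in under stated assumptions, not the library's code: the single `EMAIL` detector, the `apply_policy` name, and the `BlockedPromptError` exception are hypothetical, and the README does not say which exception `"block"` actually raises.

```python
import re

# One illustrative detector; the real pipeline covers many entity types.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class BlockedPromptError(ValueError):
    """Hypothetical exception type used only in this sketch."""


def apply_policy(text: str, on_detect: str = "redact") -> str:
    """Return text according to the chosen detection policy."""
    if on_detect not in {"redact", "warn", "block"}:
        raise ValueError(f"unknown on_detect value: {on_detect!r}")
    if EMAIL.search(text) is None:
        return text  # nothing detected, every policy passes the text through
    if on_detect == "redact":
        return EMAIL.sub("[EMAIL]", text)
    if on_detect == "warn":
        return text  # original text; the caller inspects entities and scores
    raise BlockedPromptError("sensitive data detected")


print(apply_policy("Email a@example.com", on_detect="redact"))
print(apply_policy("Email a@example.com", on_detect="warn"))
```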

`Mode`, `SanitizeResult`, and `DetectedEntity`

`Mode` value	Meaning
`Mode.FAST`	regex + secrets, zero deps, sub-ms
`Mode.SMART`	FAST + Piiranha NER, lazy loads on first call
`Mode.FULL`	SMART + synthetic replacement + audit log

`SanitizeResult` attribute	Type	Description
`text`	`str`	sanitized text
`entities`	`list[DetectedEntity]`	detected spans
`tokens`	`dict[str, str]`	`{original_value: replacement}` map
`risk_score`	`float`	composite score from `0.0` to `1.0`
`has_pii`	`bool`	whether sensitive data was found

`DetectedEntity` attribute	Type	Description
`entity_type`	`EntityType`	entity classification
`value`	`str`	original matched value
`start`	`int`	inclusive start offset
`end`	`int`	exclusive end offset
`confidence`	`float`	detection confidence
`replacement`	`str \| None`	replacement value, if generated

result = s.sanitize("Contact me at sam@example.com")
assert result.has_pii is True
assert 0.0 <= result.risk_score <= 1.0
for entity in result.entities:
    print(entity.entity_type, entity.value, entity.replacement)
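
The `start` (inclusive) and `end` (exclusive) offsets follow Python slicing, so `text[start:end]` should reproduce `value`. The sketch below illustrates that convention with a hypothetical `Span` dataclass standing in for `DetectedEntity`; `apply_replacements` is likewise an illustrative helper, not part of the documented API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for DetectedEntity, reduced to the offset fields."""

    value: str
    start: int  # inclusive start offset
    end: int    # exclusive end offset
    replacement: str


def apply_replacements(text: str, spans: list[Span]) -> str:
    """Splice replacements into text using the recorded offsets."""
    # Work right to left so earlier offsets stay valid after each splice.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + span.replacement + text[span.end :]
    return text


text = "Contact me at sam@example.com"
value = "sam@example.com"
start = text.index(value)
span = Span(value, start, start + len(value), "[EMAIL_1]")

# Half-open offsets slice back to the original matched value.
assert text[span.start : span.end] == span.value
rewritten = apply_replacements(text, [span])
print(rewritten)
```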

Sessions and vaults

Use sessions when the model should never see raw values, but the final response should restore them.

from prompt_sanitizer import Sanitizer

s = Sanitizer()
session = s.session(session_id="support-chat-001")
clean_prompt = session.anonymize("My name is Elena Ruiz and my email is elena@company.com")
llm_reply = "Confirmed. I will email [EMAIL_1] shortly."
final_reply = session.deanonymize(llm_reply)

print(clean_prompt)
print(final_reply)

`Session` API	Description
`session.anonymize(text: str) -> str`	replace PII with vault tokens
`session.deanonymize(text: str) -> str`	restore originals from the vault
`session.vault: Vault`	access the underlying vault

`Vault` API	Description
`vault.store(value: str, replacement: str) -> None`	store a mapping
`vault.lookup(replacement: str) -> str \| None`	resolve token to original
`vault.reverse(value: str) -> str \| None`	resolve original to replacement
`vault.clear() -> None`	clear all mappings

vault = session.vault
vault.store("alice@example.com", "[EMAIL_1]")
print(vault.lookup("[EMAIL_1]"))
print(vault.reverse("alice@example.com"))
vault.clear()
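
For intuition, a vault with these four methods can be modeled as two dictionaries kept in sync. The sketch below is an assumption about the behavior described in the table, not the library's `Vault`; `SketchVault` and `restore` are hypothetical names.

```python
from __future__ import annotations


class SketchVault:
    """Two-way mapping between original values and replacement tokens."""

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}  # replacement -> original
        self._by_value: dict[str, str] = {}  # original -> replacement

    def store(self, value: str, replacement: str) -> None:
        self._by_token[replacement] = value
        self._by_value[value] = replacement

    def lookup(self, replacement: str) -> str | None:
        return self._by_token.get(replacement)

    def reverse(self, value: str) -> str | None:
        return self._by_value.get(value)

    def clear(self) -> None:
        self._by_token.clear()
        self._by_value.clear()

    def tokens(self) -> list[str]:
        return list(self._by_token)


def restore(text: str, vault: SketchVault) -> str:
    """Swap every known token in text back to its original value."""
    for token in vault.tokens():
        original = vault.lookup(token)
        if original is not None:
            text = text.replace(token, original)
    return text


vault = SketchVault()
vault.store("elena@company.com", "[EMAIL_1]")
reply = restore("Confirmed. I will email [EMAIL_1] shortly.", vault)
print(reply)
```

Keeping both directions means anonymizing the same value twice can reuse its token, while restoring a model reply only needs the token-to-original direction.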

Custom entities

Use add_entity() for internal identifiers, tenant-specific secrets, or domain-specific formats.

from prompt_sanitizer import Sanitizer

s = Sanitizer()
s.add_entity(name="customer_id", pattern=r"\bCUS-\d{8}\b", confidence=0.90)
s.add_entity(name="invoice_no", pattern=r"\bINV-\d{6}-[A-Z]{2}\b", confidence=0.88)

result = s.sanitize("Customer CUS-12345678 opened invoice INV-882211-US")
print(result.text)
print(result.entities)
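
Before registering a pattern, it can help to check it against sample text using only the standard `re` module. The snippet below tests the two patterns from the example above; it does not call the library, and the near-miss strings are made up for illustration.

```python
import re

customer_id = re.compile(r"\bCUS-\d{8}\b")
invoice_no = re.compile(r"\bINV-\d{6}-[A-Z]{2}\b")

sample = "Customer CUS-12345678 opened invoice INV-882211-US"
customer_hits = customer_id.findall(sample)
invoice_hits = invoice_no.findall(sample)

# The word boundaries reject identifiers with too few or too many digits.
near_misses = customer_id.findall("CUS-1234567 and CUS-123456789")

print(customer_hits, invoice_hits, near_misses)
```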

Filtering by entity type

from prompt_sanitizer import Sanitizer, EntityType

s = Sanitizer(entities=[EntityType.EMAIL, EntityType.API_KEY])
result = s.sanitize("Email a@b.com and SSN 123-45-6789")
print(result.text)
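
One plausible reading of the `entities` allowlist is that only the listed types are detected and everything else passes through untouched. The sketch below illustrates that reading with a hypothetical `Kind` enum and `redact_only` function; the README does not confirm the exact output of the example above.

```python
from __future__ import annotations

import re
from enum import Enum


class Kind(Enum):
    """Hypothetical stand-in for a small subset of EntityType."""

    EMAIL = "email"
    SSN = "ssn"


DETECTORS = {
    Kind.EMAIL: re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    Kind.SSN: re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_only(text: str, allowed: list[Kind] | None = None) -> str:
    """Redact only the allowed kinds; None means every known kind."""
    for kind, pattern in DETECTORS.items():
        if allowed is None or kind in allowed:
            text = pattern.sub(f"[{kind.name}]", text)
    return text


filtered = redact_only("Email a@b.com and SSN 123-45-6789", allowed=[Kind.EMAIL])
unfiltered = redact_only("Email a@b.com and SSN 123-45-6789")
print(filtered)
print(unfiltered)
```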

Audit logging

Audit backends are optional. Use them when you want structured records of detections.

`MemoryAuditLog`

from prompt_sanitizer import MemoryAuditLog, Mode, Sanitizer

audit = MemoryAuditLog()
s = Sanitizer(mode=Mode.FULL, audit_log=audit)
s.sanitize("Email finance@example.com")

print(audit.events())
print(audit.export(format="json"))

`SQLiteAuditLog`

from prompt_sanitizer import SQLiteAuditLog, Mode, Sanitizer

audit = SQLiteAuditLog("audit.db")
s = Sanitizer(mode=Mode.FULL, audit_log=audit)
s.sanitize("Call +1 415 555 0112", session_id="request-17")

print(audit.events())
print(audit.export(format="csv"))

Audit API

API	Description
`MemoryAuditLog()`	in-memory list of `AuditEvent`
`SQLiteAuditLog(path: str)`	SQLite-backed persisted log
`.events() -> list[AuditEvent]`	return recorded events
`.export(format: "json" \| "csv") -> str`	export audit records
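
As a rough picture of what a JSON or CSV export could look like, the sketch below serializes an in-memory list of events with the standard library. The `SketchEvent` fields and the `export_events` function are assumptions; the README does not document the actual `AuditEvent` schema.

```python
from __future__ import annotations

import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class SketchEvent:
    """Hypothetical event shape; the real AuditEvent fields are not documented."""

    session_id: str | None
    entity_type: str
    replacement: str


def export_events(events: list[SketchEvent], format: str = "json") -> str:
    """Serialize events as a JSON array or as CSV with a header row."""
    rows = [asdict(event) for event in events]
    if format == "json":
        return json.dumps(rows, indent=2)
    if format == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=[f.name for f in fields(SketchEvent)]
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {format!r}")


events = [SketchEvent("request-17", "PHONE", "[PHONE_1]")]
as_json = export_events(events, format="json")
as_csv = export_events(events, format="csv")
print(as_json)
print(as_csv)
```

Note that this sketch records the replacement rather than the original value, on the assumption that an audit trail should not itself become a store of raw PII.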

Integrations

Install integration dependencies first:

pip install "prompt-sanitizer[integrations]"

LangChain

from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.langchain import PromptSanitizerRunnable, SanitizedLLM

s = Sanitizer()

# `llm` is an existing LangChain model and `OutputParser` stands for the
# output parser of your choice; neither is provided by prompt-sanitizer.

# As a runnable step in a chain
chain = PromptSanitizerRunnable(sanitizer=s) | llm | OutputParser()
result = chain.invoke("My email is dev@example.com")

# Or wrap the LLM directly
safe_llm = SanitizedLLM(llm, s)
reply = safe_llm.invoke("Contact alice@example.com with the summary.")

LlamaIndex

from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.llamaindex import PromptSanitizerPostprocessor

s = Sanitizer()
postprocessor = PromptSanitizerPostprocessor(sanitizer=s)

# `index` is an existing LlamaIndex index built elsewhere in your application.
query_engine = index.as_query_engine(node_postprocessors=[postprocessor])
response = query_engine.query("Summarize the contract for jane@example.com")

OpenAI SDK wrapper

import openai
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.openai import wrap

s = Sanitizer()
client = wrap(openai.OpenAI(), sanitizer=s)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "My card is 4111 1111 1111 1111"}],
)

FastAPI middleware

from fastapi import FastAPI
from prompt_sanitizer import Sanitizer
from prompt_sanitizer.integrations.fastapi import SanitizerMiddleware

s = Sanitizer()
app = FastAPI()
app.add_middleware(SanitizerMiddleware, sanitizer=s, fields=["prompt", "message"])
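
The `fields` argument suggests that only selected request fields are sanitized. How the middleware interprets it is not documented here, so the sketch below simply assumes that `fields` names top-level keys of a JSON body; `scrub` and `sanitize_fields` are hypothetical helpers written with the standard library only.

```python
from __future__ import annotations

import re
from typing import Any, Callable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrub(text: str) -> str:
    """Illustrative scrubber; a real setup would call the sanitizer instead."""
    return EMAIL.sub("[EMAIL]", text)


def sanitize_fields(
    payload: dict[str, Any],
    fields: list[str],
    scrub_fn: Callable[[str], str] = scrub,
) -> dict[str, Any]:
    """Return a copy of payload with the listed string fields scrubbed."""
    return {
        key: scrub_fn(value) if key in fields and isinstance(value, str) else value
        for key, value in payload.items()
    }


body = {"prompt": "Mail dev@example.com", "user": "dev@example.com", "max_tokens": 64}
cleaned = sanitize_fields(body, ["prompt", "message"])
print(cleaned)
```

Under this assumption, keys outside `fields` (such as `user` here) are left untouched, so the list should cover every field that can carry free text.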

Django middleware

# settings.py
from prompt_sanitizer import Sanitizer

MIDDLEWARE = [
    # ... your existing middleware ...
    "prompt_sanitizer.integrations.django.SanitizerMiddleware",
]

PROMPT_SANITIZER = {
    "sanitizer": Sanitizer(),
    "fields": ["prompt", "message"],
}

Entity types

Group	Values
core PII	`EMAIL`, `PHONE`, `SSN`, `CREDIT_CARD`, `IBAN`, `IP_ADDRESS`, `URL`, `DATE`
identity / org	`PERSON_NAME`, `ORGANIZATION`, `LOCATION`
secrets	`API_KEY`, `JWT_TOKEN`, `SECRET_KEY`, `AWS_KEY`, `GITHUB_TOKEN`, `OPENAI_KEY`, `ANTHROPIC_KEY`
extension	`CUSTOM`

Operational notes

FAST mode is stdlib-only.
SMART lazy-loads NER on first use.
FULL is the best fit for synthetic replacement plus audit.
sanitize() is for one-shot calls.
session() is for reversible multi-turn workflows.
sanitize_batch() treats each input independently.

License

MIT.

Release history

Version	Date
1.0.1	May 18, 2026
1.0.0	May 7, 2026
0.1.0 (this version)	May 7, 2026

Download files

Source Distribution

ai_prompt_sanitizer-0.1.0.tar.gz (37.8 kB view details)

Uploaded May 7, 2026 Source

Built Distribution

ai_prompt_sanitizer-0.1.0-py3-none-any.whl (38.6 kB view details)

Uploaded May 7, 2026 Python 3

File details

Details for the file ai_prompt_sanitizer-0.1.0.tar.gz.

File metadata

Download URL: ai_prompt_sanitizer-0.1.0.tar.gz
Upload date: May 7, 2026
Size: 37.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_prompt_sanitizer-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`da6dfff78da055f2d2a7e963851147466c9bf322814781597c045e265908914c`
MD5	`57d83d6c8d92ea74f2877e7dc3d4f3cb`
BLAKE2b-256	`4717b315b064d194b52fc0497c24f11ebf22941129fdffc269f3dbaf0bd0a4be`
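
A downloaded file can be checked against the published SHA256 digest with `hashlib`. The sketch below is one way to do it; the helper names `sha256_of` and `matches` are illustrative, and the local file path in the commented usage is an assumption about where the archive was saved.

```python
import hashlib
import io
from typing import BinaryIO

# Digest published above for ai_prompt_sanitizer-0.1.0.tar.gz
EXPECTED_SHA256 = "da6dfff78da055f2d2a7e963851147466c9bf322814781597c045e265908914c"


def sha256_of(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches(stream: BinaryIO, expected_hex: str) -> bool:
    """Compare the stream's digest with a published hex digest."""
    return sha256_of(stream) == expected_hex.strip().lower()


# Usage against the downloaded sdist (adjust the path to your download):
# with open("ai_prompt_sanitizer-0.1.0.tar.gz", "rb") as fh:
#     print(matches(fh, EXPECTED_SHA256))

# Self-check with the standard SHA-256 test vector for b"abc".
demo_ok = matches(
    io.BytesIO(b"abc"),
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
)
print(demo_ok)
```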

Provenance

The following attestation bundles were made for ai_prompt_sanitizer-0.1.0.tar.gz:

Publisher: python-publish.yml on jeslor/prompt-sanitizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_prompt_sanitizer-0.1.0.tar.gz
- Subject digest: da6dfff78da055f2d2a7e963851147466c9bf322814781597c045e265908914c
- Sigstore transparency entry: 1460833331
- Sigstore integration time: May 7, 2026
Source repository:
- Permalink: jeslor/prompt-sanitizer@842eeb106691feac502bf9822724094a1c6a5f24
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/jeslor
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@842eeb106691feac502bf9822724094a1c6a5f24
- Trigger Event: push

File details

Details for the file ai_prompt_sanitizer-0.1.0-py3-none-any.whl.

File metadata

Download URL: ai_prompt_sanitizer-0.1.0-py3-none-any.whl
Upload date: May 7, 2026
Size: 38.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_prompt_sanitizer-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dd9caa4e1f9c9816f0500b0fad7b8a3b5aead1e83a034f4092d162ec6def8a82`
MD5	`f8f16d2f6e3259aae8ad8bd4ab226503`
BLAKE2b-256	`feda7bb94f953f0c5022dd7be0b01952a3d60d91a851dceef094aaa4948ac35b`

Provenance

The following attestation bundles were made for ai_prompt_sanitizer-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on jeslor/prompt-sanitizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_prompt_sanitizer-0.1.0-py3-none-any.whl
- Subject digest: dd9caa4e1f9c9816f0500b0fad7b8a3b5aead1e83a034f4092d162ec6def8a82
- Sigstore transparency entry: 1460833398
- Sigstore integration time: May 7, 2026
Source repository:
- Permalink: jeslor/prompt-sanitizer@842eeb106691feac502bf9822724094a1c6a5f24
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/jeslor
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@842eeb106691feac502bf9822724094a1c6a5f24
- Trigger Event: push

ai-prompt-sanitizer 0.1.0
