Sub-millisecond PII redaction for prompts before they reach an LLM.

maskprompt

Sub-millisecond PII redaction for prompts before they reach an LLM. Rust core, Python frontend.

The problem

Compliance wants you to scrub PII before it leaves your VPC and hits an external LLM API. Pure-Python regex passes do the job, but they cost hundreds of microseconds per prompt at p99, and they make the security team nervous because hand-maintained patterns drift.

maskprompt runs the standard PII detectors (emails, phone numbers, credit cards with a Luhn check, SSNs, IP addresses, AWS keys, GitHub PATs, JWTs) plus your own keyword/phrase lists in a single Aho-Corasick + regex pass, and replaces each match using one of four configurable strategies. The whole thing runs in microseconds at typical prompt sizes.

Install

pip install maskprompt

30-second quickstart

from maskprompt import Masker, BuiltinRule, Strategy

masker = Masker(
    builtins=[BuiltinRule.EMAIL, BuiltinRule.CREDIT_CARD, BuiltinRule.US_SSN],
    custom={"customer": ["Acme Corp"]},  # your own labels
)

text = "Email me at alice@example.com about Acme Corp invoice 4111-1111-1111-1111."
result = masker.mask(text, strategy=Strategy.TAG)

print(result.masked)
# Email me at <EMAIL> about <CUSTOMER> invoice <CREDIT_CARD>.

for m in result.matches:
    print(m.kind, m.start, m.end)

Strategies

| Strategy | Replacement | Use it when |
| --- | --- | --- |
| Strategy.TAG | <EMAIL> | Default. The LLM sees the type but not the value. |
| Strategy.HASH | <EMAIL:abc12345> | You need to track "the same redacted value showed up again" without recovering it. blake3 over the original, truncated to 8 hex chars. |
| Strategy.FIXED | ███████ | Length-preserving for visual cues. |
| Strategy.REMOVE | (empty) | When even the type is too much information. |
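
To make the four strategies concrete, here is a minimal pure-Python sketch of what each one could produce for a single detected span. It is not maskprompt's implementation: `replace_match` and its string strategy names are hypothetical, `hashlib.blake2b` stands in for blake3 only because it ships with the standard library (so digests will not match the library's), and one block character per original character is an assumption about how FIXED preserves length.

```python
import hashlib


def replace_match(kind: str, value: str, strategy: str) -> str:
    """Return replacement text for one detected span (illustrative only)."""
    label = kind.upper()
    if strategy == "tag":
        # Type is visible, value is not.
        return f"<{label}>"
    if strategy == "hash":
        # Stand-in hash: the README says blake3; blake2b is used here only
        # because it is in the standard library. Truncate to 8 hex chars.
        digest = hashlib.blake2b(value.encode("utf-8")).hexdigest()[:8]
        return f"<{label}:{digest}>"
    if strategy == "fixed":
        # Assumption: one block character per original character.
        return "\u2588" * len(value)
    if strategy == "remove":
        return ""
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for s in ("tag", "hash", "fixed", "remove"):
        print(s, repr(replace_match("email", "alice@example.com", s)))
```

Because the hash depends only on the original value, the same address always maps to the same tag, which is what lets downstream code spot repeats without recovering the value.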

Built-in detectors

| Rule | Catches |
| --- | --- |
| EMAIL | RFC-5322-ish addresses |
| US_PHONE | US 10-digit and +1 formats |
| US_SSN | XXX-XX-XXXX |
| IPV4 | dotted quad |
| IPV6 | :: and full forms |
| CREDIT_CARD | 13–19-digit candidates that pass Luhn |
| AWS_ACCESS_KEY | AKIA… 20-char keys |
| GITHUB_TOKEN | ghp_/gho_/ghu_/ghr_/ghs_… |
| JWT | three base64url segments separated by . |

Pick the subset you want; unmentioned detectors are off.
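
The CREDIT_CARD rule only reports digit runs that pass the Luhn checksum, which is what filters out most order numbers and IDs that merely look like card numbers. A minimal sketch of that check follows; `luhn_valid` is a hypothetical helper, not part of maskprompt's API, and stripping every non-digit character before checking is an assumption about how separators are handled.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` form a 13-19 digit Luhn-valid number."""
    # Assumption: separators such as spaces and hyphens are simply ignored.
    digits = [int(c) for c in candidate if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111-1111-1111-1111"))
    print(luhn_valid("4111-1111-1111-1112"))
```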

Custom keywords

Pass custom={"label": ["needle1", "needle2"]}. Keywords are matched case-insensitively and only on word-character boundaries, so a keyword embedded inside a longer word does not match. Pass several labels to tag distinct groups ({"customer": [...], "internal_project": [...]}).
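
As a rough illustration of what case-insensitive, word-boundary keyword matching means, the sketch below reproduces the behavior with Python's `re` module. It is an approximation for reasoning about which strings will match, not how maskprompt does it (the library uses Aho-Corasick in its Rust core); `find_keywords` and `compile_keywords` are hypothetical names.

```python
import re


def compile_keywords(custom: dict[str, list[str]]) -> list[tuple[str, re.Pattern[str]]]:
    """Build one case-insensitive, boundary-anchored pattern per label."""
    compiled = []
    for label, needles in custom.items():
        if not needles:
            continue  # an empty alternation would match everywhere
        # Longest needle first so a longer phrase wins over its own prefix.
        ordered = sorted(needles, key=len, reverse=True)
        alternation = "|".join(re.escape(n) for n in ordered)
        # (?<!\w) and (?!\w): no word character directly before or after.
        pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
        compiled.append((label.upper(), pattern))
    return compiled


def find_keywords(text: str, custom: dict[str, list[str]]) -> list[tuple[str, int, int]]:
    """Return (label, start, end) for every keyword hit, ordered by start offset."""
    hits = []
    for label, pattern in compile_keywords(custom):
        for m in pattern.finditer(text):
            hits.append((label, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[1])


if __name__ == "__main__":
    text = "Ask ACME corp about AcmeCorporation and acme corp."
    for label, start, end in find_keywords(text, {"customer": ["Acme Corp"]}):
        print(label, repr(text[start:end]))
```

In this sketch "ACME corp" and "acme corp" are both hits, while "AcmeCorporation" is not, because the phrase there is not delimited by non-word characters.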

License

Dual-licensed under MIT or Apache-2.0 at your option.
