Rust-backed PII detection and masking plugin for MCP Gateway
PII Filter (Rust)
High-performance PII detection and masking library for ContextForge.
Features
- Detects 12+ PII types (SSN, email, credit cards, phone numbers, and more)
- Built-in detectors default to `redact` masking
- Multiple masking strategies (redact, partial, hash, tokenize, remove)
- Parallel regex matching with RegexSet (5-10x faster than Python)
- Zero-copy operations for nested JSON/dict traversal
- Whitelist support for false positive filtering
- Deterministic overlap resolution: earliest match wins, then the longest match wins
- Structural validation for SSNs and common card issuer ranges to reduce false positives
- Explicit guardrails for oversized inputs and pathological custom patterns
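The overlap rule in the list above (earliest match wins, then the longest match wins) can be sketched in a few lines. This is an illustration only, not the plugin's Rust code; the `Match` type and `resolve_overlaps` function are hypothetical names introduced for the example.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """Hypothetical detection record used only for this sketch."""
    start: int  # inclusive start offset in the text
    end: int    # exclusive end offset in the text
    kind: str   # detector label, e.g. "ssn" or "phone"


def resolve_overlaps(matches: list[Match]) -> list[Match]:
    """Keep non-overlapping matches: earliest start first, longest on ties."""
    # Sort by start offset ascending, then by match length descending.
    ordered = sorted(matches, key=lambda m: (m.start, -(m.end - m.start)))
    kept: list[Match] = []
    last_end = 0
    for m in ordered:
        # A match is kept only if it begins at or after the end of the
        # previously kept match.
        if m.start >= last_end:
            kept.append(m)
            last_end = m.end
    return kept
```

Sorting once and sweeping left to right makes the outcome independent of the order in which individual detectors report their matches.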
Build
make install
Usage
The Python plugin requires the compiled Rust extension and uses it for all detection and masking operations.
Migration Note
Version 0.2.0 intentionally changes the built-in default masking policy from partial masking to redact. Set default_mask_strategy: "partial" explicitly if you need the previous behavior.
Version 0.2.0 also tightens the default privacy posture for observability: detection logging and detection-detail metadata are now disabled unless you opt in with log_detections: true or include_detection_details: true.
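The opt-ins named in the migration note can be collected in one place. A minimal sketch, assuming the plugin configuration is available as a plain mapping: `with_pre_0_2_behavior` and `PRE_0_2_OVERRIDES` are hypothetical names, and only the three keys and their values come from the note above.

```python
# Only these three keys are taken from the migration note; the helper
# itself and the idea of a dict-shaped config are assumptions.
PRE_0_2_OVERRIDES = {
    "default_mask_strategy": "partial",
    "log_detections": True,
    "include_detection_details": True,
}


def with_pre_0_2_behavior(config: dict) -> dict:
    """Return a copy of config with the pre-0.2.0 opt-ins set explicitly."""
    return {**config, **PRE_0_2_OVERRIDES}
```

Setting the values explicitly, rather than relying on defaults, keeps the behavior stable if the defaults change again in a later release.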
Detection Coverage
This section describes the current Rust detector behavior so users know what is intentionally matched and what is intentionally left alone. The detector is optimized to reduce noisy false positives, which means some generic identifiers are only matched when they appear with clear context labels.
Social Security Numbers (SSN)
Covers
- Dashed US SSNs such as `123-45-6789`
- Compact 9-digit SSNs only when they appear with SSN-specific context such as `SSN`, `Social Security`, or `Social Security Number`
- Structural validation that rejects impossible values such as `000-12-3456`, `666-12-3456`, `123-00-4567`, and `123-45-0000`
Does not cover
- Bare 9-digit values without SSN context
- Real-world identity verification or SSA-backed validation
- Country-specific national identifiers outside the US SSN patterns
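The structural checks implied by the rejected examples above can be sketched as follows. This is an approximation for illustration, assuming only the four rules visible in those examples (area `000`, area `666`, group `00`, serial `0000`); the real detector may apply more. `is_plausible_ssn` is a hypothetical name.

```python
import re

# Dashed form only: three digits, two digits, four digits.
SSN_DASHED = re.compile(r"(\d{3})-(\d{2})-(\d{4})", re.ASCII)


def is_plausible_ssn(candidate: str) -> bool:
    """Reject dashed SSNs whose area, group, or serial part is impossible."""
    m = SSN_DASHED.fullmatch(candidate)
    if m is None:
        return False
    area, group, serial = m.groups()
    return area not in {"000", "666"} and group != "00" and serial != "0000"
```

A check like this only filters out values that can never be issued; it says nothing about whether a plausible number belongs to a real person.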
BSN (Dutch Citizen Service Number)
Covers
- 9-digit BSNs when they appear with explicit Dutch/BSN-style context such as `BSN`, `Citizen ID`, `Citizen Service Number`, or `Burgerservicenummer`
- Phrases such as `My BSN is 123456789`
Does not cover
- Generic unlabeled 9-digit numbers
- Generic business identifiers such as order numbers, invoice numbers, or tracking numbers unless they also use BSN-specific wording
- Validation against authoritative Dutch registries
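Context-gated matching of this kind can be sketched with a single regular expression that requires a label before the digits. This is an assumption-laden illustration: the label list comes from the section above, but the 20-character gap between label and value and the name `find_labeled_bsn` are invented for the example.

```python
import re

# A BSN-style label, then up to 20 non-digit characters (assumed window),
# then exactly nine digits.
LABELED_BSN = re.compile(
    r"\b(?:bsn|citizen id|citizen service number|burgerservicenummer)\b"
    r"\D{0,20}?(\d{9})\b",
    re.IGNORECASE | re.ASCII,
)


def find_labeled_bsn(text: str) -> list[str]:
    """Return nine-digit values that directly follow a BSN-style label."""
    return [m.group(1) for m in LABELED_BSN.finditer(text)]
```

Requiring the label is what keeps order numbers, invoice numbers, and other bare nine-digit values from being reported.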
Credit Card Numbers
Covers
- Common 13-19 digit card numbers with spaces or dashes
- Luhn-valid numbers from the major issuer families currently recognized by the detector, including Visa, Mastercard, American Express, Discover, Diners Club, JCB, UnionPay, and Maestro
Does not cover
- Numbers that fail Luhn validation
- Arbitrary long digit strings that do not match a recognized card-prefix family
- Full issuer-specific business rules beyond prefix and Luhn checks
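The Luhn check mentioned above is a standard checksum and can be sketched directly. The sketch covers only the checksum and the 13-19 digit length window; the issuer-prefix matching the detector also performs is not reproduced here. `luhn_valid` is a hypothetical name.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum (13-19 digits)."""
    stripped = number.replace(" ", "").replace("-", "")
    if not (stripped.isascii() and stripped.isdigit()):
        return False
    if not 13 <= len(stripped) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit and
    # subtract 9 when the doubled value exceeds 9.
    for i, ch in enumerate(reversed(stripped)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Because a random digit string passes Luhn about one time in ten, the checksum alone is a weak filter, which is why it is paired with prefix checks.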
Email Addresses
Covers
- Standard email addresses such as `alice@example.com`
- Full redaction by default, or partial masking when explicitly configured
Does not cover
- Full RFC-complete email parsing
- Mailbox ownership verification or domain reachability checks
- Obfuscated emails such as `alice at example dot com`
Phone Numbers
Covers
- Common US phone number formats such as `555-123-4567`, `(555) 123-4567`, and `1 555 123 4567`
- International numbers with an explicit leading `+` and enough digits to look like an E.164-style value
Does not cover
- Short local extensions or ambiguous local-only numbers
- International numbers without a leading `+`
- Country-by-country numbering-plan validation
IP Addresses
Covers
- Standard IPv4 dotted-quad addresses
- Fully expanded IPv6 addresses in the eight-group hexadecimal form
Does not cover
- Shorthand IPv6 forms such as `2001:db8::1`
- Hostnames, URLs, or CIDR ranges
- Private/public classification or network reachability checks
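The difference between the covered and uncovered IPv6 forms is easy to see with patterns of roughly this shape. These expressions are assumptions written for illustration, not the detector's built-in patterns; in particular, restricting IPv4 octets to 0-255 is a choice made here. `find_ip_addresses` is a hypothetical name.

```python
import re

# One IPv4 octet in the range 0-255 (assumed; the real pattern may differ).
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b", re.ASCII)

# Fully expanded IPv6: exactly eight colon-separated hexadecimal groups.
IPV6_FULL = re.compile(r"\b(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}\b")


def find_ip_addresses(text: str) -> list[str]:
    """Return IPv4 matches followed by fully expanded IPv6 matches."""
    found = [m.group(0) for m in IPV4.finditer(text)]
    found += [m.group(0) for m in IPV6_FULL.finditer(text)]
    return found
```

The eight-group pattern cannot match `::` compression, which is why shorthand addresses fall outside coverage.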
Dates of Birth
Covers
- Explicitly labeled date-of-birth phrases such as `DOB: 01/15/1990`
- Unlabeled dates in `MM/DD/YYYY` or `MM-DD-YYYY` form within the configured year range
Does not cover
- Locale-specific date parsing beyond the built-in patterns
- Natural-language dates such as `15 January 1990`
- Any proof that a matched date is actually a birth date when no DOB-style label is present
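Matching unlabeled dates within a year range can be sketched as a pattern plus a numeric filter. The pattern, the default bounds of 1900 and 2025, and the name `find_dob_candidates` are all assumptions made for this example; the plugin's configured range may differ.

```python
import re

# MM/DD/YYYY or MM-DD-YYYY; the backreference \2 requires the same
# separator in both positions.
DATE = re.compile(
    r"\b(0[1-9]|1[0-2])([/-])(0[1-9]|[12]\d|3[01])\2(\d{4})\b", re.ASCII
)


def find_dob_candidates(
    text: str, min_year: int = 1900, max_year: int = 2025
) -> list[str]:
    """Return date strings whose year falls inside the allowed range."""
    return [
        m.group(0)
        for m in DATE.finditer(text)
        if min_year <= int(m.group(4)) <= max_year
    ]
```

The year filter removes dates that cannot plausibly be a birth date, but a match without a DOB-style label remains only a candidate.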
Passport Numbers
Covers
- Passport identifiers only when they appear with explicit passport context such as `Passport`, `Passport No`, or `Passport Number`
- Label-plus-value matches such as `Passport Number: AB123456`
Does not cover
- Standalone alphanumeric IDs without passport wording
- Country-specific passport validation rules
- Broader travel-document types that do not use passport labels
Driver's License Numbers
Covers
- Driver's license values with explicit labels such as `DL`, `License`, or `Driver's License`
Does not cover
- Unlabeled alphanumeric identifiers
- State-by-state or country-by-country license validation rules
- Vehicle registration numbers or other transport-related IDs
Bank Account Numbers
Covers
- Account numbers when they appear with explicit account-style context such as `Account`, `Acct`, `Bank Account`, or `Account Number`
- IBAN-like values that match the built-in pattern
Does not cover
- Bare 8-17 digit values without account context
- Full IBAN country validation or checksum verification
- Routing-number-only detection
Medical Record Numbers
Covers
- Explicitly labeled medical record identifiers such as `MRN` or `Medical Record`
Does not cover
- Unlabeled healthcare identifiers
- Insurance member IDs, prescription IDs, or other healthcare-adjacent identifiers unless added through custom patterns
- Validation against provider or hospital systems
Custom Patterns
Covers
- User-defined regex patterns for organization-specific identifiers
- Explicit per-pattern masking strategies
- Guardrails that reject patterns that are too long or too complex for maintainable admin-authored configuration
Does not cover
- Unlimited regex expressiveness
- Automatic tuning of custom patterns for precision or recall
- Protection against poor pattern choices that are syntactically valid but semantically too broad
Custom patterns are intended for trusted operators editing plugin configuration, not untrusted end-user input. The Rust implementation relies on the regex crate, which avoids catastrophic backtracking during matching, and then applies additional length and complexity limits to keep custom expressions readable and cheap to compile.
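An operator can apply a similar pre-check before adding a pattern to configuration. The sketch below is hypothetical throughout: the limits of 256 characters and 16 alternations are invented, the alternation count is a crude proxy for complexity, and Python's `re` module is a backtracking engine, so it does not share the Rust regex crate's matching guarantees.

```python
import re

MAX_PATTERN_LENGTH = 256  # hypothetical limit, not the plugin's value
MAX_ALTERNATIONS = 16     # hypothetical limit, not the plugin's value


def check_custom_pattern(pattern: str) -> None:
    """Raise ValueError if the pattern is too long, too branchy, or invalid."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError(f"pattern longer than {MAX_PATTERN_LENGTH} characters")
    # Crude proxy: counts every '|' character, including escaped ones.
    if pattern.count("|") > MAX_ALTERNATIONS:
        raise ValueError(f"pattern has more than {MAX_ALTERNATIONS} alternations")
    try:
        re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"pattern does not compile: {exc}") from exc
```

A check like this catches oversized or malformed patterns early, but it cannot tell whether a valid pattern is too broad for the data it will see.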
Security Notes
- Detection logging is disabled by default. Enable it only if your logging pipeline is allowed to receive derived PII metadata.
- Detection-detail metadata is disabled by default. Enable `include_detection_details` only if downstream consumers are allowed to inspect detected type/count summaries.
- Whitelist patterns are compiled case-insensitively.
- Custom patterns must stay within basic length and complexity limits and are meant for trusted admin-authored configuration.
- Very large strings and oversized nested collections are rejected instead of being scanned indefinitely.
Masking Notes
- `HASH` masking emits the first 16 hexadecimal characters of a salted SHA-256 digest, for example `[HASH:8f434346648f6b96]`.
- Earlier releases emitted 8 hexadecimal characters. Update downstream parsers if they assumed the shorter fixed-width placeholder.
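The placeholder format can be illustrated with the standard library. How the plugin combines the salt with the value is not documented here, so prefixing the salt is an assumption; `hash_placeholder` and `HASH_PLACEHOLDER` are hypothetical names. The regular expression shows one way a downstream parser might accept both the current 16-character and the earlier 8-character form.

```python
import hashlib
import re

# Accepts the current 16-hex-character form and the earlier 8-character form.
HASH_PLACEHOLDER = re.compile(r"\[HASH:([0-9a-f]{16}|[0-9a-f]{8})\]")


def hash_placeholder(value: str, salt: str) -> str:
    """Build a placeholder from the first 16 hex chars of a salted SHA-256."""
    # Assumption: the salt is prepended to the value before hashing.
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"[HASH:{digest[:16]}]"
```

With a fixed salt the same input always yields the same placeholder, which lets downstream systems correlate masked values without seeing the original.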
Testing
# Rust unit tests
make test
# Python tests
make test-python
# Benchmarks
make bench
Performance
Expected 5-10x speedup over the Python implementation for typical payloads.