Rust-backed PII detection and masking plugin for MCP Gateway

PII Filter (Rust)

High-performance PII detection and masking library for ContextForge.

Features

  • Detects 12+ PII types (SSN, email, credit cards, phone numbers, and more)
  • Built-in detectors follow default_mask_strategy, which defaults to redact
  • Multiple masking strategies (redact, partial, hash, tokenize, remove)
  • Parallel regex matching with RegexSet (5-10x faster than Python)
  • Zero-copy operations for nested JSON/dict traversal
  • Whitelist support for false positive filtering
  • Deterministic overlap resolution: earliest match wins, then the longest match wins
  • Structural validation for SSNs and common card issuer ranges to reduce false positives
  • Explicit guardrails for oversized inputs and pathological custom patterns
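The overlap rule above (earliest match wins, then the longest match wins) can be sketched in a few lines of Python. This is only an illustration of the rule, not the Rust implementation; the Match type and the resolve_overlaps name are hypothetical.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical match record: half-open span [start, end) plus a type label."""
    start: int
    end: int
    pii_type: str


def resolve_overlaps(matches: List[Match]) -> List[Match]:
    """Keep a non-overlapping subset: earliest start first, longest on ties."""
    # Sort by start offset ascending, then by length descending.
    ordered = sorted(matches, key=lambda m: (m.start, -(m.end - m.start)))
    kept: List[Match] = []
    last_end = 0
    for m in ordered:
        # A match survives only if it begins at or after the end of the
        # previously kept match.
        if m.start >= last_end:
            kept.append(m)
            last_end = m.end
    return kept


candidates = [
    Match(0, 9, "bsn"),
    Match(0, 11, "ssn"),
    Match(4, 15, "phone"),
    Match(20, 30, "email"),
]
resolved = resolve_overlaps(candidates)
```

Because the ordering depends only on offsets and lengths, the same input always produces the same masked output, regardless of the order in which detectors report their matches.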

Build

make install

Runtime Requirements

This plugin depends on cpex>=0.1.0rc1,<0.2 and imports hook models from cpex.framework. The compiled Rust extension is mandatory; there is no Python fallback implementation.

Usage

The Python plugin requires the compiled Rust extension and uses it for all detection and masking operations.

Migration Note

Version 0.2.0 intentionally changes the built-in default masking policy from partial masking to redact. Set default_mask_strategy: "partial" explicitly if you need the previous behavior.

Version 0.2.0 also tightens the default privacy posture for observability: detection logging and detection-detail metadata are now disabled unless you opt in with log_detections: true or include_detection_details: true.

Version 0.2.1 changes custom-pattern inheritance: when custom_patterns[].mask_strategy is omitted or set to null/None, the pattern inherits default_mask_strategy instead of forcing redact.

Version 0.2.1 also validates default_mask_strategy and custom_patterns[].mask_strategy strictly. Invalid values that older builds silently treated as redact now fail fast during plugin initialization.
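The 0.2.1 inheritance and validation rules can be pictured with the following Python sketch. The function name resolve_mask_strategy is hypothetical, the strategy names come from the Features list, and exact-string matching is an assumption; the plugin's real validation code may differ.

```python
from typing import Optional

# Strategy names as listed under Features.
VALID_STRATEGIES = {"redact", "partial", "hash", "tokenize", "remove"}


def resolve_mask_strategy(pattern_strategy: Optional[str],
                          default_strategy: str = "redact") -> str:
    """Return the strategy a custom pattern would end up with (illustrative)."""
    # Strict validation: unknown values raise instead of silently becoming redact.
    if default_strategy not in VALID_STRATEGIES:
        raise ValueError(f"invalid default_mask_strategy: {default_strategy!r}")
    # Omitted or null/None: inherit the default rather than forcing redact.
    if pattern_strategy is None:
        return default_strategy
    if pattern_strategy not in VALID_STRATEGIES:
        raise ValueError(f"invalid mask_strategy: {pattern_strategy!r}")
    return pattern_strategy
```

With this rule, a custom pattern without its own mask_strategy follows whatever default_mask_strategy is set to, while a typo in either field surfaces as an error at startup.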

Detection Coverage

This section describes the current Rust detector behavior so users know what is intentionally matched and what is intentionally left alone. The detector is optimized to reduce noisy false positives, which means some generic identifiers are only matched when they appear with clear context labels.

Social Security Numbers (SSN)

Covers

  • Dashed US SSNs such as 123-45-6789
  • Compact 9-digit SSNs only when they appear with SSN-specific context such as SSN, Social Security, or Social Security Number
  • Structural validation that rejects impossible values such as 000-12-3456, 666-12-3456, 123-00-4567, and 123-45-0000

Does not cover

  • Bare 9-digit values without SSN context
  • Real-world identity verification or SSA-backed validation
  • Country-specific national identifiers outside the US SSN patterns
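As an illustration of the structural validation described above, the following Python sketch rejects the impossible values listed in this section. It covers only the dashed form and only the four rules named here; the function name is hypothetical and the Rust detector may apply further checks.

```python
import re

# Dashed form only: three digits, two digits, four digits.
_DASHED_SSN = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def is_plausible_ssn(value: str) -> bool:
    """Structural check only; it does not prove the number was ever issued."""
    m = _DASHED_SSN.fullmatch(value)
    if m is None:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666"):  # rejects 000-12-3456 and 666-12-3456
        return False
    if group == "00":           # rejects 123-00-4567
        return False
    if serial == "0000":        # rejects 123-45-0000
        return False
    return True
```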

BSN (Dutch Citizen Service Number)

Covers

  • 9-digit BSNs when they appear with explicit Dutch/BSN-style context such as BSN, Citizen ID, Citizen Service Number, or Burgerservicenummer
  • Phrases such as My BSN is 123456789

Does not cover

  • Generic unlabeled 9-digit numbers
  • Generic business identifiers such as order numbers, invoice numbers, or tracking numbers unless they also use BSN-specific wording
  • Validation against authoritative Dutch registries
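Context-gated matching of this kind can be sketched with a single regular expression that requires a label before the digits. The pattern below is an assumption made for illustration: the label list comes from this section, but the 20-character gap between label and value and the function name are hypothetical.

```python
import re
from typing import List

# A BSN-style label, a short run of non-digit text, then exactly nine digits.
_BSN_WITH_CONTEXT = re.compile(
    r"\b(?:BSN|Citizen ID|Citizen Service Number|Burgerservicenummer)\b"
    r"[^0-9\n]{0,20}?"
    r"([0-9]{9})\b",
    re.IGNORECASE,
)


def find_labeled_bsns(text: str) -> List[str]:
    """Return nine-digit values that follow a BSN-style label."""
    return [m.group(1) for m in _BSN_WITH_CONTEXT.finditer(text)]
```

Under these assumptions, "My BSN is 123456789" yields one match, while "Order number 123456789" yields none because no BSN-specific wording precedes the digits.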

Credit Card Numbers

Covers

  • Common 13-19 digit card numbers with spaces or dashes
  • Luhn-valid numbers from the major issuer families currently recognized by the detector, including Visa, Mastercard, American Express, Discover, Diners Club, JCB, UnionPay, and Maestro

Does not cover

  • Numbers that fail Luhn validation
  • Arbitrary long digit strings that do not match a recognized card-prefix family
  • Full issuer-specific business rules beyond prefix and Luhn checks
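For readers unfamiliar with the Luhn check mentioned above, here is a minimal Python sketch. It implements only the checksum and the 13-19 digit length window; the issuer-prefix check that the detector also performs is omitted, and the function name is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces and dashes.
    digits = [int(c) for c in number if c in "0123456789"]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

The widely used test number 4111 1111 1111 1111 passes this check, while changing its last digit to 2 makes it fail.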

Email Addresses

Covers

  • Standard email addresses such as alice@example.com
  • Full redaction by default, or partial masking when explicitly configured

Does not cover

  • Full RFC-complete email parsing
  • Mailbox ownership verification or domain reachability checks
  • Obfuscated emails such as alice at example dot com

Phone Numbers

Covers

  • Common US phone number formats such as 555-123-4567, (555) 123-4567, and 1 555 123 4567
  • International numbers with an explicit leading + and enough digits to look like an E.164-style value

Does not cover

  • Short local extensions or ambiguous local-only numbers
  • International numbers without a leading +
  • Country-by-country numbering-plan validation

IP Addresses

Covers

  • Standard IPv4 dotted-quad addresses
  • Fully expanded IPv6 addresses in the eight-group hexadecimal form

Does not cover

  • Shorthand IPv6 forms such as 2001:db8::1
  • Hostnames, URLs, or CIDR ranges
  • Private/public classification or network reachability checks
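The difference between the covered and uncovered IP forms can be illustrated with two regular expressions. These patterns are assumptions written for this example, not the detector's own: in particular, whether the detector range-checks IPv4 octets is not stated here.

```python
import re
from typing import List

# One IPv4 octet in the range 0-255 (range checking is an assumption).
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
_IPV4 = re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b")

# Fully expanded IPv6: exactly eight colon-separated hexadecimal groups.
_IPV6_FULL = re.compile(r"\b(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}\b")


def find_ip_addresses(text: str) -> List[str]:
    """Return IPv4 matches followed by fully expanded IPv6 matches."""
    return _IPV4.findall(text) + _IPV6_FULL.findall(text)
```

A shorthand address such as 2001:db8::1 contains an empty group, so the eight-group pattern does not match it, which mirrors the limitation listed above.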

Dates of Birth

Covers

  • Explicitly labeled date-of-birth phrases such as DOB: 01/15/1990
  • Unlabeled dates in MM/DD/YYYY or MM-DD-YYYY form within the configured year range

Does not cover

  • Locale-specific date parsing beyond the built-in patterns
  • Natural-language dates such as 15 January 1990
  • Any proof that a matched date is actually a birth date when no DOB-style label is present
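The unlabeled-date rule can be sketched as a pattern match followed by a year-range filter. Everything specific here is an assumption for illustration: the two-digit month and day requirement, the matching separators, and the min_year and max_year parameter names are hypothetical stand-ins for the configured year range.

```python
import re
from typing import List

# MM/DD/YYYY or MM-DD-YYYY, with the same separator used twice.
_DATE = re.compile(
    r"\b(0[1-9]|1[0-2])([/-])(0[1-9]|[12][0-9]|3[01])\2([0-9]{4})\b"
)


def find_candidate_dobs(text: str, *, min_year: int, max_year: int) -> List[str]:
    """Return date strings whose year falls inside [min_year, max_year]."""
    return [
        m.group(0)
        for m in _DATE.finditer(text)
        if min_year <= int(m.group(4)) <= max_year
    ]
```

As the list above notes, a match without a DOB-style label is only a candidate: the filter narrows the year, but nothing in the pattern proves the date is a birth date.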

Passport Numbers

Covers

  • Passport identifiers only when they appear with explicit passport context such as Passport, Passport No, or Passport Number
  • Label-plus-value matches such as Passport Number: AB123456

Does not cover

  • Standalone alphanumeric IDs without passport wording
  • Country-specific passport validation rules
  • Broader travel-document types that do not use passport labels

Driver's License Numbers

Covers

  • Driver's license values with explicit labels such as DL, License, or Driver's License

Does not cover

  • Unlabeled alphanumeric identifiers
  • State-by-state or country-by-country license validation rules
  • Vehicle registration numbers or other transport-related IDs

Bank Account Numbers

Covers

  • Account numbers when they appear with explicit account-style context such as Account, Acct, Bank Account, or Account Number
  • IBAN-like values that match the built-in pattern

Does not cover

  • Bare 8-17 digit values without account context
  • Full IBAN country validation or checksum verification
  • Routing-number-only detection

Medical Record Numbers

Covers

  • Explicitly labeled medical record identifiers such as MRN or Medical Record

Does not cover

  • Unlabeled healthcare identifiers
  • Insurance member IDs, prescription IDs, or other healthcare-adjacent identifiers unless added through custom patterns
  • Validation against provider or hospital systems

Custom Patterns

Covers

  • User-defined regex patterns for organization-specific identifiers
  • Explicit per-pattern masking strategies
  • Guardrails that reject patterns that are too long or too complex for maintainable admin-authored configuration

Does not cover

  • Unlimited regex expressiveness
  • Automatic tuning of custom patterns for precision or recall
  • Protection against poor pattern choices that are syntactically valid but semantically too broad

Custom patterns are intended for trusted operators editing plugin configuration, not untrusted end-user input. The Rust implementation relies on the regex crate, which avoids catastrophic backtracking during matching, and then applies additional length and complexity limits to keep custom expressions readable and cheap to compile.
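The guardrail idea can be pictured as a validation step that runs before a custom pattern is accepted. The sketch below is an assumption throughout: the limit values, the use of capture-group count as a complexity measure, and the function name are hypothetical, and it uses Python's re module, which, unlike the Rust regex crate, is a backtracking engine.

```python
import re

MAX_PATTERN_LENGTH = 256  # hypothetical limit, not the plugin's actual value
MAX_CAPTURE_GROUPS = 8    # hypothetical stand-in for a complexity limit


def validate_custom_pattern(pattern: str) -> "re.Pattern[str]":
    """Reject over-long or over-complex patterns, then compile the rest."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError(f"pattern exceeds {MAX_PATTERN_LENGTH} characters")
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"pattern does not compile: {exc}") from exc
    if compiled.groups > MAX_CAPTURE_GROUPS:
        raise ValueError(f"pattern has more than {MAX_CAPTURE_GROUPS} groups")
    return compiled
```

Checks like these catch unwieldy expressions early, but they cannot judge whether a syntactically valid pattern is too broad; that remains the operator's responsibility.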

Security Notes

  • Detection logging is disabled by default. Enable it only if your logging pipeline is allowed to receive derived PII metadata.
  • Detection-detail metadata is disabled by default. Enable include_detection_details only if downstream consumers are allowed to inspect detected type/count summaries.
  • Whitelist patterns are compiled case-insensitively.
  • Custom patterns must stay within basic length and complexity limits and are meant for trusted admin-authored configuration.
  • Very large strings and oversized nested collections are rejected instead of being scanned indefinitely.
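To make the case-insensitive whitelist note concrete, here is a small Python sketch. The helper names are hypothetical, and treating a whitelist hit as a substring search rather than a full match is an assumption.

```python
import re
from typing import Iterable, List


def build_whitelist(patterns: Iterable[str]) -> List["re.Pattern[str]"]:
    """Compile every whitelist pattern with case-insensitive matching."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_whitelisted(value: str, whitelist: List["re.Pattern[str]"]) -> bool:
    """Return True if any whitelist pattern matches somewhere in the value."""
    return any(p.search(value) for p in whitelist)


whitelist = build_whitelist([r"@example\.com$"])
```

Because matching ignores case, a pattern written in lowercase also exempts values such as Alice@EXAMPLE.com, so whitelist entries should be kept as narrow as possible.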

Masking Notes

  • HASH masking emits the first 16 hexadecimal characters of a salted SHA-256 digest, for example [HASH:8f434346648f6b96].
  • Earlier releases emitted 8 hexadecimal characters. Update downstream parsers if they assumed the shorter fixed-width placeholder.
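The placeholder format can be reproduced with Python's hashlib, as sketched below. How the plugin combines the salt with the value is not specified here, so the salt-prefix concatenation and the function name are assumptions; the output will not match the plugin's digests unless the salting scheme happens to be identical.

```python
import hashlib
import re


def hash_mask(value: str, salt: str) -> str:
    """Return a [HASH:...] placeholder with 16 hexadecimal characters."""
    # Assumption: the salt is simply prepended to the value before hashing.
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"[HASH:{digest[:16]}]"


# A downstream parser for the current fixed-width placeholder.
HASH_PLACEHOLDER = re.compile(r"\[HASH:[0-9a-f]{16}\]")
```

A parser written against the older format would expect 8 hexadecimal characters inside the brackets, so a pattern like HASH_PLACEHOLDER is the kind of thing that needs updating downstream.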

Testing

# Rust unit tests
make test

# Python tests
make test-python

# Benchmarks
make bench

Performance

Expect a 5-10x speedup over an equivalent Python implementation for typical payloads.
