Multi-language PII detection with regex and checksum validation. Pure Python, zero runtime dependencies.
pii-core
The foundation library that pii-veil (reversible anonymization) and pii-presidio (Microsoft Presidio plugin) build on.
Install
pip install pii-core
What's in v0.1.0
Polish identifiers with regex + checksum validation:
- PESEL (national ID), NIP (tax ID), REGON (business registry, 9- and 14-digit): weighted-sum checksums.
- Polish IBAN (PL prefix, mod-97 validated).
- Polish ID card (dowód osobisty) and passport: regex only (official checksums not yet implemented).
- Polish mobile phone (+48, optional separators).
- Opt-in: KRS court register number (10 digits, no checksum) and postal code (XX-XXX). Both are excluded from DEFAULT_DETECTORS because their raw patterns collide with ordinary text; pair them with a context-word filter.
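The weighted-sum checks above follow publicly documented algorithms. The sketch below is an independent, dependency-free illustration written from those published rules, not pii-core's source; pesel_ok, nip_ok and regon_ok are hypothetical names. It assumes the commonly published weights (PESEL: 1, 3, 7, 9 repeating, mod 10; NIP and REGON: mod 11) and verifies only the check digit, not the birth date encoded in a PESEL.

```python
# Independent sketch of the published Polish checksum rules (not pii-core code).
# Function names are hypothetical.

PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)
REGON9_WEIGHTS = (8, 9, 2, 3, 4, 5, 6, 7)
REGON14_WEIGHTS = (2, 4, 8, 5, 0, 9, 7, 3, 6, 1, 2, 4, 8)


def _digits_only(value: str) -> str:
    """Drop the separators people commonly type (dashes and spaces)."""
    return value.replace("-", "").replace(" ", "")


def _is_ascii_digits(value: str) -> bool:
    return value.isascii() and value.isdigit()


def _weighted_sum(digits: str, weights: tuple) -> int:
    # zip stops at the shorter input, so the final check digit is not weighted.
    return sum(int(d) * w for d, w in zip(digits, weights))


def pesel_ok(value: str) -> bool:
    """PESEL: 11 digits; check digit = (10 - weighted_sum % 10) % 10."""
    if len(value) != 11 or not _is_ascii_digits(value):
        return False
    check = (10 - _weighted_sum(value, PESEL_WEIGHTS) % 10) % 10
    return check == int(value[-1])


def nip_ok(value: str) -> bool:
    """NIP: 10 digits; check digit = weighted_sum % 11 (a result of 10 never matches)."""
    digits = _digits_only(value)
    if len(digits) != 10 or not _is_ascii_digits(digits):
        return False
    return _weighted_sum(digits, NIP_WEIGHTS) % 11 == int(digits[-1])


def regon_ok(value: str) -> bool:
    """REGON: 9 or 14 digits; check digit = weighted_sum % 11, with 10 mapped to 0."""
    if len(value) not in (9, 14) or not _is_ascii_digits(value):
        return False
    weights = REGON9_WEIGHTS if len(value) == 9 else REGON14_WEIGHTS
    check = _weighted_sum(value, weights) % 11
    return (0 if check == 10 else check) == int(value[-1])
```

Because zip() truncates to the weight tuple, the same helper serves all three identifiers; only the weights and the final modulus rule differ.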
Cross-language detectors:
- Email addresses (practical subset, not strict RFC 5322).
- Credit-card numbers (Luhn-validated; bare, dashed, and spaced shapes for Visa / MC / Amex / Discover / Diners).
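A minimal sketch of the Luhn check used for card numbers, written from the standard algorithm rather than taken from pii-core (luhn_ok is a hypothetical name). A Luhn pass only means the digits are self-consistent; roughly one random digit string in ten passes, which is why the brand-specific shapes listed above still matter.

```python
def luhn_ok(number: str) -> bool:
    """Luhn (mod 10) check; spaces and dashes are ignored."""
    digits = number.replace(" ", "").replace("-", "")
    if not digits or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return total % 10 == 0
```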
Multi-country IBAN validator (is_valid_iban) covering ~80 countries via the published SWIFT registry. The PlIbanDetector regex is Polish-only, but the checksum function is general.
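IBAN check digits use ISO 7064 MOD 97-10: move the first four characters to the end, replace each letter with 10 to 35, and the resulting integer must leave remainder 1 modulo 97. The sketch below (iban_mod97_ok is a hypothetical name) implements only that arithmetic. Unlike the registry-based validator described above, it does not check per-country lengths or BBAN formats.

```python
import string

# ISO 13616 letter values: A=10, B=11, ..., Z=35.
_LETTER_VALUES = {c: str(i) for i, c in enumerate(string.ascii_uppercase, start=10)}


def iban_mod97_ok(iban: str) -> bool:
    """Verify IBAN check digits with MOD 97-10 (no per-country length check)."""
    compact = iban.replace(" ", "")
    if len(compact) < 5 or not compact.isascii():
        return False
    compact = compact.upper()
    if not compact.isalnum():
        return False
    # Country code must be two letters, check digits must be two digits.
    if not (compact[:2].isalpha() and compact[2:4].isdigit()):
        return False
    # Move country code + check digits to the end, then expand letters to numbers.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(_LETTER_VALUES.get(c, c) for c in rearranged)
    return int(numeric) % 97 == 1
```

Python's arbitrary-precision int makes the direct int(numeric) % 97 safe; languages with fixed-width integers usually fold the string into the remainder a few digits at a time instead.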
Quick usage
```python
from pii_core import DEFAULT_DETECTORS

# Polish sample text: "My PESEL: ..., contact: ..."
text = "Mój PESEL: 44051401359, kontakt: jan@example.pl."
for detector in DEFAULT_DETECTORS:
    for match in detector.detect(text):
        print(f"{match.detector}: {match.value!r} at {match.start}-{match.end}")
```

```python
from pii_core import is_valid_pesel, is_valid_iban, is_valid_luhn

is_valid_pesel("44051401359")            # True
is_valid_iban("DE89370400440532013000")  # True
is_valid_luhn("4111111111111111")        # True
```
Opt-in detectors (KRS, postal code) live in pii_core.pl:
```python
from pii_core import DEFAULT_DETECTORS
from pii_core.pl import PlKrsDetector, PlPostalCodeDetector

# Add them only when you have context-word filtering elsewhere in your pipeline.
my_detectors = [*DEFAULT_DETECTORS, PlKrsDetector(), PlPostalCodeDetector()]
```
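One way to supply that context-word filtering, sketched with plain re patterns standing in for the opt-in detectors: a match is kept only if a keyword appears in a short window before it. The patterns, keyword lists, window size and find_with_context are all illustrative assumptions, not part of pii-core.

```python
import re

# Illustrative stand-ins for the opt-in detectors (assumed shapes, not pii-core's regexes).
KRS_RE = re.compile(r"(?<!\d)\d{10}(?!\d)")          # 10 digits, no checksum
POSTAL_RE = re.compile(r"(?<!\d)\d{2}-\d{3}(?!\d)")  # XX-XXX

# Hypothetical Polish keyword lists: "rejestr" = register, "kod pocztowy" = postal code,
# "ul." = street, "adres" = address.
KRS_CONTEXT = ("krs", "rejestr")
POSTAL_CONTEXT = ("kod pocztowy", "ul.", "adres")


def find_with_context(pattern, keywords, text, window=30):
    """Yield (start, end, value) for matches with a keyword in the preceding `window` chars."""
    for m in pattern.finditer(text):
        # Lower-case only the slice, so match offsets stay valid for `text`.
        before = text[max(0, m.start() - window):m.start()].lower()
        if any(k in before for k in keywords):
            yield m.start(), m.end(), m.group()
```

With this filter, an order number such as "Zamówienie 1234567890" is skipped, while "Numer KRS: 0000123456" is kept.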
API stability
PIIType value strings, detector .name strings, and the order of DEFAULT_DETECTORS are SemVer-stable: downstream consumers persist them in serialized mappings or use them as overlap-resolution priority keys. Internal changes (regex tweaks, helper renames) can churn within minor versions.
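To illustrate why detector order matters, here is a hypothetical overlap resolver that ranks detectors by their position in a priority list and keeps the highest-ranked non-overlapping spans, breaking ties by span length. The Match class mirrors the attributes used in Quick usage (detector, value, start, end) but is a stand-in, and the detector names are made up; the actual resolution rules in pii-core and its sibling packages may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    # Hypothetical stand-in for a detector match.
    detector: str
    value: str
    start: int
    end: int


def resolve_overlaps(matches, priority):
    """Keep non-overlapping matches, preferring detectors earlier in `priority`,
    then longer spans. Unknown detectors rank last."""
    rank = {name: i for i, name in enumerate(priority)}
    ordered = sorted(
        matches,
        key=lambda m: (rank.get(m.detector, len(rank)), -(m.end - m.start), m.start),
    )
    kept = []
    for m in ordered:
        # Half-open spans [start, end) overlap unless one ends before the other starts.
        if all(m.end <= k.start or m.start >= k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)
```

Reordering the priority list would change which overlapping span survives, which is why a stable order matters to consumers that rely on it.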
Sibling packages
- pii-veil: reversible anonymization with a persisted mapping and a CLI, built on pii-core.
- pii-presidio: Microsoft Presidio plugin wrapping pii-core recognizers, with optional reversible anonymization.
License
Apache-2.0. See LICENSE.
Download files
File details
Details for the file pii_core-0.1.0.tar.gz.
File metadata
- Download URL: pii_core-0.1.0.tar.gz
- Upload date:
- Size: 23.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f27e66c42f4ef4948a1da751c677e214ccc083c322b38b9a4111a194f171c860 |
| MD5 | a5017244d6280e9040cb2ea3ad19b9c1 |
| BLAKE2b-256 | 087d79ea3bb93960c7f0cd7f6f4e44480ff9253a8f3be554bac6bb0d8b7a7511 |
File details
Details for the file pii_core-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pii_core-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 859d6e4fb7c0b3093687b5d09b68e287d39a65fc494b54eada96d07d2523a222 |
| MD5 | 4bfb30e39b76d885acc666116c75ecc8 |
| BLAKE2b-256 | a02284ac97c58607acb1da8ad255a66277a58103e604d91de9516f20f437777a |
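To check a downloaded file against the SHA256 digests listed for each file, hash it locally and compare. Below is a small sketch that uses only hashlib (sha256_matches is a hypothetical helper name). pip can also enforce this itself in hash-checking mode, using --require-hashes and --hash=sha256:<digest> entries in a requirements file.

```python
import hashlib
from pathlib import Path


def sha256_matches(path, expected_hex, chunk_size=1 << 16):
    """Stream the file through SHA-256 and compare with the published hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, run sha256_matches("pii_core-0.1.0.tar.gz", "<SHA256 from the table>") in the download directory; it returns True only when the digests agree.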