Multi-language PII detection with regex and checksum validation. Pure Python, zero runtime dependencies.

pii-core

The foundation library that pii-veil (reversible anonymization) and pii-presidio (Microsoft Presidio plugin) build on.

Install

pip install pii-core

What's in v0.1.0

Polish identifiers with regex + checksum validation:

  • PESEL (national ID), NIP (tax ID), REGON (business registry — 9 and 14 digit) — weighted-sum checksums.
  • Polish IBAN (PL prefix, mod-97 validated).
  • Polish ID card (dowód osobisty), passport — regex only (official checksums not yet implemented).
  • Polish mobile phone (+48, optional separators).
  • Opt-in: KRS court register number (10 digits, no checksum), postal code (XX-XXX). Excluded from DEFAULT_DETECTORS because their raw patterns collide with ordinary text — pair them with a context-word filter.
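As an illustration of what a weighted-sum checksum involves, here is a minimal sketch of the publicly documented PESEL check. It is not pii-core's own code: the function name `pesel_checksum_ok` and the constant `PESEL_WEIGHTS` are made up for this example, and the library's `is_valid_pesel` may apply further checks (for instance on the encoded birth date).

```python
# Sketch only: standalone PESEL checksum, not the pii-core implementation.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)  # weights for the first 10 digits


def pesel_checksum_ok(value: str) -> bool:
    """Return True if `value` is 11 ASCII digits with a matching check digit."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return False
    digits = [int(ch) for ch in value]
    # zip() stops after the 10 weights, so the check digit is left out of the sum.
    total = sum(weight * digit for weight, digit in zip(PESEL_WEIGHTS, digits))
    # The check digit is whatever brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10 == digits[10]


print(pesel_checksum_ok("44051401359"))
print(pesel_checksum_ok("44051401358"))
```

NIP and REGON follow the same pattern with different weights and a modulo-11 reduction.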

Cross-language detectors:

  • Email addresses (practical subset, not strict RFC 5322).
  • Credit-card numbers (Luhn-validated; bare, dashed, and spaced shapes for Visa / MC / Amex / Discover / Diners).
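For readers unfamiliar with Luhn validation, the sketch below shows the standard algorithm applied to the bare, dashed, and spaced shapes mentioned above. The name `luhn_ok` is hypothetical and the code is independent of pii-core; the library's `is_valid_luhn` may differ in how it treats separators or length.

```python
# Sketch only: textbook Luhn check, not the pii-core implementation.
def luhn_ok(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn check."""
    compact = candidate.replace(" ", "").replace("-", "")
    if len(compact) < 2 or not (compact.isascii() and compact.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, ch in enumerate(reversed(compact)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


for sample in ("4111111111111111", "4111-1111-1111-1111", "4111 1111 1111 1112"):
    print(sample, luhn_ok(sample))
```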

Multi-country IBAN validator (is_valid_iban) covering ~80 countries via the published SWIFT registry. The PlIbanDetector regex is Polish-only, but the checksum function is general.
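The mod-97 step is the same for every country, which is why the checksum function can be general. A minimal sketch of that step follows; `iban_mod97_ok` is a hypothetical name, and unlike a registry-backed validator it does not check country-specific lengths or formats.

```python
# Sketch only: the generic mod-97 check, without per-country length rules.
def iban_mod97_ok(iban: str) -> bool:
    """Return True if `iban` passes the mod-97 checksum."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Map letters to numbers (A=10 ... Z=35); digits map to themselves.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(iban_mod97_ok("DE89370400440532013000"))
print(iban_mod97_ok("GB82 WEST 1234 5698 7654 32"))
```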

Quick usage

from pii_core import DEFAULT_DETECTORS

text = "Mój PESEL: 44051401359, kontakt: jan@example.pl."

for detector in DEFAULT_DETECTORS:
    for match in detector.detect(text):
        print(f"{match.detector}: {match.value!r} at {match.start}-{match.end}")

The checksum validators can also be called directly:

from pii_core import is_valid_pesel, is_valid_iban, is_valid_luhn

is_valid_pesel("44051401359")            # True
is_valid_iban("DE89370400440532013000")  # True
is_valid_luhn("4111111111111111")        # True

Opt-in detectors (KRS, postal code) live in pii_core.pl:

from pii_core import DEFAULT_DETECTORS
from pii_core.pl import PlKrsDetector, PlPostalCodeDetector

# Add them only when you have context-word filtering elsewhere in your pipeline.
my_detectors = [*DEFAULT_DETECTORS, PlKrsDetector(), PlPostalCodeDetector()]
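One possible shape for such a context-word filter is sketched below. Everything in it is an assumption made for illustration: the `Span` dataclass, the `pl_postal_code` name, the word list, and the 30-character window are not part of pii-core, and the regex stands in for whatever `PlPostalCodeDetector` actually uses.

```python
# Sketch only: a hypothetical context-word filter, independent of pii-core.
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    detector: str
    value: str
    start: int
    end: int


POSTAL_CODE = re.compile(r"\b\d{2}-\d{3}\b")       # raw XX-XXX shape
CONTEXT_WORDS = ("kod pocztowy", "adres", "ul.")   # illustrative word list


def has_context(text: str, span: Span, window: int = 30) -> bool:
    """True if any context word appears within `window` chars of the span."""
    left = max(0, span.start - window)
    nearby = text[left:span.end + window].lower()
    return any(word in nearby for word in CONTEXT_WORDS)


def find_postal_codes(text: str) -> list[Span]:
    """Keep only XX-XXX matches that have an address-like word nearby."""
    spans = [
        Span("pl_postal_code", m.group(), m.start(), m.end())
        for m in POSTAL_CODE.finditer(text)
    ]
    return [span for span in spans if has_context(text, span)]


print(find_postal_codes("Adres: ul. Długa 5, 00-950 Warszawa"))
print(find_postal_codes("Wynik: 12-345 punktów"))
```

The first call keeps the match because "adres" and "ul." appear nearby; the second drops it because no context word is in range.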

API stability

PIIType value strings, detector .name strings, and the order of DEFAULT_DETECTORS are SemVer-stable: downstream consumers persist them in serialized mappings or use them as overlap-resolution priority keys. Internal changes (regex tweaks, helper renames) can churn within minor versions.
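To show why list order matters to consumers, here is a rough sketch of overlap resolution keyed on detector order. The `Span` type, the detector names, and `resolve_overlaps` are hypothetical; downstream packages may resolve overlaps differently.

```python
# Sketch only: hypothetical overlap resolution using detector order as priority.
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    detector: str
    start: int
    end: int


def resolve_overlaps(spans: list[Span], priority: list[str]) -> list[Span]:
    """Keep the highest-priority span wherever two spans overlap."""
    rank = {name: position for position, name in enumerate(priority)}
    kept: list[Span] = []
    # Visit spans from highest to lowest priority; earlier in the list wins.
    for span in sorted(spans, key=lambda s: (rank[s.detector], s.start)):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


priority = ["pesel", "phone", "email"]  # stand-in for the detector order
spans = [Span("phone", 0, 9), Span("pesel", 0, 11), Span("email", 20, 30)]
print(resolve_overlaps(spans, priority))
```

If the order of the priority list changed between releases, the same input could resolve to a different winner, which is why the ordering is treated as part of the stable surface.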

Sibling packages

  • pii-veil — reversible anonymization with persisted mapping and CLI, built on pii-core.
  • pii-presidio — Microsoft Presidio plugin wrapping pii-core recognizers with optional reversible anonymization.

License

Apache-2.0. See LICENSE.
