Detection and redaction of Indian-specific Personally Identifiable Information (PII)

indic-pii

indic-pii is a lightweight Python library for detecting and redacting Indian-specific personally identifiable information (PII) from free-form text.

It focuses on common identifiers used in India and exposes a small API for:

  • Detecting PII spans with confidence scores
  • Redacting matched values with default or custom labels
  • Optionally using spaCy NER for names, locations, and organisations

Supported PII Types

The library currently detects:

  • Aadhaar numbers
  • PAN numbers
  • UPI IDs
  • Indian mobile numbers
  • Bank account numbers
  • IFSC codes
  • Passport numbers
  • Voter IDs
  • Driving licence numbers
  • Email addresses
  • Dates of birth

With optional NER support enabled, it can also detect:

  • Person names
  • Locations
  • Organisations

Installation

Install the base package:

pip install indic-pii

Install with optional NER support:

pip install "indic-pii[ner]"

NER detection also requires an installed spaCy model, for example:

python -m spacy download en_core_web_sm
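
Before turning NER on, it can help to check that both spaCy and the model package are actually importable. The following is a minimal sketch using only the standard library; the helper name `ner_available` is hypothetical and not part of indic-pii.

```python
import importlib.util


def ner_available(model_name: str = "en_core_web_sm") -> bool:
    """Return True if both spaCy and the named model package can be found.

    Hypothetical helper for illustration; not part of indic-pii.
    """
    # find_spec looks a top-level package up without importing it.
    for module in ("spacy", model_name):
        if importlib.util.find_spec(module) is None:
            return False
    return True


use_ner = ner_available()
print("NER enabled:", use_ner)
```

The resulting boolean could then be passed along as `PIIDetector(use_ner=use_ner)`.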

Quick Start

from indic_pii import PIIDetector

text = (
    "Rahul's Aadhaar is 3043 3218 1964, PAN is ABCDE1234F, "
    "and phone number is +91 9876543210."
)

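# use_ner=False keeps detection regex-only, so spaCy is not needed.
detector = PIIDetector(use_ner=False)

# Each match exposes the PII type, the matched text, and a confidence score.
matches = detector.detect(text)
for match in matches:
    print(match.pii_type, match.value, match.confidence)

# Replace every detected value with its default placeholder label.
print(detector.redact(text))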

You can also use the functional API:

import indic_pii

matches = indic_pii.detect("Send to rahul@upi")
redacted = indic_pii.redact("Send to rahul@upi")

Example Output

[
    PIIMatch(type='AADHAAR', value='3043 3218 1964', span=(19, 33), confidence=1.0),
    PIIMatch(type='PAN', value='ABCDE1234F', span=(42, 52), confidence=0.9),
    PIIMatch(type='PHONE', value='+91 9876543210', span=(74, 88), confidence=0.9),
]

Redacted text:

Rahul's Aadhaar is [AADHAAR_REDACTED], PAN is [PAN_REDACTED], and phone number is [PHONE_REDACTED].
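
To illustrate how character spans translate into redacted text, here is a rough sketch of span-based replacement. It is not the library's implementation: `redact_spans` is a hypothetical helper, spans are assumed to be 0-based with an exclusive end, and the values are located by plain substring search instead of real detection.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (pii_type, start, end), end exclusive


def redact_spans(text: str, spans: List[Span]) -> str:
    """Replace each span with a [TYPE_REDACTED] placeholder (illustrative only)."""
    # Work from the rightmost span so that earlier offsets stay valid.
    for pii_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        text = text[:start] + f"[{pii_type}_REDACTED]" + text[end:]
    return text


text = (
    "Rahul's Aadhaar is 3043 3218 1964, PAN is ABCDE1234F, "
    "and phone number is +91 9876543210."
)

# Substring search stands in for detection in this sketch.
values = [
    ("AADHAAR", "3043 3218 1964"),
    ("PAN", "ABCDE1234F"),
    ("PHONE", "+91 9876543210"),
]
spans = [(t, text.index(v), text.index(v) + len(v)) for t, v in values]

redacted = redact_spans(text, spans)
print(redacted)
```

Replacing from right to left matters because each placeholder has a different length from the value it replaces, which would otherwise shift the offsets of later spans.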

Custom Redaction Labels

from indic_pii import PIIDetector

detector = PIIDetector(use_ner=False)
text = "Email user@example.com or call 9876543210"

# Keys are PII type names; values are the replacement labels.
redacted = detector.redact(
    text,
    custom_labels={
        "EMAIL": "***",
        "PHONE": "<hidden-phone>",
    },
)
print(redacted)
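
One plausible way to think about custom labels is as a lookup with a fallback to the default placeholder. The sketch below assumes that types missing from `custom_labels` keep the `[TYPE_REDACTED]` form seen in the example output; `label_for` is a hypothetical helper, not a library function.

```python
from typing import Dict, Optional


def label_for(pii_type: str, custom_labels: Optional[Dict[str, str]] = None) -> str:
    """Pick a replacement label for a PII type (illustrative only)."""
    if custom_labels and pii_type in custom_labels:
        return custom_labels[pii_type]
    # Assumed fallback: the default placeholder format from the example output.
    return f"[{pii_type}_REDACTED]"


custom = {"EMAIL": "***", "PHONE": "<hidden-phone>"}
print(label_for("EMAIL", custom))
print(label_for("PAN", custom))
```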

Notes

  • Aadhaar detection can validate candidates using the Verhoeff checksum to reduce false positives.
  • Bank account detection is context-aware: a candidate number is only reported when account-related keywords appear nearby.
  • NER support is optional and degrades gracefully if spaCy or a compatible model is unavailable.
  • This library is regex-first and intended for practical text sanitisation workflows, not as a formal compliance guarantee.
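
For readers unfamiliar with the Verhoeff checksum mentioned above, the following is a self-contained sketch of the standard algorithm. It shows how a 12-digit candidate can be checked, but it is not taken from the library's source; the function name `verhoeff_valid` is an assumption for illustration.

```python
# Standard Verhoeff tables: D is the dihedral group D5 multiplication table,
# P is the position-dependent permutation table.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_valid(number: str) -> bool:
    """Return True if the digit string (spaces ignored) passes the Verhoeff check."""
    digits = number.replace(" ", "")
    if not digits.isdigit():
        return False
    checksum = 0
    # Process digits right to left; the permutation depends on the position.
    for position, char in enumerate(reversed(digits)):
        checksum = D[checksum][P[position % 8][int(char)]]
    return checksum == 0


print(verhoeff_valid("3043 3218 1964"))
print(verhoeff_valid("3043 3218 1965"))
```

Because the checksum catches every single-digit error, a random 12-digit number in the text has only about a one-in-ten chance of passing, which is what makes it useful for filtering out false positives.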

Public API

Main exports:

  • PIIDetector
  • PIIMatch
  • indic_pii.detect(...)
  • indic_pii.redact(...)
  • indic_pii.ner
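
As an illustration of working with match results, the sketch below filters matches by confidence. `MatchRecord` is a hypothetical stand-in whose fields mirror the attributes used in the Quick Start (`pii_type`, `value`, `confidence`) plus a `span`; it is not the library's `PIIMatch` class, and the threshold is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MatchRecord:
    """Hypothetical stand-in for a PIIMatch-like result; not the library's class."""
    pii_type: str
    value: str
    span: Tuple[int, int]  # assumed 0-based, end exclusive
    confidence: float


def make_record(text: str, pii_type: str, value: str, confidence: float) -> MatchRecord:
    """Build a record, locating the value by substring search (illustrative only)."""
    start = text.index(value)
    return MatchRecord(pii_type, value, (start, start + len(value)), confidence)


def filter_confident(matches: List[MatchRecord], threshold: float = 0.95) -> List[MatchRecord]:
    """Keep only matches at or above the confidence threshold."""
    return [m for m in matches if m.confidence >= threshold]


text = "Aadhaar is 3043 3218 1964, PAN is ABCDE1234F."
matches = [
    make_record(text, "AADHAAR", "3043 3218 1964", 1.0),
    make_record(text, "PAN", "ABCDE1234F", 0.9),
]

for m in filter_confident(matches):
    print(m.pii_type, m.value, m.span)
```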

Python Support

indic-pii supports Python 3.8 and newer.
