Detection and redaction of Indian-specific Personally Identifiable Information (PII)
indic-pii
indic-pii is a lightweight Python library for detecting and redacting Indian-specific personally identifiable information (PII) from free-form text.
It focuses on common identifiers used in India and exposes a small API for:
- Detecting PII spans with confidence scores
- Redacting matched values with default or custom labels
- Optionally using spaCy NER for names, locations, and organisations
Supported PII Types
The library currently detects:
- Aadhaar numbers
- PAN numbers
- UPI IDs
- Indian mobile numbers
- Bank account numbers
- IFSC codes
- Passport numbers
- Voter IDs
- Driving licence numbers
- Email addresses
- Dates of birth
With optional NER support enabled, it can also detect:
- Person names
- Locations
- Organisations
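As a rough illustration of what pattern-based detection of these identifiers involves, the sketch below matches three of the listed types. The patterns and the `find_candidates` helper are assumptions based on the commonly documented formats of PAN, IFSC, and Indian mobile numbers; they are not the patterns indic-pii itself uses.

```python
import re

# Illustrative patterns only (assumptions, not indic-pii's internal regexes).
PATTERNS = {
    # PAN: five letters, four digits, one letter.
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    # IFSC: four letters, a literal zero, six alphanumerics.
    "IFSC": re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b"),
    # Mobile: optional +91 prefix, then ten digits starting with 6-9.
    "PHONE": re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),
}


def find_candidates(text):
    """Return (label, value, (start, end)) tuples sorted by start offset."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((label, m.group(), m.span()))
    return sorted(found, key=lambda item: item[2][0])


sample = "PAN ABCDE1234F, IFSC SBIN0001234, call +91 9876543210"
for label, value, span in find_candidates(sample):
    print(label, value, span)
```

Format-only patterns like these tend to over-match, which is why checksum and context checks are useful on top of them.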
Installation
Install the base package:
pip install indic-pii
Install with optional NER support:
pip install "indic-pii[ner]"
NER detection also requires a spaCy model, for example:
python -m spacy download en_core_web_sm
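If you are unsure whether the optional pieces are installed, one way to check before enabling NER is sketched below. The `ner_available` helper is hypothetical and not part of indic-pii; it relies only on the standard library and on the fact that spaCy models installed this way are importable as Python packages.

```python
import importlib.util


def ner_available(model_name="en_core_web_sm"):
    """Return True if both spaCy and the named model package can be found."""
    if importlib.util.find_spec("spacy") is None:
        return False
    return importlib.util.find_spec(model_name) is not None


use_ner = ner_available()
print("NER available:", use_ner)
# The result could then be passed on, e.g. PIIDetector(use_ner=use_ner).
```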
Quick Start
from indic_pii import PIIDetector

text = (
    "Rahul's Aadhaar is 3043 3218 1964, PAN is ABCDE1234F, "
    "and phone number is +91 9876543210."
)

# Regex-only detection; NER is switched off.
detector = PIIDetector(use_ner=False)
matches = detector.detect(text)

for match in matches:
    print(match.pii_type, match.value, match.confidence)

print(detector.redact(text))
You can also use the functional API:
import indic_pii
matches = indic_pii.detect("Send to rahul@upi")
redacted = indic_pii.redact("Send to rahul@upi")
Example Output
[
    PIIMatch(type='AADHAAR', value='3043 3218 1964', span=(18, 32), confidence=1.0),
    PIIMatch(type='PAN', value='ABCDE1234F', span=(41, 51), confidence=0.9),
    PIIMatch(type='PHONE', value='+91 9876543210', span=(72, 86), confidence=0.9),
]
Redacted text:
Rahul's Aadhaar is [AADHAAR_REDACTED], PAN is [PAN_REDACTED], and phone number is [PHONE_REDACTED].
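Because each match carries a confidence score, callers can keep only the matches they trust. The sketch below shows one way to do that. `Match` is a hypothetical stand-in for `PIIMatch` so the example runs without the library installed, and its `span` attribute name is an assumption taken from the printed output above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Match:
    """Hypothetical stand-in for PIIMatch (field names are assumptions)."""
    pii_type: str
    value: str
    span: Tuple[int, int]
    confidence: float


def filter_by_confidence(matches: List[Match], threshold: float) -> List[Match]:
    """Keep only matches whose confidence is at least `threshold`."""
    return [m for m in matches if m.confidence >= threshold]


# Values copied from the example output above.
matches = [
    Match("AADHAAR", "3043 3218 1964", (18, 32), 1.0),
    Match("PAN", "ABCDE1234F", (41, 51), 0.9),
    Match("PHONE", "+91 9876543210", (72, 86), 0.9),
]

high = filter_by_confidence(matches, 0.95)
print([m.pii_type for m in high])  # ['AADHAAR']
```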
Custom Redaction Labels
from indic_pii import PIIDetector

detector = PIIDetector(use_ner=False)
text = "Email user@example.com or call 9876543210"

# Map a PII type to the replacement label to use for it.
redacted = detector.redact(
    text,
    custom_labels={
        "EMAIL": "***",
        "PHONE": "<hidden-phone>",
    },
)
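To make the effect of custom labels concrete, here is a minimal sketch of span-based redaction with a per-type label override. The `redact_spans` and `span_of` helpers are hypothetical and are not how indic-pii is necessarily implemented; the default label format follows the `[TYPE_REDACTED]` style shown in the example output.

```python
def span_of(text, value):
    """Return the (start, end) offsets of the first occurrence of `value`."""
    start = text.index(value)
    return (start, start + len(value))


def redact_spans(text, spans, custom_labels=None):
    """Replace each (pii_type, (start, end)) span with its label.

    Spans are processed from right to left so that earlier offsets stay
    valid while the string changes length.
    """
    labels = custom_labels or {}
    out = text
    for pii_type, (start, end) in sorted(spans, key=lambda s: s[1][0], reverse=True):
        label = labels.get(pii_type, f"[{pii_type}_REDACTED]")
        out = out[:start] + label + out[end:]
    return out


text = "Email user@example.com or call 9876543210"
spans = [
    ("EMAIL", span_of(text, "user@example.com")),
    ("PHONE", span_of(text, "9876543210")),
]

print(redact_spans(text, spans))
# Email [EMAIL_REDACTED] or call [PHONE_REDACTED]
print(redact_spans(text, spans, {"EMAIL": "***", "PHONE": "<hidden-phone>"}))
# Email *** or call <hidden-phone>
```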
Notes
- Aadhaar detection can validate candidates using the Verhoeff checksum to reduce false positives.
- Bank account detection is context-aware: a number is reported only when account-related keywords appear nearby.
- NER support is optional and degrades gracefully if spaCy or a compatible model is unavailable.
- This library is regex-first and intended for practical text sanitisation workflows, not as a formal compliance guarantee.
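For readers unfamiliar with the Verhoeff checksum mentioned in the first note, the sketch below shows the standard algorithm and how it could be applied to a 12-digit Aadhaar candidate. The tables are the standard Verhoeff tables; the `looks_like_aadhaar` helper is hypothetical and is not taken from indic-pii.

```python
import re

# Multiplication table of the dihedral group D5.
_D = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
    (2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
    (3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
    (4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
    (5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
    (6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
    (7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
    (8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
    (9, 8, 7, 6, 5, 4, 3, 2, 1, 0),
)

# Position-dependent permutation table (repeats every 8 positions).
_P = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
    (5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
    (8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
    (9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
    (4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
    (2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
    (7, 0, 4, 6, 9, 1, 3, 2, 5, 8),
)


def verhoeff_valid(digits: str) -> bool:
    """Return True if an ASCII digit string (check digit last) passes Verhoeff."""
    c = 0
    # Walk the digits right to left, starting with the check digit.
    for i, ch in enumerate(reversed(digits)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def looks_like_aadhaar(candidate: str) -> bool:
    """Hypothetical helper: 12 digits once spaces are removed, plus a valid checksum."""
    digits = candidate.replace(" ", "")
    return re.fullmatch(r"[0-9]{12}", digits) is not None and verhoeff_valid(digits)


# 2363 is a commonly cited example: payload 236 with check digit 3.
print(verhoeff_valid("2363"))  # True
print(verhoeff_valid("2364"))  # False
```

The checksum catches every single-digit error, so most random 12-digit numbers in free text can be rejected as Aadhaar candidates.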
Public API
Main exports:
- PIIDetector
- PIIMatch
- indic_pii.detect(...)
- indic_pii.redact(...)
- indic_pii.ner
Python Support
indic-pii supports Python 3.8 and newer.
File details
Details for the file indic_pii-0.1.1.tar.gz.
File metadata
- Download URL: indic_pii-0.1.1.tar.gz
- Upload date:
- Size: 16.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5ecfef9f776a5ce0c08634f8e7fe97767fe856fe68a0e753ca5285d437ee23b |
| MD5 | b75d88a7d782176f52782dd208467e14 |
| BLAKE2b-256 | 34bf7ff0543bcf9e50619e788592147336a05644da24830637a714e73306dc6b |
File details
Details for the file indic_pii-0.1.1-py3-none-any.whl.
File metadata
- Download URL: indic_pii-0.1.1-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f36677c18a55c30e7f8801098b60a4e438fe5ad9fc0e4ca5f2f265d101fe60a |
| MD5 | 21fe9d49eaaca0d9e3a84b622002077c |
| BLAKE2b-256 | 8436e2399d8c9904ab5adcbb3165c16d22c73f276023fe80ebc45a4825834290 |