KVKK-compliant Turkish PII detection library
kvkk-pii
KVKK-compliant Turkish PII detection library — fully on-premise, no cloud.
Detect, anonymize, and protect personally identifiable information in Turkish text. Built for KVKK (Turkish data protection law) compliance, with a 3-layer architecture that combines regex, NER, and zero-shot classification.
from kvkk_pii import PiiDetector
detector = PiiDetector()
result = detector.analyze("Ali Veli, TC: 10000000146, tel: 0532 123 45 67")
for e in result.entities:
    print(e)
# PiiEntity(type='TC_KIMLIK', text='10000000146', start=14, end=25, score=1.00, layer='regex')
# PiiEntity(type='TELEFON_TR', text='0532 123 45 67', start=32, end=46, score=1.00, layer='regex')
Features
- Zero cloud — all models run locally, no data leaves your machine
- 3-layer detection: Regex + checksum → XLM-RoBERTa NER → GLiNER zero-shot
- KVKK Madde 6 support — special categories: health, religion, biometrics, political opinion
- LLM proxy — mask PII before sending to AI, restore in the response, detect leakage
- Compliance report — maps detected entities to KVKK articles and risk levels
- Pluggable — add custom recognizers, tune thresholds per entity type
- Async — `AsyncPiiDetector` for FastAPI / async applications
- CLI — `kvkk-pii scan`, `kvkk-pii anonymize`
Installation
# Layer 1 only — regex + checksum (no dependencies)
pip install kvkk-pii
# + Layer 2 — XLM-RoBERTa NER (~450 MB, Turkish NER)
pip install kvkk-pii[ner]
# + Layer 3 — GLiNER zero-shot (~180 MB, KVKK Madde 6)
pip install kvkk-pii[full]
Models are downloaded from HuggingFace on first use and cached at ~/.cache/huggingface/hub.
Quickstart
Detect & Anonymize
from kvkk_pii import PiiDetector
detector = PiiDetector() # regex only (Layer 1)
text = "Müşteri Ali Veli, IBAN: TR33 0006 1005 1978 6457 8413 26, e-posta: ali@example.com"
result = detector.analyze(text)
print(result.entities)
# [PiiEntity(type='IBAN_TR', ...), PiiEntity(type='EMAIL', ...)]
print(detector.anonymize(text))
# "Müşteri Ali Veli, IBAN: [IBAN_TR], e-posta: [EMAIL]"
With NER (Person, Location, Organization)
detector = PiiDetector(layers=["regex", "ner"])
# First run: prompts to download akdeniz27/xlm-roberta-base-turkish-ner (~450 MB)
result = detector.analyze("Ahmet Yılmaz, İstanbul'daki Türk Telekom şubesine gitti.")
# Detects: KISI_ADI (Ahmet Yılmaz), KONUM (İstanbul), KURUM (Türk Telekom)
With GLiNER — KVKK Madde 6 Special Categories
detector = PiiDetector(layers=["regex", "ner", "gliner"])
result = detector.analyze("Hasta diyabet tedavisi görüyor, Sünni mezhebine mensup.")
# Detects: SAGLIK_VERISI, DINI_INANC
Ready-Made Presets
from kvkk_pii import presets
detector = presets.turkish() # Regex + NER (TR) + GLiNER — full KVKK coverage
detector = presets.german() # Regex (DE) + GLiNER — DSGVO
detector = presets.french() # Regex (FR) + GLiNER — RGPD
detector = presets.multilingual() # TR + DE + FR together
Layer Architecture
| Layer | Method | Model | Speed | Detects |
|---|---|---|---|---|
| 1 | Regex + checksum | — | <1ms | TC Kimlik, IBAN, VKN, phone, plate, email, passport |
| 2 | NER | `akdeniz27/xlm-roberta-base-turkish-ner` | ~30ms | Person, Location, Organization |
| 3 | Zero-shot NER | `urchade/gliner_multi-v2.1` | ~80ms | KVKK Madde 6 special categories |
Each layer only processes spans not already found by a previous layer, avoiding double-detection.
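The precedence rule above can be sketched as follows. This is a minimal illustration, not the library's actual implementation: `Span` and `merge_layers` are hypothetical names, and treating "already found" as "overlaps an accepted span" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a detected entity (not the library's PiiEntity)."""
    type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    layer: str


def overlaps(a: Span, b: Span) -> bool:
    """Two half-open ranges overlap when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def merge_layers(layer_outputs: list[list[Span]]) -> list[Span]:
    """Accept spans in layer order; drop any span that overlaps an accepted one."""
    accepted: list[Span] = []
    for spans in layer_outputs:  # ordered: regex, then NER, then GLiNER
        for span in spans:
            if not any(overlaps(span, kept) for kept in accepted):
                accepted.append(span)
    return sorted(accepted, key=lambda s: s.start)


regex_spans = [Span("TC_KIMLIK", 14, 25, "regex")]
# The second NER span overlaps the regex hit, so it is discarded.
ner_spans = [Span("KISI_ADI", 0, 8, "ner"), Span("KURUM", 20, 30, "ner")]
merged = merge_layers([regex_spans, ner_spans])
print([(s.type, s.layer) for s in merged])
```

Giving the checksum-validated regex layer first claim on a span means a lower-confidence model cannot relabel a verified identifier.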
LLM Proxy
Protect PII when sending text to external AI services. Mask before sending, restore after, detect any leakage.
Session-Based Masking
detector = PiiDetector(layers=["regex", "ner"])
session = detector.create_session("Ali Veli TC: 10000000146 hakkında bilgi ver.")
masked = session.mask()
# → "[KISI_ADI_x7k] TC: [TC_KIMLIK_a3f] hakkında bilgi ver."
ai_response = call_openai(masked) # your AI call
restored = session.restore(ai_response)
# Placeholders in AI response replaced back with originals
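One way to picture the mask/restore round trip is a placeholder-to-original mapping kept on the session. The sketch below is an assumption about the general technique, not the library's code: `mask` and `restore` are hypothetical helpers, spans are assumed not to overlap, and a counter replaces the random suffixes shown above so the output is reproducible.

```python
import itertools


def mask(text: str, spans: list[tuple[int, int, str]]) -> tuple[str, dict[str, str]]:
    """Replace each (start, end, type) span with a unique placeholder.

    Returns the masked text and a placeholder -> original mapping.
    Hypothetical helper; assumes spans do not overlap.
    """
    counter = itertools.count(1)
    mapping: dict[str, str] = {}
    pieces: list[str] = []
    cursor = 0
    for start, end, entity_type in sorted(spans):
        placeholder = f"[{entity_type}_{next(counter):03d}]"
        mapping[placeholder] = text[start:end]
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])  # tail after the last span
    return "".join(pieces), mapping


def restore(response: str, mapping: dict[str, str]) -> str:
    """Swap every placeholder in the response back to its original value."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response


text = "Ali Veli TC: 10000000146 hakkında bilgi ver."
spans = [(0, 8, "KISI_ADI"), (13, 24, "TC_KIMLIK")]
masked, mapping = mask(text, spans)
print(masked)
print(restore(masked, mapping) == text)
```

Unique placeholders matter when the same entity type occurs more than once: a bare `[KISI_ADI]` could not be mapped back to the right person.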
Two-Way Proxy (mask → AI → leakage check → restore)
import openai

result = detector.two_way(
    prompt="Ali Veli'nin TC numarası 10000000146, özet çıkar.",
    call_fn=lambda masked: openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": masked}],
    ).choices[0].message.content,
    on_leak="warn",  # "raise" | "warn" | "ignore"
)
print(result.output) # restored AI response
print(result.report.safe) # True if no PII leaked
print(result.report.summary()) # leakage summary
Leakage Detection
from kvkk_pii import LeakageAnalyzer
analyzer = detector.leakage_analyzer()
report = analyzer.analyze(session, raw_ai_response)
report.safe # bool
report.leaked # entities that leaked through placeholders
report.new_pii # PII in AI output not present in input (hallucination?)
report.risk_score # 0.0–1.0
print(report.summary())
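The `leaked` part of such a report can be approximated by checking whether any masked original reappears verbatim in the raw AI response. The following is a rough sketch under that assumption. `check_leakage` and its ratio-based score are hypothetical, and the `new_pii` check is left out, since it would presumably require running detection again on the response.

```python
def check_leakage(response: str, mapping: dict[str, str]) -> dict:
    """Report original values that appear verbatim in the raw AI response.

    `mapping` is placeholder -> original value, as built during masking.
    Hypothetical helper, not the library's LeakageAnalyzer.
    """
    leaked = [original for original in mapping.values() if original in response]
    return {
        "safe": not leaked,
        "leaked": leaked,
        # Share of masked values that resurfaced; a crude stand-in for a risk score.
        "risk_score": len(leaked) / len(mapping) if mapping else 0.0,
    }


mapping = {"[KISI_ADI_001]": "Ali Veli", "[TC_KIMLIK_002]": "10000000146"}
clean = check_leakage("[KISI_ADI_001] için özet hazır.", mapping)
leaky = check_leakage("Ali Veli için özet hazır.", mapping)
print(clean["safe"], leaky["leaked"], leaky["risk_score"])
```

The check has to run on the response before placeholders are restored; afterwards every original value is present by design.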
Compliance Report
Maps detected entities to KVKK articles with risk levels and recommendations.
report = detector.compliance_report(text)
print(report.summary())
# KVKK Uyum Raporu — 4 veri, genel risk: YÜKSEK
# KVKK Madde 6 (Özel Nitelikli Veri) tespit edildi!
#
# [KRİTİK] SAGLIK_VERISI x 1
# Dayanak: KVKK Madde 6 — Özel Nitelikli Kişisel Veri
# Öneri : Açık rıza zorunlu. Yetkili kurum olmadan işlenemez.
# [YÜKSEK] TC_KIMLIK x 1
# ...
report.has_madde6 # True if KVKK Article 6 data found
report.overall_risk # "düşük" | "orta" | "yüksek" | "kritik"
report.to_dict() # JSON-serializable
Async
import asyncio

from fastapi import FastAPI
from kvkk_pii import AsyncPiiDetector

app = FastAPI()
detector = AsyncPiiDetector(layers=["regex", "ner"])

# FastAPI example
@app.post("/scan")
async def scan(text: str):
    result = await detector.analyze(text)
    return [e.__dict__ for e in result.entities]

# Parallel processing (inside a coroutine)
results = await asyncio.gather(*[detector.analyze(t) for t in texts])

# Async two_way (inside a coroutine)
result = await detector.two_way(prompt, async_call_fn)
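How `AsyncPiiDetector` runs its models internally is not described here. A common pattern for exposing blocking, CPU-bound analysis to async code is to offload it to a worker thread and cap concurrency. The sketch below shows that pattern with a stand-in regex function; `analyze_sync`, `analyze`, and the limit of 4 are illustrative assumptions.

```python
import asyncio
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def analyze_sync(text: str) -> list[str]:
    """Stand-in for a blocking detector call (here: a simple email regex)."""
    return EMAIL_RE.findall(text)


async def analyze(text: str, limit: asyncio.Semaphore) -> list[str]:
    """Run the blocking call in a worker thread, bounded by a semaphore."""
    async with limit:
        return await asyncio.to_thread(analyze_sync, text)


async def main(texts: list[str]) -> list[list[str]]:
    limit = asyncio.Semaphore(4)  # at most 4 analyses in flight (arbitrary choice)
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(analyze(t, limit) for t in texts))


results = asyncio.run(main(["ali@example.com yazdı", "e-posta yok"]))
print(results)
```

Bounding concurrency keeps a burst of requests from loading the event loop's thread pool with more model calls than the machine can serve.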
CLI
# Scan text
kvkk-pii scan "Ali Veli TC: 10000000146"
# Scan file
kvkk-pii scan belge.txt
# Pipe
cat belge.txt | kvkk-pii scan
# With NER layer
kvkk-pii scan --layer ner "Ahmet Yılmaz İstanbul'da"
# JSON output
kvkk-pii scan --format json "TC: 10000000146"
# Anonymize
kvkk-pii anonymize "Ali Veli TC: 10000000146"
# → "Ali Veli TC: [TC_KIMLIK]"
# Version
kvkk-pii version
Custom Recognizers
import re

from kvkk_pii import BaseRecognizer, PiiDetector, PiiEntity
from kvkk_pii.layers.regex_layer import DEFAULT_RECOGNIZERS

class SicilNoRecognizer(BaseRecognizer):
    entity_type = "SICIL_NO"

    def find(self, text: str) -> list[PiiEntity]:
        # Match registry numbers of the form SCL-123456
        return [
            self._entity(m.group(), m.start(), m.end(), score=1.0)
            for m in re.finditer(r"\bSCL-\d{6}\b", text)
        ]

detector = PiiDetector(recognizers=DEFAULT_RECOGNIZERS + [SicilNoRecognizer()])
Configuration
Fine-tune recognizer strictness via config dataclasses:
from kvkk_pii import PiiDetector
from kvkk_pii.config import NerConfig, GlinerConfig, TcKimlikConfig
from kvkk_pii.recognizers.tc_kimlik import TcKimlikRecognizer
from kvkk_pii.layers.regex_layer import DEFAULT_RECOGNIZERS
detector = PiiDetector(
    layers=["regex", "ner", "gliner"],
    recognizers=DEFAULT_RECOGNIZERS + [
        TcKimlikRecognizer(TcKimlikConfig(allow_spaced=True, require_checksum=True))
    ],
    download_policy="auto",  # "confirm" (default) | "auto" | "never"
    ner_config=NerConfig(
        min_score=0.85,  # higher = fewer false positives
        chunk_size=400,  # chars per chunk for long texts
    ),
    gliner_config=GlinerConfig(
        threshold=0.5,
    ),
)
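To make the `chunk_size` setting concrete, here is one plausible way to split a long text into character-bounded chunks while keeping the offsets needed to map entity positions back to the full text. The library's actual chunking strategy is not documented here; `chunk_text` and its cut-at-whitespace rule are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 400) -> list[tuple[int, str]]:
    """Split text into (offset, chunk) pairs of at most chunk_size characters.

    Prefers to cut after the last space inside the window so words are not
    split; falls back to a hard cut when the window contains no usable space.
    Hypothetical helper, not the library's implementation.
    """
    chunks: list[tuple[int, str]] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1  # keep the space with the left-hand chunk
        chunks.append((start, text[start:end]))
        start = end
    return chunks


text = "Ahmet Yılmaz İstanbul'da yaşıyor. " * 30
chunks = chunk_text(text, chunk_size=400)

# A span found at local index i in a chunk sits at offset + i in the full text.
offset, chunk = chunks[1]
local = chunk.find("Ahmet")
print(len(chunks), text[offset + local : offset + local + 5])
```

An entity that straddles a cut can still be missed with this scheme; overlapping windows are a common remedy, at the cost of deduplicating hits.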
Detected Entity Types
Layer 1 — Regex
| Entity | Description | Validation |
|---|---|---|
| `TC_KIMLIK` | Turkish national ID (11 digits) | Checksum |
| `VKN` | Tax ID (10 digits) | Checksum |
| `IBAN_TR` | IBAN (all country codes) | Mod97 |
| `KREDI_KARTI` | Credit card number | Luhn |
| `TELEFON_TR` | Turkish phone numbers | — |
| `EMAIL` | Email address | — |
| `IP_ADRESI` | IPv4 address | — |
| `PLAKA_TR` | Turkish license plate | — |
| `PASAPORT_TR` | Turkish passport | — |
| `SGK_NO` | Social security number | — |
| `ADRES` | Street address | — |
| `TARIH` | Date | — |
| `KISI_ADI` | Person name (title-based) | — |
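The checks named in the Validation column are standard public algorithms. The sketch below shows the usual form of three of them (the TC Kimlik check digits, the IBAN mod-97 test, and Luhn). The function names are hypothetical, and the library's own recognizers may differ in detail, for example in how they handle spacing.

```python
def is_valid_tc_kimlik(value: str) -> bool:
    """Turkish national ID: 11 digits, no leading zero, two check digits."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return False
    if value[0] == "0":
        return False
    d = [int(c) for c in value]
    # 10th digit: (sum of 1st,3rd,...,9th digits * 7 - sum of 2nd,...,8th) mod 10
    tenth = (sum(d[0:9:2]) * 7 - sum(d[1:8:2])) % 10
    # 11th digit: sum of the first ten digits mod 10
    eleventh = sum(d[:10]) % 10
    return d[9] == tenth and d[10] == eleventh


def is_valid_iban(value: str) -> bool:
    """Mod-97 check: the rearranged IBAN, read as an integer, must leave remainder 1."""
    iban = value.replace(" ", "").upper()
    if not 15 <= len(iban) <= 34 or not (iban.isascii() and iban.isalnum()):
        return False
    rearranged = iban[4:] + iban[:4]  # move country code and check digits to the end
    # int(c, 36) maps 0-9 to 0-9 and A-Z to 10-35.
    digits = "".join(str(int(c, 36)) for c in rearranged)
    return int(digits) % 97 == 1


def is_valid_luhn(value: str) -> bool:
    """Luhn check used for payment card numbers."""
    number = value.replace(" ", "").replace("-", "")
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n = n * 2 - 9 if n * 2 > 9 else n * 2
        total += n
    return total % 10 == 0


print(is_valid_tc_kimlik("10000000146"))
print(is_valid_iban("TR33 0006 1005 1978 6457 8413 26"))
print(is_valid_luhn("4111 1111 1111 1111"))
```

Checksum validation is what allows a regex hit to carry a high confidence score: a random 11-digit number rarely satisfies both TC Kimlik check digits.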
Layer 2 — NER (akdeniz27/xlm-roberta-base-turkish-ner)
| Entity | Description |
|---|---|
| `KISI_ADI` | Person name |
| `KONUM` | Location |
| `KURUM` | Organization |
Layer 3 — GLiNER (urchade/gliner_multi-v2.1, KVKK Madde 6)
| Entity | Special category (KVKK Madde 6) |
|---|---|
| `SAGLIK_VERISI` | Health data |
| `DINI_INANC` | Religious belief |
| `SIYASI_GORUS` | Political opinion |
| `SENDIKA_UYELIGII` | Trade union membership |
| `BIYOMETRIK_VERI` | Biometric / genetic data |
Requirements
- Python 3.10+
- `pip install kvkk-pii` — no dependencies (regex only)
- `pip install kvkk-pii[ner]` — `transformers`, `torch`, `huggingface-hub`
- `pip install kvkk-pii[full]` — above + `gliner`
- `pip install kvkk-pii[server]` — above + `fastapi`, `uvicorn`
License
MIT